From: Nuttall, Brandon C <bnuttall <at> uky.edu>
Subject: Re: StdErr Problem with Gary Strangman's linregress function
Newsgroups: gmane.comp.python.scientific.user
Date: Monday 11th January 2010 20:07:02 UTC
OK, I think I've figured it out.

The numpy covariance function doesn't seem to return the actual sample
variances (it returns a population variance?). What this means is that for
the linregress() function in the stats.py source file, the quantity
sterrest is not calculated correctly and needs to be adjusted to the sample
variance. In addition, it includes the quantity ssxm, sum of squares for x
(?) and I can't find documentation for its inclusion.

# as implemented
# sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
# should be corrected to
sterrest = np.sqrt((1-r*r)*(ssym*n)/df)

Having made this correction, both the example provided and the example in
Crow, Davis, and Maxfield (Table 6.1, p. 154) provide the same value for
the standard error of the estimate and the value matches what is calculated
by Excel.
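
A minimal sketch of the two formulas side by side, using the x and y data from the original question quoted further down. It assumes that ssxm and ssym are the population (divide-by-n) variances taken from np.cov(x, y, bias=True), which is how the stats.py source appears to compute them; the names slope_stderr and see are illustrative only and do not appear in stats.py.

```python
import numpy as np

# Data from the original question in this thread.
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
n = len(x)
df = n - 2  # degrees of freedom for a simple linear regression

# Assumption: population (divide-by-n) variances and covariance,
# flattened in the order var(x), cov(x, y), cov(y, x), var(y).
ssxm, ssxym, _, ssym = np.cov(x, y, bias=True).flat
r = ssxym / np.sqrt(ssxm * ssym)

# "As implemented": this is the standard error of the slope.
slope_stderr = np.sqrt((1 - r * r) * ssym / ssxm / df)

# "Corrected" expression from this message: the standard error of the
# estimate, i.e. the quantity that Excel's STEYX reports.
see = np.sqrt((1 - r * r) * (ssym * n) / df)

print(slope_stderr, see)
```

Under these assumptions the first value should reproduce the std_err reported by linregress and the second the figure Excel gives, which suggests the two expressions estimate different quantities rather than one being a wrong version of the other.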

I don't know anything about SVN or submitting a correction, so someone will
have to help me out or do it for me.

Thanks.

Brandon

Brandon Nuttall, KRPG-1364
Kentucky Geological Survey
www.uky.edu/kgs
[email protected] (KGS, Mo-We)
[email protected] (EEC, Th-Fr)
859-257-5500 ext 30544 (main)
859-323-0544 (direct)
859-684-7473 (cell)
859-257-1147 (FAX)

From: [email protected] [mailto:scipy-user-bounces[email protected]] On
Behalf Of [email protected]
Sent: Sunday, January 10, 2010 8:41 PM
To: SciPy Users List
Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress
function


On Sun, Jan 10, 2010 at 8:21 PM, Bruce Southey wrote:

On Sun, Jan 10, 2010 at 3:35 PM,
> wrote:

Hello, Excel and scipy.stats.linregress are disagreeing on the standard
error of a regression.

I need to find the standard errors of a bunch of regressions, and would
prefer to use pure Python rather than RPy. So I am using
scipy.stats.linregress, as advised at:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress


>>> from scipy import stats
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186
>>> intercept
-16.281127993087829
>>> r_value
0.72443514211849758
>>> r_value**2
0.52480627513624778
>>> std_err
3.6290901222878866


The problem is that the std error calculation does not agree with what is
returned in Microsoft Excel's STEYX function (whereas all the other output
does). From Excel:

[inline image of the Excel output; not preserved in the archive]


Does anybody know what's going on? Is there an alternative way of getting
the standard error without going to R?



_______________________________________________
SciPy-User mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/scipy-user

The Excel help is rather cryptic: "Returns the standard error of the
predicted y-value for each x in the regression. The standard error is a
measure of the amount of error in the prediction of y for an individual x."
But clearly this is not the same as the standard error of the 'gradient'
(slope) returned by linregress. Without checking the formula, STEYX appears
to return the square root of what most people call the mean square error
(MSE).

Bruce

>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> ((y-intercept-np.array(x)*gradient)**2).sum()/(4.-2.)
136.80611125682617
>>> np.sqrt(_)
11.6964144615701

I think this should be the estimate of the standard deviation of the
noise/error term.

Josef
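
The relationship between the two numbers discussed in this thread can be sketched from the residuals alone. This is an illustration under stated assumptions, not the linregress implementation: np.polyfit is swapped in for the fit so that only NumPy is needed, and the names see, sxx and slope_stderr are hypothetical.

```python
import numpy as np

# Data from the original question in this thread.
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

# Ordinary least-squares line; polyfit returns the highest degree first.
slope, intercept = np.polyfit(x, y, 1)

# Residual mean square with n - 2 degrees of freedom, as in Josef's session.
residuals = y - (intercept + slope * x)
df = len(x) - 2
mse = (residuals ** 2).sum() / df

# Estimated standard deviation of the error term (what STEYX reports).
see = np.sqrt(mse)

# Dividing by the square root of the x sum of squares gives the
# standard error of the slope (what linregress returns as std_err).
sxx = ((x - x.mean()) ** 2).sum()
slope_stderr = see / np.sqrt(sxx)

print(see, slope_stderr)
```

On this data the two printed values should be close to the 11.696 computed by Josef and the 3.629 returned by linregress, so the disagreement with Excel comes down to which of the two standard errors is being reported.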
 