OK, I think I've figured it out.
The numpy covariance function doesn't seem to return the actual sample
variances (it appears to return a population variance?). This means that in
the linregress() function in the stats.py source file, the quantity
sterrest is not calculated correctly and needs to be adjusted to the sample
variance. In addition, it includes the quantity ssxm (the sum of squares
for x?), and I can't find documentation for its inclusion.
# as implemented
# sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
# should be corrected to
sterrest = np.sqrt((1-r*r)*(ssym*n)/df)
Having made this correction, both the example provided and the example in
Crow, Davis, and Maxfield (Table 6.1, p. 154) provide the same value for
the standard error of the estimate and the value matches what is calculated
I don't know anything about SVN or submitting a correction, so someone will
have to help me out or do it for me.
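
One way to check the proposed change numerically is the sketch below. It
assumes that linregress takes ssxm and ssym from np.cov(x, y, bias=True)
(population moments), reuses the x and y from the example in this thread,
and compares both expressions against a value computed directly from the
residuals. Names other than ssxm, ssym, r, n and df are mine, for
illustration only.

```python
import numpy as np

# Data from the example in this thread
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
n = len(x)
df = n - 2

# Assumption: population (biased) variances and covariance
ssxm, ssxym, _, ssym = np.cov(x, y, bias=True).flat
r = ssxym / np.sqrt(ssxm * ssym)

# Expression quoted as "as implemented"
se_slope = np.sqrt((1 - r * r) * ssym / ssxm / df)

# Expression proposed as the correction
se_estimate = np.sqrt((1 - r * r) * (ssym * n) / df)

# Independent check: root of the residual sum of squares over df
slope = ssxym / ssxm
intercept = y.mean() - slope * x.mean()
residuals = y - (intercept + slope * x)
se_from_residuals = np.sqrt(np.sum(residuals ** 2) / df)

print("as implemented:  ", se_slope)
print("proposed:        ", se_estimate)
print("from residuals:  ", se_from_residuals)
```

Under these assumptions the proposed expression agrees with the
residual-based standard error of the estimate, while the "as implemented"
expression is smaller by a factor of sqrt(n * ssxm), which is what one
would expect for the standard error of the slope.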
Brandon Nuttall, KRPG-1364
Kentucky Geological Survey
[email protected] (KGS, Mo-We)
[email protected] (EEC, Th-Fr)
859-257-5500 ext 30544 (main)
From: [email protected] [mailto:[email protected]] On
Behalf Of [email protected]
Sent: Sunday, January 10, 2010 8:41 PM
To: SciPy Users List
Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress
On Sun, Jan 10, 2010 at 8:21 PM, Bruce Southey
On Sun, Jan 10, 2010 at 3:35 PM,
Hello, Excel and scipy.stats.linregress are disagreeing on the standard
error of a regression.
I need to find the standard errors of a bunch of regressions, and prefer to
use pure Python rather than RPy. So I am going to scipy.stats.linregress, as
from scipy import stats
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
The problem is that the std error calculation does not agree with what is
returned in Microsoft Excel's STEYX function (whereas all the other output
does). From Excel:
Does anybody know what's going on? Is there any alternative way of getting
the standard error without going to R?
SciPy-User mailing list
The Excel help is rather cryptic: "Returns the standard error of the
predicted y-value for each x in the regression. The standard error is a
measure of the amount of error in the prediction of y for an individual x."
But clearly this is not the same as the standard error of the 'gradient'
(slope) returned by linregress. Without checking the formula, STEYX appears
to return the square root of what most people call the mean square error
(MSE).
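
A minimal sketch of that distinction, assuming STEYX divides the residual
sum of squares by n - 2 (I have not verified this against Excel itself).
The helper regression_errors is hypothetical, written only for this
illustration; it fits the line with np.polyfit and returns both quantities.

```python
import numpy as np

def regression_errors(x, y):
    """Return (root MSE, standard error of the slope) for a simple
    least-squares line fit. Hypothetical helper for illustration."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Least-squares fit of y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # Mean square error with n - 2 degrees of freedom (assumed for STEYX)
    mse = np.sum(residuals ** 2) / (n - 2)
    root_mse = np.sqrt(mse)

    # Standard error of the slope: root MSE scaled by the spread of x
    se_slope = root_mse / np.sqrt(np.sum((x - x.mean()) ** 2))
    return root_mse, se_slope

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
root_mse, se_slope = regression_errors(x, y)
print("root MSE (STEYX-like):  ", root_mse)
print("standard error of slope:", se_slope)
```

The two numbers differ only by the factor sqrt(sum((x - mean(x))**2)), so
either can be recovered from the other once x is known.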
>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
I think this should be the estimate of the standard deviation of the