In regression and multivariate statistics, the notation \( S_{xx} \) comes from the idea of sums of squares and cross-products.
This notation system (often attributed to the “corrected sums of squares” approach) is standard in regression textbooks. The “S” stands for “Sum” (or sometimes “Corrected Sum”), and the subscript indicates which variables are involved.
Sxx is therefore the most basic building block: the corrected sum of squares for a single variable. A quick numerical check in Python:
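```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])

# Definitional form: sum of squared deviations from the mean
Sxx = np.sum((x - np.mean(x))**2)
# Equivalent computational form: Sxx = np.sum(x**2) - (np.sum(x)**2) / len(x)
print(Sxx)  # 40.0
```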
Let’s start with a dataset: \( x_1, x_2, x_3, \ldots, x_n \).
The mean of these values is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Sxx (often written as \( S_{xx} \) or \( SS_{xx} \)) is defined as:
\[ \boxed{S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
This formula takes each observation, subtracts the mean (giving the deviation), squares it, and sums across all observations. Because the deviations are measured from the mean, Sxx is called the "corrected" sum of squares (as opposed to the raw sum of squares, \( \sum x_i^2 \)).
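The link between the corrected and raw sums of squares can be made explicit with a short expansion. The sketch below uses only the definition of the mean, \( \sum x_i = n\bar{x} \), with all sums running over \( i = 1, \ldots, n \):

\[
\begin{aligned}
S_{xx} &= \sum (x_i - \bar{x})^2 \\
       &= \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 \\
       &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
       &= \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
\]

For the values 2, 4, 6, 8, 10, the raw sum of squares is 220 and \( n\bar{x}^2 = 5 \cdot 36 = 180 \), so \( S_{xx} = 220 - 180 = 40 \).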
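The correlation \( r \) is:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]
Here, \( S_{yy} = \sum (y_i - \bar{y})^2 \) is the same concept applied to variable y, and \( S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \) is the corrected sum of cross-products. Thus, Sxx and Syy normalize \( S_{xy} \), which equals the sample covariance multiplied by \( n - 1 \).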
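As a rough illustration of the correlation formula, the sketch below computes Sxx, Syy, and Sxy directly and compares the result with NumPy's built-in `np.corrcoef`. The helper names `corrected_sums` and `correlation` and the `y` values are assumptions made up for this example; they do not come from any particular library or dataset.

```python
import numpy as np

def corrected_sums(x, y):
    """Return (Sxx, Syy, Sxy) for two equal-length samples (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx**2), np.sum(dy**2), np.sum(dx * dy)

def correlation(x, y):
    """Pearson correlation computed as Sxy / sqrt(Sxx * Syy)."""
    Sxx, Syy, Sxy = corrected_sums(x, y)
    return Sxy / np.sqrt(Sxx * Syy)

x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]  # illustrative values chosen for this sketch

r = correlation(x, y)
print(r)
# Cross-check against NumPy's correlation matrix (off-diagonal entry)
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))
```

With these values, Sxx = 40, Syy = 10, and Sxy = 16, so r = 16 / sqrt(400) = 0.8. Because the same deviations feed all three sums, any common scaling factor (such as dividing by n - 1 to get variances and covariance) cancels in the ratio.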