Regression Analysis Course Slides Notes
1. regression methods
1.1. basic hypothesis
Def: Gauss-Markov conditions

1.2. basic least square
Def: basic linear model

Algorithm: least square
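Sketch (not from the slides, for reference): assuming the basic model \(y = X\beta + \varepsilon\) with \(X\) of full column rank and the Gauss-Markov conditions \(E(\varepsilon) = 0\), \(\operatorname{Cov}(\varepsilon) = \sigma^2 I\),
\[ \hat\beta = \arg\min_\beta \|y - X\beta\|^2 = (X^\top X)^{-1} X^\top y . \]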

1.2.1. expectation & var
Qua: => expectation & var of least square
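Sketch (standard result, stated here for reference):
\[ E(\hat\beta) = \beta, \qquad \operatorname{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1} . \]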

Theorem: => among all linear unbiased estimators, the least squares estimator has the smallest variance (Gauss-Markov theorem)


Note:

1.2.2. residual sum of squares ( RSS )
Def: RSS


Note:

Qua: => expectation of residual vector

Qua: => expectation of RSS
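Sketch (assuming \(X\) is \(n \times p\) of full column rank, residual vector \(e = y - X\hat\beta\)):
\[ E(e) = 0, \qquad E(RSS) = E\|e\|^2 = (n - p)\,\sigma^2 . \]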

Qua: => generalized RSS

Theorem: => importance of RSS (independence is the key property when we do hypothesis tests)

1.3. centralized least square
Def: centralized linear model



Algorithm: centralized least square
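Sketch (assuming the centralized model is written with an intercept \(\alpha\) and mean-centered predictors \(X_c\)):
\[ y = \alpha \mathbf{1} + X_c \beta + \varepsilon, \qquad \hat\alpha = \bar y, \qquad \hat\beta = (X_c^\top X_c)^{-1} X_c^\top y . \]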

Note:

1.3.1. expectation & var
Qua: => expectation

Note: more about centralization

Def: regression coefficient

1.3.2. MSE
Def: see probability theory
Qua: => MSE of centralized least square

Proof:

Note: if an eigenvalue is small, then the MSE is large!
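Sketch of why (with \(\lambda_1, \dots, \lambda_p\) the eigenvalues of \(X_c^\top X_c\)):
\[ \operatorname{MSE}(\hat\beta) = E\|\hat\beta - \beta\|^2 = \sigma^2 \operatorname{tr}\!\big[(X_c^\top X_c)^{-1}\big] = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i}, \]
so a near-zero eigenvalue blows up the MSE.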

## standardized least square
Def: standardized linear model

Note: relationship between the standardized model and the general model.


1.4. constrained least square
Def: constrained linear model
Usage: the same model but with an additional constraint equation

Algorithm: constrained least square
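Sketch (assuming the constraint is written \(A\beta = b\) with \(A\) of full row rank; follows from a Lagrange multiplier argument):
\[ \hat\beta_c = \hat\beta - (X^\top X)^{-1} A^\top \big[ A (X^\top X)^{-1} A^\top \big]^{-1} (A\hat\beta - b), \]
where \(\hat\beta\) is the unconstrained least squares estimator.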



Proof: show that the minimum point exists

1.5. generalized least square
Def: generalized linear model
Usage: model where the covariance matrix is not the identity but a known positive definite matrix

Algorithm: generalized least square

Qua: => expectations
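Sketch (assuming \(\operatorname{Cov}(\varepsilon) = \sigma^2 \Sigma\) with \(\Sigma\) known and positive definite):
\[ \hat\beta_{GLS} = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} y, \qquad E(\hat\beta_{GLS}) = \beta, \qquad \operatorname{Cov}(\hat\beta_{GLS}) = \sigma^2 (X^\top \Sigma^{-1} X)^{-1} . \]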

Note: the generalized model is essentially transformed back into the standard form, and the resulting estimator is best by the Gauss-Markov conditions

Example: a special form of generalized model


1.6. incomplete-data least square
Def: delete some row(s) of the data and examine the change in the parameter vector.


Def: Cook's statistic (Cook's distance)
Usage: a metric to rank the influence of each observation (when we delete a certain row of data)

Theorem: => relationship between Cook's distance and the studentized residual
Usage: we do not have to recompute Cook's distance by refitting the model for every deleted row; the theorem reduces the computational cost considerably.
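Sketch of the usual form (with \(p\) coefficients, leverage \(h_{ii}\), internally studentized residual \(r_i\)):
\[ D_i = \frac{(\hat\beta - \hat\beta_{(i)})^\top X^\top X\, (\hat\beta - \hat\beta_{(i)})}{p\,\hat\sigma^2} = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}, \]
so every \(D_i\) is available from a single fit of the full model, without refitting after deleting row \(i\).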

Note: intuition for Cook's distance.


1.7. ridge least square
Def: the model is the same basic linear model
Algorithm: the algorithm differs when \(k \neq 0\)

Note: why we need ridge regression

Def: regularized linear model

Algorithm: least square for the regularized linear model


Qua: => Var

Qua: => relationship with basic model
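Sketch (standard ridge formulas, assuming ridge parameter \(k > 0\)):
\[ \hat\beta(k) = (X^\top X + kI)^{-1} X^\top y, \qquad \operatorname{Cov}\big(\hat\beta(k)\big) = \sigma^2 (X^\top X + kI)^{-1} X^\top X\, (X^\top X + kI)^{-1}, \]
\[ \hat\beta(k) = \big[ I + k (X^\top X)^{-1} \big]^{-1} \hat\beta(0), \]
where \(\hat\beta(0)\) is the ordinary least squares estimator.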

Note:

1.7.1. MSE
Qua: => relationship of MSE with basic model

Qua: => MSE < basic model
Usage: this is why we choose ridge regression
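Sketch of the comparison in canonical form (assuming \(X^\top X = V \Lambda V^\top\), \(\alpha = V^\top \beta\)):
\[ \operatorname{MSE}\big(\hat\beta(k)\big) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + k)^2}, \]
which equals \(\sigma^2 \sum_i 1/\lambda_i\) at \(k = 0\) and has a negative derivative there, so some \(k > 0\) gives a strictly smaller MSE.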

Note: this is a very important result.

1.7.2. optimal K
Algorithm: Hoerl-Kennard equation

Algorithm: ridge plot

1.8. PCA regression
Def: PCA linear model
Usage: same setting as ridge regression; the model is centralized!!

Def: first principal component

Qua: => relationship between Z and the eigenvalues

Intro: intuition for PCA regression: when an eigenvalue is small, we eliminate the corresponding component.

Algorithm: PCA regression: similar to the regularized model except that we eliminate some columns of Z.
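Sketch (assuming centered \(X_c\) with \(X_c^\top X_c = V \Lambda V^\top\)):
\[ Z = X_c V, \qquad Z^\top Z = \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p); \]
keep only the first \(r\) components \(Z_r\) (those with large eigenvalues), regress \(y\) on \(Z_r\), and map back:
\[ \hat\gamma_r = (Z_r^\top Z_r)^{-1} Z_r^\top y = \Lambda_r^{-1} Z_r^\top y, \qquad \hat\beta_{PCR} = V_r \hat\gamma_r . \]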


1.8.1. MSE
Theorem: we can also decrease MSE by PCA regression

Note: there is a condition attached to this theorem

1.9. incomplete-feature least square
Def: model

Algorithm: least square (note the difference between 1.9 and 1.10: in 1.10 we assume we do not know which model is correct, whereas in 1.9 we assume we do know.)

1.9.1. expectation & var
Theorem: => biased E and Var

Proof: P103
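Sketch of the standard form (assuming the true model is \(y = X_1\beta_1 + X_2\beta_2 + \varepsilon\) but only \(X_1\) is fitted):
\[ \hat\beta_1 = (X_1^\top X_1)^{-1} X_1^\top y, \qquad E(\hat\beta_1) = \beta_1 + (X_1^\top X_1)^{-1} X_1^\top X_2 \beta_2, \qquad \operatorname{Cov}(\hat\beta_1) = \sigma^2 (X_1^\top X_1)^{-1}, \]
so the estimator is biased unless \(X_1^\top X_2 \beta_2 = 0\), while its covariance is never larger than in the full model.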
1.9.2. MSE
Theorem: => MSE smaller

Proof:

Note: the condition (5.1.14) does not always hold.


1.9.3. prediction problem
Def: prediction problem

Qua: => using MSEP we obtain the following

Proof: see P108
1.9.4. conclusion
Note: conclusion for the above three sections

1.10. non-linear regression
2. regression analysis
2.1. Cook's distance
Def: introduced in 1.6
Usage: detecting high-influence points / outliers
2.2. VIF/ CI
Intro: from 1.3.2 we know that the MSE depends on the eigenvalues; now we show what small eigenvalues mean in a linear model
Def: multicollinearity


Def: CI
Usage: a tool to show how severe the multicollinearity is

Def: VIF
Usage: same purpose as CI; \( \mathrm{VIF}_j = \frac{1}{1-R_j^2} \)
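Sketch of common forms (the slides' scaling conventions may differ): with \(\lambda_1 \ge \dots \ge \lambda_p\) the eigenvalues of the standardized \(X^\top X\) and \(R_j^2\) the \(R^2\) from regressing the \(j\)-th predictor on the others,
\[ CI_j = \sqrt{\frac{\lambda_1}{\lambda_j}}, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2} . \]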
2.3. studentized residual plot
Def: studentized residual
Usage: a standardized form of the residual vector
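Sketch of the usual (internally) studentized form, with leverage \(h_{ii}\) from the hat matrix \(H = X(X^\top X)^{-1} X^\top\):
\[ r_i = \frac{e_i}{\hat\sigma \sqrt{1 - h_{ii}}}, \qquad e = y - X\hat\beta . \]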

Algorithm: studentized residual plot
Usage: checking all of the model hypotheses


Note: six types of residual plots

2.4. Box-Cox transformation
Def: Box-Cox transformation
Usage: used when the model conditions are violated

Algorithm: MLE for finding the optimal \(\lambda\)


Note: a basic transformation turns it into a log-likelihood maximization problem

Note: overall brief algorithm
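A minimal numerical sketch of the overall algorithm, assuming the standard Box-Cox profile likelihood; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def boxcox_profile_loglik(y, X, lambdas):
    """Profile log-likelihood of the Box-Cox parameter over a grid,
    for the model (transformed y) = X beta + eps. y must be positive."""
    n = len(y)
    logliks = []
    for lam in lambdas:
        # Box-Cox transform of the response
        z = np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam
        # least squares fit of the transformed response, then its RSS
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss = np.sum((z - X @ beta) ** 2)
        # profile log-likelihood up to an additive constant
        logliks.append(-0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y)))
    return np.array(logliks)

# usage: pick the lambda with the largest profile log-likelihood
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(1, 5, 50)])
y = np.exp(0.5 + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(50))
grid = np.linspace(-2, 2, 81)
lam_hat = grid[np.argmax(boxcox_profile_loglik(y, X, grid))]
```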


3. regression hypothesis test
3.1. linear test
Def: basic linear test

Note: the basic idea of this test


Algorithm: the test

Note: we can simplify the test, since \(RSS_h\) is hard to compute, by using a reduced model (substituting the constraint \(AX=b\) into the model to make it unconstrained)
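Sketch of the test statistic (writing the hypothesis as \(q\) independent linear constraints \(A\beta = b\), with \(X\) of full column rank \(p\), \(RSS\) the unconstrained and \(RSS_h\) the constrained residual sum of squares):
\[ F = \frac{(RSS_h - RSS)/q}{RSS/(n - p)} \sim F(q,\, n - p) \quad \text{under } H_0 . \]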


3.2. model test
Def: model test
Usage: a special case of the test in section 3.1, but we will make it simpler

Note:

Algorithm: the same procedure as in 3.1


Note: the famous TSS = RSS + ESS equation
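Sketch (centered model with an intercept and \(p\) predictors):
\[ TSS = \sum_i (y_i - \bar y)^2 = \underbrace{\sum_i (y_i - \hat y_i)^2}_{RSS} + \underbrace{\sum_i (\hat y_i - \bar y)^2}_{ESS}, \qquad F = \frac{ESS/p}{RSS/(n - p - 1)} . \]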


3.3. significance test
Def: significance test (a special case of 3.1, but we will give a simpler approach)

Algorithm: significance test
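Sketch of the simpler form (with \(c_{jj}\) the \(j\)-th diagonal element of \((X^\top X)^{-1}\) and \(p\) predictors plus an intercept):
\[ t_j = \frac{\hat\beta_j}{\hat\sigma \sqrt{c_{jj}}} \sim t(n - p - 1) \quad \text{under } H_0 : \beta_j = 0 . \]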

Note:

3.4. outlier test
Def: outlier test

Theorem:

Note:

Algorithm:

Proof:



3.5. the prediction problem
3.5.1. point estimation
Def:

Qua: => unbiased

Qua: => markov

Qua: difference between
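Sketch (standard results at a new design point \(x_0\), with \(X\) of full column rank \(p\)):
\[ \hat y_0 = x_0^\top \hat\beta, \qquad E(\hat y_0) = x_0^\top \beta, \qquad \operatorname{Var}(\hat y_0) = \sigma^2\, x_0^\top (X^\top X)^{-1} x_0, \]
while the error of predicting a new observation \(y_0\) has variance \(\sigma^2 \big( 1 + x_0^\top (X^\top X)^{-1} x_0 \big)\).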


3.5.2. interval estimation
Def:
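Sketch (under normality, a \(1-\alpha\) prediction interval for a new observation at \(x_0\)):
\[ \hat y_0 \pm t_{\alpha/2}(n - p)\, \hat\sigma \sqrt{1 + x_0^\top (X^\top X)^{-1} x_0} . \]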



4. regression feature selection
## metrics for selection
Def: \(RSS_q\) (the RSS at the \(q\)-th step of selection)

Theorem: => \(RSS_q > RSS_{q+1}\)
Usage: this means the more features we include, the smaller the RSS, i.e. the better the in-sample fit.

Proof:

Def: \(RMS_q\)

Note: the smaller \(RMS_q\), the better the model

Def: MSEP


Def: CP

Qua: => no proof
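Sketch of one common form (assuming \(\hat\sigma^2\) is estimated from the full model and \(q\) counts all fitted coefficients of the submodel; the slides' notation may differ):
\[ C_q = \frac{RSS_q}{\hat\sigma^2} + 2q - n, \]
and an adequate submodel has \(C_q \approx q\).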



Note: we can plot \(C_p\) to see whether the selection is optimal

Def: AIC (an application of MLE)

Note: its specific form in the linear model

Proof:

Note: the smaller the better
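Sketch of the formula behind the note (Gaussian linear model, up to an additive constant):
\[ AIC_q = n \ln\!\left(\frac{RSS_q}{n}\right) + 2q . \]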

## optimal selection
Def: the best features

Algorithm: Cp plot
4.1. step-wise selection
- Algorithm: P149; basically, perform an F test at every step (see the sketch below).
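A rough sketch of the forward part of such a procedure, using an F-to-enter rule; the threshold, the missing backward removal step, and all names here are illustrative assumptions, not the slides' exact algorithm:

```python
import numpy as np

def forward_stepwise(X, y, f_in=4.0):
    """Forward stepwise selection sketch: at each step add the candidate
    feature with the largest partial F statistic; stop when it drops below f_in."""
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def rss_of(cols):
        # design matrix: intercept plus the chosen columns of X
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return np.sum((y - Z @ beta) ** 2), Z.shape[1]

    rss_cur, _ = rss_of(selected)
    while remaining:
        # partial F statistic (1 numerator df) for adding each remaining feature
        scores = []
        for j in remaining:
            rss_new, k_new = rss_of(selected + [j])
            scores.append(((rss_cur - rss_new) / (rss_new / (n - k_new)), j, rss_new))
        F_best, j_best, rss_best = max(scores)
        if F_best < f_in:
            break  # no candidate passes the F-to-enter threshold
        selected.append(j_best)
        remaining.remove(j_best)
        rss_cur = rss_best
    return selected
```

A full stepwise procedure would also re-test the already selected features for removal (an F-to-remove rule) after each addition.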