Regression Analysis

Notes on the Regression Analysis course slides

1. regression methods

1.1. basic assumptions

  • Def: Gauss-Markov conditions

1.2. basic least square

  • Def: basic linear model

  • Algorithm: least square
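
A minimal numpy sketch of the least-square algorithm, assuming the basic linear model \(y = X\beta + \varepsilon\) with \(X\) of full column rank (data and names are illustrative):

```python
import numpy as np

def ols(X, y):
    """Least square estimate: beta_hat = (X'X)^{-1} X'y."""
    # lstsq solves the normal equations in a numerically stable way
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# toy data: n = 50 rows, an intercept column plus 2 features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=50)
print(ols(X, y))  # close to (1.0, 2.0, -0.5)
```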

1.2.1. expectation & var

  • Qua: => expectation & var of least square

  • Theorem: => among all linear unbiased estimators, the least square estimate has the smallest variance (Gauss-Markov theorem)

    Note:

1.2.2. residual sum of squares (RSS)

  • Def: RSS

    Note:

    • Qua: => expectation of residual vector

    • Qua: => expectation of RSS

    • Qua: => generalized RSS

    • Theorem: => importance of RSS (the independence property is what matters most when we do hypothesis tests)
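
For reference, the standard facts this subsection collects, assuming \(X\) is \(n \times p\) of full rank:

\[
RSS = \|y - X\hat{\beta}\|^2, \qquad E(RSS) = (n-p)\,\sigma^2, \qquad \hat{\sigma}^2 = \frac{RSS}{n-p},
\]

and under normal errors \(RSS/\sigma^2 \sim \chi^2(n-p)\), independent of \(\hat{\beta}\); this independence is exactly what the tests in section 3 rely on.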

1.3. centralized least square

  • Def: centralized linear model

  • Algorithm: centralized least square

    Note:
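
A minimal sketch of the centralized algorithm, assuming we center the columns of \(X\) and the response, fit the slopes on the centered data, then recover the intercept from the means (names illustrative):

```python
import numpy as np

def centralized_ols(X, y):
    """Fit slopes on centered data; the intercept falls out of the means."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar               # centered design and response
    beta_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    alpha_hat = y_bar - x_bar @ beta_hat        # intercept of the original model
    return alpha_hat, beta_hat
```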

1.3.1. expectation & var

  • Qua: => expectation

    Note: more about centralization

  • Def: regression coefficient

1.3.2. MSE

  • Def: see probability theory

    • Qua: => MSE of centralized least square

      Proof:

      Note: if an eigenvalue is small, then the MSE is large!
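
The identity behind this note, writing \(\lambda_1, \dots, \lambda_p\) for the eigenvalues of \(X'X\):

\[
MSE(\hat{\beta}) = E\|\hat{\beta} - \beta\|^2 = \sigma^2\,\mathrm{tr}\big((X'X)^{-1}\big) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i},
\]

so a single near-zero eigenvalue is enough to blow the MSE up.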

1.3.3. standardized least square

  • Def: standardized linear model

    Note: relationship between the standardized model and the general model.

1.4. constrained least square

  • Def: constrained linear model

    Usage: the same model but with an additional constraint equation

  • Algorithm: constrained least square

    Proof: prove that the minimum point exists
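
A sketch of the usual Lagrange-multiplier solution, assuming the constraint is written \(A\beta = b\) with \(A\) of full row rank (the slides' notation may differ):

\[
\hat{\beta}_c = \hat{\beta} - (X'X)^{-1}A'\big(A(X'X)^{-1}A'\big)^{-1}\big(A\hat{\beta} - b\big),
\]

where \(\hat{\beta}\) is the unconstrained least-square estimate; the correction term is zero exactly when \(\hat{\beta}\) already satisfies the constraint.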

1.5. generalized least square

  • Def: generalized linear model

    Usage: for the model whose covariance matrix is not the identity but a known positive-definite matrix

  • Algorithm: generalized least square

    • Qua: => expectations

      Note: the generalized model essentially transforms the random model into its standardized form, which then becomes the best (BLUE) under the Gauss-Markov conditions

      Example: a special form of generalized model
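
A minimal sketch, assuming the generalized model \(y = X\beta + \varepsilon\) with \(\mathrm{Cov}(\varepsilon) = \sigma^2\Sigma\), \(\Sigma\) known and positive definite; whitening by a Cholesky factor reduces it to the basic model, which is the transformation the note above describes:

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalized least square: beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y."""
    L = np.linalg.cholesky(Sigma)     # Sigma = L L'
    Xw = np.linalg.solve(L, X)        # whitened design  L^{-1} X
    yw = np.linalg.solve(L, y)        # whitened response L^{-1} y
    # ordinary least square on the whitened model, whose errors have Cov = sigma^2 I
    beta_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta_hat
```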

1.6. incomplete-data least square

  • Def: eliminate some row(s) of the data and examine the change in the parameter vector.

  • Def: Cook's distance

    Usage: a metric to rank influence (when we eliminate a certain row of data)

    • Theorem: => relationship between Cook's distance & the studentized residual

      Usage: we don't have to recompute Cook's distance from scratch for every row; using the theorem we can greatly reduce the computational cost.

      Note: intuition for Cook's distance.
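
A minimal sketch of the shortcut the theorem gives, assuming the hat matrix \(H = X(X'X)^{-1}X'\): every \(D_i\) comes from the leverages and studentized residuals of the full fit, with no refitting per deleted row (names illustrative):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for every row without refitting n leave-one-out models."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    e = y - H @ y                           # residuals
    sigma2 = e @ e / (n - p)                # unbiased estimate of sigma^2
    r = e / np.sqrt(sigma2 * (1.0 - h))     # studentized residuals
    return (r**2 / p) * h / (1.0 - h)       # D_i = (r_i^2 / p) * h_ii / (1 - h_ii)
```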

1.7. ridge least square

  • Def: the model is the basic linear model

  • Algorithm: the algorithm differs when \(k \neq 0\)

    Note: why we need ridge regression

  • Def: regularized linear model

    Algorithm: least square for the regularized linear model

    • Qua: => Var

    • Qua: => relationship with basic model

      Note:
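
A minimal sketch of the ridge estimator, assuming the regularized normal equations \((X'X + kI)\,\hat{\beta}(k) = X'y\) (names illustrative):

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate: beta_hat(k) = (X'X + kI)^{-1} X'y; k = 0 recovers least square."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```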

1.7.1. MSE

  • Qua: => relationship of MSE with basic model

  • Qua: => for suitable \(k\), MSE < that of the basic model

    Usage: this is why we choose ridge regression

    Note: this is a very important result.

1.7.2. optimal K

  • Algorithm: Hoerl-Kennard equation

  • Algorithm: ridge plot
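
A minimal sketch of the ridge plot (ridge trace): compute \(\hat{\beta}(k)\) on a grid of \(k\) and pick the region where the coefficient paths stabilize (the grid and plotting details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def ridge_trace(X, y, ks):
    """Plot every ridge coefficient path beta_hat_j(k) against k."""
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    paths = np.array([np.linalg.solve(XtX + k * np.eye(p), Xty) for k in ks])
    for j in range(p):
        plt.plot(ks, paths[:, j], label=f"beta_{j}")
    plt.xlabel("k"); plt.ylabel("coefficient"); plt.legend(); plt.show()
```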

1.8. PCA regression

  • Def: PCA linear model

    Usage: same as ridge regression; note the model is centralized!!

    Def: first principal component

    • Qua: => relationship between \(Z\) and the eigenvalues

  • Intro: intuition for PCA regression: when an eigenvalue is small, we eliminate the corresponding component.

    Algorithm: PCA regression: similar to the regularized model, except that we eliminate some columns of \(Z\) (see the sketch below).
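
A minimal sketch, assuming centered \(X\), the eigendecomposition \(X'X = P\Lambda P'\), principal components \(Z = XP\), and that we keep only the components with the largest eigenvalues (the cutoff rule is illustrative):

```python
import numpy as np

def pca_regression(X, y, n_keep):
    """Regress y on the n_keep principal components with the largest eigenvalues."""
    lam, P = np.linalg.eigh(X.T @ X)        # eigh returns eigenvalues in ascending order
    keep = np.argsort(lam)[::-1][:n_keep]   # indices of the largest eigenvalues
    Z = X @ P[:, keep]                      # retained principal components
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return P[:, keep] @ gamma               # coefficients back in the original coordinates
```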

1.8.1. MSE

  • Theorem: we can also decrease MSE by PCA regression

    Note: there is a condition attached to this theorem

1.9. incomplete-feature least square

  • Def: model

  • Algorithm: least square (note the contrast with feature selection in section 4: there we assume we don't know which model is correct, whereas here in 1.9 we assume we know the correct model.)

1.9.1. expectation & var

  • Theorem: => expectation & var (the estimate is biased)

    Proof: P103

1.9.2. MSE

  • Theorem: => MSE smaller

    Proof:

    Note: the condition (5.1.14) does not always hold.

1.9.3. prediction problem

  • Def: prediction problem

    • Qua: => using MSEP we obtain the following

      Proof: see P108

1.9.4. conclusion

  • Note: conclusion for above 3 sections

1.10. non-linear regression

2. regression analysis

2.1. Cook's distance

  • Def: introduced in 1.6

    Usage: detecting strong-influence points / outliers

2.2. VIF / CI

  • Intro: in 1.3.2 we saw that the MSE depends on the eigenvalues; now we show what a small eigenvalue means in the linear model

    Def: multicollinearity

  • Def: CI

    Usage: a tool to show how severe the multicollinearity is

  • Def: VIF

    Usage: same as CI; defined as \( \mathrm{VIF}_j = \frac{1}{1-R_j^2} \)
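
A minimal sketch of the definition, assuming \(R_j^2\) is the \(R^2\) from regressing the \(j\)-th column on all the others (names illustrative):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), one value per column of X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        xj, others = X[:, j], np.delete(X, j, axis=1)
        fit, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ fit
        tss = ((xj - xj.mean())**2).sum()
        r2 = 1.0 - resid @ resid / tss      # R_j^2
        out[j] = 1.0 / (1.0 - r2)           # VIF_j
    return out
```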

2.3. studentized residual plot

  • Def: studentized residual

    Usage: a standardized form of the residual vector

  • Algorithm: studentized residual plot

    Usage: checking all the model assumptions

    Note: 6 types of residual plots

2.4. Box-Cox transformation

  • Def: Box-Cox transformation

    Usage: a remedy when the model assumptions fail

  • Algorithm: MLE for finding the optimal \(\lambda\)

    Note: basic transformation into a log-likelihood maximization problem

    Note: brief overall algorithm
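
A minimal sketch of the MLE step, assuming the standard profile log-likelihood \(\ell(\lambda) = -\tfrac{n}{2}\log\big(RSS(\lambda)/n\big) + (\lambda - 1)\sum_i \log y_i\) over positive \(y\); the grid search is illustrative, a numerical optimizer works too:

```python
import numpy as np

def boxcox_lambda(X, y, grid=np.linspace(-2.0, 2.0, 81)):
    """Grid-search the Box-Cox profile log-likelihood for the best lambda."""
    n = X.shape[0]
    log_y_sum = np.log(y).sum()                     # requires y > 0
    best_lam, best_ll = None, -np.inf
    for lam in grid:
        # transformed response: (y^lam - 1)/lam, with the log limit at lam = 0
        z = np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam
        fit, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss = ((z - X @ fit)**2).sum()
        ll = -0.5 * n * np.log(rss / n) + (lam - 1.0) * log_y_sum
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```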

3. regression hypothesis test

3.1. linear test

  • Def: basic linear test

    Note: the basic idea behind this test

  • Algorithm: the test

    Note: we can simplify the test, since \(RSS_h\) is hard to compute, by using a reduced model (substituting the constraint AX = b into the model to make it unconstrained)
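
The standard form of the statistic, assuming the hypothesis imposes \(m\) independent linear constraints and \(X\) is \(n \times p\) of full rank:

\[
F = \frac{(RSS_h - RSS)/m}{RSS/(n-p)} \sim F(m,\; n-p) \quad \text{under } H,
\]

where \(RSS_h\) is the residual sum of squares of the reduced (constrained) model, so a large \(F\) means the constraint costs too much fit and we reject.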

3.2. model test

  • Def: model test

    Usage: a special case of the test in section 3.1, but we'll make it simpler

    Note:

  • Algorithm: the same procedure as in 3.1

    Note: the famous equation TSS = RSS + ESS
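
For reference, the decomposition spelled out:

\[
\underbrace{\textstyle\sum_i (y_i - \bar{y})^2}_{TSS} \;=\; \underbrace{\textstyle\sum_i (y_i - \hat{y}_i)^2}_{RSS} \;+\; \underbrace{\textstyle\sum_i (\hat{y}_i - \bar{y})^2}_{ESS},
\]

and \(R^2 = ESS/TSS\) measures the fraction of variation the model explains.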

3.3. significance test

  • Def: significance test (a special case of 3.1, but we'll give a simpler way)

  • Algorithm: significance test

    Note:
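
A sketch of the simpler route, assuming normal errors and writing \(c_{jj}\) for the \(j\)-th diagonal entry of \((X'X)^{-1}\):

\[
t_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{c_{jj}}} \sim t(n-p) \quad \text{under } H_0\!:\ \beta_j = 0,
\]

so we reject \(H_0\) when \(|t_j|\) exceeds the \(t\) critical value; with a single constraint this is equivalent to the \(F\) test of 3.1.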

3.4. outlier test

  • Def: outlier test

  • Theorem:

    Note:

  • Algorithm:

    Proof:

3.5. the prediction problem

3.5.1. point estimation

  • Def:

    • Qua: => unbiased

    • Qua: => Gauss-Markov

    • Qua: difference between

3.5.2. interval estimation

  • Def:
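
A standard form both estimates take, assuming normal errors and a new point \(x_0\) (the slides' notation may differ): the point prediction is \(\hat{y}_0 = x_0'\hat{\beta}\), and a \(1-\alpha\) prediction interval for the new response is

\[
\hat{y}_0 \;\pm\; t_{\alpha/2}(n-p)\,\hat{\sigma}\,\sqrt{1 + x_0'(X'X)^{-1}x_0};
\]

dropping the leading 1 under the root gives the narrower confidence interval for the mean \(E(y_0)\).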

4. regression feature selection

4.1. metrics for selection

  • Def: \(RSS_q\) (the RSS at the \(q\)-th step of selection)

    • Theorem: => \(RSS_q > RSS_{q+1}\)

      Usage: which means the more features we include, the better the in-sample fit.

      Proof:

  • Def: \(RMS_q\)

    Note: the smaller \(RMS_q\), the better the model

  • Def: MSEP

    • Def: \(C_p\)

      • Qua: => no proof

        Note: we can plot \(C_p\) to see whether the selection is optimal

  • Def: AIC (an application of MLE)

    Note: its specific form in the linear model

    Proof:

    Note: the smaller, the better (both \(C_p\) and AIC are sketched below)
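
Sketches of the two criteria in the linear-model case, assuming a submodel with \(q\) fitted parameters, Gaussian errors, and \(\hat{\sigma}^2\) estimated from the full model (additive constants in AIC are dropped, as is common):

\[
C_p = \frac{RSS_q}{\hat{\sigma}^2} - n + 2q, \qquad AIC = n \log\frac{RSS_q}{n} + 2q.
\]

A good submodel has \(C_p \approx q\), which is what the \(C_p\) plot in 4.2 checks visually; for AIC, the smaller the value, the better.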

4.2. optimal selection

  • Def: the best features

  • Algorithm: \(C_p\) plot

4.3. step-wise selection

  • Algorithm: P149; basically do an F-test at every step (a forward-selection sketch follows in 4.3.1).

4.3.1. forward selection
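
A minimal sketch of forward selection with an entry \(F\)-test, assuming the partial-\(F\) form \(F = \dfrac{RSS_{old} - RSS_{new}}{RSS_{new}/(n - q)}\) with \(q\) the parameter count of the enlarged model; the threshold `f_in` is illustrative:

```python
import numpy as np

def rss(X, y, cols):
    """RSS of the least-square fit on the chosen columns plus an intercept."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    fit, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ fit
    return r @ r

def forward_select(X, y, f_in=4.0):
    """Greedily add the feature with the largest partial F while it exceeds f_in."""
    n, p = X.shape
    chosen, rss_old = [], rss(X, y, [])
    while len(chosen) < p:
        rest = [j for j in range(p) if j not in chosen]
        trial = {j: rss(X, y, chosen + [j]) for j in rest}
        j_best = min(trial, key=trial.get)             # biggest RSS drop
        df = n - len(chosen) - 2                       # residual df of enlarged model
        f = (rss_old - trial[j_best]) / (trial[j_best] / df)
        if f <= f_in:
            break                                      # no candidate passes the F test
        chosen.append(j_best)
        rss_old = trial[j_best]
    return chosen
```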

4.3.2. backward selection

5. other features

Title:Regression Analysis

Author:Benson

PTime:2019/11/19 - 12:11

LUpdate:2020/04/03 - 21:04

Link:https://steinsgate9.github.io/2019/11/19/regression-analysis/

Protocol: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Please keep the original link and author.