Numerical Statistics Course Slide Notes
1. sample space & stats
1.1. sample space
Def: sample space
Qua: => duality of sample space
Def: iid
Def: parameter space
Def: distribution group
1.2. important stats
Def: statistic
1.2.1. single
1.2.1.1. sample mean
Def: sample mean
Qua: => mean & sum \[ \sum_{i=1}^{n}(X_i-\bar{X}) = 0 \]
Qua: => the sample mean is the best estimator of the true mean
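A minimal NumPy check of the centering identity above; any sample works, since the identity is purely algebraic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)      # arbitrary sample; the identity holds for any data
print(np.sum(x - x.mean()))   # ~0 up to floating-point rounding
```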
1.2.1.2. sample variance
Def: sample variance
Qua: => quadratic
1.2.1.3. sample covariance
Def: sample covariance
1.2.1.4. sample moment
Def: sample moment
1.2.1.5. U-stats
Def: U-stats (2-2)
Note:
Example: (2-2)
Example: (2-3)
Example: novel u stats (2-3)
Def: two-sample U-stats (2-3)
Example:
Qua: => variance of U-stats (2-3)
Proof: (2-4)
Qua: => Var upper bound (2-4)
Qua: => asymptotic normality (2-4)
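A small sketch of an order-2 U-statistic. The kernel choice is mine, not from the slides: \(h(x_1, x_2) = (x_1 - x_2)^2/2\) is an unbiased kernel for the variance, and its U-statistic coincides with the usual \((n-1)\)-denominator sample variance:

```python
import numpy as np
from itertools import combinations

def u_stat(x, h):
    """Order-2 U-statistic: average of the kernel over all unordered pairs."""
    return np.mean([h(a, b) for a, b in combinations(x, 2)])

rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=200)

# h(a, b) = (a - b)^2 / 2 averaged over pairs equals the sample variance
print(u_stat(x, lambda a, b: (a - b) ** 2 / 2), x.var(ddof=1))
```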
1.2.1.6. M-estimator and Z estimator
- Def: ULLN (2-9)
1.2.1.6.1. M-stats
Def: M-stats (2-9)
Note: M-stats and Z-stats are not the same; proof uses U(): at a point where the criterion is not differentiable, the Z-estimating equation has no solution, yet the M-criterion attains its maximum there, so M and Z differ.
Theorem: => consistency of M-stats
Usage: if \(M_n\) converges uniformly to \(M\) and \(M\) has a well-separated maximum at \(\theta_0\), then the sequence of maximizers \(\hat{\theta}_n\) converges to \(\theta_0\)
Note: figure of the conditions
Proof:
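A sketch of an M-estimator, assuming the standard example \(M_n(\theta) = -\frac{1}{n}\sum|X_i - \theta|\): its maximizer is the sample median, and the criterion is not differentiable at the data points, which is exactly the M-vs-Z distinction noted above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.standard_cauchy(size=501)   # heavy tails: median is a robust location estimate

# Maximizing M_n(theta) = -(1/n) sum |x_i - theta| is the same as
# minimizing the mean absolute deviation
res = minimize_scalar(lambda t: np.mean(np.abs(x - t)))
print(res.x, np.median(x))          # both are (numerically) the sample median
```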
1.2.1.6.2. Z-stats
Def: Z-stats (2-9)
1.2.1.7. order statistics
Def: order statistics
Qua: => distribution
Proof:
Note: when uniform(0, 1)
Qua: => joint distribution
Proof:
Note: when uniform(0, 1):
Def: sample median
Def: sample extrema (min & max)
Def: sample p-fractile
Def: sample range
Qua: distribution
Proof: using transformation trick
Note: when uniform (0, 1):
Proof:
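A simulation sketch for the uniform(0, 1) notes above, using the standard fact that the k-th order statistic of n iid U(0, 1) variables is Beta(k, n - k + 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 10, 3
sims = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]  # k-th smallest

print(sims.mean(), k / (n + 1))                                 # exact mean k/(n+1)
print(stats.kstest(sims, stats.beta(k, n - k + 1).cdf).pvalue)  # should be large
```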
1.2.1.8. sample coefficient of variation
Def: sample coefficient of variation
1.2.1.9. sample skewness
Def: sample skewness
1.2.1.10. sample kurtosis
Def: sample kurtosis
1.3. sufficient stats
Intro:
Def: suff stats
Theorem: factorization criterion, necessary and sufficient condition
Qua: operation
1.4. complete stats
Def: complete stats
Qua: exp family & comp stats =>
Qua: => independence & comp stats
Qua: operation
2. useful distributions
2.1. exp
Qua: => \(\chi^2\)
2.2. gaussian
2.2.1. distribution
Qua: => \(\chi^2\)
Qua: operation
2.2.2. stats & estimation
2.2.2.1. stats
Qua: mean & variance => independent
Proof: see book
Qua: Mean/Variance => distribution
Qua: Mean-Mean/Variance => distribution
Qua: Variance/Variance => distribution
2.2.2.2. estimation
2.3. \(\chi^2\)
Def: central \(\chi^2\) distribution
Qua: => pdf
Proof: transformation, see book
Note:
Qua: => special & operation
Def: non-central \(\chi^2\)
Qua: => pdf
Proof:
Qua: => distribution & operation
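A quick simulation of the defining property of the central \(\chi^2\): the sum of squares of k iid N(0, 1) variables is \(\chi^2_k\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
k = 5
q = np.sum(rng.normal(size=(200_000, k)) ** 2, axis=1)  # sum of k squared N(0,1)

print(q.mean(), k)                                   # E[chi2_k] = k
print(stats.kstest(q, stats.chi2(df=k).cdf).pvalue)  # large p-value: good fit
```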
2.4. gamma
Def: gamma distribution
Qua: gamma => \(\chi^2\)
2.5. t
Def: central t distribution
Qua: pdf
Proof:
Note:
Qua: => E & Cauchy
Def: non-central t distribution
Qua: => pdf
Qua: => E, D
2.6. F
Def: F stats
Qua: => pdf
Proof:
Note:
Qua: => special & operation
Proof:
Def: non-central F
Qua: => pdf
Qua: => special & \(\chi^2\)
2.7. exponential family
Def: exponential family
(Gaussian, binomial & negative binomial, Poisson, exponential, Gamma)
Qua: => all distribution have same support set
Def: natural form & natural space
Qua: => under the natural form, the natural space is a convex set
Qua: => analytical stuff
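For reference, a standard way to write the natural (canonical) form defined above (my notation; \(\Xi\) is the natural parameter space):

\[
f(x \mid \eta) = h(x)\,\exp\{\eta^{\top} T(x) - A(\eta)\},
\qquad
\Xi = \Big\{\eta : \int h(x)\, e^{\eta^{\top} T(x)}\, dx < \infty\Big\}
\]

Convexity of \(\Xi\) (the Qua above) follows from Hölder's inequality.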
3. parameter estimation (usage of stats)
Def: parameter estimation
3.1. point estimation
Def: point estimation
3.1.1. quality of estimation
Def: unbiased estimation
Def: efficiency
Def: consistency
(book)
(2-?)
Def: consistent asymptotically normal (CAN) estimation
Note: both consistent & Gaussian
| estimator | unbiased | consistent | Gaussian (CAN) | f_operation | sufficient | complete |
|-----------|----------|------------|----------------|-------------|------------|----------|
| moment    | cond     | cond       | cond           | yes         | no         | no       |
| MLE       | no       | cond       | cond           | yes         | yes        |          |
| UMVUE     | yes      |            |                |             |            |          |

("cond" = holds under conditions; blank cells were not recorded in the notes.)
3.1.2. moment estimation
Def: moment estimation
Usage: estimate \(\theta\) => express \(\theta\) as \(f(\text{moments})\) => substitute the sample moments to get \(f(\hat{\text{moments}})\)
Qua: => generally biased, sometimes unbiased
Qua: => strong consistency
Qua: => CAN
Usage: normal situation
Qua: => CAN
Usage: when \(\theta\) can be expressed with the first/second central moments
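A sketch of the moment-estimation recipe for a Gamma(\(\alpha, \lambda\)) model (my example, not from the slides): mean \(= \alpha/\lambda\) and variance \(= \alpha/\lambda^2\) invert to \(\alpha = m^2/v\), \(\lambda = m/v\):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=10_000)  # true alpha = 3, lambda = 2

# Express theta via the first two moments, then plug in the sample moments
m, v = x.mean(), x.var()
print(m ** 2 / v, m / v)   # alpha-hat ~ 3, lambda-hat ~ 2
```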
3.1.3. mle estimation
Def: likelihood function
Note: the likelihood has the same form as the pdf; in the pdf \(x\) is the variable, while in the likelihood \(\theta\) is the variable.
3.1.3.1. parameter
Def: MLE (parameter), usage of M-estimator
Usage: given \(x\) and the likelihood function, seek the \(\theta\) that maximizes the likelihood.
Theorem: conditions to make MLE solvable =>
Corollary: when distribution
Corollary: when distribution is exp family =>
Proof:
Qua: => sufficient stats
Qua => CAN
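A minimal numeric MLE sketch for a Gaussian sample (my example), cast as an M-estimation problem: maximize the log-likelihood by minimizing its negative; parameterizing \(\log\sigma\) keeps the scale positive:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(6)
x = rng.normal(loc=1.5, scale=0.7, size=2_000)

def nll(params):                       # negative log-likelihood
    mu, log_sigma = params
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = optimize.minimize(nll, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))      # ~ (1.5, 0.7); matches the closed-form MLE
```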
3.1.3.2. non-parameter
Def: MLE (non-parametric) (2-9)
Note: since this is non-parametric, an integral must be computed. (2-9)
Note: \(\le 0\) because this has the form of a KL divergence; \(p_0\) is really the same thing as \(p\).
Qua: KL divergence between \(p_0\) and the MLE \(p_n\) (2-9)
3.1.4. umvue
Def: estimatable function
Def: min MSE
Def: min Var & unbiased (UMVUE)
Usage: sometimes we cannot find the min-MSE estimator because the class of estimators is too large; we shrink the class by restricting to the unbiased family, where min Var + unbiased = min MSE.
Lemma: => smaller Var
Usage: the lemma hints at reducing the variance of an unbiased estimator by taking its conditional expectation given a sufficient statistic.
Note:
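A simulation sketch of the variance-reduction idea above, assuming the classic Bernoulli example: the crude unbiased estimator \(X_1\), conditioned on the sufficient statistic \(T=\sum X_i\), becomes \(\bar X\), with the same mean but far smaller variance:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 20, 50_000
x = rng.binomial(1, p, size=(reps, n))

crude = x[:, 0]                # unbiased but noisy: the first observation only
rb = x.mean(axis=1)            # E[X_1 | sum X_i] = X-bar (conditioned version)

print(crude.mean(), rb.mean()) # both ~ p: still unbiased
print(crude.var(), rb.var())   # ~ p(1-p)  vs  ~ p(1-p)/n
```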
3.1.4.1. 0-unbiased estimate
Theorem: Cov, E =>
Usage: sufficient condition for UMVUE
Note: we can't use this to construct a UMVUE, but we can verify whether a given estimator is one.
3.1.4.2. sufficient & complete estimate
Theorem: Lehmann - Scheffe, suff & complete =>
Corollary: exp family =>
3.1.4.3. CR inequality estimate
Intro: what is and why we need CR: it is a tool to determine if a stat is UMVUE.
Note: cons of CR
3.1.4.3.1. single parameter
Def: CR regular family
Def: CR inequality
Note: CR can be viewed as a tool for verifying a UMVUE.
Theorem: cases of exp family =>
Note:
Theorem: outside the exp family, when does the C-R bound hold with equality =>
Note:
Def: fisher information function ( for a distribution )
Note: the larger \(I(\theta)\) is, the easier it is to estimate the parameter from X, i.e. the more information the model itself provides.
Note: Fisher information in the law of large numbers.
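A numeric sketch of Fisher information for a Bernoulli(\(p\)) observation (my example): the information equals the variance of the score, which should match the closed form \(I(p) = 1/(p(1-p))\):

```python
import numpy as np

rng = np.random.default_rng(8)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# Score of one observation: d/dp log f(x|p) = x/p - (1 - x)/(1 - p)
score = x / p - (1 - x) / (1 - p)
print(score.var(), 1 / (p * (1 - p)))   # both ~ 4.76
```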
3.1.4.3.2. multi parameter
P120
3.2. interval estimation
Def: interval estimation
Usage: the range of possible \(\theta\)
3.2.1. quality of interval estimation
Def: confidence coefficient
Note: the larger the better
Def: length (precision)
Note:
Def: confidence interval
Def: lower confidence limit
Note:
Theorem: relationship with interval
Def: >1 dimension interval, confidence region
3.2.2. Neyman estimation
Algorithm: from point estimation to interval estimation
Note:
3.2.2.1. small sample method
Example: Gaussian
see slides P129
Example: exp
Example: uniform
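A sketch of the small-sample Gaussian case: with unknown variance, the pivot \((\bar X - \mu)/(S/\sqrt{n}) \sim t_{n-1}\) gives the exact interval:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(loc=5.0, scale=2.0, size=15)   # small Gaussian sample

alpha = 0.05
n, m, s = len(x), x.mean(), x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(m - t * s / np.sqrt(n), m + t * s / np.sqrt(n))   # 95% interval for mu
```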
3.2.2.2. large sample method
Using the large-sample (asymptotic) distribution to construct a Neyman interval estimate.
Example: Cauchy
Example: binomial distribution
P142
Example: Poisson distribution
P143
Theorem: general methods
Usage: using the MLE & the information function to approximate the distribution
Theorem: non-parametric case, when we cannot use a parameter to construct the MLE.
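A sketch of the large-sample recipe above for the binomial example, using the MLE and Fisher information \(I(p) = 1/(p(1-p))\) to get the Wald interval \(\hat p \pm z_{1-\alpha/2}\sqrt{\hat p(1-\hat p)/n}\) (the counts are hypothetical):

```python
import numpy as np
from scipy import stats

k, n, alpha = 130, 400, 0.05      # hypothetical: 130 successes in 400 trials
p_hat = k / n

# Asymptotics: (p_hat - p) ~ N(0, 1 / (n I(p))), so plug in I(p_hat)
z = stats.norm.ppf(1 - alpha / 2)
half = z * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)
```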
3.2.3. hypothesis
3.2.4. Fisher estimation
P143
3.2.5. Tolerance estimation
P15
3.2.6. Bayes estimation
4. hypothesis test
4.1. parameter hypothesis
Def: parameter hypothesis
Def: null hypothesis & alternative hypothesis
Note:
Def: reject region & accept region
Def: test function
Usage: when we reject H0
Def: randomized test
Def: critical value
4.1.1. two types of error
Def: two types of error
Def: power function
Note: for fixed \(\theta\), the probability that we reject H0
Note: using the power function to express both types of error.
Note: figure of the power function
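A sketch of a power function for a concrete case (my example): the one-sided level-\(\alpha\) Z test of H0: \(\mu \le 0\) with known \(\sigma\) rejects when \(\sqrt{n}\,\bar X/\sigma > z_{1-\alpha}\), so the power at \(\mu\) has a closed form:

```python
import numpy as np
from scipy import stats

n, sigma, alpha = 25, 1.0, 0.05
z = stats.norm.ppf(1 - alpha)

def power(mu):
    # P(reject | mu) = 1 - Phi(z_{1-alpha} - sqrt(n) * mu / sigma)
    return 1 - stats.norm.cdf(z - np.sqrt(n) * mu / sigma)

for mu in [0.0, 0.2, 0.5, 1.0]:
    print(mu, power(mu))   # equals alpha at mu = 0, rises toward 1
```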
4.1.2. Neyman-Pearson principle
Def: Neyman-Pearson principle
Note: control the type I error first, then the type II error; H0 is treated as the default hypothesis and is not rejected unless the evidence demands it.
Def: level of hypothesis
Note: how to set the level
4.1.3. general test
Def: general methods
Example: Gaussian
P170
4.1.4. uniformly most powerful test(UMPT)
Intro: what is the best way to do a test?
Def: UMPT
Theorem: NP theorem, existence of UMPT =>
Note:
Note:
Intro:
Theorem: for the above special hypothesis, we have UMPT =>
Note:
Intro: a reversed version also exists
Theorem: for the above special hypothesis, we have UMPT =>
Note:
4.1.5. likelihood ratio test
Def: likelihood ratio test
Algorithm:
Theorem: => distribution estimation
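A minimal LRT sketch (my example, testing the rate of an exponential sample): by Wilks' theorem, \(-2\log\Lambda\) is asymptotically \(\chi^2_1\) under H0, which is the distribution-estimation result referenced above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.exponential(scale=1.0, size=100)    # true rate 1; test H0: rate = 1

def loglik(rate):
    return np.sum(stats.expon.logpdf(x, scale=1 / rate))

rate_mle = 1 / x.mean()                     # exponential-rate MLE
lam = -2 * (loglik(1.0) - loglik(rate_mle)) # -2 log(sup_H0 L / sup L)
print(lam, stats.chi2.sf(lam, df=1))        # p-value from the chi2(1) tail
```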
4.1.6. sequential probability ratio test
Def: SPRT
4.2. non-parameter hypothesis
4.2.1. sign test
P234
4.2.2. signed rank test
P238
Def: rank
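Both non-parametric tests above are available in recent SciPy; a quick sketch testing H0: median = 0 on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(loc=0.3, size=30)    # true median 0.3, so H0 should tend to fail

# Sign test: under H0 the number of positive signs is Binomial(n, 1/2)
print(stats.binomtest(int(np.sum(x > 0)), n=len(x), p=0.5).pvalue)

# Wilcoxon signed rank test: also uses the magnitudes through their ranks
print(stats.wilcoxon(x).pvalue)
```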
5. Bayes method
Def: prior distribution
Def: posterior distribution
5.1. parameter estimate
5.1.1. point estimate
Def: bayes point estimate
Def: P-MSE
Usage: measurement of bayes point estimate
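A conjugate sketch of a Bayes point estimate (my example): a Beta(\(a, b\)) prior on a Bernoulli \(p\) gives posterior Beta(\(a+k, b+n-k\)), whose mean is the Bayes estimate under squared-error loss (see 5.3.2.1):

```python
# Beta-Bernoulli conjugacy: prior Beta(a, b), k successes in n trials
a, b = 2.0, 2.0        # hypothetical prior
k, n = 7, 10           # hypothetical data
post_mean = (a + k) / (a + b + n)
print(post_mean)       # 9/14 ~ 0.64: shrinks k/n = 0.7 toward the prior mean 0.5
```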
5.1.2. interval estimate
Def: Bayes credible estimate
Note: difference from the traditional confidence interval
5.2. hypothesis test
Def: general methods
5.3. bayes decision theory
Def: decision problem
Def: decision rule
5.3.1. risk functions
Risk functions are different ways of taking the expectation of the loss function.
Def: risk function
Def: optimal decision rule
Note:
Def: bayes risk function
Usage: since the general risk function does not always admit an optimal decision rule, we introduce the Bayes risk function to resolve this issue.
Def: bayes optimal decision rule
Def: posterior risk
Note: relationship between posterior risk and bayes risk
Theorem: minimizing the posterior risk and minimizing the Bayes risk yield the same rule.
5.3.2. loss functions
5.3.2.1. L2 loss
Theorem:
5.3.2.2. weighted L2 loss
Theorem:
5.3.2.3. L1 loss
Theorem:
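A simulation sketch for the L2 and L1 cases above, using draws from a stand-in posterior: grid-minimizing the posterior expected loss recovers the posterior mean under L2 loss and the posterior median under L1 loss:

```python
import numpy as np

rng = np.random.default_rng(12)
post = rng.gamma(shape=3.0, size=100_000)   # stand-in posterior draws

grid = np.linspace(0.5, 6.0, 1101)
l2_best = min(grid, key=lambda d: np.mean((post - d) ** 2))
l1_best = min(grid, key=lambda d: np.mean(np.abs(post - d)))

print(l2_best, post.mean())      # L2 loss -> posterior mean (~3.0)
print(l1_best, np.median(post))  # L1 loss -> posterior median (~2.67)
```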
6. sample space & stats (multivariate)
6.1. important stats
6.1.1. sample mean
Def: sample mean
6.1.2. sample A
Def:
6.1.3. sample correlation
Def: sample correlation
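A NumPy sketch of the sample covariance and correlation matrices for a bivariate sample (my example; the off-diagonal correlation should be near \(0.8/\sqrt{1 \cdot 2} \approx 0.57\)):

```python
import numpy as np

rng = np.random.default_rng(13)
x = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 2.0]], size=5_000)

print(np.cov(x, rowvar=False))       # sample covariance, (n - 1) denominator
print(np.corrcoef(x, rowvar=False))  # sample correlation matrix
```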
6.1.4. sample coefficient
Def:
7. useful distributions
7.1. gaussian
Def: multi-Gaussian
Qua: single =>
7.1.1. distribution
Def: distribution
Qua: operation
Qua: operation
7.1.1.1. transformation
Qua: 1
Qua: 2
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
7.1.2. special function
Def: expectation
Def: characteristic function
7.1.3. independence
Theorem: of a vector
Corollary:
7.1.4. conditional
Def: conditional pdf
Corollary:
7.1.5. stats & estimations
7.1.5.1. stats
7.1.5.1.1. sample mean & variation
Theorem:
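A simulation sketch of the standard result presumably meant here: for iid \(N_p(\mu, \Sigma)\), \(\bar X \sim N_p(\mu, \Sigma/n)\) independently of \(A = \sum_i (X_i-\bar X)(X_i-\bar X)^{\top} \sim W_p(n-1, \Sigma)\); the check below verifies \(E[A] = (n-1)\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(14)
mu, Sigma, n, reps = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]]), 50, 5_000

A_mean = np.zeros((2, 2))
for _ in range(reps):
    x = rng.multivariate_normal(mu, Sigma, size=n)
    xc = x - x.mean(axis=0)           # centered sample
    A_mean += xc.T @ xc / reps        # running average of A

print(A_mean)                         # ~ (n - 1) * Sigma = [[98, 24.5], [24.5, 49]]
```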
7.1.5.1.2. sample
7.1.5.2. estimations
Qua: =>
Theorem:
7.2. Wishart
Def:
7.2.1. distribution
7.2.1.1. transformation
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
7.2.2. special function
Def:
7.3. \(T^2\)
Def:
7.3.1. distribution
7.3.1.1. transformation
Qua:
Qua:
Qua:
Qua:
Qua:
7.4. Wilks
Def:
7.4.1. distribution
7.4.1.1. transformation
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
Qua:
8. parameter estimation
9. hypothesis test
P67