MLE and MAP
In this post, I'm planning to cover some basic statistics required for ML: the maximum likelihood estimator (MLE) and the maximum a posteriori (MAP) estimator. The latter is based on the Bayesian approach.
[Definition 1] If $X_1, \cdots, X_n$ are independent and identically distributed (i.i.d. for short) random variables with pdf or pmf $f(x\vert\theta)$, then the function of $\theta$ defined by $L(\theta\vert\mathbf{x}):=f(\mathbf{x}\vert\theta)=\prod_{i=1}^{n} f\left(x_i \vert \theta \right)$, $\mathbf{x}=\left( x_1, \cdots, x_n \right)$ is called the likelihood function. Also, the maximum likelihood estimator of $\theta$ is defined by $\theta_{MLE}(\mathbf{x}):=\mathrm{argmax}_\theta L\left(\theta\vert\mathbf{x}\right)$, i.e., the value of $\theta$ that maximizes the likelihood function $L\left( \theta \vert \mathbf{x} \right)$.
[Definition 2] Let $X_1, \cdots, X_n$ be i.i.d. random variables with pdf or pmf $f(x\vert\theta)$. In Bayesian statistics, the uncertainty about $\theta$ is described by a probability distribution $\pi(\theta)$ called the prior. Then, with the sampling distribution $f\left( \mathbf{x}\vert\theta \right) = \prod_{i=1}^{n} f\left( x_i \vert \theta \right)$, we get the posterior distribution
$$\pi(\theta\vert\mathbf{x})=\frac{f\left(\mathbf{x}\vert\theta\right)\pi(\theta)}{\int f\left( \mathbf{x} \vert \theta \right) \pi(\theta) d\theta}$$
Also, the maximum a posteriori estimator of $\theta$ is defined by $\theta_{MAP} (\mathbf{x}) := \mathrm{argmax}_{\theta} \pi(\theta\vert\mathbf{x})$.
We often calculate the MLE and MAP as $\theta_{MLE}=\mathrm{argmax}_\theta \log\left( L(\theta\vert\mathbf{x}) \right)$ and $\theta_{MAP}(\mathbf{x})=\mathrm{argmax}_\theta \log \left( \pi \left( \theta\vert\mathbf{x} \right)\right)$. This holds since log is a monotonically increasing function. Also, for the MAP, note that $\int f(\mathbf{x}\vert\theta) \pi(\theta) d\theta$ is not a function of $\theta$, so it does not affect $\mathrm{argmax}_\theta$. Here are some examples.
[Example 1] Let $X_1, \cdots, X_n$ be i.i.d. exponential random variables with pdf $f(x\vert \lambda)=\lambda e^{-\lambda x} I(\{x>0\})$. Then the MLE of $\lambda$ can be found as follows:
$$\begin{align*} L(\lambda\vert\mathbf{x}) & = \lambda^n \prod_{i=1}^{n} e^{-\lambda x_i} I(\{x_i >0\}) \\ l(\lambda\vert\mathbf{x}) & = \log \left( L(\lambda \vert \mathbf{x})\right) = n \log \lambda - \sum_{i=1}^{n} \left( \lambda x_i -\log \left( I(\{x_i > 0\}) \right) \right) \\ \frac{dl}{d\lambda} & = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \end{align*}$$
Since log is monotonically increasing, finding the MLE of $\lambda$ is equivalent to finding the $\lambda$ that maximizes $l(\lambda\vert\mathbf{x})$. Setting $\frac{dl}{d\lambda}=0$, we get $\lambda_{MLE} (\mathbf{x}) = \frac{n}{\sum_i x_i} = \left( \frac{1}{n}\sum_i x_i \right)^{-1}$.
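As a quick numerical sanity check (a minimal sketch of my own, not part of the derivation; the seed, sample size, and true rate are arbitrary), the snippet below maximizes the log-likelihood $l(\lambda\vert\mathbf{x})$ for simulated exponential data and compares the result with the closed form $n/\sum_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated data from an Exponential(lam_true) model (arbitrary illustration values).
rng = np.random.default_rng(0)
lam_true, n = 2.0, 1000
x = rng.exponential(scale=1.0 / lam_true, size=n)

def neg_log_likelihood(lam):
    # -l(lam | x) = -(n * log(lam) - lam * sum(x_i)), defined for lam > 0
    return -(n * np.log(lam) - lam * x.sum())

# Since log is monotone, maximizing L is the same as minimizing -log L.
lam_numeric = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1e3), method="bounded").x
lam_closed = n / x.sum()
print(lam_numeric, lam_closed)  # the two values should agree closely
```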
[Example 2] Let $Y$ be a $\chi_1^2$ random variable and let $X_1, \cdots, X_n$ be i.i.d. random variables whose pdf equals that of $\theta Y$ for some $\theta>0$. First, we need the distribution of $\theta Y$. Since the pdf of $Y$ is
$$f_Y(y)=\frac{1}{\Gamma \left(\frac{1}{2}\right)\sqrt{2}} y^{-\frac{1}{2}} e^{-\frac{y}{2}} I\left(\{y>0\}\right)$$
using $X=\theta Y$, we get
$$ f_X(x\vert\theta)=\frac{1}{\Gamma\left(\frac{1}{2}\right)\sqrt{2}} \left(\frac{x}{\theta}\right)^{-\frac{1}{2}} \frac{e^{-\frac{x}{2\theta}}}{\theta} I\left(\{x>0\}\right)=\frac{1}{\Gamma\left(\frac{1}{2}\right)(2\theta)^{\frac{1}{2}}} x^{-\frac{1}{2}} e^{-\frac{x}{2\theta}} I\left( \{ x>0 \} \right)$$
First, we can find the MLE of $\theta$. Since the pdf of $X_i$ is equal to the pdf of $\theta Y$,
$$\begin{align*} L(\theta\vert\mathbf{x}) & =\prod_{i} f_{X_i}\left(x_i \vert \theta \right) = \left( \Gamma \left(\frac{1}{2}\right) (2\theta)^{\frac{1}{2}} \right)^{-n} \prod_i x_i^{-\frac{1}{2}} e^{-\frac{x_i}{2\theta}} I\left(\{x_i>0\}\right) \\ l(\theta\vert\mathbf{x}) & =\log \left( L(\theta\vert\mathbf{x})\right) = C - \frac{n}{2} \log\theta + \sum_i -\frac{x_i}{2\theta} \\ \frac{dl}{d\theta} &= -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_i x_i \end{align*}$$
Setting $\frac{dl}{d\theta}=0$, we easily get $\theta_{MLE}(\mathbf{x})=\frac{1}{n}\sum_i x_i$.
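A similar check can be done here (again a sketch of my own with arbitrary values): simulate $X_i = \theta Y_i$ with $Y_i \sim \chi^2_1$, maximize the log-likelihood numerically, and compare with the closed form $\frac{1}{n}\sum_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulate X_i = theta_true * Y_i with Y_i ~ chi-squared(1) (illustrative values).
rng = np.random.default_rng(1)
theta_true, n = 3.0, 2000
x = theta_true * rng.chisquare(df=1, size=n)

def neg_log_likelihood(theta):
    # -l(theta | x) up to the constant C: (n/2) * log(theta) + sum(x_i) / (2 * theta)
    return 0.5 * n * np.log(theta) + x.sum() / (2.0 * theta)

theta_numeric = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1e3), method="bounded").x
print(theta_numeric, x.mean())  # numerical MLE vs. the closed form (1/n) * sum(x_i)
```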
Now, suppose $\theta$ has a prior distribution $\pi(\theta)=\frac{1}{\theta^2} e^{-\frac{1}{\theta}} I\left(\{\theta>0\}\right)$. In this case (and, in general, for any given prior), we can find not only the posterior distribution of $\theta$ but also the MAP estimator of $\theta$. The posterior can be obtained directly from the definition.
$$\begin{align*} \pi(\theta\vert\mathbf{x}) & =\frac{f(\mathbf{x}\vert\theta)\pi(\theta)}{\int f(\mathbf{x}\vert\theta) \pi(\theta) d\theta} = C'\theta^{-\frac{n}{2}} \left( \prod_{i} e^{-\frac{x_i}{2\theta}} \right) \frac{e^{-\frac{1}{\theta}}}{\theta^2} I\left( \{ \theta>0 \}\right) \\ &= C' \theta^{-\frac{n}{2}-2}e^{-\left(\left(\sum_i x_i/2\right)+1\right)/\theta} I\left(\{ \theta>0 \}\right) \end{align*}$$
Note that $\int \pi(\theta\vert\mathbf{x}) d\theta=1$ and $\int_0^{\infty} x^{-\alpha-1}e^{-1/(\beta x)} dx = \Gamma(\alpha) \beta^\alpha$. Using these identities, we get
$$C'=\Gamma\left(\frac{n+2}{2}\right)^{-1} \left(\left(\sum_i \frac{x_i}{2}\right) +1 \right)^{\frac{n+2}{2}}$$
Therefore, the posterior distribution of $\theta$ is
$$\pi(\theta\vert\mathbf{x})=\Gamma\left(\frac{n+2}{2}\right)^{-1} \left(\left(\sum_i \frac{x_i}{2}\right) +1 \right)^{\frac{n+2}{2}} \theta^{-\frac{n+4}{2}} e^{-\frac{\left(\sum_i \frac{x_i}{2}\right)+1}{\theta}} I\left(\{ \theta > 0 \}\right)$$
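To double-check the normalizing constant $C'$ (and the integral identity used above), one can integrate the posterior density numerically. This is a sketch of my own with made-up positive data values; any positive sample would do.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

# Made-up data just to plug into the formula (any positive values would do).
x = np.array([0.8, 2.3, 1.1, 0.4, 3.0])
n = len(x)
B = x.sum() / 2.0 + 1.0  # (sum_i x_i / 2) + 1

def posterior(theta):
    # pi(theta | x) = B^{(n+2)/2} / Gamma((n+2)/2) * theta^{-(n+4)/2} * exp(-B / theta)
    if theta <= 0:
        return 0.0
    log_c = 0.5 * (n + 2) * np.log(B) - gammaln(0.5 * (n + 2))
    return np.exp(log_c - 0.5 * (n + 4) * np.log(theta) - B / theta)

total, _ = quad(posterior, 0, np.inf)
print(total)  # should be very close to 1
```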
And the MAP estimator of $\theta$ is
$$\begin{align*} \theta_{MAP}(\mathbf{x}) & =\mathrm{argmax}_\theta \log \left(\pi(\theta\vert\mathbf{x})\right) \\ & = \mathrm{argmax}_\theta \left[ -\frac{n+4}{2} \log\theta - \frac{1}{\theta}\left( \frac{1}{2}\sum_i x_i + 1 \right) \right] \end{align*}$$
To maximize this, differentiate the expression inside the argmax with respect to $\theta$ and set the derivative to zero. Then we easily get $\theta_{MAP}(\mathbf{x})=\frac{\sum_i x_i +2}{n+4}$.
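Finally, a numerical check of the MAP estimator (again a sketch of my own, reusing the same made-up data): it is enough to maximize the unnormalized log posterior, since the evidence term does not depend on $\theta$, and the result should match $\frac{\sum_i x_i + 2}{n+4}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up positive data, only to plug into the formulas.
x = np.array([0.8, 2.3, 1.1, 0.4, 3.0])
n = len(x)
B = x.sum() / 2.0 + 1.0  # (sum_i x_i / 2) + 1

def neg_log_posterior(theta):
    # Negative unnormalized log posterior: (n+4)/2 * log(theta) + B / theta
    return 0.5 * (n + 4) * np.log(theta) + B / theta

theta_map_numeric = minimize_scalar(neg_log_posterior, bounds=(1e-9, 1e3), method="bounded").x
theta_map_closed = (x.sum() + 2.0) / (n + 4.0)
print(theta_map_numeric, theta_map_closed)  # the two values should agree closely
```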