Importance Sampling

Despite its name, Importance Sampling is not actually a method for sampling from a probability distribution. It is a variant of Monte Carlo approximation, so it is really an approximation method.

In Monte Carlo approximation we approximate the expected value of f(x) :

E(f(x)) \approx \frac{1}{n}\sum^n_{i=1}f(x_i). Here each x_i is distributed according to the same distribution as X (X \sim P, x_i \sim P). But this rests on a big assumption: that we can efficiently draw samples from the true distribution P of the random variable X. What happens if we can't do that? It can certainly happen that we are unable to draw those samples efficiently.
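As a concrete illustration, here is a minimal sketch of this plain Monte Carlo estimate in Python. The choice of P as a standard normal and f(x) = x^2 is only an illustrative assumption, not something fixed by the method:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Example integrand; any function of x works here.
    return x ** 2

# Assume we *can* sample from P directly (here: a standard normal).
n = 100_000
samples = rng.normal(loc=0.0, scale=1.0, size=n)

# Plain Monte Carlo estimate: E[f(X)] ≈ (1/n) * sum_i f(x_i).
estimate = np.mean(f(samples))
print(estimate)  # close to 1, since E[X^2] = 1 for N(0, 1)
```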

Let’s focus on the density p.

E(f(x))=\int f(x)p(x)dx=\int f(x)\frac{p(x)}{q(x)}q(x)dx \quad \text{for any PDF } q \text{ such that } q(x)=0 \rightarrow p(x)=0

We can view the formula like this:

E(f(x))=\int f(x)p(x)dx=\int(f(x)\frac{p(x)}{q(x)})q(x)dx

Now f(x)\frac{p(x)}{q(x)} is our new 'f(x)' function, and q(x) is the density we sample from. So we can approximate the expectation by \frac{1}{n}\sum^n_{i=1}f(x_i)\frac{p(x_i)}{q(x_i)}, where the x_i are drawn according to q instead of p.

This is the importance sampling trick.

We can write the approximation compactly as E(f(x))\approx\frac{1}{n}\sum^n_{i=1}f(x_i)w(x_i), where x_i \sim q and w(x)=\frac{p(x)}{q(x)}. The function w is called the importance weight.
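To make the estimator concrete, here is a minimal sketch in Python. The target p (a standard normal, which we pretend we cannot sample from), the proposal q (a wider normal), and f(x) = x^2 are illustrative assumptions; in practice p would be a density we can only evaluate, not sample:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Same example integrand as before.
    return x ** 2

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2), used to evaluate p and q pointwise.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Target p = N(0, 1): assume we can evaluate its density but not sample it.
# Proposal q = N(0, 2^2): easy to sample from; q(x) = 0 -> p(x) = 0 holds
# trivially because both densities are positive everywhere.
n = 100_000
x = rng.normal(loc=0.0, scale=2.0, size=n)              # x_i ~ q

w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # w(x_i) = p(x_i) / q(x_i)

# Importance sampling estimate: E_p[f(X)] ≈ (1/n) * sum_i f(x_i) * w(x_i).
estimate = np.mean(f(x) * w)
print(estimate)  # again close to 1, matching the plain Monte Carlo result
```

Note that the samples come from q, and the weights w(x_i) correct for the fact that we sampled from q rather than p.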
