Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.

How can, for instance, the Gamma distribution diverge near zero (for an appropriate set of shape and scale parameters, say shape $=0.1$ and scale $=10$) and still have its area equal to one?

As I understand it, the area under a probability density function should always be equal to one. If you take the Dirac delta distribution, which diverges at zero but is zero everywhere else, you have an area equal to one.

Somehow, if you took the area of a diverging Gamma distribution, you could seemingly express it as the area of a Dirac delta distribution plus something more, since it has nonzero weight at $x\neq0$, and so it would be bigger than one.

Can someone explain to me where my reasoning goes wrong?
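Here is a quick numerical sanity check of the puzzle, a sketch using only Python's standard library (the substitution $u = x^{k}$ is just a device of mine to tame the quadrature near the singularity; it is not part of the question):

```python
import math

# Gamma density with shape k = 0.1 and scale theta = 10, the example
# from the question.
k, theta = 0.1, 10.0

def gamma_pdf(x):
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

# The density really does blow up as x -> 0...
print(gamma_pdf(1e-3) < gamma_pdf(1e-6) < gamma_pdf(1e-9))  # True

# ...yet the area is 1. A naive quadrature struggles near the singularity,
# so substitute u = x**k (i.e. x = u**(1/k)): the x**(k-1) factor cancels
# and the integrand becomes bounded.
def integrand_u(u):
    return (1 / k) * math.exp(-u ** (1 / k) / theta) / (math.gamma(k) * theta ** k)

n, b = 200_000, 3.0          # u > 3 means x > 3**10, a negligible tail
h = b / n
area = sum(integrand_u((i + 0.5) * h) * h for i in range(n))
print(round(area, 4))        # -> 1.0
```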

There are lots of distributions (like the normal distribution) that are defined on the entire real line, are $> 0$ everywhere, yet have finite area. Take (the main quadrant of) that distribution and flip it about the line $x=y$. Now you have a distribution with the same area which diverges at $x=0$. – BlueRaja - Danny Pflughoeft

Look up "Zeno's paradox" – it might be interesting to you here. – ssdecontrol

The Dirac delta is really not overly helpful here (although it is interesting), because the Gamma distribution has a continuous density, whereas the Dirac is about as non-continuous as you can get.

You are right that the integral of a probability density must be one (I'll stick to densities defined on the positive axis only),

$$ \int_0^\infty f(x)\,dx =1.$$

In the Gamma case, the density $f(x)$ diverges as $x\to 0$, so we have what is called an improper integral. In such a case, the integral is defined as the limit as the integration boundaries approach the point where the integrand is not defined,

$$ \int_0^\infty f(x)\,dx := \lim_{a\to 0^+}\int_a^\infty f(x)\,dx,$$

as long as this limit exists.

(Incidentally, we use the same abuse of notation to give a meaning to the symbol "$\int^\infty$", which is defined as the limit of the integral $\int^b$ as $b\to\infty$, again as long as this limit exists. So in this particular case, we have two problematic points - $0$, where the integrand is not defined, and $\infty$, where we can't evaluate the integral directly. We need to work with limits in both cases.)
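The limiting definition is easy to watch numerically. As a toy example of my own (not from the answer above), take $f(x)=1/\sqrt{x}$ on $(0,1]$: its antiderivative is $2\sqrt{x}$, so $\int_a^1 f(x)\,dx = 2-2\sqrt{a}\to 2$ as $a\to 0$, even though $f$ is unbounded:

```python
import math

def f(x):
    return x ** -0.5  # diverges as x -> 0

def integral_a_to_1(a, n=100_000):
    """Midpoint-rule approximation of the (proper) integral from a to 1."""
    h = (1.0 - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

# As a -> 0 the values settle down to the limit 2, so the improper
# integral exists even though f blows up at the endpoint.
for a in [0.1, 0.01, 0.001]:
    exact = 2.0 - 2.0 * math.sqrt(a)
    print(a, round(integral_a_to_1(a), 4), round(exact, 4))
```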

For the Gamma distribution specifically, we kind of side-step the problem. We first define the Gamma function as follows:

$$\Gamma(k) := \int_0^\infty y^{k-1}e^{-y}\,dy.$$

We next prove that this definition actually makes sense, in the sense of the different limits outlined above. For simplicity, we can here stick to $k>0$, although the definition can be extended to (many) complex values $k$ as well. This check is a standard application of calculus and a nice exercise.

Next, we substitute $x:=\theta y$ for $\theta>0$ and by the change of variables formula obtain

$$\Gamma(k) = \int_0^\infty \frac{x^{k-1}e^{-\frac{x}{\theta}}}{\theta^k}\,dx,$$

from which we get that

$$1 = \int_0^\infty \frac{x^{k-1}e^{-\frac{x}{\theta}}}{\Gamma(k)\theta^k}\,dx.$$

That is, the integrand integrates to one and is therefore a probability density. We call it the Gamma distribution with shape $k$ and scale $\theta$.

Now, I realize that I really passed the buck here. The meat of the argument lies in the fact that the Gamma function definition above does make sense. However, this is straightforward calculus, not statistics, so I only feel very slightly guilty in referring you to your favorite calculus textbook and the gamma-function tag at Math.SO, especially this question and this question.
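For readers who want at least a numerical confirmation that the definition makes sense, here is a sketch using only Python's standard library ($k=2.5$ is my arbitrary choice; it avoids the singularity at zero since $k>1$, and the upper limit is truncated where the $e^{-y}$ tail is negligible):

```python
import math

k = 2.5  # shape parameter; for k > 1 the integrand is bounded near 0

# Midpoint-rule approximation of the Gamma-function integral,
# truncated at y = 60 where the tail is astronomically small.
n, b = 600_000, 60.0
h = b / n
approx = sum(((i + 0.5) * h) ** (k - 1) * math.exp(-(i + 0.5) * h) * h
             for i in range(n))

print(approx, math.gamma(k))  # both are ≈ 1.3293
```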


Consider a standard exponential density $f(x)=\exp(-x)\,,\:x>0$ and consider a plot of $y=f(x)$ vs $x$ (left panel in the diagram below).

Presumably you don't find it unfathomable that there's positive density for all $x>0$ yet the area is nonetheless $1$.

Now let's exchange $x$ and $y$ ... that is let $x=\exp(-y)$, or $y = -\ln(x)$, for $0<x\leq 1$. Now this is a valid density, which asymptotes to the $y$ axis (so it's unbounded as $x\to 0$), but its area is clearly identical to the exponential (i.e. the area under the curve must still be 1 - all we did was reflect the shape, and reflection is area-preserving).

[Figure: left panel, the standard exponential density $y=e^{-x}$; right panel, its reflection $y=-\ln(x)$, which is unbounded as $x\to 0$ but encloses the same area.]

Clearly, then, densities can be unbounded but have area 1.
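The reflection argument can be checked directly: the antiderivative of $-\ln x$ is $x - x\ln x$, so the area from $a$ to $1$ tends to exactly $1$ as $a\to 0$, matching the exponential. A small sketch in Python:

```python
import math

def area_above(a):
    """Integral of -ln(x) from a to 1, via the antiderivative x - x*ln(x)."""
    F = lambda x: x - x * math.log(x)
    return F(1.0) - F(a)

# The density -ln(x) is unbounded near x = 0, yet the area converges to 1.
for a in [0.1, 0.001, 1e-9]:
    print(a, area_above(a))
```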


This is really a calculus question rather than a statistics one. You're asking how a function that goes to infinity at some point in its domain can still have a finite area under the curve.

It's a valid question. For instance, if instead of the Gamma density you took the hyperbola $y=1/x$ on $(0,\infty)$, the area under the curve wouldn't converge; it would be infinite.

[Figure: the hyperbola $y=1/x$, whose area under the curve is infinite.]

So it is quite remarkable that a weighted sum of very large numbers can somehow converge to a finite value. The sum is weighted because, if you look at the Riemann definition of the integral, it is a sum like this: $$\int_0^\infty \frac{1}{x}\,dx=\lim_{n\rightarrow\infty} \sum_{i=1}^n \frac{\Delta x_i}{x_i}.$$ Depending on which points $x_i$ you pick, the weights $\Delta x_i$ can be small or large. As you get closer to $0$, the values $1/x_i$ get larger, but the weights $\Delta x_i$ get smaller. For $1/x$, the growth of $1/x_i$ wins this competition, and the integral doesn't converge.

For the Gamma distribution, it happens that the weights $\Delta x_i$ shrink faster than the Gamma PDF grows, so the area ends up being finite. It is then straightforward calculus to check that it integrates to exactly $1$.
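To make the competition concrete, here is a small sketch (my own illustration: the exponent $-0.9$ mimics the shape-$0.1$ Gamma density near zero, where the factor $e^{-x/\theta}$ is roughly constant; both tails are computed from exact antiderivatives, so no quadrature is needed):

```python
import math

# Compare the two lower tails near zero:
#   integral from a to 1 of 1/x dx         = -ln(a)           -> infinity
#   integral from a to 1 of x**(-0.9) dx   = 10*(1 - a**0.1)  -> 10
for a in [1e-2, 1e-6, 1e-12]:
    hyperbola = -math.log(a)
    gamma_like = 10.0 * (1.0 - a ** 0.1)
    print(a, round(hyperbola, 2), round(gamma_like, 4))
```

The hyperbola's tail keeps growing without bound as $a\to 0$, while the Gamma-like tail creeps up toward the finite limit $10$.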


Somehow, if you would take the area of a diverging Gamma distribution, you could express it as the area of a Dirac delta distribution, plus something more since it has nonzero weight at $x \neq 0$, so it would be bigger than one.

That's where your reasoning goes wrong: you can't automatically express any function which is infinite at $x = 0$ as a delta distribution plus something more. After all, if you could do this with $\delta(x)$, who's to say you couldn't also do it with $2\delta(x)$? Or $10^{-10}\delta(x)$? Or any other coefficient? It's just as valid to say that those distributions are zero for $x\neq 0$ and infinite at $x = 0$; why not use the same reasoning with them?

Actually, distributions (in the mathematical sense of distribution theory) should be thought of more like functions of functions - you put in a function and get out a number. For the delta distribution specifically, if you put in the function $f$, you get out the number $f(0)$. Distributions are not normal number-to-number functions. They're more complicated, and more capable, than such "ordinary" functions.

Probability distributions can be thought of the same way, in fact. A probability distribution maps some function of your possible outcomes to the weighted average of the possible values of that function. For example, if the function of the outcomes is just the identity function, $f(x) = x$, the corresponding weighted average is $$\int P(x)\,x\ \mathrm{d}x = E[x]$$ Or if the function of the outcomes is the squared deviation, $f(x) = (x - E[x])^2$, the corresponding weighted "average" is the variance: $$\int P(x)\,(x - E[x])^2\ \mathrm{d}x = \sigma_x^2$$
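This "functions of functions" view is easy to prototype. The following is a hypothetical minimal sketch (the names `exponential_functional` and `delta_functional` are made up for illustration): an ordinary density acts on a test function by integration, while the delta acts by evaluation at $0$, exactly as described above.

```python
import math

def exponential_functional(f, b=60.0, n=200_000):
    """Weighted average of f under the standard exponential density,
    approximated by a midpoint rule on (0, b)."""
    h = b / n
    return sum(math.exp(-(i + 0.5) * h) * f((i + 0.5) * h) * h for i in range(n))

def delta_functional(f):
    """The Dirac delta: feed in a function, get out its value at 0."""
    return f(0.0)

print(exponential_functional(lambda x: x))     # the mean E[x] of Exp(1), i.e. ~1
print(delta_functional(lambda x: x ** 2 + 3))  # 3.0
```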

It's natural to try representing the function-to-number mapping of a delta distribution the same way. Basically, is there some function $\delta(x)$ that allows you to represent the action of a delta distribution like this? $$f\to \int \delta(x)\, f(x)\ \mathrm{d}x$$ Well, you can easily establish that if there is such a function, it has to be equal to $0$ at every $x\neq 0$. But you can't get a value for $\delta(0)$ in this way. And that's to be expected, because who says you should be able to write the delta distribution in this way in the first place?

The point is that there's more to the delta distribution than just this: $$\begin{cases}0, & x\neq 0 \\ \infty, & x = 0\end{cases}$$ That "$\infty$" is misleading. It stands in for a whole extra set of information about the delta distribution that normal functions just can't represent. And that's why you can't meaningfully say that the gamma distribution is "more" than the delta distribution. Sure, at any $x > 0$, the value of the gamma distribution is more than the value of the delta distribution, but all the useful information about the delta distribution is locked up in that point at $x = 0$, and that information is not something you can put in an ordering.


Look at the following example. Notice that for any finite $N$,

$$ \int_0^N \frac{1}{x} dx = \log(N)-\log(0) $$

but $\log(0)$ is $-\infty$, so the integral diverges (strictly speaking there is a limit hiding in there, but let's ignore it). But

$$ \int_0^N \frac{1}{\sqrt{x}}\, dx = 2\sqrt{N} - 2\sqrt{0} = 2\sqrt{N} $$

In general, this is based on the fact that

$$ \int \frac{1}{x^p}\, dx = \frac{x^{1-p}}{1-p} + C \qquad (p \neq 1), $$

so if $1-p>0$, i.e. $p<1$, the fundamental theorem of calculus tells you the integral near zero is finite. The idea is that the function diverges slowly enough (where $p$ is the speed) that the area stays bounded.
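Numerically, using the exact antiderivative (a sketch; `lower_tail` is a name of my own):

```python
def lower_tail(p, a):
    """Integral of x**(-p) from a to 1, via the antiderivative
    x**(1-p) / (1-p); valid for p != 1."""
    return (1.0 - a ** (1.0 - p)) / (1.0 - p)

for a in [1e-2, 1e-6, 1e-12]:
    print(a, lower_tail(0.5, a), lower_tail(2.0, a))
# p = 0.5 < 1: the values approach the finite limit 2.
# p = 2.0 > 1: the values blow up like 1/a.
```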

This is similar to the convergence of series. Recall that by the p-test we have that

$$ \sum_{x=1}^\infty \frac{1}{x^p} $$

converges if and only if $p>1$. In this case we need $x^p \rightarrow \infty $ fast enough, where once again $p$ is the speed and $1$ is the turning point.
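The series version of the same turning point can be watched with partial sums (a sketch; for $p=2$ the known limit is $\pi^2/6$, while the $p=1$ harmonic series grows like $\ln N$):

```python
import math

def partial_sum(p, N):
    """Partial sum 1/1**p + 1/2**p + ... + 1/N**p of the p-series."""
    return sum(1.0 / n ** p for n in range(1, N + 1))

# p = 2 > 1: converges (toward pi**2 / 6); p = 1: keeps growing like ln(N).
print(partial_sum(2.0, 100_000), math.pi ** 2 / 6)
print(partial_sum(1.0, 1_000), partial_sum(1.0, 1_000_000))
```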

Why can this be an actual thing? Think about the Koch snowflake. In this construction you keep adding to the perimeter of the snowflake in such a way that the area grows slowly. This works because an equilateral triangle with sides of length $\frac{1}{3}$ has perimeter $1$ while its area is $\frac{1}{12\sqrt{3}}\approx 0.05$. Since the area is so much smaller than the perimeter (it is the product of two small numbers rather than their sum!), you can add triangles in such a way that the perimeter goes to infinity while the area stays finite. To do so you have to choose a speed at which the triangles shrink to zero, and, as you probably guessed by now, there is a speed at which it switches from being too slow (giving infinite area) to being fast enough (giving finite area).
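The bookkeeping behind this claim can be sketched as follows (assuming the standard Koch construction starting from a unit equilateral triangle; each refinement multiplies the perimeter by $4/3$ while the added triangles shrink geometrically, so the area converges to $\frac{8}{5}\cdot\frac{\sqrt{3}}{4}$):

```python
import math

def koch(steps):
    """Perimeter and area of the Koch snowflake after `steps` refinements,
    starting from an equilateral triangle with side length 1."""
    perimeter = 3.0
    area = math.sqrt(3) / 4           # area of the starting triangle
    sides, side_len = 3, 1.0
    for _ in range(steps):
        side_len /= 3                 # new bumps have 1/3 the side length
        area += sides * (math.sqrt(3) / 4) * side_len ** 2  # one bump per side
        sides *= 4                    # each side becomes four sides
        perimeter *= 4 / 3            # so the perimeter grows by a factor 4/3
    return perimeter, area

# Perimeter explodes; area settles at (8/5) * sqrt(3)/4 ≈ 0.6928.
for s in [0, 1, 5, 50]:
    p, a = koch(s)
    print(s, round(p, 3), round(a, 6))
```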

In sum, calculus tells us that not all singularities (that's what these "go to infinity" points, like zero here, are called) are the same. There are huge differences based on the "local speed" of the singularity. $\Gamma$ simply has a singularity that is "slow enough" that the area is finite. If you want to learn more about why singularities work like this, you can delve into a lot more detail in complex analysis and its study of the singularities of complex analytic functions (of which $\Gamma$ is one).

