The Power Law

Aug 3, 2023

A while back, I immersed myself in a paper about the power law, also known as Zipf’s law or the Pareto distribution, which underlies the well-known 80/20 principle. Interestingly enough, the paper was recommended to me by TikTok!

The paper explores the properties of this probability distribution, offering a range of examples and discussing the mechanisms that may cause it to occur. Although the paper didn’t particularly enhance my intuition, I walked away with a few noteworthy points:

For many typical power law exponents (α ≤ 2), the distribution has an infinite expectation. This means that while any finite sample will obviously have a finite mean, the sample mean doesn’t converge to anything as the sample grows: the law of large numbers doesn’t hold here. And even when the expectation is finite, almost all examples have an exponent below 3, which means infinite variance. Practically, this implies you should avoid using metrics that follow a power law distribution, or at least avoid summarizing them with a mean. Although I already had a rough idea of this, it’s worth emphasizing.
Once, I tried to ascertain whether an important metric of a service I worked on followed a power law distribution. I remembered that one needs to plot the density on a log-log scale, and if it’s genuinely a power law, a straight line will be visible. To draw the density, I used a histogram. I also had to fine-tune the bin sizes (perhaps making them grow geometrically) for it to look just right. However, the SQL-like tool I was using at the time built the histograms incorrectly for densities: the counts were not normalized by bin width. In the paper, the authors suggest a simpler approach: switch from the density (or the PMF in the discrete case) to the cumulative distribution function (CDF), and plot 1 - CDF on a log-log scale. You’ll find your straight line there, without any need for histograms, bins, and so forth. Simpler and less noisy.
Furthermore, to estimate the exponent of the distribution, it’s tempting to find the slope of the straight line approximating the density or 1 - CDF on the log-log scale, intuitively using the least squares method due to its simplicity. However, it turns out that this approach leads to a biased estimate of the exponent. Minimizing the sum of squares on a logarithmic scale inevitably causes some bias. Instead, one should use the maximum likelihood formula given on Wikipedia: α = 1 + n / Σ ln(x_i / x_min), where x_min is the point from which the power law holds and n is the number of samples at or above it. And then for error estimation, use (α - 1)/sqrt(n).
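To make the last two points concrete, here is a minimal sketch in plain Python, not the paper’s own code. It draws a synthetic sample, builds the points for the 1 - CDF plot, and estimates the exponent by maximum likelihood together with the (α - 1)/sqrt(n) error. The function names, the exponent 2.5, x_min = 1, the sample size and the seed are all assumptions chosen for illustration, and x_min is treated as known.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, rng=random):
    """Draw n values with density proportional to x**(-alpha) for x >= x_min."""
    # Inverse-transform sampling: 1 - rng.random() lies in (0, 1], so it is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def empirical_ccdf(xs):
    """Return sorted values and, for each, the fraction of samples >= it (1 - CDF).

    Assumes the values are distinct, which holds for continuous data.
    """
    ordered = sorted(xs)
    n = len(ordered)
    return ordered, [(n - i) / n for i in range(n)]


def mle_exponent(xs, x_min):
    """Maximum likelihood estimate of alpha and its error (alpha - 1) / sqrt(n)."""
    tail = [x for x in xs if x >= x_min]  # only the part where the power law holds
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
xs = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)

# Plotted on log-log axes, (values, ccdf) should fall near a straight line
# with slope -(alpha - 1); no bins are involved.
values, ccdf = empirical_ccdf(xs)

alpha_hat, sigma = mle_exponent(xs, x_min=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {sigma:.3f}")
```

With real data, x_min is usually unknown and has to be chosen as well, which this sketch does not attempt. To see the straight line, the two lists can be passed to any plotting tool that supports logarithmic axes on both x and y.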


Written by Michael Roizner
