In “The Art of Doing Science and Engineering,” Richard Hamming had the following beautiful passage:
It is well known the drunken sailor who staggers to the left or right with n independent random steps will, on the average, end up about √n steps from the origin. But if there is a pretty girl in one direction, then his steps will tend to go in that direction and he will go a distance proportional to n. In a lifetime of many, many independent choices, small and large, a career with a vision will get you a distance proportional to n, while no vision will get you only the distance √n. In a sense, the main difference between those who go far and those who do not is some people have a vision and the others do not and therefore can only react to the current events as they happen.
Only a brilliant mathematician could explain the importance of vision using a metaphor that invokes random walks and orders of magnitude. Richard Hamming, of course, was one. For his foundational contributions—which include self-correcting Hamming codes—Hamming received the Turing Award.
We’re still in January, that time of the year when we drink less alcohol, eat less meat, and make ambitious plans that we hope to achieve before the Earth finishes another revolution around the Sun. What better time to dive into the mathematics of vision, right?
So, let’s take a closer look at where those square roots and distances all come from. It’s fun, I promise.
The drunkard with no vision
Let’s start with the simplest possible random walk. You start at zero, and each period you move left or right by ε units. For analytical ease, let’s assume that ε is normally distributed1 with mean zero and variance 1:
Our goal is to calculate how far you’ll have come after n steps of this process.
Now, after n steps, your location is given by
Since all the ε’s are independent and have mean zero, we have that
If—like me—you’re not very careful, you may think you’re done. Variance is just distance squared, so you take the square root and that’s it, right?
Unfortunately, no, because
In fact, by Jensen’s inequality, the expected distance is strictly less than √n.
Here’s the correct approach. After n steps, your location is normally distributed with mean zero and variance n. The distance from the origin point is the absolute value of your location. Distance, therefore, follows the folded normal distribution.
We can use the formula for the expected value of the folded normal distribution to get the answer we need. That turns out to be
That’s less than √n, as it should be by Jensen’s inequality. Just as Richard Hamming promised, the distance is proportional to √n.
The drunkard finds his vision
Now, suppose you add some direction—as with the drunkard and the lady:
Here, ω is some number between 0 and 1. Think of ω as your “sense of direction.” If ω is 0, you wander without direction, and we get the previous model. If ω is 1, you have perfect direction, and with each step, you move by 1 unit.
After n steps, you find yourself at
Hence, your location after n steps is normally distributed with mean n ω and variance n (1 - ω)².
We now again use the formula for the expected value of a folded normal random variable to find that
Here, Φ(x) denotes the standard normal cdf.
The formula got a lot more complicated. However, looking at it closely, we see that for large n and ω strictly positive
Hence, the distance is proportional to n.2
Taking stock
That’s the math behind Hamming’s beautiful passage on vision. You get an √n improvement if you know where you’re going.
In fact, it’s even better than that. The drunkard may get √n steps away from the starting point on average. However, there’s no guarantee where he will end up; that’s totally up to chance. With vision, not only do you get further, but you also get to a place of your choosing.
So, here’s my wish to you: May your ω be as close to 1 as possible and ε’s don’t get you too far off your path.
In other words, have a great year.
Actually, the normal distribution is not necessary. We could work with a more general distribution, take a detour via the Central Limit Theorem, and still arrive at the same destination. However, while I may torture you with some math, I’m not going to torture you that much.
We could generalize this model to multiple dimensions and still get nice formulas. This requires recognizing that, in the multidimensional case, distance is proportional to a non-central chi-distributed random variable. Left as an exercise for the adventurous.
great - this is exactly what i need for the next management meeting ;)