The connection between weird odds and p-values
Understanding p-values at a formal level requires some statistical background. However, we can get a high-level explanation by looking at certain everyday events.
Imagine that we let a glass cup fall to the ground, so it breaks into a lot of small pieces that scatter over the kitchen floor. Because that has happened to us before, we know that we get a more or less random distribution of pieces, both in size and in location. We could drop lots of glass cups and we would get completely different random arrangements each time; that is what we expect. Now imagine that you drop your glass cup and it breaks, and all the little glass pieces scatter on the floor, but this time they form a recognizable pattern, let’s say your name. So there you have it: the broken cup pieces are spelling “LEO” on the ground. Would you say that this is something remarkable? Or is it just a coincidence that could happen because the pieces could potentially form *any* shape when they break?
To answer that we invented p-values. They are our way to quantify whether an extraordinary occurrence can be explained by a simpler explanation (plain chance), or whether the result we got is in fact remarkable and statistically different from all the other ones. So as you can see, p-values are connected to random events (and also random variables) and probabilities.
If we have some process that produces results that we think (or know) are random, then we can describe it concisely with a few parameters. The best known parameter of a random process is the mean, which is precisely that: the average of all the random values produced by the process. The mean is a great way to summarize a bunch of random values, but it is not enough; we also need another parameter called the variance. Variance is a measure of dispersion, in other words how far the random values are from the mean we just computed (formally, the average squared distance from the mean). A process with a variance of 0 produces the same value over and over, and the higher the variance of a process, the more distant the values are from the mean (scattered all over the place).
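To make the two parameters concrete, here is a minimal sketch using Python's standard `statistics` module. The two series are made-up numbers chosen only for illustration: they share the same mean, but one never moves and the other is scattered.

```python
import statistics

# Two made-up series (illustrative values, not market data).
steady = [5.0, 5.0, 5.0, 5.0]      # the same value over and over
scattered = [1.0, 9.0, 2.0, 8.0]   # values far away from their average

# The mean is the average of the values.
mean_steady = statistics.fmean(steady)
mean_scattered = statistics.fmean(scattered)

# The (population) variance is the average squared distance from the mean.
var_steady = statistics.pvariance(steady)
var_scattered = statistics.pvariance(scattered)

print(mean_steady, var_steady)
print(mean_scattered, var_scattered)
```

Both series have the same mean, so the mean alone cannot tell them apart; the variance can.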
There are plenty of examples of random processes, but we’ll stick with the one we know best: the stock market. I know that many people around the world want to argue that stock prices must obey some kind of secret law, and that if we discovered it we could become immensely rich by anticipating price moves and profiting from that advance knowledge. However, for the rest of us mortals (me in particular), we can safely treat stock prices as some kind of random process: we have a value today, then another one tomorrow, and there doesn’t seem to be any connection whatsoever between them. They are just random values for us.
So the stock market is a random value generator. We could stick to the SPX index and consider all the values it takes every day as just random values provided by some device. Because a lot of people have studied this field for a long time, we describe this process not in terms of price but in terms of log returns. A log return is just the following:
Log return = log (today’s close / yesterday’s close)
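As a minimal sketch of the formula above, the function below turns a list of daily closes into log returns. The name `log_returns` and the sample prices are hypothetical, used only for illustration.

```python
import math

def log_returns(closes):
    """Return log(today's close / yesterday's close) for each consecutive pair."""
    return [
        math.log(today / yesterday)
        for yesterday, today in zip(closes, closes[1:])
    ]

# Hypothetical closing prices, not real SPX data.
closes = [100.0, 102.0, 99.0, 99.0]
rets = log_returns(closes)
print(rets)
```

One convenient property of log returns is that they add up: the sum of the daily log returns equals the log return over the whole period.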
If you compute the log returns of SPX over a long period of time (let’s say 50 years) you will notice that they indeed form a random distribution. Now, what kind of random distribution? We don’t exactly know yet, but at certain scales it looks like a normal (also known as Gaussian) distribution, and at others it doesn’t. Whether or not the distribution is Gaussian, we can still describe it by its mean and variance. The mean of the log returns of the market conveniently gives us the mean return for a certain period of time (in this case about 7% per year since 1950), and it is very simple to compute. The variance is also simple to compute, but no one uses it directly; instead they take its square root to get the standard deviation. In a twist of the financial world, the standard deviation of log returns is called volatility for some reason (and I have no idea why or who started it). So there you have it: the log returns of the SPX have a known mean, and also a known volatility (which is the square root of the variance).
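Here is a rough sketch of how the mean return and the volatility could be computed from daily log returns. The annualization assumes 252 trading days per year and returns that are independent from day to day; both are assumptions of this sketch, as are the function name and the sample values.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days in a year

def annualized_stats(daily_log_returns):
    """Return (annualized mean log return, annualized volatility)."""
    mean_daily = statistics.fmean(daily_log_returns)
    # Volatility is the standard deviation, i.e. the square root of the variance.
    vol_daily = statistics.stdev(daily_log_returns)
    # Means scale with time, standard deviations with the square root of time.
    return mean_daily * TRADING_DAYS, vol_daily * math.sqrt(TRADING_DAYS)

# Made-up daily log returns, not real SPX data.
sample = [0.01, -0.01, 0.02, -0.02, 0.0]
annual_mean, annual_vol = annualized_stats(sample)
print(annual_mean, annual_vol)
```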
Back to a Glass Cup
So we are back to the glass cup example. In this case, instead of a broken glass, we get a weird return, perhaps a huge positive jump or perhaps a huge drop (a crash-like event), and we want to know: is this log return something remarkable? Or is it just part of the usual returns we would get anyway from this random process? Here is where p-values are so helpful. First we need to define the period of time that we want to use. In this case we want to know if this return is normal when compared with all the returns we have gotten since a certain date. Then we compute the probability of seeing a return at least as extreme as the one we got, using that distribution of random values. So that is what a p-value is: it is a probability. If we get a p-value of 0.009 (like the one we got today), it means that if the returns of today and yesterday came from the same random process that generated the returns since Dec 19 (which is when the plateau in SPX started), we would see something this extreme only 0.9% of the time. As you can see that is an extremely small probability, so we could argue the opposite: the action we have seen yesterday and today does not look like the previous returns, and therefore we may be in a new market regime (a new bullish trend perhaps).
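The computation described above can be sketched as follows. This is only one possible way to do it, not necessarily the test used to get the 0.009 figure: it assumes the historical log returns are roughly normal, estimates their mean and standard deviation, and asks how likely a return at least as far from the mean as the observed one would be. The function name and the numbers are hypothetical.

```python
import math
import statistics

def two_sided_p_value(history, observed):
    """Probability of a value at least as far from the mean as `observed`,
    assuming `history` comes from a normal distribution (an assumption)."""
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)
    z = (observed - mu) / sigma  # distance from the mean in standard deviations
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Made-up reference returns and a made-up "weird" return.
history = [-0.01, 0.0, 0.01]
p = two_sided_p_value(history, 0.0196)
print(p)
```

A return equal to the historical mean gives a p-value of 1 (nothing unusual at all), and the p-value shrinks as the return moves further away from the mean.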
The real meaning of p-values
In a formal mathematical world, a p-value is defined as the probability, computed assuming our hypothesis is true, of getting a result at least as extreme as the one we observed. It goes hand in hand with the Type-I error, which is the error we make when we reject a hypothesis that is in fact true: if we only reject when the p-value falls below a threshold chosen in advance, then that threshold is the probability of making a Type-I error. In our case the hypothesis is: the market is doing the same thing it has been doing since Dec 19.
And now I say: I can reject that hypothesis, because if it were true we would see returns this extreme only 0.9% of the time. That is why you want to see low p-values: the lower the p-value, the harder it is to blame the result on plain chance. In the social and economic sciences the usual threshold is 0.05 (or 5%), so as you can see we are way below that threshold already.
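One way to see what the 5% threshold buys us is a small simulation. The sketch below is an illustration under simplifying assumptions: every value is drawn from a standard normal distribution, so the hypothesis "nothing has changed" is true by construction, and we count how often a p-value below 0.05 would lead us to reject it anyway. All names are hypothetical.

```python
import math
import random

ALPHA = 0.05  # rejection threshold

def p_value_standard_normal(x):
    """Two-sided p-value of x under a standard normal distribution."""
    return math.erfc(abs(x) / math.sqrt(2.0))

def false_rejection_rate(trials, seed=0):
    """Fraction of draws rejected even though the hypothesis is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)  # drawn from the hypothesized process itself
        if p_value_standard_normal(x) < ALPHA:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate(20000)
print(rate)
```

The printed rate should land close to 0.05: with a 5% threshold, we wrongly reject a true hypothesis about one time in twenty.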