Statistical intuition fails

Warning: this post contains some mathematics. However, non-technical readers may ignore the equations and focus on the concepts, which are far more important in any case.

I am ashamed to say that my statistical intuition has failed me. I was doing some brainteasers and came across this one:

In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?

My initial thought was there should be more girls than boys. But this is completely false. As long as all births are independent and a boy or a girl remains equally likely, the ratio of boys to girls in the whole population will be 1 (assuming the population is large). The choice the parents make of when to stop having babies has no influence on the overall distribution of boys or girls. This is actually obvious if you think about it, but you can also prove it mathematically as follows:

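Let $X_1, \dots, X_n$ be the number of girls in each of the $n$ households in the country. These are i.i.d. geometric random variables (the number of girls born before the first boy, with probability $p = 1/2$ of a boy at each birth). Since every household ends up with exactly one boy, the ratio of boys to girls is

$$\frac{n}{X_1 + \dots + X_n} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} X_i}.$$

By the law of large numbers this tends to

$$\frac{1}{\mathbb{E}[X_1]} = \frac{p}{1-p}, \qquad p = \tfrac{1}{2},$$

which is 1.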
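The same result can be checked numerically. The following is a minimal simulation sketch, assuming independent births and a 50/50 chance of a boy or a girl; the function name `simulate_country`, the number of families and the random seed are my own illustrative choices, not part of the brainteaser.

```python
import random


def simulate_country(n_families, seed=0):
    """Simulate families that keep having children until their first boy.

    Returns the total number of boys and girls across all families.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    boys = 0
    girls = 0
    for _ in range(n_families):
        # Each birth is an independent fair coin flip.
        while rng.random() < 0.5:  # a girl: the family has another child
            girls += 1
        boys += 1  # a boy: the family stops
    return boys, girls


boys, girls = simulate_country(200_000)
ratio = boys / girls
print(f"boys={boys}, girls={girls}, boys/girls={ratio:.3f}")
```

With a large number of families the printed ratio should come out close to 1, even though individual families can have many girls and never more than one boy.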

In real life, and in financial markets, our statistical intuition also fails us. One of the reasons I think I will never be able to be a trader (or at least not a good one) is that I will never trust any decision I make. It’s only through meticulous analysis, by slowing down, that you can overcome your biases, and that is exactly the opposite of trading.



I recently finished my master's thesis and I would now like to do something that most people never do: I am going to critique my own work. I believe that being self-critical is essential not only in research, but in life in general. In research it is necessary to further the truth. It is not enough that others are critical of you (though it is necessary). You must be critical of yourself – only then will you be willing to remedy your flaws, change your convictions and pursue truth and goodness rather than your own prejudiced agenda.

I write this post as much to myself as for any reader who may come across it. It is another public reminder of a lesson I fear I may forget in the future, and of which I may need to be reminded. Even if you are not interested in my thesis, some of the points of criticism here may be useful for your own writing. I found in writing this post that self-criticism is hard. It’s really hard to come up with anything but weak flaws in your own work. I reckon it will take time to cultivate a truly self-critical nature.

My thesis was about momentum. I looked at time-series and cross-sectional momentum and their relationship with volatility and cross-sectional dispersion, specifically considering volatility weighting as a means of improving momentum strategies. If this sounds like Greek, do not fear, for I will be explaining all of this in later posts. Today I just want to list my critiques, starting with the more serious ones.

Subjective conclusions drawn from mixed results
I often found when I ran some numbers that I got results that were not clear-cut for any one conclusion. Then I had to be satisfied with making a qualified conclusion if I thought there was enough support for it in the data. “Enough support” is subjective and others may feel differently. It may be a wiser course of action not to draw any conclusion at all, but that may also be far too conservative.

A lack of focus
Good academic research tends to focus on one thing and examine it thoroughly. My thesis, I think, tried to look at somewhat too much, and as a result ended up being huge and with not one aspect being treated quite as it deserved.

Strict assumptions (that I violate)
In order to prove things easily you need to make assumptions. Often you end up assuming independence or normality where it is clearly not the case in the data. In my case I needed to make such assumptions to prove things about volatility weighting and in at least one case I am not even certain there is a non-trivial (that is to say, interesting) process that satisfies my assumptions. I have little choice but to violate my assumptions (there are for instance no volatility estimators that satisfy the assumptions I had to make).

Overly detailed results that are hard to interpret
I have lots and lots and lots of tables in my thesis. They are big and make your eyes sore. Ideally one should find a way of presenting just the right numbers, without hiding ambiguity and evidence that doesn’t support your conclusions. It is hard to read text referring to specific numbers in very large tables and keep track of what is happening. Making a graph is even better as it gives an immediate impression (but you may still want to report the numbers so that people can check them). I had relatively few graphs as I could think of no good way to convert my tables into something visual. This is a weakness. You may think that academics should be able to read such dense material, that they should take the time and effort. It is, however, a simple fact that academics are human and that they do not. Even if they do, they are less likely to get the right picture if the information is not presented in an accessible manner.

Use of advanced techniques without necessarily having the appropriate understanding
I used what are called “robust regressions” in my thesis in order to cope with the fact that financial data contains so many extreme values. I had never used robust regressions before; I only briefly looked up what they did, then used them. I did not take the time to get well acquainted with their theory (as this would have been quite a task, I think) and I simply used a standard weighting function with a standard parameter. Most likely this is still better than simply using OLS regressions (which I think are absolutely a no-go in financial research, except as a baseline comparison), but it is still possible that the version of robust regression I used was not the most appropriate (deciding what is appropriate is of course more an art than a science) and it would have been preferable to have had more training in using them.
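To give a feel for what a robust regression does differently from OLS, here is a rough sketch on made-up data. It is not the procedure from my thesis: the Huber weighting function, the tuning constant 1.345, the MAD-based scale estimate, the helper names and the toy data (a line with slope 2 plus one extreme value) are all assumptions chosen purely for illustration.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights (a sketch)."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from plain OLS
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute deviation, rescaled.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        cutoff = c * scale
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Toy data: y = 1 + 2x with small deterministic noise and one extreme value.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + ((i * 7) % 5 - 2) * 0.1 for i, xi in enumerate(x)]
y[-1] += 30.0  # a single outlier at the end of the sample

_, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
_, huber_slope = huber_line_fit(x, y)
print(f"OLS slope:   {ols_slope:.3f}")
print(f"Huber slope: {huber_slope:.3f}")
```

The single extreme value drags the OLS slope well away from 2, while the reweighted fit stays much closer to it. The choice of weighting function and of the constant `c` is exactly the kind of decision I made without fully understanding it.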

I used linear models for the theoretical and empirical investigations. One thing that is clear from finance, though, is that nothing is linear, and so results from linear models can be misleading. I think, however, that we have only poor substitutes, which is why linearity is still common in academia. This is therefore only a weak criticism on my part, but I would like to see a move away from linear models, if only we could find an accessible and preferably tractable alternative.

Little thought for practicalities
I did not consider that I was basing my results on markets that closed at different times (essentially I assumed they closed at the same time); I did not include transaction costs, commissions, taxes, etc. This is not unreasonable. To include all these things meticulously would detract from the main purpose of the study. But they are important and their inclusion could potentially change the nature of the relationships found (though this is unlikely).

If you read my thesis and you think there are other criticisms, then please let me know. Perhaps I'll include them in a further post.