Tomorrow, on the First of May, many countries celebrate the so-called International Workers’ Day (or Labour Day): time to talk about the unequal distribution of wealth again, so read on!
Continue reading “The Rich didn’t earn their Wealth, they just got Lucky”
In this post we are talking about one of the most unintuitive results in statistics: the so-called false positive paradox, which is an example of the base rate fallacy. It describes a situation where a very sensitive medical test comes back positive, suggesting that you have the respective disease… yet you are most probably healthy!
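To see how a positive result from a sensitive test can still leave you most probably healthy, here is a minimal sketch of the underlying Bayes calculation. The prevalence, sensitivity and specificity are made-up illustrative numbers, not figures from the post, and the function name is hypothetical.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    true_pos = sensitivity * prevalence               # sick and tested positive
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but tested positive
    return true_pos / (true_pos + false_pos)

# Assumed values for illustration only: 1 in 1,000 people has the disease,
# the test detects 99% of cases and wrongly flags 5% of healthy people.
p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"P(disease | positive) = {p:.3f}")
```

With these assumed numbers the probability of actually being sick after a positive test is only about 2%, because the few true positives are swamped by false positives from the much larger healthy group.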
Continue reading “Base Rate Fallacy – or why No One is justified to believe that Jesus rose”
One of the problems of navigating an autonomous car through a city is extracting robust signals from the noise that is present in the different sensors. Just taking something like the arithmetic mean of all the data points could end in catastrophe: if part of a wall looks similar to the street and the algorithm calculates an average trajectory of the two, the car would leave the road and possibly crash into pedestrians. So we need a robust algorithm to get rid of the noise. The area of statistics that deals with such problems is called robust statistics, and the methods used therein are called robust estimation.
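The danger of averaging can be sketched with a handful of numbers. The readings below are invented for illustration, and the median is used as one simple example of a robust estimator; the post itself may use a different method.

```python
import statistics

# Hypothetical lateral-offset readings in metres: six track the lane centre,
# the last two are spurious readings from a wall mistaken for road.
readings = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, 8.0, 7.5]

mean_offset = statistics.mean(readings)      # dragged towards the outliers
median_offset = statistics.median(readings)  # stays with the majority of points

print(f"mean:   {mean_offset:.4f} m")
print(f"median: {median_offset:.4f} m")
```

The mean lands almost two metres off the lane centre, a position that none of the sensors actually reported, while the median stays within a few centimetres of it.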
Continue reading “Separating the Signal from the Noise: Robust Statistics for Pedestrians”
Asset returns have certain statistical properties, also called stylized facts. Important ones are:
- Absence of autocorrelation: basically the direction of the return of one day doesn’t tell you anything useful about the direction of the next day.
- Fat tails: returns are not normally distributed, i.e. there are many more extreme events than there would be if they were.
- Volatility clustering: basically financial markets exhibit high-volatility and low-volatility regimes.
- Leverage effect: high-volatility regimes tend to coincide with falling prices and vice versa.
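Some of these stylized facts can be checked numerically. The following is a rough sketch that simulates returns from a GARCH(1,1) process, which is an assumption of this example and not necessarily the model the post uses; the parameter values are arbitrary. A plain GARCH(1,1) can reproduce the first three properties but not the leverage effect, because its variance reacts symmetrically to gains and losses.

```python
import math
import random

def simulate_garch(n, omega=1e-6, alpha=0.1, beta=0.85, seed=42):
    """Simulate n returns from a GARCH(1,1) process with Gaussian shocks."""
    rng = random.Random(seed)
    var = omega / (1 - alpha - beta)  # start at the long-run variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0, 1)
        returns.append(r)
        var = omega + alpha * r * r + beta * var  # today's shock feeds tomorrow's variance
    return returns

def lag1_autocorr(xs):
    """Sample autocorrelation at lag 1."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def excess_kurtosis(xs):
    """Sample kurtosis minus 3; zero for a normal distribution."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3

returns = simulate_garch(20000)
squared = [r * r for r in returns]

acf_ret = lag1_autocorr(returns)   # near zero: no autocorrelation in returns
acf_sq = lag1_autocorr(squared)    # clearly positive: volatility clustering
kurt = excess_kurtosis(returns)    # positive: fat tails

print(f"lag-1 autocorrelation of returns:         {acf_ret:.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_sq:.3f}")
print(f"excess kurtosis:                          {kurt:.3f}")
```

Squared returns serve as a simple proxy for volatility here, so a positive autocorrelation in them indicates that calm and turbulent days tend to come in clusters.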
Data Science is all about building good models, so let us start by building a very simple model: we want to predict monthly income from age (in a later post we will see that age is indeed a good predictor for income).
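As a minimal sketch of what such a model could look like, the example below fits a straight line by ordinary least squares. Both the choice of a linear model and the data points are assumptions made for illustration; the numbers are invented and do not come from the post.

```python
# Invented example data: age in years and monthly income in arbitrary currency units.
ages = [20, 30, 40, 50, 60]
incomes = [1800, 2600, 3300, 3900, 4400]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(ages, incomes)
predicted = intercept + slope * 45  # predicted income for a 45-year-old

print(f"income = {intercept:.1f} + {slope:.1f} * age")
print(f"predicted income at age 45: {predicted:.1f}")
```

On this toy data the fitted line adds 65 units of income per year of age, which is a property of the invented numbers and not an empirical claim.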
Continue reading “Learning Data Science: Modelling Basics”
There are literally hundreds of programming languages out there, e.g. the whole alphabet of one-letter programming language names is taken. In the area of data science, there are two big contenders: R and Python. Now, why is this blog about R and not Python?
Continue reading “Why R for Data Science – and Not Python!”