Like most people, you have probably used a search engine such as Google recently. But have you ever wondered how it manages to find the most relevant results, and how it orders them so that the best ones end up on top? Read on to find out!
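As a first taste of the random-surfer idea, here is a minimal Python sketch of PageRank-style ranking by power iteration. It assumes a hypothetical four-page toy web and the commonly quoted damping factor of 0.85; neither is taken from the post. The surfer follows a random outgoing link with probability `damping` and otherwise jumps to a random page. The long-run share of time spent on each page, which is the dominant eigenvector of the resulting transition matrix, serves as that page's rank.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the random-surfer model.

    links maps each page index (0..n-1) to the list of pages it links to.
    The damping value is an assumed, commonly used default.
    """
    n = len(links)
    rank = [1.0 / n] * n  # start with a uniform distribution
    for _ in range(max_iter):
        # Teleportation: with prob. 1 - damping the surfer jumps to any page
        new_rank = [(1.0 - damping) / n] * n
        for page, targets in links.items():
            if targets:
                # Follow one of the outgoing links uniformly at random
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page without links: spread its rank over all pages
                for t in range(n):
                    new_rank[t] += damping * rank[page] / n
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank

# Hypothetical toy web: page 0 links to pages 1 and 2, and so on
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ranks = pagerank(links)
order = sorted(range(len(ranks)), key=lambda p: -ranks[p])
print(order)  # [2, 0, 1, 3]
```

Page 2, which collects links from all other pages, comes out on top, while page 3, which no page links to, ends up last.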
Continue reading “Google’s Eigenvector, or How a Random Surfer Finds the Most Relevant Webpages”
Base Rate Fallacy – or why No One is justified to believe that Jesus rose
In this post we talk about one of the most unintuitive results in statistics: the so-called false positive paradox, which is an example of the so-called base rate fallacy. It describes a situation where a highly accurate medical test for a rare disease comes back positive, indicating that you have the disease… yet you are most probably healthy!
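A minimal sketch of the underlying Bayes calculation, using hypothetical numbers chosen only for illustration (a prevalence of 1 in 1,000, 99% sensitivity and 95% specificity):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                  # sick and tested positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy but tested positive
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics, not taken from the post
p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"P(sick | positive) = {p:.1%}")  # prints: P(sick | positive) = 1.9%
```

Even with such a good test, only about 2% of positive results belong to people who are actually sick, because the few true positives are swamped by false positives from the much larger healthy population.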
Separating the Signal from the Noise: Robust Statistics for Pedestrians
One of the challenges in navigating an autonomous car through a city is extracting robust signals despite all the noise in its various sensors. Simply taking something like the arithmetic mean of all data points could end in catastrophe: if part of a wall looks similar to the street and the algorithm averages the two trajectories, the car would leave the road and possibly crash into pedestrians. We therefore need robust algorithms to get rid of the noise. The branch of statistics that deals with such problems is called robust statistics, and the methods used therein are called robust estimation.
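As a minimal illustration, assuming some made-up lateral offsets of detected road-edge points, the sketch contrasts the arithmetic mean with the median, one classic robust estimator, and uses the median absolute deviation (MAD) to flag outliers. The data and the 3-MAD cut-off are assumptions for illustration, not necessarily the methods used in the post.

```python
import statistics

# Hypothetical lateral offsets (in metres) of detected road-edge points;
# the last two come from a wall that merely looks like road.
readings = [0.1, -0.2, 0.0, 0.15, -0.05, 3.5, 3.6]

mean = statistics.mean(readings)      # pulled towards the wall
median = statistics.median(readings)  # essentially ignores the two outliers

# Median absolute deviation, scaled by 1.4826 so that it matches the
# standard deviation for normally distributed data; used to flag outliers.
mad = statistics.median(abs(x - median) for x in readings)
threshold = 3 * 1.4826 * mad
inliers = [x for x in readings if abs(x - median) <= threshold]

print(f"mean = {mean:.3f}, median = {median:.3f}")  # mean = 1.014, median = 0.100
print("inliers:", inliers)  # inliers: [0.1, -0.2, 0.0, 0.15, -0.05]
```

Averaging would steer the car more than a metre towards the wall, while the median stays on the road.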
Symbolic Regression, Genetic Programming… or if Kepler had R
A few weeks ago we published a post about using the power of the evolutionary method for optimization (see Evolution works!). In this post we will go a step further, so read on…
Inverse Statistics – and how to create Gain-Loss Asymmetry plots in R
Asset returns have certain statistical properties, also called stylized facts. Important ones are:
- Absence of autocorrelation: the direction of one day’s return tells you nothing useful about the direction of the next day’s return.
- Fat tails: returns are not normally distributed, i.e. extreme events occur much more often than they would if returns were normal.
- Volatility clustering: financial markets alternate between high-volatility and low-volatility regimes.
- Leverage effect: high-volatility regimes tend to coincide with falling prices and vice versa.
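The first and third facts can be checked with the sample autocorrelation function. The sketch below uses a hypothetical toy model rather than real market data: volatility switches rarely between a calm (0.5%) and a turbulent (2%) regime, and each day’s return is the current volatility times Gaussian noise. Returns themselves should then show almost no lag-1 autocorrelation, while absolute returns, a simple proxy for volatility, should be clearly autocorrelated. The toy model does not reproduce the leverage effect.

```python
import random

def acf(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom

# Hypothetical regime-switching model (an assumption for illustration only)
rng = random.Random(42)
sigma = 0.005
returns = []
for _ in range(5000):
    if rng.random() < 0.01:  # rare regime switch -> volatility clustering
        sigma = 0.02 if sigma == 0.005 else 0.005
    returns.append(sigma * rng.gauss(0.0, 1.0))

abs_returns = [abs(r) for r in returns]
print(f"lag-1 ACF of returns:          {acf(returns, 1):+.3f}")      # expected: close to zero
print(f"lag-1 ACF of absolute returns: {acf(abs_returns, 1):+.3f}")  # expected: clearly positive
```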
Learning Data Science: Predicting Income Brackets
As promised in the post Learning Data Science: Modelling Basics, we will now go a step further and try to predict income brackets with real-world data and different modelling approaches. Along the way we will learn a thing or two, e.g. about the so-called Accuracy-Interpretability Trade-Off, so read on…
Learning R: The Collatz Conjecture
In this post, we will see that a little bit of simple R code can go a very long way! So let’s get started!
Evolution works!

Hamlet: Do you see yonder cloud that’s almost in shape of a camel?
Polonius: By the mass, and ’tis like a camel, indeed.
Hamlet: Methinks it is like a weasel.
from Hamlet by William Shakespeare
Customers who bought…
One of the classic examples in data science (called data mining at the time) is the beer and diapers example: when a big supermarket chain started analyzing its sales data, it encountered not only trivial patterns, like toothbrushes and toothpaste being bought together, but also quite strange combinations, like beer and diapers. The trivial ones are reassuring because they show that the method works, but what about the more extravagant ones? Does it mean that young parents are alcoholics? Or that instead of breastfeeding they give their babies beer? Obviously, they had to get to the bottom of this.
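Patterns like these are commonly expressed as association rules. A minimal sketch, assuming a handful of made-up shopping baskets, computes three standard measures for a rule: support (how often the items occur together), confidence (how often the right-hand side is bought given the left-hand side) and lift (confidence relative to what independence would predict). This is a generic illustration, not necessarily the approach taken in the post.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def rule_metrics(lhs, rhs, baskets):
    """Support, confidence and lift of the association rule lhs -> rhs."""
    sup_both = support(lhs | rhs, baskets)
    confidence = sup_both / support(lhs, baskets)
    lift = confidence / support(rhs, baskets)
    return sup_both, confidence, lift

# Hypothetical transactions, invented for illustration
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"toothbrush", "toothpaste"},
    {"toothbrush", "toothpaste", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

sup, conf, lift = rule_metrics({"beer"}, {"diapers"}, baskets)
print(f"support={sup:.3f}, confidence={conf:.2f}, lift={lift:.2f}")
# prints: support=0.375, confidence=0.75, lift=1.50
```

A lift above 1 means that the two items are bought together more often than they would be if the purchases were independent.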
To understand Recursion you have to understand Recursion…
Sorting values is one of the bread-and-butter tasks of computer science: this post uses it as a case study to learn what recursion is all about. It starts with some nerd humour… and ends with some more, so read on!
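As a taste of recursive sorting, here is a minimal quicksort sketch. Quicksort is just one classic recursive sorting algorithm and not necessarily the one used in the post. The function calls itself on ever smaller sublists until it reaches the base case of a list with at most one element:

```python
def quicksort(values):
    """Return a new sorted list using recursive quicksort."""
    # Base case: lists with zero or one element are already sorted
    if len(values) <= 1:
        return list(values)
    pivot, *rest = values
    smaller = [v for v in rest if v < pivot]
    larger = [v for v in rest if v >= pivot]
    # Recursive case: sort both partitions and put the pivot in between
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 6, 1, 8, 2, 9, 2]))  # [1, 2, 2, 3, 6, 8, 9]
```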