The Small Data Rule: Infer the Big Picture from only Five Values!


Everybody is talking about big data, but the real skill lies in the art of inferring useful information from only a handful of values!

If you want to learn how to bracket the typical value of a dataset (i.e. the median) using just five values, and why this works, read on!
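As a rough illustration of why five values can be enough: if the five values are drawn at random, the median falls outside their range only when all five land on the same side of it, which happens with probability 2 × (1/2)^5 = 6.25%. The following Python sketch checks this by simulation. The skewed population, the function name `rule_of_five_hit_rate`, and all parameters are assumptions made up for this example, not taken from the post.

```python
import random
import statistics

def rule_of_five_hit_rate(population, trials=20_000, seed=42):
    """Fraction of trials in which the population median lies between
    the smallest and the largest of five randomly drawn values."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        # Draw five values at random (with replacement)
        sample = [rng.choice(population) for _ in range(5)]
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials

# Hypothetical, deliberately skewed population (an assumption for illustration)
_rng = random.Random(1)
population = [_rng.expovariate(1.0) for _ in range(10_001)]

# All five values above the median, or all five below it: 2 * (1/2)^5
theoretical = 1 - 2 * 0.5 ** 5
simulated = rule_of_five_hit_rate(population)
print(f"theoretical: {theoretical:.4f}, simulated: {simulated:.4f}")
```

Note that the argument never uses the shape of the distribution, only the fact that each random draw has a 50% chance of landing on either side of the median.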

Euro 2020: Will Switzerland kick out Spain too?


One of the big sensations of the UEFA Euro 2020 is that Switzerland kicked out world champion France. We take this as an opportunity to share with you a simple statistical model to predict football (soccer) results with R, so read on!

Financial X-Rays: Dissect any Price Series with a simple Payoff Diagram


Not many people understand the alchemy of modern investment vehicles, like hedge funds, that often use sophisticated trading strategies. But everybody understands the meaning of rising and falling markets. Why not simply translate one into the other?

If you want to get your hands on a simple R script that creates an easy-to-understand plot (a profit & loss profile or payoff diagram) out of any price series, read on!

Fame: Is Becoming a Star Written in the Stars?


I sometimes joke that as an Aries I don’t believe in zodiac signs. But could there still be some pattern, e.g. in the sense that people born in spring are more prone to success than those born during the winter months?

In this post, we will provide a definitive answer with one of the most fascinating datasets I have ever encountered, so read on!

Learning Statistics: On Hot, Cool, and Large Numbers


My father-in-law used to write down the numbers drawn in the lottery to find patterns, especially whether some numbers were “due” because they hadn’t been drawn for a long time. He is not alone! And don’t they have a point? Shouldn’t the numbers balance after some time? Read on to find out!
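To get a feeling for the “due number” idea, here is a small Python simulation. It assumes a 6-out-of-49 lottery with independent draws (the post does not say which lottery is meant), and the function name `due_number_check` and its parameters are invented for this sketch. It compares how often a number is drawn overall with how often it is drawn right after a long drought.

```python
import random

def due_number_check(number=7, drought=10, draws=100_000, seed=123):
    """Compare the overall hit rate of `number` with its hit rate in
    draws that follow at least `drought` consecutive misses."""
    rng = random.Random(seed)
    missed = 0        # consecutive draws in which `number` was absent
    overall_hits = 0
    due_draws = 0     # draws that start after a long enough drought
    due_hits = 0
    for _ in range(draws):
        # Assumed lottery format: 6 distinct numbers out of 1..49
        drawn = number in rng.sample(range(1, 50), 6)
        overall_hits += drawn
        if missed >= drought:
            due_draws += 1
            due_hits += drawn
        missed = 0 if drawn else missed + 1
    return overall_hits / draws, due_hits / due_draws

overall_rate, due_rate = due_number_check()
print(f"expected: {6 / 49:.4f}, overall: {overall_rate:.4f}, after drought: {due_rate:.4f}")
```

Under the independence assumption both rates should hover around 6/49: the draw has no memory, so a long drought does not make a number any more likely to come up next.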

The Solution to my Viral Coin Tossing Poll

Some time ago I conducted a poll on LinkedIn that quickly went viral. I asked which of three different coin tossing sequences was the most likely and received exactly 1,592 votes! Nearly 48,000 people viewed it, and there are more than 80 comments under the post (you need a LinkedIn account to see it in full here: LinkedIn Coin Tossing Poll).

In this post I will give the solution with some background explanation, so read on!

Recidivism: Identifying the Most Important Predictors for Re-offending with OneR


In 2018, the renowned scientific journal Science broke the story that researchers had re-engineered the commercial criminal risk assessment software COMPAS with a simple logistic regression (Science: The accuracy, fairness, and limits of predicting recidivism).

According to this article, COMPAS uses 137 features; the authors used just two. In this post, I will up the ante by showing you how to achieve similar results using just one simple rule based on only one feature, which the OneR package finds automatically in no time, so read on!
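To give an idea of what a one-rule classifier does, here is a minimal Python sketch of the basic idea: for every feature, predict the majority class for each of its values, then keep the feature whose rule makes the fewest errors. This is not the OneR package or its API. The function `one_r`, the feature names, and the eight toy rows are invented purely for illustration and have nothing to do with the real COMPAS data.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (best_feature, rule, accuracy), where rule maps every value
    of the chosen feature to the majority class seen for that value."""
    best = None
    features = [name for name in rows[0] if name != target]
    for feature in features:
        # Count the target classes separately for each feature value
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (feature, rule, accuracy)
    return best

# Made-up toy data (hypothetical feature names and values)
toy = [
    {"priors": "high", "age_group": "young", "reoffended": "yes"},
    {"priors": "high", "age_group": "old",   "reoffended": "yes"},
    {"priors": "high", "age_group": "young", "reoffended": "yes"},
    {"priors": "high", "age_group": "old",   "reoffended": "no"},
    {"priors": "low",  "age_group": "young", "reoffended": "no"},
    {"priors": "low",  "age_group": "old",   "reoffended": "no"},
    {"priors": "low",  "age_group": "young", "reoffended": "no"},
    {"priors": "low",  "age_group": "old",   "reoffended": "no"},
]

feature, rule, accuracy = one_r(toy, target="reoffended")
print(feature, rule, accuracy)
```

On this toy data the sketch picks the single feature whose values separate the classes best and reports the accuracy on the same rows it was built from, so it says nothing about out-of-sample performance.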

Parrondo’s Paradox in Finance: Combine two Losing Investments into a Winner


Wikipedia defines Parrondo’s paradox in game theory as

A combination of losing strategies becomes a winning strategy.

If you want to learn more about this fascinating topic and see an application in finance, read on!
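To see the paradox in action, here is a Python sketch of the classic textbook coin-game version (not the finance application from the post). Game A is a slightly unfair coin; game B uses a very bad coin when the capital is divisible by 3 and a good coin otherwise. The bias `EPS`, the number of rounds, the seed, and the function names are assumptions chosen for this example.

```python
import random

EPS = 0.005  # small bias that makes each game a loser on its own

def play_a(capital, rng):
    """Game A: a slightly unfair coin."""
    return 1 if rng.random() < 0.5 - EPS else -1

def play_b(capital, rng):
    """Game B: bad coin if capital is divisible by 3, good coin otherwise."""
    p_win = (0.10 if capital % 3 == 0 else 0.75) - EPS
    return 1 if rng.random() < p_win else -1

def simulate(strategy, rounds=500_000, seed=2021):
    """Final capital after playing 'A', 'B', or a random 'mix' of both."""
    rng = random.Random(seed)
    capital = 0
    for _ in range(rounds):
        if strategy == "A":
            game = play_a
        elif strategy == "B":
            game = play_b
        else:
            # Toss a fair coin to decide which game to play this round
            game = play_a if rng.random() < 0.5 else play_b
        capital += game(capital, rng)
    return capital

results = {s: simulate(s) for s in ("A", "B", "mix")}
print(results)
```

With these parameters, both games drift downwards when played alone, while switching randomly between them drifts upwards: the mixing changes how often the capital sits on a multiple of 3, and therefore how often the bad coin of game B is used.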

Pseudo-Randomness: Creating Fake Noise


In data science, we try to find sometimes well-hidden patterns (= signal) in often seemingly random data (= noise). Pseudo-Random Number Generators (PRNGs) try to do the opposite: they hide a deterministic data-generating process (= signal) by making it look like randomness (= noise). If you want to understand some basics behind the scenes of this fascinating topic, read on!
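As a first taste, here is a Python sketch of one of the simplest PRNG designs, a linear congruential generator (LCG). Whether the post uses this particular design is an assumption on my part; the class name `LCG` is invented, and the default constants are the widely quoted ones from Numerical Recipes.

```python
class LCG:
    """Linear congruential generator: x_next = (a * x + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self):
        # Completely deterministic update: the same state always
        # produces the same successor
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self):
        """Pseudo-random float in [0, 1)."""
        return self.next_int() / self.m

gen1 = LCG(seed=42)
gen2 = LCG(seed=42)
seq1 = [gen1.random() for _ in range(5)]
seq2 = [gen2.random() for _ in range(5)]
print(seq1)
print(seq1 == seq2)  # same seed, same "noise"
```

The output looks like noise, yet the same seed reproduces exactly the same sequence, which is the hidden signal. A generator this simple is fine for illustration but should not be used for anything security-related.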