New Bundesliga Forecasting Tool: Can Underdog Hertha Berlin beat Bayern Munich?


The Bundesliga is Germany’s primary football league. It is one of the most important football leagues in the world, broadcast on television in over 200 countries.

If you want to get your hands on a tool to forecast the result of any game (and perform some more statistical analyses), read on!

Learning Path for “Data Science with R” – Part I


Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.

I use many of those posts in my university classes. In this post, I will give you the first part of a learning path through the knowledge that has accumulated on this blog over the years, to help you become a well-rounded data scientist, so read on!

The Small Data Rule: Infer the Big Picture from only Five Values!


Everybody is talking about big data, but the real skill lies in the art of inferring useful information from only a handful of values!

If you want to learn how to determine the range of the typical value of a dataset (i.e. the median) with just five values and why this works, read on!
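As a rough illustration, and assuming the rule meant here is the so-called Rule of Five (the population median lies between the smallest and the largest of five random values with probability 1 - 2 * 0.5^5 = 93.75%), a small simulation in base R could look like this. The exponential population, the seed, and all variable names are hypothetical choices for this sketch and are not taken from the post.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

# Hypothetical skewed population; any continuous distribution would do
population  <- rexp(1e5)
true_median <- median(population)

# Draw five values many times and check whether the true median
# falls between the sample minimum and the sample maximum
hits <- replicate(10000, {
  five <- sample(population, 5)
  min(five) < true_median && true_median < max(five)
})

simulated   <- mean(hits)          # share of samples that cover the median
theoretical <- 1 - 2 * 0.5^5       # 0.9375

simulated
theoretical
```

The reasoning behind the formula: each value lies above the median with probability one half, so all five land on the same side with probability 2 * 0.5^5, and only then is the median missed.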

Euler Coding Challenge: Build Maths’ Most Beautiful Formula in R


In this post, we will first give some intuition for and then demonstrate what is often called the most beautiful formula in mathematics, Euler’s identity, in R – first numerically with base R and then also symbolically, so read on!
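For a first impression, the numerical part can be sketched in one line of base R. This is only a minimal check of Euler's identity, e^(i*pi) + 1 = 0, and not necessarily the approach taken in the post; the variable name z is an arbitrary choice.

```r
# Euler's identity: e^(i*pi) + 1 = 0
# 1i is R's imaginary unit, pi is built in
z <- exp(1i * pi) + 1

z               # not exactly zero: a tiny imaginary part remains
Mod(z) < 1e-12  # TRUE: zero up to floating-point error
```

The small residue appears because pi can only be stored approximately as a floating-point number, which is one reason a symbolic treatment is interesting as well.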

Euro 2020: Will Switzerland kick out Spain too?


One of the big sensations of the UEFA Euro 2020 is that Switzerland kicked out world champion France. We take this as an opportunity to share with you a simple statistical model to predict football (soccer) results with R, so read on!

R Coding Challenge: 7 (+1) Ways to Solve a Simple Puzzle

This time we want to solve the following simple task with R: Take the numbers 1 to 100, square them, and add all the even numbers while subtracting the odd ones!

If you want to see how to do that in at least seven different ways in R, read on!
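To make the task concrete, here are two possible base R solutions, given as a sketch and not necessarily among the ways shown in the post. Note that a square is even exactly when the original number is even, so it does not matter whether parity is checked before or after squaring.

```r
# Squares of the numbers 1 to 100
n       <- 1:100
squares <- n^2

# Way 1: split the squares by parity, add the even ones, subtract the odd ones
is_even <- squares %% 2 == 0
way1 <- sum(squares[is_even]) - sum(squares[!is_even])

# Way 2: alternating signs, +1 for even n and -1 for odd n
way2 <- sum((-1)^n * squares)

way1  # 5050
way2  # 5050
```

The result can be verified by hand: pairing each even number with its odd predecessor gives (2j)^2 - (2j - 1)^2 = 4j - 1, and summing this for j from 1 to 50 yields 4 * 1275 - 50 = 5050.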

Learning Statistics: On Hot, Cool, and Large Numbers


My father-in-law used to write down the numbers drawn in the lottery to find patterns, especially whether some numbers were “due” because they hadn’t been drawn for a long time. He is not alone! And don’t such people have a point? Shouldn’t the numbers balance out after some time? Read on to find out!

The Solution to my Viral Coin Tossing Poll

Some time ago, I conducted a poll on LinkedIn that quickly went viral. I asked which of three different coin tossing sequences was most likely, and I received exactly 1,592 votes! Nearly 48,000 people viewed the poll, and the post has more than 80 comments (you need a LinkedIn account to fully see it here: LinkedIn Coin Tossing Poll).

In this post I will give the solution with some background explanation, so read on!

Recidivism: Identifying the Most Important Predictors for Re-offending with OneR


In 2018, the renowned scientific journal Science broke the story that researchers had re-engineered the commercial criminal risk assessment software COMPAS with a simple logistic regression (Science: The accuracy, fairness, and limits of predicting recidivism).

According to this article, COMPAS uses 137 features, whereas the authors used just two. In this post, I will up the ante by showing you how to achieve similar results using just one simple rule based on only one feature, which the OneR package finds automatically in no time, so read on!