The *Kalman filter* is a powerful algorithm for optimally combining *uncertain information* about a *dynamically changing system* into the best educated guess about the *current state of the system*. Applications include (car) navigation and stock forecasting. If you want to understand how a Kalman filter works and build a toy example in R, read on!
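To make the idea a little more concrete, here is a minimal one-dimensional sketch in Python (the post itself builds its toy example in R). It assumes the simplest possible model: a hidden state that follows a random walk with process-noise variance `q`, observed through measurements with noise variance `r`. The function name `kalman_1d` and its parameters are illustrative, not taken from the post. Each step first *predicts* (the uncertainty grows) and then *updates* by blending prediction and measurement via the Kalman gain, which is a Bayesian update of a Gaussian prior with a Gaussian likelihood.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Minimal 1-D Kalman filter for a random-walk state (illustrative model).

    x0, p0: prior mean and variance of the state
    q: process-noise variance (how much the state may drift per step)
    r: measurement-noise variance
    Returns a list of (posterior mean, posterior variance) per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean, but uncertainty grows by q
        p = p + q
        # Update: the Kalman gain weights the measurement by relative certainty
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append((x, p))
    return estimates


# Usage with made-up numbers: prior N(0, 1), two noisy readings of 2.0
for mean, var in kalman_1d([2.0, 2.0], x0=0.0, p0=1.0, q=0.0, r=1.0):
    print(f"estimate={mean:.3f}, variance={var:.3f}")
```

With equal prior and measurement variance, the first estimate lands halfway between prior and reading (1.0); the second reading pulls it further, to 4/3, while the variance keeps shrinking.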

Continue reading “Kalman Filter as a Form of Bayesian Updating”

# Category: Statistics

Posts about statistics

## Time Series Analysis: Forecasting Sales Data with Autoregressive (AR) Models

Forecasting the future has always been one of humanity’s greatest desires, and many approaches have been tried over the centuries. In this post, we will look at a simple statistical method for *time series analysis* called *AR*, short for *Autoregressive Model*. We will use this method to predict future sales data and then rebuild it ourselves to get a deeper understanding of how it works, so read on!
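The core idea of an autoregressive model is that each value is a linear function of its own past values plus noise. A minimal sketch of the simplest case, assuming an AR(1) model `y[t] = c + phi * y[t-1] + noise` fitted by ordinary least squares; the post may well use a higher order or a different estimator (R's `ar()` defaults to Yule-Walker), and the function names and the tiny series here are hypothetical.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    phi = s_xy / s_xx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion forward; each forecast feeds the next."""
    forecasts = []
    for _ in range(steps):
        last_value = c + phi * last_value
        forecasts.append(last_value)
    return forecasts


# Usage with a made-up series that follows y[t] = 1 + 0.5 * y[t-1] exactly
sales = [0.0, 1.0, 1.5, 1.75, 1.875]
c, phi = fit_ar1(sales)
print(c, phi)                              # recovers 1.0 and 0.5
print(forecast_ar1(sales[-1], c, phi, 3))  # [1.9375, 1.96875, 1.984375]
```

Because multi-step forecasts are built from earlier forecasts rather than observations, they converge towards the series' long-run mean `c / (1 - phi)` (here 2.0).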

Continue reading “Time Series Analysis: Forecasting Sales Data with Autoregressive (AR) Models”

## COVID-19: False Positive Alarm

In this post, we are going to use R to replicate an analysis from the current issue of *Scientific American* about a common mathematical pitfall of coronavirus antibody tests.

Many people think that a positive result on such a test means they are immune to the virus with high probability. If you want to find out why nothing could be further from the truth, read on!
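The pitfall is an instance of Bayes' theorem: the probability of actually having antibodies given a positive test depends strongly on how common antibodies are in the population (the prevalence). A minimal sketch, using hypothetical values for prevalence, sensitivity and specificity (not the figures from *Scientific American*); the function name is illustrative.

```python
def prob_antibodies_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(antibodies | positive test)."""
    # Share of the population that has antibodies and tests positive
    true_positives = sensitivity * prevalence
    # Share of the population without antibodies that still tests positive
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical values: 5% prevalence, 90% sensitivity, 95% specificity
ppv = prob_antibodies_given_positive(prevalence=0.05, sensitivity=0.90, specificity=0.95)
print(f"P(antibodies | positive) = {ppv:.2f}")  # 0.49
```

Even with a seemingly good test, fewer than half of the positives in this made-up scenario actually have antibodies, because the many people without antibodies produce almost as many false positives as the few with antibodies produce true ones.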

Continue reading “COVID-19: False Positive Alarm”

## Learning Data Science: A/B Testing in Under One Minute

Google does it! Facebook does it! Amazon does it for sure!

Especially in web design and online advertising, everybody is talking about *A/B testing*. If you want to quickly understand what it is and how to do it with R, read on!
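One common way to evaluate a simple A/B test on conversion rates is a two-proportion z-test. The sketch below is in Python rather than R, the counts are hypothetical, and it is not necessarily the test used in the post; note that R's `prop.test()` applies a continuity correction by default, so its p-values will differ slightly from this uncorrected version.

```python
import math


def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference of two conversion rates
    (no continuity correction)."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Under H0 both variants share one conversion rate, so pool the data
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 100 of 1000 visitors convert on A, 150 of 1000 on B
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z is about 3.38 and the p-value is well below 0.001, so a difference this large would be very unlikely if both variants truly converted at the same rate.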

Continue reading “Learning Data Science: A/B Testing in Under One Minute”

## Learning Statistics: Randomness is a Strange Beast

Our intuition about *randomness* is, strangely enough, quite limited. While we expect it to behave in certain ways (which it doesn’t), it shows some regularities that have unexpected consequences. In a series of seemingly random posts, I will highlight some of these regularities and their consequences. If you want to learn something about the strange behaviour of randomness and gain some intuition, read on!

Continue reading “Learning Statistics: Randomness is a Strange Beast”

## Lying with Statistics: One Beer a Day will Kill you!

About two years ago, the renowned medical journal “The Lancet” came out with the rather sensational conclusion that there is no safe level of alcohol consumption, so every little hurts! For example, drinking one bottle of beer (half a litre) per day would increase your risk of developing a serious health problem within one year by a whopping 7%! When I read that, I had to calm my nerves with a drink!

OK, kidding aside: in this post, you will learn how to lie with statistics by deviously mixing up *relative* and *absolute* changes in risk, so read on!
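The trick works because a relative increase says nothing about the absolute risk until you know the baseline. A minimal sketch with a hypothetical baseline risk of 1% per year (not the figure from the Lancet study); the function name is illustrative.

```python
def absolute_risk_change(baseline_risk, relative_increase):
    """Convert a relative risk increase into an absolute one."""
    new_risk = baseline_risk * (1 + relative_increase)
    return new_risk - baseline_risk


# Hypothetical baseline: 1% of people develop the problem within a year
extra = absolute_risk_change(baseline_risk=0.01, relative_increase=0.07)
print(f"Absolute increase: {extra:.4%}")                       # 0.0700%
print(f"Extra cases per 10,000 people: {extra * 10_000:.1f}")  # 7.0
```

Under this assumed baseline, the "whopping 7%" shrinks to 0.07 percentage points, i.e. about 7 additional cases per 10,000 people per year.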

Continue reading “Lying with Statistics: One Beer a Day will Kill you!”

## ZeroR: The Simplest Possible Classifier… or: Why High Accuracy can be Misleading

In one of my most popular posts, “So, what is AI really?”, I showed that *Artificial Intelligence (AI)* basically boils down to autonomously learned rules, i.e. *conditional statements* or simply *conditionals*.

In this post, I create the simplest possible *classifier*, called *ZeroR*, to show that even this classifier can achieve a surprisingly high *accuracy* (i.e. the proportion of correctly predicted instances)… and why this is not necessarily a good thing, so read on!
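ZeroR ignores all features and always predicts the most frequent class in the training data. A minimal Python sketch; the class and function names are my own, the dataset is hypothetical, and for brevity the model is evaluated on its own training labels.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: always predicts the most frequent training label."""

    def fit(self, labels):
        # Features are ignored entirely; only the label counts matter
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        return [self.prediction] * n_instances


def accuracy(y_true, y_pred):
    """Proportion of correctly predicted instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced data: 95 healthy patients, 5 sick ones
labels = ["healthy"] * 95 + ["sick"] * 5
model = ZeroR().fit(labels)
predictions = model.predict(len(labels))
print(model.prediction, accuracy(labels, predictions))  # healthy 0.95
```

The accuracy simply equals the share of the majority class: 95%, even though the model never identifies a single sick patient.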

Continue reading “ZeroR: The Simplest Possible Classifier… or: Why High Accuracy can be Misleading”

## Learning Data Science: Understanding ROC Curves

One widely used graphical plot for assessing the quality of a machine learning classifier or the accuracy of a medical test is the *Receiver Operating Characteristic* curve, or *ROC* curve. If you want to gain an intuition and see how ROC curves can easily be created with base R, read on!
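An ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered from the highest score to the lowest. The sketch below computes these points, treating tied scores as a single threshold, and the area under the curve (AUC) via the trapezoidal rule. The post itself uses base R; the function names and scores here are illustrative.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold from the highest
    score to the lowest; tied scores form a single threshold.
    labels: 1 for positive, 0 for negative (both classes must be present)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Classify every instance scoring >= threshold as positive
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores for four instances
points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

An AUC of 0.75 means that in three out of four positive/negative pairs the positive instance gets the higher score; a perfect ranking would give 1.0 and random guessing about 0.5.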

Continue reading “Learning Data Science: Understanding ROC Curves”

## COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths

One of the biggest problems of the COVID-19 pandemic is that there are no reliable figures for the number of infections. This renders many model projections next to useless.

If you want to learn a simple method for roughly estimating the real number of infections and expected deaths in the US, read on!
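One common back-of-the-envelope approach, sketched here under loudly stated assumptions and not necessarily the exact method of the post: today's deaths reflect infections from roughly `lag_days` ago, so dividing them by an assumed infection fatality rate (IFR) estimates the infections back then. If infections double every `doubling_days`, that number can be scaled forward to today, and multiplying by the IFR again gives the deaths those current infections will eventually cause. The function name and all parameter values are hypothetical.

```python
def back_of_envelope(deaths_today, ifr, lag_days, doubling_days):
    """Very rough estimate of current infections and of the deaths they
    will eventually cause. All inputs are assumptions, not data."""
    # Today's deaths stem from infections roughly lag_days ago
    infections_then = deaths_today / ifr
    # Assume unchecked exponential growth since then
    infections_now = infections_then * 2 ** (lag_days / doubling_days)
    # A fraction ifr of today's infections is expected to die later on
    expected_deaths = infections_now * ifr
    return infections_now, expected_deaths


# Purely hypothetical inputs: 1,000 deaths, 1% IFR, 20-day lag, 5-day doubling
infections, deaths = back_of_envelope(1000, 0.01, 20, 5)
print(f"{infections:,.0f} current infections, {deaths:,.0f} expected deaths")
```

With these made-up inputs, the estimate is about 1.6 million current infections and 16,000 eventual deaths; the result is very sensitive to the assumed IFR, lag and doubling time.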

Continue reading “COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths”

## Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting (Guest Post)

**Learning Machines** proudly presents a guest post by **Martijn Weterings** from the Food and Natural Products research group of the Institute of Life Technologies at the University of Applied Sciences of Western Switzerland in Sion.

Continue reading “Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting (Guest Post)”