ZeroR: The Simplest Possible Classifier, or Why High Accuracy can be Misleading


In one of my most popular posts, “So, what is AI really?”, I showed that Artificial Intelligence (AI) basically boils down to autonomously learned rules, i.e. conditional statements, or simply conditionals.

In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing, so read on!
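To make the idea concrete, here is a minimal sketch in Python (the post itself works in R). It assumes the usual definition of ZeroR, which ignores all features and always predicts the most frequent class. The data set and the helper names `zero_r_fit` and `accuracy` are made up for illustration and are not taken from the post.

```python
from collections import Counter

def zero_r_fit(labels):
    """'Train' ZeroR: simply remember the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Ratio of correctly predicted instances."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical, heavily imbalanced data: 95 healthy cases, 5 sick ones
labels = ["healthy"] * 95 + ["sick"] * 5

majority = zero_r_fit(labels)
predictions = [majority] * len(labels)   # ZeroR predicts the same class for everyone
zero_r_accuracy = accuracy(predictions, labels)

print(majority, zero_r_accuracy)  # healthy 0.95
```

The classifier reaches 95% accuracy on this toy data without ever identifying a single sick case, which is why accuracy alone says little when the classes are imbalanced.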

COVID-19: Analyze Mobility Trends with R


The global lockdown has slowed down mobility considerably. This can be seen in the data produced by our ubiquitous mobile phones.

Apple is kind enough to make those anonymized and aggregated data available to the public. If you want to learn how to get a handle on those data and analyze trends with R, read on!

Learning Data Science: Understanding ROC Curves


One widely used graphical plot to assess the quality of a machine learning classifier or the accuracy of a medical test is the Receiver Operating Characteristic curve, or ROC curve. If you want to gain an intuition for ROC curves and see how easily they can be created with base R, read on!
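As a rough illustration of what goes into such a curve, the following Python sketch (the post itself uses base R) sweeps a decision threshold over classifier scores, records the false positive rate and true positive rate at each step, and computes the area under the resulting curve with the trapezoidal rule. The scores, labels, and function names are invented for this example.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(labels)              # number of actual positives (label 1)
    neg = len(labels) - pos        # number of actual negatives (label 0)
    points = [(0.0, 0.0)]          # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points)
print(round(area, 4))  # 0.8889
```

An area of 1 would mean every positive is ranked above every negative, while 0.5 corresponds to random guessing; here 8 of the 9 positive/negative pairs are ordered correctly.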

COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths


One of the biggest problems of the COVID-19 pandemic is that there are no reliable numbers of infections. This fact renders many model projections next to useless.

If you want to get to know a simple method for roughly estimating the real number of infections and expected deaths in the US, read on!

Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting (Guest Post)


Learning Machines proudly presents a guest post by Martijn Weterings from the Food and Natural Products research group of the Institute of Life Technologies at the University of Applied Sciences of Western Switzerland in Sion.

Collider Bias, or Are Hot Babes Dim and Eggheads Ugly?


Correlation and its associated challenges don’t lose their fascination: most people know that correlation doesn’t imply causation, but not many know that the opposite is also true (see: Causation doesn’t imply Correlation either), and some know that correlation can just be random (so-called spurious correlation).

If you want to learn about a paradoxical effect nearly nobody is aware of, where correlation between two uncorrelated random variables is introduced just by sampling, read on!
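The effect can be reproduced with a small simulation, sketched here in Python rather than R. Two independent standard-normal variables are drawn (named `looks` and `brains` only as a nod to the title), and a sample is then selected on their sum. The selection rule `looks + brains > 1.0`, the seed, and the sample size are assumptions chosen for illustration, not the setup used in the post.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 10_000

# Two independent traits: no relationship in the full population
looks = [rng.gauss(0, 1) for _ in range(n)]
brains = [rng.gauss(0, 1) for _ in range(n)]
r_all = pearson(looks, brains)

# Assumed selection rule: only cases with a high combined score are observed
selected = [(l, b) for l, b in zip(looks, brains) if l + b > 1.0]
r_selected = pearson([l for l, _ in selected], [b for _, b in selected])

print(f"full population: r = {r_all:.3f}")
print(f"selected sample: r = {r_selected:.3f}")
```

In the full population the correlation is close to zero, while in the selected sample it turns clearly negative: among those who pass the cut, a low value on one trait has to be compensated by a high value on the other.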

COVID-19: The Case of Germany

It is such a beautiful day outside, lots of sunshine, spring at last… and we are now basically all grounded and sitting here, waiting to get sick.

So, why not a post from the new epicentre of the global COVID-19 pandemic, Central Europe, and more precisely from where I live: Germany?! Indeed, if you want to find out what the numbers tell us about how things might develop here, read on!

The One Question you should ask your Partner before Marrying!


Valentine’s Day is around the corner and love is in the air… but, shock horror, nearly every second marriage ends in divorce! Unfortunately, I can tell you firsthand that this is an experience you’d rather not have. In this post, we see how data science, in the form of the OneR package and an interesting new data set, might help you avoid that tragedy… so read on!

Epidemiology: How contagious is Novel Coronavirus (2019-nCoV)?


A new invisible enemy, only 30 kb in size, has emerged and is on a killing spree around the world: 2019-nCoV, the Novel Coronavirus!

It has already killed more people than the SARS pandemic and its outbreak has been declared a Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO).

If you want to learn how epidemiologists estimate how contagious a new virus is and how to do it in R, read on!

Does Australia need More Fires (but the Right Kind)? A Multi-Agent Simulation


We have all watched with great horror the catastrophic fires in Australia. For many years, scientists have been studying simulations to understand the underlying dynamics better. They tell us that “what Australia needs is more fires, but of the right kind”. What do they mean by that?

One such simulation of fire is based on Multi-Agent Systems (MAS), also called Agent-Based Modelling (ABM). An excellent piece of free software (and in fact the de facto standard) is NetLogo. Even better is that NetLogo can be fully controlled by R… and we will use this feature to learn some crucial lessons!

If you want to understand more about the dynamics of fire in particular and about some fascinating properties of dynamical systems in general via controlling NetLogo with R, read on!