Learning Data Science: Understanding and Using k-means Clustering


A few months ago I published a quite popular post on Clustering the Bible… one well known clustering algorithm is k-means. If you want to learn how k-means works and how to apply it in a real-world example, read on…
Continue reading “Learning Data Science: Understanding and Using k-means Clustering”

Reinforcement Learning: Life is a Maze


It can be argued that the most important decisions in life are some variant of an exploitation-exploration problem. Shall I stick with my current job or look for a new one? Shall I stay with my partner or seek a new love? Shall I continue reading the book or watch the movie instead? In all of those cases, the question is always whether I should “exploit” the thing I have or whether I should “explore” new things. If you want to learn how to tackle this most basic trade-off read on…
Continue reading “Reinforcement Learning: Life is a Maze”

Teach R to read handwritten Digits with just 4 Lines of Code


What is the best way for me to find out whether you are rich or poor, when the only thing I know is your address? Looking at your neighbourhood! That is the big idea behind the k-nearest neighbours (or KNN) algorithm, where k stands for the number of neighbours to look at. The idea couldn’t be any simpler yet the results are often very impressive indeed – so read on…
Continue reading “Teach R to read handwritten Digits with just 4 Lines of Code”

Creating a Movie with Data from Outer Space in R

Source: Wikimedia

The Rosetta mission of the European Space Agency (ESA) is one of the greatest (yet underappreciated) triumphs of humankind: it was launched in 2004 and landed the spacecraft Philae ten years later on a small comet, named 67P/Churyumov–Gerasimenko (for the whole timeline of the mission see here: Timeline of Rosetta spacecraft).

ESA provided the world with datasets of the comet which we will use to create an animated gif in R… so read on!
Continue reading “Creating a Movie with Data from Outer Space in R”

Understanding AdaBoost – or how to turn Weakness into Strength


Many of you might have heard of the concept “Wisdom of the Crowd”: when many people independently guess some quantity, e.g. the number of marbles in a jar glass, the average of their guesses is often pretty accurate – even though many of the guesses are totally off.

The same principle is at work in so-called ensemble methods, like bagging and boosting. If you want to know more about boosting and how to turn pseudocode of a scientific paper into valid R code read on…
Continue reading “Understanding AdaBoost – or how to turn Weakness into Strength”

From Coin Tosses to p-Hacking: Make Statistics Significant Again!


One of the most notoriously difficult subjects in statistics is the concept of statistical tests. We will explain the ideas behind it step by step to give you some intuition on how to use (and misuse) it, so read on…
Continue reading “From Coin Tosses to p-Hacking: Make Statistics Significant Again!”

Learning R: Permutations and Combinations with Base R


The area of combinatorics, the art of systematic counting, is dreaded territory for many people, so let us bring some light into the matter: in this post we will explain the difference between permutations and combinations, with and without repetitions (also called replacements), will calculate the number of possibilities and present efficient R code to enumerate all of them, so read on…
Continue reading “Learning R: Permutations and Combinations with Base R”

Learning R: Painting with Fire


A few months ago I published a post on recursion: To understand Recursion you have to understand Recursion…. In this post we will see how to use recursion to fill free areas of an image with colour, the caveats of recursion and how to transform a recursive algorithm into a loop-based version using a queue – so read on…
Continue reading “Learning R: Painting with Fire”