Cambridge Analytica: Microtargeting or How to catch voters with the LASSO


The two most disruptive political events of the last few years are undoubtedly the Brexit referendum to leave the European Union and the election of Donald Trump. Both are commonly associated with the political consulting firm Cambridge Analytica and a technique known as Microtargeting.

If you want to understand the data science behind the Cambridge Analytica/Facebook data scandal and Microtargeting (illustrated here with LASSO regression) by building a toy example in R, read on!

Learning Data Science: The Supermarket knows you are pregnant before your Dad does


A few months ago I posted about market basket analysis (see Customers who bought…). In this post we will see another form of it, done with Logistic Regression, so read on…

Learning Data Science: Understanding and Using k-means Clustering


A few months ago I published a quite popular post on Clustering the Bible… One well-known clustering algorithm is k-means. If you want to learn how k-means works and how to apply it in a real-world example, read on…

Reinforcement Learning: Life is a Maze


It can be argued that the most important decisions in life are some variant of an exploitation-exploration problem. Shall I stick with my current job or look for a new one? Shall I stay with my partner or seek a new love? Shall I continue reading the book or watch the movie instead? In all of those cases, the question is always whether I should “exploit” the thing I have or whether I should “explore” new things. If you want to learn how to tackle this most basic trade-off, read on…
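To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy strategy on a toy "multi-armed bandit". It is written in Python for illustration and is not taken from the post itself, which works in R and uses a maze. The function name, the reward model (Gaussian noise around a fixed mean per option) and all parameter values are assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=42):
    """Toy epsilon-greedy agent (hypothetical helper, for illustration only).

    true_means: assumed average reward of each option ("arm").
    epsilon:    probability of exploring a random option instead of
                exploiting the option that currently looks best.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each option was chosen
    estimates = [0.0] * n_arms   # running average reward per option

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try any option at random
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: true mean plus Gaussian noise
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running average
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates

counts, estimates = epsilon_greedy([0.0, 5.0, 10.0])
print(counts)
print([round(e, 2) for e in estimates])
```

With a small epsilon the agent mostly exploits what it already knows, but still explores often enough to discover a better option it would otherwise never have tried.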

Teach R to read handwritten Digits with just 4 Lines of Code


What is the best way for me to find out whether you are rich or poor, when the only thing I know is your address? Looking at your neighbourhood! That is the big idea behind the k-nearest neighbours (or KNN) algorithm, where k stands for the number of neighbours to look at. The idea couldn’t be any simpler yet the results are often very impressive indeed – so read on…
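The neighbourhood idea can be sketched in a few lines. The following Python snippet is an illustration only (the post itself uses R and handwritten digits). The function name `knn_predict` and the toy "address" coordinates and labels are made up for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    Hypothetical helper for illustration; uses plain Euclidean distance.
    """
    # Distance from the query to every known point, paired with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbours wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up toy data: coordinates of addresses and whether the resident is rich or poor
addresses = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
wealth = ["poor", "poor", "poor", "rich", "rich", "rich"]

print(knn_predict(addresses, wealth, (2, 2)))
print(knn_predict(addresses, wealth, (7, 8)))
```

Choosing an odd k for a two-class problem avoids tied votes. Larger values of k smooth out noisy neighbours at the cost of blurring class boundaries.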

Understanding AdaBoost – or how to turn Weakness into Strength


Many of you might have heard of the concept “Wisdom of the Crowd”: when many people independently guess some quantity, e.g. the number of marbles in a glass jar, the average of their guesses is often pretty accurate – even though many of the guesses are totally off.

The same principle is at work in so-called ensemble methods, like bagging and boosting. If you want to know more about boosting and how to turn pseudocode from a scientific paper into valid R code, read on…
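The marble-jar effect is easy to simulate. The sketch below is in Python and is not from the post. It assumes that guesses are independent and unbiased, modelled as Gaussian noise around the true count, which is a strong simplification. It illustrates plain averaging only, not AdaBoost itself, which weights its weak learners rather than averaging them equally.

```python
import random
import statistics

def crowd_estimate(true_value, n_guessers=1000, noise_sd=300, seed=1):
    """Simulate independent, unbiased guesses and return them with their mean.

    Hypothetical helper; the Gaussian noise model is an assumption.
    """
    rng = random.Random(seed)
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_guessers)]
    return guesses, statistics.mean(guesses)

true_marbles = 1000
guesses, crowd_mean = crowd_estimate(true_marbles)

# Error of the crowd average versus the typical error of a single guesser
crowd_error = abs(crowd_mean - true_marbles)
typical_individual_error = statistics.median(abs(g - true_marbles) for g in guesses)

print(round(crowd_error, 1), round(typical_individual_error, 1))
```

Under these assumptions the error of the average shrinks roughly with the square root of the number of guessers, which is why the crowd can be accurate even when most individuals are far off.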

Learning R: The Ultimate Introduction (incl. Machine Learning!)


There are a million reasons to learn R (see e.g. Why R for Data Science – and not Python?), but where to start? I present to you the ultimate introduction to bring you up to speed! So read on…

Google’s Eigenvector… or how a Random Surfer finds the most relevant Webpages


Like most people, you will have used a search engine lately, like Google. But have you ever thought about how it manages to give you the most fitting results? How does it order the results so that the best are on top? Read on to find out!

Symbolic Regression, Genetic Programming… or if Kepler had R


A few weeks ago we published a post about using the power of the evolutionary method for optimization (see Evolution works!). In this post we will go a step further, so read on…