There seems to be a revolution going on in the R sphere: many people are flocking to what is commonly known as the tidyverse, a collection of packages developed and maintained by the Chief Scientist of RStudio, Hadley Wickham.
In this post, I explain what the tidyverse is and why I resist using it, so read on!
Continue reading “Why I don’t use the Tidyverse”
Data Scientists know that about 80% of a Data Science project consists of preparing the data so that they can be analyzed. Building Machine Learning models is the fun part that only comes afterwards!
This process is called Data Wrangling (or Data Munging). If you want to use some Base R data wrangling techniques in a fun game to hack a password, read on!
Continue reading “Learning R: Data Wrangling in Password Hacking Game”
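The password-hacking game itself is not reproduced here. As a generic illustration of the kind of Base R wrangling tools the post refers to, here is a minimal sketch on the built-in `mtcars` data. The choice of data set and of the functions `subset()`, `transform()`, `order()` and `aggregate()` is an assumption, not necessarily what the game uses.

```r
# Generic Base R data wrangling on the built-in mtcars data
# (illustrative only, not the code from the password-hacking post)

# Filter rows (cars heavier than 3500 lbs) and keep selected columns
heavy <- subset(mtcars, wt > 3.5, select = c(mpg, cyl, wt))

# Add a derived column: fuel efficiency in km per litre (1 mpg = 0.425144 km/l)
heavy <- transform(heavy, kpl = mpg * 0.425144)

# Sort by fuel efficiency, best first
heavy <- heavy[order(heavy$mpg, decreasing = TRUE), ]

# Group-wise summary: mean mpg per number of cylinders
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

print(head(heavy))
print(avg_mpg)
```

All four steps use only functions that ship with R, which is the whole point of doing the wrangling in Base R.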
It is an old dream to teach a computer to see, i.e. to hold something in front of a camera and let the computer tell you what it sees. For decades it has been exactly that: a dream. We human beings are able to see, but we just don't know how we do it, let alone describe it precisely enough to put it into algorithmic form.
Enter machine learning!
Continue reading “Teach R to see by Borrowing a Brain”
Customer Relationship Management (CRM) is not only about acquiring new customers but especially about retaining existing ones. That is because acquisition is often much more expensive than retention. In this post, we learn how to analyze the reasons for customer churn (i.e. customers leaving the company). We do this with a very convenient point-and-click interface for doing data science on top of R, so read on!
Continue reading “Data Science on Rails: Analyzing Customer Churn”
Today the world's biggest book fair starts again in Frankfurt, Germany. I thought this might be a good opportunity to do you some good!
Springer is one of the most renowned scientific publishing companies in the world. Normally, their books are quite expensive, but Open Access is a megatrend in the publishing business, too.
If you want to use R in a little fun project to find the latest open access additions to their program, read on!
Continue reading “Finding free Science Books from Springer”
The two most disruptive political events of the last few years are undoubtedly the Brexit referendum to leave the European Union and the election of Donald Trump. Both are commonly associated with the political consulting firm Cambridge Analytica and a technique known as Microtargeting.
If you want to understand the data science behind the Cambridge Analytica/Facebook data scandal and Microtargeting by building a toy example with LASSO regression in R, read on!
Continue reading “Cambridge Analytica: Microtargeting or How to catch voters with the LASSO”
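To make the LASSO part more concrete, here is a minimal sketch of LASSO regression fitted by cyclic coordinate descent in Base R. It is not necessarily how the post fits its model. The data are simulated, with two informative predictors among ten, purely to show how the L1 penalty shrinks irrelevant coefficients to exactly zero. The helper names `soft_threshold()` and `lasso_cd()` are hypothetical.

```r
# Soft-thresholding operator: shrinks z towards 0 by lambda, exactly 0 if |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# LASSO by cyclic coordinate descent (hypothetical teaching sketch). Minimizes
#   (1 / (2n)) * sum((y - X %*% b)^2) + lambda * sum(abs(b))
# on standardized predictors and a centered response (hence no intercept).
# Returned coefficients are on the standardized scale.
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  X <- scale(X)
  y <- y - mean(y)
  b <- rep(0, ncol(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      # Partial residual: what is left of y without predictor j's contribution
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]
      z <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  setNames(b, colnames(X))
}

# Simulated data: only x1 and x2 actually influence y
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(n)

b <- lasso_cd(X, y, lambda = 0.5)
round(b, 3)
```

Larger values of `lambda` push more coefficients to exactly zero. That is what makes the LASSO attractive for picking out the few variables that matter among many candidates.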
A few months ago I posted about market basket analysis (see Customers who bought…); in this post we will look at another form of it, done with Logistic Regression, so read on…
Continue reading “Learning Data Science: The Supermarket knows you are pregnant before your Dad does”
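The underlying idea is to predict a binary outcome from purchase indicators with logistic regression, and it can be sketched with Base R's `glm()`. Everything here is an assumption for illustration. The product columns are hypothetical and the data are simulated, not real shopping records or the post's data.

```r
set.seed(42)
n <- 500
# Hypothetical purchase indicators (1 = customer bought the product recently)
shop <- data.frame(
  lotion   = rbinom(n, 1, 0.3),
  vitamins = rbinom(n, 1, 0.2),
  cotton   = rbinom(n, 1, 0.25)
)
# Simulated hidden label: the log-odds of being pregnant rise with each purchase
log_odds <- -3 + 1.5 * shop$lotion + 2 * shop$vitamins + 1 * shop$cotton
shop$pregnant <- rbinom(n, 1, plogis(log_odds))

# Fit the logistic regression
model <- glm(pregnant ~ lotion + vitamins + cotton, data = shop, family = binomial)
summary(model)$coefficients

# Predicted probability for a new customer who bought lotion and vitamins
new_customer <- data.frame(lotion = 1, vitamins = 1, cotton = 0)
p_new <- predict(model, newdata = new_customer, type = "response")
p_new
```

`type = "response"` turns the fitted log-odds into a probability between 0 and 1. A shop could then rank its customers by that probability.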
A few months ago I published a quite popular post on Clustering the Bible. One well-known clustering algorithm is k-means. If you want to learn how k-means works and how to apply it in a real-world example, read on…
Continue reading “Learning Data Science: Understanding and Using k-means Clustering”
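The core of k-means fits in a few lines of Base R. You alternate between assigning every point to its nearest center and moving every center to the mean of its points (Lloyd's algorithm) until the centers stop moving. The function `simple_kmeans()` below is a hypothetical teaching sketch on the built-in `iris` measurements, not the post's code. In practice you would use `stats::kmeans()`, which is shown at the end for comparison.

```r
# Minimal Lloyd-style k-means (hypothetical teaching sketch)
simple_kmeans <- function(X, k, max_iter = 100) {
  X <- as.matrix(X)
  # Start from k randomly chosen data points as centers
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  for (i in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance of each point to each center
    d <- sapply(seq_len(k), function(j) colSums((t(X) - centers[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")  # index of the nearest center
    # Update step: move each center to the mean of its assigned points
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- X[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centers[j, ] <- colMeans(members)
    }
    # Stop when the centers no longer move
    if (isTRUE(all.equal(new_centers, centers))) break
    centers <- new_centers
  }
  list(cluster = cluster, centers = centers)
}

set.seed(42)
X <- scale(iris[, 1:4])  # standardized numeric measurements
res <- simple_kmeans(X, k = 3)
table(cluster = res$cluster, species = iris$Species)

# The production-ready version in Base R, with several random starts
km <- kmeans(X, centers = 3, nstart = 20)
table(cluster = km$cluster, species = iris$Species)
```

Because the result depends on the random starting centers, `kmeans()` offers `nstart` to try several starts and keep the best solution.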
One of the most notoriously difficult subjects in statistics is the concept of statistical tests. We will explain the ideas behind them step by step to give you some intuition on how to use (and misuse) them, so read on…
Continue reading “From Coin Tosses to p-Hacking: Make Statistics Significant Again!”
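The basic question behind such a test can be asked with a coin: how surprising would a given number of heads be if the coin were fair? Below is a minimal sketch with Base R's `binom.test()`, using a hypothetical result of 60 heads in 100 tosses. It also includes a small simulation of why testing a fair coin often enough will eventually produce a "significant" result. All numbers are illustrative assumptions, not taken from the post.

```r
# Hypothetical experiment: 60 heads in 100 tosses
heads <- 60
n <- 100

# Exact two-sided binomial test of H0: the coin is fair (p = 0.5)
res <- binom.test(heads, n, p = 0.5)
res$p.value

# The same p-value by hand: P(X >= 60) + P(X <= 40), which by symmetry is 2 * P(X >= 60)
p_manual <- 2 * pbinom(heads - 1, n, 0.5, lower.tail = FALSE)
p_manual

# p-hacking in a nutshell: test a fair coin 20 times and record whether
# at least one of the 20 tests comes out "significant" at the 5% level
set.seed(1)
hits <- replicate(500, {
  p_values <- replicate(20, binom.test(rbinom(1, n, 0.5), n, p = 0.5)$p.value)
  any(p_values < 0.05)
})
mean(hits)
```

With 60 heads the p-value lands just above the conventional 5% threshold. The simulation shows that a fair coin still produces at least one "significant" result in a large share of the 20-test batches.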
The area of combinatorics, the art of systematic counting, is dreaded territory for many people, so let us shed some light on the matter: in this post we will explain the difference between permutations and combinations, with and without repetition, calculate the number of possibilities and present efficient R code to enumerate all of them, so read on…
Continue reading “Learning R: Permutations and Combinations with Base R”
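The four cases and their counting formulas can be summarized in a short Base R sketch for drawing k = 3 out of n = 5 items. The enumeration uses `expand.grid()` plus filtering. This is a deliberately simple brute-force approach for illustration, not the efficient code from the post.

```r
n <- 5  # number of items to choose from
k <- 3  # number of items drawn

# Counting formulas
n_perm_rep   <- n^k                              # order matters, repetition allowed
n_perm_norep <- factorial(n) / factorial(n - k)  # order matters, no repetition
n_comb_norep <- choose(n, k)                     # order irrelevant, no repetition
n_comb_rep   <- choose(n + k - 1, k)             # order irrelevant, repetition allowed
c(n_perm_rep, n_perm_norep, n_comb_norep, n_comb_rep)  # 125 60 10 35

# Brute-force enumeration: all k-tuples with repetition ...
perm_rep <- as.matrix(expand.grid(rep(list(1:n), k)))
# ... keep tuples without repeated elements (permutations without repetition)
perm_norep <- perm_rep[apply(perm_rep, 1, function(r) anyDuplicated(r) == 0), ]
# ... keep non-decreasing tuples (combinations with repetition)
comb_rep <- perm_rep[apply(perm_rep, 1, function(r) !is.unsorted(r)), ]
# Combinations without repetition directly with combn(), one per row
comb_norep <- t(combn(n, k))

sapply(list(perm_rep, perm_norep, comb_norep, comb_rep), nrow)
```

Filtering all n^k tuples quickly becomes wasteful for larger n and k, which is why dedicated enumeration code pays off.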