One widely used graphical plot to assess the quality of a machine learning classifier or the accuracy of a medical test is the Receiver Operating Characteristic curve, or ROC curve. If you want to gain an intuition for ROC curves and see how easily they can be created with base R, read on!
Continue reading “Learning Data Science: Understanding ROC Curves”
Valentine’s Day is around the corner and love is in the air… but, shock horror, nearly every second marriage ends in a divorce! Unfortunately, I can tell you first hand that this is an experience you’d rather not have. In this post, we see how data science, in the form of the OneR package and an interesting new data set, might potentially help you to avoid that tragedy… so read on!
Continue reading “The One Question you should ask your Partner before Marrying!”
We already covered the so-called Accuracy-Interpretability Trade-Off, which states that the more accurate the results of an AI are, the harder it often is to interpret how it arrived at its conclusions (see also: Learning Data Science: Predicting Income Brackets).
This is especially true for Neural Networks: while often delivering outstanding results, they are basically black boxes and notoriously hard to interpret (see also: Understanding the Magic of Neural Networks).
There is a hot new area of research that aims to make black-box models interpretable, called Explainable Artificial Intelligence (XAI). If you want to gain some intuition on one such approach (called LIME), read on!
Continue reading “Explainable AI (XAI)… Explained! Or: How to whiten any Black Box with LIME”
There seems to be some revolution going on in the R sphere… people seem to be jumping at what is commonly known as the tidyverse, a collection of packages developed and maintained by the Chief Scientist of RStudio, Hadley Wickham.
In this post, I explain what the tidyverse is and why I resist using it, so read on!
Continue reading “Why I don’t use the Tidyverse”
As we have already seen in earlier posts, simple methods can be surprisingly successful in yielding good results (see e.g. Learning Data Science: Predicting Income Brackets or Teach R to read handwritten Digits with just 4 Lines of Code).
If you want to learn how some simple mathematics, known as Naive Bayes, can help you find out the sentiment of texts (in this case movie reviews), read on!
Continue reading “Learning Data Science: Sentiment Analysis with Naive Bayes”
It has been an old dream to teach a computer to see, i.e. to hold something in front of a camera and let the computer tell you what it sees. For decades it has been exactly that: a dream – because although we as human beings are able to see, we don’t know how we do it, let alone know it precisely enough to put it into algorithmic form.
Enter machine learning!
Continue reading “Teach R to see by Borrowing a Brain”
Customer Relationship Management (CRM) is not only about acquiring new customers but especially about retaining existing ones. That is because acquisition is often much more expensive than retention. In this post, we learn how to analyze the reasons for customer churn (i.e. customers leaving the company). We do this with a very convenient point-and-click interface for doing data science on top of R, so read on!
Continue reading “Data Science on Rails: Analyzing Customer Churn”
In 1965 the University of Chicago rejected Kurt Vonnegut’s college thesis, which claimed that all stories shared common structures, or “shapes”, including “Man in a Hole”, “Boy gets Girl” and “Cinderella”. Many years later the then already legendary Vonnegut gave a hilarious lecture on this idea – before continuing to read on please watch it here (about 4 minutes):
Continue reading “Extracting basic Plots from Novels: Dracula is a Man in a Hole”
The two most disruptive political events of the last few years are undoubtedly the Brexit referendum to leave the European Union and the election of Donald Trump. Both are commonly associated with the political consulting firm Cambridge Analytica and a technique known as Microtargeting.
If you want to understand the data science behind the Cambridge Analytica/Facebook data scandal and Microtargeting (i.e. LASSO regression) by building a toy example in R, read on!
Continue reading “Cambridge Analytica: Microtargeting or How to catch voters with the LASSO”
A few months ago I posted about market basket analysis (see Customers who bought…). In this post we will see another form of it, done with Logistic Regression, so read on…
Continue reading “Learning Data Science: The Supermarket knows you are pregnant before your Dad does”