Learning Path for “Data Science with R” – Part I


Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.

I use many of those posts in my university classes. In this post, I will give you the first part of a learning path through the knowledge that has accumulated on this blog over the years, to help you become a well-rounded data scientist, so read on!

Foundations

We start by explaining why R is the best choice to do data science. This is still one of my most popular posts!

Why R for Data Science – and not Python?


This post will get you up to speed by introducing you to important concepts of R.

Learning R: The Ultimate Introduction (incl. Machine Learning!)


Data science is all about building good models, so we start by building a very simple linear model here and learn about the important concept of overfitting.

Learning Data Science: Modelling Basics
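
To make the idea of overfitting concrete, here is a minimal sketch in base R. It is not the code from the linked post: the simulated data, the train/test split, and the helper `rmse()` are assumptions made up for illustration. It fits a plain linear model and a deliberately over-flexible polynomial to the same training data and compares their errors on training and held-out data.

```r
# Simulated data (an assumption for illustration): linear trend plus noise
set.seed(42)
n <- 30
x <- seq(0, 10, length.out = n)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
data <- data.frame(x = x, y = y)

# Hold out a third of the observations for testing
train_idx <- sample(n, 20)
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# A simple linear model and a deliberately over-flexible polynomial
fit_simple  <- lm(y ~ x, data = train)
fit_complex <- lm(y ~ poly(x, 10), data = train)

# Hypothetical helper: root mean squared error of a model on a data set
rmse <- function(model, newdata) {
  sqrt(mean((newdata$y - predict(model, newdata))^2))
}

errors <- data.frame(
  model = c("linear", "polynomial (degree 10)"),
  train = c(rmse(fit_simple, train), rmse(fit_complex, train)),
  test  = c(rmse(fit_simple, test),  rmse(fit_complex, test))
)
print(errors)
```

The flexible model can never do worse on the training data, because it contains the linear model as a special case. What matters is the error on the held-out data: a model that looks better in training but worse in testing has learned the noise rather than the signal.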


In all of data science the concept of correlation is an important one, so you should develop some intuition about it.

Causation doesn’t imply Correlation either
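
One way to get a feeling for the title above is a small made-up example (an assumption for illustration, not taken from the linked post): `y` is completely determined by `x`, yet the Pearson correlation between the two is effectively zero, because the relationship is symmetric rather than linear.

```r
# y depends on x deterministically, but not linearly
x <- -10:10
y <- x^2

# Pearson correlation only measures linear association
r <- cor(x, y)
print(round(r, 10))
```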


This post shows that even with simple correlations powerful models can be built.

Customers who bought…



Machine Learning

The OneR package (which I developed) implements a very simple classification algorithm, which can serve as a great first use case for machine learning.

One Rule (OneR) Machine Learning Classification in under One Minute


A few more examples of how to use the OneR classification algorithm to gain valuable insights from your data.

OneR – Fascinating Insights through Simple Rules


A consistent example where we apply three classification algorithms (OneR, decision trees, random forests) on the same data set to understand the Accuracy-Interpretability Trade-Off.

Learning Data Science: Predicting Income Brackets


We are now in a good position to really understand what Artificial Intelligence is all about!

So, what is AI really?



Diving even deeper into the concepts of Artificial Intelligence, Machine Learning and Deep Learning – written by an advanced AI!

The Most Advanced AI in the World explains what AI, Machine Learning, and Deep Learning are!


Everybody is talking about Neural Networks (a.k.a. deep learning); here we explain how the underlying technology works!

Understanding the Magic of Neural Networks


One of the biggest problems of neural networks is that they are black boxes. One remedy is Explainable AI (XAI), which renders those systems interpretable.

Explainable AI (XAI)… Explained! Or: How to whiten any Black Box with LIME


We already covered simple linear regression. Here we introduce another linear model, logistic regression, and apply it to a fascinating (and a little frightening) real-life example.

Learning Data Science: The Supermarket knows you are pregnant before your Dad does


This post provides the big picture on the connections between linear regression, logistic regression, and neural networks.

Logistic Regression as the Smallest Possible Neural Network
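
The connection can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code from the linked post: the built-in `mtcars` data set and the choice of `am` and `wt` as variables are arbitrary, and `sigmoid()` is a helper defined here. A fitted logistic regression is a weighted sum of the inputs passed through a sigmoid function, which is exactly what a single artificial neuron with a sigmoid activation computes.

```r
# Logistic regression: probability of a manual gearbox given car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)
w <- coef(fit)  # w[1] = bias (intercept), w[2] = weight for wt

# The activation function of a single sigmoid neuron
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass of the "network": weighted sum, then activation
manual <- sigmoid(w[1] + w[2] * mtcars$wt)

# Should agree with the model's own predicted probabilities
same <- all.equal(unname(manual), unname(predict(fit, type = "response")))
print(same)
```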


Another fascinating real-life application of yet another linear model, the LASSO, and how it got Trump elected (kind of)!

Cambridge Analytica: Microtargeting or How to catch voters with the LASSO


Bonus Post
Now that we are already talking about some controversial uses of AI, let us dive a little bit deeper into what AI could mean for us and for society as a whole.

Thomas Ramge: Postdigital (Book Excerpt)


Many data science techniques are based on some form of distance measure. Here we learn about the k-nearest neighbours (KNN) algorithm.

Teach R to read handwritten Digits with just 4 Lines of Code
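
To show where the distance measure enters, here is a minimal from-scratch sketch of KNN in base R. It is not the four lines from the linked post: it uses the built-in `iris` data instead of handwritten digits, and `knn_predict()` is a hypothetical helper written for this illustration. Each new observation is assigned the majority class among its `k` closest training observations, with closeness measured by Euclidean distance.

```r
# Hypothetical helper: classify one observation by majority vote
# among its k nearest neighbours (Euclidean distance)
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Distance from new_x to every row of the training matrix
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  nearest <- order(dists)[1:k]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Random split of iris into 100 training and 50 test observations
set.seed(1)
idx <- sample(nrow(iris), 100)
train_x <- as.matrix(iris[idx, 1:4])
train_y <- iris$Species[idx]
test_x  <- as.matrix(iris[-idx, 1:4])
test_y  <- iris$Species[-idx]

# Predict every test observation and measure the share of correct answers
pred <- apply(test_x, 1, function(row) knn_predict(train_x, train_y, row, k = 5))
accuracy <- mean(pred == as.character(test_y))
print(accuracy)
```

Note that there is no training step in the usual sense: the "model" is simply the stored training data, and all the work happens at prediction time.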


Another distance-based algorithm, this time from the area of unsupervised learning, is the k-means clustering algorithm.

Learning Data Science: Understanding and Using k-means Clustering
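
As a quick sketch of what this looks like in practice (an illustration with the built-in `iris` data, not the example from the linked post), base R's `kmeans()` can be run on the four measurements without ever showing it the species labels. The choice of three clusters and of standardizing the features first are assumptions made for this example.

```r
# Standardize the features so that no single measurement dominates the distances
features <- scale(iris[, 1:4])

# k-means with 3 clusters; nstart tries several random initializations
set.seed(123)
clusters <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered clusters with the species labels the algorithm never saw
comparison <- table(cluster = clusters$cluster, species = iris$Species)
print(comparison)
```

The cluster numbers themselves are arbitrary; what is interesting is how well the groups found from the measurements alone line up with the known species.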


Besides supervised and unsupervised learning, there is a third category which is getting more and more popular (especially in combination with neural networks): Reinforcement Learning.

Reinforcement Learning: Life is a Maze


A glimpse into another important topic: Sentiment Analysis, which is a major application area of Natural Language Processing (NLP).

Learning Data Science: Sentiment Analysis with Naive Bayes


(More) Real-World Applications

An important real-world application in the area of Customer Relationship Management (CRM): how to retain your customers!

Data Science on Rails: Analyzing Customer Churn


Another important real-world application in the area of Finance: how to get your money back!

Will I get my Money back? Credit Scoring with OneR


Quite a controversial application in the area of law enforcement: how to release only those offenders who have become good citizens again.

Recidivism: Identifying the Most Important Predictors for Re-offending with OneR


Medical research has become more and more important for the application of machine learning (I also provide a link to an open-source paper I published on Covid-19)!

OneR in Medical Research: Finding Leading Symptoms, Main Predictors and Cut-Off Points


Here we use clustering to find a pattern in the Greek Gospels that normally only biblical scholars know about – fascinating stuff!

Clustering the Bible


With this application of sentiment analysis you can automatically extract plots from novels!

Extracting Basic Plots from Novels: Dracula is a Man in a Hole


Bonus Post
After all that nerdy stuff, some practical advice on how to date – which proves the point that data scientist is the sexiest job of the 21st century!

Cupid’s Arrow: How to Boost your Chances at Speed Dating!


If you work through this learning path you will have a good basic understanding of important foundations, techniques, and real-life applications of machine learning.

Yet this is only a selection of topics covered on this blog so far. Feel free to discover other gems around here… and please share your feedback and suggestions in the comment section below.

In the future, I plan to publish another post with a learning path for statistics, which is also an important foundation for becoming a data scientist but which would have overloaded this post, so stay tuned!


UPDATE September 20, 2021
I created a video for this post (in German).
