Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.

I use many of them in my university classes. In this post, I give you the first part of a learning path through the knowledge that has accumulated on this blog over the years, to help you become a well-rounded data scientist, so read on!

## Foundations

We start by explaining why R is the best choice for doing data science. This is still one of my most popular posts!

Why R for Data Science – and not Python?

This post will get you up to speed by introducing you to important concepts of R.

Learning R: The Ultimate Introduction (incl. Machine Learning!)

Data science is all about building good models, so we start by building a very simple *linear model* here and learn about the important concept of *overfitting*.

Learning Data Science: Modelling Basics
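To give a first feeling for overfitting, here is a minimal sketch in base R. It is not the code from the linked post: the simulated data, the train/test split, and the helper `rmse` are assumptions made for illustration. A simple linear model is compared with a deliberately over-flexible polynomial on the same data.

```r
# Simulated data (an assumption for illustration): a linear trend plus noise
set.seed(42)
x <- seq(1, 10, length.out = 30)
y <- 2 * x + 1 + rnorm(length(x), sd = 3)

# Hold out every third point as a test set
test_idx <- seq(3, length(x), by = 3)
train <- data.frame(x = x[-test_idx], y = y[-test_idx])
test  <- data.frame(x = x[test_idx],  y = y[test_idx])

# A simple linear model and a deliberately over-flexible polynomial of degree 12
fit_linear <- lm(y ~ x, data = train)
fit_poly   <- lm(y ~ poly(x, 12), data = train)

# Root mean squared error of a model on a data set (hypothetical helper)
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

errors <- c(
  linear_train = rmse(fit_linear, train),
  poly_train   = rmse(fit_poly, train),
  linear_test  = rmse(fit_linear, test),
  poly_test    = rmse(fit_poly, test)
)
print(round(errors, 2))
```

The polynomial always fits the training data at least as well as the straight line, because the linear model is a special case of it. On the held-out points it will typically do worse, which is exactly what overfitting means: the model has learned the noise, not the pattern.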

In all of data science the concept of *correlation* is an important one, so you should develop some intuition about it.

Causation doesn’t imply Correlation either
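One classic illustration of this title (not necessarily the example used in the post) fits into three lines of base R. Here `y` is completely determined by `x`, yet the linear correlation between the two is zero because the relationship is symmetric around zero. The variable names are chosen for illustration only.

```r
# y is fully caused by x, but the relationship is not linear
x <- -10:10
y <- x^2

# Pearson correlation only measures linear association
correlation <- cor(x, y)
print(correlation)
```

The correlation coefficient comes out as zero (up to floating-point precision), so a lack of correlation does not rule out a strong dependence.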

This post shows that even with simple correlations powerful models can be built.

## Machine Learning

The OneR package (which I developed) implements a very simple *classification* algorithm, which makes it a great first use case for machine learning.

One Rule (OneR) Machine Learning Classification in under One Minute

A few more examples of how to use the OneR classification algorithm to gain valuable insights from your data.

OneR – Fascinating Insights through Simple Rules

A consistent example where we apply three classification algorithms (OneR, *decision trees*, *random forests*) to the same data set to understand the *Accuracy-Interpretability Trade-Off*.

Learning Data Science: Predicting Income Brackets

We are now in a good position to really understand what *Artificial Intelligence* is all about!

Diving even deeper into the concepts of Artificial Intelligence, *Machine Learning* and *Deep Learning* – written by an advanced AI!

The Most Advanced AI in the World explains what AI, Machine Learning, and Deep Learning are!

Everybody is talking about *Neural Networks* (a.k.a. deep learning); here we explain how the underlying technology works!

Understanding the Magic of Neural Networks

One of the biggest problems of neural networks is that they are *black boxes*. One remedy is *Explainable AI (XAI)*, which renders those systems interpretable.

Explainable AI (XAI)… Explained! Or: How to whiten any Black Box with LIME

We already covered simple *linear regression*. Here we introduce another linear model, *logistic regression*, and apply it to a fascinating (and a little frightening) real-life example.

Learning Data Science: The Supermarket knows you are pregnant before your Dad does

This post provides the big picture on the connections between *linear regression*, *logistic regression*, and *neural networks*.

Logistic Regression as the Smallest Possible Neural Network
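The connection can be sketched in a few lines of base R. This is an illustration under assumptions, not the code from the post: it uses the built-in `mtcars` data, and `sigmoid` and `neuron_output` are names made up for this example. A logistic regression is fitted with `glm`, and its predictions are then reproduced by a single "neuron", i.e. a weighted sum of the inputs plus a bias, passed through a sigmoid activation function.

```r
# Logistic regression on built-in data: transmission type (am) explained by weight (wt)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# The "neuron": weighted sum plus bias, passed through a sigmoid activation
sigmoid <- function(z) 1 / (1 + exp(-z))
w <- coef(fit)
neuron_output <- sigmoid(w["(Intercept)"] + w["wt"] * mtcars$wt)

# Predicted probabilities straight from the fitted model
glm_output <- predict(fit, type = "response")

# Both routes give the same probabilities
same <- isTRUE(all.equal(unname(glm_output), unname(neuron_output)))
print(same)
```

The fitted coefficients play the role of the weights and the intercept that of the bias. A neural network stacks many such units into layers; a logistic regression is the special case with exactly one.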

Another fascinating real-life application of yet another linear model, the *LASSO*, and how it got Trump elected (kind of)!

Cambridge Analytica: Microtargeting or How to catch voters with the LASSO

**Bonus Post**

Now that we are already talking about some controversial uses of AI, let us dive a little deeper into what AI could mean for us and for society as a whole.

Thomas Ramge: Postdigital (Book Excerpt)

Many data science techniques are based on some form of *distance measure*. Here we learn about the *k-nearest neighbours (KNN)* algorithm.

Teach R to read handwritten Digits with just 4 Lines of Code
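The idea behind KNN is simple enough to write down by hand. The following is a minimal sketch in base R, assuming Euclidean distance and a plain majority vote. It uses the built-in `iris` data instead of handwritten digits, and `knn_predict` is a hypothetical helper written for this illustration, not a function from the post or from any package.

```r
# Classify one new observation by majority vote among its k nearest neighbours
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  # Euclidean distance from the new observation to every training observation
  differences <- sweep(as.matrix(train_x), 2, unlist(new_x))
  distances <- sqrt(rowSums(differences^2))
  # Indices of the k closest training observations
  nearest <- order(distances)[1:k]
  # Majority vote among their class labels
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Hold out the first flower and predict its species from all the others
train_x <- iris[-1, 1:4]
train_y <- iris$Species[-1]
prediction <- knn_predict(train_x, train_y, iris[1, 1:4], k = 5)
print(prediction)
```

There is no training phase at all here: the "model" is the data itself, and all the work happens at prediction time.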

Another distance-based algorithm, this time from the area of *unsupervised learning*, is the *k-means clustering algorithm*.

Learning Data Science: Understanding and Using k-means Clustering
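As a quick taste, here is a minimal sketch using the `kmeans` function from base R on the built-in `iris` data. The choice of data set, the number of clusters, and the `nstart` setting are assumptions made for this illustration and are not taken from the post.

```r
# k-means starts from random centres, so fix the seed for reproducibility
set.seed(123)

# Cluster the four numeric measurements into three groups, without using the species labels
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare the clusters found with the species the algorithm never saw
comparison <- table(cluster = clusters$cluster, species = iris$Species)
print(comparison)
```

The cluster numbers themselves are arbitrary. What matters is how well the groups found from the measurements alone line up with the known species.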

Besides supervised and unsupervised learning, there is a third category, which is getting more and more popular (especially in combination with neural networks): *Reinforcement Learning*.

Reinforcement Learning: Life is a Maze

A glimpse into another important topic: *Sentiment Analysis*, which is an important application area of *Natural Language Processing (NLP)*.

Learning Data Science: Sentiment Analysis with Naive Bayes

## (More) Real-World Applications

An important real-world application in the area of *Customer Relationship Management (CRM)*: how to retain your customers!

Data Science on Rails: Analyzing Customer Churn

Another important real-world application in the area of Finance: how to get your money back!

Will I get my Money back? Credit Scoring with OneR

Quite a controversial application in the area of law enforcement: how to release only those offenders who have become good citizens again.

Recidivism: Identifying the Most Important Predictors for Re-offending with OneR

Medical research has become an increasingly important application area for machine learning (I also provide a link to an open-source paper I published on Covid-19)!

OneR in Medical Research: Finding Leading Symptoms, Main Predictors and Cut-Off Points

Here we use clustering to find a pattern in the Greek Gospels that normally only biblical scholars know about – fascinating stuff!

With this application of sentiment analysis you can automatically extract plots from novels!

Extracting Basic Plots from Novels: Dracula is a Man in a Hole

**Bonus Post**

After all that nerdy stuff, some practical advice on how to date – which proves the point that data scientist is the sexiest job of the 21st century!

Cupid’s Arrow: How to Boost your Chances at Speed Dating!

If you work through this learning path you will have a good basic understanding of important foundations, techniques, and real-life applications of machine learning.

Yet this is only a selection of topics covered on this blog so far. Feel free to discover other gems around here… and please share your feedback and suggestions in the comment section below.

In the future, I plan to publish another post with a learning path for *statistics*, which is also an important foundation for becoming a data scientist but would have overloaded this post, so stay tuned!

**UPDATE September 20, 2021**

I created a video for this post (in German):
