# Learning Path for “Data Science with R” – Part I

Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.

I use many of those posts in my university classes. In this post, I give you the first part of a learning path through the material that has accumulated on this blog over the years, to help you become a well-rounded data scientist. So read on!

## Foundations

We start by explaining why R is the best choice for doing data science. This is still one of my most popular posts!

This post will get you up to speed by introducing you to important concepts of R.

Data science is all about building good models, so we start by building a very simple linear model here and learn about the important concept of overfitting.
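To get a first feeling for overfitting, here is a minimal sketch in base R. It is not taken from the linked post: the simulated data, the 20/20 split, and the helper `rmse()` are assumptions made purely for illustration. We fit a straight line and a flexible degree-10 polynomial to the same training data and compare their errors on data the models have not seen.

```r
# Simulated data (an assumption for illustration): a straight line plus noise
set.seed(42)
n <- 40
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

# Split into a training half and a test half
train <- dat[1:20, ]
test  <- dat[21:40, ]

# A simple straight line versus a flexible degree-10 polynomial
simple_fit  <- lm(y ~ x, data = train)
complex_fit <- lm(y ~ poly(x, 10), data = train)

# Hypothetical helper: root mean squared error of a model on a data set
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

errors <- rbind(
  simple  = c(train = rmse(simple_fit, train),  test = rmse(simple_fit, test)),
  complex = c(train = rmse(complex_fit, train), test = rmse(complex_fit, test))
)
print(round(errors, 2))
```

The flexible model can never have a higher training error than the straight line, because it contains the straight line as a special case. A clearly worse error on the test half is the typical sign of overfitting.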

In all of data science the concept of correlation is an important one, so you should develop some intuition about it.
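A small sketch with simulated data (my own assumption, not an example from the linked post) shows how the base R function `cor()` behaves for a strong, an absent, and a perfect negative linear relationship:

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(100)

y_strong <- x + rnorm(100, sd = 0.3)  # closely follows x
y_none   <- rnorm(100)                # generated independently of x
y_exact  <- -2 * x + 5                # exact linear function with negative slope

# Pearson correlation coefficients, always between -1 and 1
r <- c(
  strong = cor(x, y_strong),
  none   = cor(x, y_none),
  exact  = cor(x, y_exact)
)
print(round(r, 2))
```

The last value is -1 up to rounding, because an exact linear relationship with a negative slope is a perfect negative correlation, no matter how steep the slope is.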

This post shows that even with simple correlations powerful models can be built.

## Machine Learning

The OneR package (which I developed) implements a very simple classification algorithm, which makes it a great first use case for machine learning.

A few more examples of how to use the OneR classification algorithm to gain valuable insights from your data.

A consistent example in which we apply three classification algorithms (OneR, decision trees, random forests) to the same data set to understand the Accuracy-Interpretability Trade-Off.

We are now in a good position to really understand what Artificial Intelligence is all about!

Diving even deeper into the concepts of Artificial Intelligence, Machine Learning and Deep Learning – written by an advanced AI!

Everybody is talking about Neural Networks (a.k.a. deep learning). Here we explain how the underlying technology works!

One of the biggest problems of neural networks is that they are black boxes. One remedy is Explainable AI (XAI), which renders those systems interpretable.

We already covered simple linear regression. Here we introduce another linear model, logistic regression, and apply it to a fascinating (and a little frightening) real-life example.
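As a minimal sketch of what fitting a logistic regression looks like in base R, the following uses the built-in `mtcars` data set. This is my own choice for illustration and not the real-life example from the linked post. We model the probability that a car has a manual transmission (`am = 1`) as a function of its weight (`wt`, in 1000 lbs).

```r
# Logistic regression with base R: family = binomial turns glm() into a logit model
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Predicted probabilities of a manual transmission for three hypothetical weights
new_cars <- data.frame(wt = c(2, 3, 4))
probs <- predict(fit, newdata = new_cars, type = "response")
print(round(probs, 3))
```

With `type = "response"` the predictions are probabilities between 0 and 1 rather than values on the log-odds scale, which is what distinguishes the output from that of a plain linear regression.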

This post provides the big picture on the connections between linear regression, logistic regression, and neural networks.

Another fascinating real-life application of yet another linear model, the LASSO, and how it got Trump elected (kind of)!

Bonus Post
Now that we are already talking about some controversial uses of AI, let us dive a little bit deeper into what AI could mean for us and for society as a whole.

Many data science techniques are based on some form of distance measure. Here we learn about the k-nearest neighbours (KNN) algorithm.
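The idea behind KNN fits into a few lines of base R. The following is a minimal sketch, not the implementation from the linked post: the function `knn_predict()` and the measurements of the new flower are hypothetical, and Euclidean distance is assumed as the distance measure.

```r
# Hypothetical helper: classify one new observation by majority vote
# among its k nearest training observations (Euclidean distance)
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Subtract the new point from every training row, then compute row-wise distances
  diffs <- sweep(train_x, 2, new_x)
  d <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training observations
  nearest <- order(d)[seq_len(k)]
  # Majority vote among their labels (ties go to the first level)
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

x <- as.matrix(iris[, 1:4])           # four numeric measurements per flower
y <- iris$Species                     # known class labels
new_flower <- c(5.0, 3.4, 1.5, 0.2)   # hypothetical measurements

prediction <- knn_predict(x, y, new_flower, k = 5)
print(prediction)
```

Because every prediction needs the distances to all training observations, KNN has no real training phase. In practice the features should usually be scaled first so that no single variable dominates the distance.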

Another algorithm based on a distance measure, this time in the area of unsupervised learning, is the k-means clustering algorithm.
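Base R ships with `kmeans()`, so a first experiment takes only a few lines. This is a sketch under my own assumptions (the built-in `iris` data, three clusters, scaled features), not the example from the linked post. The species labels are not used for clustering, only for comparing the result afterwards.

```r
set.seed(123)  # k-means starts from random centers, so fix the seed

# Scale the four measurements so that each contributes equally to the distance
features <- scale(iris[, 1:4])

# nstart = 25 runs the algorithm from 25 random starts and keeps the best result
fit <- kmeans(features, centers = 3, nstart = 25)

# Compare the clusters found with the species labels that were held back
print(table(cluster = fit$cluster, species = iris$Species))
```

The cluster numbers themselves are arbitrary. What matters is how well the groups found line up with the species, which the cross-table makes visible.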

Besides supervised and unsupervised learning, there is a third category that is getting more and more popular (especially in combination with neural networks): Reinforcement Learning.

A glimpse into another important topic: Sentiment Analysis, which is an important application area of Natural Language Processing (NLP).

## (More) Real-World Applications

An important real-world application in the area of Customer Relationship Management (CRM): how to retain your customers!

Another important real-world application in the area of Finance: how to get your money back!

Quite a controversial application in the area of law enforcement: how to release only those offenders who have become good citizens again.

Medical research has become an increasingly important application area for machine learning (I also provide a link to an open-source paper I published on Covid-19)!

Here we use clustering to find a pattern in the Greek Gospels that normally only biblical scholars know about – fascinating stuff!

With this application of sentiment analysis you can automatically extract plots from novels!

Bonus Post
After all that nerdy stuff, some practical advice on how to date – which proves the point that data scientist is the sexiest job of the 21st century!

If you work through this learning path you will have a good basic understanding of important foundations, techniques, and real-life applications of machine learning.

Yet this is only a selection of topics covered on this blog so far. Feel free to discover other gems around here… and please share your feedback and suggestions in the comment section below.

In the future, I plan to publish another post with a learning path for statistics, which is also an important foundation for becoming a data scientist but would have overloaded this post, so stay tuned!

UPDATE September 20, 2021
I created a video for this post (in German).
