Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.
I use many of them in my university classes. In this post, I will give you the first part of a learning path through the knowledge that has accumulated on this blog over the years, to help you become a well-rounded data scientist, so read on!
Foundations
We start by explaining why R is the best choice for doing data science. This is still one of my most popular posts!
Why R for Data Science – and not Python?
This post will get you up to speed by introducing you to important concepts of R.
Learning R: The Ultimate Introduction (incl. Machine Learning!)
Data science is all about building good models, so we start by building a very simple linear model here and learn about the important concept of overfitting.
Learning Data Science: Modelling Basics
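To get a feeling for overfitting before reading the post, here is a minimal sketch in base R. The toy data, the 50/50 split, and the degree-10 polynomial are my own assumptions for illustration and are not taken from the post itself.

```r
# Hypothetical toy data: a noisy linear relationship (not from the post)
set.seed(123)
x <- seq(0, 10, length.out = 40)
y <- 2 + 0.5 * x + rnorm(length(x), sd = 1)
dat <- data.frame(x = x, y = y)

# Split into a training and a test set
train_idx <- sample(nrow(dat), 20)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# A simple linear model and a deliberately over-flexible polynomial
fit_simple  <- lm(y ~ x, data = train)
fit_complex <- lm(y ~ poly(x, 10), data = train)

# Root mean squared error of a model on a given data set
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

errors <- data.frame(
  model = c("linear", "polynomial (degree 10)"),
  train = c(rmse(fit_simple, train), rmse(fit_complex, train)),
  test  = c(rmse(fit_simple, test),  rmse(fit_complex, test))
)
print(errors)
```

The flexible model can never fit the training data worse than the linear one, because it contains the linear model as a special case. Whether it does worse on the test data depends on the sample, but a large gap between training and test error is the typical symptom of overfitting.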
Correlation is an important concept throughout data science, so you should develop some intuition about it.
Causation doesn’t imply Correlation either
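One standard way to see why the title holds is a relationship that is fully deterministic but not linear. The following sketch uses a made-up example (it is not necessarily the one used in the post): y is completely determined by x, yet the Pearson correlation is zero.

```r
# Hypothetical example: y depends on x exactly, but not linearly
x <- -10:10
y <- x^2

# Pearson correlation only measures linear association
r <- cor(x, y)
r
```

Because the values of x are symmetric around zero, the positive and negative contributions cancel out and the correlation coefficient comes out as (numerically) zero.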
This post shows that even with simple correlations powerful models can be built.
Machine Learning
The OneR package (which I developed) implements a very simple classification algorithm, which can serve as a great use case for machine learning.
One Rule (OneR) Machine Learning Classification in under One Minute
A few more examples of how to use the OneR classification algorithm to gain valuable insights from your data.
OneR – Fascinating Insights through Simple Rules
A consistent example where we apply three classification algorithms (OneR, decision trees, random forests) on the same data set to understand the Accuracy-Interpretability Trade-Off.
Learning Data Science: Predicting Income Brackets
We are now in a good position to really understand what Artificial Intelligence is all about!
Diving even deeper into the concepts of Artificial Intelligence, Machine Learning and Deep Learning – written by an advanced AI!
The Most Advanced AI in the World explains what AI, Machine Learning, and Deep Learning are!
Everybody is talking about neural networks (a.k.a. deep learning). Here we explain how the underlying technology works!
Understanding the Magic of Neural Networks
One of the biggest problems of neural networks is that they are black boxes. One remedy is Explainable AI (XAI), which renders those systems interpretable.
Explainable AI (XAI)… Explained! Or: How to whiten any Black Box with LIME
We already covered simple linear regression. Here we introduce another linear model, logistic regression, and apply it to a fascinating (and a little frightening) real-life example.
Learning Data Science: The Supermarket knows you are pregnant before your Dad does
This post provides the big picture on the connections between linear regression, logistic regression, and neural networks.
Logistic Regression as the Smallest Possible Neural Network
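The connection can be sketched in a few lines of base R. This is my own minimal illustration, assuming the built-in `mtcars` data and a single predictor; it is not the code from the post. A fitted logistic regression yields exactly the same output as a single "neuron" that takes a weighted sum of its inputs, adds a bias, and passes the result through a sigmoid activation function.

```r
# Logistic regression on built-in data: am (0 = automatic, 1 = manual) vs. weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# The "neuron": weighted input plus bias, passed through a sigmoid
sigmoid <- function(z) 1 / (1 + exp(-z))
b <- coef(fit)[["(Intercept)"]]   # bias
w <- coef(fit)[["wt"]]            # weight
neuron_output <- sigmoid(b + w * mtcars$wt)

# Compare with the predicted probabilities of the model
model_output <- unname(predict(fit, type = "response"))
same <- isTRUE(all.equal(model_output, neuron_output))
same
```

The only difference between the two views is how the weights are found: `glm` estimates them by maximum likelihood, while neural networks are usually trained by some form of gradient descent.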
Another fascinating real-life application of yet another linear model, the LASSO, and how it got Trump elected (kind of)!
Cambridge Analytica: Microtargeting or How to catch voters with the LASSO
Bonus Post
Now that we are already talking about some controversial uses of AI, let us dive a little bit deeper into what AI could mean for us and for society as a whole.
Thomas Ramge: Postdigital (Book Excerpt)
Many data science techniques are based on some form of distance measure. Here we learn about the k-nearest neighbours (KNN) algorithm.
Teach R to read handwritten Digits with just 4 Lines of Code
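The idea behind KNN fits into a few lines of base R. The following is a minimal sketch under my own assumptions: it uses the built-in `iris` data instead of handwritten digits, and `knn_predict` is a hypothetical helper written for illustration, not the code from the post.

```r
# Minimal k-nearest neighbours classifier (illustrative helper, not from the post)
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  train_x <- as.matrix(train_x)
  new_x   <- as.matrix(new_x)
  apply(new_x, 1, function(p) {
    # Euclidean distance from the new point to every training point
    d <- sqrt(rowSums(sweep(train_x, 2, p)^2))
    nearest <- order(d)[seq_len(k)]
    # Majority vote among the k nearest neighbours
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  })
}

# Hold out 30 flowers for testing
set.seed(1)
test_idx <- sample(nrow(iris), 30)
pred <- knn_predict(iris[-test_idx, 1:4], iris$Species[-test_idx],
                    iris[test_idx, 1:4], k = 5)
accuracy <- mean(pred == iris$Species[test_idx])
accuracy
```

There is no training step in the usual sense: the "model" is simply the stored training data, and all the work happens at prediction time.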
Another distance-based algorithm, this time from the area of unsupervised learning, is the k-means clustering algorithm.
Learning Data Science: Understanding and Using k-means Clustering
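As a quick taste, base R already ships with a `kmeans` function. The sketch below is my own illustration on the built-in `iris` data (the choice of data set, scaling, and three clusters are assumptions, not taken from the post).

```r
# Cluster the four numeric iris measurements without using the species labels
set.seed(42)
features <- scale(iris[, 1:4])                   # put all variables on the same scale
km <- kmeans(features, centers = 3, nstart = 20) # 20 random starts, keep the best

# Compare the clusters found with the known species (only for inspection)
comparison <- table(cluster = km$cluster, species = iris$Species)
print(comparison)
```

Scaling matters here because k-means works with Euclidean distances, so variables measured on larger scales would otherwise dominate the result.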
Besides supervised and unsupervised learning, there is a third category, which is getting more and more popular (especially in combination with neural networks): Reinforcement Learning.
Reinforcement Learning: Life is a Maze
A glimpse into another important topic: Sentiment Analysis, which is a major application area of Natural Language Processing (NLP).
Learning Data Science: Sentiment Analysis with Naive Bayes
(More) Real-World Applications
An important real-world application in the area of Customer Relationship Management (CRM): how to retain your customers!
Data Science on Rails: Analyzing Customer Churn
Another important real-world application in the area of Finance: how to get your money back!
Will I get my Money back? Credit Scoring with OneR
Quite a controversial application in the area of law enforcement: how to release only those who have become good citizens again.
Recidivism: Identifying the Most Important Predictors for Re-offending with OneR
Medical research has become an increasingly important application area for machine learning (I also provide a link to an open-source paper I published on Covid-19)!
OneR in Medical Research: Finding Leading Symptoms, Main Predictors and Cut-Off Points
Here we use clustering to find a pattern in the Greek Gospels that normally only biblical scholars know about. Fascinating stuff!
With this application of sentiment analysis you can automatically extract plots from novels!
Extracting Basic Plots from Novels: Dracula is a Man in a Hole
Bonus Post
After all that nerdy stuff, some practical advice on how to date – which proves the point that data scientist is the sexiest job of the 21st century!
Cupid’s Arrow: How to Boost your Chances at Speed Dating!
If you work through this learning path, you will have a good basic understanding of important foundations, techniques, and real-life applications of machine learning.
Yet this is only a selection of topics covered on this blog so far. Feel free to discover other gems around here… and please share your feedback and suggestions in the comment section below.
In the future, I plan to publish another post with a learning path for statistics, which is also an important foundation for becoming a data scientist but would have overloaded this post, so stay tuned!
UPDATE September 20, 2021
I created a video for this post (in German):