Machine Learning – Learning Machines

Learning Data Science: Why a High R^2 Can Be Misleading

A high $R^2$ can make a regression model look impressively accurate — but this number can be deceptive. If you want to understand why a high $R^2$ is not always a sign of a good model, read on!

The Magic of In-Context Learning (ICL): When Your Model Already Knows Your Data

Have you ever looked at a freshly plotted scatter plot and immediately thought, “Ah, this is clearly a logarithmic curve with some heteroskedastic noise,” without running a single line of modeling code? How do you do that? You don’t perform gradient descent in your head. You use your intuition!

Building Your Own Mini-ChatGPT with R: From Markov Chains to Transformers!

Remember our journey so far? We started with simple Markov chains showing how statistical word prediction works, then dove into the core concepts of word embeddings, self-attention, and next word prediction. Now, it’s time for the grand finale: if you want to build your own working transformer language model in R, read on!

Artificial Intelligence in Academic Theses: An Opportunity, Not a Threat

In an era where artificial intelligence (AI) is increasingly permeating various aspects of our lives, the academic world also faces the challenge of dealing with this rapid technological development. This is particularly true of final theses and term papers, which raises the question of how we, as educational institutions, should handle the use of foundation models like ChatGPT, Google Gemini, and other large language models (LLMs).

Attention! What lies at the Core of ChatGPT? (Also as a Video!)

Word embedding, self-attention, and next-word prediction lie at the core of LLMs like ChatGPT. If you are curious about how these techniques work and want to see a simple example in R, read on!

ChatGPT can Create Datasets, Program in R… and when it makes an Error it can Fix that too!

ChatGPT from OpenAI leaves me speechless over and over again. I have been in the AI industry for many decades now and it has been a long time since I last had this feeling of utter fascination mixed with disbelief mixed with anxiety.

This is only a quick post in the context of R programming which I wanted to share with you, so read on!

Learning Data Science: Predictive Maintenance with Decision Trees

Predictive maintenance is one of the big revolutions happening across all major industries right now. Instead of replacing parts on a fixed schedule, or only after they have failed, it uses machine learning methods to predict when a part is going to fail.

If you want to get an introduction to this fascinating developing area, read on!

Please Subscribe to My New (German) Data Science YouTube Channel!

I am in the middle of creating a new German YouTube channel that is centered around data science and R! I put a lot of effort into it to serve the interests of the community.

If you want to be a part of the process, watch interesting videos with data-based analyses, and look behind the scenes, please consider subscribing to the channel!

The number of subscriptions is also vital for the YouTube algorithm to recommend the videos to other viewers on the platform!

Learning Path for “Data Science with R” – Part I

Over the course of the last two and a half years, I have written over one hundred posts for my blog “Learning Machines” on the topics of data science, i.e. statistics, artificial intelligence, machine learning, and deep learning.

I use many of those posts in my university classes. In this post, I will give you the first part of a learning path through the knowledge that has accumulated on this blog over the years, to help you become a well-rounded data scientist, so read on!

Will I get my Money back? Credit Scoring with OneR

More and more decisions by banks on who gets a loan are being made by artificial intelligence. The terms being used are credit scoring and credit decisioning.

Banks base these decisions on models that predict whether a customer will pay back the loan or default, i.e. models that determine the customer's creditworthiness. If you want to learn how to build such a model in R yourself (with the latest R ≥ 4.1.0 syntax as a bonus), read on!