The Pólya Urn Model: A simple Simulation of “The Rich get Richer”


What is the “opposite” of sampling without replacement? In a classical urn model, sampling without replacement means that you don’t put the ball you have drawn back into the urn, so the probability of drawing that colour again becomes smaller. How about the opposite, i.e. that the probability becomes bigger because each drawn ball is returned together with an additional ball of the same colour? Then you have a so-called Pólya urn model!

Many real-world processes have this self-reinforcing property, e.g. leading to the distribution of wealth or the number of followers on social media. If you want to learn how to simulate such a process with R and encounter some surprising results, read on!
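To give a first feeling for the self-reinforcing mechanism, here is a minimal sketch. The post itself works in R; this Python version is only an illustration, and the function name `polya_urn`, the starting urn of one red and one blue ball, and the number of draws are assumptions made for the example.

```python
import random

def polya_urn(draws, red=1, blue=1, seed=None):
    """Simulate a Pólya urn: every drawn ball goes back with one extra ball of its colour."""
    rng = random.Random(seed)
    for _ in range(draws):
        # The chance of drawing red equals the current share of red balls.
        if rng.random() < red / (red + blue):
            red += 1   # red drawn: return it plus one more red ball
        else:
            blue += 1  # blue drawn: return it plus one more blue ball
    return red, blue

# Repeat the experiment a few times and compare the final share of red balls.
for seed in range(5):
    red, blue = polya_urn(1000, seed=seed)
    print(f"run {seed}: final share of red = {red / (red + blue):.2f}")
```

Running the loop several times shows how strongly the final shares can differ between runs, even though every run starts from the same balanced urn.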

Understanding Public-Key Cryptography by Mixing Colours!


Public-key cryptography is one of the foundations of our modern digital life. Normally, it is quite hard to understand, but with our literally colourful explanation it is a walk in the park. At the end, we also give the nerd version, so read on!
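The colour-mixing picture is commonly used to explain the Diffie-Hellman key exchange. Assuming that this is the scheme meant here, the following Python sketch shows the idea with toy numbers. The prime `p`, the base `g` and both secrets are illustrative assumptions and far too small for real use.

```python
# Toy Diffie-Hellman key exchange (illustrative numbers only, not secure).
p = 23  # public prime modulus: the "common paint" everybody knows
g = 5   # public base

alice_secret = 6   # Alice's private colour
bob_secret = 15    # Bob's private colour

# Each side "mixes" the public base with its own secret and publishes the result.
alice_public = pow(g, alice_secret, p)
bob_public = pow(g, bob_secret, p)

# Each side mixes the other's public value with its own secret.
alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)

print(alice_shared == bob_shared)  # both sides arrive at the same shared key
```

An eavesdropper sees only `p`, `g` and the two public values. "Unmixing" them means solving a discrete logarithm, which is believed to be hard for properly chosen large parameters.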

The Most Advanced AI in the World explains what AI, Machine Learning, and Deep Learning are!


This is our 101st blog post here on Learning Machines and we have prepared something very special for you!

Oftentimes the different concepts of data science, namely artificial intelligence (AI), machine learning (ML), and deep learning (DL) are confused… so we asked the most advanced AI in the world, OpenAI GPT-3, to write a guest post for us to provide some clarification on their definitions and how they are related.

We are most delighted to present this very impressive (and only slightly redacted) essay to you – enjoy!

Recidivism: Identifying the Most Important Predictors for Re-offending with OneR


In 2018, the renowned scientific journal Science broke a story that researchers had re-engineered the commercial criminal risk assessment software COMPAS with a simple logistic regression (Science: The accuracy, fairness, and limits of predicting recidivism).

According to this article, COMPAS uses 137 features, whereas the authors used just two. In this post, I will up the ante by showing you how to achieve similar results using just one simple rule based on only one feature, which is found automatically in no time by the OneR package, so read on!

Pseudo-Randomness: Creating Fake Noise


In data science, we try to find, sometimes well-hidden, patterns (= signal) in often seemingly random data (= noise). Pseudo-Random Number Generators (PRNGs) try to do the opposite: hiding a deterministic data-generating process (= signal) by making it look like randomness (= noise). If you want to understand some basics behind the scenes of this fascinating topic, read on!
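As a small taste of how a deterministic rule can look like noise, here is a sketch of a linear congruential generator, one of the simplest PRNG designs. Whether the post uses this particular generator is an assumption. The constants are the widely quoted "Numerical Recipes" parameters and the function name `lcg` is made up for the example.

```python
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: state -> (a * state + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m  # scale the integer state to a float in [0, 1)

# The same seed always reproduces exactly the same "random" sequence.
first_run = list(islice(lcg(42), 5))
second_run = list(islice(lcg(42), 5))
print(first_run)
print(first_run == second_run)
```

The output looks irregular, yet it is fully determined by the seed, which is exactly the "fake noise" the title refers to.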

ELIZA Chatbot in R: Build Yourself a Shrink


More and more companies use chatbots for engaging with their customers. Often the underlying technology is not too sophisticated, yet many people are stunned at how human-like those bots can appear. The earliest example of this was a natural language processing (NLP) computer program called Eliza, created in 1966 at the MIT Artificial Intelligence Laboratory by Professor Joseph Weizenbaum.

Eliza was supposed to simulate a psychotherapist and was mainly created as a method to show the superficiality of communication between man and machine. Weizenbaum was surprised by the number of individuals who attributed human-like feelings to the computer program, including his own secretary!

If you want to build a simple Eliza-like chatbot yourself with R, read on!

How to Catch a Thief: Unmasking Madoff’s Ponzi Scheme with Benford’s Law

One of my starting points into quantitative finance was Bernie Madoff’s fund. Back then, because Bernie was in desperate need of money to keep his Ponzi scheme running, there existed several so-called feeder funds.

One of them happened to approach me to offer me a once-in-a-lifetime investment opportunity. Or so it seemed. Now, there is this old saying that when something seems too good to be true, it probably is. If you want to learn what Benford’s law is and how to apply it to uncover fraud, read on!
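For the impatient: Benford’s law says that in many naturally occurring data sets the leading digit d appears with probability log10(1 + 1/d). The Python sketch below compares that expectation with observed first-digit shares. The helper names are made up for the example, and the powers of two are only a stand-in data set, not Madoff’s returns.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit among the non-zero values."""
    # Scientific notation puts the leading digit first, e.g. '1.2345e+02'.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Stand-in data: powers of two are a classic sequence that follows the law closely.
expected = benford_expected()
observed = first_digit_shares([2**n for n in range(1, 101)])
for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

Large deviations between the two columns do not prove fraud, but they are a hint that the numbers deserve a closer look.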

“You Are Here”: Understanding How GPS Works


Last week, I showed you how to find the fastest path from A to B: Finding the Shortest Path with Dijkstra’s Algorithm. To make use of that, we need a method to determine our position at any point in time.

For that matter, many devices use the so-called Global Positioning System (GPS). If you want to understand how it works and do some simple calculations in R, read on!

Finding the Shortest Path with Dijkstra’s Algorithm


I have to make a confession: when it comes to my sense of orientation I am a total failure… sometimes it feels like GPS and Google Maps were actually invented for me!

Well, nowadays everybody uses those practical little helpers. But how do they actually manage to find the shortest path from A to B?

If you want to understand the father of all routing algorithms, Dijkstra’s algorithm, and want to know how to program it in R, read on!
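As a quick preview of the idea, here is a compact sketch of Dijkstra’s algorithm using a priority queue. The post itself programs it in R; this Python version and the small example graph are assumptions made purely for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a dict {neighbour: non-negative edge weight}.
    """
    dist = {source: 0}
    queue = [(0, source)]  # entries are (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node is already known
        for neighbour, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(queue, (new_d, neighbour))
    return dist

# Hypothetical road network with travel times as edge weights.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}
print(dijkstra(graph, "A"))
```

The algorithm always expands the closest node not yet settled, which is why it requires non-negative edge weights.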

Local Differential Privacy: Getting Honest Answers on Embarrassing Questions

Do you cheat on your partner? Do you take drugs? Are you gay? Are you an atheist? Did you have an abortion? Will you vote for the right-wing candidate? Not all people feel comfortable answering those kinds of questions honestly in every situation.

So, is there a method to find the respective proportion of people without putting them on the spot? Actually, there is! If you want to learn about randomized response (and how to create flowcharts in R along the way), read on!
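To see why this can work at all, here is a sketch of one classic randomized response design. The exact protocol used in the post may differ; the coin-flip scheme, the function names and the simulated 30% share are assumptions for illustration, written in Python rather than R.

```python
import random

def randomized_answer(truth, rng):
    """First coin flip: heads -> answer truthfully; tails -> let a second flip answer."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5  # random "yes"/"no", independent of the truth

def estimate_proportion(answers):
    """Invert P(yes) = 0.5 * p + 0.25 to recover the true share p."""
    yes_share = sum(answers) / len(answers)
    return 2 * (yes_share - 0.25)

# Simulated survey: 30% of respondents would truthfully answer "yes".
rng = random.Random(1)
truths = [rng.random() < 0.3 for _ in range(100_000)]
answers = [randomized_answer(t, rng) for t in truths]
print(f"estimated share: {estimate_proportion(answers):.3f}")
```

No individual answer reveals anything definite about the respondent, because a "yes" may simply come from the second coin flip, yet the aggregate share can still be estimated.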