Correlation and its associated challenges never lose their fascination. Most people know that correlation doesn’t imply causation. Fewer know that the converse is also true (see: Causation doesn’t imply Correlation either). Some know that correlation can arise purely by chance (so-called spurious correlation).
If you want to learn about a paradoxical effect that hardly anyone is aware of, in which two uncorrelated random variables become correlated merely through the way they are sampled, read on!
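To see the effect in miniature, here is a minimal Python sketch (not the post's own code). Two traits, hypothetically called `looks` and `smarts`, are drawn independently, so they are uncorrelated in the population. A hypothetical selection rule then keeps only people whose combined score exceeds a threshold. Selecting on a common effect of both variables (the collider) is enough to induce a clearly negative correlation in the sample. The threshold, sample size and variable names are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_collider(n=100_000, threshold=1.0, seed=42):
    rng = random.Random(seed)
    # Two independent traits: by construction uncorrelated in the population.
    looks = [rng.gauss(0, 1) for _ in range(n)]
    smarts = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: only people whose combined score exceeds
    # the threshold end up in the sample (conditioning on the collider).
    selected = [(l, s) for l, s in zip(looks, smarts) if l + s > threshold]
    sel_looks = [l for l, _ in selected]
    sel_smarts = [s for _, s in selected]
    return pearson(looks, smarts), pearson(sel_looks, sel_smarts)


r_all, r_selected = simulate_collider()
print(f"population r = {r_all:+.3f}")       # close to zero
print(f"selected   r = {r_selected:+.3f}")  # clearly negative
```

In this setup, raising the threshold (selecting more strictly) makes the induced negative correlation stronger, even though nothing about the two traits themselves has changed.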
Continue reading “Collider Bias, or Are Hot Babes Dim and Eggheads Ugly?”
It is such a beautiful day outside: lots of sunshine, spring at last… and we are now basically all grounded, sitting here and waiting to get sick.
So, why not a post from the new epicentre of the global COVID-19 pandemic, Central Europe, more exactly where I live: Germany?! If you want to find out what the numbers tell us about how things might develop here, read on!
Continue reading “COVID-19: The Case of Germany”
Valentine’s Day is around the corner and love is in the air… but, shock horror, nearly every second marriage ends in divorce! Unfortunately, I can tell you first-hand that this is an experience you’d rather not have. In this post, we see how data science, in the form of the OneR package and an interesting new data set, might help you avoid that tragedy… so read on!
Continue reading “The One Question you should ask your Partner before Marrying!”
A new invisible enemy, only 30 kb (kilobases) in size, has emerged and is on a killing spree around the world: 2019-nCoV, the Novel Coronavirus!
It has already killed more people than the SARS pandemic and its outbreak has been declared a Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO).
If you want to learn how epidemiologists estimate how contagious a new virus is, and how to do it in R, read on!
Continue reading “Epidemiology: How contagious is Novel Coronavirus (2019-nCoV)?”
We have all watched the catastrophic fires in Australia with great horror. For many years, scientists have been studying simulations to better understand the underlying dynamics. They tell us that “what Australia needs is more fires, but of the right kind”. What do they mean by that?
One way to simulate fire is based on Multi-Agent Systems (MAS), also called Agent-Based Modelling (ABM). An excellent piece of free software for this (in fact, the de facto standard) is NetLogo. Even better, NetLogo can be fully controlled from R… and we will use this feature to learn some crucial lessons!
If you want to understand more about the dynamics of fire in particular, and about some fascinating properties of dynamical systems in general, by controlling NetLogo from R, read on!
Continue reading “Does Australia need More Fires (but the Right Kind)? A Multi-Agent Simulation”
Modern movies use a lot of mathematics for their computer animations and CGI effects; one such technique is known as fractals.
In this post, we will use this technique to create islands with coastlines that look extraordinarily realistic. If you want to do that yourself, read on!
Continue reading “Create realistic-looking Islands with R”
The two most disruptive political events of the last few years are undoubtedly the Brexit referendum to leave the European Union and the election of Donald Trump. Both are commonly associated with the political consulting firm Cambridge Analytica and a technique known as Microtargeting.
If you want to understand the data science (LASSO regression) behind Microtargeting and the Cambridge Analytica/Facebook data scandal by building a toy example in R, read on!
Continue reading “Cambridge Analytica: Microtargeting or How to catch voters with the LASSO”
Like most people, you have probably used a search engine such as Google recently. But have you ever wondered how it manages to give you the most fitting results? How does it order the results so that the best ones are on top? Read on to find out!
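The random-surfer idea can be sketched in a few lines of Python. This is a minimal sketch of the standard PageRank model, not necessarily how the post implements it. With probability `damping` (0.85, the value commonly used in the literature and assumed here) the surfer follows a random outgoing link; otherwise they jump to a page chosen uniformly at random. Repeating this update (power iteration) converges to the stationary distribution of the surfer, which is the principal eigenvector of the so-called Google matrix. The four-page web is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the random-surfer model.

    `links` maps every page (all pages must appear as keys) to the list
    of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Random-jump part: every page gets an equal baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Follow-a-link part: split p's rank among its outlinks.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical toy web of four pages
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page C, which receives the most inbound links, ends up on top, while D, which nobody links to, only gets the baseline share (1 − 0.85)/4 = 0.0375 from random jumps. The ranks always sum to 1, because each step redistributes probability mass without creating or destroying any.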
Continue reading “Google’s Eigenvector, or How a Random Surfer Finds the Most Relevant Webpages”
One of the problems of navigating an autonomous car through a city is extracting robust signals despite all the noise present in the different sensors. Simply taking something like the arithmetic mean of all the data points could end in catastrophe: if part of a wall looks similar to the street and the algorithm averages the two into a single trajectory, the car could leave the road and possibly crash into pedestrians. So we need a robust algorithm to get rid of the noise. The branch of statistics that deals with such problems is called robust statistics, and its methods are known as robust estimation.
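A tiny illustration of why the arithmetic mean is fragile. The numbers below are made up for illustration: hypothetical lateral positions of the road centre (in metres) as estimated from different sensor readings, three of which come from a wall that "looks like" the street. The median, one of the simplest robust estimators, is used here as an example; the post itself may use other robust methods.

```python
from statistics import mean, median

# Hypothetical lateral positions (metres) of the road centre from different
# sensor readings: seven agree, three (4.8, 5.1, 5.0) come from the wall.
readings = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, 4.8, 5.1, 5.0]

print(f"mean   = {mean(readings):.2f} m")    # dragged towards the wall
print(f"median = {median(readings):.2f} m")  # stays on the road
```

This prints a mean of 1.52 m, well off the road, versus a median of 0.15 m. Three bad readings out of ten are enough to pull the mean far away, while the median is unaffected as long as fewer than half of the readings are outliers.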
Continue reading “Separating the Signal from the Noise: Robust Statistics for Pedestrians”