Clustering the Bible


During this time of year, there is obviously a lot of talk about the Bible. As most people know, the New Testament includes four different Gospels written by anonymous authors 40 to 70 years after Jesus’ supposed crucifixion.

Unfortunately, all of the originals are lost; we have retained only copies of copies of copies (and so on), which date from hundreds of years after the originals were written and exist in all kinds of different versions (renowned Biblical scholar Professor Bart Ehrman states that there are more differences among the manuscripts than there are words in the New Testament). Just as a fun fact: there are many more Gospels, but only those four were included in the “official” Bible.

Learning R: A Gentle Introduction to Higher-Order Functions


Have you ever thought about why the definition of a function in R is different from that in many other programming languages? The part that causes the most difficulty (especially for R beginners) is that you state the name of the function at the beginning and use the assignment operator – as if functions were like any other data type, like vectors, matrices or data frames…
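To make this concrete, here is a minimal sketch in R. The names `square`, `v`, `make_power` and `cube` are made up for illustration and are not taken from the post itself. It shows a function being assigned to a name just like a vector, and then being handed to (and returned from) another function.

```r
# A function is created with function(...) and bound to a name with <-,
# in the same way a vector is bound to a name.
square <- function(x) x^2
v <- c(1, 2, 3)

# Both names now refer to ordinary R objects.
class(square)
class(v)

# Because a function is an ordinary object, it can be passed to another
# function: sapply() calls square() on every element of v.
squares <- sapply(v, square)

# A function can also be the return value of another function.
# make_power() is a hypothetical helper used only for this illustration.
make_power <- function(p) function(x) x^p
cube <- make_power(3)
cube_of_two <- cube(2)
```

Functions such as `sapply()` that take other functions as arguments, and functions such as `make_power()` that return them, are what is usually meant by higher-order functions.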

Intuition for Principal Component Analysis (PCA)


Principal Component Analysis (PCA) is a dimension-reduction method that can be used to reduce a large set of (often correlated) variables into a smaller set of (uncorrelated) variables, called principal components, which still contain most of the information.

PCA is a concept that is traditionally hard to grasp, so instead of giving you the nth mathematical derivation, I will provide you with some intuition.
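As a small illustration of the description above, the following sketch runs base R's `prcomp()` on simulated data. The data, the variable names and the coefficients are assumptions chosen for this example and do not come from the post. Two strongly correlated variables are turned into two uncorrelated principal components, the first of which carries most of the variance.

```r
set.seed(42)

# Simulated data (an assumption for this sketch): y mostly follows x
n <- 200
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# PCA on centered and standardized variables
pca <- prcomp(dat, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# The original variables are correlated ...
cor_original <- cor(dat$x, dat$y)

# ... whereas the component scores are uncorrelated
cor_scores <- cor(pca$x[, 1], pca$x[, 2])

print(var_share)
print(c(original = cor_original, scores = cor_scores))
```

In a setting like this, keeping only the first component would reduce two variables to one while losing only a small part of the information.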

Why R for Data Science – and Not Python!


There are literally hundreds of programming languages out there; for example, the whole alphabet of one-letter programming language names is taken. In the area of data science, there are two big contenders: R and Python. Now, why is this blog about R and not Python?