Backtest Trading Strategies like a real Quant


R is one of the best choices when it comes to quantitative finance. In this post we will show you how to load financial data and plot charts, and we will give you a step-by-step template to backtest trading strategies. So, read on…

We begin by plotting a chart of the Standard & Poor’s 500 (S&P 500), an index of 500 of the biggest US companies. To get the index data and plot the chart we use the powerful quantmod package (on CRAN). After that we add two popular indicators: the simple moving average (SMA) and the exponential moving average (EMA).

Have a look at the code:

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.

getSymbols("^GSPC", from = "2000-01-01")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "^GSPC"

head(GSPC)
##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume
## 2000-01-03   1469.25   1478.00  1438.36    1455.22   931800000
## 2000-01-04   1455.22   1455.22  1397.43    1399.42  1009000000
## 2000-01-05   1399.42   1413.27  1377.68    1402.11  1085500000
## 2000-01-06   1402.11   1411.90  1392.10    1403.45  1092300000
## 2000-01-07   1403.45   1441.47  1400.73    1441.47  1225200000
## 2000-01-10   1441.47   1464.36  1441.47    1457.60  1064800000
##            GSPC.Adjusted
## 2000-01-03       1455.22
## 2000-01-04       1399.42
## 2000-01-05       1402.11
## 2000-01-06       1403.45
## 2000-01-07       1441.47
## 2000-01-10       1457.60

tail(GSPC)
##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume
## 2019-04-24   2934.00   2936.83  2926.05    2927.25  3448960000
## 2019-04-25   2928.99   2933.10  2912.84    2926.17  3425280000
## 2019-04-26   2925.81   2939.88  2917.56    2939.88  3248500000
## 2019-04-29   2940.58   2949.52  2939.35    2943.03  3118780000
## 2019-04-30   2937.14   2948.22  2924.11    2945.83  3919330000
## 2019-05-01   2952.33   2954.13  2923.36    2923.73  3645850000
##            GSPC.Adjusted
## 2019-04-24       2927.25
## 2019-04-25       2926.17
## 2019-04-26       2939.88
## 2019-04-29       2943.03
## 2019-04-30       2945.83
## 2019-05-01       2923.73

chartSeries(GSPC, theme = chartTheme("white"), subset = "last 10 months", show.grid = TRUE)

addSMA(20) # add a 20-day simple moving average (red curve)

addEMA(20) # add a 20-day exponential moving average (blue curve)

As you can see, the moving averages are basically smoothed-out (and therefore somewhat lagging) versions of the original data. While the SMA (red curve) weights all days equally, the EMA (blue curve) puts more weight on recent days, so the resulting indicator picks up changes more quickly. The idea is that such indicators may help investors detect longer-term trends and act accordingly. For example, a trading rule could be to buy the index whenever it crosses the MA from below and to sell whenever it crosses from above. Judge for yourself whether this could have worked.
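To make the difference between the two weighting schemes concrete, here is a minimal sketch (our addition, assuming GSPC is loaded as above) that recomputes both indicators by hand and checks them against the TTR functions SMA() and EMA(), which quantmod's addSMA() and addEMA() are based on:

n <- 20
p <- as.numeric(Cl(GSPC))

# SMA: each of the last n days gets the same weight 1/n
sma_manual <- as.numeric(stats::filter(p, rep(1/n, n), sides = 1))

# EMA: geometrically decaying weights, the most recent day weighted strongest
alpha <- 2 / (n + 1)
ema_manual <- rep(NA_real_, length(p))
ema_manual[n] <- mean(p[1:n]) # TTR seeds the EMA with the SMA of the first n days
for (t in (n + 1):length(p)) {
  ema_manual[t] <- alpha * p[t] + (1 - alpha) * ema_manual[t - 1]
}

all.equal(sma_manual, as.numeric(SMA(p, n))) # should be TRUE
all.equal(ema_manual, as.numeric(EMA(p, n))) # should be TRUE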

Well, having said that, it is not that easy to find out the profitability of certain trading rules just by staring at a chart. We are looking for something more systematic: a decent backtest! This can of course also be done with R; a great choice is the PerformanceAnalytics package (on CRAN).

To backtest a trading strategy I provide you with a step-by-step template:

  1. Load libraries and data
  2. Create your indicator
  3. Use indicator to create equity curve
  4. Evaluate strategy performance

As an example we want to test the idea that it might be profitable to sell the index when the financial markets exhibit significant stress. Interestingly enough “stress” can be measured by certain indicators that are freely available. One of them is the National Financial Conditions Index (NFCI) of the Federal Reserve Bank of Chicago (https://www.chicagofed.org/publications/nfci/index):

The Chicago Fed’s National Financial Conditions Index (NFCI) provides a comprehensive weekly update on U.S. financial conditions in money markets, debt and equity markets and the traditional and “shadow” banking systems. […] The NFCI [is] constructed to have an average value of zero and a standard deviation of one over a sample period extending back to 1971. Positive values of the NFCI have been historically associated with tighter-than-average financial conditions, while negative values have been historically associated with looser-than-average financial conditions.

To make it more concrete: we create a buy signal whenever the NFCI is below one, i.e. as long as financial stress is less than one standard deviation above its historical average, and a sell signal otherwise.
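To get a feeling for the indicator before backtesting it, you could plot it together with the threshold. This short sketch is our addition (not part of the template below) and assumes quantmod is loaded:

getSymbols("NFCI", src = "FRED")
plot(as.zoo(NFCI), ylab = "NFCI", main = "National Financial Conditions Index")
abline(h = 1, col = "red") # the threshold used in the strategy below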

Have a look at the following code:

# Step 1: Load libraries and data
library(quantmod)
library(PerformanceAnalytics)
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend

getSymbols("NFCI", src = "FRED", from = "2000-01-01") # note: the FRED source returns the full history regardless of 'from'
## [1] "NFCI"

NFCI <- na.omit(lag(NFCI)) # we can only act on the signal after its release, so shift it by one observation
getSymbols("^GSPC", from = '2000-01-01')
## [1] "^GSPC"

data <- na.omit(merge(NFCI, GSPC)) # merge before (!) calculating returns
data$GSPC <- na.omit(ROC(Cl(GSPC))) # calculate log returns of the closing prices

# Step 2: Create your indicator
data$sig <- ifelse(data$NFCI < 1, 1, 0) # long (1) while NFCI < 1, out of the market (0) otherwise
data$sig <- na.locf(data$sig) # carry the last signal forward over missing values

# Step 3: Use indicator to create equity curve
perf <- na.omit(merge(data$sig * data$GSPC, data$GSPC)) # strategy returns vs. index returns
colnames(perf) <- c("Stress-based strategy", "SP500")

# Step 4: Evaluate strategy performance
table.DownsideRisk(perf)
##                               Stress-based strategy   SP500
## Semi Deviation                               0.0075  0.0087
## Gain Deviation                               0.0071  0.0085
## Loss Deviation                               0.0079  0.0095
## Downside Deviation (MAR=210%)                0.0125  0.0135
## Downside Deviation (Rf=0%)                   0.0074  0.0087
## Downside Deviation (0%)                      0.0074  0.0087
## Maximum Drawdown                             0.5243  0.6433
## Historical VaR (95%)                        -0.0173 -0.0188
## Historical ES (95%)                         -0.0250 -0.0293
## Modified VaR (95%)                          -0.0166 -0.0182
## Modified ES (95%)                           -0.0268 -0.0311

table.Stats(perf)
##                 Stress-based strategy     SP500
## Observations                4858.0000 4858.0000
## NAs                            0.0000    0.0000
## Minimum                       -0.0690   -0.0947
## Quartile 1                    -0.0042   -0.0048
## Median                         0.0003    0.0005
## Arithmetic Mean                0.0002    0.0002
## Geometric Mean                 0.0002    0.0001
## Quartile 3                     0.0053    0.0057
## Maximum                        0.0557    0.1096
## SE Mean                        0.0001    0.0002
## LCL Mean (0.95)               -0.0001   -0.0002
## UCL Mean (0.95)                0.0005    0.0005
## Variance                       0.0001    0.0001
## Stdev                          0.0103    0.0120
## Skewness                      -0.1881   -0.2144
## Kurtosis                       3.4430    8.5837

charts.PerformanceSummary(perf)

chart.RelativePerformance(perf[ , 1], perf[ , 2])

chart.RiskReturnScatter(perf)

The first chart shows that the stress-based strategy (black curve) clearly outperformed its benchmark, the S&P 500 (red curve). This can also be seen in the second chart, which shows the relative performance. The third chart shows that our backtested strategy beats the benchmark on both dimensions: higher return and lower risk.

So, all in all this seems to be a viable strategy! But of course many more tests have to be performed before investing real money! You can use this framework for backtesting your own ideas, for example like this:
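As a hypothetical illustration (our addition, not part of the original post), here is the moving-average rule from the beginning of this post run through the same four-step template; all object names are made up, and the rule simply stays long while the close is above its 200-day SMA:

sig_data <- na.omit(merge(Cl(GSPC), SMA(Cl(GSPC), 200), ROC(Cl(GSPC))))
colnames(sig_data) <- c("close", "sma200", "ret")
sig_data$sig <- lag(ifelse(sig_data$close > sig_data$sma200, 1, 0)) # act one day after the signal
perf2 <- na.omit(merge(sig_data$sig * sig_data$ret, sig_data$ret))
colnames(perf2) <- c("SMA200 strategy", "SP500")
charts.PerformanceSummary(perf2)
table.Stats(perf2)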

This is not the place to explain all of the above tables and plots, but as you can see both packages are very powerful and I have only shown you a small fraction of their capabilities. To use their full potential you should have a look at the extensive documentation that comes with them on CRAN.

Disclaimer:
This is no investment advice! No responsibility is taken whatsoever if you lose money!

If you gain money though I would be happy if you could buy me a coffee… that is not too much to ask, is it? 😉

The Rich didn’t earn their Wealth, they just got Lucky


Tomorrow, on the First of May, many countries celebrate the so-called International Workers’ Day (or Labour Day): time to talk about the unequal distribution of wealth again!

A few months ago I posted a piece with the title “If wealth had anything to do with intelligence…” where I argued that ability, e.g. intelligence, as an input has nothing to do with wealth as an output. It drew a lot of criticism (as expected), most of it unfounded in my opinion, but one point merits some discussion: the fact that the intelligence quotient (IQ) is normally distributed by construction. The argument goes that intelligence per se may follow a fat-tailed distribution too, but that the way the IQ is constructed transforms the metric into a well-formed Gaussian distribution. To a degree this is certainly true, yet I would still argue that the distributions of intelligence and all other human abilities are far better behaved than the extremely unequal distribution of wealth. I wrote in a comment:

There are many aspects in your comment that are certainly true. Obviously there are huge problems in measuring “true” mental abilities, which is the exact reason why people came up with a somewhat artificial “intelligence quotient” with all its shortcomings.

What would be interesting to see is (and I don’t know if you perhaps have a source about this) what the outcome of an intelligence test would look like without the “quotient” part, i.e. without subsequently normalizing the results.

I guess the relationship wouldn’t be strictly linear but it wouldn’t be as extreme as the wealth distribution either.

What I think is true in any case, independent of the distributions, is when you rank all people by intelligence and by wealth respectively you wouldn’t really see any stable connection – and that spirit was the intention of my post in the first place and I still stand by it, although some of the technicalities are obviously debatable.

So, if you have a source, Dear Reader, you are more than welcome to share it in the comments – I am always eager to learn!

I ended my post with:

But if it is not ability/intelligence that determines the distribution of wealth what else could account for the extreme inequality we perceive in the world?

In this post I will argue that luck is a good candidate, so read on…

In 2014 there was a special issue of the renowned magazine Science titled “The science of inequality”. In one of the articles (Cho, A.: “Physicists say it’s simple”) the following thought experiment is being proposed:

Suppose you randomly divide $500 million in income among 10,000 people. There’s only one way to give everyone an equal $50,000 share. So if you’re doling out earnings randomly, equality is extremely unlikely. But there are countless ways to give a few people a lot of cash and many people a little or nothing. In fact, given all the ways you could divvy out income, most of them produce an exponential distribution of income.

So, the basic idea is to randomly throw 9,999 darts at a scale ranging from zero to 500 million and study the resulting distribution of the 10,000 intervals, where each interval represents one person’s wealth:

library(MASS)

w <- 5e8 # wealth
p <- 1e4 # no. of people

set.seed(123)
d <- diff(c(0, sort(runif(p-1, max = w)), w)) # wealth distribution
h <- hist(d, col = "red", main = "Exponential decline", freq = FALSE, breaks = 45, xlim = c(0, quantile(d, 0.99)))

fit <- fitdistr(d, "exponential") # fit an exponential distribution to the simulated wealth data
curve(dexp(x, rate = fit$estimate), col = "black", type = "p", pch = 16, add = TRUE)

The resulting distribution fits an exponential distribution very well. You can read some interesting discussions concerning this result on CrossValidated StackExchange: How can I analytically prove that randomly dividing an amount results in an exponential distribution (of e.g. income and wealth)?
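As a quick sanity check (our addition, continuing with the objects created above): the mean of an exponential distribution is one over its rate, and since the intervals sum to w by construction and the maximum-likelihood estimate of the rate is 1/mean(d), the fitted rate comes out as exactly p/w:

fit$estimate # fitted rate of the exponential distribution
p / w        # theoretical rate: 1e4 / 5e8 = 2e-05, i.e. 1/50000
mean(d)      # average wealth: w/p = 50000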

Just to give you an idea of how unfair this distribution is: the richest seven persons have more wealth than the poorest ten percent together:

sum(sort(d)[9994:10000]) - sum(sort(d)[1:1000]) # wealth of the seven richest minus wealth of the poorest 1,000
## [1] 183670.8

If you think that this is ridiculous, just look at the real global wealth distribution: there it is not seven but only three persons who own more than the poorest ten percent!

Now, what does that mean? Well, equality seems to be the exception and (extreme) inequality the rule. The intervals (= persons) were determined randomly; no interval had any special skills, just luck – and the result is (extreme) inequality, as in the real world!

If you can reproduce the wealth distribution of a society stochastically, this suggests that it was not so much the extraordinary skills of the rich that made them rich – they may just have gotten lucky.

Some rich people are decent enough to admit this. In his impressive essay “Why Poverty Is Like a Disease”, Christian H. Cooper, a hillbilly turned investment banker, writes:

So how did I get out? By chance.

It’s easy to attach a post-facto narrative of talent and hard work to my story, because that’s what we’re fed by everything from Hollywood to political stump speeches. But it’s the wrong story. My escape was made up of a series of incredibly unlikely events, none of which I had real control over.

[…]

I am the exception that proves the rule—but that rule is that escape from poverty is a matter of chance, and not a matter of merit.

A consequence would be that you cannot really learn much from the rich. So throw away all of your self-help books on how to become successful. I will end with a cartoon about a closely related concept, the so-called survivorship bias, which brings this message home (survivorship bias is also important to keep in mind when backtesting trading strategies in quantitative finance, the topic of an upcoming post… so stay tuned!):

Source: xkcd.com/1827

Inverse Statistics – and how to create Gain-Loss Asymmetry plots in R

Asset returns have certain statistical properties, also called stylized facts. Important ones are:

  • Absence of autocorrelation: basically the direction of the return of one day doesn’t tell you anything useful about the direction of the next day.
  • Fat tails: returns are not normal, i.e. there are many more extreme events than there would be if returns were normal.
  • Volatility clustering: basically financial markets exhibit high-volatility and low-volatility regimes.
  • Leverage effect: high-volatility regimes tend to coincide with falling prices and vice versa.

A good introduction and overview can be found in R. Cont: Empirical properties of asset returns: stylized facts and statistical issues.
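Before turning to the gain-loss asymmetry, here is a minimal sketch (our addition, not from the paper) that checks two of these stylized facts for the S&P 500:

library(quantmod)
getSymbols("^GSPC", from = "1950-01-01")
ret <- as.numeric(na.omit(ROC(Cl(GSPC)))) # daily log returns

acf(ret, lag.max = 20)   # raw returns: autocorrelations near zero
acf(ret^2, lag.max = 20) # squared returns: strongly autocorrelated -> volatility clustering
mean(((ret - mean(ret)) / sd(ret))^4) # sample kurtosis far above 3 -> fat tails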

One especially fascinating statistical property is the so-called gain-loss asymmetry: upward movements tend to take a lot longer than downward movements, which often come in the form of sudden, hefty crashes. So, an abstract illustration of this property would be a sawtooth pattern:

Source: Wikimedia

The same effect in real life:

suppressWarnings(suppressMessages(library(quantmod)))
suppressWarnings(suppressMessages(getSymbols("^GSPC", from = "1950-01-01")))
## [1] "GSPC"
plot.zoo(GSPC$GSPC.Close, xlim = c(as.Date("2000-01-01"), as.Date("2013-01-01")), ylim = c(600, 1700), ylab = "", main = "S&P 500 from 2000 to 2013")

The practical implication for your investment horizon is that your losses often come much faster than your gains (life is just not fair…). To illustrate this, authors often plot the investment horizon distribution, which shows how long you have to wait for a certain target return, negative as well as positive (for some examples see e.g. here, also the source of the following plot):

This is closely related to what statisticians call the first passage time: when is a given threshold passed for the first time? To perform such an analysis you need something called inverse statistics. Normally you would fix a time window and plot the distribution of returns within it (= forward statistics). Here we do it the other way around: you fix the return and look for the shortest waiting time needed to obtain at least that return. To achieve this you have to test all possible time windows, which can be quite time-consuming.

Because I wanted to reproduce those plots I tried to find some code somewhere… to no avail. I then contacted some of the authors of the respective papers… no answer. I finally asked a question on Quantitative Finance StackExchange… and got no satisfying answer either. I therefore wrote the code myself and thereby answered my own question:

inv_stat <- function(symbol, name, target = 0.05) {
  p <- coredata(Cl(symbol))
  end <- length(p)
  days_n <- days_p <- integer(end)
  
  # go through all days and check when the target is first reached from there
  for (d in 1:end) {
    ret <- cumsum(as.numeric(na.omit(ROC(p[d:end])))) # cumulative log returns from day d
    cond_n <- ret < -target
    cond_p <- ret > target
    # min() of an empty set is Inf (with a warning): the target is never reached from day d
    suppressWarnings(days_n[d] <- min(which(cond_n)))
    suppressWarnings(days_p[d] <- min(which(cond_p)))
  }
  
  # tabulate the waiting times (dropping Inf) and normalize to densities;
  # unlike table(), tabulate() keeps empty bins, so the index equals the number of days
  days_n_norm <- prop.table(tabulate(days_n[is.finite(days_n)]))
  days_p_norm <- prop.table(tabulate(days_p[is.finite(days_p)]))
  
  plot(days_n_norm, log = "x", xlim = c(1, 1000), main = paste0(name, " gain-/loss-asymmetry with target ", target), xlab = "days", ylab = "density", col = "red")
  points(days_p_norm, col = "blue")
  
  c(which.max(days_n_norm), which.max(days_p_norm)) # mode of days to obtain (at least) neg. and pos. target return
}

inv_stat(GSPC, name = "S&P 500")

## [1] 10 24

So, here you see that for the S&P 500 since 1950 the mode (peak) of the days needed to obtain a loss of at least 5% has been 10 days, whereas a gain of the same size has taken 24 days: the gain-loss asymmetry in action!

Still two things are missing in the code:

  • Detrending of the time series (see the sketch below for one possibility).
  • Fitting a probability distribution (the generalized gamma distribution seems to work well).
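For the first point, one possible approach (an assumption on our part, not the original author's method) is to remove the average daily log-return, the drift, from the closing prices and rerun the analysis on the detrended series:

p <- as.numeric(Cl(GSPC))
drift <- mean(diff(log(p))) # average daily log return
GSPC_detrended <- GSPC
GSPC_detrended$GSPC.Close <- exp(log(p) - drift * (seq_along(p) - 1)) # take the drift out of the closing prices

inv_stat(GSPC_detrended, name = "S&P 500 (detrended)")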

If you want to add them or if you have ideas how to improve the code, please let me know in the comments! Thank you and stay tuned!