Thank you both for sharing your code and for the excellent tutorial using this model. I applied the model to data for the US from 15 Feb 2020 through today (5 Apr 2020). Comparing the results from 3 days ago to the data through today, I was encouraged to see a slight departure from the expected logarithmic growth in number of infected cases, hopefully suggesting a slowing. However, the model seems to predict the peak coming around April 30, over 24 million infections at the peak, and nearly 488,000 deaths, which seems to be a good bit more than the worst CDC projections. Could this model be overpredicting these outcomes?

Many thanks!

]]>I don’t have much experience with `lintr` but I do use styler to format my code. The default style gets me > 90% there and I then manually tweak results to get the desired format. Since my scripts usually are small, it seems like too much effort to create a custom style given costs and benefits.

]]>Thanks a lot for this clear and inspiring entry.

I have already applied your code to monitor and forecast the evolution of the pandemic in Madrid (Spain).

In the process, I modified some of the charts you use to use ggplot way of doing that.

This is your code with the changes (please note that it is needed packages data.table and patchwork).

#-------------- #-------- Library loading suppressPackageStartupMessages({ library(data.table) library(patchwork) library(deSolve) }) Infected <- c(45, 62, 121, 198, 291, 440, 571, 830, 1287, 1975, 2744, 4515, 5974, 7711, 9692, 11791, 14380, 17205, 20440) Day <- 1:(length(Infected)) N <- 1400000000 # population of mainland china # old <- par(mfrow = c(1, 2)) # plot(Day, Infected, type ="b") # plot(Day, Infected, log = "y") # abline(lm(log10(Infected) ~ Day)) # title("Confirmed Cases 2019-nCoV Madrid", outer = TRUE, line = -2) DayInfec_df <- data.frame( Day = Day, Infected = Infected, logInfec = log10(Infected) ) gr_a <- ggplot(DayInfec_df, aes( x = Day, y = Infected)) + geom_line( colour = 'grey', alpha = 0.25) + geom_point(size = 2) + ggtitle("Cases - China") + ylab("# Infected") + xlab("Day") + theme_bw() gr_b <- ggplot(DayInfec_df, aes( x = Day, y = logInfec)) + geom_point(size = 2) + geom_smooth(method = lm, se = FALSE) + ggtitle("Cases (log) - China") + ylab("# Infected") + xlab("Day") + theme_bw() gr_a + gr_b + plot_annotation(title = 'Real Cases - Cov-19 - China', theme = theme(plot.title = element_text(size = 18))) SIR <- function(time, state, parameters) { par <- as.list(c(state, parameters)) with(par, { dS <- -beta/N * I * S dI <- beta/N * I * S - gamma * I dR <- gamma * I list(c(dS, dI, dR)) }) } init <- c(S = N - Infected[1], I = Infected[1], R = 0) RSS <- function(parameters) { names(parameters) <- c("beta", "gamma") out <- ode(y = init, times = Day, func = SIR, parms = parameters) fit <- out[ , 3] sum((Infected - fit)^2) } Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1, 1)) # optimize with some sensible conditions Opt$message ## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH" # [1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"..... :-( Opt_par <- setNames(Opt$par, c("beta", "gamma")) Opt_par # beta gamma # 1.0000000 0.7127451 t <- 1:70 # time in days fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par)) col <- 1:3 # colour # matplot(fit$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col) # matplot(fit$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col, log = "y") # ## Warning in xy.coords(x, y, xlabel, ylabel, log = log): 1 y value <= 0 # ## omitted from logarithmic plot # # points(Day, Infected) # legend("bottomright", c("Susceptibles", "Infecteds", "Recovereds"), lty = 1, lwd = 2, col = col, inset = 0.05) # par(old) max_infe <- fit[fit$I == max(fit$I), "I", drop = FALSE] val_max <- data.frame(Day = as.numeric(rownames(max_infe)) , Value = max_infe$I ) val_lab <- paste('Infected_Peak:\n', as.character(round(val_max$Value),0)) dea_lab <- paste('Deaths (2%):\n', as.character(round(val_max$Value * 0.02,0)) ) fit_dt <- as.data.table(fit) fit_tidy <- melt(fit_dt, id.vars = c('time') , measure.vars = c('S', 'I','R') ) fit_gd <- merge(fit_dt, DayInfec_df[, c(1:2)], by.x = c('time'), by.y = c('Day'), all.x = TRUE) fit_all <- melt(as.data.table(fit_gd), id.vars = c('time') , measure.vars = c('S', 'I','R', 'Infected') ) fit_all[ , Type := ifelse( variable != 'Infected', 'Simul', 'Real')] gr_left <- ggplot(fit_tidy, aes( x = time, y = value, group = variable, color = variable) ) + geom_line( aes(colour = variable)) + ylab("# People") + xlab("Day") + guides(color = FALSE) + theme_bw() gr_left gr_right <- ggplot(fit_all, aes( x = time, y = value, group = variable, color = variable, linetype = Type) ) + geom_line( aes(colour = variable)) + # geom_point(aes(x = val_max$Day, y = val_max$Value)) + scale_y_log10() + ylab("# People (log)") + xlab("Day") + guides(color = guide_legend(title = "")) + scale_color_manual(labels = c('Susceptibles', 'Infecteds', 'Recovered','Real_Cases'), values = c('red', 'green', 'blue', 'black') ) + theme_bw() + theme(legend.position = "right", legend.text=element_text(size = 6)) + annotate( 'text', x = val_max$Day, y = val_max$Value, size = 2.5, label = val_lab) + annotate( 'text', x = val_max$Day, y = 0.05*val_max$Value, size = 2.5, label = dea_lab) gr_right gr_left + gr_right + plot_annotation(title = 'China - Projection Evolution Cov-19 (SIR Model)', theme = theme(plot.title = element_text(size = 18))) #--------------

Hope that it helps .

Kind Regards,

Carlos Ortega

`drop`

argument you have a point (I also wasted hours of debugging by overlooking this small detail!).
The question is if the remedy should be something that completely changes the *character* of the language and creates structurally incompatible code. This is what worries me.

Concerning linters: do you know the lintr package? I personally haven’t used it and would be interested in your opinion.

]]>When I started learning R in 2015, I used base R but once I learnt about dplyr and other tidyverse packages, I switched to using them primarily as I **personally** find them easier to use. Base R has a lot of gotchas and inconsistencies, e.g. `df[, "a"]`

returns a vector while `df[, c("a", "b")]`

returns a data.frame. I know there is a `drop`

argument but it’s not obvious.

I wish the core team would clean up some of the warts in the language which they seem to be doing gradually, e.g. changing default value of `stringsAsFactors`

to `FALSE`

in R 4.0.

It would also be really neat if R had type hints like Python got in Python 3.5 as it would help document functions and linters could detect possible type errors.

]]>(I actually have to check again whether I did not make a mistake there, I might have made an error with a minus sign).

But different models might be good approximations. There are several that have used a logistic function.

]]>it seems that the source https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/data/Coronavirus.history.v2.csv is not maintained any more. Until yesterday it was updated every night, but that has stopped. Do you happen to know if there is a new file available, and if so: can you share the file name ?

thanks

Reinhard