The World Cup forecast: let's go all the way!

This is (maybe) the final post in the series dedicated to predicting the World Cup results. I'll try to write another one to wrap things up and summarise a few comments, but that will probably come a bit later on. Finally, we've decided to use our model, which so far has been applied incrementally (i.e. stage by stage), to predict the results of both the semi-finals and the final.

The first part is relatively straightforward: we now know the results of the quarter-finals. Thus, we can repeat the procedure and i) update the data with the observed results; ii) update the 'current form' variable and the offset; iii) re-run the model to estimate each team's propensity to score; iv) predict the results of the unobserved games, in this case the two semi-finals (Brazil vs Germany and Argentina vs Netherlands).

However, to give the model a nice twist, I thought we should include a piece of extra information that is available right now, i.e. the fact that Brazil will, for certain, play their semi-final without their suspended captain Thiago Silva and their injured 'star player' Neymar (who will also miss the final, because of the seriousness of his injury). Thus, we ran the model with a modified offset variable for Brazil, to slightly decrease their 'short-term' quality.

[NB: if this were a 'serious' model, we would probably try to embed these changes in a more formal way, rather than as 'ad hoc' modifications to the general set-up. Nevertheless, I believe that the ability to deal with additional information, possibly in the form of subjective/expert knowledge, is actually a strength of the modelling framework. Of course, you could say that the selection of the offset distribution is arbitrary and that other choices were possible. That's true, and a 'serious' model would certainly require more extensive sensitivity analysis at this stage!]
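To make the idea of the offset a bit more concrete, here is a minimal sketch of how a negative offset would lower a team's expected goals. It assumes a log-linear Poisson scoring model, which this post does not spell out; the function name `scoring_rate` and all the numbers are hypothetical and are not estimates from our model.

```python
import math

def scoring_rate(attack, defence, offset=0.0):
    """Expected goals, assuming log(rate) = attack + defence + offset."""
    return math.exp(attack + defence + offset)

# Hypothetical values, chosen purely for illustration (not model estimates).
attack_brazil = 0.40     # attacking strength of the team
defence_germany = -0.10  # effect of the opponent's defence
missing_players = -0.15  # ad hoc penalty for the absent players

baseline = scoring_rate(attack_brazil, defence_germany)
penalised = scoring_rate(attack_brazil, defence_germany, offset=missing_players)

# On the log scale the offset is additive, so on the goal scale it is
# multiplicative: the rate is scaled by exp(offset).
print(f"baseline rate:  {baseline:.3f}")
print(f"penalised rate: {penalised:.3f}")
```

Under this assumption, a small negative offset shrinks the expected number of goals by a fixed proportion, whatever the opponent, which is one simple way of encoding a slight decrease in 'short-term' quality.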

Using this formulation of the model, we get the following results, in terms of the overall probability of going through to the final (i.e. accounting for potential draws after 90 minutes, then extra time and possibly penalties):

Brazil 0.605 Germany 0.395
Argentina 0.510 Netherlands 0.490

So, the second semi-final is predicted to be much tighter (nearly 50:50), while Brazil are still favourites to reach the final, according to the model prediction.
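For readers wondering how a 'probability of going through' can account for draws, extra time and penalties, the sketch below shows one way to do it. It is not the procedure we actually used: it assumes independent Poisson goals, scoring rates in extra time equal to one third of the 90-minute rates, and a penalty shoot-out that is a coin flip. The function names and the rates in the usage line are hypothetical.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def outcome_probs(rate_a, rate_b, max_goals=25):
    """Return (P(A outscores B), P(draw), P(B outscores A)).

    Goals are assumed independent; the sums are truncated at max_goals,
    which is negligible for realistic football scoring rates.
    """
    win_a = draw = win_b = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, rate_a)
        for j in range(max_goals + 1):
            p = p_i * poisson_pmf(j, rate_b)
            if i > j:
                win_a += p
            elif i == j:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b

def prob_advance(rate_a, rate_b, shootout_a=0.5):
    """P(team A goes through) in a knock-out game.

    Assumptions: extra time is 30 minutes, so rates are scaled by 1/3;
    the shoot-out is won by A with probability shootout_a.
    """
    win_90, draw_90, _ = outcome_probs(rate_a, rate_b)
    win_et, draw_et, _ = outcome_probs(rate_a / 3, rate_b / 3)
    return win_90 + draw_90 * (win_et + draw_et * shootout_a)

# Hypothetical 90-minute scoring rates, for illustration only.
print(round(prob_advance(1.6, 1.2), 3))
```

The point is simply that a draw after 90 minutes does not disappear from the calculation: its probability is redistributed between the two teams according to what happens in extra time and, failing that, on penalties.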

As I said earlier, however, this time we've gone beyond the simple one-step prediction: we have also used these predicted results to re-run the model before the actual results of the semi-finals are known, and thus to predict the overall outcome. So who's going to win the World Cup?

Overall, our estimation gives the following probabilities of winning the tournament (these may not sum to 1 because of rounding):

Brazil 0.372
Germany 0.174
Argentina 0.245
Netherlands 0.206
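A rough way to see how probabilities like these come about is to average over the four possible finals, weighting each by the chance that it actually takes place. The sketch below does exactly that. The semi-final probabilities are the ones reported above, but the head-to-head probabilities for the final are hypothetical placeholders (our model's values for each pairing are not reported in this post), and the two semi-finals are assumed to be independent.

```python
def title_probabilities(p_semi1, p_semi2, p_final):
    """Combine semi-final and final probabilities into title probabilities.

    p_semi1, p_semi2: dict team -> P(team reaches the final), one per semi-final.
    p_final: dict (x, y) -> P(x beats y in the final), x from semi 1, y from semi 2.
    Assumes the two semi-finals are independent of each other.
    """
    title = {team: 0.0 for team in list(p_semi1) + list(p_semi2)}
    for x, p_x_reaches in p_semi1.items():
        for y, p_y_reaches in p_semi2.items():
            p_this_final = p_x_reaches * p_y_reaches
            p_x_wins = p_final[(x, y)]
            title[x] += p_this_final * p_x_wins
            title[y] += p_this_final * (1.0 - p_x_wins)
    return title

# Probabilities of reaching the final, as reported in this post.
semi1 = {"Brazil": 0.605, "Germany": 0.395}
semi2 = {"Argentina": 0.510, "Netherlands": 0.490}

# HYPOTHETICAL head-to-head probabilities for each possible final.
final = {
    ("Brazil", "Argentina"): 0.60,
    ("Brazil", "Netherlands"): 0.63,
    ("Germany", "Argentina"): 0.43,
    ("Germany", "Netherlands"): 0.46,
}

probs = title_probabilities(semi1, semi2, final)
for team, p in probs.items():
    print(f"{team}: {p:.3f}")
```

Because the placeholder values are made up, the output will not reproduce the figures above; what matters is the structure, in which each team's chance of winning the tournament can never exceed its chance of reaching the final.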

Of course, these probabilities encode extra uncertainty, because we're going one extra step forward into the future: we don't know which of the potential futures will occur for the semi-finals. Leaving the model aside, I think I would probably like the Netherlands to win, if only for the fact that Italy would still be the second most frequent World Cup winners, only one title behind Brazil, and one and two above Germany and Argentina, respectively.

This article first appeared on Gianluca Baio’s personal blog.