A Big and Embarrassing Challenge to DSGE Models

Dynamic stochastic general equilibrium (DSGE) models are the leading models in macroeconomics. The earliest DSGE models were Real Business Cycle models, and they were criticized by Keynesian economists like Solow, Summers and Krugman for their non-Keynesian assumptions and conclusions. But as DSGE models incorporated more and more Keynesian elements, this critique began to lose its bite, and many young macroeconomists began to feel that the old guard just weren’t up to the new techniques. Critiques of the assumptions remain, but the typical answer has been to change the assumptions and incorporate more realistic institutions into the model. Thus, most new work today is done using a variant of this type of model by macroeconomists of all political stripes and schools.

Now along come two statisticians, Daniel J. McDonald and the acerbic Cosma Rohilla Shalizi. McDonald and Shalizi subject the now-standard Smets-Wouters DSGE model to some very basic statistical tests. First, they simulate the model and then ask how well the model can predict its own simulation. That is, when we know the true model of the economy, how well can the DSGE recover the true parameters? [The authors suggest such tests haven’t been done before, but that doesn’t seem correct, e.g. Table 1 here. Updated, AT] Not well at all.

If we take our estimated model and simulate several centuries of data from it, all in the stationary regime, and then re-estimate the model from the simulation, the results are disturbing. Forecasting error remains dismal and shrinks very slowly with the size of the data. Much the same is true of parameter estimates, with the important exception that many of the parameter estimates seem to be stuck around values which differ from the ones used to generate the data. These ill-behaved parameters include not just shock variances and autocorrelations, but also the “deep” ones whose presence is supposed to distinguish a micro-founded DSGE from mere time-series analysis or reduced-form regressions. All this happens in simulations where the model specification is correct, where the parameters are constant, and where the estimation can make use of centuries of stationary data, far more than will ever be available for the actual macroeconomy.
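To make the logic of the check concrete, here is a minimal sketch in Python of the simulate-then-re-estimate exercise. It uses a toy AR(1) model in place of the Smets-Wouters DSGE (which would require full Bayesian state-space estimation), and the numbers and function names are my own illustration, not anything from the paper. The procedure is the point: take a model with known parameters, simulate a long stretch of data from it, re-estimate on the simulation, and check whether the parameter estimates and forecast errors behave as a correctly specified model should.

```python
# Sketch of the self-consistency check described above, with a toy AR(1)
# standing in for the DSGE. Everything here is illustrative, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, sigma, n):
    """Simulate n observations from y_t = phi * y_{t-1} + sigma * eps_t."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + sigma * rng.standard_normal()
    return y

def estimate_ar1(y):
    """OLS estimates of the AR(1) coefficient and the shock standard deviation."""
    x, z = y[:-1], y[1:]
    phi_hat = np.dot(x, z) / np.dot(x, x)
    sigma_hat = np.std(z - phi_hat * x, ddof=1)
    return phi_hat, sigma_hat

true_phi, true_sigma = 0.9, 1.0

# "Several centuries" of quarterly data, as in the quoted passage (about 400 years).
y_sim = simulate_ar1(true_phi, true_sigma, n=1600)
phi_hat, sigma_hat = estimate_ar1(y_sim)

# One-step-ahead forecast error of the re-estimated model on fresh simulated data.
y_new = simulate_ar1(true_phi, true_sigma, n=1600)
rmse = np.sqrt(np.mean((y_new[1:] - phi_hat * y_new[:-1]) ** 2))

print(f"true phi = {true_phi}, re-estimated phi = {phi_hat:.3f}")
print(f"one-step RMSE on fresh data = {rmse:.3f} (irreducible noise = {true_sigma})")
```

In this toy the estimate snaps onto the true coefficient once centuries of data are available; the striking result in the paper is that the estimated Smets-Wouters model does not behave this way even under those ideal conditions.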

Now that is bad enough, but I suppose one might argue that this is telling us something important about the world. Maybe the model is fine and it’s just a sad fact that we can’t uncover the true parameters even when we know the true model. Maybe, but it gets worse. Much worse.

McDonald and Shalizi then swap variables, feeding the model wages as if they were output, consumption as if it were wages, and so forth. Now this should surely distort the model completely and produce nonsense. Right?

If we randomly re-label the macroeconomic time series and feed them into the DSGE, the results are no more comforting. Much of the time we get a model which predicts the (permuted) data better than the model predicts the unpermuted data. Even if one disdains forecasting as an end in itself, it is hard to see how this is at all compatible with a model capturing something — anything — essential about the structure of the economy. Perhaps even more disturbing, many of the parameters of the model are essentially unchanged under permutation, including “deep” parameters supposedly representing tastes, technologies and institutions.
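To pin down what “randomly re-label” means in practice, here is a short sketch of the permutation check on synthetic data. The stand-in model (an OLS regression of “consumption” on lagged “output”), the fabricated series and all the names are my own illustration, not the authors’ code. The expectation is that a model which actually uses the structure in the data should fit noticeably worse when the series are shuffled under the wrong labels.

```python
# Sketch of the label-permutation check, with a deliberately simple stand-in model.
# All series and names here are fabricated for illustration; not the authors' code.
import numpy as np

rng = np.random.default_rng(1)
n = 400  # roughly a century of quarterly observations

# Synthetic "macro" series with a known structural link:
# consumption responds to lagged output; wages are unrelated noise.
output = np.cumsum(rng.standard_normal(n)) * 0.1
consumption = np.empty(n)
consumption[0] = 0.0
consumption[1:] = 0.7 * output[:-1] + 0.2 * rng.standard_normal(n - 1)
wages = rng.standard_normal(n)

series = {"output": output, "consumption": consumption, "wages": wages}

def fit_rmse(data):
    """Fit c_t = beta * y_{t-1} by OLS and return the in-sample one-step RMSE."""
    y_lag = data["output"][:-1]
    c = data["consumption"][1:]
    beta = np.dot(y_lag, c) / np.dot(y_lag, y_lag)
    return np.sqrt(np.mean((c - beta * y_lag) ** 2))

print(f"RMSE with correct labels:  {fit_rmse(series):.3f}")

# Randomly re-label: shuffle which series sits under which name.
names = list(series)
for trial in range(5):
    shuffled = rng.permutation(names)
    while list(shuffled) == names:  # skip the trivial shuffle that keeps labels in place
        shuffled = rng.permutation(names)
    permuted = {name: series[src] for name, src in zip(names, shuffled)}
    print(f"RMSE with permuted labels: {fit_rmse(permuted):.3f}")
```

In this toy the correctly labeled fit comes out best, which is what you would hope for from any model tracking real structure; the disturbing finding quoted above is that for the estimated DSGE the permuted data frequently fit better.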

Oh boy. Imagine if you were trying to predict the motion of the planets but you accidentally substituted the mass of Jupiter for the mass of Venus, and then discovered that your model predicted better than the one fed the correct data. I have nothing against these models in principle and I will be interested in what the macroeconomists have to say, as this isn’t my field, but I can’t see any reason why this should happen in a good model. Embarrassing.

Addendum: Note that the statistical failure of the DSGE models does not imply that the reduced-form, toy models that, say, Paul Krugman favors are any better than DSGE in terms of “forecasting” or “predictions”–the two classes of models simply don’t compete on that level–but it does imply that the greater “rigor” of the DSGE models isn’t buying us anything, and that the rigor may be impeding understanding–rigor mortis, as we used to say.

Addendum 2: Note that I said challenge. It goes without saying, but I will say it anyway: the authors could have made mistakes. It should be easy to test these strategies in other DSGE models.
