Making Inferences about Tropics, Germs, and Crops

Posted by Dietrich Vollrath on February 07, 2016 · 13 mins read

One of the long-running arguments in growth is “geography versus institutions”. A lot of ink and a tiny amount of computing power was thrown at this question. The early stages of this involved a lot of cross-country regressions that attempted to figure out empirically whether measures of institutions or geography had explanatory power for GDP per capita. (By the way, I’m talking here just about the economic growth literature’s take on this. The general question goes back centuries.)

A tweet by Garett Jones recently reminded me of one of the entries in this literature, Bill Easterly and Ross Levine’s “Tropics, Germs, and Crops” paper. EL want to assess the role of geography in explaining cross-country incomes per capita. Their main questions are summed up nicely in the abstract.

Does economic development depend on geographic endowments like temperate instead of tropical location, the ecological conditions shaping diseases, or an environment good for grains or certain cash crops? Or do these endowments of tropics, germs, and crops affect economic development only through institutions or policies?

Their conclusion is that geography has no effect, other than through its relationship to institutions. This conclusion, though, doesn’t follow from the empirical tests they run. Let’s run through the empirics on these questions, and see how to answer them.

I’m going to do this with precisely the dataset that EL use. I don’t recall when or where I picked up the data, but I believe it was from Easterly’s old website, where he had a nice set of links to datasets. Regardless, it’s the right dataset because I can perfectly replicate their results. You can download my version here and play around with it yourself.

First we have to establish that geography has a relationship to income per capita in the first place. If it doesn’t, the answers are clear. EL’s three primary measures of geography: (log) settler mortality, latitude, and a dummy for being landlocked.

All these are questionable as measures of “geography”, and they are certainly incomplete. But for the purposes of this exercise, we’ll stick with them. Settler mortality is from Acemoglu, et al, and is intended to capture the nature of the disease environment (of Europeans, at least). Latitude is supposed to capture whether the country is tropical or not, and landlocked is there to account roughly for access to shipping. The “Crops” of the title of the paper could be added by including a series of dummies for producing different crops, but including those doesn’t really change the point of this post, so I’m going to skip them for brevity’s sake.

Regress log income per capita on the three geography measures. You get

------------------------------------
                              .   
Dep var: GDP p.c.            b/se   
------------------------------------
Log Settler Mortality      -0.610***
                          (0.111)   
Landlocked                 -0.711   
                          (0.359)   
Absolute Latitutde          2.377*  
                          (1.027)   
Constant                    9.695***
                          (0.612)   
------------------------------------
R-squared                   0.509   
N                              72   
------------------------------------
* p<0.05, ** p<0.01, *** p<0.001

A quick apology that the results aren’t formatted better. That aside, in case you aren’t familiar, the first number listed for each variable is the estimated effect (e.g. -0.610 for settler mortality), and the number in parentheses is the standard error of that estimate. The stars show you whether the estimated effect is statistically significant. So settler mortality is significant at less than 1/10th of 1 percent.

Anyway, we see a clear effect of two measures of geography: settler mortality and latitude. The R-squared for this regression is 50.9%, meaning half of the variation in GDP per capita is associated with variation in “geography”. At a minimum, it suggests that geography could explain varation in incomes. But of course we don’t know if this acts only through institutions, or is a direct of geography.

Is there an indepdent effect of geography outside of institutions?

That’s our second question. This has a clear empirical set-up. Regress GDP per capita on geography and a measure of institutions. Look at the estimated effect of geography. Is it significant, even with institutions in the regression?

This is entirely straightforward. The whole point of regressions is to do this kind of thing: separate out the effect of one variable from another. I ran that regression, using the EL measure of institutions, which is pulled from Kaufman, Kraay, and Zoido-Lobaton. This is an averaged index of 6 different measures of “institutions”. I’ve made my views on the silliness of these indices before, so I won’t belabor the point here. We’ll just accept it as is for this post.

Here’s the result:

------------------------------------
                            .   
Dep var: GDP p.c.            b/se   
------------------------------------
Log Settler Mortal~y       -0.302** 
                      (0.102)   
Landlocked                 -0.530   
                      (0.291)   
Absolute Latitutde          0.698   
                      (0.870)   
Institutions                1.163***
                      (0.189)   
Constant                    8.639***
                      (0.521)   
------------------------------------
R-squared                   0.682   
N                              72   
------------------------------------
* p<0.05, ** p<0.01, *** p<0.001

You can see that institutions are indeed significant (three stars!!). But so is settler mortality (at 1%), and landlocked is actually significant at 10% (but no stars for you!). “Geography matters” here, even after you control for institutions.

The effect size of geography is definitely smaller than before. Look at the coefficient on settler mortality, which dropped from -0.610 to -0.302. This is showing that some of the effect of mortality on GDP per capita in my first regression was actually just picking up the effect of institutions (and by implication, mortality and institutions are negatively related, as AJR would suggest). But what the regression also shows is that the remaining effect of mortality is still statistically significant. Yes, there is an independent effect of geography on GDP per capita, even controlling for geography’s correlation with institutions.

Whether the estimate of “Institutions” here is in fact a causal estimate is a different question. We might be concerned that institutions are endogenous, either because GDP per capita improves institutions, or because we’ve left out some other control variable that institutions is picking up on (culture, etc..). And if the estimate of institutions is wrong, then the conclusion about the geography variables could be wrong too.

So…..

Does geography still matter even if institutions are instrumented?

Because of this concern about institutions being endogenous, we’d like to find some pure exogenous variation in institutions to use in our empirical analysis. In other words, we’d like to find another variable that both (a) (imperfectly) predicts the level of institutions and (b) is unrelated to the problematic unobserved variation in GDP per capita. We want an instrument.

If we had such an instrument (IV), the process here is clear. We’d regress institutions on that IV and get the fitted values, which would represent variation in institutions that is by assumption unrelated to the problematic unobserved variation in GDP per capita. Regress GDP per capita on this fitted value of institutions, and et voila, we’ve got a true estimate of the effect of institutions. We could run the regression with the geography variables included as well, and see if geography matters for GDP per capita once we control for instrumented institutions.

The problem is we need an instrument, and this is where EL run into a problem. The instrument for institutions that they propose to use is exactly the measures of geography: settler mortality, landlocked, and latitude. Which means they assume that these three measures have no relationship to GDP per capita outside of their influence on institutions.

They answered their question by assuming the answer was no. There is no getting around the fact that condition (b) above is an assumption. There is literally nothing you can do to test if it is true or not. To use settler mortality, landlocked, and latitude as IV’s, you are, without fail, assuming they have no independent effect on GDP per capita.

But what about this over-identification (OID) test that EL use? Isn’t that evidence that their assumption is correct, and in fact geography has no effect on GDP per capita outside of institutions?

No.

Here’s why. The OID test has the following null hypothesis: settler mortality, landlocked, and latitude are jointly unrelated to unobserved variation in GDP per capita. If you find enough evidence to reject this null, then you have evidence that one or more of your instruments are related to the unobserved variation in GDP per capita. In other words, your assumption is wrong.

But EL do not reject the null. EL report that they fail to reject the null.

Repeat after me. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true. Failure to reject the null does not mean the null is true.

This is not evidence that geography does not matter. It’s not. It’s not even kinda-sorta evidence, and well, you know, it probably means that geography doesn’t matter. That’s not how statistics work. There is no evidence, so there is no conclusion to draw. A failure to reject is an absence of information, so you cannot infer anything from it.

So geography totally matters, right?

Maybe. We don’t know. We haven’t solved the problem we had in the 2nd regression I ran. Our estimated effect of institutions might be biased.

What is true is that if the true effect of institutions were larger (say a coefficient of 2.22 versus 1.16), then the estimated effects of geography would end up statistically insignificant. How do I know that? Because I can run a regression where I force the coefficient on institutions to be 2.22, and see what happens to the geography variables. Here you go:

------------------------------------
                                .   
Dep var: GDP p.c.            b/se   
------------------------------------
Institutions                2.220   
                              (.)   
Log Settler Mortal~y       -0.022   
                          (0.107)   
Landlocked                 -0.366   
                          (0.348)   
Absolute Latitutde         -0.828   
                          (0.994)   
Constant                    7.681***
                          (0.592)   
------------------------------------
R-squared                           
N                              72   
------------------------------------
* p<0.05, ** p<0.01, *** p<0.001

You can see that all the geography variables are not statistically significant (no stars!). Why does this happen? Because institutions are now explaining so much, there really isn’t anything “left over” for geography to explain.

2.22 is the value that EL actually find for the effect of institutions. But how do they get that? By running an IV regression that assumes geography has not effect. It’s circular. If they assume that geography doesn’t matter, then they find a big effect of institutions. And then they use this big effect of institutions to conclude that geography doesn’t matter.

Take the hypothesis: “Geography does not matter outside of its influence on institutions.”” I can’t prove that hypothesis wrong, certainly not with this data. But using the OID test from the El regressions doesn’t prove that this statement is right.