Friday, February 10, 2017

Skills and agricultural productivity

The World Bank
Markus Goldstein



Do skills matter for agricultural productivity? Rachid Laajaj and Karen Macours have a fascinating new paper out that looks at this question. The paper is fundamentally about how to measure skills better, and they put a serious amount of work into that. But for those of you dying to know the answer: skills do matter, with cognitive, noncognitive, and technical skills explaining about 12.1 to 16.6 percent of the variation in yields. Before we delve into that, though, let’s talk (a lot) about measurement (if you are only interested in the yield results, skip down).

Laajaj and Macours set out to measure three sets of skills we might think matter for farming: cognitive, noncognitive and technical skills. So they head to western Kenya to interview around 900 farmers.
By cognitive skills they mean things like math and reading. To measure these, they use Raven’s matrices and digit span tests, along with math tests and a reading comprehension test.

Noncognitive skills mean a lot of different things to different people. Some of my colleagues who have thought about this a lot call them socio-emotional skills, which is a more informative moniker (and a psychologist I work with objects to the word “skills,” since there are open questions about how and when these traits are malleable). Laajaj and Macours focus on personality traits (using a subset of an instrument often used to measure the “Big Five”) and on things like self-esteem, tenacity, perceptions about the causes of poverty, attitudes towards change, learning orientation, optimism, meta-cognitive ability, and locus of control. They measure many of these with a 5-point scale (agree vs. disagree with a given statement), but also ask some questions with visual aids. (They also include some questions worded so that the meaning of the scale is reversed, which lets them look for acquiescence bias; more on this below.)

Good farmers know stuff about how to make crops grow. So Laajaj and Macours also tackle technical skills: agricultural knowledge and know-how. They work with local agronomists to build questions covering when and how to use inputs, knowledge of practices such as spacing and conservation, and other knowledge such as the active ingredients in different fertilizers. Here they use a mix of open-ended and multiple choice questions, some with visual aids.

And then Laajaj and Macours go to town on checking reliability. The survey takes 2.5 hours to complete, so to check for fatigue (as well as priming, etc.) they randomize the order of the sections (cognitive, technical, and noncognitive). They also randomize the order of questions within sections and the order of possible answers in multiple choice questions.

One of their key reliability tests is to go back to respondents three weeks later (with a very high response rate) and retest them on all of the measures. This lets them look at the stability of these measures (and as someone who has taken some of these noncognitive questionnaires, I wonder about my own stability 15 minutes later). But there’s more: Laajaj and Macours want to see how others’ perceptions line up with self-reported measures, so they ask each respondent to report on themselves with a sort of aggregated-scale set of 14 questions, and then to report on another farmer. They also ask another household member and the community health worker to report on these skills for that farmer.

Now that we’ve got all these measures of skills from the respondent, one easy approach is to use each measure separately, or to add them up into a simple index, and see whether they matter for outcomes like yields and the adoption of different practices and technologies. But there are things one can do to potentially make these raw measures work better for one’s estimates. These include exploratory factor analysis, correcting for acquiescence bias, and item response theory (IRT). The first of these is frequently used in economics (e.g. by Heckman and coauthors in some of their work on noncognitive skills), while the other two are more common in the psychometrics literature. These are all nicely explained in the paper, so I will skip the details here.
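
As a rough illustration of the “simple index” route, here is a minimal Python sketch, assuming each item is converted to a z-score across respondents and the index is the unweighted mean of those z-scores. The item names and scores are hypothetical, and the paper’s own construction may differ.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw scores to z-scores (mean 0, SD 1) across respondents."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def simple_index(items):
    """Equally weighted index: the mean of standardized item scores.

    items: list of item columns, each a list with one score per respondent.
    Returns one index value per respondent.
    """
    z_cols = [standardize(col) for col in items]
    n_respondents = len(items[0])
    return [mean(col[i] for col in z_cols) for i in range(n_respondents)]

# Hypothetical raw scores for 4 respondents on 3 cognitive items
math_score = [2, 4, 6, 8]
reading_score = [1, 3, 2, 4]
digit_span = [5, 5, 7, 7]
index = simple_index([math_score, reading_score, digit_span])
print([round(v, 3) for v in index])
```

Factor analysis would, roughly speaking, replace these equal weights with estimated loadings, so items that share more common variation count for more.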

OK, that’s a lot of background, so let’s look at some results. First up: test-retest results (using the second-visit responses). It turns out the cognitive measures are quite stable across tests, with the simple index showing a test-retest correlation of 0.84. The noncognitive and technical indices don’t fare so well, with test-retest correlations of 0.53 for noncognitive and 0.30 for technical skills. Laajaj and Macours then apply the index-improvement methods discussed in the preceding paragraph and raise the test-retest correlation to 0.7 for noncognitive skills and 0.41 for technical skills. Conclusion here: these tools can help boost the stability (reliability) of noncognitive measures, but technical skills questions remain persistently noisy (more on this below).
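
The test-retest statistic is a correlation between the same farmers’ scores on the two visits. A minimal Python sketch of a Pearson correlation, using hypothetical index scores for five farmers (the paper may report other reliability statistics as well):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical index scores for the same 5 farmers: first visit vs. retest
visit1 = [0.2, -1.1, 0.8, 1.5, -0.4]
visit2 = [0.1, -0.9, 0.6, 1.7, -0.6]
r = pearson(visit1, visit2)
print(round(r, 3))
```

Random measurement error on either visit pulls this correlation toward zero, which is why noisy technical items show up as low test-retest values.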

Next, Laajaj and Macours turn to Cronbach’s alpha, which increases with the correlation between items and thus gives some sense of how well a set of sub-indicators measures the same underlying construct. They note that a threshold value of 0.7 is often used. Their conclusion: “Cronbach’s alpha is above the bar for the cognitive skill construct, barely acceptable in the case of noncognitive, and substantially below the acceptable threshold in the case of the technical skills.” Interestingly, when they compare the improved indices to the simple indices, only the indices of the 6 noncognitive factors show a marked increase in Cronbach’s alpha.
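
For reference, the standard formula is alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. A minimal Python sketch, assuming item scores are numeric and already oriented in the same direction, with toy data rather than the paper’s:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k item columns, each a list with one score per respondent.
    Uses sample variances throughout; the choice cancels out in the ratio.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    sum_item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Toy data: 4 respondents answering 2 items
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
print(round(alpha, 3))  # 0.75
```

When items move together, the total score’s variance grows faster than the sum of the item variances, pushing alpha toward 1; unrelated items leave alpha near 0.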

So that’s an overview of statistical ways to improve the data, but Laajaj and Macours also paid a lot of attention to how the survey is implemented.

First, in developed countries these surveys are often self-administered. But in a context of low literacy, we are going to have to rely on enumerators to ask the questions. When I think about these types of questions (especially for noncognitive skills), I worry that the enumerator could nudge the respondent one way or another. And indeed, enumerators matter: enumerator fixed effects explain 5 percent of the variance in cognitive skills, 7 percent for technical skills and 9 percent for noncognitive skills (and since compliance with the random assignment of enumerators was imperfect, this is a lower bound). The test-retest results confirm that it’s important to take enumerator effects into account.
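
The “variance explained by enumerator fixed effects” is the R² from regressing a skill score on a full set of enumerator dummies, which equals the between-enumerator sum of squares divided by the total sum of squares. A minimal sketch with hypothetical scores and enumerator IDs, ignoring any controls the paper may include:

```python
from collections import defaultdict
from statistics import mean

def enumerator_r2(scores, enumerators):
    """Share of score variance explained by enumerator fixed effects.

    Regressing scores on a full set of enumerator dummies fits each
    enumerator's mean, so R^2 = between-enumerator SS / total SS.
    """
    grand_mean = mean(scores)
    by_enumerator = defaultdict(list)
    for score, enum_id in zip(scores, enumerators):
        by_enumerator[enum_id].append(score)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2
        for group in by_enumerator.values()
    )
    return ss_between / ss_total

# Hypothetical noncognitive index scores and the enumerator who collected each
scores = [1, 2, 3, 4, 5, 6]
enumerators = ["A", "A", "A", "B", "B", "B"]
print(round(enumerator_r2(scores, enumerators), 3))  # 0.771
```

Random assignment matters for interpretation: if enumerators were matched to systematically different farmers, this share would mix genuine skill differences with enumerator influence.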

They also make use of the fact that the order in which folks are interviewed is (imperfectly) random. It turns out that the folks interviewed later in the survey round have significantly higher technical scores. Taken together, these results suggest that a) it’s good to randomize enumerators (as much as possible), b) one should consider re-standardizing how these measures are administered during the round, and c) self-administration may help, if it’s possible. One thing I wonder about is the education level of the enumerators. Laajaj and Macours use university-educated enumerators, as is common with many survey firms and not a few statistical agencies. In the first survey I ever worked on, we used high school graduates as much as possible precisely because my collaborator was worried that university students would fail to relate to the respondents (mostly farmers) and/or would push them towards certain (normative) answers. Something to think about.

Second, some of these questions (particularly the noncognitive ones) are hard to understand. Laajaj and Macours try to unpack this by comparing the test-retest correlations of farmers above versus below the median of the cognitive index. There is not a big difference for the cognitive index and no clear difference for the noncognitive scores, but there are fairly large differences for the technical scores. Combining these results with the enumerator effects, Laajaj and Macours point to the importance of extensive piloting (they ran a pilot, combined with respondent debriefs to see what people understood of the questions) plus super careful translation of questions, particularly the tricky noncognitive concepts (e.g. active imagination or generating enthusiasm).

Third, this survey is long and these questions are taxing. Since Laajaj and Macours randomized the order of the modules and the questions, they can see whether fatigue kicks in. Surprisingly, it doesn’t. However, they do find that the test-retest correlation (a measure of reliability) for noncognitive skills is highest when that module comes last in the survey. The lesson here: save those weird and perturbing questions for last!

Fourth is the issue that certain types of folks may simply tend to agree with everything, introducing a bias into the noncognitive questions. Indeed, developed country evidence shows that this acquiescence bias is higher among lower-income and less-educated populations. Laajaj and Macours find evidence of the latter here: the acquiescence score for those with no education is twice as large as for those with 10 years of education. Interestingly, they also find that those with higher acquiescence scores have significantly lower yields, which makes me wonder whether this might actually be picking up another personality trait that is not measured (at least not directly). Their takeaway: be sure to include questions that reverse the scale in the survey, and use them to correct the answers.
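
One common version of this correction works roughly as follows; it is sketched here under assumptions, not as the paper’s exact procedure. Take pairs of items with opposite meanings and average the respondent’s answers within each pair: someone answering consistently should land near the scale midpoint, so the distance above the midpoint serves as an acquiescence score. Subtract that score from every raw answer, then reverse-code the negatively worded items. The function name, item positions and answers below are hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2  # 3.0 on a 5-point scale

def correct_acquiescence(raw, reverse_keyed, opposite_pairs):
    """Return (acquiescence score, corrected and reverse-coded answers).

    raw: one respondent's answers on a 1-5 agree/disagree scale.
    reverse_keyed: set of item positions worded in the opposite direction.
    opposite_pairs: (i, j) positions of items with opposite meanings.
    """
    # Agreeing with both an item and its opposite pulls the pair mean above 3.
    acq = mean((raw[i] + raw[j]) / 2 for i, j in opposite_pairs) - MIDPOINT
    corrected = []
    for pos, answer in enumerate(raw):
        adjusted = answer - acq  # remove the general tendency to agree
        if pos in reverse_keyed:
            # Flip so that a higher value means more of the trait
            adjusted = SCALE_MIN + SCALE_MAX - adjusted
        corrected.append(adjusted)
    return acq, corrected

# Hypothetical respondent: item 0 "I keep going when things get hard",
# item 1 "I give up easily" (reverse-keyed), items 2-3 positively worded.
acq, corrected = correct_acquiescence([5, 4, 5, 5], {1}, [(0, 1)])
print(acq, corrected)  # 1.5 [3.5, 3.5, 3.5, 3.5]
```

Without the correction, this respondent’s positively worded items would read as 5s, overstating the trait; subtracting from every item is one modeling choice among several.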

Fifth, since the noncognitive questions rely heavily on the 5-point (Likert) scale, it could be that folks interpret the scale differently. While anchoring vignettes can help with this (see Mallory Montgomery’s recent post on the differential happiness of men and women for further insights), Laajaj and Macours turn to the reports on a respondent’s skills that they get from other folks. They find that others’ reports on the respondent’s more observable and objective skills (e.g. language and math) correlate well with the direct measures, but their reports on the other skills are not very informative. Indeed, for a bunch of measures, the correlation between two different people’s reports on the same respondent is lower than the correlation between one person’s reports on two different people. Ahh, the wonders of projection. One reporter who does seem to do a better job is the community health worker, whose reports line up well with respondents’ direct measures, leading Laajaj and Macours to conclude that “some of the first order results can be obtained by asking 3 simple questions to a well-informed key-informant, instead of asking 2.5 hours of targeted skill questions and tests to the respondent. That said, clearly such proxy measures are not a good solution when one aims to obtain comparable skill measures across villages.”

Finally, Laajaj and Macours unpack the difficulty of asking technical questions. As discussed above, they did their homework on this, working with local agronomists. But they have to toss out some questions because there is too little variation; these are mostly questions with unambiguous correct answers. Their conclusion on the rest: there is a lot of noise because the answers are often context specific (and perhaps vary over time). So if you are looking for meaningful variation, these questions are likely to be hard to get right.

So that’s the measurement part of things. How well do these measures predict farming outcomes? Before delving into this, Laajaj and Macours make the important point that all three types of skills are strongly, positively correlated, so when thinking about skills, it’s important to try to include all these dimensions.

Maize is the main crop in this area, so maize yields are a critical outcome. Now these data, as anyone who has used yield data knows, are noisy. To reduce the noise, they use a four-season average (there is a separate farm outcomes survey that goes along with this project), and they use the average rank of the farmer’s yield, rescaled from 1 to 100. Bottom line: all three types of skills matter, and they explain a meaningful proportion of the variation in yields. What proportion? The simple indices explain 12.1 percent, and improving them (using factor analysis etc.) gets you to 14.1 percent. Using the average of the test and retest gets you about the same improvement, and since doing some stats is far cheaper (and nicer) than running a resurvey, it’s easy to conclude that one should use the improved indices (if they make sense for what you’re trying to examine). These results are fairly robust to the inclusion of controls and, indeed, one striking result is that cognitive skills remain significant even after controlling for education and literacy.
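
A minimal sketch of an average-rank outcome, assuming yields are ranked within each season (ties get their average rank), ranks are rescaled linearly so the lowest farmer gets 1 and the highest 100, and the rescaled ranks are averaged across the four seasons. The rescaling formula and the yields are illustrative assumptions; the paper may define them differently. Ranks help with noise because a single extreme (possibly mismeasured) yield can move a farmer’s score only as far as the top or bottom of the distribution.

```python
from statistics import mean

def rescaled_ranks(values):
    """Rank values (1 = lowest, ties averaged), then rescale linearly to 1-100.

    Requires at least two values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return [1 + 99 * (r - 1) / (n - 1) for r in ranks]

def average_yield_rank(yields_by_season):
    """yields_by_season: list of seasons, each a list of yields (one per farmer)."""
    per_season = [rescaled_ranks(season) for season in yields_by_season]
    return [mean(farmer_ranks) for farmer_ranks in zip(*per_season)]

# Hypothetical maize yields (kg per acre) for 3 farmers over 4 seasons
seasons = [
    [800, 1200, 400],
    [600, 1500, 900],
    [1000, 1100, 300],
    [700, 700, 200],  # the first two farmers tie this season
]
print(average_yield_rank(seasons))  # [44.3125, 93.8125, 13.375]
```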

One concern Laajaj and Macours have is that noncognitive and cognitive skills may be working through technical knowledge. So they also run a specification that drops the technical skills, and find that the coefficients on both cognitive and noncognitive skills increase and that the two seem to matter about equally for productivity. A one standard deviation increase in either dimension of skills increases the average rank of maize yield by 4 to 5 percentage points.
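
A minimal sketch of the kind of regression behind statements like this, assuming (hypothetically) two standardized skill indices and an average-rank outcome. Because each index has mean 0 and standard deviation 1, each slope is the change in the outcome for a one standard deviation increase in that skill, holding the other fixed, and the regression’s R² is the share of variation explained. The synthetic data below are noiseless and built so the sketch should recover slopes of 4 and 5; the paper’s specifications include more indices and controls.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Convert values to z-scores (mean 0, SD 1)."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

def ols(y, regressors):
    """OLS with an intercept via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Returns (coefficients, r_squared); coefficients[0] is the intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

# Hypothetical standardized skill indices for 6 farmers
cognitive = standardize([1, 2, 3, 4, 5, 6])
noncognitive = standardize([2, 1, 4, 3, 6, 5])
yield_rank = [50 + 4 * a + 5 * b for a, b in zip(cognitive, noncognitive)]
beta, r2 = ols(yield_rank, [cognitive, noncognitive])
print([round(b, 6) for b in beta], round(r2, 6))  # [50.0, 4.0, 5.0] 1.0
```

With real, noisy yields the R² would of course be far below 1; the 12.1 to 16.6 percent figures above are in that spirit.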

When Laajaj and Macours try to break these results down further by looking at individual skills, they don’t find much. So while these skills seem to matter as a group, it’s hard to say which one in particular is important for yields. When they look at correlations of different skills with individual farming practices, they also don’t find much – with one interesting exception. It turns out weeding is significantly positively correlated with conscientiousness.

This is a fascinating paper, both for its myriad insights into measuring skills and for showing how those skills matter for agricultural outcomes. I look forward to more work in this area, particularly with samples from other settings and, hopefully, larger samples, so we can better understand which of these skills matter and how.
