Are we getting smarter?


Have you ever heard of the Flynn effect? Maybe in your psychology studies or from articles and books about intelligence? We asked the experts Jakob Pietschnig (Professor and Head of the Department of Differential Psychology and Psychological Assessment, University of Vienna) and Marco Vetter (Chief Psychology Officer, SCHUHFRIED) what the Flynn effect is all about and why it is also relevant for psychological assessment in practice.


SCHUHFRIED (SF): How would you explain the Flynn effect to a layperson?

Jakob Pietschnig (JP): The Flynn effect refers to positive changes in intelligence test scores in the population over time. Whether this necessarily reflects a change in population intelligence per se, i.e., a change in all the cognitive abilities that make up our intelligence, I would leave open.

SF: Over what period and to what extent was this increase?

JP: Actually, we have been seeing these changes in ability for as long as formal ability tests have been in place, that is, since the beginning of the 20th century. This change was in the positive direction at least until the 1980s. It has affected fluid intelligence [see glossary at end of text] more than crystallized intelligence. For the so-called full-scale IQ, one could speak of an increase of 3 IQ points per decade. For fluid intelligence it was somewhat more, at 4 IQ points per decade, and for crystallized intelligence somewhat less, at 2 IQ points per decade. However, this increase has never been linear; there were always phases with stronger and weaker increases. In the 1980s, these increases declined globally, and in some countries there was even stagnation or reversal. This negative change would then be the anti-Flynn effect. However, I would not yet consider this to be certain.

Marco Vetter (MV): The difference in gains between fluid and crystallized intelligence is interesting because it is actually counterintuitive, right?

JP: Exactly. One would suspect that this increase is due to changes in education, because education is something you can improve relatively quickly. But that would tend to lead to an increase in crystallized intelligence. In fact, we see that fluid intelligence has increased more.

MV: From a test development perspective, the classic test material for measuring fluid intelligence is matrices, which have remained constant over time and are still widely used. For crystallized intelligence, on the other hand, it is relatively difficult to keep the test material constant over the years, as vocabulary, general knowledge, and the like change more over time. Could this be one reason why the Flynn effect is less observable in crystallized intelligence?

JP: I also suspect that the Flynn effect is masked in crystallized intelligence because these test items become more difficult or even wrong over time. As a result, the Flynn effect might show up less or even not at all. For example, in the Intelligence Structure Test from the 1970s, the Sentence Completion subtest contains the item "What is the most important component of a television?" and the correct answer would be "picture tube." A person born after 2000 can hardly answer that correctly because the picture tube simply doesn't exist anymore. This is what causes the masking.

SF: Apart from the differences in the Flynn effect between the different intelligence domains, are there also differences between countries? And which countries are we actually talking about?

JP: There is data from all continents except Antarctica. Compared to many other research areas, we also have data not only from WEIRD countries (Western, Educated, Industrialized, Rich, Democratic), but also from Sudan, Kenya or Oceania, for example. This shows us that there are also differences between countries, but the question here is how to interpret them. For example, if you look at the data at the continent level, the strongest increases are found in Asia. However, there is no solid basis for treating a continent as a meaningful unit of observation; in terms of the Flynn effect, such a grouping would be arbitrary. Therefore, I would take a rather cautious and differentiated view.

SF: Many tests in the Vienna Test System are developed and normed by SCHUHFRIED. Do we see the Flynn effect in our norming data as well?

MV: Our systematically collected norm data goes back to the 1990s. In a joint research project with Jakob Pietschnig, we are currently investigating this data with regard to the Flynn effect. If you compare the paper-pencil version data from the 1960s and 1970s with our current representative samples for various matrices tests, you actually see enormous increases in test scores. From the 1990s on, the picture becomes more differentiated. Here we see more improvements in crystallized intelligence, some of which can be explained quite logically. For example, in the English Language Skills Test (ELST), people have probably gotten better because English has become very much part of our education system. So, in summary, the further you go back in time, the more clearly you see the Flynn effect. In some dimensions, you can still see it today, but the whole picture becomes blurrier.

SF: The question now is: Where does this come from? What are possible explanations for the Flynn effect?

JP: There are over a dozen hypotheses that are used to explain the Flynn effect. These can be broadly divided into biological, environmental, and hybrid causes. The most likely hypotheses have to do with hybrid factors, such as perinatal nutrition or hygiene, which have improved over time. The near stagnation of the Flynn effect at the time of World War II also fits this explanation well. However, schooling certainly plays a role as well. In addition, test-taking behavior has changed: we have simply become smarter and more knowledgeable about how to deal with tests in general.

SF: Is there already an explanation for the anti-Flynn effect?

JP: Just like guessing behavior, the effects of schooling, hygiene and medicine have a natural end. If I feed an optimally nourished child even more, I don't get a smarter child, I get a fatter child. So, there are saturation effects. In addition, there are so-called diminishing returns: In education, it does make a difference whether I educate a child for one or two years. However, it no longer makes much difference whether I educate a child for 13 or 14 years.

When the first studies on the reversal of the Flynn effect were published, several explanations came up immediately: migration movements (migration hypothesis), the assumption that persons at the lower end of the ability distribution reproduce faster and earlier (fertility hypothesis), and the assumption that, thanks to medical advances, persons now reach reproductive age who would not have reached it in former times (mortality hypothesis). However, these were all just conceptual ideas. We examined these hypotheses empirically in detail using two different data sets, and nothing consistent in this direction emerged.

SF: We have now talked about environmental causes for the (anti-)Flynn effect. What is your speculation about the influence of the COVID-19 pandemic on this development?

JP: Basically, the question is how long the whole thing will last. I don't know whether the two years will have a major impact. We do have a good health care system in Austria, but nevertheless certain interventions are more difficult to implement and less accessible; this has negative effects on physical, but also mental health. Education systems also do not benefit from necessary pandemic-related measures such as home schooling. Mandatory mask wearing in schools is certainly very good from a medical standpoint, but it doesn't necessarily promote attention, learning environments, and the like. So, if the pandemic has an effect, it's certainly a negative one.

SF: Where do you see the relevance of the Flynn effect in people's everyday lives?

JP: Outdated test norms basically have an impact on any expert assessment. A particularly drastic example would be the death penalty in the USA. There is a clause that people who have an IQ of less than 70 may not be executed. Now it depends on whether the person in question was once given a test with up-to-date or outdated norms. If the norms were outdated, a positive Flynn effect means that the score comes out higher than it would on current norms, so the person has a greater "chance" of being eligible for the death penalty. Another example is funding decisions that are based on test scores. In Germany, there is financial support from the state to foster reading and spelling. Here, too, one wants to see test results before deciding whether or not a child will receive funding. The same problem occurs here: outdated norms and a positive Flynn effect mean no funding, even though the child might need it.
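
The arithmetic behind this point can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that test scores drift linearly at the per-decade rates mentioned earlier in the interview (3 points for full-scale IQ, 4 for fluid and 2 for crystallized intelligence). As noted above, the real increase has never been linear and has stalled or reversed in some countries, so this is a rough approximation and not an assessment procedure. The function names and the example score of 74 are hypothetical.

```python
# Illustrative sketch only: assumes a constant, linear drift in test scores.
# The interview stresses that the real increase was never linear.

# Per-decade gains in IQ points, as quoted in the interview.
POINTS_PER_DECADE = {
    "full_scale": 3.0,
    "fluid": 4.0,
    "crystallized": 2.0,
}


def norm_inflation(years_since_norming, domain="full_scale"):
    """Estimate by how many IQ points outdated norms overstate a score."""
    return POINTS_PER_DECADE[domain] * years_since_norming / 10.0


def score_on_current_norms(observed_score, years_since_norming, domain="full_scale"):
    """Estimate the score the same performance would earn on current norms."""
    return observed_score - norm_inflation(years_since_norming, domain)


if __name__ == "__main__":
    observed = 74.0  # hypothetical score obtained with 20-year-old norms
    adjusted = score_on_current_norms(observed, years_since_norming=20)
    print(f"observed: {observed}, estimated on current norms: {adjusted}")
```

Under these assumptions, a score of 74 obtained on norms that are 20 years old corresponds to roughly 68 on current norms, which is the kind of shift across the cutoff of 70 described in the answer above.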

SF: What is the relevance of the Flynn effect for psychologists who are doing psychological assessment in practice?

MV: From my point of view, the Flynn effect is an important reason to always use up-to-date test norms. We also take this into account in our tests. We adhere to standards such as DIN 33430 and check our procedures at least every 8 years to see whether the norms are still up to date. If we detect relevant changes, we update the norms. In practice, people very often pay attention to the size of the norm sample. However, even very large norm samples are not very suitable for ethical psychological assessment if they are outdated.

Jakob Pietschnig's basic research is very important for us in this respect. It allows us to better assess which dimensions will be particularly affected by changes and in which direction the norms can be expected to shift. This enables us to respond at an early stage, evaluate the effects on our norms and, if necessary, collect new norms in a timely manner.



Further reading:

Pietschnig, J., & Voracek, M. (2015). One century of global IQ gains: A formal meta-analysis of the Flynn effect (1909–2013). Perspectives on Psychological Science, 10(3), 282–306.

Pietschnig, J., Deimann, P., Hirschmann, N., & Kastner-Koller, U. (2021). The Flynn effect in Germanophone preschoolers (1996–2018): Small effects, erratic directions, and questionable interpretations. Intelligence, 86, 101544.

Pietschnig, J., Voracek, M., & Gittler, G. (2018). Is the Flynn effect related to migration? Meta-analytic evidence for correlates of stagnation and reversal of generational IQ test score changes. Politische Psychologie, 2, 267–283.



Glossary:

Fluid intelligence: includes basic processes of thinking and is largely independent of experience.

Crystallized intelligence: includes the ability to apply acquired knowledge; it is considered to be predominantly culture-dependent.

Full-Scale IQ: Scores from intelligence test batteries consisting of multiple subtests measuring both crystallized and fluid abilities.

Matrices: Test paradigm for measuring logical reasoning. Abstract shapes are presented in a grid (matrix) of rows and columns arranged according to certain rules. Subjects must recognize and apply these rules by filling in a missing shape or marking incorrect shapes.

DIN 33430: a DIN standard (DIN = Deutsches Institut für Normung, the German Institute for Standardization) that contains quality criteria and standards for job-related aptitude testing.