I recently had a fun conversation.
We were having lunch at the Baltic Economics Conference a few weeks ago. The conversation turned to an evergreen topic among academic economists—gripes about teaching.
One economist shared a story about Gen Z students. Because of COVID, these students have little experience with in-person exams. When the economist decided to hold one, the students pushed back hard. Gen Z can’t even take a real exam anymore!
The anecdote led to nods of agreement around the lunch table. Some colleagues complained about take-home exams. Others grumbled about how ChatGPT makes it too easy for students to cheat.
I’m sympathetic to the plight of teaching in the age of large language models.
However, I was feeling a bit contrarian that day. So, I said something like, “Yeah, but in-person exams aren’t that realistic. In most work situations, you don’t have to solve a task in two hours without any outside help.”
Not everyone agreed. Some pointed out scenarios when you must be quick. Maybe you’re working at a central bank during a financial crisis or making split-second decisions on a trading floor. What if you’re a firefighter? You can’t review “Firefighting 101” at the scene of a burning building.
That made me pause.
Upon reflection, I believe there are two types of thinking: “Fast thinking,” which involves quick problem-solving, and “deep thinking,” which requires sustained effort over time.¹ In exams and job interviews, we’re mostly testing fast thinking. However, in many real-life situations, it’s deep thinking that matters.
Let’s say you’re a data analyst. You bring value by drawing insights from messy data. For that, you need to think about how to analyze the data, what those columns mean, and, wait, why does this field have so many NULLs? While there are best practices you can follow, the process isn’t formulaic. There’s no substitute for sitting down and thinking deeply about the problem.
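To make the NULLs moment concrete, here’s a minimal sketch of that first pass in Python with pandas. The orders table, its column names, and the values are all made up for illustration:

```python
import pandas as pd

# A hypothetical orders table the analyst might face, where one
# column turns out to be riddled with missing values.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "amount": [10.0, 25.5, None, 40.0, 12.0],
    "discount_code": [None, "SPRING", None, None, None],
})

# Quick diagnostic: what share of each column is NULL?
null_share = df.isna().mean().sort_values(ascending=False)
print(null_share)
```

The code is the easy part. The deep thinking starts after the printout: is `discount_code` 80% NULL because of a pipeline bug, or simply because most orders had no discount? No formula answers that for you.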
If deep thinking is so important, why do we have exams and job interviews?
A strong argument is that fast and deep thinking are likely positively correlated (both driven by general intelligence). Yes, they’re not the same thing, but in practice, they’re very similar. Since deep thinking is difficult to test, we test for fast thinking and hope for the best.²
That’s not terrible, but it comes at a cost. For example, I’m not a super fast thinker. In school, I was good at math but occasionally struggled with exams. I’d sometimes get freaked out or anxious due to time pressure. However, with enough time, I can do pretty complex things (including writing academic papers). While fast thinking may be an OK predictor of deep thinking, it’s hardly perfect. As a result, relying on fast-thinking tests to identify deep thinkers will inevitably yield false negatives.
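The false-negative argument can be put in back-of-the-envelope terms with a simulation. Everything here is assumed for illustration: fast- and deep-thinking scores drawn as bivariate normal with the 0.46 correlation that Ackerman, Beier, and Boyle (2005) report, and a screen that keeps the top 20% on the fast-thinking test:

```python
import numpy as np

# Illustrative simulation (not from the post): assume fast- and
# deep-thinking scores are bivariate standard normal with the
# correlation of 0.46 reported by Ackerman, Beier, and Boyle (2005).
rng = np.random.default_rng(0)
n = 100_000
rho = 0.46
cov = [[1.0, rho], [rho, 1.0]]
fast, deep = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

cutoff_fast = np.quantile(fast, 0.80)  # the exam keeps the top 20%
cutoff_deep = np.quantile(deep, 0.80)  # the top 20% of deep thinkers

deep_stars = deep >= cutoff_deep
missed = deep_stars & (fast < cutoff_fast)  # false negatives
fn_rate = missed.sum() / deep_stars.sum()
print(f"Deep thinkers screened out by the fast test: {fn_rate:.0%}")
```

Under these made-up parameters, well over half of the strongest deep thinkers fail to make the fast-test cut. Nudge `rho` up or down to see how much the false-negative rate depends on how tightly the two abilities travel together.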
So, are in-person exams fine or evil?
Hard to say. What I’m confident of, though, is that we need both fast and deep thinkers. We need folks who can make the right call when facing a burning building. However, we also need scholars who think deeply. Our task is to create environments—whether in education or the workplace—that recognize and celebrate both types of intelligence.
Let’s do that. While we’re at it, let’s not blame Gen Z for the imperfections in our education system. They’re fine, no cap.
¹ I know, I know: Researchers in psychology and psychometrics have extensively studied different types of intelligence, and here I am introducing weird new definitions. Well, this is Substack, so allow me to be speculative. What I define as “deep thinking” seems related to fluid intelligence or System 2 thinking, while “fast thinking” is closer to crystallized intelligence or System 1 thinking.
² For example, a meta-analysis by Ackerman, Beier, and Boyle (2005, Table 4) reports an average correlation of 0.46 between Raven’s test results (a common measure of fluid intelligence, similar to deep thinking) and measures of general knowledge (a marker for crystallized intelligence, similar to fast thinking).
Always fun to read your posts! It made me think deeply 😉, not only about what the exams are testing, but also about the alternatives.
As you say, exams are a proxy for something, be it general intelligence, the ability to memorise and retrieve information fast, or some other skills and abilities. But proxies are just that - proxies. Someone who aces all the exams and does well in a job interview may not necessarily turn out to be a star employee.
One way to deal with that could be to create exam/interview questions based on situations, i.e., describing a situation and asking candidates what they would do or how they would solve the problem in that particular context. That opens up a range of possible answers that may engage both fast and slow thinking, although evaluating those answers would be more labour-intensive and subjective. Also, what we say we would do and what we actually do are not always the same. I may say that I will leave work on time, go to the gym, and cook myself a healthy dinner, but actually I stay late, pick up McDonald’s on the way home, and watch Netflix the whole evening.
If we went a step further, we could create simulated (or real) environments and observe how people behave in them. For example, I took a driving test last week, which involved performing various parking manoeuvres and driving around the city while an eagle-eyed examiner sitting next to me recorded everything I didn’t do properly. However, this type of evaluation is also not perfect, as many outside variables can influence the outcome. Things like traffic and weather conditions, the behaviour of other drivers and pedestrians, the examiner’s mood, and many others can play a major role. You might say that you should be able to handle all those difficulties to earn a driving licence, which I agree with. But at the same time, some not-so-good drivers may get favourable conditions and pass the test.
So I guess the best thing to do is to always be mindful of what it is that we want to evaluate and choose the most appropriate method for that. At the same time, every measure will only be a proxy, so we should not take the results as gospel. Context matters.