Desirable Difficulties:
Testing, Spacing, and Interleaving

By Dr Atherton (Director of Learning and Research, Teacher of English)

22 July 2024

Bjork’s Concept of Desirable Difficulties

As discussed in the February issue of the Enquiry, and as outlined by Nick Soderstrom, ‘long-term learning can be enhanced by intentionally impairing short-term performance’ (2019). Strategies that appear to hinder or disrupt short-term progress (performance) can in fact have great benefit for long-term understanding and retention (learning). These strategies are what Bjork has labelled ‘desirable difficulties’ (1994): desirable because they improve long-term learning, but difficult because they might appear to slow progress initially. In this issue of the Enquiry, we’ll discuss each of the three desirable difficulties that Bjork outlines and the implications they may have for classroom teaching.

Testing as a learning event

Rather than being viewed only as an assessment of the learning that has already taken place, Bjork (2011) and Henry Roediger (2006) argue that testing should also be seen as a way of facilitating future learning. What is called the testing effect describes the fact that the act itself of being tested and successfully retrieving information leads to more effective long-term retention. Testing, then, in the words of Bjork, should be viewed as a learning event in its own right and not just a measure of all other learning events. With this in mind, we might consider which of the following four study sequences might be most conducive to long-term learning:

■ A: Study Restudy Restudy Restudy Test
■ B: Study Restudy Restudy Test Test
■ C: Study Restudy Test Test Test
■ D: Study Test Test Test Test

Although students often feel most comfortable with (A), because it boosts short-term performance, (D) is far more effective for long-term retention because it harnesses the testing effect. The testing becomes part of the sequence of learning and not simply a future assessment of it. This manner of studying also taps into what is called the generation effect, which describes the fact that if a student generates a solution or answer, as opposed to being presented with one, then retrieval is strengthened. This assumes, though, that there are no fundamental flaws or gaps in understanding, in which case reteaching would be most beneficial: the testing effect assumes there is enough of a knowledge base to be tested.

But what kind of testing would be most beneficial? The key to harnessing the power of the testing effect is to use frequent, low-stakes quizzing. Low-stakes quizzing, as opposed to the high-stakes testing of mock exams, can take many forms (with other suggestions in Sherrington’s post, listed in the references), such as:

■ Flashcards or apps such as Quizlet
■ Knowledge dumps
■ Quizizz
■ Kahoot
■ Retrieval relay
■ Microsoft Forms
■ Elaborative interrogation
■ Paired quizzes

It is also especially beneficial, and logistically efficient from a teacher’s point of view, for these low-stakes quizzes to be self-marked wherever possible. This has the added benefit of tapping into a student’s metacognitive awareness of their own current strengths and weaknesses (‘I don’t seem to know X very well so I should revise it more’).

Figure 1

One way in which to integrate low-stakes quizzing into lessons is to begin every lesson with a retrieval task based on previously learned material. The above format, waiting for the students on the board as they enter the classroom, covers material from the last lesson (green), the previous topic (red) and then a longer, more analytical question (blue). [Figure 1]

Whatever format it might take, the key takeaway is that the opportunity for retrieval of information that promotes generation is more effective than a further study event. Whilst restudying may enhance performance, it tends not to be as effective for long-term retention. What this looks like will obviously vary from classroom to classroom and discipline to discipline.

Spacing practice

One of the reasons students find massed practice (like cramming) so attractive is that it does enhance short-term performance, but it is deadly for long-term retention of information. It has long been understood that over time we become less likely to be able to retrieve information previously learned (especially when it has been crammed), and so revisiting material at regular intervals facilitates long-term learning, an idea related to the Theory of Disuse. Ebbinghaus’ original research into memory decay, called the Forgetting Curve, is typically represented as shown. [Figure 2]

Figure 2

What is interesting to note is that the initial rate of decay after first studying the material is very steep, but that with each subsequent spaced review the forgetting curve flattens, increasing what is called the memory’s retrieval strength. Thus, as Ebbinghaus initially demonstrated, and as has subsequently become what Bjork calls one of the most robustly evidenced ideas across the study of memory, every time a memory is retrieved, that memory is more accessible in the future. The more we revisit material at regular intervals across time, the more robust our recall of that information becomes.
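
The flattening of the curve can be illustrated with a short simulation. This is only a sketch under stated assumptions: it uses the simplified exponential form often used to draw the forgetting curve (retention = e^(-t/S), where S is a 'stability' value), and the starting stability, the doubling of stability at each review, and the review days are all invented for illustration rather than taken from Ebbinghaus or Bjork.

```python
import math


def retention(days_since_review: float, stability: float) -> float:
    """Estimated chance of recall under a simple exponential decay model."""
    return math.exp(-days_since_review / stability)


def simulate(review_days, horizon, initial_stability=2.0, boost=2.0):
    """Return estimated retention for each day from 0 to horizon inclusive.

    review_days: days on which the material is revisited (hypothetical).
    boost: assumed factor by which each review multiplies stability.
    """
    stability = initial_stability
    last_review = 0
    curve = []
    for day in range(horizon + 1):
        if day in review_days:
            # Assumption: a review resets the clock and slows future decay.
            stability *= boost
            last_review = day
        curve.append(retention(day - last_review, stability))
    return curve


no_review = simulate(review_days=set(), horizon=30)
spaced = simulate(review_days={2, 7, 16}, horizon=30)
print(f"Day 30, no review: {no_review[30]:.2f}")
print(f"Day 30, three spaced reviews: {spaced[30]:.2f}")
```

The exact numbers matter far less than the shape: in this model each review leaves the curve decaying more slowly than before, which is the pattern Figure 2 depicts.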

The perhaps more salient question, though, is what is the optimal time to revisit material after initial study? In a recent ResearchEd presentation, and based on an Action Research project into Ebbinghaus, Damien Benney constructed a graph that plots the optimal gap between initial study and review in light of when that material will be needed for a future examination. For instance, if an examination were due 50 days from the point of initial teaching, then Benney concluded the optimal gap would be a total of nine days, perhaps much sooner than one might anticipate. [Figure 3]

Figure 3

It is here that synergies between the testing effect and spacing effect become most powerful. What would be the best way to revisit the previously taught material and to return to it at regular intervals? Following Bjork and Roediger, the answer would be to revisit by using low-stakes testing as a learning event. As such, building time into a sequence of learning for frequent revisiting of previously studied material will help students to retain it in the long term.
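
For planning purposes, a review calendar of this kind can be sketched in a few lines. The example below takes the 50-day and nine-day figures from Benney's example above; repeating the same nine-day gap until the examination is an assumption made here for simplicity, and the function name and teaching date are hypothetical.

```python
from datetime import date, timedelta


def review_dates(first_taught: date, exam: date, gap_days: int) -> list[date]:
    """List review dates spaced gap_days apart, stopping before the exam."""
    if gap_days <= 0:
        raise ValueError("gap_days must be positive")
    dates = []
    next_review = first_taught + timedelta(days=gap_days)
    while next_review < exam:
        dates.append(next_review)
        next_review += timedelta(days=gap_days)
    return dates


taught = date(2024, 1, 8)            # hypothetical teaching date
exam = taught + timedelta(days=50)   # examination 50 days later
for review in review_dates(taught, exam, gap_days=9):
    print(review.isoformat(), "low-stakes quiz")
```

Each date printed would be an opportunity for a short retrieval quiz rather than a restudy session, in keeping with the testing effect.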

Interleaving

Interleaving is often contrasted with blocking: blocking refers to material being studied in one single sequence, whereas interleaving refers to material from several related areas being mixed together within the same sequence. This is similar to spaced practice, but whereas spacing typically refers to distributing the same topic, interleaving involves distributing different types of problem or topic.

The example that Dunlosky (2013) gives is to imagine a student learning addition and subtraction. Typically, they might spend a block of practice adding and then a block of practice subtracting. The next topic may then introduce division and multiplication, and practice may begin with one before moving to the other. This would be an example of massed practice or blocking. However, interleaving would involve solving one problem from each type before solving a new problem from each type. As Bjork explains, ‘blocked practice appears optimal for learning but interleaving actually results in superior long-term retention’. Blocking feels neater and short-term gains give the illusion of it being more effective, but research would indicate this is not necessarily the case.
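
The difference between the two orderings can be made concrete with a short sketch. The problems and function names below are invented for illustration; the interleaved version simply takes one problem from each type in turn, as in the description above.

```python
from itertools import chain, zip_longest


def blocked(problem_sets: dict[str, list[str]]) -> list[str]:
    """All problems of one type, then all problems of the next type."""
    return list(chain.from_iterable(problem_sets.values()))


def interleaved(problem_sets: dict[str, list[str]]) -> list[str]:
    """One problem from each type in turn, repeated until none are left."""
    rounds = zip_longest(*problem_sets.values())
    return [p for rnd in rounds for p in rnd if p is not None]


# Hypothetical practice sets for the addition and subtraction example.
practice = {
    "addition": ["3 + 4", "12 + 9", "25 + 17"],
    "subtraction": ["9 - 5", "20 - 8", "41 - 16"],
}
print("Blocked:    ", blocked(practice))
print("Interleaved:", interleaved(practice))
```

Both orderings contain exactly the same problems; only the sequence changes, which is what makes interleaving a question of curriculum sequencing rather than of additional content.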

The efficacy of interleaving can partly be explained by the fact that it naturally compels spaced practice, but it also helps to promote making connections across and between different topics, which can result in higher-order thinking.

Both spacing and interleaving have fascinating implications for curricular design and the sequencing of content. Is it better, for instance, to teach Topic A and then Topic B or to find a way to interleave material from Topic B during the study of Topic A? What would be the best way to revisit material from Topic A during the study of Topic B? What topics or materials are closely related enough to benefit from being interleaved? As with any such questions, answers need always be rooted within specific disciplines and debated by subject experts as no one answer will fit all.

References

Rob and Elizabeth Bjork: ‘Making Things Hard on Yourself, But In a Good Way: Creating Desirable Difficulties to Enhance Learning’ (2011).

Nick Soderstrom: ‘Learning Vs Performance: A Distinction Every Educator Should Know’ (2019).

John Dunlosky: ‘Strengthening the Student Toolbox: Study Strategies to Boost Learning’ (2013).

Tom Sherrington: post ‘10 Techniques for Retrieval Practice’ (2019).

Henry Roediger: ‘Test Enhanced Learning’ (2006).

David Didau: ‘Deliberately Difficult: Focusing on Learning Rather than Progress’ (2013).

‘Desirable Difficulties: Testing, Spacing, and Interleaving’ by Andrew Atherton, published in The Enquiry: Issue 8.

The Enquiry is a staff journal dedicated to reflections on educational research, and teaching and learning at Downe House School. Issue 8 was published in July 2024, looking back at Lent term 2024.

