This episode was the second half of an experiment. In Episode 1, I let NotebookLM, Google’s AI summarizer, narrate a set of my writings about how learning works and how AI should fit inside it. In Episode 2, I took the same writings back and told the story myself. Not because the AI version was wrong, but because the part that was missing was the part an AI cannot, by its nature, carry: the stakes for a human learner, the cultural context of how knowledge is built, and the reason the word struggle appears on the very first line.
This essay is the written extension of that second telling.
What “struggle” actually is
When I say learning is a struggle, I am not using the word metaphorically, and I am not romanticizing frustration. I am naming something cognitive science has been documenting carefully for fifty years.
The most useful technical term for what I call struggle is the one Robert Bjork and his collaborators, working out of UCLA, introduced in the 1990s: desirable difficulties. Bjork’s insight, which has been replicated extensively since, is that conditions that make performance worse in the short term often produce better learning in the long term. Spacing practice out instead of cramming it. Testing yourself instead of re-reading. Interleaving related topics rather than blocking them. Working a problem before you have been shown the method. These conditions make the learner’s task harder in the moment. They also produce knowledge that sticks.
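The spacing effect in that list can be made concrete with a few lines of code. This is a deliberately minimal sketch of expanding review intervals, the mechanism behind spaced practice: the gap before the next review grows after each successful recall and resets after a failure. The doubling schedule is illustrative, not a claim about the optimal spacing function.

```python
# A minimal sketch of spaced practice: the review gap expands after
# each successful recall and resets to one day after a failure.
# The doubling rule is illustrative, not an empirically tuned schedule.
def next_review_gap(days_since_last: int, recalled: bool) -> int:
    """Double the gap on success; reset to one day on failure."""
    return days_since_last * 2 if recalled else 1

gap = 1
for success in [True, True, True, False, True]:
    gap = next_review_gap(gap, success)
    # gaps after each review: 2, 4, 8, 1, 2
```

The point of the sketch is the shape, not the numbers: each successful retrieval is harder than the last because more time has passed, and that added difficulty is exactly the desirable kind.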
John Sweller’s cognitive load theory, developed from the late 1980s onward and still dominant in instructional design research, comes at the same phenomenon from the other direction. Sweller’s framework distinguishes three kinds of load: intrinsic load, the inherent complexity of the material itself; germane load, the productive mental work of integrating new material into existing structures; and extraneous load, the mental work wasted on confusing presentation or missing context. You want to spend the learner’s working memory on germane effort. You do not want to waste it on extraneous load. The line between the two is what good instruction navigates.
In math education, the term you hear most often is productive struggle, a phrase James Hiebert and Douglas Grouws used in their 2007 review of effective teaching research. It names exactly what I was describing in the episode: the learner moving from unclear to clearer, staying in the zone where the task is hard enough to matter but not so hard it collapses.
What unites all of these traditions is that they converge on the same practical claim. The struggle is the learning. If you remove the struggle, you have not made learning more efficient. You have removed the event that makes learning happen.
This matters for the AI question more than most people realize, because the whole user experience of a large language model is designed around removing effort. That is the product promise.
Bloom, honestly
I used Bloom’s taxonomy in the episode because it is the clearest shorthand for the point I was making about sequence. I want to be more careful about Bloom on the page than I could be in the microphone.
The original taxonomy was published in 1956 by a committee Benjamin Bloom chaired. The version most people remember, with the levels knowledge, comprehension, application, analysis, synthesis, evaluation, is that original version. In 2001, Lorin Anderson and David Krathwohl published a revision that renamed every level as a verb and reordered the top of the pyramid: remember, understand, apply, analyze, evaluate, create. Most serious instructional designers today work from the revised version. The key move in the revision is that creation sits at the top, not evaluation. That change matters, because it recognizes that generative work is the hardest cognitive task, not the easiest.
The most common fair critique of Bloom is that learning is not linear. That critique is correct, and I agree with it. Learning is not a single straight line. Learners can loop back to foundational definitions after attempting high-level creation. They can approach a concept through application before they have fully articulated a definition. They can, and often should, move between levels.
What the critique is usually missing is the distinction between linear and sequential. A sequence is not a line. A sequence is a set of dependencies. You cannot meaningfully create with a concept you have not understood, the same way you cannot synthesize variables you cannot define. The foundations are load-bearing whether or not the learner visits them in a straight line.
For an educator, Bloom is most useful as a diagnostic. When a learner stalls at analysis or creation, the instinct is to push harder at the top. The more honest move is usually to check whether the base of the taxonomy was fully built. Ninety percent of the time, the answer is no.
What AI models actually do
The episode used the phrase predictive model to describe systems like ChatGPT and Gemini. I want to unpack that more carefully, because what these models do is the whole reason the sequence question matters.
A large language model is, at its core, a system trained on a very large corpus of text to predict which token, roughly which piece of a word, is most likely to come next in a sequence. It does not know what the tokens mean in anything like the sense in which you know what words mean. It has learned statistical regularities among tokens. That is not a criticism of the technology. It is a description of what the technology is.
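The claim "predict what comes next from statistical regularities" can be shown in miniature. The toy model below is nothing like a transformer — real models use learned neural weights over billions of subword tokens, not a count table over ten words — but it demonstrates the core move: producing the statistically most likely continuation with no access to meaning at all.

```python
from collections import Counter, defaultdict

# Toy "training corpus" for a miniature next-token predictor.
corpus = "the cat sat on the mat the cat ate the food".split()

# Count how often each token follows each other token (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the statistically most likely next token.

    The model has no idea what "cat" refers to; it only knows
    which tokens tended to follow it in the corpus.
    """
    candidates = follows[token]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # "cat", because "cat" follows "the" most often
```

Everything an LLM returns is this move, scaled up enormously. The fluency comes from the scale; the absence of meaning comes from the mechanism.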
Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell named this clearly in their 2021 paper “On the Dangers of Stochastic Parrots.” Their argument was not that large language models are useless. It was that we should be precise about what they are: systems that generate plausible-sounding text based on statistical patterns, without reference to underlying truth, meaning, or context.
The practical implication, for a learner, is this. When you ask an LLM a question, the model produces text that is statistically likely given your prompt and its training data. The quality of what you get back is a function of two things: the clarity of your prompt, and the distribution of what the model has seen before. If you cannot articulate a question precisely, because you do not yet have the foundational understanding to articulate it precisely, the model will still return something. What it returns will sound authoritative. It will be structured. It will resemble expertise. The learner, lacking the foundation to evaluate what came back, will often accept it.
This is the specific danger the episode names. It is not that AI tools produce bad output. It is that they produce confident output, and a learner who skipped the foundation has no ground on which to check it.
There is empirical work beginning to measure this. Studies of students using LLMs for writing tasks have found that the surface quality of submitted work improves while measures of the students’ own conceptual understanding stall or decline. The ChatGPT-assisted draft reads better than the unassisted draft. The student who wrote it learned less. That is the pattern this essay, and the episode, are trying to name.
The cultural context the model inherits
There is a second risk the episode raises, one that is not reducible to the sequence problem. AI models are trained on human language at scale. Human language at scale is not a neutral sample of reality. It is a record of which voices have had the power to put their language into forms that were recorded, preserved, digitized, and scraped into training sets.
Safiya Noble’s 2018 book Algorithms of Oppression documented this in a domain people had treated as obviously objective: search. Her work showed that search results for queries about Black girls, Latino families, and other non-dominant groups were systematically dehumanizing, not because the engineers were bigoted, but because the underlying data was. Joy Buolamwini’s Gender Shades research at the MIT Media Lab, conducted with Timnit Gebru, found that commercial facial-recognition systems had error rates on dark-skinned women an order of magnitude worse than on light-skinned men. Same technology, same company, wildly different performance across populations, for the same structural reason.
The large language models people now use as learning tools inherit the same substrate. They have been trained on text that reflects which histories were written down, which voices were published, which interpretations were treated as canonical. What the episode calls grand narratives, the stories a dominant group tells about itself and treats as history, are heavily represented in the training data. What the chapter I wrote calls mini-narratives, the accounts of non-dominant communities, often oral, often confined to community publications that were never scraped at the same scale, are under-represented.
When a learner who has not been taught to look for mini-narratives asks an AI tool to summarize a culturally complex topic, the tool produces an answer that sounds neutral and reflects the distribution of its training data. The learner takes the grand narrative as the neutral one. The mini-narrative is silent, because nothing in the interface reveals its absence.
This is why I wrote in the episode that cultural context is not an add-on to AI literacy. It is AI literacy. A learner who knows to ask “whose voice is missing from this summary” is protected against a class of errors that a learner who has only been taught prompt engineering is not.
What sequence-respecting looks like, in practice
The episode offered a short list of prompts that put AI in its right place. I want to put a longer version on the page, because these are the moves I actually teach educators and graduate students.
Build the definition first. “Define these five terms in plain language. Give one example and one non-example for each. Flag any definitions that are contested in the field.”
Check understanding before moving on. “Ask me five questions to check whether I understand the basics. Do not give me the answers until I have answered yours.”
Surface common misconceptions. “Show me two or three misconceptions people commonly hold about this topic. Explain why each one is wrong, and give me a way to test whether I am holding it.”
Generate graduated practice. “Produce a practice set on this topic that starts with the simplest case and increases in complexity over ten items. Do not solve them for me.”
Pluralize the frame. “Summarize three perspectives on this issue from communities that experience it differently. Name whose perspective each one represents and what the perspective depends on.”
Ask the tool to audit itself. “What might be missing from your previous answer? Whose voice or experience is not represented here?”
That last move is the one most people skip. A model will often acknowledge, when asked directly, that it is working from incomplete or biased data. It will not volunteer the acknowledgment. The learner has to ask.
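For readers who script their AI use rather than typing into a chat window, the sequence above can be encoded directly, so the tool cannot be asked for the top of the taxonomy before the base is built. Everything here is a sketch under stated assumptions: `ask_model` is a hypothetical stand-in for whatever chat API you use, and the prompts are abbreviated versions of the ones listed above.

```python
# A sketch of sequence-respecting AI use. `ask_model` is a hypothetical
# stand-in for any chat-completion call; wire it to your own provider.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("connect this to your LLM provider")

# Prompts ordered from the base of Bloom's taxonomy upward.
SEQUENCE = [
    "Define these five terms in plain language, with one example "
    "and one non-example each. Flag contested definitions.",
    "Ask me five questions to check whether I understand the basics. "
    "Do not give me the answers until I have answered yours.",
    "Show me two or three common misconceptions about this topic, "
    "and give me a way to test whether I am holding each one.",
    "Produce a ten-item practice set, simplest case first. "
    "Do not solve the items for me.",
    "Summarize three perspectives on this issue from communities that "
    "experience it differently, naming whose perspective each one is.",
    "What might be missing from your previous answers? "
    "Whose voice or experience is not represented?",
]

def run_sequence(topic: str) -> list[str]:
    """Walk the prompts in order; the learner works between each step."""
    return [ask_model(f"Topic: {topic}\n{prompt}") for prompt in SEQUENCE]
```

The design choice worth noticing is that the audit prompt is a fixed final step, not an optional extra: the script asks the question the learner would otherwise forget to ask.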
What this leaves us with
The single sentence from the episode I would send home with a reader is this one: AI should compress time, not compress development.
Compressing time is the honest promise of these tools. Looking up a definition that would have taken an afternoon in a library takes a second now. Generating a practice problem set that would have taken a teacher an hour takes a minute now. Finding relevant examples, pulling quotes, summarizing long readings, the time costs drop across the board. Used this way, AI is a legitimate accelerator.
Compressing development is the trap. A learner who bypasses the basic level, asks the model for the polished product at the top of Bloom, and turns in what came back has not developed. They have transacted. The transaction looks efficient. The development is hollow. And because the hollow looks like the developed, from the outside, the loss is nearly invisible until it matters.
Cultural context adds one more layer on top of this. A learner who has the foundation can interrogate what the model returns. A learner who also has cultural context can interrogate whose voice the model returned. A learner who has neither is receiving, quietly, the grand narrative, and calling it truth.
The work, for educators, is to design AI use that honors both layers. Build the sequence. Build the cultural-context check. Then let the tool do the parts that genuinely compress time, and not the parts that would be development.
The work, for learners, is the mirror of that. Define the terms yourself. Build the frame yourself. Practice the moves yourself. Ask whose voice is in the answer and whose voice is not. Then, and only then, use the model for higher-level creation and synthesis.
That is what the episode was trying to say in its last minute, and the written version has now said at length. The sequence matters. The context matters. And the struggle, if you let it happen, is the thing that builds the learner.
This is the second episode of Season 1. If you have not listened to Episode 1, the NotebookLM version of this same set of writings, it is worth listening to the two back-to-back. The comparison is the point.
DEB