The history of AI is a history of three distinct technical approaches. The first two each produced real capability and then hit a wall that the approach could not overcome. Understanding the shape of this history explains why the current moment is different from the two earlier periods when AI appeared to be breaking through.
Wave One: Rules Written by Humans
From the 1950s through the 1980s, AI research focused on building systems whose intelligence came from rules that humans wrote down. If a researcher wanted an AI system to diagnose a disease, they would consult medical experts, elicit the rules those experts used to reach diagnoses, and encode those rules in the system. The system would then apply the rules to new cases and produce diagnoses. This approach was called symbolic AI, and the specific systems built this way were called expert systems.
Expert systems worked for narrow, well-defined problems. A system that diagnosed a specific class of infections, or that configured computer systems from a catalogue of components, could reach expert-level performance within its narrow domain. By the 1980s, expert systems were being deployed in commercial settings and generating serious commercial interest.
The approach hit a wall that turned out to be fundamental. Human experts cannot articulate most of what they know. A doctor can diagnose a complex case through a process that combines explicit medical knowledge with intuition, pattern recognition, and judgment developed over decades of practice. The explicit knowledge can be written down, but the intuition and pattern recognition cannot, because the experts themselves are not consciously aware of the rules they are applying. As researchers tried to extend expert systems to broader domains, they discovered that the written-rule approach could not capture the depth of real expertise. The systems became brittle: they handled anticipated cases well and failed unpredictably on anything outside the patterns their rules had been designed for.
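The rule-application idea and its brittleness can be illustrated with a minimal sketch. The rules, symptom names, and the diagnose function below are hypothetical and invented for illustration. They are not taken from any real expert system and are not medical guidance.

```python
# Hypothetical hand-written rules: (required symptoms, conclusion).
# Invented for illustration only; not real medical knowledge.
RULES = [
    ({"fever", "cough", "chest_pain"}, "suspected pneumonia"),
    ({"fever", "stiff_neck"}, "suspected meningitis"),
    ({"sneezing", "runny_nose"}, "common cold"),
]


def diagnose(symptoms):
    """Return the conclusion of the first rule whose conditions are all present."""
    observed = set(symptoms)
    for conditions, conclusion in RULES:
        if conditions <= observed:  # every required symptom was observed
            return conclusion
    return None  # no rule fires, so the system has nothing to offer


# A case the rule authors anticipated.
print(diagnose(["fever", "cough", "chest_pain", "fatigue"]))  # suspected pneumonia

# A case outside the written rules.
print(diagnose(["fatigue", "weight_loss"]))  # None
```

The second call shows the limit in miniature: the system knows only what someone wrote down, so a case that no rule anticipated produces no useful answer.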
The research community concluded that the rules-written-by-humans approach had reached its limits. Funding contracted, commercial interest faded, and the period that followed became known as an AI winter. The basic insight that remained from this wave is that intelligence is not reducible to rules humans can explicitly articulate, and any AI system whose intelligence comes entirely from written rules will be limited by what humans can write down.
Wave Two: Learning From Data
The second wave of AI, which developed from the late 1980s through the 2010s, took a different approach. Instead of asking experts to write down rules, researchers designed systems that could learn patterns directly from data. A machine learning system would be given thousands of examples of a task, along with information about what the correct output was for each example, and would adjust its internal parameters until it could produce correct outputs on new examples it had not seen before. The specific mathematical techniques varied. What they shared was the shift from rules-written-by-humans to patterns-learned-from-data.
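The idea of adjusting internal parameters until the outputs are correct can be sketched with one of the simplest learning methods, a straight-line fit by gradient descent. This is an illustrative assumption, not a description of any particular historical system. The data, the train function, and the step count and learning rate are all hypothetical choices made for this example.

```python
# Training examples: each input is paired with its correct output.
# The underlying relationship (y = 2x + 1) is never given to the learner.
examples = [(x, 2 * x + 1) for x in range(10)]


def train(examples, steps=5000, learning_rate=0.01):
    """Fit y = w * x + b by repeatedly nudging w and b to reduce squared error."""
    w, b = 0.0, 0.0  # internal parameters start with no knowledge
    n = len(examples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(examples)

# x = 20 never appeared in the training examples.
prediction = w * 20 + b
print(w, b, prediction)
```

No rule was written by hand here. The parameters w and b end up close to 2 and 1 because those values fit the examples, and the fitted line then gives a sensible answer for an input the system never saw.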
Machine learning required two resources that had been scarce during the first wave. It required large quantities of data, and it required substantial computing power to process that data. The internet provided the first. From the late 1990s onward, the volume of data available for training grew faster than at any earlier point in history. Digitised documents, photographs, video, and structured records accumulated in quantities that made machine learning practical for tasks that had been impractical a decade earlier.
The computing power came from a specific kind of hardware. Graphics processing units, or GPUs, had been developed for rendering video game graphics. They were designed to perform many simple calculations in parallel, which is what video graphics requires. In the mid-2000s, researchers discovered that machine learning algorithms also required many simple calculations in parallel, and that GPUs could therefore run machine learning training substantially faster than ordinary computer processors. This discovery turned out to be transformative, because it made it practical to train much larger models on much larger datasets than had been possible before.
A particular class of machine learning technique, called deep learning, benefited most from this combination. Deep learning systems use many layers of computation, each layer learning progressively more abstract patterns from the output of the layer below. The core ideas behind deep learning had been understood for decades, but the approach remained impractical at useful scale until both the data and the computing power became available. In 2012, a deep learning system won the ImageNet image recognition competition by a margin large enough that the research community understood something significant had changed. Within three years, deep learning was producing results that had been considered decades away. By 2016, deep learning systems were performing at or above human level on a range of tasks including image recognition, speech recognition, and the game of Go.
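The way one layer builds on the output of the layer below can be shown with a toy two-layer network. This is a sketch under strong simplifying assumptions: the weights are set by hand rather than learned, the network has only three units, and all names (step, layer, xor_network) are hypothetical. Real deep learning systems learn their weights from data and use far more layers and units.

```python
def step(x):
    """Simplest possible activation: a unit fires (1) if its input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: each unit takes a weighted sum of the inputs, then fires or not."""
    return [
        step(sum(w * i for w, i in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hidden layer, hand-set for illustration:
#   unit 0 detects "a OR b", unit 1 detects "a AND b".
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]

# Output layer combines the hidden patterns: "OR but not AND", which is XOR.
OUTPUT_WEIGHTS = [[1, -1]]
OUTPUT_BIASES = [-0.5]


def xor_network(a, b):
    hidden = layer([a, b], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The output unit never looks at the raw inputs. It works only with the two simpler patterns the hidden layer detected, which is the layering idea at the smallest possible scale.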
This wave produced substantive capability for narrow tasks. A deep learning system could be trained to recognise faces, translate between languages, transcribe speech, or identify medical conditions in scans, and could reach professional-level performance. The limits of this wave appeared when researchers tried to build systems that worked across tasks rather than specialising in one. A system trained to recognise faces could not translate languages. A system trained to translate could not summarise. Each capability required a separately trained system, and the work of training each system was substantial. The second wave therefore produced many narrow AI tools and no general-purpose AI that could handle language and reasoning across different kinds of professional work.
Wave Three: The Transformer
The third wave began in 2017 with a research paper titled "Attention Is All You Need." The paper introduced a new architecture for machine learning systems that worked with language, called the transformer. The transformer produced better results than previous approaches on language tasks, and that alone would have been a meaningful advance. What made the transformer a step change rather than an incremental improvement was that the architecture scaled in a specific way that previous architectures did not.
Previous language AI systems tended to reach a ceiling in performance that additional data and additional computation could not raise. A system that could translate adequately could not be made to translate substantially better simply by training it on more data or giving it more computational resources. Performance flattened. The transformer did not flatten in the same way. As researchers trained larger transformer models on larger datasets with more computing power, the models kept improving, with no plateau appearing across the scales that were tested. A larger transformer, trained on more data with more compute, would reliably perform better than a smaller one. This property, which came to be called scaling, meant that in practice the performance of transformer-based systems was bounded less by the architecture than by the resources a research team could invest.
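One common way to picture scaling is as a power law, in which a model's error falls by a roughly constant fraction each time the resources are multiplied by a fixed factor. The sketch below assumes that simplified form. The predicted_loss function and the constants a and alpha are hypothetical values chosen for illustration, not measurements from any real model.

```python
def predicted_loss(compute, a=10.0, alpha=0.1):
    """Illustrative power law: loss = a * compute ** (-alpha).

    The constants a and alpha are invented for this example.
    """
    return a * compute ** (-alpha)


# Each row uses ten times the compute of the row above it.
for exponent in range(0, 7, 2):
    compute = 10.0 ** exponent
    print(f"compute = 1e{exponent}: loss = {predicted_loss(compute):.3f}")

# Under this assumed form, every tenfold increase in compute multiplies
# the loss by the same factor, however large the starting point is.
ratio_small = predicted_loss(1e3) / predicted_loss(1e2)
ratio_large = predicted_loss(1e9) / predicted_loss(1e8)
print(ratio_small, ratio_large)
```

The point of the sketch is the shape rather than the numbers. Under a curve of this kind each additional investment buys a further improvement, which is the property the paragraph above describes.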
This observation set off the development that produced the AI tools the professional is using today. Between 2018 and 2024, a series of progressively larger transformer-based systems demonstrated that scaling continued to produce new capabilities. A system trained at one scale could not hold a coherent conversation. A system trained at a hundred times that scale could draft complex documents, follow multi-step instructions, and adapt its output to the specific situation of the request. The capability gains were not just quantitative but qualitative. At sufficient scale, transformer-based systems acquired capabilities that had not been present in smaller versions, including the ability to follow instructions the models had not been explicitly trained to follow.
The systems that resulted from this scaling process are called large language models, or LLMs. When a professional uses an AI tool today, whether for drafting, analysis, summarisation, or extraction, they are almost certainly using a large language model or a system built around one. Understanding what a large language model is and how it works is therefore the last piece of this module, and the one with the most practical relevance to the professional's work.