The Four Layers of Intelligence

The four layers build on each other. Automation sits at the bottom. Rule-based AI sits above automation. Machine learning sits above rule-based AI. Deep learning sits above machine learning. Each layer adds a capability the layer below does not have, and each higher layer tends to depend on the lower layers working properly to deliver its value. Working through the layers from bottom to top is the clearest way to understand what distinguishes them.

Automation

Automation is the execution of fixed instructions without learning or judgment. A system given a specific sequence of steps will perform that sequence exactly the same way every time the triggering conditions are met. An email autoresponder that sends the same reply to every incoming message at a given address is an automation. A scheduled job that runs every night to back up a database is an automation. A workflow that creates a ticket in a support system whenever a form is submitted is an automation.

Automation works through explicit logic that a human has written. Someone decided what counts as a trigger, what actions should follow, and what rules determine which action applies in which circumstance. The system then executes that logic reliably. The value of automation is speed and consistency. A task that would take a human ten minutes and occasionally be done incorrectly can be completed in milliseconds every time, with no variation across runs.

Automation has no capacity to handle situations its designer did not anticipate. If a form arrives in an unexpected format, the automation will either fail or produce wrong output. If the upstream system changes its data structure, the automation will break until someone updates the logic. Automation does not adapt, and it does not improve from experience. It does exactly what it was told to do, which is both its strength and its limit.
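The form-to-ticket workflow mentioned earlier can be sketched in a few lines. This is a minimal illustration rather than any real support-system API: the function name `create_ticket` and the field names `email` and `message` are assumptions made for the example. It shows the same fixed logic succeeding on the form its designer expected and failing on a form with an unanticipated shape.

```python
def create_ticket(form: dict) -> dict:
    """Fixed automation: every submitted form becomes a ticket the same way."""
    # The designer assumed every form carries exactly these two fields.
    return {
        "requester": form["email"],
        "summary": form["message"][:40],
        "status": "open",
    }


expected_form = {"email": "ana@example.com", "message": "Cannot log in"}
ticket = create_ticket(expected_form)

# A form in an unanticipated shape: the contact field has a different name.
unexpected_form = {"contact": "ana@example.com", "message": "Cannot log in"}
try:
    create_ticket(unexpected_form)
    handled = True
except KeyError:
    # The automation does not adapt; it fails until a human updates the logic.
    handled = False
```

Nothing in the function inspects or interprets the form. It runs the same steps every time, which is why it is fast and consistent on expected input and helpless on anything else.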

Automation is not artificial intelligence in any meaningful sense, and distinguishing automation from AI is the first step in using the four-layer framework well. Much of what gets marketed as "AI automation" is actually just automation, sometimes with a small AI component attached. A system that extracts structured data from forms and routes it through a workflow is mostly automation; the AI component may be a single step that reads the form and converts it into structured fields. Understanding which parts of a workflow are automation and which parts are AI helps a practitioner predict where the system will be reliable and where it may fail.

Rule-Based AI

Rule-based AI extends automation by adding decision logic. A rule-based AI system applies a set of rules, written by humans, to make choices between paths based on the situation it encounters. A chatbot that follows a decision tree to diagnose a customer's problem is a rule-based AI system. A fraud detection system that flags transactions matching specific patterns is a rule-based AI system. The expert systems of the 1980s described in Module 1.1 were rule-based AI.

Rule-based AI differs from pure automation because it makes decisions rather than just executing steps. Automation says "when this trigger fires, do these actions in this order." Rule-based AI says "when this situation arises, evaluate these conditions and take the action that matches." The rules encode knowledge about the domain, and the system applies that knowledge to specific cases.
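The difference can be made concrete with a small sketch of a rule-based transaction check. The thresholds, field names, and action labels below are hypothetical and chosen only for illustration. The point is the shape of the system: a list of human-written conditions, evaluated against each case, with the first matching rule deciding the action.

```python
from typing import Callable

# Each rule pairs a human-written condition with the action to take.
# Thresholds and field names are illustrative assumptions.
Rule = tuple[Callable[[dict], bool], str]

RULES: list[Rule] = [
    (lambda t: t["amount"] > 10_000, "flag_for_review"),
    (lambda t: t["country"] != t["home_country"] and t["amount"] > 2_000, "flag_for_review"),
    (lambda t: t["amount"] <= 0, "reject"),
]


def decide(transaction: dict, default: str = "approve") -> str:
    """Return the action of the first rule whose condition matches."""
    for condition, action in RULES:
        if condition(transaction):
            return action
    return default


large = decide({"amount": 15_000, "country": "FR", "home_country": "FR"})
foreign = decide({"amount": 3_000, "country": "US", "home_country": "FR"})
routine = decide({"amount": 50, "country": "FR", "home_country": "FR"})
```

All of the system's knowledge lives in `RULES`. Changing its behaviour means a person editing that list, which is the property the following paragraphs build on.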

Rule-based AI works well when the domain is stable, the rules can be stated explicitly, and the edge cases are known in advance. A compliance check that verifies whether a transaction meets regulatory thresholds is well suited to rule-based AI, because the regulations are explicit and the conditions can be enumerated. A customer service script that handles common requests is well suited to rule-based AI, because the expected requests can be anticipated and the appropriate responses can be scripted.

Rule-based AI fails in the same situations that defeated the expert systems of the 1980s. When the domain involves judgment that cannot be reduced to explicit rules, when the edge cases outnumber the typical cases, or when the situation changes faster than the rules can be updated, rule-based AI becomes brittle. A rule-based customer service system will handle the cases it was designed for and fail on anything outside those patterns. A rule-based medical diagnostic system will work within the specific conditions the rules cover and will not generalise beyond them.

The limit of rule-based AI is the same limit that ended the first wave of AI in the 1980s. Human experts cannot articulate most of what they know. A system whose intelligence comes entirely from explicitly written rules will be limited by what humans can write down, which is less than what humans know. The next layer of the stack addresses this limit by changing how the system acquires its knowledge.

Machine Learning

Machine learning systems acquire their knowledge from data rather than from explicitly written rules. Given a large collection of examples, a machine learning system identifies the statistical patterns that distinguish different kinds of cases, and then applies those patterns to new cases it has not seen before. The system's knowledge sits in its internal parameters, which are adjusted through training so that the system produces the right outputs on the training examples.

The shift from rules-written-by-humans to patterns-learned-from-data is a fundamental change in how AI systems work. A machine learning system that predicts customer churn does not use rules that a human wrote down about what causes churn. It learns the patterns from historical data, identifies which combinations of signals correlate with customers who left, and uses those patterns to score current customers. The system may identify patterns that the humans who built it did not know were predictive, and it may apply those patterns more consistently than any human analyst could.

Machine learning comes in several varieties, distinguished by what kind of information the system gets during training.

Supervised learning is the most common type. The system is given inputs paired with the correct outputs, and it learns to produce the right output for new inputs. A spam filter trained on emails that have been labelled as spam or not-spam is a supervised learning system. Credit risk scoring, medical image classification, customer churn prediction, and most commercial machine learning applications are supervised. Supervised learning requires labelled data, which is often the expensive part, because someone has to produce the correct answers for the training examples.
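A spam filter of this kind can be sketched with nothing but word counts. The example below is a deliberately tiny naive Bayes classifier, used here as one simple supervised technique among many; the four training messages are invented for illustration, and a production filter would need far more data. Note that no rule about any particular word is written anywhere: the labelled examples alone determine the behaviour.

```python
import math
from collections import Counter

# Tiny hand-made labelled dataset (illustrative, not real data).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project agenda", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the higher naive Bayes log-score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
first = classify("claim your free prize", *model)
second = classify("monday project meeting", *model)
```

With this toy data, `first` comes out as `"spam"` and `second` as `"ham"`, purely because of which words appeared under which label in the training examples.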

Unsupervised learning is used when labelled data is not available or when the goal is to find structure the humans building the system had not anticipated. The system is given inputs without labels and learns to identify patterns or groupings in the data itself. A customer segmentation system that groups customers by purchasing behaviour without being told what groups to produce is unsupervised. So is a fraud detection system that identifies transactions that look statistically unusual without being told what counts as fraud. Unsupervised learning is useful for discovery and for cases where the relevant structure is not yet known, and it tends to produce outputs that require more human interpretation than supervised systems.
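The segmentation idea can be illustrated with a stripped-down, one-dimensional version of k-means clustering. This is a sketch under simplifying assumptions: a single feature (monthly spend), exactly two groups, and a deterministic starting guess. The spend figures are invented. No labels are supplied; the groups emerge from the data.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups around two learned centres."""
    centres = [min(values), max(values)]  # deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of the values assigned to it.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


monthly_spend = [10, 12, 95, 11, 100, 98]  # toy figures, not real customers
centres, groups = two_means(monthly_spend)
```

The algorithm returns two groups, low spenders and high spenders, but it does not say what the groups mean or whether two was the right number. That interpretation is left to a person, which is the extra human effort the paragraph above describes.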

Reinforcement learning is used when the system needs to learn through trial and error in an environment. The system takes actions, receives rewards or penalties based on the outcomes, and adjusts its behaviour over time to accumulate more reward. A trading algorithm that learns which actions produce profit across many simulated trades is a reinforcement learning system. Game-playing AI is typically built with reinforcement learning. Reinforcement learning has specific technical requirements (the environment has to provide feedback in a form the system can use) and is less common in commercial applications than supervised learning.
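The trial-and-error loop can be sketched with the simplest reinforcement learning setting, a two-action "bandit" problem solved with an epsilon-greedy strategy. Everything here is an assumption made for the demonstration: the two hidden reward levels, the noise range, the exploration rate, and the choice to try each action once before exploring at random. Real applications involve far richer environments.

```python
import random


def run_bandit(steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which of two actions pays more."""
    rng = random.Random(seed)
    true_mean = [0.2, 1.0]   # hidden from the learner; assumed for the demo
    estimates = [0.0, 0.0]   # learner's running estimate of each action's reward
    counts = [0, 0]

    def pull(action):
        # Environment feedback: the true mean plus bounded noise.
        return true_mean[action] + rng.uniform(-0.1, 0.1)

    for step in range(steps):
        if step < 2:
            action = step                              # try each action once
        elif rng.random() < epsilon:
            action = rng.randrange(2)                  # explore at random
        else:
            action = estimates.index(max(estimates))   # exploit the best estimate
        reward = pull(action)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit()
```

The learner is never told which action is better. It ends up with a higher estimate for the second action only because the environment's feedback pushed its estimates in that direction.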

A concrete worked example illustrates how supervised learning operates in practice. Consider a machine learning system that predicts which of a firm's customers are likely to cancel their subscription in the next month. The training data consists of historical records of customers who either did or did not cancel, along with information about each customer at the time (tenure with the firm, usage patterns, support ticket history, demographic data, the pricing plan they are on, whether they have recently been served by a specific account manager). The system is trained by showing it these historical records and asking it to predict the cancellation outcome for each one. The training process adjusts the system's internal parameters until it produces accurate predictions on the historical data. Once trained, the system can be given the current data for a customer who has not yet cancelled and produce a probability score estimating how likely that customer is to cancel in the next month. The firm can then use these scores to direct retention efforts toward the highest-risk customers.

What the system has learned, in this case, is which combinations of signals correlated with cancellation in the historical data. It may have learned that customers whose usage dropped by more than a certain percentage in the past month were more likely to cancel. It may have learned that customers who had recently interacted with support about billing issues were at elevated risk. It may have learned patterns that combine many weaker signals in ways no human analyst would have specified. The system does not know why these patterns hold. It only knows that they hold in the training data, and it applies them to current cases.
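A scaled-down version of the churn example can be written as a logistic regression trained by gradient descent. This is a sketch, not the method any particular firm uses: it keeps only two of the signals mentioned above (a drop in usage and a recent billing ticket), and the eight historical records are invented and far cleaner than real data would be. The step count and learning rate are likewise arbitrary choices.

```python
import math

# Hand-made toy records: (usage_drop, billing_ticket, cancelled).
# usage_drop is the fractional fall in usage last month (negative = usage grew).
HISTORY = [
    (0.50, 1, 1), (0.40, 0, 1), (0.30, 1, 1), (0.60, 0, 1),
    (-0.10, 0, 0), (0.00, 0, 0), (-0.20, 0, 0), (-0.05, 0, 0),
]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(records, steps=500, learning_rate=0.5):
    """Adjust the parameters by gradient descent so predictions match outcomes."""
    w_drop, w_ticket, bias = 0.0, 0.0, 0.0
    n = len(records)
    for _ in range(steps):
        g_drop = g_ticket = g_bias = 0.0
        for drop, ticket, cancelled in records:
            # Error is the predicted probability minus the actual outcome.
            error = sigmoid(w_drop * drop + w_ticket * ticket + bias) - cancelled
            g_drop += error * drop / n
            g_ticket += error * ticket / n
            g_bias += error / n
        w_drop -= learning_rate * g_drop
        w_ticket -= learning_rate * g_ticket
        bias -= learning_rate * g_bias
    return w_drop, w_ticket, bias


def churn_score(params, drop, ticket):
    """Probability-like score for a current customer."""
    w_drop, w_ticket, bias = params
    return sigmoid(w_drop * drop + w_ticket * ticket + bias)


params = train(HISTORY)
high_risk = churn_score(params, drop=0.45, ticket=1)
low_risk = churn_score(params, drop=-0.10, ticket=0)
```

The three numbers in `params` are the whole of what the system "knows". They encode that usage drops and billing tickets went together with cancellation in the historical records, with no account of why.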

Machine learning works best when the task involves finding patterns in data, the patterns are stable enough to persist across the training set and the new cases the system will encounter, and high-quality labelled data is available in quantities sufficient for training. Credit risk scoring, demand forecasting, fraud detection, churn prediction, and recommendation systems are typical machine learning applications, all of which involve predicting an outcome from a pattern of signals that can be learned from historical examples.

Machine learning has specific limits that matter for practitioners. The system can only learn patterns that exist in its training data. If the training data is biased, the patterns the system learns will reflect that bias. If the world changes after the system is trained (a pandemic, a regulatory change, a new competitor), the data the system encounters no longer resembles the data it learned from, a phenomenon called distribution shift, and the learned patterns may no longer predict correctly. Machine learning systems can also latch onto patterns that are statistical coincidences rather than causal relationships, which produces systems that work well on historical data and fail when the coincidences do not hold in new situations.
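The effect of a changed world on a trained system can be shown with a toy model that learns a single usage threshold. All figures are invented for illustration, and the "model" is intentionally crude: it predicts cancellation whenever usage falls below a cut-off learned from historical data. The sketch assumes a change, such as a new pricing plan, that roughly halves everyone's usage.

```python
from statistics import mean


def learn_threshold(stayers, cancellers):
    """Learn a cut-off halfway between the two groups' average usage."""
    return (mean(stayers) + mean(cancellers)) / 2


def accuracy(threshold, stayers, cancellers):
    """Share of customers classified correctly by 'usage below threshold means cancel'."""
    correct = sum(u >= threshold for u in stayers) + sum(u < threshold for u in cancellers)
    return correct / (len(stayers) + len(cancellers))


# Toy monthly-usage figures, invented for illustration.
train_stayers, train_cancellers = [40, 45, 50, 55], [10, 15, 20, 25]
threshold = learn_threshold(train_stayers, train_cancellers)

# After the change, every customer uses the product roughly half as much.
shifted_stayers, shifted_cancellers = [20, 22, 25, 28], [5, 7, 10, 12]

before = accuracy(threshold, train_stayers, train_cancellers)
after = accuracy(threshold, shifted_stayers, shifted_cancellers)
```

On the historical data the learned threshold classifies every customer correctly. On the changed data it predicts that everyone will cancel, so accuracy falls to one half even though the model itself has not changed at all.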

Machine learning is a broad category that includes many specific techniques. Decision trees learn by asking successively narrower yes-or-no questions about the input. Linear regression learns a weighted combination of inputs that predicts an output. Ensemble methods combine the predictions of many simpler models. These techniques all share the underlying principle of learning patterns from data, and they all sit within the machine learning layer of the stack. The layer above, deep learning, refers specifically to a subset of machine learning techniques that have produced the breakthroughs of the past decade.
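As an illustration of one of these techniques, linear regression with a single input can be fitted in closed form using ordinary least squares. The sketch below uses toy data constructed to lie exactly on the line y = 2x + 1, so the learned slope and intercept are easy to check. The function name `fit_line` is an assumption for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How strongly x and y move together, relative to how much x varies.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Toy data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 10 + intercept  # apply the learned pattern to a new input
```

The "weighted combination of inputs" here is a single weight (the slope) plus an intercept, both derived from the examples and not written by hand.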

Deep Learning

Deep learning is machine learning built on a specific kind of mathematical structure: a neural network with many layers. A neural network consists of small computational units (loosely inspired by neurons in the brain, though the resemblance goes no further than that) arranged in layers. Each unit takes inputs from the units in the layer below, applies a mathematical operation to them, and passes its output to the units in the layer above. The network as a whole transforms input data through successive layers of processing, with each layer learning progressively more abstract patterns from the output of the layer below.
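The layer-by-layer flow can be sketched with a network small enough to trace by hand. This example is far from "deep": it has one hidden layer of two units, and its weights are hand-chosen for illustration, not learned through training. It is included only to show what "takes inputs, applies an operation, passes output upward" means in practice. With these particular weights the network computes exclusive-or, a function that no single unit of this kind can compute on its own.

```python
def relu(x):
    """A common unit operation: pass positive values through, clip negatives to zero."""
    return max(0.0, x)


def layer(inputs, weights, biases, activation):
    """One layer: each unit takes a weighted sum of the layer below, then applies an activation."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def tiny_network(x1, x2):
    # Hidden layer with two units; weights are hand-set, not trained.
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1], activation=relu)
    # Output layer with one unit that combines the hidden units.
    (output,) = layer(hidden, weights=[[1, -2]], biases=[0], activation=lambda v: v)
    return output


results = [tiny_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

In a real deep learning system the structure is the same but there are many more layers and units, and the weights are set by training on data instead of being chosen by a person.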

The key property of deep learning is that the network learns the useful representations of the data automatically, rather than requiring humans to specify what features to look at. In traditional machine learning, a human would decide which features of an email matter for spam detection (sender domain, presence of certain words, number of links) and the system would learn to weight those features. In deep learning, the system is given the raw text and learns for itself which features matter, often identifying signals that humans would not have thought to specify.
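The traditional approach can be made concrete with a sketch of hand-specified features for an email. The function name, the three features, and the sample message are all assumptions chosen to mirror the examples in the paragraph above. The function is the part a human has to design; a traditional machine learning model would then learn how much weight to give each feature.

```python
import re


def handcrafted_features(sender: str, body: str) -> dict:
    """Features a human decided might matter for spam detection."""
    return {
        "sender_domain": sender.rsplit("@", 1)[-1].lower(),
        "num_links": len(re.findall(r"https?://", body)),
        "mentions_free": int("free" in body.lower().split()),
    }


sender = "promo@Deals.example"
body = "Get a FREE gift http://a.example and https://b.example"

# Traditional machine learning: the model sees only what the human extracted.
features = handcrafted_features(sender, body)

# Deep learning: the model is handed the raw text and must find its own features.
raw_input = body
```

Any signal the designer did not think to extract is invisible to the traditional model. A deep learning system working from `raw_input` has no such restriction, at the cost of needing much more data to find the useful signals itself.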

This property matters most for tasks involving unstructured data, which is data that does not come in neat rows and columns. Text, images, audio, and video are all unstructured. A human cannot efficiently specify which features of a photograph matter for identifying a cat, because the features are spatial patterns of pixels that defy simple description. A deep learning system, given enough labelled examples, learns to identify those patterns itself.

Deep learning systems come in several architectures, each suited to different kinds of data. Convolutional neural networks were developed for image recognition and work by applying the same pattern-detection operations across different regions of an image. Recurrent neural networks and their successors were developed for sequential data like speech or time series. Transformer architectures, which emerged in 2017 and power the current generation of large language models, were developed for language and have since been adapted for images, audio, and multimodal tasks. Other architectures exist for specific applications, and the practitioner rarely needs to know which architecture a particular tool uses. What matters is that the tool is a deep learning system, which means it learns patterns from data through many layers of processing and is well suited to tasks involving unstructured inputs.
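The idea of applying the same pattern-detection operation across different regions can be shown in one dimension. The sketch below slides a two-value filter along a row of pixel intensities; strictly speaking this is cross-correlation, which is the operation deep learning libraries conventionally call convolution. The filter values are hand-chosen here for illustration, whereas a convolutional network learns its filters from data.

```python
def slide_filter(signal, kernel):
    """Apply the same small kernel at every position along the signal."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + width]))
        for i in range(len(signal) - width + 1)
    ]


row_of_pixels = [0, 0, 1, 1, 1, 0, 0]
edge_detector = [-1, 1]   # hand-chosen here; a trained network learns its kernels
edges = slide_filter(row_of_pixels, edge_detector)
```

The output is non-zero only where the intensity changes, so one small filter detects an edge wherever it occurs in the row. Reusing the same weights at every position is what makes the approach efficient for images.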

The AI tools that most professionals currently encounter, including ChatGPT, Claude, Gemini, and similar systems, are deep learning systems based on the transformer architecture. They sit at the top of the four-layer stack. The remainder of this module explains what these systems are and how they work, because understanding them in detail is what allows a practitioner to use them well.