In the previous section, you explored the four foundational layers of digital intelligence: Automation, Artificial Intelligence, Machine Learning, and Deep Learning. The next step is to look underneath those layers and ask a simple question: how do these systems actually learn?
Learning in machines is not magic. It is a structured process of observing data, spotting patterns, making predictions, checking where those predictions went wrong, and then adjusting. In spirit, this is similar to how people learn from experience, but machines do it at a very different speed and scale. A person might need months or years of practice to become fluent in a task. A learning system can run through millions of practice rounds in a short period of time, improving a little with each cycle.
Once you understand this process, AI stops looking like a black box and starts to resemble a well organised feedback loop. Data goes in, the system makes a guess, compares it with reality, and updates itself so that the next guess is slightly better. Over time, this repeated improvement builds real capability.
By the end of this section, you will see that learning in AI is a measurable and teachable process. The same principles that guide basic models also support advanced platforms such as Cyrenza, which improve as they are exposed to more tasks, more corrections, and more real-world usage.
1.2.2.1. What “Learning” Means for a Machine
A machine never forms self-awareness or inner experience. Its behaviour comes from patterns it detects in data. When large collections of examples are analysed, the system identifies combinations of inputs that often appear before specific outcomes. A weather model, for instance, can learn that certain mixes of humidity, temperature, and pressure frequently precede rainfall. This type of learning is mathematical and relies on repeated exposure to consistent patterns.
Learning takes place through a continuous cycle. The system receives input such as images, text, numbers, audio, or sensor readings. It examines the information for structure and regularities. It produces a prediction based on what it has learned so far. It then receives feedback that shows the distance between its prediction and the correct answer. Using this feedback, the system adjusts its internal settings. This cycle runs many times, and each round contributes small improvements that accumulate into stronger performance.
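This cycle can be made concrete with a minimal sketch in Python. The tiny dataset and the single adjustable setting below are invented for illustration; real systems adjust millions of settings in exactly the same way:

```python
# A minimal sketch of the learning cycle: predict, measure error, adjust.
# The data encodes a hidden rule (y = 2x) that the system must discover.
data = [(1, 2), (2, 4), (3, 6)]

w = 0.0        # the model's single internal setting, initially uninformed
lr = 0.05      # how strongly each piece of feedback adjusts the setting

for _ in range(200):             # many rounds of practice
    for x, y in data:
        prediction = w * x       # make a guess from current knowledge
        error = prediction - y   # feedback: distance from the correct answer
        w -= lr * error * x      # adjust so the next guess is slightly better

print(round(w, 2))  # → 2.0
```

Each pass shrinks the error a little, and the accumulated small adjustments converge on the hidden rule.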
Data sits at the centre of this process. Datasets act as the system’s experience and determine the quality of its learning. Photos, documents, transaction logs, time series, recordings, and other observations can all serve as training material. Clean, representative, and well labelled data leads to reliable behaviour. Incomplete or biased data often leads to weak or unfair outcomes. Careful curation and review ensure that the learning process produces systems that behave consistently and support dependable real world use.
1.2.2.2. The Role of Algorithms
The word “algorithm” can sound technical, yet it simply refers to a clear, step by step method for solving a problem. In AI, algorithms define how a system searches for patterns in data, how it measures its mistakes, and how it adjusts its behaviour on the next attempt. Different kinds of problems call for different algorithms, just as different tasks in an organisation rely on different procedures, checklists, or methods.
Decision trees: learning by asking good questions
A decision tree learns a sequence of yes or no questions that split data into groups that are easy to predict. Each split tries to make the groups purer, which means members of a group mostly share the same label.
How it works.
- Start with all examples at the root.
- Evaluate possible questions, for example “Is income greater than R20,000?”, “Did the email contain a link?”, “Is temperature above 18°C?”
- Choose the question that best separates the data. This uses impurity measures such as Gini or entropy.
- Repeat on each branch until the leaves are pure or a depth limit is reached.
- For classification, each leaf votes for the most common class inside it. For regression, each leaf predicts the average value inside it.
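The split-selection step can be sketched in plain Python. The six example emails (number of links, with spam or ham labels) and the range of candidate thresholds are invented for illustration:

```python
# Made-up training examples: (number of links in the email, label).
rows = [(0, "ham"), (1, "ham"), (2, "ham"), (4, "spam"), (5, "spam"), (6, "spam")]

def gini(labels):
    """Gini impurity of a group: 0.0 means the group is perfectly pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_score(rows, threshold):
    """Weighted impurity after asking: 'is the number of links > threshold?'"""
    yes = [label for links, label in rows if links > threshold]
    no = [label for links, label in rows if links <= threshold]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

# Evaluate candidate questions and keep the one that best separates the data.
best = min(range(7), key=lambda t: split_score(rows, t))
print(best, split_score(rows, best))  # → 2 0.0  (perfectly pure branches)
```

The tree-building process simply repeats this search on each branch until the leaves are pure or a depth limit is reached.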
Decision trees have several strengths that make them useful in many practical settings. They produce rules that are easy for people to read and understand, which supports clear explanations and transparent decision making. They can work with both numbers and categories without requiring heavy preprocessing, which keeps preparation simple. They also serve as an effective building block inside larger model families such as Random Forests and Gradient Boosted Trees, where many trees are combined to improve accuracy and stability.
Example.
In email filtering, a decision tree works by asking a series of simple yes or no questions about each message. For example, the first question might be, “Does the sender appear on a block list?” If the answer is yes, the message is likely unsafe. If the answer is no, the tree moves to the next question, such as, “Does this email contain more than three links?” A later question might be, “Does the subject line include common spam phrases?”
At the end of these checks, the email reaches a “leaf” in the tree, which represents a final decision. One leaf might say “deliver to inbox,” another might say “send to quarantine for review,” and another might say “mark as spam.” In this way, the decision tree turns a few clear questions into a reliable, automatic judgment about where each email should go.
Linear regression: fitting a straight line to predict a number
Linear regression predicts a numeric outcome as a weighted sum of inputs. The model learns weights that produce a line or plane that best fits the data.
How it works.
- Choose the inputs (features) you care about, for example the size of a house, the number of rooms, and how far it is from the city center.
- Ask the model to predict the price using a simple formula that combines these inputs with adjustable numbers called weights.
- Compare the predicted price to the real price and calculate how big the mistake is, on average, using a measure such as mean squared error (larger errors count more).
- Let the training process adjust the weights a little bit so that, across many examples, the average error becomes smaller.
- Add a small penalty that discourages very large weights (L1 or L2 regularization) so the model stays stable and does not overreact to unusual data points.
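The fitting step can be sketched for the simplest case, one input and one output, using the classic least squares formulas. The spend and revenue numbers below are invented; a real project would use a library and several inputs at once:

```python
# Made-up monthly history: advertising spend (input) and revenue (outcome),
# generated roughly from the rule revenue ≈ 2 * spend + 1.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 5.0, 7.2, 8.9, 11.0]

n = len(spend)
mean_x = sum(spend) / n
mean_y = sum(revenue) / n

# Ordinary least squares for one input: weight = covariance / variance.
weight = (sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, revenue))
          / sum((x - mean_x) ** 2 for x in spend))
bias = mean_y - weight * mean_x

forecast = weight * 6.0 + bias   # predicted revenue for a month with spend 6.0
print(round(weight, 2), round(bias, 2), round(forecast, 2))  # → 1.97 1.13 12.95
```

The learned weight is itself informative: it says how much revenue tends to rise for each extra unit of spend.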
This method is useful because it is fast to train, simple to implement, and easy to understand. It provides clear numbers that show how each input affects the outcome, along with confidence intervals that indicate how reliable those estimates are. Because of this transparency and speed, it often serves as a strong starting point, or baseline, for many forecasting and prediction tasks before moving to more complex models.
Example.
A simple example is revenue forecasting for a business. Imagine you collect data each month on a few key factors, such as the season of the year, your average price per product or service, how much you spent on advertising, and how many orders are already in the backlog. These factors become the inputs, often called features, that describe the situation for that month.
The model studies historical data that links those features to actual sales outcomes. Over time it learns how strongly each factor tends to push sales up or down. For instance, it might learn that higher ad spend usually increases sales, or that certain seasons are consistently quieter. Once trained, the model can take the latest values for season, price, ad spend, and backlog, and then estimate next month’s revenue as a single predicted number. This gives decision makers a structured, data based way to plan budgets, inventory, and staffing.
Neural networks: stacking simple steps to learn complex patterns
Neural networks learn patterns by stacking many simple layers. Each layer transforms the input a little, then passes it to the next layer. With enough layers, the network can represent very complex relationships.
How it works.
- The network is built from layers. Each layer has small units that take numbers in, multiply them by learned weights, add a bias term, and then pass the result through a simple function such as ReLU to decide what goes forward.
- After the data passes through all layers, the network produces an output, for example a predicted category, a score, or the next word in a sentence.
- A loss function then compares this output to the correct answer and turns the difference into a single number that represents how big the error is.
- Backpropagation works backward through the network and calculates how much each weight contributed to that error.
- An optimizer, such as Adam or stochastic gradient descent, uses these error signals to adjust the weights slightly in a direction that should reduce the loss next time.
- This process repeats over many examples and passes through the data, until the network’s performance stops improving and becomes stable.
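The loop above can be sketched with the smallest possible “network”, a single unit with a sigmoid activation, learning the logical AND of two inputs. The data, learning rate, and number of passes are chosen for illustration; real networks stack many such units in layers and use optimizers like Adam:

```python
import math

# Training examples for logical AND: inputs paired with the correct output.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2, b = 0.0, 0.0, 0.0   # learned weights and bias, initially uninformed
lr = 0.5                    # step size for each weight adjustment

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(5000):                              # repeated passes over the data
    for (x1, x2), target in data:
        out = sigmoid(w1 * x1 + w2 * x2 + b)       # forward pass: produce an output
        grad = out - target                        # error signal from the loss
        w1 -= lr * grad * x1                       # adjust each weight slightly
        w2 -= lr * grad * x2
        b -= lr * grad

preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
print(preds)  # → [0, 0, 0, 1]
```

A full network repeats this pattern across thousands of units, with backpropagation distributing the error signal to every weight in every layer.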
Deep learning is especially powerful because it can work directly with unstructured data such as text, images, audio, and video. Instead of needing everything in neat tables, it can learn from natural language documents, photos, recordings, and clips that appear in real workplaces every day. This makes it well suited for modern business environments where much of the information is written, spoken, or visual.
Another major strength is that deep learning can discover useful features by itself. In traditional systems, experts had to spend a lot of time deciding which variables or patterns to feed into a model. Deep learning reduces this manual effort by learning those patterns automatically during training. Because of this, it now powers leading systems in speech recognition, language translation, image understanding, and large language models that support tools like Cyrenza.
Examples.
Face verification is a good example of how deep learning works with images in practice.
When a system verifies a face, it does not compare two photos pixel by pixel. Instead, a vision network takes each face image and turns it into a compact set of numbers called an embedding. You can think of this embedding as a kind of fingerprint in number form. Two photos of the same person will produce embeddings that are very close to each other in this numeric space. Photos of different people will produce embeddings that are farther apart.
The verification process usually follows a few steps. First, the system detects and crops the face in each image. Second, the deep learning model converts each cropped face into its embedding. Third, the system measures how similar the embeddings are, often using a distance measure such as cosine similarity or Euclidean distance. Finally, it compares this distance to a predefined threshold. If the distance is below the threshold, the system treats the faces as a match. If the distance is above the threshold, the system treats them as different people.
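The comparison step can be sketched with cosine similarity. The two embeddings below are made-up four-number vectors (real models produce hundreds of dimensions), and here a similarity above the threshold counts as a match, which is the mirror image of a distance below one:

```python
import math

# Hypothetical embeddings: one stored for the registered user, one fresh
# from the new photo. Real embeddings have hundreds of dimensions.
enrolled = [0.90, 0.10, 0.30, 0.20]
attempt = [0.88, 0.12, 0.28, 0.20]

def cosine_similarity(a, b):
    """1.0 means identical direction; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.95                 # tuned to the deployment's risk profile
similarity = cosine_similarity(enrolled, attempt)
match = similarity >= THRESHOLD  # same person only if similarity is high enough
print(match)  # → True
```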
In real deployments, organizations tune the threshold based on their risk profile. For example, phone unlocking and office access systems often prefer very low false accept rates, even if that means occasionally asking the user to try again. Responsible use also includes strong privacy and security controls, clear consent, and compliance with local regulations on biometric data.
Summary.
Decision trees, linear regression, and neural networks are three different ways of teaching machines from data, and each one learns in its own style.
Decision trees learn by asking a series of simple questions that split the data into smaller and smaller groups. At the end of each branch, the tree reaches a decision, such as “approve” or “decline,” or a value, such as a price or a risk score. The result looks like a flowchart that you can read and explain.
Linear regression learns how to draw a straight line through data points so that it can predict a number, such as sales next month or the price of a product. It estimates how much each factor, for example advertising spend or season, pushes the prediction up or down. This gives both a forecast and a clear sense of which inputs matter most.
Neural networks learn by stacking many small computations in layers. Each layer transforms the data slightly, and together they can capture very complex patterns, such as shapes in images or meaning in sentences. They are less transparent than trees or simple lines, but they are powerful when the data is rich and unstructured.
In all three cases, the core idea is the same: the algorithm provides a method for learning from examples, instead of a fixed list of instructions.
1.2.2.3. Training and Testing
AI learns by studying many examples and must be judged on how well it handles new ones. To make that possible, teams split data into distinct parts so that learning and evaluation remain clean and trustworthy.
Training data is the material the model studies to discover patterns. During training, the model sees inputs together with the correct answers and adjusts its internal parameters to reduce mistakes on these examples. Strong training sets are representative of real conditions and include both common situations and reasonable edge cases. Labels should be accurate, features should be cleaned, and any sensitive attributes should be handled with care.
Many projects also create a validation set, carved from the training pool, to tune settings such as thresholds, model complexity, learning rate, or regularization. The validation set supports choices like early stopping and helps compare candidate models without touching the final test data.
Testing data is kept completely separate from training and validation. The model never sees these examples during learning or tuning. The test set answers a single question: does performance hold up on fresh data drawn from the same process that will exist in production? Tests should be large enough to give stable metrics and should mirror deployment conditions. For time based problems, use a later time range to imitate the future. For rare events such as fraud, preserve the real class balance so results reflect operational reality.
In short, training teaches the model, validation guides design choices, and testing verifies generalization. Learning is confirmed when a model performs well on data it has never seen before.
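A minimal sketch of such a split in Python, using 100 numbered records as stand-ins for labelled examples (the 70/15/15 ratio is a common convention, not a fixed rule):

```python
import random

random.seed(42)                 # fixed seed so the split is reproducible
records = list(range(100))      # stand-ins for 100 labelled examples
random.shuffle(records)         # shuffle once, before any splitting

train = records[:70]            # the model studies these
val = records[70:85]            # design choices are tuned on these
test = records[85:]             # held out until the single final evaluation

# The same record must never appear in more than one part.
assert set(train).isdisjoint(val)
assert set(train).isdisjoint(test)
assert set(val).isdisjoint(test)
print(len(train), len(val), len(test))  # → 70 15 15
```

For time based data, the shuffle would be dropped and the split made chronologically instead.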
How scientists check if learning is real
- Split the data correctly.
Divide your dataset into three separate parts: a training set, a validation set, and a test set, and make sure the same record never appears in more than one group. When you are predicting categories, try to keep the proportion of each category similar in all three sets. For time based data, always split in chronological order so that earlier data is used for training and later data is used for testing, which mimics how the system will work in real life.
- Tune on the validation set only.
Use the validation set to make design choices such as which features to include, which thresholds to use, and which model settings (for example depth, learning rate, or regularization strength) work best. The validation set is your “practice field” for improving the model without touching the final exam.
- Lock the choices.
Once the results on the validation set are stable and acceptable, fix your design decisions. At this point you should stop changing features, thresholds, or model settings, so that the final evaluation remains fair and unbiased.
- Evaluate on the test set once.
After the design is locked, run the model on the test set a single time and record the results. Report clear metrics such as precision, recall, ROC AUC, calibration, and cost based measures that reflect the real impact on your business or institution.
- Use cross validation when data is limited.
If you have a small dataset, repeat the training and validation process several times by rotating which portion of the data plays the role of the validation set. This method, called cross validation, gives a more reliable estimate of how the model will perform on new data.
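The rotation at the heart of cross validation can be sketched as simple index bookkeeping (10 examples and 5 folds here, chosen only for illustration):

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index pairs, rotating which slice validates."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]  # one slice validates
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]  # the rest train
        yield train, val

folds = list(k_fold_indices(10, 5))
print(len(folds), folds[0][1], folds[4][1])  # → 5 [0, 1] [8, 9]
```

Training and scoring the model once per fold, then averaging the scores, gives a steadier estimate than any single split.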
Overfitting explained
A model that performs well on training data but poorly on testing data has not learned a general rule. It has memorized the answers. That failure is called overfitting. Overfitting happens when a model is too complex for the amount or quality of data. It learns noise and coincidences that do not repeat.
Simple example.
You want a model to recognize cats in photos. If you train only on pictures of cats on carpets, the model may learn to associate carpets with cats. When it sees a cat on grass, it fails. It learned the background, not the animal.
How to reduce overfitting.
- Use more and better data. Variety helps the model learn the true signal.
- Simplify the model or add regularization such as L1 or L2 penalties and dropout.
- Use data augmentation such as flips, crops, and noise so the model sees many versions of the same truth.
- Keep features that have a causal link to the target and drop accidental proxies.
- Monitor validation metrics and stop training when they stop improving.
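The last point, early stopping, can be sketched as a simple patience rule. The validation losses below are invented; in practice each number comes from evaluating the model on the validation set after a pass through the training data:

```python
# Invented validation losses recorded after each training pass.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.53]

PATIENCE = 2                    # worse rounds tolerated before stopping
best_loss = float("inf")
best_epoch = 0
bad_rounds = 0

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:                      # validation still improving
        best_loss, best_epoch, bad_rounds = loss, epoch, 0
    else:                                     # no improvement this round
        bad_rounds += 1
        if bad_rounds >= PATIENCE:            # stop before memorization sets in
            break

print(best_epoch, best_loss)  # → 4 0.46
```

The model saved at the best epoch, not the last one, is the version that generalizes.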
Data leakage warning.
Leakage happens when information from the test set or from the future sneaks into training. Examples include using a feature that is only known after the decision is made, or normalizing all data together before splitting. Leakage makes results look great in the lab and fail in production. The cure is strict separation, careful pipelines, and audits.
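The normalization pitfall can be made concrete: a leakage-safe pipeline computes its statistics from training data only and then reuses them, never refitting on test data. The feature values here are invented:

```python
# Invented feature values, already split into train and test.
train_vals = [10.0, 12.0, 14.0, 16.0, 18.0]
test_vals = [50.0, 11.0]    # includes an unusual value the model may face later

# Leakage-safe: fit the scaler's statistics on the training part only.
mean = sum(train_vals) / len(train_vals)
std = (sum((v - mean) ** 2 for v in train_vals) / len(train_vals)) ** 0.5

train_scaled = [(v - mean) / std for v in train_vals]   # apply to train
test_scaled = [(v - mean) / std for v in test_vals]     # reuse, never refit

print(round(mean, 2), round(std, 2))  # → 14.0 2.83
```

Normalizing train and test together would let the unusual test value shift the mean and spread, quietly leaking future information into training.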
1.2.2.4. How Neural Networks Work (The Heart of Deep Learning)
Neural networks are computer systems that learn by connecting many small units that work together. Each unit receives numbers as input, applies a simple calculation, and passes the result forward. One unit on its own is limited, but many units linked in layers can learn useful patterns in data.
Each layer of a neural network transforms the information it receives. The first layer reads the raw data, such as pixels, audio samples, or words. Middle layers reshape this information step by step. The final layer produces an outcome, such as a label, a prediction, or a generated sentence.
In a vision model the early layers usually pick up very simple features in an image, such as edges or colour changes. Later layers combine these into shapes, and even later layers recognise full objects like faces or vehicles. The network discovers these features automatically during training by adjusting its internal weights to reduce error.
Language models follow the same idea. Early layers notice short patterns in text. Middle layers capture grammar and relationships between words. Higher layers focus on meaning, intent, and the wider context of a sentence or paragraph. With each layer the representation becomes clearer and more useful for tasks such as translation, summarisation, or answering questions.
Deep Learning refers to networks that have many layers rather than just a few. Greater depth allows the system to learn stronger and more detailed representations, as long as the data and training process support it. This approach is the basis for modern systems in image recognition, speech processing, and advanced language models, and it forms the core technology behind platforms such as Cyrenza.
1.2.2.5. Feedback — How AI Improves Over Time
AI systems continue to improve after the first training phase. Once they are in use, they keep gaining experience from new data, fresh examples, and the feedback that people provide in real work.
In real deployments this improvement happens in simple, practical ways. Day to day interactions create new records that can be stored, cleaned, and later used when the system is retrained on a scheduled basis. The knowledge sources around the model, such as document libraries, policies, and templates, can be updated more frequently so the system always has access to the latest information. Over time, teams can also train specialized versions of the base model on their own cases so that it responds in a way that fits their industry and organisation.
This is how Cyrenza’s Knowledge Workers become more effective for each client. At the start, an agent arrives with strong general skills in language, reasoning, and analysis. As it works on real tasks, it sees the client’s own terminology, report layouts, approval steps, and strategic goals. These patterns are saved into the organisation’s AI memory. On later tasks, the agent can draw on this memory and its behaviour begins to feel more and more tailored to that specific environment.
The agents grow through feedback. Useful feedback can come from several sources:
- Human corrections, such as edited drafts, redlined documents, or improved analyses.
- User signals, for example explicit approvals, rejections, or simple ratings.
- System outcomes, including resolution times, error rates, and key business metrics.
High quality feedback is recorded and organised. It is then used to adjust the prompts that drive the agents, the information they can look up, and, when needed, the way the underlying models are trained. Over time this process sharpens performance, reduces repeated mistakes, and aligns the agents with the organisation’s standards and way of working.
The Role of Humans — The Teachers of AI
Machines can absorb patterns at extraordinary speed, yet they remain only as sound as the people who guide them. Human oversight is what turns raw capability into responsible intelligence. It is humans who decide which data is appropriate, which objectives matter, and which behaviours are acceptable. Without that direction, even powerful models can learn the wrong lessons, repeat existing biases, or optimise for the wrong outcomes.
In practice, this means that people act as instructors rather than spectators. They review outputs, correct mistakes, provide context, and set boundaries. They decide what “good” looks like in a particular organisation and a particular domain. When users understand how AI systems learn, they can shape prompts more clearly, design better feedback loops, and insist on explanations that match their professional standards. The goal of a programme like this is precisely that shift, from passive use to active direction. Participants are not just end users of tools. They are emerging directors of AI, able to steer intelligent systems toward ethical, accurate, and purpose aligned work.
Now that you have a clear view of how machines learn and how humans teach them, the next step is to see the structure in visual form. In the following section, we will map how Automation, Artificial Intelligence, Machine Learning, and Deep Learning connect as a stack. That visual model will serve as a reference point for the rest of the Cyrenza curriculum, making it easier to place each new concept within the broader architecture of modern intelligence systems.