Evaluating Outputs Systematically

Introduction

This section establishes a disciplined evaluation framework for AI-assisted professional work. As AI Knowledge Workers generate drafts, analyses, and recommendations at high speed, the standard for adoption cannot be based on fluency, structure, or confidence of presentation. Professional reliability requires systematic scrutiny. Learners practise evaluating outputs with repeatable methods that reveal hidden assumptions, test reasoning integrity, and confirm alignment with evidence and organisational constraints.

The section introduces evaluation as an active process of interrogation rather than passive reading. Learners are trained to extract the core claim, identify the logic chain that supports it, and locate the assumptions that must hold true for the output to remain valid. They learn to examine whether the output is grounded in credible sources, whether intermediate reasoning steps are complete, and whether any claims require verification before use in decisions or deliverables.

Learners are then introduced to triangulation as a primary reliability technique. Triangulation strengthens evaluation by requiring confirmation through independent checks. This includes cross-referencing outputs against trusted records, policies, and primary materials, as well as testing the reasoning by requesting alternative methodologies and counter-arguments. Divergence between reasoning paths becomes a signal of uncertainty, prompting deeper human review and, where necessary, escalation to domain expertise.

By the end of this section, Learners will be able to apply a structured protocol for evaluating AI outputs, detect weaknesses efficiently, and approve work products with confidence grounded in evidence, logic integrity, and professional standards.

2.1 The Protocol of Interrogation

2.1.1 Purpose of the Protocol

AI Knowledge Workers can produce outputs that appear complete, well structured, and professionally written. This speed and polish increase productivity, yet they also increase the risk of unexamined adoption. The Protocol of Interrogation is the evaluation discipline used in Cyrenza to prevent silent failure and preserve professional standards. It trains users to evaluate outputs as analytical material that must earn trust through evidence, logic, and alignment with constraints.

Interrogation is not scepticism for its own sake. It is a structured method for determining whether an output is reliable enough to inform decisions, stakeholder communication, and operational action.

2.1.2 Passive Reading Versus Active Evaluation

Passive reading focuses on comprehension of what the output says. Active evaluation focuses on validation of whether the output should be trusted and used. Learners shift from reading for clarity to reading for integrity.

Active evaluation requires the reviewer to answer four questions:

What is the output claiming?
What evidence supports the claim?
What assumptions are required for the claim to remain valid?
What constraints or standards must the output satisfy?

This approach prevents presentation quality from becoming the basis of trust.

2.1.3 Step One: Clarify the Objective and Decision Context

2.1.3.1 Identify the Decision Question

Every evaluation begins by identifying what decision or deliverable the output is meant to support. Learners are taught to restate the decision question in precise terms.

Examples of decision questions include:

Should this contract clause be accepted, revised, or rejected?
Which scenario should be used for budgeting under new conditions?
Which operational bottleneck should be prioritised for redesign?
Which marketing hypothesis should be tested next?

A clear decision question defines what must be true for the output to be useful.

2.1.3.2 Identify the Audience and Standard of Proof

Different audiences require different standards. A draft for internal brainstorming can tolerate more uncertainty than a board pack, a client recommendation, or a regulatory submission. Learners classify the output by its level of consequence and match evaluation depth accordingly.
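
One way to make this matching explicit is a small lookup from consequence level to required checks. The sketch below is illustrative only: the tier names, their ordering, and the checks assigned to each tier are assumptions made for the example, not a Cyrenza feature or an official standard. Each organisation would define its own.

```python
from enum import IntEnum


class Consequence(IntEnum):
    """Hypothetical consequence tiers; a higher value means a higher standard of proof."""
    INTERNAL_DRAFT = 1
    CLIENT_RECOMMENDATION = 2
    BOARD_OR_REGULATORY = 3


# Assumed mapping from tier to review checks (illustrative, not prescriptive).
REQUIRED_CHECKS = {
    Consequence.INTERNAL_DRAFT: ["core claims extracted"],
    Consequence.CLIENT_RECOMMENDATION: [
        "core claims extracted",
        "sources cross-checked",
    ],
    Consequence.BOARD_OR_REGULATORY: [
        "core claims extracted",
        "sources cross-checked",
        "assumptions challenged",
        "escalation reviewed",
    ],
}


def checks_for(level: Consequence) -> list[str]:
    """Return a copy of the checks a reviewer completes before approval."""
    return list(REQUIRED_CHECKS[level])


print(checks_for(Consequence.INTERNAL_DRAFT))
```

The point of the design is that evaluation depth is decided before reading the output, so presentation quality cannot influence how much scrutiny it receives.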

2.1.4 Step Two: Extract the Core Claims

2.1.4.1 Identify the Claims That Matter

Learners identify the statements that are load-bearing: those that, if wrong, make the output unreliable or harmful. These claims usually include:

Numerical values, thresholds, and comparisons
Legal interpretations, obligations, and risk classifications
Causal explanations and driver analysis
Recommendations and prioritisation choices
Compliance statements and policy alignment claims

The goal is to focus scrutiny where it matters most.

2.1.4.2 Separate Facts, Interpretations, and Recommendations

Learners are trained to separate the output into three layers:

Facts: what is asserted as true
Interpretations: what the facts are said to mean
Recommendations: what action is suggested

This separation makes it easier to validate the foundation before assessing the conclusion.
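
The three-layer separation can be sketched as a simple tagging structure. Everything here is hypothetical: the class names, the `review_order` helper, and the sample statements are invented for illustration, and the tagging itself is assumed to be done by the reviewer, not automated.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    FACT = 1
    INTERPRETATION = 2
    RECOMMENDATION = 3


@dataclass
class Statement:
    text: str
    layer: Layer


def review_order(statements: list[Statement]) -> list[Statement]:
    """Sort so facts are reviewed first, then interpretations, then recommendations."""
    return sorted(statements, key=lambda s: s.layer.value)


# Invented example statements, for illustration only.
output = [
    Statement("Renegotiate the supplier contract", Layer.RECOMMENDATION),
    Statement("Supplier costs rose year on year", Layer.FACT),
    Statement("The rise is driven by freight charges", Layer.INTERPRETATION),
]

for s in review_order(output):
    print(f"{s.layer.name}: {s.text}")
```

Reviewing in this order means a recommendation is only assessed once the facts and interpretations beneath it have been checked.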

2.1.5 Step Three: Identify and Challenge Assumptions

2.1.5.1 Why Assumptions Require Explicit Review

Many AI errors stem not from faulty writing but from implicit assumptions that the reviewer never notices. Any conclusion is only as strong as the assumptions beneath it.

Common assumption categories include:

Data completeness assumptions
Stability assumptions about markets, operations, or behaviour
Policy and permission assumptions
Typical case assumptions that may not hold in edge cases
Time horizon assumptions, such as short-term versus long-term effects

2.1.5.2 Assumption Testing Questions

Learners apply a standard set of questions:

What must be true for this conclusion to hold?
Which assumptions are uncertain or unverified?
Which assumptions are organisation-specific?
What changes would overturn the recommendation?

Assumption testing shifts evaluation from surface reading to structural validation.

2.1.6 Step Four: Verify the Evidence Base

2.1.6.1 Identify the Source of Each Key Claim

Learners verify whether key claims are grounded in:

Provided documents and internal records
Organisational policies and standards
Approved datasets or system-of-record sources
Verified external references where relevant

If a claim is not tied to a source, it is treated as unverified until proven.

2.1.6.2 Validate the Use of Evidence

Evidence can be present and still misused. Learners review whether the output:

Uses the correct source for the correct claim
Represents the source accurately
Applies the source within the correct context and limitations
Avoids extrapolating beyond what the evidence supports

This is especially important in legal, financial, and compliance workflows.

2.1.7 Step Five: Inspect the Logic Chain

2.1.7.1 Confirm the Intermediate Steps

Learners are trained to inspect whether the reasoning includes valid intermediate steps between premises and conclusion. Logical integrity requires that the output shows how it gets from inputs to the recommendation.

Where the chain is unclear, Learners request:

A step-by-step explanation
A restatement of reasoning in a structured format
A decomposition of drivers, trade-offs, and dependencies

2.1.7.2 Look for Common Logic Failures

Learners check for:

Logical leaps without justification
Over-generalisation where exceptions should apply
Unstated trade-offs and missing constraints
Inconsistent reasoning across sections of the output

Logic inspection is essential for preventing plausible outputs from becoming untested decisions.

2.1.8 Step Six: Evaluate Constraint and Governance Alignment

2.1.8.1 Professional Outputs Must Respect Boundaries

Learners review whether the output respects:

Policy requirements and compliance rules
Permission constraints and information access rules
Authority limits, including escalation pathways
Internal style standards for official deliverables

An output that violates constraints cannot be approved, even if it is analytically strong.

2.1.8.2 Identify Where Escalation Is Required

Some outputs must trigger escalation to a domain professional, such as legal counsel, risk leadership, or finance governance. Learners practise recognising these triggers and treat escalation as a control mechanism, not a weakness.

2.1.9 Step Seven: Produce an Evaluation Result

2.1.9.1 Classify the Output

Learners classify outputs into one of four evaluation outcomes:

Approved for use
Approved with minor edits
Requires refinement and re-evaluation
Not acceptable without additional evidence or escalation

This classification makes review decisions explicit and repeatable.

2.1.9.2 Document the Rationale

Professional evaluation includes brief documentation of:

What was verified
What assumptions remain
What risks are accepted
What changes were required

This supports defensibility and continuity across workflows.
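
A minimal sketch of how the classification and its rationale could be captured together in one record. The four outcome labels are taken from the list in 2.1.9.1; the `EvaluationRecord` structure, its field names, and the sample entries are assumptions for illustration and do not describe any Cyrenza data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    APPROVED = "Approved for use"
    APPROVED_MINOR_EDITS = "Approved with minor edits"
    REQUIRES_REFINEMENT = "Requires refinement and re-evaluation"
    NOT_ACCEPTABLE = "Not acceptable without additional evidence or escalation"


@dataclass
class EvaluationRecord:
    """Hypothetical record pairing an outcome with the four rationale items."""
    outcome: Outcome
    verified: list[str] = field(default_factory=list)
    open_assumptions: list[str] = field(default_factory=list)
    accepted_risks: list[str] = field(default_factory=list)
    required_changes: list[str] = field(default_factory=list)

    def summary(self) -> str:
        return (
            f"{self.outcome.value} | verified: {len(self.verified)}, "
            f"open assumptions: {len(self.open_assumptions)}, "
            f"accepted risks: {len(self.accepted_risks)}, "
            f"required changes: {len(self.required_changes)}"
        )


# Invented entries, for illustration only.
record = EvaluationRecord(
    outcome=Outcome.APPROVED_MINOR_EDITS,
    verified=["Revenue figure matches the finance system"],
    open_assumptions=["Exchange rates stay within the budgeted range"],
    required_changes=["Corrected a clause reference"],
)
print(record.summary())
```

Keeping the outcome as an enumerated value, not free text, is what makes review decisions comparable across reviewers and workflows.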

2.1.10 Applying the Protocol Within Cyrenza

Cyrenza supports interrogation by producing structured drafts and enabling iterative refinement through human feedback. The protocol ensures that review remains systematic rather than reactive. Learners treat Cyrenza outputs as high-quality starting points, then apply interrogation to confirm accuracy, strengthen reasoning, and ensure alignment with organisational standards before approval.

The Protocol of Interrogation is the practical discipline that keeps augmented intelligence reliable at scale. It protects against silent failure while preserving the speed and capacity advantages of a digital workforce.

2.2 Verification Through Triangulation

2.2.1 Purpose of Triangulation

Triangulation is the discipline of validating an AI-generated output through independent confirmation. In professional environments, a single reasoning path is rarely sufficient evidence for reliability, particularly when the output influences decisions, client deliverables, compliance outcomes, or financial commitments. Triangulation reduces the likelihood of silent failure by forcing the output to survive multiple tests that reveal hidden assumptions, missing constraints, and brittle logic.

Triangulation is not scepticism for its own sake. It is a professional verification method that increases confidence through structured checks. It ensures that trust is earned through evidence and consistency rather than through presentation quality.

2.2.2 The Core Principle

Triangulation follows a simple principle: no high-impact conclusion should rely on one source, one method, or one narrative. Learners confirm critical claims using at least two independent anchors, drawn from the following types:

Independent sources of truth
Independent reasoning paths
Independent methodologies or representations

When these anchors align, confidence increases. When they diverge, uncertainty becomes visible and must be resolved through human judgment.

2.2.3 Triangulation Method One: Cross-Checking Against Sources of Truth

2.2.3.1 What Counts as a Source of Truth

Learners are trained to validate outputs against sources that are authoritative for the specific context. Common categories include:

System-of-record data, such as finance systems, HR systems, CRM records, and claims platforms
Approved organisational policies, standards, templates, and playbooks
Signed contracts, legal precedents, and official correspondence
Primary documents, such as reports, datasets, meeting minutes, and audit trails
Trusted external references where appropriate, such as regulators, standards bodies, and verified market data

A source of truth is defined by governance, not by convenience.

2.2.3.2 What to Cross-Check First

Not every detail requires verification. Learners focus on the load-bearing elements, including:

Numerical claims, calculations, and thresholds
Legal obligations, timelines, and contractual rights
Regulatory references and compliance statements
Definitions and scope boundaries
Recommendations that trigger action or stakeholder communication

2.2.3.3 Evidence Mapping

Learners map each critical claim to its supporting source. If a claim cannot be mapped, it is treated as unverified. This practice strengthens defensibility and improves review efficiency, because the reviewer knows exactly what has been validated.
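
Evidence mapping can be pictured as a lookup from claim to source in which anything without an entry defaults to unverified. The sketch below assumes the reviewer has already recorded the sources by hand; the function name `map_evidence`, the `UNVERIFIED` marker, and the sample claims are all hypothetical.

```python
UNVERIFIED = "UNVERIFIED"


def map_evidence(claims: list[str], evidence: dict[str, str]) -> dict[str, str]:
    """Return the source for each claim, or UNVERIFIED when none is recorded."""
    # `or` also catches an empty-string source, which counts as no source.
    return {claim: evidence.get(claim) or UNVERIFIED for claim in claims}


# Invented claims and sources, for illustration only.
claims = [
    "Notice period is 90 days",
    "Penalty cap is 5% of contract value",
]
evidence = {"Notice period is 90 days": "Signed contract, clause 12.1"}

status = map_evidence(claims, evidence)
unverified = [claim for claim, source in status.items() if source == UNVERIFIED]
print(unverified)
```

The default matters more than the lookup: a claim is unverified unless a source has been positively recorded, never the other way round.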

2.2.4 Triangulation Method Two: Alternative Reasoning Paths

2.2.4.1 Why Alternative Reasoning Matters

An output may appear coherent while depending on weak assumptions or missing steps. By forcing the system to produce a second reasoning path, the reviewer can detect brittleness. If the model produces inconsistent conclusions when reasoning differently, this indicates that the answer is not stable.

Learners treat stability across reasoning paths as a reliability signal.
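
As a rough sketch of the stability check, assume the reviewer has reduced each reasoning path's conclusion to a short label such as "revise" or "reject". That reduction is a manual step and an assumption of this example, as are the function name and the sample labels.

```python
def is_stable(conclusions: dict[str, str]) -> bool:
    """True when every reasoning path reached the same conclusion label.

    `conclusions` maps a reasoning-path name to the label the reviewer
    assigned to its conclusion. At least two paths are required, because
    a single path cannot confirm itself.
    """
    if len(conclusions) < 2:
        raise ValueError("need at least two reasoning paths to compare")
    labels = {label.strip().lower() for label in conclusions.values()}
    return len(labels) == 1


# Invented labels, for illustration only.
paths = {
    "stepwise derivation": "Revise",
    "constraint-first": "revise",
    "assumption-first": "Reject",
}
print(is_stable(paths))  # False: one path disagrees
```

A `False` result does not say which path is right. It only tells the reviewer that the answer is not yet stable and needs further work.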

2.2.4.2 Common Alternative Reasoning Techniques

Learners use several structured approaches:

Stepwise derivation

Request the reasoning as explicit steps, showing intermediate conclusions and dependencies. This exposes leaps, missing assumptions, and logic gaps.

Constraint-first reasoning

Request a reasoning path that begins with constraints, policies, and risk posture, then builds the conclusion inside those boundaries. This tests whether the output respects governance.

Assumption-first reasoning

Request the model to list assumptions first, then derive the conclusion only from those assumptions. This makes hidden dependencies visible.

Quantitative versus qualitative framing

Request the answer using a different representation, such as a table, a decision tree, a risk register, or a scenario matrix, so that a qualitative narrative is restated in structured or quantitative form. Representation changes can reveal weak links that a narrative format hides.

2.2.4.3 Interpreting Divergence

Divergence across reasoning paths does not automatically mean the output is wrong. It means uncertainty is present and must be resolved. Learners respond by:

Identifying which assumptions cause the divergence
Verifying those assumptions against sources
Narrowing the decision question
Refining constraints and requesting a new output
Escalating to domain experts when the consequence level requires it

2.2.5 Triangulation Method Three: Counterpoint and Adversarial Testing

2.2.5.1 The Role of Counterpoint

Professional decisions require awareness of alternative viewpoints, hidden risks, and unintended consequences. Learners request the counterpoint to an argument or recommendation. This tests whether the output can withstand challenge, and it surfaces trade-offs that may not be visible in a single narrative.

Counterpoint testing is especially useful when the output includes recommendations.

2.2.5.2 Structured Counterpoint Prompts

Learners apply disciplined counterpoint requests such as:

Provide the strongest arguments against this recommendation
Identify risks and failure modes that could invalidate the conclusion
List scenarios where the opposite decision would be preferable
Highlight stakeholders who may object and why

The purpose is not debate. The purpose is completeness and risk visibility.
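
To keep counterpoint requests consistent from one review to the next, the four requests above could be held as reusable templates. This is only a sketch: the request wording is adapted from the list above, while the template constant, the function name, and the sample recommendation are hypothetical and not part of any Cyrenza interface.

```python
# Wording adapted from the four counterpoint requests; names are illustrative.
COUNTERPOINT_TEMPLATES = (
    "Provide the strongest arguments against this recommendation: {recommendation}",
    "Identify risks and failure modes that could invalidate this conclusion: {recommendation}",
    "List scenarios where the opposite decision would be preferable to: {recommendation}",
    "Highlight stakeholders who may object, and why, to: {recommendation}",
)


def counterpoint_prompts(recommendation: str) -> list[str]:
    """Build one prompt per counterpoint request for the given recommendation."""
    return [t.format(recommendation=recommendation) for t in COUNTERPOINT_TEMPLATES]


# Invented recommendation, for illustration only.
for prompt in counterpoint_prompts("Consolidate the two regional warehouses"):
    print(prompt)
```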

2.2.5.3 Using Counterpoint to Improve Final Deliverables

Counterpoint outputs often become inputs for refinement, allowing the final deliverable to:

Include explicit risk notes and mitigations
Strengthen justification for the chosen direction
Improve defensibility with stakeholders and governance bodies
Reduce blind spots and overconfidence

2.2.6 Triangulation Method Four: Independent Replication Within Cyrenza

2.2.6.1 Using Multiple Role-Based Agents

Cyrenza supports triangulation by enabling work to be replicated by different role-based Knowledge Workers. Learners request independent replication from a second agent with a different professional orientation.

Examples include:

Finance analysis replicated by an Internal Auditor agent
Contract review replicated by a Compliance and Ethics agent
Marketing recommendations replicated by a Market Research Analyst agent
Insurance triage replicated by a Risk Assessment Officer agent

Replication improves reliability because it introduces alternative reasoning lenses and increases the chance that weaknesses are caught.

2.2.6.2 Peer Review as a Digital Workforce Pattern

Learners treat second-agent checks as a digital equivalent of peer review. This reduces individual agent bias and improves consistency across outputs.

2.2.7 A Practical Triangulation Protocol

2.2.7.1 The Minimum Standard for High-Impact Outputs

Learners apply a minimum standard based on task consequence. For high-impact outputs, the minimum includes:

Source cross-check of all load-bearing claims
Alternative reasoning path for the main conclusion
Counterpoint or risk challenge for recommendations

If any of these steps surface uncertainty, the output returns to refinement before approval.
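
The minimum standard behaves like a gate: all three checks must pass cleanly, or the output goes back. A minimal sketch, assuming each check is recorded by the reviewer as a simple pass or fail; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TriangulationChecks:
    """Each flag is True only if the check was completed and surfaced no uncertainty."""
    sources_cross_checked: bool
    alternative_path_agrees: bool
    counterpoint_addressed: bool


def next_step(checks: TriangulationChecks) -> str:
    """Return where a high-impact output goes after triangulation."""
    passed = all(
        (
            checks.sources_cross_checked,
            checks.alternative_path_agrees,
            checks.counterpoint_addressed,
        )
    )
    return "proceed to approval" if passed else "return to refinement"


print(next_step(TriangulationChecks(True, True, False)))  # return to refinement
```

There is deliberately no partial pass here: a single unresolved check is enough to send the output back.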

2.2.7.2 Time-Effective Implementation

Triangulation can be efficient. Learners are trained to prioritise effort by focusing on:

Claims that trigger action
Claims that carry compliance or reputational consequences
Claims that depend on uncertain assumptions
Claims that cannot be easily reversed once acted on

This keeps verification rigorous without becoming operationally burdensome.
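
One simple way to apply these four criteria is to count how many apply to each claim and verify the highest-scoring claims first. The equal weighting is an assumption of this sketch, as are the class name, field names, and sample claims. A real team might weight compliance or irreversibility more heavily.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    triggers_action: bool = False
    compliance_or_reputation: bool = False
    uncertain_assumptions: bool = False
    hard_to_reverse: bool = False

    def priority(self) -> int:
        """Count how many of the four criteria apply (equal weights assumed)."""
        return sum(
            [
                self.triggers_action,
                self.compliance_or_reputation,
                self.uncertain_assumptions,
                self.hard_to_reverse,
            ]
        )


def verification_queue(claims: list[Claim]) -> list[Claim]:
    """Highest-priority claims first; ties keep their original order."""
    return sorted(claims, key=lambda c: c.priority(), reverse=True)


# Invented claims, for illustration only.
claims = [
    Claim("Formatting follows the house template"),
    Claim("Payment falls due within 30 days", triggers_action=True, hard_to_reverse=True),
    Claim("Vendor meets the procurement policy", compliance_or_reputation=True),
]

for claim in verification_queue(claims):
    print(claim.priority(), claim.text)
```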

2.2.8 Interpreting Results and Taking Action

2.2.8.1 When Triangulation Aligns

When sources and reasoning paths converge, confidence increases. The output can move toward approval, with any remaining uncertainty documented.

2.2.8.2 When Triangulation Diverges

When results diverge, Learners do not force agreement. They isolate the cause and respond with:

Additional verification against sources
Clarification of constraints and objectives
More specific task framing
Escalation to domain expertise when required

Divergence is treated as useful information. It signals where the reasoning is fragile and where human judgment must lead.

2.2.9 Role of Triangulation in Reasoning Control

Triangulation is a core mechanism of reasoning control in Cyrenza. It operationalises professional scepticism without slowing productivity. It ensures that AI outputs remain inputs to human judgment rather than substitutes for verification. When applied consistently, triangulation increases reliability, strengthens defensibility, and prevents silent failure from entering decisions and deliverables.