3.2

Practical Exercise: The Budget and Performance Audit

30 min

This practical exercise teaches participants how to evaluate AI workflow economics under realistic enterprise conditions. Many organisations can build a workflow that produces strong outputs in a demonstration setting. Fewer organisations can build workflows that remain affordable, fast enough, and reliable when scaled to thousands of items per month. This exercise develops that competence.

Participants will be required to treat workflow design as an operational planning activity. They will evaluate cost, latency, and throughput together. They will then design a two-stage approach that concentrates expensive capability only where it produces measurable value.

The scenario is intentionally framed around compliance review because compliance work involves three realities that make economic discipline essential:

  • High volume, since organisations process many documents routinely.

  • High risk, since errors can lead to contractual exposure, regulatory breaches, and reputational damage.

  • Governance requirements, since a human must be able to review, validate, and sign off on final decisions.

The purpose is to practise designing a workflow that respects these realities.

Learning Goals for the Exercise

By completing this exercise, participants will learn to:

  1. Separate high-volume screening work from high-stakes deep review work.

  2. Use batch processing to handle scale efficiently.

  3. Reserve deep reasoning capacity for exception cases that require high confidence and careful judgement.

  4. Estimate total monthly cost in a way that can be communicated to stakeholders.

  5. Estimate time to completion, including both screening and escalation review.

  6. Explain why the chosen design balances speed, cost, and risk.

Scenario

You are the lead architect for a new compliance review system. The organisation processes 5,000 contracts per month.

The objective is to identify contracts that are likely to present compliance or legal risk. These high-risk contracts must be escalated for legal review and human sign-off. Most contracts are expected to be routine. A small subset will require careful analysis.

The organisation wants a system that:

  • screens all contracts consistently

  • flags the highest-risk cases reliably

  • produces review artifacts that legal staff can assess efficiently

  • remains economically sustainable as volume grows

Available Approaches

You have two technical options available for processing the contracts.

Option 1: Deep Reasoning Model in Real-Time

  • Cost: $2.00 per contract

  • Time: 1 minute per contract (one request per contract)

  • Strength: strong reasoning and nuanced interpretation

  • Weakness: expensive at scale and slow for high-volume work

Option 2: Fast Model in Batch Processing

  • Cost: $0.10 per contract

  • Time: 4 hours for the full batch of 5,000 contracts

  • Strength: efficient, scalable, consistent for structured screening

  • Weakness: less suited to nuanced interpretation for the most complex edge cases

Task

You must design a two-stage workflow.

Stage 1: Initial Screening

  • Decide which approach you will use to screen all 5,000 contracts.

  • The output of screening must include:

    • a risk score or classification, such as low, medium, high

    • a brief explanation for the risk label

    • the main clauses or patterns that triggered the risk label

    • a recommendation on whether escalation is required
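One way to make the four required screening outputs concrete is a small record type. The sketch below is illustrative only: the names `RiskLevel`, `ScreeningResult`, `triggers`, and `escalate`, and the sample values, are assumptions made for this example and are not part of the exercise specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):
    """Risk classification, using the labels suggested in the task."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class ScreeningResult:
    """One screening record per contract (hypothetical field names)."""
    contract_id: str
    risk_level: RiskLevel                               # risk score or classification
    explanation: str                                    # brief explanation for the label
    triggers: list[str] = field(default_factory=list)   # clauses or patterns that triggered it
    escalate: bool = False                              # escalation recommendation


# Hypothetical sample record, for illustration only.
example = ScreeningResult(
    contract_id="C-0001",
    risk_level=RiskLevel.HIGH,
    explanation="Uncapped liability combined with unilateral termination.",
    triggers=["liability clause", "termination clause"],
    escalate=True,
)
print(example.risk_level.value, example.escalate)
```

Keeping every screening output in one fixed structure makes it easier for reviewers to scan results and for the escalation step to filter them.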

Stage 2: Deep Review and Human Sign-Off

  • A subset of 50 contracts will be escalated as the highest-risk cases.

  • Decide which approach you will use for the final review of these 50 contracts.

  • The output of deep review must include:

    • a structured risk summary suitable for legal review

    • highlighted clauses with short explanations

    • suggested remediation options, such as revised wording or escalation notes

    • a clear recommendation statement that supports human sign-off
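The hand-off from screening to deep review can be sketched as a simple top-N selection. This assumes that screening produces a numeric risk score per contract, which the exercise allows but does not require; the function name `select_for_escalation` and the sample scores are hypothetical.

```python
import heapq


def select_for_escalation(scores: dict[str, float], limit: int = 50) -> list[str]:
    """Return the IDs of the `limit` highest-scoring contracts, highest first."""
    return heapq.nlargest(limit, scores, key=scores.get)


# Hypothetical screening scores between 0 and 1 for five contracts.
sample_scores = {
    "C-001": 0.12,
    "C-002": 0.91,
    "C-003": 0.47,
    "C-004": 0.88,
    "C-005": 0.05,
}

escalated = select_for_escalation(sample_scores, limit=2)
print(escalated)  # ['C-002', 'C-004']
```

A fixed cut-off of 50 matches the scenario. A real system might instead escalate everything above a score threshold, in which case the monthly deep review volume would vary.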

Required Calculations

You must calculate:

  1. Total monthly cost

  2. Time to completion for screening

  3. Time to completion for deep review

Your calculations should be explicit and easy to follow, as though presenting them to a finance and compliance stakeholder group.

Method: How to Approach the Exercise

Participants should follow this structured method.

Step 1: Identify the Work Distribution

The first insight is that the workflow has two very different workloads:

  • A large high-volume workload: 5,000 contracts

  • A small high-risk workload: 50 contracts

This exercise expects participants to treat these workloads differently, rather than processing everything with the same model.

Step 2: Select the Screening Strategy

Screening is primarily a classification problem. It benefits from consistency, structure, and scale. In most organisations, the sensible default is to screen at low cost and high throughput, then reserve expensive reasoning for exceptions.

Participants should consider:

  • Can the screening decision be expressed as rules, patterns, and categories?

  • Is the screening objective to detect risk signals rather than to produce final legal judgement?

  • Is there tolerance for false positives at this stage, provided false negatives are controlled?
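The trade-off in the last question can be illustrated by counting errors at different escalation thresholds. The sketch below assumes a numeric risk score and a small labelled sample; the function name, the scores, and the labels are all hypothetical and exist only to show the mechanics.

```python
def screening_errors(samples: list[tuple[float, bool]], threshold: float) -> tuple[int, int]:
    """Return (false_positives, false_negatives) at the given threshold.

    Each sample is (risk_score, truly_high_risk). A contract is flagged
    when its score is at or above the threshold.
    """
    false_positives = sum(1 for score, risky in samples if score >= threshold and not risky)
    false_negatives = sum(1 for score, risky in samples if score < threshold and risky)
    return false_positives, false_negatives


# Hypothetical labelled sample: (risk score, is the contract truly high risk?)
labelled = [
    (0.95, True),
    (0.70, True),
    (0.65, False),
    (0.40, True),
    (0.30, False),
    (0.10, False),
]

print(screening_errors(labelled, threshold=0.80))  # (0, 2)
print(screening_errors(labelled, threshold=0.35))  # (1, 0)
```

Lowering the threshold in this sample removes the missed high-risk contracts at the price of one unnecessary escalation, which is usually the cheaper error when a deep review stage follows.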

Step 3: Select the Escalation Strategy

Deep review is a judgement problem. It benefits from careful reasoning, longer context, and stronger synthesis. Legal staff need structured outputs that support decision-making and sign-off.

Participants should consider:

  • Are the escalated contracts likely to include complex clause interactions?

  • Is interpretation sensitive to jurisdiction or organisational policy?

  • Does the output require clear justification, traceability, and defensible reasoning?

Step 4: Compute Costs

Calculate monthly cost by multiplying unit cost by volume at each stage.
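As a minimal sketch, the multiplication can be run for every combination of screening and review model so that candidate designs can be compared side by side. The unit costs and volumes come from the scenario. The function name `monthly_cost` is hypothetical, and the sketch assumes the fast model's per-contract price also applies to a batch of only 50 contracts.

```python
from decimal import Decimal
from itertools import product

# Unit costs per contract, taken from the scenario.
UNIT_COST = {"deep": Decimal("2.00"), "fast": Decimal("0.10")}

SCREENING_VOLUME = 5000  # contracts screened per month
REVIEW_VOLUME = 50       # contracts escalated per month


def monthly_cost(screening_model: str, review_model: str) -> Decimal:
    """Unit cost multiplied by volume at each stage, then summed."""
    screening = UNIT_COST[screening_model] * SCREENING_VOLUME
    review = UNIT_COST[review_model] * REVIEW_VOLUME
    return screening + review


# Print the monthly cost of each of the four possible designs.
for screening_model, review_model in product(UNIT_COST, repeat=2):
    cost = monthly_cost(screening_model, review_model)
    print(f"screen={screening_model:<4} review={review_model:<4} ${cost:>10,.2f}")
```

`Decimal` is used rather than floating point so that currency totals stay exact when presented to finance stakeholders.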

Step 5: Compute Time to Completion

Calculate time separately for:

  • screening completion time

  • deep review completion time

Consider whether deep review can begin as soon as screening results become available, or whether it must wait until the full screening batch has finished.
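The timing arithmetic can be sketched with `datetime.timedelta`. The durations come from the two options in the scenario. The function names and the `overlap` parameter are hypothetical; `overlap` stands for any review time that runs while screening is still in progress and defaults to zero on the assumption that batch results arrive only when the batch completes.

```python
from datetime import timedelta

BATCH_SCREENING = timedelta(hours=4)             # fast model, full batch of 5,000
DEEP_REVIEW_PER_CONTRACT = timedelta(minutes=1)  # deep model, real-time


def deep_review_time(contracts: int) -> timedelta:
    """Sequential deep review time for the given number of contracts."""
    return DEEP_REVIEW_PER_CONTRACT * contracts


def end_to_end(screening: timedelta, review: timedelta,
               overlap: timedelta = timedelta(0)) -> timedelta:
    """Total elapsed time when review follows screening, minus any overlap."""
    return screening + review - overlap


print(deep_review_time(50))                               # 0:50:00
print(end_to_end(BATCH_SCREENING, deep_review_time(50)))  # 4:50:00
print(deep_review_time(5000))                             # 3 days, 11:20:00
```

The last line shows why the deep model is a poor fit for screening: run sequentially over all 5,000 contracts, it would need more than three days of processing time.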

Worked Example Outcome

The following demonstrates a disciplined design and the required calculations.

Stage 1 Decision: Use the Fast Model in Batch for Screening

Reasoning:
Screening is high-volume and repeatable. Batch processing provides predictable cost, predictable load, and consistent classification output. It avoids real-time bottlenecks and supports standardised scoring across all 5,000 contracts.

Screening cost calculation:
5,000 contracts × $0.10 = $500 per month

Screening time to completion:
4 hours for the full batch

Stage 2 Decision: Use the Deep Reasoning Model in Real-Time for the 50 High-Risk Contracts

Reasoning:
The final 50 contracts represent the highest legal and compliance risk. The objective is deeper analysis, clause-level explanation, and defensible reasoning for human sign-off. The cost is justified because the volume is small and the risk is high.

Deep review cost calculation:
50 contracts × $2.00 = $100 per month

Deep review time to completion:
50 contracts × 1 minute each = 50 minutes of model processing time when reviews run sequentially.
If reviews can be parallelised across users or queues, completion time may be shorter, but participants should state their assumption.
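The effect of the parallelisation assumption can be sketched as follows. The `workers` parameter is hypothetical and stands for the number of deep review requests that can run at the same time; the scenario does not state a concurrency limit, so any value used in an answer should be declared as an assumption.

```python
import math


def deep_review_minutes(contracts: int, minutes_each: int = 1, workers: int = 1) -> int:
    """Elapsed minutes when contracts are split evenly across parallel workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each worker handles at most ceil(contracts / workers) contracts in sequence.
    return math.ceil(contracts / workers) * minutes_each


for workers in (1, 5, 10):
    print(f"{workers:>2} worker(s): {deep_review_minutes(50, workers=workers)} minutes")
```

With one worker the result is the sequential 50 minutes; five workers give 10 minutes and ten workers give 5 minutes. Parallelism shortens elapsed time but leaves the per-contract cost unchanged.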

Total Monthly Cost

$500 (screening) + $100 (deep review) = $600 per month
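To show how this total behaves as volume grows, the sketch below compares the two-stage design with sending every contract to the deep reasoning model. The unit costs come from the scenario. The function names are hypothetical, and the sketch assumes that the escalation rate stays at 1% (50 of 5,000) as volume increases, which a real deployment would need to verify.

```python
from decimal import Decimal

FAST_UNIT = Decimal("0.10")  # fast model in batch, per contract
DEEP_UNIT = Decimal("2.00")  # deep reasoning model, per contract


def two_stage_cost(volume: int, escalated: int) -> Decimal:
    """Screen everything with the fast model, deep-review only escalations."""
    return FAST_UNIT * volume + DEEP_UNIT * escalated


def deep_only_cost(volume: int) -> Decimal:
    """Baseline: every contract goes through the deep reasoning model."""
    return DEEP_UNIT * volume


for volume in (5000, 10000, 20000):
    escalated = volume // 100  # assumption: escalation rate stays at 1%
    print(f"{volume:>6,} contracts: two-stage ${two_stage_cost(volume, escalated):,.2f}"
          f" vs deep-only ${deep_only_cost(volume):,.2f}")
```

Under these assumptions the two-stage design costs $600 against $10,000 at the current volume, and the ratio stays at 6% of the deep-only baseline as volume doubles.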

Total Time to Completion

  • Screening completes in 4 hours

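  • Deep review takes 50 minutes once the 50 high-risk contracts are identified, so sequential model processing totals roughly 4 hours 50 minutes if review starts only after the batch finishes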

A practical operational interpretation is:

  • Within the same working day, screening can complete in the background.

  • High-risk reviews can then be completed in under an hour of model processing time, plus human review time.

Expected Reasoning Pattern

A disciplined design typically:

  • uses the fast batch model for broad screening and classification

  • uses the deep reasoning model for a small set of high-risk contracts that require careful interpretation

  • produces review-ready artifacts at each stage so humans can validate decisions efficiently

  • concentrates cost where marginal reasoning quality produces meaningful reduction in risk

This structure is a standard pattern in enterprise AI deployments. It mirrors how human organisations already operate. Routine work is handled at scale, while specialists focus on exceptions.