3.3

Failure Patterns Beyond Hallucination

18 min

Consider a lease abstract that accurately captures most of the commercial terms in a complex lease but silently omits a co-tenancy clause that could materially affect the tenant's obligations in certain circumstances. The document appears comprehensive but is professionally incomplete, and the gap may not surface until months after the transaction has completed, when the clause is triggered. The abstract looks like careful, thorough work. The omission is invisible in the abstract itself because an absent item leaves no visible trace. Detection requires the reviewer to read the actual lease and compare its terms against the abstract systematically, rather than reading the abstract and assessing whether it appears complete.
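The systematic comparison described above can be sketched as a simple checklist check. This is a minimal illustration, assuming the reviewer has already read the lease and listed the clause types it contains; the function name `find_omitted_clauses` and the example clause names are hypothetical and not part of any particular tool.

```python
def find_omitted_clauses(lease_clauses, abstract_clauses):
    """Return clause types present in the lease but absent from the abstract.

    Both arguments are lists of clause-type labels compiled by the reviewer.
    Matching ignores case and surrounding whitespace.
    """
    abstract_set = {c.strip().lower() for c in abstract_clauses}
    return [c for c in lease_clauses if c.strip().lower() not in abstract_set]


# Hypothetical example: clause types found by reading the lease itself,
# compared with the clause types the AI-generated abstract covers.
lease_clauses = ["rent", "term", "break option", "co-tenancy", "service charge"]
abstract_clauses = ["Rent", "Term", "Break option", "Service charge"]

print(find_omitted_clauses(lease_clauses, abstract_clauses))  # ['co-tenancy']
```

The check only works because the first list is built from the lease rather than from the abstract. A list built from the abstract would inherit the same omission.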

Similarly, consider a consulting report that correctly frames the strategic question, applies a recognisable analytical framework with apparent rigour, and presents its conclusions in the structured, evidence-referenced form that professional consulting deliverables conventionally take. If its market sizing rests on figures the model generated from general patterns rather than on the data the client provided, the advice sounds rigorous but rests on invented evidence. The structure of the report gives no signal that the figures are not drawn from the client's data, because the model has learned to present generated figures in the same format, and with the same contextual explanation, that real figures receive.

The detection challenge with this failure pattern is that surface reading and structural assessment are both insufficient to catch it. The output passes the tests that surface reading applies because its grammar, structure, tone, and apparent reasoning are all appropriate. The error is in the substance, in the specific factual claim that is wrong, the specific provision that is misread, the specific interaction between clauses that is missed, or the specific figures that are invented rather than sourced. Detecting it requires the reviewer to engage with the specific substantive claims in the output at the level of checking them against source material, rather than reading the output as a whole and assessing its general quality. The discipline this demands is more time-consuming than surface reading, and it is the discipline that protects professional work from the category of AI error that is most likely to survive an insufficiently rigorous review.

Invented Specifics in the Absence of Information

The third failure pattern exploits a specific and important property of the generation mechanism: the model generates a response to every question submitted to it, regardless of whether it has access to the specific information needed to answer that question accurately. When a human professional is asked a question and does not have the information required to answer it reliably, the professional response is to acknowledge the gap, request the necessary information, or qualify the response explicitly to reflect the limits of available knowledge. The AI model's generation mechanism does not produce this response reliably, because the mechanism is optimised to generate the most statistically likely continuation of the text, and a continuation that provides a substantive answer is statistically more likely than a continuation that acknowledges a gap in the available information. The result is that the model fills informational gaps with plausible-sounding content rather than flagging them.

A practitioner who asks an AI tool to summarise a specific client's historical claims experience without providing the actual claims data will frequently receive a detailed, structured summary that reads as though it was produced from comprehensive records. The summary may reference specific claim types, approximate frequencies, resolution patterns, and aggregate figures. These details will be presented with the specificity and formatting that characterise a genuine data-based analysis. They will have been generated by the model from the statistical patterns it learned about how claims summaries of that kind are typically structured and what kinds of figures they typically contain, rather than from the client's actual claims history, which was never provided to the model.

A financial analyst who asks the tool to comment on a specific company's revenue trajectory and margin performance without providing the company's actual financial statements may receive commentary that references specific percentage changes in revenue, specific margin figures, and specific period comparisons. These figures will be presented in the confident, precise language of financial analysis. They will have been generated by the model from patterns in financial commentary rather than from the company's actual results.

A real estate professional who asks the tool about comparable sales in a specific local market without providing actual transaction data may receive a list of comparables with addresses, sale prices, and sale dates. The list reads as precisely sourced market evidence but was generated from the model's general knowledge of how comparable sales analyses are structured and what kinds of figures they typically contain.

This failure pattern is particularly prevalent when practitioners submit questions that require specific, current, or organisation-internal information without providing that information as context. The model's willingness to provide a detailed, confident-sounding answer is not evidence that it has access to the information required to answer accurately. It is evidence that the generation mechanism has found a statistically likely continuation of the text that takes the form of a detailed, confident answer. Every specific claim in an AI output that could only be accurate if the model had access to specific data the practitioner did not provide must be treated as potentially invented until verified against the actual source. This applies with particular force to numerical figures, specific dates, named individuals, referenced documents, and any other claim whose accuracy depends on access to specific information rather than on general professional knowledge.
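One way to apply this rule to numerical figures is to list every figure in the output that does not appear anywhere in the material the practitioner supplied. The following is a minimal sketch under stated assumptions: the function names, the number pattern, and the sample texts are all illustrative, and it handles only plain digits with optional thousands separators and decimals.

```python
import re

# Assumed pattern: digits, optional comma-separated thousands, optional decimals.
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_figures(text):
    """Return the set of numeric figures in text, with thousands separators removed."""
    return {match.replace(",", "") for match in NUMBER_PATTERN.findall(text)}


def unsupported_figures(ai_output, provided_context):
    """Return figures that appear in the AI output but nowhere in the provided context."""
    return sorted(extract_figures(ai_output) - extract_figures(provided_context))


# Hypothetical example texts.
provided_context = "Revenue was 4,300 in 2023."
ai_output = "Revenue rose 12.5% to 4,300 in 2023, with margin at 18%."

print(unsupported_figures(ai_output, provided_context))  # ['12.5', '18']
```

A flagged figure is not necessarily invented, since it may have been correctly derived from the source, and an unflagged figure is not necessarily used correctly. The list only tells the reviewer which figures have no direct counterpart in the supplied data and therefore need manual verification first.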

Inconsistency Across Long Outputs

The fourth failure pattern is a consequence of the sequential character of the text generation process and becomes more pronounced as the length and complexity of the AI output increases. The model generates text one token at a time (a token is roughly a word or a fragment of a word), with each successive token predicted on the basis of the full sequence of text that precedes it. Maintaining perfect consistency across a long, complex output requires the model to track and honour all of the constraints, definitions, analytical positions, and factual claims it has already generated as it continues generating additional text. This tracking becomes progressively more demanding as the output grows longer, because the constraint set that must be respected grows with every additional claim or analytical position the model produces.

The failure mode that results from this property takes several forms in professional work. A position taken in an early section of a long output may be qualified, modified, or implicitly reversed in a later section without the modification being explicitly acknowledged, because the model's attention to the earlier constraint has degraded as the generation has continued. A definition established at the beginning of a document may be applied inconsistently as the document proceeds, with the term being used in one sense in some sections and a slightly different sense in others. An analytical conclusion reached in one part of a report may be incompatible with an assumption made in a different part, creating an internal contradiction that undermines the reliability of both conclusions.

In professional work, internal inconsistency is a serious quality problem whose consequences extend beyond the specific sections where the inconsistency appears. A risk assessment that describes a specific risk factor as material in the executive summary and immaterial in the detailed analysis creates a document whose conclusions cannot be relied upon by the decision-makers receiving it, because the document provides internally contradictory guidance about the significance of the factor in question. A contract review that identifies a specific clause as non-standard in the high-level summary but treats the equivalent clause as standard in the detailed section-by-section analysis creates confusion for the solicitor and the client about what the reviewer's professional assessment actually is. A due diligence report that applies different valuation assumptions in different sections, without acknowledging or reconciling the difference, produces overall conclusions that rest on an inconsistent analytical foundation and that cannot be defended if challenged.

The verification discipline that addresses this failure pattern is distinct from the verification disciplines that address the preceding three patterns. Checking for fabricated references requires verification against primary sources. Checking for plausible but incorrect analysis requires substantive engagement with the specific claims in the output. Checking for invented specifics requires confirming that the specific figures, dates, and details in the output correspond to actual data the practitioner provided. Checking for internal inconsistency requires reading the output as a complete document with attention to whether the positions, definitions, and conclusions in each section are compatible with those in every other section. For long professional outputs, this is most efficiently accomplished by reading the complete output in sequence before focusing verification effort on specific substantive claims, so that the overall analytical structure and the consistency of positions across sections can be assessed before the detail-level verification begins. The time this requires is proportionate to the professional consequences of delivering a document whose internal contradictions undermine the reliability of its conclusions. In the professional domains this programme addresses, those consequences are consistently significant.
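The cross-section comparison can be supported by a simple record kept during the first sequential read. The sketch below assumes the reviewer notes, for each section, the position the document takes on each item; the function name `find_conflicts`, the tuple layout, and the sample entries are hypothetical.

```python
from collections import defaultdict


def find_conflicts(positions):
    """Group recorded positions by item and return items with conflicting positions.

    positions: iterable of (section, item, position) tuples noted by the reviewer.
    Returns {item: {position: [sections]}} for every item on which the
    document takes more than one distinct position.
    """
    by_item = defaultdict(lambda: defaultdict(list))
    for section, item, position in positions:
        by_item[item][position].append(section)
    return {item: dict(found) for item, found in by_item.items() if len(found) > 1}


# Hypothetical notes taken while reading a risk assessment in sequence.
notes = [
    ("Executive summary", "supplier concentration", "material"),
    ("Detailed analysis", "supplier concentration", "immaterial"),
    ("Executive summary", "currency exposure", "immaterial"),
    ("Detailed analysis", "currency exposure", "immaterial"),
]

print(find_conflicts(notes))
```

In this example only the supplier concentration item is returned, because the executive summary and the detailed analysis take different positions on it. The record does not replace the reviewer's judgement about which position is correct. It only makes the contradiction visible so that it can be resolved before the document is delivered.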