Choosing the Right Model for a Task

The Problem with Intuitive Model Selection

Most professionals who use more than one AI tool develop informal preferences over time. They gravitate toward the tool they first used successfully, the one recommended by a trusted colleague, or the one that happens to be most accessible within their organisation's technology environment. These preferences are not irrational. Familiarity with a tool's behaviour, its interface, and its particular strengths is genuinely valuable and reduces the friction of daily work.

The limitation of preference-based model selection is that it optimises for comfort rather than for fit. Different tasks place different demands on AI tools, and those demands correspond, with meaningful precision, to the capability profiles that distinguish the major models from one another. A professional who routes all of their AI-assisted work through a single tool because it is familiar will sometimes be using the best available option for the task at hand and sometimes be using a tool that is poorly matched to what the task requires. The consequences of this mismatch range from minor, when the tool produces an adequate but not optimal output, to significant, when the mismatch involves data handling obligations that the chosen tool's terms of service do not adequately address.

A deliberate model selection framework replaces intuition and habit with structured reasoning. It identifies the dimensions along which tasks and models differ, applies those dimensions systematically to a given task, and arrives at a considered choice that reflects the actual requirements of the work rather than the path of least resistance. The framework presented in this section organises that reasoning across four analytical dimensions. Together, they address the full range of considerations that bear on professional model selection.

The Four Dimensions of Model Selection

The framework operates across four dimensions, each of which addresses a distinct category of requirement that varies between professional tasks and corresponds to genuine capability or compliance differences between models. The four dimensions are task type, information sensitivity, accuracy threshold, and platform integration requirement. Each is described in turn below, with an explanation of the reasoning behind it and the model selection implications it generates. The order of presentation is not the order of application, which is set out later in this section under Applying the Four Dimensions Together.

Dimension One: Task Type

The first and most substantive dimension of model selection is the nature of the task itself. Task type encompasses the cognitive demands the task places on the AI tool: the volume of material it must process, the type of reasoning it must apply, the kind of output it must produce, and the domain knowledge it must draw on. Different models have been built, trained, and optimised with different capability profiles, and those profiles correspond to genuine differences in performance across task types.

Document-intensive analysis. Tasks that require reasoning across large volumes of text, including the review of lengthy contracts, the synthesis of findings from multiple research sources, the analysis of full case files, or the assessment of comprehensive policy documents, are the tasks most sensitive to differences in context window capacity. A model that can hold and reason across 200,000 tokens in a single session can process a full contract, a complete set of pleadings, or a quarter's worth of financial documentation without losing coherence or requiring the material to be broken into segments. A model whose context window is significantly smaller must work with fragments, which introduces the risk that reasoning applied to one segment will not adequately account for content in another.

For this category of task, Claude's context capacity and demonstrated performance on long-document analysis make it the primary candidate. Gemini 3 Pro offers a nominally larger context window, but professional validation of its performance at extreme context lengths is less extensive, and output quality near the upper limit of the window should be tested specifically before it is relied upon.
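The context window reasoning above can be made concrete with a rough fit check. The sketch below is illustrative only: the four-characters-per-token ratio, the 4,000-token output reserve, the 32,000-token comparison window, and all function names are assumptions introduced here, and real token counts depend on the tokenizer each model uses.

```python
import math

# Assumption: roughly 4 characters per token for English prose. Real
# tokenizers differ by model and language, so treat this as an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int,
                    reserved_for_output: int = 4_000) -> bool:
    """Check whether the text fits in one session, leaving room for the reply."""
    return estimate_tokens(text) <= context_window - reserved_for_output


def segments_needed(text: str, context_window: int,
                    reserved_for_output: int = 4_000) -> int:
    """Number of segments the text would have to be split into."""
    usable = context_window - reserved_for_output
    return max(1, math.ceil(estimate_tokens(text) / usable))


# A stand-in for a long contract: 600,000 characters, about 150,000 tokens.
contract = "x" * 600_000

print(fits_in_context(contract, 200_000))  # True: one session is enough
print(fits_in_context(contract, 32_000))   # False: must be segmented
print(segments_needed(contract, 32_000))   # 6
```

Each additional segment is a point at which reasoning about one part of the document can lose sight of another, which is why the segment count is a useful first indicator of how much a smaller window is likely to cost on a given task.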

Code generation and technical work. Tasks involving the writing, review, or explanation of code, the design of technical architectures, or the analysis of technical systems place demands on the model that are distinct from those of document analysis or professional writing. GPT-5 has accumulated a substantial track record in technical and developer contexts, supported by the breadth of its integration with developer tools, version control platforms, and software development environments. Claude also demonstrates strong performance on coding tasks, particularly where the code is embedded within a larger analytical or explanatory context. For organisations whose technical work requires deep integration with a specific development ecosystem, GPT-5's broader integration footprint is a relevant consideration.

Research tasks requiring current information. Many professional tasks require engagement with information that postdates the training data of standard language models. Regulatory developments, recent case law, current market conditions, evolving industry standards, and breaking commercial developments all represent categories of information that a model trained on a fixed historical dataset cannot address with accuracy. For tasks where the recency of information is a primary requirement rather than a secondary preference, models with access to current web data offer a meaningful capability advantage. Grok's connection to live data from the X platform makes it relevant in contexts where real-time information is essential. Gemini, with web access enabled, draws on Google's search infrastructure and provides access to current information across a broad range of topics. The appropriate choice between these options depends on the specific nature of the current information required and the reliability standards applicable to the professional output.

Nuanced professional writing. Tasks that require the production of professional written outputs of high quality, including complex correspondence, detailed analytical reports, structured legal documents, and professional communications that must precisely calibrate tone, register, and level of qualification, place demands on the model's language and reasoning capability rather than primarily on its knowledge base. Claude's demonstrated strengths in instruction-following, careful reasoning, and appropriate expression of uncertainty make it particularly suited to this category of task. The model's tendency to qualify conclusions and acknowledge limitations, which can occasionally produce more hedged outputs than a task requires, is in most professional writing contexts an asset rather than a limitation: professional documents that acknowledge the boundaries of available evidence and the conditions under which conclusions hold are generally more defensible and more useful than those that assert conclusions beyond the evidence.

Multimodal tasks. An increasing proportion of professional work involves documents that contain not only text but also images, charts, diagrams, photographs, and other non-textual content. Financial reports include data visualisations. Insurance claims include photographs of damage. Engineering assessments include technical drawings. Legal documents may include reproductions of images or graphic evidence. For tasks that require the AI tool to process and reason across both text and non-text content, the multimodal capabilities of GPT-5 and Gemini are relevant advantages. Both models can accept and reason across image inputs alongside text, which allows the full document to be submitted rather than only the text extracted from it.

Dimension Two: Information Sensitivity

The second dimension of model selection is the sensitivity of the information involved in the task. As addressed in detail in Section 6 of Module 4.1, professional information exists on a spectrum of sensitivity that carries direct implications for which AI deployment configurations are permissible. The model selection decision is inseparable from the data handling decision, because the choice of model determines the deployment configuration and therefore the data handling terms under which the information will be processed.

Public information. Information that the organisation has placed in the public domain, or that comes from publicly available sources, carries no sensitivity-based constraint on model selection. Any of the major models, operating under standard commercial terms, may be used with this category of information. Research tasks drawing on public sources, drafting tasks using publicly available reference material, and analytical tasks applied to information that has been publicly disclosed are all appropriate for standard hosted deployment without special consideration.

Internal information. Information that is used within the organisation in the conduct of its work, but that has not been placed in the public domain and carries no specific legal or regulatory protection, requires that the AI tool's data handling terms include adequate provisions for its protection. The minimum adequate provision is a commitment from the provider that submitted data will not be used for model training. Most major providers offer this commitment through their enterprise tiers, and some include it in their standard commercial terms for business accounts. Professionals using AI with internal information should verify which tier of service they are operating under and confirm that this provision is in place.

Confidential information. Information subject to specific legal or contractual obligations of confidentiality, including attorney-client privileged communications, personal data protected under GDPR, material non-public financial information, and trade secrets, requires a deployment configuration that specifically addresses those obligations. Standard commercial terms, even at enterprise subscription tiers, are unlikely to provide adequate protection for all categories of confidential information without specific negotiation. Two configurations are appropriate for this category: a specifically negotiated enterprise agreement that includes a compliant data processing agreement and appropriate data handling commitments, or self-hosted deployment on the organisation's own infrastructure. The choice between them depends on the specific nature of the regulatory obligation and the technical capacity of the organisation.

The practical implication for model selection is that sensitivity determines the permissible deployment configurations, and the permissible deployment configurations constrain the available model choices. A task involving personal data under GDPR requires a model deployed under a compliant data processing agreement, which narrows the field to providers with whom such agreements are available. A task involving material non-public financial information may require on-premises deployment, which narrows the field further to open-source models or closed-source models with on-premises licensing. Working through the sensitivity dimension first ensures that the model selection decision operates within the bounds of compliance rather than treating compliance as a secondary consideration.
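The relationship between sensitivity tier, deployment configuration, and available models can be sketched as a simple lookup. This is a simplified illustration rather than a compliance tool: the configuration labels, the model names, and the tier-to-configuration mapping are hypothetical, and the real mapping depends on the specific legal or regulatory obligation.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


# Hypothetical labels for the deployment configurations discussed above.
STANDARD_HOSTED = "standard hosted"
NO_TRAINING_TIER = "hosted with no-training commitment"
NEGOTIATED_ENTERPRISE = "negotiated enterprise agreement with DPA"
SELF_HOSTED = "self-hosted"

# Assumed mapping: each tier permits its own configurations and any
# stricter ones. A real policy may be narrower for particular obligations.
PERMITTED = {
    Sensitivity.PUBLIC: {STANDARD_HOSTED, NO_TRAINING_TIER,
                         NEGOTIATED_ENTERPRISE, SELF_HOSTED},
    Sensitivity.INTERNAL: {NO_TRAINING_TIER, NEGOTIATED_ENTERPRISE,
                           SELF_HOSTED},
    Sensitivity.CONFIDENTIAL: {NEGOTIATED_ENTERPRISE, SELF_HOSTED},
}


def permitted_models(sensitivity, available):
    """Return the models usable at this sensitivity tier, sorted by name.

    `available` maps a model name to the set of deployment
    configurations the organisation actually has in place for it.
    """
    allowed = PERMITTED[sensitivity]
    return sorted(name for name, configs in available.items()
                  if configs & allowed)


# Hypothetical organisation: four models under different arrangements.
available = {
    "model_a": {STANDARD_HOSTED},
    "model_b": {STANDARD_HOSTED, NO_TRAINING_TIER},
    "model_c": {NEGOTIATED_ENTERPRISE},
    "model_d": {SELF_HOSTED},
}

print(permitted_models(Sensitivity.INTERNAL, available))
# ['model_b', 'model_c', 'model_d']
print(permitted_models(Sensitivity.CONFIDENTIAL, available))
# ['model_c', 'model_d']
```

The field shrinks as sensitivity rises, and no capability consideration can bring an excluded model back into it.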

Dimension Three: Accuracy Threshold

The third dimension is the accuracy threshold applicable to the task: the degree of precision, factual correctness, and reliability that the output must meet before it is fit for professional use. This dimension does not by itself determine model selection, but it determines the verification standard that must be applied to whatever model is used, and in some cases it has implications for which models are most appropriate.

It is a fundamental property of all current large language models that they generate outputs through a probabilistic process that does not guarantee factual accuracy. Every model addressed in this section can produce outputs that are fluently expressed, logically structured, contextually appropriate in register and tone, and factually incorrect. The mechanisms through which this occurs are varied: the model may draw on training data that contained inaccurate information, may generate a plausible-sounding but fabricated reference or citation, may misapply a correct general principle to a specific case where it does not hold, or may produce a calculation error embedded within an otherwise sound analytical output. These failure modes are present to varying degrees in all models and are not fully eliminated by any current alignment approach.

The accuracy threshold for a given task determines what verification is required, not whether verification is required. Verification is always required. The question is how rigorous that verification must be, against what standard it is applied, and who bears the professional responsibility for the accuracy of the final output.

High-stakes professional outputs. Legal documents, regulatory filings, financial statements, medical communications, insurance coverage determinations, and any other output that will be relied upon by parties who may suffer material consequences if it is incorrect require verification against primary sources with the same rigour that would be applied to any research-based professional output. Every factual claim should be traceable to an identified primary source. Every citation should be verified in the original. Every calculation should be checked against source data. The AI tool's output is a draft and a research aid, not a finished product, and the professional who uses it bears full responsibility for the accuracy of the version that is submitted, filed, communicated, or relied upon.

In high-stakes contexts, the choice of model should reflect the model's known accuracy characteristics. Claude's tendency toward careful qualification of uncertain claims is an asset in these contexts, as it makes the boundaries of the model's confidence more visible and therefore easier to verify. GPT-5's fluency and confident assertion style can make inaccuracies harder to identify on casual review, which argues for particular rigour in verification when this model is used for high-stakes outputs.

Lower-stakes outputs. Internal drafts, brainstorming outputs, exploratory research, and early-stage analysis that will be substantially reviewed and revised before any professional reliance is placed on them carry a lower accuracy threshold. Verification is still required, but the standard is one of logical consistency and obvious error identification rather than primary source verification of every factual claim. For this category of task, any of the major models is appropriate, and the choice can be made primarily on the basis of task type and integration convenience.
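The principle that the accuracy threshold changes what verification is required, and never whether it is required, can be expressed as a checklist that is never empty. The sketch below is one illustrative encoding; the function and constant names are assumptions, and the check descriptions paraphrase the two categories above.

```python
# Checks that apply to every AI-assisted output, whatever the stakes.
BASELINE_CHECKS = [
    "review for logical consistency",
    "identify obvious errors",
]

# Additional checks for outputs that others will rely upon.
HIGH_STAKES_CHECKS = [
    "trace every factual claim to an identified primary source",
    "verify every citation in the original",
    "check every calculation against source data",
]


def verification_checklist(high_stakes: bool) -> list[str]:
    """Return the checks to apply; the baseline is always included."""
    checks = list(BASELINE_CHECKS)
    if high_stakes:
        checks.extend(HIGH_STAKES_CHECKS)
    return checks


for check in verification_checklist(high_stakes=True):
    print("-", check)
```

A lower threshold shortens the list but cannot empty it.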

Dimension Four: Platform Integration Requirement

The fourth dimension addresses the practical reality that AI-assisted professional work does not happen in isolation from the broader digital environment in which professionals work. The documents produced, the communications sent, the data analysed, and the workflows executed all occur within specific platforms and tool ecosystems. The degree to which an AI tool is natively integrated with those platforms affects the friction of incorporating AI assistance into existing workflows, the quality of the contextual information the AI tool can access, and the consistency with which AI assistance can be embedded in routine professional practice.

Microsoft 365 environments. Organisations whose professional work is organised primarily around Microsoft Word, Excel, PowerPoint, Outlook, and Teams operate within an ecosystem for which Microsoft Copilot, powered by GPT, provides the deepest native integration. Copilot is embedded directly within these applications, providing AI assistance without the need to switch between tools, copy content between applications, or manually manage the transfer of context from the working document to the AI interface. For organisations that have made a significant investment in the Microsoft 365 ecosystem and wish to integrate AI assistance as seamlessly as possible into existing workflows, the integration advantage of Copilot is a genuine practical consideration that may outweigh marginal differences in model capability.

Google Workspace environments. Organisations whose work is organised around Google Docs, Sheets, Gmail, Drive, and Meet operate within an ecosystem for which Gemini provides equivalent native integration. The ability to access AI assistance within the applications already in use, drawing on documents stored in Drive and context from Gmail, reduces switching costs and improves the contextual grounding of AI outputs in a way that a separately accessed model cannot replicate. For Google Workspace-centric organisations, this integration consideration is directly relevant to model selection.

Platform-agnostic environments. Organisations that do not have a strong integration dependency on a specific productivity ecosystem, or that use a mixture of tools without a dominant platform, have greater flexibility in model selection and can make their choice primarily on the basis of task type, sensitivity, and accuracy threshold. In these environments, the most appropriate model for a given task is the one whose capability profile best matches the task requirements, without the integration consideration constraining the choice.

Cyrenza platform environments. Professionals working within the Cyrenza platform operate in an environment where model selection is managed at the platform level by the Cyrenza Context Fabric (CCF). The CCF routes each Knowledge Worker's task to the appropriate underlying model based on the task type, with the organisation's context, Vault documents, permissions, and current task details assembled and applied before the model processes the request. The professional working within Cyrenza does not need to make explicit model selection decisions for individual tasks. The platform manages this on their behalf, applying the same analytical dimensions described in this section at the infrastructure level rather than requiring each user to apply them manually.

Applying the Four Dimensions Together

The four dimensions described in this section are not independent. They interact, and in some cases the constraints imposed by one dimension narrow the permissible options to the point where the remaining dimensions become secondary. The appropriate sequence for applying the framework is therefore to work through the dimensions in the order in which they operate as constraints.

Sensitivity is the most constraining dimension, because it determines the deployment configurations that are permissible for a given task. If the information involved in the task is confidential under regulatory or legal obligations, the permissible deployment configurations may be narrow, and model selection is constrained accordingly. The sensitivity dimension should therefore be assessed first, and the remaining dimensions applied within the set of models and configurations that the sensitivity assessment permits.

Task type is the most substantive dimension for capability selection within the permissible set. Once the sensitivity dimension has established which models and deployment configurations are appropriate, the task type dimension identifies which of those options is best matched to the cognitive demands of the task.

Accuracy threshold determines the verification standard that must be applied to the chosen model's output, and in some cases supports the choice between models with different accuracy characteristics. It is addressed third because, in most cases, it operates as a quality assurance consideration rather than as a selection filter.

Platform integration is the final dimension, applicable in cases where multiple models have passed the sensitivity, task type, and accuracy assessment with similar outcomes. Integration convenience is a legitimate consideration in professional practice, and where it does not require accepting a model that is less appropriate on the more substantive dimensions, it is reasonable to allow it to inform the final choice.
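The sequence described above can be sketched as a short decision procedure. It is a minimal illustration under stated assumptions, not a definitive implementation: the `Candidate` fields, the 0 to 5 task-fit score, and the model names are hypothetical, and "similar outcomes" is simplified to an exact tie on task fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str         # hypothetical model label
    permitted: bool   # passed the sensitivity assessment
    task_fit: int     # 0-5 judgement of fit to the task type
    integrated: bool  # natively integrated with the organisation's platform


def select_model(candidates, high_stakes):
    """Apply the four dimensions in the order they act as constraints."""
    # 1. Sensitivity: a hard filter on what may be used at all.
    allowed = [c for c in candidates if c.permitted]
    if not allowed:
        raise ValueError("no candidate satisfies the sensitivity constraint")

    # 2. Task type: keep the best-matched option or options.
    best = max(c.task_fit for c in allowed)
    shortlist = [c for c in allowed if c.task_fit == best]

    # 3. Accuracy threshold: sets the verification standard for the
    #    output. In this sketch it does not filter candidates.
    if high_stakes:
        verification = "primary-source verification"
    else:
        verification = "consistency and obvious-error review"

    # 4. Platform integration: a tie-breaker among equal candidates.
    shortlist.sort(key=lambda c: (not c.integrated, c.name))
    return shortlist[0].name, verification


candidates = [
    Candidate("model_a", permitted=False, task_fit=5, integrated=True),
    Candidate("model_b", permitted=True, task_fit=4, integrated=False),
    Candidate("model_c", permitted=True, task_fit=4, integrated=True),
    Candidate("model_d", permitted=True, task_fit=3, integrated=True),
]

print(select_model(candidates, high_stakes=True))
# ('model_c', 'primary-source verification')
```

In this example model_a is the strongest fit for the task but is excluded at the first step, and integration decides only between model_b and model_c, which are otherwise equal.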

The Evolving Nature of Model Selection

A framework for model selection would be incomplete without an acknowledgment that the specific conclusions it generates are subject to change as the AI landscape evolves. The capability profiles described in this section and the preceding sections of this module reflect the current state of the major models. New model versions are released regularly. Capability gaps narrow or shift. New integration partnerships are announced. Data handling terms are revised through enterprise negotiations that expand the provisions available to professional users.

The framework itself, however, remains stable. The four dimensions of task type, information sensitivity, accuracy threshold, and platform integration requirement will remain relevant as long as AI tools are used in professional environments, because they correspond to the enduring requirements of professional work rather than to the transient properties of specific model versions. A professional who understands these dimensions, and who applies them with the understanding of model characteristics built through the earlier sections of this module, is equipped to make appropriate model selection decisions not only for the current landscape but for the landscape as it continues to develop.