4.2

Closed Source and Open Source Models

12 min

The Origin of the Distinction

To understand the difference between closed source and open source AI models, it is necessary to understand what an AI model actually consists of at a technical level, because the distinction between the two categories is defined precisely by what is and is not made publicly available.

A large language model is, at its core, a mathematical structure of enormous complexity. It consists of billions of numerical parameters, known as weights, arranged within a specific architectural design. These weights are not programmed by hand. They are learned through a training process in which the model is exposed to vast quantities of text and adjusted, through millions of iterative calculations, until its outputs align with the objectives set by its developers. The result of this process is a trained model: a specific configuration of weights that encodes, in a form the model can use to generate responses, an enormous amount of information about language, knowledge, reasoning patterns, and the relationships between ideas.

The trained weights are the model. They represent the accumulated product of the training process: the compute time consumed, the data curated, the engineering decisions made, and the alignment work performed to make the model safe and useful. For the organisations that develop AI models at scale, these weights represent an asset of significant technical and commercial value. The decision about whether to release those weights publicly or retain them as proprietary intellectual property is the defining choice that separates closed source from open source models.

Closed Source Models: Structure, Implications, and Professional Relevance

A closed source model is one whose trained weights are retained by the developing organisation and not released to the public. Users of closed source models access the model's capabilities through an interface provided by the developer, an application programming interface that allows developers to build products on top of the model, or an enterprise agreement that provides access under negotiated terms. At no point does the user obtain the model weights themselves. They interact with the model as a service, provided and operated by the developing organisation.

The major closed source model providers currently serving the professional market include Anthropic, whose Claude model family is the underlying engine of the Claude.ai product and the Cyrenza platform; OpenAI, whose GPT model family powers ChatGPT and Microsoft Copilot; Google DeepMind, whose Gemini model family underlies the Gemini product and Google Workspace AI features; and xAI, whose Grok model is integrated into the X platform and available through a separate API. Each of these organisations has made the strategic decision to develop, improve, and provide access to their models as a service rather than releasing the model weights for public use.

This structure has several practical consequences for professional users.

Consistency and continuous improvement. Because the model is operated by its developer, the developer is responsible for its performance, its reliability, and its ongoing improvement. Updates to the model, corrections to known limitations, improvements in capability, and responses to newly identified safety concerns are all managed by the provider and applied to the service without requiring any action from the user. A professional using a closed source model through a commercial product always has access to the current version maintained by the developer, without needing to manage updates, migrations, or compatibility issues.

Safety and alignment work. Closed source model developers invest significantly in what is known as alignment: the process of training and adjusting models so that their outputs are helpful, accurate, and unlikely to cause harm. This work includes the development of internal policies about what the model will and will not assist with, the implementation of filtering systems that identify and decline harmful requests, and ongoing evaluation of the model's behaviour across a wide range of inputs. For professional environments, particularly regulated ones, this alignment work provides a degree of baseline safety assurance that would be difficult and costly to replicate independently. The model arrives with safety properties built in, rather than requiring the deploying organisation to develop and maintain those properties themselves.

Constrained customisation. The service-based access model imposes limits on how the model can be modified or adapted. A professional using a closed source model can influence its behaviour through prompting, through the use of system-level instructions where the provider supports this, and through the configuration options available within the product. What they cannot do is modify the model's underlying weights, retrain it on proprietary data in ways not supported by the provider's platform, or alter its fundamental behaviour in ways the provider does not permit. For the majority of professional use cases, these constraints are not significant limitations. For organisations that wish to build deeply specialised AI capabilities tailored to a specific domain, they may become relevant.

Data handling under provider terms. When a professional submits information to a closed source model, that information is processed on the provider's infrastructure under the terms of the applicable service agreement. The specific data handling provisions, covering whether submitted data may be used to train future models, how long it is retained, who within the provider's organisation can access it, and under what circumstances it may be shared with third parties, are set out in the provider's terms of service and, where a separate agreement has been negotiated, in a data processing agreement. These terms vary significantly between providers and between tiers of service within a single provider's offering. The professional obligation to review and understand these terms before submitting sensitive information is clear and direct.

Open Source Models: Architecture, Ecosystem, and Genuine Capability

An open source model is one whose trained weights have been released publicly, typically under a licence that specifies the conditions under which they may be used, modified, and redistributed. The release of model weights means that any individual or organisation with the technical capability to do so can download the model, run it on their own infrastructure, examine its architecture, modify its behaviour, and build products or services using it, subject to the terms of the applicable licence.

The open source AI ecosystem has grown substantially over the past several years and now includes models of genuine professional capability. Meta's Llama model family, released under licences that permit commercial use with certain restrictions, represents perhaps the most widely deployed open source model family and has been the foundation for a large number of specialised models developed by academic institutions, commercial organisations, and independent researchers. Mistral AI, a French organisation, has released a series of open source models with strong performance relative to their size, making them particularly relevant in European professional and research contexts. The Falcon model family, developed by the Technology Innovation Institute in Abu Dhabi, represents another significant open source contribution. Microsoft's Phi series of smaller models has demonstrated that carefully curated training data can produce capable models with substantially lower computational requirements than larger alternatives.

This ecosystem is not a uniform collection. Open source models vary enormously in their capability, their safety properties, their training data composition, their computational requirements, and the quality of their documentation and community support. The decision to use an open source model is not a single decision but a series of decisions about which model, under what licence, deployed in what infrastructure, with what safety and governance arrangements in place.

Local deployment and data sovereignty. The most significant practical advantage of open source models for professional environments is the possibility of running the model entirely within an organisation's own infrastructure. When a model runs on servers that the organisation owns and operates, within a network perimeter that the organisation controls, submitted data does not leave that perimeter. The data processing terms of a third-party AI provider become irrelevant because there is no third-party AI provider. The organisation is both the user and the operator of the model.

This property is of direct relevance to European professional environments operating under the General Data Protection Regulation. GDPR imposes strict requirements on the transfer of personal data to processors outside the European Economic Area and on the data processing agreements that must be in place with any third-party processor. An organisation running an open source model on European infrastructure, under its own operational control, processes data entirely within its own environment. The GDPR implications of that processing are substantially different from those arising from the transmission of personal data to a third-party AI provider operating infrastructure in multiple international jurisdictions.

Fine-tuning and domain specialisation. A second significant capability of open source models is the possibility of fine-tuning: the process of continuing the model's training on a curated dataset specific to a particular domain, organisation, or task type. Fine-tuning allows an organisation to take a general-purpose open source model and adapt its behaviour to reflect the specific terminology, reasoning patterns, document structures, and professional standards of its particular field. A legal services organisation might fine-tune an open source model on a curated dataset of legal documents, enabling the model to handle legal language with greater precision than a general-purpose model. An insurance organisation might fine-tune a model on claims documentation, policy language, and coverage analysis examples, producing a model that understands insurance concepts at a level of depth that a general-purpose model cannot match.

Fine-tuning is a powerful capability, but it requires technical infrastructure, data engineering expertise, and careful attention to the quality and composition of the training data. A poorly curated fine-tuning dataset will produce a model that has absorbed the errors, biases, and gaps of that dataset alongside its useful content. The responsibility for the quality and safety of a fine-tuned model rests with the organisation that conducted the fine-tuning, not with the developer of the underlying base model.

Variable performance and the evaluation challenge. The open source model ecosystem contains models of widely varying capability. Some open source models, particularly the largest and most recent releases, achieve performance on standard benchmarks that is competitive with the best closed source models. Many others perform substantially below the frontier of closed source capability, sometimes in ways that are not immediately apparent from headline benchmark results. The evaluation of open source models for professional deployment requires more technical expertise than the evaluation of closed source products, because closed source providers present their models through polished interfaces that abstract away performance variation, while open source models must be evaluated directly.

The Misconception of Free Access

The description of open source models as freely available is accurate in a specific and narrow sense: the model weights can be downloaded without payment. In every other relevant sense, the professional deployment of an open source model carries substantial costs that must be understood before the open source route is chosen on economic grounds.

Computational infrastructure. Large language models require significant computational resources to operate at professional quality. The hardware required to run a capable open source model with acceptable response times for professional use, whether on physical servers or cloud compute instances, carries costs that scale with the size of the model and the volume of requests it handles. For models in the range that would be considered professionally capable, the compute costs of self-hosting may equal or exceed the cost of equivalent usage through a closed source provider's commercial API, particularly when the capital cost of hardware or the variable cost of cloud compute is properly accounted for.

Engineering and operational expertise. Deploying an open source model in a professional environment is an engineering project, not a configuration task. It requires expertise in machine learning infrastructure, containerisation, network security, model serving, performance optimisation, and monitoring. These are specialised skills that most professional services organisations do not have in-house and would need to acquire through hiring or contracting. The ongoing operational cost of maintaining a self-hosted model, including managing updates, monitoring for performance degradation, responding to security vulnerabilities, and ensuring the reliability of the deployment, is a continuous commitment rather than a one-time effort.

Safety and governance. A closed source model arrives with the safety and alignment work of its developer built in. An open source model, particularly one that has been fine-tuned or modified, requires the deploying organisation to take responsibility for its safety properties. This includes establishing internal governance arrangements for how the model is used, implementing appropriate safeguards against misuse, monitoring outputs for quality and safety, and maintaining documentation sufficient to demonstrate responsible deployment to regulators or auditors. In the context of the European Union's Artificial Intelligence Act, which establishes risk-based requirements for AI systems deployed in the EU, these governance responsibilities are not merely prudent practice. They may be legal obligations depending on the classification of the system's intended use.

The European Regulatory Context

For professionals operating within European jurisdictions, the choice between closed source and open source models has dimensions that extend beyond the technical and commercial considerations discussed above.

The European Union's Artificial Intelligence Act, which entered into force in 2024, establishes a regulatory framework for AI systems that distinguishes between different risk categories and imposes requirements that vary accordingly. General-purpose AI models, including the major closed source models, are subject to specific transparency and documentation requirements under the Act. Organisations deploying these models in high-risk applications face additional obligations relating to conformity assessment, technical documentation, and human oversight.

Open source models occupy a particular position within this framework. The AI Act includes provisions that apply differently to open source model releases, recognising the distinct governance challenges that arise when model weights are released publicly rather than being operated as a controlled service. The specific provisions, and their interaction with the data protection requirements of GDPR and the sector-specific regulations that apply in financial services, insurance, and legal practice, create a regulatory landscape of considerable complexity.

For most professional users, the practical implication is that the regulatory implications of model choice should be assessed with the involvement of the organisation's legal counsel or compliance function, particularly where the intended use involves personal data, regulated professional activities, or applications that might be classified as high-risk under the AI Act. The sections of this module that follow address the data handling implications of different deployment models, but they do not substitute for legal advice on specific regulatory obligations.

Choosing a Starting Point

For the majority of professionals engaged in knowledge-intensive work in consulting, legal services, insurance, finance, and related fields, closed source models accessed through reputable commercial platforms represent the appropriate starting point. The reasons for this are rooted in the practical realities of professional work rather than in a preference for one category of model over another.

Closed source models are immediately accessible without engineering investment. Their safety and alignment properties have been developed and maintained by teams with significant expertise and resources. Their data handling terms, while requiring careful review, are documented and can be evaluated against professional obligations. They are updated and improved by their developers without requiring action from the user. For organisations without dedicated AI engineering capability, they eliminate the infrastructure, expertise, and governance costs that self-hosted open source deployment requires.

Open source models become the appropriate choice when specific conditions apply. Organisations operating under data residency requirements that prohibit the transmission of certain categories of data outside their controlled infrastructure, organisations seeking to build deeply specialised AI capabilities through fine-tuning on proprietary data, and organisations that have reached a scale of AI use at which the economics of self-hosting become favourable relative to commercial API costs are all cases where the open source route warrants serious evaluation. These conditions are real and significant, but they describe a subset of professional deployments rather than the typical starting situation for individual professionals or organisations beginning to integrate AI into their work.

The sections that follow address the specific capabilities and limitations of the major closed source models currently available for professional use, the practical framework for matching model selection to task requirements, and the deployment considerations that inform the choice between hosted and self-hosted operation.