The release of OpenAI’s ChatGPT to the public in November 2022 has changed the trajectory of AI adoption.
As ChatGPT became the fastest-growing consumer application in history, the large language model (LLM) that powers it caught the attention of enterprises and turned AI adoption into a B2C2B – business to consumer to business – journey.
AI is now a top C-suite priority, creating a massive new area of spending across AI infrastructure and applications. In fact, while enterprises largely reduced software spending in 2023, generative AI defied this trend.
The tech stack around the space has also evolved considerably, thanks to robust fundraising for startups and established technology companies pouring billions into R&D to strengthen their AI offerings.
However, the dizzying array of tools and techniques now available for building an LLM architecture obscures how enterprises actually make purchasing decisions.
Understanding the multiple options available to CTOs and how they think about incorporating LLMs into their organizations is critical to envisioning how the AI ecosystem might grow and evolve. This landscape will remain complex for some time, but our conversations with practitioners point to a framework for thinking about emerging corporate access patterns.
V1 to VF: Three typical ways to access LLMs
Three general models of how companies access LLMs have emerged. In order of increasing complexity, these are:
- Use off-the-shelf applications or call APIs provided by standalone foundation model vendors (e.g. OpenAI or Cohere)
- Pursue a more involved relationship with standalone vendors or incumbent tech companies (e.g. Google, Microsoft, Amazon, Databricks, Snowflake)
- Develop a self-hosted model
Generally, enterprises opt for the simplest method that accomplishes the goals they set out to achieve with LLMs. However, further considerations, such as privacy, cost, vendor onboarding and existing tech stacks can change the calculation. The popularity of each access model will ultimately determine the main beneficiaries of enterprise investment in AI and machine learning (ML).
V1: Getting started with off-the-shelf applications
The simplest access models are off-the-shelf applications or direct APIs from LLM providers. Enterprises use LLMs by paying per token (a small chunk of text – roughly four English characters, according to OpenAI) without having to manage their own infrastructure. Architecturally, this model is as straightforward as embedding ChatGPT to field open-ended questions from customers (think Expedia) or inserting the logic of an LLM behind an enterprise’s own front-end interface to handle searches.
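To make this “version 1” pattern concrete, below is a minimal sketch of routing a customer question from an enterprise’s own front end to a hosted, pay-per-token model API. It assumes the OpenAI Python SDK; the model name, system prompt and sample question are illustrative placeholders, and any standalone vendor’s API would look broadly similar.

```python
# Minimal sketch: forwarding a customer's free-form question to a hosted LLM API.
# Assumes the openai Python SDK (v1+) and an OPENAI_API_KEY environment variable;
# the model name, system prompt and question below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_customer_question(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # any hosted chat model; billed per input and output token
        messages=[
            {"role": "system", "content": "You are a concise travel-support assistant."},
            {"role": "user", "content": question},
        ],
        max_tokens=300,  # capping output tokens keeps per-query cost predictable
    )
    return response.choices[0].message.content

print(answer_customer_question("What documents do I need to rent a car in Italy?"))
```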
Powering existing applications with the logic of LLMs often greatly improves performance vs. existing substitutes (when have you found a website’s FAQ section more helpful than ChatGPT?), making this path ideal for experimentation and getting to “version 1” of an LLM implementation. Standalone foundation model vendors, such as OpenAI, Anthropic, Cohere and AI21, all offer easy-to-use solutions in this category.
The pay-per-token model is lucrative for LLM providers. For example, OpenAI’s GPT-4 can cost around $30 per 1 million input tokens and $60 per 1 million output tokens, so costs escalate quickly for customers with heavy usage. Consider this example: a chatbot on a travel app like Expedia receives thousands of customer queries per day, and each query and its response consist of many tokens. If we assume Expedia’s 112 million monthly unique visitors each ask one question per month, such as, “What would be a good itinerary for a one-week trip to the Amalfi Coast?”, and receive a detailed response (1,697 characters long in our test), we arrive at nearly $35 million per year in spend across Expedia’s user base. It’s easy to see this cost exploding with increased usage. Therefore, enterprises must configure their generative AI apps carefully: identifying common questions and routing users to existing content rather than generating entirely new content for each query, shortening potential inputs and outputs, and optimizing responses to reduce follow-up questions.
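A quick back-of-the-envelope calculation, sketched below, reproduces that roughly $35 million figure under the stated assumptions: about four characters per token, one question per visitor per month and GPT-4’s per-token prices cited above.

```python
# Back-of-the-envelope check of the Expedia-style cost estimate above.
# Token counts are rough (~4 characters per token); prices reflect GPT-4's
# $30 / $60 per 1 million input / output tokens cited in the text.
INPUT_PRICE = 30 / 1_000_000    # $ per input token
OUTPUT_PRICE = 60 / 1_000_000   # $ per output token

question = "What would be a good itinerary for a one-week trip to the Amalfi Coast?"
input_tokens = len(question) / 4      # ~18 tokens
output_tokens = 1_697 / 4             # ~424 tokens for the detailed response

cost_per_query = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
queries_per_year = 112_000_000 * 12   # one question per monthly unique visitor

print(f"${cost_per_query:.4f} per query")                      # ~$0.026
print(f"${cost_per_query * queries_per_year:,.0f} per year")   # ~$35 million
```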
In summary, common considerations with using off-the-shelf models via API include:
- Privacy: Enterprises preferring to retain their proprietary data often opt to host models within their existing cloud infrastructure. Companies in the financial and medical industries, in particular, follow this path.
- Cost: Companies must track the throughput of their LLM-based applications and ensure the volume of tokens used is manageable. At some point, self-hosting an open-source model becomes significantly cheaper than paying by token or for committed volumes with a closed-source model, as the breakeven sketch after this list illustrates.
- Specialization: Using general models in specific domains like financial markets or technology can lead to undesirable and inaccurate outputs.
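That cost consideration reduces to a simple breakeven question: at what query volume does the fixed cost of self-hosting an open-source model undercut per-token pricing? The figures below – GPU count, hourly GPU price and staffing overhead – are illustrative assumptions, not benchmarks.

```python
# Illustrative breakeven sketch: pay-per-token API vs. a self-hosted open-source model.
# Every number here is a placeholder assumption for reasoning, not a measured benchmark.
API_COST_PER_QUERY = 0.026               # ~$0.026/query from the GPT-4 example above
SELF_HOSTED_GPUS = 8                     # e.g. one inference node for a large model
GPU_COST_PER_HOUR = 2.50                 # assumed blended hourly price per GPU
ENGINEERING_OVERHEAD_PER_MONTH = 40_000  # assumed MLOps and on-call staffing cost

fixed_monthly = SELF_HOSTED_GPUS * GPU_COST_PER_HOUR * 24 * 30 + ENGINEERING_OVERHEAD_PER_MONTH
breakeven_queries = fixed_monthly / API_COST_PER_QUERY

print(f"Self-hosting fixed cost: ${fixed_monthly:,.0f}/month")
print(f"Breakeven at ~{breakeven_queries:,.0f} queries/month")  # ~2.1 million queries/month
```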
For enterprises with straightforward, low-volume use cases – such as basic customer chatbots or enterprise search – the off-the-shelf model makes sense; this will be the case for countless companies, especially in the mid-market and the long tail of the Fortune 2000. Enterprises with more robust needs (for example, a search tool that understands the nuances of pharmaceuticals) or tighter constraints (such as strict data retention rules) will find this type of access too limited, pushing them to the next option.
V2: Upgrading to AI suites
After an enterprise gets its LLM-powered application off the ground, it may seek additional controls – privacy guarantees, negotiated pricing and hand-holding through customization with proprietary data. These “V2” LLM-based applications are more deeply integrated, somewhat customized and provide additional value vs. off-the-shelf options.
Players like OpenAI and Cohere are winning large enterprise contracts for V2 access, but incumbent tech companies – namely the large cloud service providers (CSPs) behind Azure OpenAI Service, Google Vertex AI and Amazon Bedrock – are rushing to seize the LLM opportunity with several tailwinds behind them: captive customer bases with committed spend, compliance and security guarantees, and vast product suites complementing LLMs. In industries with legacy tech stacks and stringent regulatory oversight, the CSPs’ suites are especially enticing. For example, hospital software provider Epic announced a sprawling generative AI product suite built on top of Azure OpenAI Service.
Databricks and Snowflake are also betting that their proximity to enterprises’ data will naturally position them to capture burgeoning AI spend. Databricks took this a step further with its release of DBRX – its open-source, general-purpose LLM. The company pitches its Mosaic AI Foundation Model product as the ideal managed solution for deploying DBRX: customers can privately host DBRX, work with Databricks to deploy it on their own hardware and fine-tune the model on their own data.
Microsoft’s Azure OpenAI Service offers an interesting look into how companies are accessing generative AI. Microsoft maintains a close relationship with OpenAI, which has a standalone product with fine-tuning capabilities accessible via API. Practitioners highlight an emerging pattern: companies experiment on OpenAI’s platform, then shift to Microsoft’s Azure OpenAI Service when going to production.
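In code, that shift can be small. The sketch below shows the same openai Python SDK call pointed at an Azure OpenAI deployment instead of OpenAI’s own endpoint; the endpoint, key, API version and deployment name are placeholders for illustration.

```python
# Sketch of the "experiment on OpenAI, productionize on Azure OpenAI" pattern.
# The endpoint, key, API version and deployment name are hypothetical placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4-prod",  # the deployment name configured in Azure, not the raw model id
    messages=[{"role": "user", "content": "Summarize this support ticket for a service agent."}],
)
print(response.choices[0].message.content)
```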
Startups in the space face a difficult decision – they need to meet customers where they are, but potentially lose their branding power and pricing leverage when slotted alongside several other models within a broader marketplace. Some enterprises also wish to avoid being locked into a specific model and preserve optionality. Additionally, the tech majors sometimes prioritize their own models above third-party models, further alienating the standalone LLM vendors. Expect startups – Anthropic, Cohere, AI21 and others – to deepen partnerships with incumbents while building up their own product suites and ecosystems.
VF: Self-hosting LLMs
Optimizing for cost, privacy, cybersecurity and specialization, enterprises may decide to host models themselves. In this paradigm, enterprises can choose an open-source model, such as Llama 2, and deploy it on their own infrastructure. Although self-hosting obviates spend on a proprietary model, not every enterprise is up to the task of managing its own infrastructure or customizing a model with its own data. Longer term, we expect more enterprises to pursue this route as they differentiate their LLM-powered applications and get to the final version (VF). Look for startups hosting open-source models (Hugging Face) and those reducing the barriers to customization (Weights & Biases, Comet, Galileo) to power this leg of the enterprise LLM journey.
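As a minimal illustration of the self-hosted path, the sketch below loads an open-weight chat model from Hugging Face and generates a completion locally. The model name, prompt and hardware assumptions (a GPU with sufficient memory and access to the gated Llama 2 weights) are placeholders; a production deployment would sit behind a proper serving layer.

```python
# Minimal self-hosting sketch using the Hugging Face transformers pipeline.
# Assumes a CUDA GPU with enough memory and access to the gated Llama 2 weights;
# in practice this would run behind a serving layer with batching and autoscaling.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # any open-weight chat model works here
    device_map="auto",                      # place weights on available GPUs
)

prompt = "[INST] Draft a short apology email for a delayed shipment. [/INST]"
output = chat(prompt, max_new_tokens=200, do_sample=False)
print(output[0]["generated_text"])
```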
Tracking enterprise access patterns
New LLM vendors are growing their enterprise presence and product suites, but they must navigate competing offerings from incumbent tech companies – often betting on both their standalone products and distribution through services like Amazon Bedrock (which offer several models in one place). Enterprises often initially follow the path of least resistance, experimenting with models directly through the foundation model vendors’ APIs. Over time, they may turn those experiments into products within the comfort of a larger suite (OpenAI, Azure OpenAI or Databricks) or via a self-hosted solution. Expect large, highly regulated enterprises to choose products from the tech majors, and smaller and/or less-regulated companies to engage directly with standalone LLM vendors. The savviest customers will opt to self-host.
AI has solidified its place as a top C-suite priority and isn’t going anywhere. Enterprises will continue to incorporate LLMs into their organizations and must be selective as they develop LLM architectures against the backdrop of a growing AI ecosystem. This incorporation of AI is fueling a massive new area of spending across AI infrastructure and applications, expected to exceed $1.8 trillion by 2030.