Published: November 29, 2023
The emergence of generative artificial intelligence (AI) represents a key milestone in the development of machine learning and deep learning tools. Generative AI extends the scope of AI automation beyond pattern recognition to content generation. While much of the early public interest in generative AI centered on text generation — specifically on chatbot applications — the technology encompasses a wide array of mediums, with key investment areas including image, code, audio, video and structured data.
The advancement of foundation models that power generative AI benefited significantly from transformer neural network architectures. These enabled generative AI to process entire sequences at once, rather than individual data points, and improved models' ability to capture context. Additionally, transformer architectures reduced the time it takes to train models and can be applied to computer vision and speech processing.
While this new wave of AI comes with several concerns for organizations, including data privacy issues and challenges regarding biases, we see potential for trustworthy AI frameworks that manage such risks safely. Human judgment is key in that regard.
Early generative AI use cases include content creation, summarization, coding assistance, media enhancement, and conversational interfaces with systems. Generative AI could also become a game changer in finding solutions to key social and environmental challenges.
In a matter of months, AI has evolved from classification, clustering, dimensionality reduction, and prediction operations to original content creation. Generative AI refers to the ability of AI models to generate new content, including text, images, video, code, audio, and synthetic data. In this short primer, we explain the key differences between traditional AI and generative AI, dive into the foundation models that power generative AI, illustrate its relatively short history, and help frame some of its applications, risks and considerations.
The emergence of generative AI powered by foundation models does not mean that traditional AI is outdated. Instead, generative AI adds to the toolkit and can help solve novel problems. Traditional AI, through supervised and unsupervised machine learning and deep learning models, remains a useful part of the broader toolbox (see Table 1).
Table 1: What traditional AI and generative AI are good at

| Traditional AI is good at | Examples of use cases |
| --- | --- |
| Structured data analysis | Forecasting, prediction |
| Computer vision | Physical security, anomaly detection |
| Classification | E.g., identifying images as "cat" vs. "not cat" |
| Directed chatbots | Deterministic dialog flows |
| Process automation | Process optimization, robotic process automation |

| Generative AI is good at | Examples of use cases |
| --- | --- |
| Content creation | Text, images, marketing, social media, synthetic data |
| Code generation | Python, SQL, COBOL, documentation |
| Summarization and content extraction | Financial reports, filings, legal documents |
| Semantic search | Natural language search using whole sentences, knowledge mining |

Source: S&P Global.
Generative AI drew widespread public attention on Nov. 30, 2022, when OpenAI released a demo of ChatGPT, a chatbot that had 100 million users in the first two months after its launch. However, generative AI had been brewing for years (see Figure 1). Several technological milestones paved the way, such as the development of generative adversarial networks in 2014 and the transformer architecture developed by Google in 2017. The transformer architecture accelerated the development of language-based generative AI because it enabled analysis of not just single words but sequences of words, capturing their meaning in context.
Generative AI's initial claim to fame was large language models (LLMs), such as Google's BERT, which emerged in 2018 and is based on the transformer architecture. LLMs can generate summaries, translate text into multiple languages, and create original content. But the capabilities of generative AI go beyond LLMs and encompass a wide array of outputs (see Table 2).
Table 2: Types of generative AI output

| Output type | Description |
| --- | --- |
| Text generation | Creates new text based on user prompts or inputs. Functions may include conversational text chatbots, summarization, or auto-generation of text. |
| Image generation | Creates new images or manipulates existing ones, such as by extending the frames of photographs, generating new art pieces from text prompts, or transforming visual art to a different medium (e.g., from a drawing to an ultra-realistic photograph). |
| Code generation | Creates or suggests code. This may involve generative code translation, code autocomplete, automated testing, or other functions. |
| Audio generation | Creates various forms of audio, such as speech, music and sound effects. This category includes voice cloning, music composition, and voice assistants. |
| Video generation | Generates video sequences or animations, including text-to-video generation, in-motion deepfakes, and generative video inpainting. |
| Structured synthetic data generation | Creates artificial data, which can be provided with specific characteristics and can be used to protect users' privacy or to bolster training sets for initiatives where little real-world data is available. |

Source: S&P Global.
Foundation models — of which LLMs are just one type — are at the core of generative AI (see Figure 2). When fine-tuned — in other words, trained on additional data related to a given topic — these pre-trained general-purpose models can serve specific applications. For example, FinBERT is a BERT model pre-trained on financial texts, such as financial statements, earnings call transcripts, and analyst reports, to improve the accuracy of its finance-related analysis and output.
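To make the fine-tuning step concrete, below is a minimal sketch using the open-source Hugging Face transformers library to further train a general-purpose BERT model on financial sentiment labels, loosely in the spirit of FinBERT. The example sentences, labels, and output directory are hypothetical placeholders, and FinBERT's actual training recipe differs.

```python
# Minimal fine-tuning sketch: adapting a pre-trained, general-purpose
# BERT model to financial sentiment classification. The two example
# sentences and their labels are toy placeholders, not a real dataset.
import torch
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

texts = ["Quarterly revenue rose 12% year over year.",
         "The company warned of weaker margins ahead."]
labels = [1, 0]  # 1 = positive, 0 = negative (toy labels)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finbert-sketch",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(texts, labels),
)
trainer.train()  # further trains the general model on domain data
```

In practice, the fine-tuning corpus would be domain text at scale (financial statements, transcripts, analyst reports) rather than a handful of sentences, but the mechanics are the same: start from the pre-trained weights and continue training on the specialized data.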
A wealth of possibilities
Fine-tuning is important, but it's not the only technique to optimize outputs. Options that are less demanding on IT infrastructures include prompt engineering, taking advantage of longer context windows to provide more complex input instructions, and the use of plug-ins. Plug-ins can provide models with external resources to look up information and extend models' capabilities, so they can, for example, search customer databases. Prompt engineering, longer context windows, and plug-ins aren't mutually exclusive, and many organizations will invest in all three options.
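The plug-in idea can be pictured as functions the application is willing to execute on the model's behalf. The sketch below is a generic illustration, not any vendor's actual plug-in API; the customer database and the tool-call format are invented for the example.

```python
# Illustrative plug-in sketch: the model emits a structured "tool call",
# and the application executes it against an external resource (here, a
# hypothetical in-memory customer database) and feeds the result back.
import json

CUSTOMERS = {"C-1001": {"name": "Acme Corp", "tier": "enterprise"}}

def lookup_customer(customer_id: str) -> dict:
    """External resource the model can consult via a plug-in."""
    return CUSTOMERS.get(customer_id, {"error": "not found"})

TOOLS = {"lookup_customer": lookup_customer}

# In a real system, this JSON would come from the model's response.
model_tool_call = '{"tool": "lookup_customer", "args": {"customer_id": "C-1001"}}'

call = json.loads(model_tool_call)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # returned to the model as additional context
```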
Foundation models are AI models trained on a vast quantity of usually unlabeled data. Some of these models are open source, while others are proprietary. If released, the latter are usually accessible via application programming interfaces. Foundation models can be trained on a single medium, such as unstructured textual data, or on multiple mediums, for example text and images, or text and code. The models "learn" the relationship between data points. The breadth of the training data means that the models are versatile and can address many use cases, including use cases on which they haven't been specifically trained.
Foundation models can be trained in three stages:
Pre-training: High quantities of training data to inform large, general-purpose models.
Fine-tuning: Addition of context-specific data to align outputs more closely with use cases.
Inference: Incorporation of user feedback for further tuning and addition of more data as users interact with the models via plug-ins.
Foundation models power generative AI and have emerged as a new category of deep learning (see Figure 3). Like deep learning algorithms, foundation models are AI neural networks, but they are more complex and larger than older models, both in terms of the number of embedded parameters and the volume of data needed to train them.
Unlike foundation models, more traditional AI deep learning models that use algorithms such as recurrent neural networks and convolutional neural networks have a narrow scope and are used for specific cases. For example, a manufacturing company might deploy deep learning computer vision systems for warehouse stock control. The use cases for foundation models are broader since the models are trained on large and diverse datasets. That breadth doesn't necessarily benefit output quality, however: foundation models can be aligned more closely with specific use cases if humans fine-tune them or instruct them with greater specificity.
Companies have only recently begun large-scale productization of these models. We estimate that the generative AI market will expand at a compound annual growth rate of 58% between 2023 and 2028, reaching $36.4 billion by the end of 2028. The foundation model segment, which we forecast will likely generate revenue of $11.4 billion by 2028, will remain relatively consolidated due to the significant resources and budget required to pre-train models that can keep up with leading foundation models. This creates significant entry barriers and means that large technology companies, which already possess substantial computational resources, will account for a large share of the foundation model market. Most startups will build on these models, for example by adding services or providing additional training to optimize their performance.
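As a sanity check, those two figures together imply a 2023 base for the overall generative AI market of roughly $3.7 billion, since a 58% compound annual growth rate over the five years from 2023 to 2028 multiplies the starting value by about 9.85:

$$
\text{2023 market size} = \frac{\$36.4\ \text{billion}}{(1 + 0.58)^{5}} \approx \frac{\$36.4\ \text{billion}}{9.85} \approx \$3.7\ \text{billion}.
$$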
The dominance of tech giants will be slightly offset by regional foundation models that are trained on local languages and context, such as Abu Dhabi's Falcon, an open-source LLM. The advance of regional foundation models partly results from a political drive to build domestic technology industries in countries outside the US, but it also reflects concerns about potential for bias if data collection and human review were to occur solely in the US.
The power of prompt engineering
Prompt engineering is the practice of phrasing instructions and inputs so that a foundation model delivers the desired output. Providing accurate and valuable prompts, usually in text form, is vital because prompt quality directly affects the quality of the model's output. For instance, a high-quality prompt usually includes information about the role the AI model should adopt when providing an answer. Prompts should focus on two things: a detailed, clear, specific explanation of the task; and context, including the required format and length. According to the International Monetary Fund, the merits of prompt engineering are manifold, notably enhancing the accuracy of the generated text, exercising some degree of control over the output and, crucially, mitigating inherent bias.
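For illustration, a prompt following this structure might look like the sketch below. The role, task, context, and formatting constraints are all hypothetical.

```python
# Hypothetical prompt illustrating the elements described above: a role,
# a specific task, context, and the required output format and length.
prompt = """
Role: You are a financial analyst writing for a non-specialist audience.

Task: Summarize the risk-factor section of the attached annual report,
focusing on liquidity and supply-chain risks.

Context: Ignore boilerplate legal language; the reader wants only the
most material risks.

Format: Three bullet points of no more than 25 words each.
"""
```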
Generative AI powered by foundation models can have great potential, but risks and limitations require a human-in-the-loop approach. Some of the most common potential risks of foundation models are:
Hallucinations: The tendency of foundation models to provide fictitious responses, which can result from a lack of context when prompting, biases in the training data, and low-quality training data, among other causes.
Output inaccuracies, including outdated and limited information: Foundation models' current architectures make them susceptible to inaccuracy, but this can be mitigated by adding deterministic controls, such as vector databases, that ground responses in real data (see the sketch after this list).
Misuse: The malicious use of AI, including the generation of deepfake images or videos, or generative AI-powered cyber-attacks.
Biased responses: Outputs that could be discriminatory or unfair because the data that trains the foundation model contains human biases (gender, political, racial, etc.), the training data is not updated regularly, or it's simply not diverse enough. Other concerns about bias relate to the entities behind the model. For example, companies or governments may decide to reflect their cultural sensitivities in a model, which could lead to biases.
Lack of transparency: Humans’ inability to explain how the model arrived at a specific solution or to replicate the model's response.
Intellectual property concerns: The possibility that models might produce outcomes that resemble existing content, which could lead to copyright violations. A precedent in the US suggests that machine-generated content cannot be protected by a patent or copyright because it wasn't created by a human.
Data privacy: Concerns that sensitive information can leak into externally available models.
Infrastructure demands: Generative AI is associated with significant computational and network resource demands, which increase infrastructure spending and dampen sustainability efforts.
Copyright infringement: Multiple lawsuits in the US focus on copyright infringement because of the allegedly unlawful use of media to train foundation models. The model builders claim fair use, but copyright owners, such as Getty, disagree.
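The deterministic controls mentioned under output inaccuracies can be pictured with a minimal retrieval sketch: embed a set of trusted documents, find the passage closest to the user's question, and inject it into the prompt so the answer is grounded in real data. The toy bag-of-words embedding and two-document corpus below are stand-ins for a real embedding model and vector database.

```python
# Minimal sketch of grounding a model's answer in trusted data, the idea
# behind vector-database controls. Embeddings here are toy term counts;
# a production system would use a learned embedding model.
import numpy as np

DOCUMENTS = [
    "Q3 revenue was $4.2bn, up 8% year over year.",
    "The board approved a $1bn share buyback in October.",
]

VOCAB = sorted({w for d in DOCUMENTS for w in d.lower().split()})

def embed(text: str) -> np.ndarray:
    """Toy embedding: term counts over the corpus vocabulary."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def retrieve(query: str) -> str:
    """Return the document most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = [np.dot(q, embed(d)) /
            (np.linalg.norm(q) * np.linalg.norm(embed(d)) + 1e-9)
            for d in DOCUMENTS]
    return DOCUMENTS[int(np.argmax(sims))]

query = "What was revenue in Q3?"
context = retrieve(query)
# The retrieved passage is injected into the prompt so the model answers
# from real data rather than from its parametric memory alone.
grounded_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(grounded_prompt)
```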
As Professor Melvin Kranzberg said in 1986, "Technology is neither good nor bad; nor is it neutral." This statement is particularly relevant in the case of AI. The rapid development of AI has not yet been matched with a commensurate level of oversight, be it at a supranational, national or company level. But things are changing, albeit slowly.
On the national and regional level, several AI regulations are in the making, such as the EU AI Act, a framework due to be finalized by the end of 2023. Similar regulations are being developed in the US and Asia. In addition, the development of generative AI has led to new guidance, including the Organisation for Economic Co-operation and Development's recently published "G7 Hiroshima Process on Generative Artificial Intelligence."
Policymakers and companies must build trustworthy AI frameworks to manage risks and other potential pitfalls that we may not even be aware of — hence the need for a flexible approach. Regulations and guidelines on AI governance, though they will likely differ around the world, will determine the rules of the game.
Outside a business context, the rise of generative AI and its potential ubiquity offer great possibilities to help solve problems such as climate change, world hunger, diseases, education inequality, income inequality, and the energy transition. For example, technological advancements could boost quantum technology and allow for "digital experiments" of physical processes, such as nuclear power generation. The potential for good is virtually limitless, but so is the potential for harmful consequences, intended or otherwise. That's why generative AI requires a solid, human-led, regulated ecosystem to ensure its highly disruptive nature leads to positive outcomes.
451 Research’s Generative AI Market Monitor, June 7, 2023, S&P Global Market Intelligence.
451 Research’s Machine Learning: The Fundamentals, Nov. 28, 2023, S&P Global.
The AI Governance Challenge, Nov. 28, 2023, S&P Global.
Generative AI use cases could boost document and content management software, Sept. 13, 2023, S&P Global Market Intelligence.
FinBERT: A large language model for extracting information from financial text, Sept. 29, 2022, Contemporary Accounting Research.
"Technology and History: 'Kranzberg’s Laws'," Technology and Culture 27 (3), July 1986, Melvin Kranzberg.
G7 Hiroshima Process on Generative Artificial Intelligence, Sept. 7, 2023, OECD.