Embeddings: The Language of LLMs and GenAI

QuantumBlack, AI by McKinsey
6 min read · Oct 4, 2023


The recent uptick in Generative AI (GenAI) and Large Language Models (LLMs) has been all over the news, but apps like ChatGPT and Bard are only scratching the surface of what this technology can do. To fully understand the potential, you need to understand the concept of embeddings, the language of GenAI and LLMs, and how they can be used to solve business problems.

Introducing embeddings

Embeddings are created by mapping each item of incoming data to a dense vector in a high-dimensional space. Since similar items are mapped to nearby vectors by construction, embeddings can be used to find similar items or to understand the context or intent of the data.

This approach enables the use of linear algebra and machine learning techniques, which operate on vectors, and is often applied to Natural Language Processing (NLP), Natural Language Understanding (NLU), recommendation systems, graph networks, and more.
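As a toy illustration, the sketch below shows how vector closeness stands in for semantic similarity. The vectors are made-up values for illustration, not the output of a real embedding model, which would use hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical directions, ~0.0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings of three items.
king  = np.array([0.90, 0.80, 0.10, 0.20])
queen = np.array([0.88, 0.82, 0.15, 0.18])
apple = np.array([0.10, 0.20, 0.90, 0.80])

print(cosine_similarity(king, queen))  # close to 1.0 -> similar items
print(cosine_similarity(king, apple))  # much lower  -> dissimilar items
```

Because similarity is computed geometrically, the same machinery works for any data type that can be embedded, not just words.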

How LLMs and GenAI use embeddings

Most embedding models are based on transformers, a type of architecture initially proposed in ‘Attention is All You Need’ by Vaswani et al. The central idea behind transformers is the concept of ‘attention’, which weighs the relevance of different contextual inputs, enabling the model to focus on the more important parts when predicting the output. Transformers also handle long sequences well, due to their self-attention mechanism, which considers the entire sequence context in a global manner.
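The attention computation itself is compact. Here is a minimal NumPy sketch of scaled dot-product attention, using toy sizes; real transformers add learned projection matrices and multiple attention heads on top of this core operation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every position to every other
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # toy sequence length and dimension
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)                         # (4, 8)
print(weights.sum(axis=-1))              # each row of weights sums to ~1
```

The attention weights make the "focus on the more important parts" idea concrete: each output position is a weighted mix of the whole sequence.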

LLMs are machine learning models specifically designed for NLP and NLU tasks. They use transformer architectures to turn incoming data into embeddings and pass those embeddings through stacked self-attention layers to predict output tokens, an approach that has largely superseded earlier sequence models such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Using the overall context and interplay of words represented by the vectors, the model produces the most probable output given its training data.

While LLMs are primarily concerned with language, similar GenAI models can serve purposes such as text-to-image, or audio-to-text, among others. These models can transform data between multiple modalities, including text, images, video, and audio. Regardless of purpose, each model will interpret the underlying meaning of its input and, using embeddings as an intermediary, generate the most likely output according to the training data.

Transforming your data into embeddings

Recently, LLMs have been popularized by models like BERT, GPT, and T5, to name a few. However, there are many different models that can be used to generate embeddings. For simplicity, we will cover the two main ways to transform your data into embeddings: building a custom model, or using or fine-tuning a pre-trained model.

For training a custom model, you can use three techniques: supervised learning, unsupervised learning, and semi-supervised learning.

Supervised learning is the most traditional machine learning approach, where you train a model on a labeled dataset. Training involves presenting the model with labeled data, where the labels indicate similarity of context between different inputs. An often-used example is 'retrieval augmented generation' chatbots, such as knowledge bots and document Q&A, which are trained to return documents relevant to your question while also answering it.

Unsupervised learning is where models are trained on an unlabeled dataset and try to learn some underlying structure in the data. This form of learning is more about exploring the data without specific task-oriented goals. The embeddings learned this way may not be as effective for precise tasks such as text classification as those learned through supervised or semi-supervised methods. Unsupervised learning is useful for very generalized tasks such as chatbots, where the pattern of conversation is heterogeneous and varies from conversation to conversation.

Semi-supervised learning is a combination of supervised and unsupervised methods, where the model is trained on a mix of labeled and unlabeled data. The idea is to use the unlabeled data to create a better learning environment for the labeled data. This is useful for recommender systems where ontological understanding is relevant. For example, when private equity firms look for similar companies, ontological understanding based on labels keeps similar companies' embeddings close within the vector space.

To leverage a pre-trained model, trained with one of the techniques above, you can either use it as is or fine-tune it for your needs.

Pre-trained models are the easiest way to get started with embeddings and can be set up to run in minutes. Generative Pre-trained Transformer (GPT) models can be used to transform data from text to embeddings, and then used for any downstream tasks like similarity search or recommender systems.
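As a sketch of that workflow, the snippet below encodes a few documents and a query into the same vector space and ranks documents by similarity. A tiny bag-of-words encoder stands in for a real pre-trained model so the example is self-contained; in practice you would swap in a transformer encoder, for example the sentence-transformers library's `SentenceTransformer("all-MiniLM-L6-v2").encode`:

```python
import numpy as np

# Stand-in vocabulary and encoder for a real pre-trained model (hypothetical,
# for illustration only). A real encoder captures meaning far beyond word counts.
VOCAB = ["revenue", "quarter", "grew", "growth", "model", "code", "completion"]

def toy_encode(text):
    """Map text to a unit-norm vector; stands in for a transformer encoder."""
    tokens = text.lower().split()
    vec = np.array([float(tokens.count(w)) for w in VOCAB])
    n = np.linalg.norm(vec)
    return vec / n if n else vec

docs = [
    "quarterly revenue grew across all regions",
    "the new model improves code completion",
    "revenue increased in the latest quarter",
]
doc_vecs = np.array([toy_encode(d) for d in docs])

query_vec = toy_encode("which quarter had revenue growth")
scores = doc_vecs @ query_vec          # cosine similarity (vectors are unit-norm)
print(docs[int(np.argmax(scores))])    # the revenue document ranks highest
```

The same encode-then-rank pattern underlies similarity search and recommender systems alike; only the encoder and the index change.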

Fine-tuning adds more examples to expand a model's vocabulary or changes the task the model is aiming to achieve. A model fine-tuned with more examples could be trained on company-specific terms so that it understands abbreviations or words particular to that company's context.

Task-specific fine-tuning, on the other hand, powers co-pilots: context-specific examples such as code snippets or comments are used to tune a pre-trained language model to become better at tasks such as code completion or suggestion.

The specific method used to transform your data into embeddings depends on the type of data, the available resources, and the specific task at hand. The techniques discussed are all based on the same core idea: transforming data into continuous vectors such that the geometric relations in the vector space reflect the ontological relations in the original space.

Generating value from embeddings

Now that you understand what embeddings are and how you can transform your data into them, let’s walk through an example that shows how you can index a knowledge base of company information into one searchable, queryable engine.

Identifying company similarity

In the context of a company similarity use-case, you can create a vector representation for each company based on its data. The GenAI step-change is that similarity is no longer based on a simple keyword search, but instead on an ontological understanding where similar items are close together in the embedding space. The embeddings themselves can be generated by passing each company's unstructured text data through an LLM. These embeddings, alongside traditional structured data, create a complete picture of a company and allow you to conduct more intelligent searches across your knowledge base. This enables a more accurate and intuitive understanding of similarity, across languages, regions, and industries.

Intelligent search

To leverage the power of your embeddings, you can use them to conduct similarity searches across unstructured text data. Input queries are transformed with the same transformer model used to index the data; applying the similarity measure the model was trained on, for example Euclidean distance, then finds the closest vectors and returns relevant results. The output is a list of nearest data points ranked by their similarity. To handle large-scale scenarios with millions of high-dimensional embeddings at production-level performance, it's recommended to use a dedicated vector database like Pinecone, Chroma, or Milvus, since they are optimized to scale. McKinsey's own generative AI solution, Lilli, uses this approach to provide a streamlined, impartial search and synthesis of McKinsey's vast stores of knowledge.
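A minimal sketch of the search step, using hypothetical company embeddings (the names, vectors, and dimensions are made up for illustration) and Euclidean distance as the similarity measure:

```python
import numpy as np

# Hypothetical pre-computed company embeddings; in practice these come from
# the same transformer model used to index the data.
index = {
    "Acme Robotics":  np.array([0.90, 0.10, 0.30, 0.70]),
    "Beta Biotech":   np.array([0.10, 0.90, 0.80, 0.20]),
    "Gamma Robotics": np.array([0.85, 0.15, 0.35, 0.65]),
}

def nearest(query_vec, index, k=2):
    """Rank items by Euclidean distance to the query (smaller = more similar)."""
    dists = {name: float(np.linalg.norm(vec - query_vec))
             for name, vec in index.items()}
    return sorted(dists, key=dists.get)[:k]

query = np.array([0.88, 0.12, 0.32, 0.68])  # embedding of the input query
print(nearest(query, index))                # the two robotics firms rank closest
```

A vector database performs essentially this ranking, but with approximate nearest-neighbor indexes that keep it fast over millions of vectors.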

Indexing multi-modal data

Let’s say you wanted to index audio, video, or text data related to your company into a centralized knowledge base and make it searchable based on its content and context. This is impossible to achieve at scale the way search is typically done today, where metadata is manually added to data and keyword or filter-based search is used to find relevant results; adding metadata by hand takes significant time and effort. To speed things up, you can use an ensemble of transformer models to index all available data into a vector database for querying, or to generate metadata at scale. From there you can intelligently search across data sources and modalities using natural language queries and return all relevant results based on the context of the query.
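A toy sketch of such a multi-modal index: each record keeps its modality and source alongside its embedding, and one search ranks across all of them. The vectors here are hypothetical stand-ins for the outputs of per-modality transformer encoders projected into a shared space:

```python
import numpy as np

# Hypothetical records from different modalities, embedded into one shared space.
records = [
    {"modality": "text",  "source": "earnings_call.txt", "vec": np.array([0.90, 0.10, 0.20])},
    {"modality": "audio", "source": "earnings_call.mp3", "vec": np.array([0.85, 0.15, 0.25])},
    {"modality": "video", "source": "product_demo.mp4",  "vec": np.array([0.10, 0.90, 0.80])},
]

def search(query_vec, records, k=2):
    """Return the k records closest to the query, regardless of modality."""
    ranked = sorted(records,
                    key=lambda r: float(np.linalg.norm(r["vec"] - query_vec)))
    return [(r["modality"], r["source"]) for r in ranked[:k]]

# Hypothetical embedding of a natural language query about earnings.
query = np.array([0.88, 0.12, 0.22])
print(search(query, records))  # both earnings items surface, across text and audio
```

Because everything lives in one embedding space, a single natural language query retrieves relevant content whether it started life as text, audio, or video.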

Embracing embeddings

There are many ways to get started with embeddings and transformers. Courses on transformers and the encoder and decoder architectures behind them can help you understand the fundamentals of this technology and begin developing solutions with it. Learning about foundation models and how they can be fine-tuned for a specific situation is a good next step.

We hope you now understand what embeddings are and how they can be useful. Let us know how you plan to use them — even better, let us know if you’ve already used them, and what value they have given your organization.

Authored by: Keith Edmonds, Danny Farah, Alex De Ville, Shawn Paul, Anna Xiong

Special Contributors: Romain Thomas, Alex Arutyunyants, Stephen Xu, Daniel Herde, Ghislain Gagne



QuantumBlack, AI by McKinsey

We are the AI arm of McKinsey & Company. We are a global community of technical & business experts, and we thrive on using AI to tackle complex problems.