Think of how many times a day you use some type of search functionality across your devices and applications to discover information, find a contact, or explore a new job opportunity. The truth is we all depend on the ability to search for things online, and finding the right information, organization, or job for your skills and interests makes all the difference in our experiences and in the knowledge we can gain.
The magic in search happens because of the technology that powers it. And one of the key technologies behind LinkedIn’s search and recommendation features is embedding-based retrieval (EBR). EBR helps deliver more relevant matches for our members and customers every day.
LinkedIn is at the forefront of leveraging EBR technology to revolutionize the way we approach search and recommendation systems. Right now, you can find EBR at work in a number of features we offer, such as Jobs You Might Be Interested In (JYMBII), the content you see in LinkedIn Feed and Notifications, and a number of our search products, such as Job Search.
In this post, we will share how we are approaching and building the software infrastructure to further incorporate embeddings at scale and how we’re quickly utilizing the technology to facilitate more efficient and effective matching between job seekers and companies.
What Are Embeddings?
Before diving into the specifics of EBR at LinkedIn, let’s define a couple of key terms. Embeddings are vector representations of data points that capture the complex, high-dimensional features of the data in a way that can be used flexibly within many kinds of AI models. Embeddings preserve the similarity of the data points by keeping points with similar features closer together in vector space. For example, LinkedIn members have lots of information in their profiles – such as work experience and text descriptions of their responsibilities – which can be consolidated into a member profile embedding. That can then be used to represent profile information as part of a larger AI model for member-content recommendations.
Figure 1 – Example of an Embeddings Graph
Embedding-based retrieval (EBR) is a method used at the early stages of a recommendation or search system. It efficiently searches among a very large set of possible items (such as jobs or feed posts), returning items that are relevant to the request based on the similarity of a request-side embedding to individual item embeddings. Put simply, the items that are “close” to the request in the embedding space are retrieved. These documents can then be ranked in a second-stage AI model to obtain the most relevant items.
You can think of the “request embedding” as encapsulating the contextual intent of the search or recommendation request in such a way that geometric proximity in the embedding space shared by the request embedding and the EBR index is highly correlated with similarity of “meaning.” For example, a nursing industry recruiter who specializes in emergency room roles issuing a search query for “senior burn ward nurse” would have a request embedding that ends up being geometrically very close to several nursing profiles with experience in that area.
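At its simplest, this kind of retrieval is just a nearest-neighbor lookup by cosine similarity. Here is a minimal numpy sketch of exhaustive (brute-force) EBR over a toy index; the data and dimensions are illustrative, not a production setup:

```python
import numpy as np

def retrieve_top_k(request_emb: np.ndarray, item_embs: np.ndarray, k: int = 3):
    """Return indices of the k items closest to the request by cosine similarity."""
    # Normalize both sides so a dot product equals cosine similarity.
    req = request_emb / np.linalg.norm(request_emb)
    items = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    scores = items @ req                       # one similarity per item
    return np.argsort(-scores)[:k], scores     # best-first item indices

# Toy index: 5 items in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(5, 4))
request = item_embs[2] + 0.01 * rng.normal(size=4)  # a request "near" item 2
top, _ = retrieve_top_k(request, item_embs, k=2)
```

Because the request was constructed close to item 2 in the embedding space, item 2 comes back first, which is exactly the "geometric proximity correlates with meaning" property described above.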
In recommendation systems, EBR assists in providing personalized content or job recommendations based on member profiles and activities. It also powers both the semantic memory and Retrieval Augmented Generation (RAG) in generative AI (GAI) systems. Retrieval Augmented Generation pairs fast retrieval of useful information with generation of new responses grounded in what was just retrieved; working in tandem, the two steps can solve problems more effectively than either could on its own. For example, when the system needs to “remember” past interactions or consult a knowledge base of FAQ articles, it takes the current textual exchange between a person and the chatbot, encodes it into the embedding space, issues a query against the similarly encoded FAQ article EBR index or past conversational history, and feeds the relevant discussions or articles it finds into the LLM’s next context.
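The RAG flow above can be sketched in a few lines. This toy example stands in a bag-of-words counter for a real text encoder (any sentence-embedding model would fill that role in practice), and the FAQ articles and vocabulary are invented for illustration:

```python
import numpy as np

# Hypothetical stand-in for a real text encoder.
VOCAB = ["password", "reset", "billing", "invoice", "profile", "photo"]

def encode(text: str) -> np.ndarray:
    """Map text to a normalized vector of vocabulary-word counts."""
    words = text.lower().split()
    vec = np.array([float(words.count(w)) for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

faq = [
    "How do I reset my password",
    "Where can I find my billing invoice",
    "How do I change my profile photo",
]
faq_index = np.stack([encode(a) for a in faq])  # the (tiny) EBR index

def rag_context(user_message: str) -> str:
    """Retrieve the most relevant FAQ article and package it as LLM context."""
    sims = faq_index @ encode(user_message)     # cosine similarity to each article
    best = faq[int(np.argmax(sims))]
    return f"Relevant article: {best}\nUser: {user_message}"

context = rag_context("I forgot my password and need to reset it")
```

The retrieved article is prepended to the user message, which is the string a real system would hand to the LLM as grounding context.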
EBR Infrastructure Components
To make it easier for our engineering teams to use EBR for their applications, we’ve developed a comprehensive set of new infrastructure components designed to help them deliver more relevant search results and recommendations for our members and customers:
Authoring Composite and Multi-Task Learning Models
We’ve introduced the ability to author composite models (such as two-tower models), and support multi-task learning for the embeddings. The benefit of this component is that the consolidation of various objective functions into a single model enhances task transfer learning and expedites the individual task learning process. For example, one use is creating an embedding that broadly captures a LinkedIn member’s interests, based on their profile and their interaction with the feed, jobs, and other product experiences. This embedding can then be used as an input feature for the search and recommendation systems powering experiences like feed, notifications, and jobs. This focus on training pipeline steps that take in heterogeneous inputs with multiple objective functions, and emit co-trained models that already form an implicit inference graph, centers the application development experience where the AI engineer feels most at home: designing the recommender model itself.
Figure 2 – Authoring Composite and Multi-Task Learning Model
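To make the composite, multi-task idea concrete, here is a toy numpy sketch of a shared member tower feeding several per-task item towers. The towers are single random linear layers standing in for trained networks; all names, shapes, and task labels are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)

class TwoTowerMultiTask:
    """Toy composite model: one shared member tower plus per-task item towers."""
    def __init__(self, member_dim, item_dim, emb_dim, tasks):
        self.W_member = rng.normal(size=(member_dim, emb_dim))
        # One item tower per objective (e.g. "jobs", "feed") sharing the
        # member embedding -- the multi-task part.
        self.W_item = {t: rng.normal(size=(item_dim, emb_dim)) for t in tasks}

    def member_embedding(self, member_features):
        e = member_features @ self.W_member
        return e / np.linalg.norm(e)

    def score(self, task, member_features, item_features):
        """Cosine similarity between the shared member embedding and a
        task-specific item embedding."""
        m = self.member_embedding(member_features)
        i = item_features @ self.W_item[task]
        return float(m @ (i / np.linalg.norm(i)))

model = TwoTowerMultiTask(member_dim=8, item_dim=6, emb_dim=4,
                          tasks=["jobs", "feed"])
member = rng.normal(size=8)
item = rng.normal(size=6)
# The same member embedding serves both objectives.
s_jobs = model.score("jobs", member, item)
s_feed = model.score("feed", member, item)
```

In a real pipeline each tower would be trained jointly against its own objective, so improvements to the shared member embedding transfer across tasks.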
Feature Cloud for Offline and Streaming Embedding Generation
We’ve developed a fully hosted platform called “Feature Cloud” that combines offline and streaming embedding generation. It leverages existing services like Managed Beam and our high availability feature delivery services, as well as the Flyte job orchestrator. These pipelines handle running batch and streaming inference, as well as preparing precomputed embedding vectors into feature stores and EBR indexes. They are used for embeddings of many kinds: those generated with the large composite models described above, as well as those from two-tower models or other architectures that form the constituent components of a larger composite model.
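The offline half of such a pipeline boils down to: read entity features in batches, run inference, and write the vectors out. A minimal sketch, with a dict standing in for the feature store and a random linear map standing in for the trained embedding model (both assumptions, not Feature Cloud internals):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))  # stand-in for a trained embedding model

def embed_batch(feature_rows: np.ndarray) -> np.ndarray:
    """Run batch inference for one micro-batch of entities."""
    e = feature_rows @ W
    return e / np.linalg.norm(e, axis=1, keepdims=True)

def run_pipeline(entities, batch_size=2):
    """Offline embedding-generation job: batch the entities, run inference,
    and write vectors to a feature store (a dict here; a hosted store and
    an EBR index in the real system)."""
    feature_store = {}
    ids = list(entities)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        vectors = embed_batch(np.stack([entities[i] for i in chunk]))
        feature_store.update(zip(chunk, vectors))
    return feature_store

entities = {f"member_{i}": rng.normal(size=6) for i in range(5)}
store = run_pipeline(entities)
```

The streaming variant follows the same shape, except the batches arrive from an event stream rather than a bounded dataset.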
Enhanced Hosted Search System
Our Hosted Search system, based on a Lucene-compatible EBR engine, has been enhanced to support automated embedding version management when working with Feature Cloud, and offers a variety of EBR algorithms like IVFPQ, IVF, and exhaustive search. This flexibility is essential in the extremely fast-paced domain of EBR, where new algorithms regularly arrive that operate at higher scale and lower latency, and with enhanced feature sets (such as applying strict boolean or geo filtering criteria while performing an EBR query).
Figure 3 – Hosted Search System
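To show why IVF-family algorithms beat exhaustive search at scale, here is a tiny inverted-file index in numpy: vectors are bucketed by their nearest coarse centroid, and a query scans only the `nprobe` closest buckets instead of the whole index. This is a teaching sketch; production engines (including ours) layer product quantization and filtering on top:

```python
import numpy as np

rng = np.random.default_rng(3)

class IVFIndex:
    """Minimal inverted-file (IVF) approximate nearest neighbor index."""
    def __init__(self, vectors, n_cells=4, iters=5):
        self.vectors = vectors
        # A few k-means iterations produce the coarse quantizer.
        self.centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)]
        for _ in range(iters):
            assign = self._nearest(vectors, self.centroids)
            for c in range(n_cells):
                members = vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        assign = self._nearest(vectors, self.centroids)
        self.lists = {c: np.flatnonzero(assign == c) for c in range(n_cells)}

    @staticmethod
    def _nearest(x, centers):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)

    def search(self, query, k=3, nprobe=2):
        """Scan only the nprobe cells whose centroids are closest to the query."""
        cell_d = ((self.centroids - query) ** 2).sum(-1)
        candidates = np.concatenate(
            [self.lists[c] for c in np.argsort(cell_d)[:nprobe]])
        d = ((self.vectors[candidates] - query) ** 2).sum(-1)
        return candidates[np.argsort(d)[:k]]

vecs = rng.normal(size=(100, 8))
index = IVFIndex(vecs)
hits = index.search(vecs[42] + 0.001)  # query very close to vector 42
```

The trade-off is tunable: raising `nprobe` scans more cells, increasing recall at the cost of latency, while `nprobe` equal to the cell count degenerates to exhaustive search.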
Automated Embedding Version Management
Because EBR helps retrieve content that is semantically related to a search query, improving the relevance of search results, we also had to build an approach to effectively manage versioning. Versioning for embeddings can be particularly tricky: while an individual embedding (e.g., the member-interest embedding) can be retrained to produce a new version, its semantic meaning won’t be aligned with the target embedding in the EBR system unless care is taken.
An example of where this might occur is a standard “personalized search” setup, where the request embedding combines a query text embedding on the raw query, together with an interest embedding for the person doing the search. The target EBR search index has an embedding for the items being searched (such as jobs), which was trained so that cosine similarity of the item embedding is high for items that are good semantic matches for the incoming personalized embedding query. But now imagine that the team who trains the member-interest embeddings (used to personalize the search) pushes out a new version of that embedding model. There’s no reason why the item embeddings will be aligned with this model, even if the embedding dimension is the same from one version to the next.
Within the design of Feature Cloud, we ensure that when a new embedding model is to be added to an application, parallel feature store tables and embedding slots in the EBR index are provisioned (and versioned) so that a consistent A/B test can be run, with metadata linking all of the components which were co-trained together.
Model Cloud for Inference Graph Orchestration
We’ve extended our model inference stack, “Model Cloud,” to run inference graphs on a Ray Serve backend. This allows easy execution of inference graphs in a more serverless fashion, eliminating the need for app teams to orchestrate complex workflows, and enforcing the version consistency constraints implied by the overall composite model’s metadata. This also reduces the need for application teams to run their own AI middle tier and the duplicated effort of AI teams all running separate services for their applications. The result is a simpler architecture, where AI engineers can spend less time maintaining intermediary infrastructure and more time focusing on delivering customer value.
Figure 4 – Model Cloud
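Conceptually, executing an inference graph means resolving node dependencies and running each node once its inputs are ready. The sketch below shows that idea with plain Python functions; the node names and graph shape are invented for illustration and bear no relation to Model Cloud's actual API:

```python
def run_graph(graph, request):
    """Execute inference-graph nodes in dependency order.

    graph: {node_name: ([dependency_names], fn)} where dependencies are
    either request fields or upstream node names.
    """
    results = dict(request)
    resolved = set(request)
    pending = dict(graph)
    while pending:
        ready = [n for n, (deps, _) in pending.items()
                 if all(d in resolved for d in deps)]
        if not ready:
            raise ValueError("cycle or missing dependency in graph")
        for name in ready:
            deps, fn = pending.pop(name)
            results[name] = fn(*[results[d] for d in deps])
            resolved.add(name)
    return results

# Toy graph: encode the query, fetch a member embedding, then combine them
# into a personalized request embedding.
graph = {
    "query_emb":   (["query_text"], lambda q: [len(q), 1.0]),
    "member_emb":  (["member_id"],  lambda m: [float(m), 0.5]),
    "request_emb": (["query_emb", "member_emb"],
                    lambda q, m: [a + b for a, b in zip(q, m)]),
}
out = run_graph(graph, {"query_text": "nurse", "member_id": 7})
```

A hosted runtime does the same resolution, but distributes nodes across serving replicas and checks each node's version metadata before wiring it in.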
Increasing The Quality of Job Search
Compared to recommendation systems, search is a more challenging task: because the member expresses explicit intent, the system requires a good fusion of member, query, and context information to reach the necessary level of precision. Before the introduction of EBR, Job Search relied heavily on text matching, which did deliver results but did not offer a deeper level of personalization, semantic matching, or granularity. We used these new core components to build a solution that uplevels our matching capabilities.
Specifically, we collect impression data, then use positive engagement reactions and heuristic in-batch negative sampling techniques to generate training data. We train a two-tower neural network model with a softmax loss that maximizes the cosine similarity between the request embedding and the embeddings of positively engaged jobs. The model architecture is shown in Figure 5.
Figure 5 – EBR Model Architecture
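The training objective can be written compactly: score every request in a batch against every job in the batch, and treat each request's own job as the positive with the rest acting as in-batch negatives. A numpy sketch of that loss, assuming a temperature hyperparameter (the value here is illustrative):

```python
import numpy as np

def in_batch_softmax_loss(req_embs, job_embs, temperature=0.05):
    """Softmax loss with in-batch negatives: row i of req_embs pairs with
    row i of job_embs; every other job in the batch is a negative."""
    q = req_embs / np.linalg.norm(req_embs, axis=1, keepdims=True)
    j = job_embs / np.linalg.norm(job_embs, axis=1, keepdims=True)
    logits = (q @ j.T) / temperature              # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Maximizing positive-pair similarity = minimizing -log p(diagonal).
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(5)
jobs = rng.normal(size=(8, 16))
# Requests nearly aligned with their paired jobs score a low loss...
aligned = in_batch_softmax_loss(jobs + 0.01 * rng.normal(size=(8, 16)), jobs)
# ...while unrelated requests score a high one.
random_reqs = in_batch_softmax_loss(rng.normal(size=(8, 16)), jobs)
```

Minimizing this loss pushes each request embedding toward its positively engaged job and away from the other jobs in the batch, which is what makes the embeddings useful for retrieval.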
EBR serving and job embedding indexing are based on Zelda, our search infrastructure’s EBR solution built on Inverted File with Product Quantization (IVFPQ). As shown in the Zelda framework below, we index the job embeddings; at request time, we compute the request embedding online and then run an approximate nearest neighbor search to get the most relevant jobs for the request.
Figure 6 – Zelda Framework
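The product-quantization half of IVFPQ compresses each vector into a handful of small codes, one per subvector, so that millions of job embeddings fit in memory. A self-contained sketch of PQ training, encoding, and decoding (codebook sizes and dimensions are illustrative; real engines also use the codebooks for fast asymmetric distance computation):

```python
import numpy as np

rng = np.random.default_rng(9)

def train_codebooks(vectors, m=4, k=16, iters=5):
    """Train one small k-means codebook per subvector (product quantization):
    a d-dim vector is split into m subvectors, each quantized separately."""
    d = vectors.shape[1] // m
    books = []
    for s in range(m):
        sub = vectors[:, s * d:(s + 1) * d]
        centers = sub[rng.choice(len(sub), k, replace=False)]
        for _ in range(iters):
            assign = ((sub[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    centers[c] = pts.mean(0)
        books.append(centers)
    return books

def encode_pq(vec, books):
    """Replace each subvector with the id of its nearest codebook entry."""
    d = len(vec) // len(books)
    return [int(((books[s] - vec[s * d:(s + 1) * d]) ** 2).sum(1).argmin())
            for s in range(len(books))]

def decode_pq(codes, books):
    """Reconstruct an approximation of the original vector from its codes."""
    return np.concatenate([books[s][c] for s, c in enumerate(codes)])

vecs = rng.normal(size=(200, 16))
books = train_codebooks(vecs)
codes = encode_pq(vecs[0], books)   # 16 floats compressed to 4 small codes
approx = decode_pq(codes, books)
```

Here 16 floats shrink to 4 one-byte codes, a 16x compression, while the decoded vector stays close enough to the original for candidate selection; the exact distances are recovered later by the second-stage ranker.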
Our goal in introducing this technology was to increase the liquidity of relevant jobs presented to our job seekers. EBR has two major benefits compared to classic term-based candidate selection. First, EBR can weight different aspects of a match at a finer granularity: in term-based candidate selection it is tricky to pay different amounts of attention to title matching versus description matching, whereas in EBR a strong title match can translate into a smaller embedding distance and so be naturally prioritized. Second, EBR improves AI productivity, because it provides a data-driven approach to improving candidate selection and makes it feasible to leverage pretrained embeddings.
We launched this EBR model in the Job Search candidate selection stage and observed significant engagement metric wins (measured by number of applications, click-through rate, and successful job search sessions). With the new capability provided by EBR, we greatly simplified the original text term-based retrieval and even achieved a decrease in p95 latency.
The implementation of EBR at LinkedIn has significantly improved the ability to deliver personalized and relevant content to our members and customers. It has streamlined the process of information retrieval and recommendation, making it more efficient and effective. As we continue to refine and expand our EBR capabilities, we look forward to unlocking even more possibilities for enhancing the LinkedIn platform and product experience.
We thank the following people for their significant contribution to the launch of our EBR infrastructure and its integration into Job Search.