2026 Netflix Workshop on Personalization, Recommendation and Search (PRS)

Join us for a day of conversation and community building.

Friday, June 5 | 8:30 AM – 6:00 PM PDT
RSVP

Event Details

The 2026 Netflix Workshop on Personalization, Recommendation and Search (PRS) aims to bring together practitioners and researchers working in these domains, to facilitate the sharing of ideas, information, and approaches, and to build bridges between these communities. This year marks our 10th event! 🎉


 

Please register in advance using the RSVP button above. Registration will close when we reach capacity (as it has in prior years) or on Friday, May 29th, whichever comes first. So if you're interested, don't delay.



The event will be in-person only, at the historic Fox Theatre in Redwood City.

 


This @NetflixResearch workshop is organized by:

 

Claire Dorman 
David Fagnan 
Fernando Amat Gil 
Nathan Kallus 
Linas Baltrunas 
Matteo Rinaldi 
Colleen Chan 
Gary Tang  
Hailey Boren 
Jeff Vadukumcherry 
Liza Argent 


For questions, contact prs-organizers@netflix.com


Previous PRS workshops: 2025, 2024, 2023, 2022, 2021, 2019, 2018, 2017, 2016.

Tentative Agenda

Note

Sequencing and timing of sessions are still subject to change.

8:30 AM PDT

Registration Opens


Morning Coffee

9:15 AM PDT

Welcome & Opening remarks


Kamelia Aryafar, Head of AI for Members Systems (Netflix)

9:30 AM PDT

From Ranking to Reasoning: Building The Next Generation of Consumer AI at DoorDash

 

Sudeep Das (DoorDash)

10:00 AM PDT

GenPage: Towards Generative End-to-End Homepage Construction at Netflix

Luke Lequn Wang (Netflix)

 

10:30 AM PDT

Break



11:00 AM PDT

Thinking Fast and Slow: Conversational Agents That Think, Plan, and Grow With Us


Konstantina Christakopoulou (Google DeepMind)

11:30 AM PDT

Generative Recommenders at LinkedIn

 

Julie Choi (LinkedIn)

 

12:00 PM PDT

Lunch & Netflix Poster Session

1:00 PM PDT

When Recommendation Becomes Evaluation: Aggregation, Heterogeneity, and Strategic Gaming


Sanmi Koyejo (Stanford)

1:30 PM PDT

Building AI You Can Trust: Innovation & Fair Housing in Real Estate 


Ondrej Linda (Zillow)


2:00 PM PDT

The Generative Shift in RecSys at Pinterest: Foundation Models for Ranking, Sequence Generation for Retrieval

 

Jaewon Yang (Pinterest)

2:30 PM PDT

Break 

 

3:00 PM PDT

Fireside chat with Becks Wood and Craig Saldanha

 

Becks Wood (Netflix) & Craig Saldanha (Yelp)

 

 

 

3:30 PM PDT

Measuring the Value of Personalization

 

Kevin Zielnicki (Netflix)

 
 

4:00 PM PDT

Closing Remarks


Kamelia Aryafar (Netflix) 

 

4:15 PM PDT

Networking Happy Hour

6:00 PM PDT

End of event


Speakers


Craig Saldanha

Yelp

 

Craig Saldanha is the Chief Product Officer at Yelp, where he leads global product and design teams to transform the consumer experience, local business tools, and monetization through cutting-edge AI. Previously, he directed product and engineering for Amazon Prime Video International and held multiple leadership roles across Prime Video and Kindle, shaping global streaming and digital content strategies. Craig also serves as an Executive in Residence and Lecturer at Carnegie Mellon University’s Tepper School of Business, teaching product management to MBA students and executives.



Becks Wood

Netflix 

Becks Wood is the Senior Director for Core Discovery at Netflix. Core Discovery includes the TV, mobile, web, and personalization consumer experiences. Before Netflix, Becks was a product lead on Google Search, leading consumer search for verticals such as TV/movies, recipes, sports, and more. At Google, she worked on GenAI search integrations as well as vertical features, including leading the Olympics launch. Becks loves the intersection of entertainment and technology. She's an avid movie and concert goer, loves a dance party, and spends time riding bikes around San Francisco. Becks holds a BA in economics from Princeton University.


 Sanmi Koyejo

Stanford

Sanmi (Oluwasanmi) Koyejo is an Assistant Professor in the Department of Computer Science at Stanford University and an adjunct Associate Professor at the University of Illinois at Urbana-Champaign. He leads the Stanford Trustworthy AI Research (STAIR) lab, which develops measurement-theoretic foundations for trustworthy AI systems, spanning AI evaluation science, algorithmic accountability, and privacy-preserving machine learning, with applications to healthcare and scientific discovery. His research on AI capabilities evaluation has challenged conventional understanding in the field, including work on measurement frameworks cited in the 2024 Economic Report of the President.


Koyejo has received the Presidential Early Career Award for Scientists and Engineers (PECASE), Skip Ellis Early Career Award, Alfred P. Sloan Research Fellowship, NSF CAREER Award, and multiple outstanding paper awards at flagship venues, including NeurIPS and ACL. He has delivered keynote presentations at major conferences, including ECCV and FAccT. He serves in key leadership roles, including Board President of Black in AI, Board of Directors of the Neural Information Processing Systems Foundation, and other leadership positions in professional organizations advancing AI research and broadening participation in the field.


Julie Choi

LinkedIn


Julie Choi is an Engineering Manager within LinkedIn’s Core AI organization, where she leads the team responsible for next-generation recommendation and ranking systems. Her work focuses on developing and productionizing generative AI and sequence-based models to enhance personalization across a wide range of LinkedIn products, including Feed and Ads. Prior to LinkedIn, Julie focused on building large-scale machine learning systems. She holds a Ph.D. in Electrical Engineering with a minor in Statistics from Stanford University.


 Konstantina Christakopoulou

Google DeepMind

 

 Konstantina Christakopoulou is a Staff Research Engineer at Google DeepMind, leading efforts around AI agents that help improve people’s lives. She co-founded and technically led Project ALLY, a Google Brain Moonshot aimed at the next generation of assistive recommendation that helps users throughout their lifelong journey. She has led multiple cross-PA efforts to align industrial recommendation platforms with human values, publishing in top-tier conferences and delivering 20+ launches while working closely with senior leadership at Google. Prior to Google, she received her PhD from the University of Minnesota, during which she completed research internships at Google Research and Microsoft Research, and first-authored influential works on conversational recommendation.



Ondrej Linda

Zillow

Ondrej Linda has been bringing his experience in AI, machine learning, data science and ethical AI to Zillow for the past eight years. Ondrej currently leads AI Science and Engineering teams focused on exploring the frontiers of Agentic AI systems to build customer-focused features to help people find homes and move. Ondrej also plays a lead role in shaping Zillow's responsible AI-driven initiatives and is actively engaged in supporting Zillow’s ethical AI efforts, ensuring fairness in AI model implementation.

Prior to joining Zillow, Ondrej spent six years at Expedia, where he worked as a Data Scientist specializing in Natural Language Processing and Information Retrieval. He also managed the Data Science team for the Hotwire brand.

Ondrej earned his PhD in Computer Science, with a focus on Machine Learning, from the University of Idaho in 2012, and holds a Master's in Computer Graphics from Czech Technical University.



Kevin Zielnicki

Netflix

Kevin Zielnicki is a research scientist in the Machine Learning and Inference Research team at Netflix. His work focuses on recommendation systems and how users interact with recommendations to make choices. He received his PhD from the University of Illinois at Urbana-Champaign and previously developed recommendation systems for Stitch Fix.


Sudeep Das

DoorDash


Sudeep Das is a senior machine learning and artificial intelligence leader with over 15 years of experience building large-scale, consumer-facing AI systems. He currently serves as Head of Machine Learning & AI for New Business Verticals at DoorDash, where he leads personalization, search, catalog intelligence, and decision-making systems across rapidly expanding consumer experiences including grocery, convenience, alcohol, and retail. At DoorDash, Sudeep focuses on applying deep learning, recommender systems, and generative AI to create highly adaptive, real-time consumer experiences. His work spans ranking and retrieval, large language model–powered discovery, and agentic systems that reason across user intent, context, and constraints to deliver personalized outcomes rather than static recommendations. Previously, Sudeep was a Machine Learning Lead at Netflix, where he helped develop next-generation personalization and discovery algorithms used by hundreds of millions of users worldwide. He holds a Ph.D. in Astrophysics from Princeton University and brings a strong scientific foundation to practical AI leadership at scale. Sudeep is a frequent speaker at leading international conferences including RecSys, SIGIR, ICML, and QCon, where he shares insights on production ML systems, personalization, and the future of intelligent consumer platforms.



Luke Lequn Wang

Netflix

Luke Lequn Wang is a Research Scientist and Engineer at Netflix, specializing in the intersection of generative AI and personalization. He leads the development of Netflix's generative homepage recommender, a customized and efficient LLM that constructs personalized homepages in real time. His broader research background encompasses reinforcement learning, user interactive systems, LLMs, and trustworthy machine learning. He obtained his Ph.D. in Computer Science from Cornell University.


Jaewon Yang

Pinterest

Jaewon Yang is a Principal Machine Learning Engineer at Pinterest, where he specializes in advancing machine learning technologies for recommender systems, generative AI, and representation learning across Pinterest’s products. Before joining Pinterest, he was a Distinguished Machine Learning Engineer at Nextdoor and a Principal ML Engineer at LinkedIn. Jaewon holds a Ph.D. in Machine Learning and a Master’s in Statistics from Stanford University’s Infolab. He has published over 50 papers with more than 9,000 citations and has served on the senior program committees of top-tier conferences such as SIGKDD and CIKM for years. His contributions have been recognized with five best paper awards from SIGKDD, WSDM, and ICDM, including three test-of-time awards.


Talks & Abstracts

Fireside Chat: Product Leadership in the AI Era
Speakers: Becks Wood (Netflix) & Craig Saldanha (Yelp)


Title: When Recommendation Becomes Evaluation: Aggregation, Heterogeneity, and Strategic Gaming

Speaker: Sanmi Koyejo (Stanford)

Abstract:

Collaborative filtering and matrix factorization were designed to predict what individual users want. Increasingly, they do something else: rank AI systems on public leaderboards and certify which facts are true on platforms used by billions. Recommendation has become evaluation infrastructure, and that repurposing has consequences. We examine two. In AI evaluation, preference heterogeneity gets averaged away, producing aggregate rankings that hide systematic disagreement across annotator populations. In crowdsourced fact-checking (deployed across X, Meta, TikTok, and Google), matrix factorization repurposed for consensus detection admits coordinated manipulation: a small number of adversarial accounts can fabricate cross-ideological agreement with no exploit required. Robust evaluation is a mechanism design problem, and the recommendation community is well-positioned to address it.
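A toy illustration of the aggregation point, using entirely hypothetical vote counts rather than data from the talk: two annotator populations that disagree sharply about a model-A vs model-B comparison can produce a pooled win rate that reads as a tie.

```python
from fractions import Fraction

# Hypothetical pairwise votes as (population, winner) pairs for model A vs model B.
votes = (
    [("pop1", "A")] * 80 + [("pop1", "B")] * 20
    + [("pop2", "A")] * 20 + [("pop2", "B")] * 80
)

def win_rate(votes, model, population=None):
    """Fraction of votes won by `model`, optionally restricted to one population."""
    subset = [winner for pop, winner in votes if population is None or pop == population]
    return Fraction(sum(winner == model for winner in subset), len(subset))

pooled = win_rate(votes, "A")
by_population = {pop: win_rate(votes, "A", pop) for pop in ("pop1", "pop2")}
print(pooled, by_population)
```

Pooled, model A wins exactly half the votes; split by population, its win rates are 4/5 and 1/5. A single aggregate ranking discards exactly the disagreement the abstract describes.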


Title: Generative Recommenders at LinkedIn

Speaker: Julie Choi (LinkedIn)

Abstract:

Modern recommender systems can observe long, rich user histories, but most production rankers still struggle to use that history directly at serving time. In this talk, we describe LinkedIn’s work on Generative Ranking, which models member behavior as long sequences and serves candidates through amortized shared-context attention. This design made long-history ranking practical in production and has been deployed in LinkedIn Feed and Ads.
We then discuss Semantic IDs (SIDs) as a compact way to reuse rich signals across ranking systems. Instead of serving every dense representation or full behavior sequence directly, SIDs convert behavior, text, content, or LLM-derived embeddings into discrete tokens that downstream rankers can train on under their own objectives. Together, these techniques show how long-history user understanding can move from offline modeling into production-scale ranking systems.
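One common way to turn an embedding into a Semantic ID is residual quantization: at each level, pick the nearest codebook vector, subtract it, and quantize what remains. The sketch below is a minimal illustration under that assumption, with tiny hand-written codebooks (hypothetical; in practice codebooks are learned, for example with k-means or an RQ-VAE). It is not a description of LinkedIn's actual SID pipeline.

```python
import math

def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vec, codebook[i]))

def semantic_id(embedding, codebooks):
    """Residual quantization: emit one discrete token per codebook level."""
    residual = list(embedding)
    tokens = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        tokens.append(idx)
        # Quantize only what the previous levels failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)

# Hypothetical two-level codebooks over 2-D embeddings.
codebooks = [
    [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)],   # coarse level
    [(0.1, 0.0), (0.0, 0.1), (0.0, -0.1)],   # fine level
]
print(semantic_id((0.9, 0.12), codebooks))
```

Here `(0.9, 0.12)` maps to the token pair `(0, 1)`. Items with similar embeddings share prefixes (the coarse token), which is what lets downstream rankers learn over compact discrete tokens instead of raw dense vectors.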


Title: The Value of Personalized Recommendations: Evidence from Netflix

Speaker: Kevin Zielnicki (Netflix)

Abstract:

Personalized recommendation systems shape much of user choice online, yet their targeted nature makes it challenging to separate the value of recommendation from the value of the underlying goods. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence, and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components, as well as to recover model-free diversion ratios that we use to validate our structural model. We then use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reductions in engagement, respectively, as well as decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).
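For readers less familiar with diversion ratios: the diversion ratio from title j to option k is the share of j's demand that moves to k when j is removed. The sketch below computes it in a plain multinomial logit with hypothetical utilities and an "outside" option for not watching. This is only an illustration of the quantity; the talk's model adds low-rank heterogeneity and state dependence, under which this simple closed form no longer holds.

```python
import math

def logit_shares(utilities):
    """Multinomial-logit choice probabilities from a dict of utilities."""
    denom = sum(math.exp(u) for u in utilities.values())
    return {k: math.exp(u) / denom for k, u in utilities.items()}

def diversion_ratios(utilities, removed):
    """Share of `removed`'s demand that moves to each remaining option when it is dropped."""
    before = logit_shares(utilities)
    after = logit_shares({k: u for k, u in utilities.items() if k != removed})
    return {k: (after[k] - before[k]) / before[removed] for k in after}

# Hypothetical utilities; "outside" stands for not watching anything.
utilities = {"outside": 0.0, "title_a": 1.0, "title_b": 0.5, "title_c": -0.5}
shares = logit_shares(utilities)
ratios = diversion_ratios(utilities, "title_a")
```

The ratios sum to 1, since every viewer who loses `title_a` picks some remaining option, including the outside option. In plain logit each ratio equals `shares[k] / (1 - shares["title_a"])`, which is exactly the proportional-substitution pattern that richer heterogeneity is meant to relax.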



Title: From Ranking to Reasoning: Building The Next Generation of Consumer AI at DoorDash

Speaker: Sudeep Das (DoorDash)

Abstract:

Consumer AI is rapidly evolving beyond static ranking and one-shot recommendations into intelligent systems that can reason, remember, guide, and adapt across the entire shopping journey. In this talk, we will share how DoorDash transformed its Search and Discovery stack from traditional deep learning–based approaches into a dynamic, contextual, and generative AI–powered platform for novel consumer experiences. We will explore how this evolution enables dynamic merchandising moments, richer search experiences, hyper-personalized browsing, and conversational shopping assistants that support multimodal interaction. The talk will cover the hybrid architecture behind these experiences, combining the strengths of traditional embeddings, rankers, and sequential models with the semantic reasoning, multimodal understanding, and adaptability of LLMs and VLMs. Finally, we will share key lessons from building a conversational shopping assistant, including how we designed an evaluation harness, developed a flexible consumer memory system, and used personalization to create more intuitive and engaging shopping experiences.


Title: Building AI You Can Trust: Innovation & Fair Housing in Real Estate

Speaker: Ondrej Linda (Zillow)

 Abstract:

Real estate is one of the most complex and regulated consumer domains with months-long journeys spanning multiple personas. In this talk, we will introduce Zillow AI Mode, a conversational AI experience that turns traditional home search into coordinated action by leveraging Zillow's unique combination of live housing data, consumer behavioral context, and real-estate platform infrastructure. AI Mode delivers contextual understanding, experience navigation, multi-modal responses via widgets, and timely connections to human experts. Deploying such a system responsibly requires compliance with fair housing requirements, which prohibit housing discrimination based on protected characteristics. We describe our approach to building fair housing guardrails around the AI Mode experience.


Title: GenPage: Towards Generative End-to-End Homepage Construction at Netflix

Speaker: Luke Lequn Wang (Netflix)

Abstract:

In this talk, I'll present GenPage, Netflix's end-to-end generative approach to homepage construction: a single transformer that replaces our traditional multi-stage recommender stack. GenPage treats user context as a prompt and autoregressively generates the entire structured, multi-row homepage as the response. We adapt the LLM training recipe: pretraining followed by post-training via weighted binary classification (WBC) or reinforcement learning (RL). For industry-scale deployment, we introduce techniques addressing cold start, model freshness, business-rule enforcement, and serving efficiency. In online A/B tests against our mature, highly optimized production homepage recommender, GenPage delivered statistically significant improvements on our core user engagement metric, while reducing end-to-end serving latency by 20%. Offline experiments yield two findings worth highlighting: enriching the prompt yields a larger improvement than scaling model capacity in our current regime, and RL post-training increases homepage diversity even though diversity is not part of the objective.


Title: The Generative Shift in RecSys at Pinterest: Foundation Models for Ranking, Sequence Generation for Retrieval

Speaker: Jaewon Yang (Pinterest)

Abstract:

Pinterest's recommender system is shifting to a generative paradigm: large transformers trained to predict next items and served in real time. This shift has two main characteristics. First, pretraining and fine-tuning amortize training cost across surfaces. Second, sequence engineering replaces feature engineering, allowing new tasks to be addressed by changing the input sequence rather than the model or features. We describe two systems built on this paradigm. PinRec is a generative retrieval model: a causal transformer pretrained on cross-surface user activity and fine-tuned for each surface through input sequence changes. It introduces outcome-conditioned generation, enabling retrieval of candidates aligned with specific business objectives. PinFM is a 20B-parameter foundation model for ranking. It is pretrained on long-term user activity and fine-tuned by appending candidate Pins to the input sequence, allowing attention to directly model user-candidate interactions. To control serving cost, a request-level transformer deduplicates user sequence processing across candidates.

Both systems share the same recipe: pretrain once, then adapt through sequence engineering. Deployed across Homefeed, Search, and Related Pins, they are now the primary drivers of engagement gains across Pinterest's major surfaces.
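Both this abstract (a request-level transformer that deduplicates user sequence processing) and the LinkedIn abstract (amortized shared-context attention) rely on encoding the user history once per request and reusing it for every candidate. A framework-free sketch of that amortization follows, with a toy single-head attention and a hypothetical `encode_history` stand-in whose call count we track; real systems use learned projections and batched GPU kernels, and neither company's implementation is shown here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def encode_history(history):
    """Hypothetical stand-in for the expensive user-sequence encoder (identity keys/values)."""
    encode_history.calls += 1
    return history, history  # (keys, values)

encode_history.calls = 0

def score_candidates(history, candidates):
    """Encode the shared user history once, then let every candidate attend to it."""
    keys, values = encode_history(history)  # paid once per request, not once per candidate
    scores = []
    for q in candidates:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(q))]
        scores.append(sum(a * b for a, b in zip(q, context)))
    return scores

history = [(1.0, 0.0), (0.0, 1.0)]
candidates = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scores = score_candidates(history, candidates)
```

Scoring three candidates triggers a single history encoding; with thousands of candidates per request, this per-request amortization is what makes long user sequences affordable at serving time.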

Posters:

Semantic ID–Powered Personalization for Notifications

Aria Li 

 

Generative Conversational Search for Mobile and Voice TV

Aditya Sinha, Dhinesh Dhanasekaran, Shahrzad Naseri, Spencer L'Heureux, Vito Ostuni, Matteo Rinaldi

 

Semantic ID @ Netflix: From Generation to Integration

Sejoon Oh, Fernando Amat Gil, Dawit Mureja Argaw, Mark Thornburg, Moumita Bhattacharya, Ashish Rastogi

 

Semantic IDs and LLM-friendly tokenization of titles for Member LLM

Dawit Mureja Argaw

 

From Sparse Coverage to Production-Scale Calibration: A Multi-Strategy GenAI Framework for Trustworthy Retrieval Benchmarks

Ehsan Gholami, Ding Tong, Shahrzad Naseri, Lucas Zhang

  

Self-Reflection in Personalized Explanation Generation and Evaluation

Emma Kong, JJ Tan, David Fagnan

 

Multimedia Asset Personalization via Multimodal Embeddings at Netflix

Emma Kong, Aditya Deshpande, David Fagnan, Ashish Rastogi 

 

MediaFM: A Multimodal Content Model Powering Personalization at Netflix

Avneesh Saluja, Santiago Castro, Bowei Yan, Ashish Rastogi

 

Building a Vertical Video Ranker from Scratch: A Crawl-Phase Journey from Cold Start to Calibrated Engagement

Erik Schmidt, Ramya Nagarajan, Ishita Verma, Yunan Hu, James McInerney 


From Classical Recommenders to LLM-Native Recommendation Systems at Netflix

Shradha Sehgal, Ying Li, Arjun Rao, Linas Baltrunas


Posters

 The following posters will be presented from 12:30-1:30 at the workshop.


Enhancing Large Language Models with Domain-Specific Content Knowledge for Improved Recommendations

Zhe Zhang, Yesu Feng (Netflix)

In the entertainment sector, content significantly influences viewer decisions, necessitating a Large Language Model (LLM) with specialized knowledge of the entertainment catalog. Traditional LLMs, while proficient at capturing broad factual information, are limited by static knowledge and lack domain-specific expertise. This paper explores post-training knowledge injection via instruction tuning to address these challenges. By converting content into diverse Question-Answer pairs, the model gains a nuanced understanding of titles, enhancing its ability to generalize to new instructions. This enriched context is expected to improve the model's accuracy in recommending the next title, aligning more closely with individual viewer preferences and histories.

LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

Madhu Arun (LinkedIn)

This poster presents LinkedIn’s large-scale, GPU-based retrieval system for the out-of-network feed. The new retrieval system supports a billion-sized index on GPU models, where both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches, which often reduces system quality. We believe this represents one of the industry’s first live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, it has contributed to a +0.1% lift in daily unique professional users. We envision this as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.
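The poster contrasts attribute-based pre-filtering with post-filtering. A minimal, framework-free sketch shows why post-filtering a fixed-size top-k can come back short while an exhaustive pre-filtered scan cannot. The items, the `lang` attribute, and the helper names are hypothetical, and this runs on CPU lists rather than LiNR's GPU index.

```python
import heapq

def dot(a, b):
    """Dot-product relevance score."""
    return sum(x * y for x, y in zip(a, b))

def knn_post_filter(query, items, k, allowed, fetch):
    """Retrieve the top-`fetch` items by score, then drop those failing the filter."""
    top = heapq.nlargest(fetch, items, key=lambda it: dot(query, it["vec"]))
    return [it["id"] for it in top if allowed(it)][:k]

def knn_pre_filter(query, items, k, allowed):
    """Exhaustive scan restricted to items that pass the filter."""
    candidates = [it for it in items if allowed(it)]
    top = heapq.nlargest(k, candidates, key=lambda it: dot(query, it["vec"]))
    return [it["id"] for it in top]

def is_english(item):
    """Hypothetical attribute filter."""
    return item["lang"] == "en"

# Hypothetical 1-D item vectors: higher-scoring items are mostly German.
items = [
    {"id": i, "vec": (10.0 - i,), "lang": lang}
    for i, lang in enumerate(["de", "de", "de", "en", "de", "en"])
]
query = (1.0,)
post = knn_post_filter(query, items, k=2, allowed=is_english, fetch=3)
pre = knn_pre_filter(query, items, k=2, allowed=is_english)
```

With these items, post-filtering returns nothing because every item in the fetched top 3 fails the filter, while the pre-filtered scan returns ids 3 and 5. Exhaustive search makes pre-filtering cheap to reason about: the filter never competes with the relevance cutoff.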

Counterfactual Inference under Thompson Sampling

Olivier Jeunen (Aampe)

Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives.
To balance the explore-exploit trade-off successfully, Thompson sampling provides a natural and widespread paradigm to probabilistically select which action to take.
Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts.
Specifically, whilst most existing estimators rely on action propensities, these are not readily available under Thompson sampling procedures.

We derive exact and efficiently computable expressions for action propensities under a variety of parameter and outcome distributions, enabling the use of off-policy estimators in Thompson sampling scenarios.
This opens up a range of practical use-cases where counterfactual inference is crucial, including unbiased offline evaluation of recommender systems, as well as general applications of causal inference in online advertising, personalisation, and beyond.
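The paper derives exact propensities for a variety of parameter and outcome distributions. The simplest case, two arms with independent Gaussian posteriors, already shows the idea: Thompson sampling picks arm 1 exactly when its sampled value exceeds arm 2's, so the propensity is a normal CDF of the posterior mean gap. The sketch below is our own illustration of that special case, not the paper's general derivation. It computes the propensity, checks it by simulation, and plugs logged propensities into a standard inverse propensity scoring (IPS) estimator. All numbers are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ts_propensity_two_gaussian(mu1, sigma1, mu2, sigma2):
    """Exact P(Thompson sampling picks arm 1) for two independent Gaussian posteriors.

    Arm 1 is chosen iff theta1 - theta2 > 0, and theta1 - theta2 is Gaussian
    with mean mu1 - mu2 and variance sigma1**2 + sigma2**2.
    """
    return normal_cdf((mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2))

def ts_propensity_monte_carlo(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Simulation check: sample both posteriors n times and count arm-1 wins."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu1, sigma1) > rng.gauss(mu2, sigma2) for _ in range(n))
    return wins / n

def ips_value(logs, target_prob):
    """IPS estimate of a target policy's value from (action, reward, logging propensity) tuples."""
    return sum(r * target_prob(a) / p for a, r, p in logs) / len(logs)

# Hypothetical posteriors: arm 1 looks slightly better but is still uncertain.
p_arm1 = ts_propensity_two_gaussian(0.3, 0.1, 0.2, 0.1)
p_arm1_sim = ts_propensity_monte_carlo(0.3, 0.1, 0.2, 0.1)
```

Here the exact propensity is about 0.76, and the simulation agrees to within sampling error. Having the propensity in closed form is what makes `ips_value` usable on Thompson-sampling logs without re-simulating the policy for every logged decision.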

LLM for Member Taste Summarization

Thea Wang (Netflix)

In today’s fast-evolving digital space, personalization is key to enhancing user experience and satisfaction. As LLMs demonstrate remarkable capabilities in NLP tasks, we explore fine-tuning open-source LLMs for member taste summarization to improve personalization.

Tweedie Regression for Video Recommendation System

Qiang Chen (Tubi)

Modern recommendation systems aim to increase click-through rate (CTR) for a better user experience, commonly by treating ranking as a classification task focused on predicting CTR. However, there’s a gap between this method and the actual objectives of businesses across different sectors. In video recommendation services, the objective of video on demand (VOD) extends beyond merely encouraging clicks to guiding users to discover their true interests, leading to increased watch time. Longer watch time, in turn, leads to more revenue through more opportunities to present online display advertisements. This research addresses the issue by redefining the problem from classification to regression, with a focus on maximizing revenue through user viewing time. Due to the scarcity of positive labels in recommendation data, the study introduces the Tweedie loss function, which is better suited to this scenario than the traditional mean squared error loss. The paper also provides insights into how the Tweedie process captures users’ diverse interests. Our offline simulation and online A/B test revealed that we can substantially enhance our core business objectives: user engagement in terms of viewing time and, consequently, revenue. Additionally, we provide a theoretical comparison between the Tweedie loss and the commonly employed viewing-time-weighted logloss, highlighting why Tweedie regression stands out as an efficient solution. We further outline a framework for designing a loss function that focuses on a single objective.
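For readers unfamiliar with the Tweedie loss, a minimal sketch follows. It assumes the common log-link form with power parameter 1 < p < 2, a compound Poisson-gamma distribution that puts point mass at zero and so fits zero-inflated watch time. The value p = 1.5, the watch-time data, and the function names are hypothetical, and this is not necessarily the exact formulation used in the paper.

```python
import math

def tweedie_loss(y, log_mu, p=1.5):
    """Tweedie negative log-likelihood with log link (terms constant in mu dropped)."""
    if not 1.0 < p < 2.0:
        raise ValueError("this form assumes a compound Poisson-gamma power 1 < p < 2")
    return (-y * math.exp(log_mu * (1.0 - p)) / (1.0 - p)
            + math.exp(log_mu * (2.0 - p)) / (2.0 - p))

def tweedie_grad(y, log_mu, p=1.5):
    """d loss / d log_mu = mu**(1 - p) * (mu - y)."""
    mu = math.exp(log_mu)
    return mu ** (1.0 - p) * (mu - y)

# Hypothetical watch times (minutes) per impression: mostly zeros, a few long views.
watch = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 45.0, 3.0]

# Fit a single constant prediction by gradient descent on log_mu.
log_mu = 0.0
for _ in range(2000):
    log_mu -= 0.05 * sum(tweedie_grad(y, log_mu) for y in watch) / len(watch)
```

Because the gradient is μ^(1-p)(μ - y), the constant prediction converges to the mean watch time (6.0 minutes here), and zero-watch impressions contribute directly to the regression without a separate positive label.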

Finding Interest Needle in Popularity Haystack: Improving Retrieval by Modeling Item Exposure

Amit Jaspal, Nicolas Bievre (Meta)

 

Recommender systems operate in closed feedback loops, where user interactions reinforce popularity bias, leading to over-recommendation of already popular items while under-exposing niche or novel content. Existing bias mitigation methods, such as Inverse Propensity Scoring (IPS) and Off-Policy Correction (OPC), primarily operate at the ranking stage or during training, lacking explicit real-time control over exposure dynamics. In this work, we introduce an exposure-aware retrieval scoring approach, which explicitly models item exposure probability and adjusts retrieval-stage ranking at inference time. Unlike prior work, this method decouples exposure effects from engagement likelihood, enabling controlled trade-offs between fairness and engagement in large-scale recommendation platforms. We validate our approach through online A/B experiments in a real-world video recommendation system, demonstrating a 25% increase in uniquely retrieved items and a 40% reduction in the dominance of over-popular content, all while maintaining overall user engagement levels. Our results establish a scalable, deployable solution for mitigating popularity bias at the retrieval stage, offering a new paradigm for bias-aware personalization.
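The abstract does not spell out the scoring formula. As a purely hypothetical illustration of what decoupling exposure from engagement at inference time can look like, the sketch below subtracts a tunable multiple of the log exposure probability from a log engagement score. The `alpha` knob, the item probabilities, and the function names are assumptions for illustration, not the authors' method.

```python
import math

def exposure_aware_score(p_engage, p_exposure, alpha):
    """Hypothetical score: log engagement probability minus alpha * log exposure probability."""
    return math.log(p_engage) - alpha * math.log(p_exposure)

def retrieve(items, k, alpha):
    """Ids of the top-k items under the exposure-adjusted score."""
    ranked = sorted(
        items,
        key=lambda it: exposure_aware_score(it["p_engage"], it["p_exposure"], alpha),
        reverse=True,
    )
    return [it["id"] for it in ranked[:k]]

# Hypothetical items: similar engagement odds, very different prior exposure.
items = [
    {"id": "popular", "p_engage": 0.30, "p_exposure": 0.90},
    {"id": "mid", "p_engage": 0.28, "p_exposure": 0.40},
    {"id": "niche", "p_engage": 0.25, "p_exposure": 0.10},
]
```

With `alpha = 0` the score reduces to pure engagement and the popular item ranks first; with `alpha = 0.5` the niche item moves to the top. Because exposure enters as a separate term, the trade-off can be adjusted at serving time without retraining the engagement model.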

Leveraging Multimodality for Netflix Recommender Systems

Emma Kong, Asad Abbasi, David Fagnan, Bowei Yan, Aneesh Vartakavi, Dhaval Patel, Elliot Chow (Netflix)

When logging into Netflix, members are greeted with a diverse array of titles on the homepage. To enrich members' browsing and discovery experience, we present diverse evidence assets for each title, placed in different locations. For example, the most prominent asset is the artwork, or box-art image, which showcases the main characters and underscores the title's theme. We also provide a comprehensive synopsis and evidence cards to outline the content, eye-catching badges to draw users' attention, and supplemental videos (trailers) to offer a sneak peek into the shows.

With the exciting advances in multimodal models, computer vision, and large language models (LLMs), we are motivated to incorporate rich asset information, such as images, text, and videos, into our recommendation algorithms.

Our initial move is to leverage multimodal embeddings in the evidence personalization domain. We have established a paved path for experimenting with various types of embeddings in our image, video, and text evidence personalization, as well as in specific applications such as query-aware evidence on the search page.

Personalized explanations via GenAI

JJ Tan, Emma Kong, David Fagnan (Netflix)

Recent advancements in Generative AI (GenAI) empower us to generate free-text personalized explanations that connect a recommended title to a member's viewing history. These explanations are not only more expressive but can also incorporate a broader context, offering an unprecedented level of personalization with evidence creation that surpasses conventional evidence assets. Our goal is to harness AI to empower our recommendation system with reasoning and explainability, and to make our recommendation algorithm more transparent to Netflix members.

Real-Time Recommendation Reranking with Goal-Oriented Linear Optimization

Shreyas S Vidyarthi, Sukanya Moorthy (Intuit Credit Karma)

This work presents a novel approach to real-time personalization of recommendations by integrating user-specific goals through linear optimization. Traditional recommendation systems often rely on static models trained on delayed data, limiting their ability to adapt to dynamic user preferences and immediate needs. We address this limitation by introducing a two-stage architecture: (1) a GenAI agent invokes a recommender system (recsys) tool and generates an initial ranked list of recommendations based on a robust but potentially outdated feature set; (2) upon user interaction, we capture the user's specific goal in real time and employ a lightweight linear optimization layer to dynamically re-rank the initial recommendations, treating the user-specified goal as a constraint within the optimization problem. This approach allows us to maintain the efficacy of the pre-generated recommendations while ensuring precise alignment with the user's immediate objectives. Our method offers a computationally efficient and scalable solution for real-time personalization, enhancing user experience and engagement within the FinTech domain.

Reward Alignment for Recommendation Systems using Two Stage Training

Swanand Joshi, Jaewook Yu, Varad Pathak, Anuj Shah, Gary Tang, Kriti Kohli (Netflix)

Our mission at Netflix is to entertain every member by recommending the right shows to them. While conventional recommendation systems typically focus on immediate metrics such as clicks or short-term engagement, these measures often fail to reflect a user's enduring satisfaction. Our approach aims to recommend content that delivers both immediate appeal and lasting enjoyment, ultimately providing greater value to our members and fostering long-term retention. Over time, achieving this leads to each surface ranker optimizing its own set of reward-maximizing policies, making it harder to innovate on the entire recommendation stack holistically. Moreover, this significantly increases the compute costs associated with training and maintaining bespoke reward-aligned surface policies.

To address these challenges, we are transitioning from independent policies for each surface recommendation model to implementing a centralized "core behavior value" that aligns all ranking models through a consistent algorithmic procedure—two-stage training.

We propose a two-stage training process that decouples reward optimization from the training of large recommendation models. The first stage pre-trains a policy without any specific reward optimization, while the second stage incorporates engineered proxy rewards that better reflect long-term satisfaction. This approach is backward compatible and offers multiple implementation options, including fine-tuning the base policy on smaller datasets optimized for longer-term objectives.
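A toy version of the two-stage procedure is sketched below. It is not Netflix's implementation: it assumes a single logistic-regression policy over hypothetical features [bias, clickbait, quality]. The policy is pre-trained on plentiful engagement labels (stage one), and then the same weights are fine-tuned on a much smaller dataset whose soft labels stand in for engineered proxy rewards of long-term satisfaction (stage two).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(weights, data, epochs, lr):
    """Logistic-regression SGD that starts from `weights` (not from zero),
    so the same routine serves both pre-training and fine-tuning.
    Labels may be soft targets in [0, 1]."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            w = [wj - lr * (p - y) * xj for wj, xj in zip(w, x)]
    return w

def score(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

CLICKBAIT = [1.0, 1.0, 0.0]  # hypothetical features: [bias, clickbait, quality]
QUALITY = [1.0, 0.0, 1.0]

# Stage 1: reward-agnostic pre-training on plentiful engagement (click) labels.
engagement = [(CLICKBAIT, 1.0), (QUALITY, 0.0)] * 50
base = sgd_logistic([0.0, 0.0, 0.0], engagement, epochs=2, lr=0.5)

# Stage 2: fine-tune the base policy on a small set of proxy-reward targets.
proxy_reward = [(CLICKBAIT, 0.1), (QUALITY, 0.9)]
aligned = sgd_logistic(base, proxy_reward, epochs=300, lr=0.5)

print(score(base, CLICKBAIT) > score(base, QUALITY))        # True
print(score(aligned, QUALITY) > score(aligned, CLICKBAIT))  # True
```

Because stage two only reuses the stage-one weights as its starting point, the expensive pre-training run can be shared while different teams iterate on the proxy rewards separately.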

By adopting this two-stage training methodology, we've achieved significant improvements in compute resource efficiency—approximately 70% savings in pre-processing and model training pipelines across various surface models including homepage, category tabs, and candidate generation. Additionally, this approach enables independent innovation tracks for researchers working separately on engagement models and long-term satisfaction metrics.

CATE-based Treatment Covariate Calibration for Messaging Personalization

Ishan Gupta, Matthew Wood (Netflix)

The Messaging Selection Algorithm (MESA) is the core algorithm used by the Netflix Messaging Personalization System (MPS) to create a personalized ranking of messages for a user based on their interests. MESA relies on accurate predictions of the individual treatment effect of a message on a user’s engagement. However, when there is insufficient historical data for specific messages, these treatment effect estimates can fluctuate unpredictably, which can degrade the user experience when specific message intents are under- or over-selected. We present CATE-based Treatment Covariate Calibration, a technique for calibrating the treatment effects of any black-box model based on IPS-weighted causal effect estimates. Our approach is designed with extensibility in mind, allowing straightforward incorporation of treatment covariates such as message intent or delivery channel into the calibration action space. This flexibility enables more granular calibration across multiple dimensions of the recommendation system. In practical applications to MESA, this algorithm has improved the calibration of individual treatment effect estimates for models trained with both the S-Learner and Double Machine Learning causal learning frameworks.
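The calibration idea can be illustrated with a minimal sketch. It assumes logged rows that carry a treatment flag, a known logging propensity, an observed engagement outcome, the black-box model's predicted individual treatment effect, and one or more treatment covariates such as message intent. The field names and the simple additive per-cell shift are assumptions for illustration, not the published MESA procedure. Within each covariate cell, the sketch computes an IPS (Horvitz-Thompson) estimate of the average effect and shifts the model's predictions so that their cell mean matches it.

```python
from collections import defaultdict

def ips_ate(rows):
    """Horvitz-Thompson IPS estimate of the average treatment effect."""
    total = 0.0
    for r in rows:
        if r["treated"]:
            total += r["y"] / r["propensity"]
        else:
            total -= r["y"] / (1.0 - r["propensity"])
    return total / len(rows)

def calibrate(rows, covariates):
    """Shift predicted ITEs so each covariate cell's mean prediction equals
    that cell's IPS-weighted effect estimate. `covariates` is a tuple of
    field names, e.g. ("intent",) or ("intent", "channel")."""
    cells = defaultdict(list)
    for r in rows:
        cells[tuple(r[c] for c in covariates)].append(r)
    offset = {}
    for cell, members in cells.items():
        mean_pred = sum(r["pred_ite"] for r in members) / len(members)
        offset[cell] = ips_ate(members) - mean_pred
    return [r["pred_ite"] + offset[tuple(r[c] for c in covariates)] for r in rows]

logs = [
    {"intent": "promo", "treated": True, "propensity": 0.5, "y": 1.0, "pred_ite": 0.2},
    {"intent": "promo", "treated": True, "propensity": 0.5, "y": 0.0, "pred_ite": 0.2},
    {"intent": "promo", "treated": False, "propensity": 0.5, "y": 0.0, "pred_ite": 0.2},
    {"intent": "promo", "treated": False, "propensity": 0.5, "y": 0.0, "pred_ite": 0.2},
    {"intent": "reminder", "treated": True, "propensity": 0.5, "y": 1.0, "pred_ite": 0.3},
    {"intent": "reminder", "treated": False, "propensity": 0.5, "y": 1.0, "pred_ite": 0.1},
]
print([round(v, 3) for v in calibrate(logs, ("intent",))])
# [0.5, 0.5, 0.5, 0.5, 0.1, -0.1]
```

Adding a covariate such as a delivery channel only changes the `covariates` tuple, which reflects the extensibility point made in the abstract. The additive shift keeps the model's within-cell ordering intact.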

Multi-layer Bandit Algorithm for Personalizing Ranking and Pacing of Netflix Messages

Ishan Gupta, Kevin Mercurio, Sergi Perez, Matthew Wood (Netflix)

The Messaging Personalization System (MPS) is the Netflix algorithm responsible for providing a personalized message experience for each user by choosing the best messages and delivery channel (e.g., push or email) as well as the timing and frequency of messages. Personalizing message frequency is critical for a high-quality user experience: sending too many messages may lead to fatigue or opt-outs, while sending too few may reduce user engagement. We present the design of a two-layer algorithm that simultaneously optimizes for both short- and long-term user engagement. The first layer, a slow policy, runs on a weekly cadence to determine the optimal message pacing for each user across delivery channels, targeting the causal effect of messaging on long-term engagement signals (e.g., weekly aggregates). The personalized message pacing is then passed as an input to the second layer, a ranking policy that runs on a more frequent cadence (e.g., daily). This fast ranking policy targets short-term engagement signals and makes the final ranking and targeting decisions for a user while adhering to the constraints set by the slow policy.
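The interaction between the two layers can be sketched as follows. This is a simplified illustration, not the MPS implementation. The slow policy is reduced to a rule that caps weekly sends per channel at the number of messages whose hypothetical estimated marginal long-term uplift is positive. The fast policy is a greedy daily ranker over short-term scores. The bandit exploration that both layers would need is omitted. What the sketch does show is that the fast layer only ranks within the budget the slow layer hands it.

```python
from dataclasses import dataclass, field

@dataclass
class PacingBudget:
    """Weekly per-channel caps produced by the slow policy."""
    caps: dict
    sent: dict = field(default_factory=dict)

    def remaining(self, channel):
        return self.caps.get(channel, 0) - self.sent.get(channel, 0)

    def record(self, channel):
        self.sent[channel] = self.sent.get(channel, 0) + 1

def slow_policy(marginal_uplift):
    """Weekly layer: allow another message on a channel only while the
    estimated marginal long-term uplift of that extra message is positive."""
    caps = {}
    for channel, uplifts in marginal_uplift.items():
        cap = 0
        for u in uplifts:
            if u <= 0:
                break
            cap += 1
        caps[channel] = cap
    return caps

def fast_policy(candidates, budget, max_per_day=2):
    """Daily layer: greedily pick the highest short-term-scored messages
    whose channel still has weekly budget left."""
    chosen = []
    for msg_id, channel, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if len(chosen) == max_per_day:
            break
        if score > 0 and budget.remaining(channel) > 0:
            chosen.append(msg_id)
            budget.record(channel)
    return chosen

budget = PacingBudget(slow_policy({"push": [0.05, 0.02, -0.01], "email": [0.01, -0.02]}))
print(budget.caps)  # {'push': 2, 'email': 1}
print(fast_policy([("m1", "push", 0.9), ("m2", "email", 0.8), ("m3", "push", 0.7)], budget))  # ['m1', 'm2']
print(fast_policy([("m4", "email", 0.95), ("m5", "push", 0.5), ("m6", "push", 0.4)], budget))  # ['m5']
```

On the second day, the top-scored email is skipped because the weekly email cap is already used, even though its short-term score is the highest.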

Towards Addressing Title Cold-Starting in LLM GenRec

Yongchang Hao, Rein Houthooft, Jiangwei Pan, Justin Basilico (Netflix)

Large Language Models (LLMs), pre-trained on extensive internet discussions, offer a new paradigm for improving title (i.e., show and movie) recommendations. Generative recommenders (GenRec) built on LLMs have already shown promising results compared with previous baselines. However, the frequent release of new content introduces the challenge of title cold-starting: new titles that are absent from the interaction-sequence datasets are never learned by the model and therefore cannot be recommended. To alleviate this issue, this work explores methods to incorporate new titles into LLM-based recommendation systems without compromising their inherent strengths, aiming to preserve the advantage LLMs offer in predicting and recommending newly released titles.

Adaptive Multi Turn Intent Classification and Discovery

Sukanya Moorthy, Shreyas Vidyarthi (Intuit Credit Karma)

This work introduces a novel, multi-stage framework for robust intent detection in production dialogue-based search assistants and query rerouting systems. It addresses challenges such as ambiguity in multi-turn interactions, malicious intent detection, continuous adaptation to new intents, and initial data scarcity. The framework uses a staged approach: starting with few-shot learning via Large Language Models (LLMs) and a high-level ontology, it transitions to a supervised classifier with a sliding window for improved contextual awareness as data grows. Unsupervised clustering and weak supervision enable continuous learning, granular intent detection, and identification of harmful requests. A final hybrid architecture integrates the classifier, clustering, and LLMs to handle edge cases and novel intents, ensuring rapid deployment and long-term scalability.
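Two later-stage components of this framework are sketched below: a sliding-window context feeding a supervised classifier, and an LLM fallback for low-confidence or novel inputs. Everything here is hypothetical. The window size, the confidence threshold, and the `classify` and `llm_fallback` callables are placeholders, and the usage example plugs in trivial stubs. The clustering and weak-supervision components are omitted.

```python
from collections import deque

class SlidingWindowContext:
    """Keeps only the last `size` turns so the classifier sees recent context."""

    def __init__(self, size=3):
        self.turns = deque(maxlen=size)

    def add(self, utterance):
        self.turns.append(utterance)

    def text(self):
        return " [SEP] ".join(self.turns)

def route(context, classify, llm_fallback, threshold=0.7):
    """Hybrid routing: trust the supervised classifier when it is confident,
    otherwise defer to an LLM (e.g. for ambiguous or novel intents)."""
    intent, confidence = classify(context.text())
    if confidence >= threshold:
        return intent, "classifier"
    return llm_fallback(context.text()), "llm"

# Stub components standing in for a trained classifier and an LLM call.
def keyword_classifier(text):
    if "credit score" in text:
        return "check_credit_score", 0.9
    return "unknown", 0.2

def stub_llm(text):
    return "needs_clarification"

ctx = SlidingWindowContext(size=2)
for turn in ["hi", "why did my credit score drop", "and what about it"]:
    ctx.add(turn)
print(route(ctx, keyword_classifier, stub_llm))  # ('check_credit_score', 'classifier')
```

The follow-up turn "and what about it" is ambiguous on its own. It is classified correctly only because the window still holds the previous turn, which is the motivation for the sliding window in multi-turn settings.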

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

 Mingfu Liang, Xi Liu, Huayu Li, Jiyan Yang, Nancy Yu, et al. (Meta) 

Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling up the recommendation model and improving its design can bring significant performance improvements. However, as model scale grows, such studies drift increasingly far from industrial practice, because they often neglect two fundamental challenges of industrial-scale applications. First, training and inference budgets for the served model are restricted, and exceeding them may increase latency and impair the user experience. Second, large volumes of data arrive in a streaming mode with dynamically shifting distributions, as new users and ads join and existing ones leave the system. We propose the External Large Foundation Model (ExFM) framework to address these overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training and inference while maintaining high performance. We design the teacher as a foundation model (FM) that can serve multiple students as vertical models (VMs), amortizing its building cost. We propose an Auxiliary Head and a Student Adapter to mitigate the data distribution gap between the FM and the VMs caused by streaming data. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gains from ExFM.
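The student-side objective for external distillation with an auxiliary head can be sketched as a two-term loss. This is a simplified reading, not Meta's code. It assumes the vertical model has a main head trained on the true label and a separate auxiliary head trained on the foundation model's logged prediction as a soft label, so that teacher supervision does not directly bias the serving head. The function names and the `distill_weight` parameter are hypothetical.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy; y may be a soft label in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def student_loss(main_pred, aux_pred, label, teacher_pred, distill_weight=1.0):
    """Main head fits the ground-truth label; the (hypothetical) auxiliary
    head fits the external teacher's (FM's) prediction as a soft target."""
    return bce(main_pred, label) + distill_weight * bce(aux_pred, teacher_pred)

# The distillation term is smallest when the auxiliary head matches the teacher.
print(student_loss(0.8, 0.3, 1.0, 0.3) < student_loss(0.8, 0.6, 1.0, 0.3))  # True
```

Because the teacher's predictions are logged once and joined to each student's training data, a single FM can supervise many vertical models. That is the cost amortization the abstract describes.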

Generative Search: Building an Interactive Discovery Experience by Leveraging LLMs


 Aditya Singha, Chris Samarinas, Ding Tong, Ehsan Golami, Matteo Rinaldi, Shahrzad Naseri, Spencer L'Heureux, Sudarshan Lamkhede, Vito Ostuni, Yesu Feng, Zhe Zhang (Netflix)

We discuss how we built a more interactive discovery experience for Netflix, using LLMs to understand complex user queries and provide useful recommendations. Attendees will also be able to try out the beta.


Venue


Parking

We recommend the Jefferson Avenue Garage, located steps from the theatre, with ample parking. Additional public garages are available nearby. There will be a small fee associated with parking.

Get to know us

NETFLIX RESEARCH


Get your tickets for the event

Tickets closed

Your hosts

Ruoxi Wang

Senior Software Engineer, Google

Ruoxi is a senior software engineer at Google Brain, focusing on fundamental deep learning research and its applications in recommenders, especially learning better feature interactions and building memory- and compute-efficient models. She also works closely with teams across Google's major organic and Ads products to put her research into practice. Ruoxi received her Ph.D. in computational mathematics from Stanford University, where her research interests were numerical linear algebra, randomized algorithms and machine learning. When she's not thinking about how math can improve deep learning models and Google products, she enjoys hanging out with her paw friend MeiMei and has been trying to get better at swimming.

