|
Nikhil Mehta
I am a Staff Research Scientist at Google DeepMind,
where I lead research on GenRetrieval and Semantic IDs for personalization in Large Language Models.
I received my Ph.D. from Duke University, advised by
Professor Lawrence Carin,
with a thesis on Continual Learning of Deep Neural Networks.
During my PhD, I interned at Meta FAIR, Google Brain, and Google AutoML.
Prior to Duke, I was a research associate at IIT Kanpur, where I worked with
Professor Piyush Rai
and Professor Gaurav Pandey.
Email  / 
Google
Scholar  / 
Linkedin / 
Github
|
|
|
Selected Publications
My research focuses on Generative Retrieval, personalization with LLMs, and deep learning.
For a complete list, see my Google Scholar.
|
|
|
Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities
Google DeepMind
Technical Report, 2025
Research on video tokenization recipes for understanding and VQA tasks. Preparing and implementing evaluation benchmarks for VQA using YouTube videos.
|
|
|
PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations
R. He, L. Heldt, L. Hong, R. Keshavan, S. Mao, N. Mehta, Z. Su, A. Tsai, Y. Wang, et al.
Technical Report, 2025
Adapting pre-trained language models for industrial-scale generative recommendations at Google.
|
|
|
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Y. Deldjoo, N. Mehta, M. Sathiamoorthy, S. Zhang, P. Castells, J. McAuley
ACM SIGIR, 2025
A framework for holistic evaluation of recommender systems powered by generative models.
|
|
|
Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
A. Singh*, T. Vu*, N. Mehta*, R. Keshavan, M. Sathiamoorthy, Y. Zheng, L. Hong, L. Heldt, L. Wei, D. Tandon, E. Chi & X. Yi
ACM RecSys, 2024 (Runner-up Best Short Paper)
Semantic IDs for representing items in corpus, enabling better generalization in ranking for recommendations.
|
|
|
Aligning Large Language Models with Recommendation Knowledge
Y. Cao, N. Mehta, X. Yi, R. Keshavan, L. Heldt, L. Hong, E. Chi & M. Sathiamoorthy
Findings of NAACL, 2024
Methods for aligning large language models with recommendation knowledge for improved personalization.
|
|
|
HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings
N. Mehta, K. Liang, J. Huang, F. Chu, L. Yin, & T. Hassner
WACV, 2024
A novel method for out-of-distribution detection and classification in few-shot learning settings.
|
|
|
Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning
V. Verma*, N. Mehta*, K. Liang, & L. Carin
WACV, 2024
A meta-learned approach for attribute self-interaction in continual and generalized zero-shot learning.
|
|
|
Recommender Systems with Generative Retrieval
S. Rajput*, N. Mehta*, A. Singh, R. Keshavan, T. Vu, L. Heldt, L. Hong, Y. Tay, V. Tran, J. Samost, M. Kula, E. Chi & M. Sathiamoorthy
NeurIPS, 2023
Invented GenRetrieval for recommendation systems, now adopted across multiple companies including Spotify, Meta, Snap, Amazon, Kuaishou, and Doordash.
|
|
|
Pushing the Efficiency Limit using Structured Sparse Convolutions
V. Verma*, N. Mehta*, S. Si & L. Carin
WACV, 2023
Methods for pushing the efficiency limit of deep networks using structured sparse convolutions.
|
|
|
Efficient Feature Transformations for Discriminative and Generative Incremental Learning
V. Verma, K. Liang, N. Mehta, P. Rai, & L. Carin
CVPR, 2021
A task-specific feature map transformation strategy for continual learning, which adds minimal parameters to the base architecture while outerperformaing previous methods in discriminative and generative tasks.
|
|
|
WAFFLe: Weight Anonymized Factors for Federated Learning
W. Hao, N. Mehta, K. Liang, P. Cheng, M. Khamy & L. Carin
IEEE Access, Volume 10
A Bayesian nonparametric framework using shared rank-1 weight factors for
federated learning.
|
|
|
Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors
N. Mehta, K. Liang, V. Verma, & L. Carin
AISTATS, 2021
Continual Learning using Bayesian Nonparametrics and layer-wise dictionary of
weight factors.
|
|
|
Counterfactual Representation Learning with Balancing Weights
S. Assaad, S. Zeng, C. Tao, S. Datta, N. Mehta, R. Henao, F. Li, & L. Carin
AISTATS, 2021
Counterfactual Representation Learning with Balancing Weights
|
|
|
Graph Representation Learning via Ladder Gamma VAE.
A. Sarkar*, N. Mehta*, & P. Rai
AAAI, 2020. Early version appeared at Graph Representation Learning
Workshop (NeurIPS).
A Gamma Ladder Variational Autoencoder for graph-structured data that brings
together the interpretability in hierarchical, multilayer latent variable models
and the strong representational power of graph encoders.
|
|
|
Stochastic Blockmodels meet Graph Neural Networks
N. Mehta, L. Carin & P. Rai
ICML, 2019
A deep generative framework for overlapping community discovery and link
prediction combining the interpretability of stochastic blockmodels, such as the
latent feature relational model, with the
modeling power of deep generative models.
|
|
|
Estimation and Sampling of
Unnormalized Statistical Models with Stein Score Matching
N. Mehta*, Jiachang Liu*, Chenyang Tao & L.
Carin,
Workshop on Stein’s Method. ICML, 2019
We propose using Stein's Method to estimate the parameters of a
distribution based on given samples. We also estimate the score function of the
empirical distribution and propose a new generative model combining Stein
score matching with the Langevin flow.
|
|
|
Deep Topic Models for Multi-Label Learning
R. Panda, A. Pensia, N. Mehta, M.Zhou & P. Rai
AISTATS, 2019
We present a probabilistic framework for multi-label learning based on a deep
generative model for the binary label vector associated with each observation.
|
|
|
Survival Cluster Analysis
P. Chapfuwa, C. Li, N. Mehta, L. Carin and R. Henao
CHIL, 2019
A Bayesian nonparametric model for jointly inferring the time-to-event and the
individualized risk-based cluster assignments.
|
|
|
Robust and Fast 3D Scan Alignment using Mutual
Information
N. Mehta, J. McBride & G. Pandey
ICRA, 2018
A mutual information based algorithm for the estimation of full
6-degree-of-freedom (DOF) rigid body transformation between two overlapping
point clouds.
|
|