Recent advancements in text-to-image generative models, particularly latent diffusion models (LDMs), have demonstrated remarkable capabilities in synthesizing high-quality images from textual prompts. However, achieving identity personalization, i.e., ensuring that a model consistently generates subject-specific outputs from limited reference images, remains a fundamental challenge. To address this, we introduce Meta-Low-Rank Adaptation (Meta-LoRA), a novel framework that leverages meta-learning to encode domain-specific priors into LoRA-based identity personalization. Our method introduces a structured three-layer LoRA architecture that separates identity-agnostic knowledge from identity-specific adaptation. In the first stage, the LoRA Meta-Down layers are meta-trained across multiple subjects, learning a shared manifold that captures general identity-related features. In the second stage, only the LoRA-Mid and LoRA-Up layers are optimized to specialize the model on a given subject, significantly reducing adaptation time while improving identity fidelity. To evaluate our approach, we introduce Meta-PHD, a new benchmark dataset for identity personalization, and compare Meta-LoRA against state-of-the-art methods. Our results demonstrate that Meta-LoRA achieves superior identity retention, computational efficiency, and adaptability across diverse identity conditions.
The overall architecture of our Meta-LoRA model.
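To make the structure concrete, the following is a minimal PyTorch sketch of a three-factor LoRA layer in the spirit of the architecture above: a meta-trained down-projection shared across identities, followed by identity-specific mid and up factors. The class and attribute names, factor dimensions, and initialization are illustrative assumptions, not the paper's exact implementation.

```python
import torch.nn as nn

class MetaLoRALinear(nn.Module):
    """Illustrative three-factor LoRA layer: base(x) + up(mid(meta_down(x))).

    `meta_down` is the identity-agnostic factor meta-trained in stage 1;
    `mid` and `up` are the identity-specific factors tuned per subject in
    stage 2. Names, ranks, and init are assumptions for illustration only.
    """

    def __init__(self, base: nn.Linear, meta_rank: int = 16, rank: int = 1, scale: float = 1.0):
        super().__init__()
        self.base = base.requires_grad_(False)        # frozen pretrained projection
        self.meta_down = nn.Linear(base.in_features, meta_rank, bias=False)   # stage 1, shared
        self.mid = nn.Linear(meta_rank, rank, bias=False)                     # stage 2, per subject
        self.up = nn.Linear(rank, base.out_features, bias=False)              # stage 2, per subject
        nn.init.zeros_(self.up.weight)                # adapter starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.mid(self.meta_down(x)))
```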
Meta-LoRA's two-stage process is its secret sauce. By first learning a general understanding of a domain (e.g., male or female faces), the final personalization step becomes remarkably fast and effective. It converges 1.67x faster than standard LoRA (375 vs. 625 steps), all while delivering higher identity fidelity, better prompt adherence, and less overfitting to the input training images.
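As a rough illustration of why the second stage stays lightweight, the sketch below fine-tunes only the identity-specific factors of the MetaLoRALinear modules sketched earlier, keeping the meta-trained down-projections frozen. The training-loss hook and the learning rate are placeholders, not values from the paper; only the 375-step default echoes the figure above.

```python
import itertools
import torch

def personalize(model, dataloader, steps: int = 375, lr: float = 1e-4):
    """Stage-2 sketch: adapt only the LoRA-Mid / LoRA-Up factors to one subject."""
    trainable = []
    for module in model.modules():
        if isinstance(module, MetaLoRALinear):
            module.meta_down.requires_grad_(False)   # keep the shared, meta-trained manifold fixed
            module.mid.requires_grad_(True)
            module.up.requires_grad_(True)
            trainable += list(module.mid.parameters()) + list(module.up.parameters())

    optimizer = torch.optim.AdamW(trainable, lr=lr)
    for _, batch in zip(range(steps), itertools.cycle(dataloader)):
        loss = model.training_loss(batch)            # placeholder for the diffusion training loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```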
Meta-LoRA offers the best of both worlds. While instant methods often miss fine details and tend to copy the reference pose, Meta-LoRA captures a person's true likeness with state-of-the-art accuracy. For a small investment of about 18 minutes of fine-tuning on an A6000 GPU, you get a massive leap in quality and creative control, achieving results that purely instant or traditional methods can't match.
1. Comparison between models for the 'female' class:
2. Comparison between models for the 'male' class:
1. Illustration of the metric score trends for Meta-LoRA and the standard LoRA models on the Meta-PHD dataset, focusing on the 'female' class. Left: A plot of R-FaceSim versus CLIP-T scores for all LoRA and Meta-LoRA models trained with ranks in {1, 2, 4, 8, 16} and across 250, 375, 500, 625, and 750 training iterations. Polynomial fit curves for each model are also shown to ease comparison. Right: R-FaceSim and CLIP-T scores plotted against iteration count for rank-1 LoRA and Meta-LoRA models. These results are compared with state-of-the-art models from the literature, including PuLID, InstantID, and PhotoMaker. To ensure a fair comparison, we use the FLUX.1-dev version of PuLID, which matches the Meta-LoRA training setup. InstantID and PhotoMaker are based on SD-XL and SD-XL Lightning, as public implementations of these models are not currently available for FLUX.1-dev.
2. Illustration of the metric score trends for Meta-LoRA and the standard LoRA models on the Meta-PHD dataset, focusing on the 'male' class. Left: A plot of R-FaceSim versus CLIP-T scores for all LoRA and Meta-LoRA models trained with ranks in {1, 2, 4, 8, 16} and across 250, 375, 500, 625, and 750 training iterations. Polynomial fit curves for each model are also shown to ease comparison. Right: R-FaceSim and CLIP-T scores plotted against iteration count for rank-1 LoRA and Meta-LoRA models. These results are compared with state-of-the-art models from the literature, including PuLID, InstantID, and PhotoMaker. To ensure a fair comparison, we use the FLUX.1-dev version of PuLID, which matches the Meta-LoRA training setup. InstantID and PhotoMaker are based on SD-XL and SD-XL Lightning, as public implementations of these models are not currently available for FLUX.1-dev.
3. Quantitative comparison of our model with state-of-the-art publicly available methods from the literature. Scores are computed over five seeds for both `male` and `female` subsets and then averaged to obtain the final result. CLIP-T is calculated on the Meta-PHD-FFHQ subset, while R-FaceSim is computed using the Meta-PHD-Unsplash images. The best score is highlighted in blue, the second best in red, and the third best in green.
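For context on how metrics of this kind are typically computed, below is a minimal sketch of the two core similarity measures, assuming HuggingFace's CLIP model for CLIP-T and a generic face-recognition embedding (e.g., ArcFace) for FaceSim. The full R-FaceSim protocol, which evaluates against the held-out Meta-PHD-Unsplash references rather than the training images, is defined in the paper and not reproduced here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_t(image: Image.Image, prompt: str) -> float:
    """CLIP-T: cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def face_sim(gen_embed: torch.Tensor, ref_embed: torch.Tensor) -> float:
    """FaceSim-style score: cosine similarity between the face embeddings of a
    generated image and a reference image (embeddings from any face recognizer)."""
    return float(torch.nn.functional.cosine_similarity(gen_embed, ref_embed, dim=-1))
```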
1. Comparison between the Robust FaceSim and FaceSim metrics, with percentage differences highlighted in cases of significant performance drops. All scores are averaged across both `male` and `female` identity classes.
2. Visualization of CLIP-T and R-FaceSim performance for Meta-LoRA and baseline models across different fine-tuning iteration counts. The plots include results for both gender classes and three rank settings (i.e., 1, 4, and 16).
@misc{topal2025metalorametalearningloracomponents,
title={Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization},
author={Barış Batuhan Topal and Umut Özyurt and Zafer Doğan Budak and Ramazan Gokberk Cinbis},
year={2025},
eprint={2503.22352},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.22352},
}