Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement

Abstract

Text-to-image models, including Stable Diffusion, have improved significantly at generating images that are semantically aligned with the given prompts. However, existing models may fail to produce appropriate images for cultural concepts or objects that are not well known or are underrepresented in Western cultures, such as 'hangari' (Korean utensil). In this paper, we propose a novel approach, Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement (Culture-TRIP), which refines the prompt to improve the alignment of the generated image with such culture nouns in text-to-image models. Our approach (1) retrieves cultural contexts and visual details related to the culture nouns in the prompt and (2) iteratively refines and evaluates the prompt using a set of cultural criteria and large language models. The refinement process uses information retrieved from Wikipedia and the Web. Our user survey, conducted with 66 participants from eight different countries, demonstrates that the proposed approach enhances the alignment between the images and the prompts. In particular, Culture-TRIP improves the alignment between the generated images and underrepresented culture nouns. Our code and dataset will be made publicly available upon acceptance.

Citation & BibTeX

Suchae Jeong, Inseong Choi, Youngsik Yun, and Jihie Kim, "Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement," The 2025 Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL 2025).

@article{jeong2025culture, title={Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement}, author={Jeong, Suchae and Choi, Inseong and Yun, Youngsik and Kim, Jihie}, journal={arXiv preprint arXiv:2502.16902}, year={2025} }

@inproceedings{jeong-etal-2025, title = "Culture-{TRIP}: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement", author = "Jeong, Suchae and Choi, Inseong and Yun, Youngsik and Kim, Jihie", editor = "Chiruzzo, Luis and Ritter, Alan and Wang, Lu", booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)", month = apr, year = "2025", address = "Albuquerque, New Mexico", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2025.naacl-long.483/", pages = "9543--9573", ISBN = "979-8-89176-189-6" }


Framework

Culture-TRIP (C-TRIP) overview. First, we retrieve cultural contexts (cultural background, purpose) and visual details related to the culture nouns, as described in Section 3.1. Then, we refine the prompt based on the retrieved information. Finally, we iteratively evaluate and refine the prompt, as described in Section 3.2.
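The retrieve, refine, and evaluate steps above can be sketched as a small loop. This is only an illustrative outline and not the authors' implementation: the function `refine_prompt`, the `threshold` and `max_iters` parameters, and the three toy callables are all hypothetical names introduced here. In the paper, retrieval draws on Wikipedia and the Web, and refinement and evaluation rely on large language models and a set of cultural criteria; the toy stand-ins below replace those with simple string matching so the control flow can be run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RefinementResult:
    prompt: str       # best prompt found
    score: float      # its evaluation score in [0, 1]
    iterations: int   # number of refinement rounds performed


def refine_prompt(
    prompt: str,
    culture_noun: str,
    retrieve: Callable[[str], List[str]],
    refine: Callable[[str, str, List[str]], str],
    evaluate: Callable[[str, List[str]], float],
    threshold: float = 0.8,   # assumed stopping score, not from the paper
    max_iters: int = 5,       # assumed iteration cap, not from the paper
) -> RefinementResult:
    """Retrieve details once, then refine until the score is high enough."""
    details = retrieve(culture_noun)
    best_prompt, best_score = prompt, evaluate(prompt, details)
    iterations = 0
    while best_score < threshold and iterations < max_iters:
        candidate = refine(best_prompt, culture_noun, details)
        score = evaluate(candidate, details)
        iterations += 1
        if score > best_score:  # keep a candidate only if it scores higher
            best_prompt, best_score = candidate, score
    return RefinementResult(best_prompt, best_score, iterations)


# Toy stand-ins (hypothetical): a real system would query Wikipedia or the
# Web and call an LLM for refinement and scoring.
TOY_KNOWLEDGE: Dict[str, List[str]] = {"hangari": ["Korean", "utensil"]}


def toy_retrieve(noun: str) -> List[str]:
    return TOY_KNOWLEDGE.get(noun, [])


def toy_refine(prompt: str, noun: str, details: List[str]) -> str:
    # Append the first retrieved detail the prompt does not mention yet.
    for detail in details:
        if detail.lower() not in prompt.lower():
            return f"{prompt}, {detail}"
    return prompt


def toy_evaluate(prompt: str, details: List[str]) -> float:
    # Fraction of retrieved details that appear in the prompt.
    if not details:
        return 1.0
    hits = sum(detail.lower() in prompt.lower() for detail in details)
    return hits / len(details)


if __name__ == "__main__":
    result = refine_prompt(
        "a photo of a hangari", "hangari",
        toy_retrieve, toy_refine, toy_evaluate,
    )
    print(result)
```

Passing the three steps in as callables keeps the loop independent of any particular retriever or language model, so each stage can be swapped out without changing the control flow.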

Culture-TRIP Results

Qualitative Sample for Chinese culture

Qualitative Sample for German culture

Qualitative Sample for Indian culture

Qualitative Sample for Japanese culture

Qualitative Sample for Pakistani culture

Qualitative Sample for South Korean culture

Qualitative Sample for American culture

Qualitative Sample for Vietnamese culture

Ablation Study

Limitation
