This Science Short, written by Hiroaki Chiba-Okabe, summarizes a preprint manuscript by Chiba-Okabe, H. and Su, W.J., currently available at: https://arxiv.org/abs/2406.03341.
Generative AI has made impressive progress in creating content across various domains, including images, text, and videos. However, these advances have also raised significant concerns about copyright infringement. These AI models are typically trained on large datasets collected from the internet, often without the explicit permission of the original creators. Because the AI learns patterns and styles from this data, the content it generates can sometimes closely mimic existing works. For example, text-to-image models like Playground v2.5, Pixart-α, and SDXL, which create images from user-provided textual prompts, can generate images that closely resemble iconic characters like Mario from Nintendo’s Super Mario Bros. franchise when given the prompt “Mario.” Even with less specific prompts, such as “an Italian plumber,” the AI might still produce images highly similar to the Nintendo character, exacerbating the problem. Although commercial models like DALL·E 3 do make efforts to avoid this, the results are apparently far from perfect. Such situations risk infringing on the exclusive rights granted to original creators under copyright law, and have already led to several copyright infringement lawsuits against generative AI developers.
To mitigate these risks, our study proposes a method to make generated content more generic. Instead of producing images that closely imitate the visual elements of specific characters like Mario, generative models should create images that share only basic characteristics, such as occupation (e.g., a plumber) and attire (e.g., overalls), without replicating distinctive details like the “M” emblem on the cap or the specific facial and physical features of Mario. This approach ensures that the generated content remains aligned with the user’s input while reducing the likelihood of copyright infringement, as generic elements are typically not protected by copyright.
Our algorithm, called PREGen (Prompt Rewriting-Enhanced Genericization), achieves this by generating multiple versions of images based on the user’s prompt within the AI system and then selecting the most generic version as the output, using a quantitative metric we introduced. Heuristically, the most generic image is the one that best captures the average expression among these versions. However, simply generating multiple versions from the exact prompt provided by the user often yields images that still closely resemble a copyrighted character, especially if the prompt directly references the character’s name (e.g., “Mario”) or contains phrases that frequently trigger images of one (e.g., “an Italian plumber”); in such cases, the variations remain too similar to the copyrighted character. To address this, PREGen first rewrites the user’s prompt into several different, but related, prompts. For example, if the original prompt is “Mario” or “an Italian plumber,” the rewritten prompts might include “a mustachioed person in a red shirt and cap, and a blue overall” or “a European pipefitter.” The algorithm then generates images from these rewritten prompts and selects the most generic one among the generated images.
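In outline, the loop described above can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the prompt rewriter, image generator, and embedding function are hypothetical stand-ins, and the selection rule simply operationalizes the heuristic that the most generic version is the one closest to the average of all versions.

```python
import numpy as np

def most_generic(embeddings):
    """Return the index of the version closest to the centroid of all versions.

    This encodes the heuristic above: the most generic image is the one
    that best captures the average expression among the generated versions.
    """
    emb = np.asarray(embeddings, dtype=float)
    centroid = emb.mean(axis=0)                      # average expression
    dists = np.linalg.norm(emb - centroid, axis=1)   # distance of each version
    return int(dists.argmin())

def pregen(prompt, rewrite_prompts, generate_image, embed, n_versions=4):
    """Sketch of the PREGen loop (all three helpers are hypothetical).

    rewrite_prompts: prompt -> list of related, more generic prompts
    generate_image:  prompt -> image
    embed:           image  -> feature vector
    """
    prompts = rewrite_prompts(prompt)   # e.g., "Mario" -> descriptive variants
    images = [generate_image(p) for p in prompts[:n_versions]]
    idx = most_generic([embed(im) for im in images])
    return images[idx]
```

The key design point is that rewriting happens before generation, so the candidate pool itself is steered away from prompts that reliably reproduce a specific copyrighted character; the centroid-based selection then picks the least idiosyncratic candidate.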
In experiments where images were generated using two types of prompts, either the names of 50 different copyrighted characters or descriptions that closely match those characters without directly using their names, PREGen significantly reduced the likelihood of producing copyrighted content: to below 6.6%, and in some cases to 0%, depending on the model and the type of prompt, as measured by the number of generated copyrighted characters identified by GPT-4o, an AI model known for its advanced performance in image understanding and interpretation. This contrasts with rates ranging from 26.6% to over 80% when no intervention was applied. Moreover, the generated content maintained high consistency with the core intent of the original prompt: it shared similar general characteristics with the character described in the original prompt.
Hiroaki Chiba-Okabe is a 4th year PhD candidate in Applied Mathematics and Computational Science at the University of Pennsylvania. He has a distinctive background, having worked as a lawyer at a major law firm in Tokyo and served in the government of Japan before starting his doctoral studies. This experience drives his interest in investigating issues related to law, social norms, and policy using mathematical and computational techniques.