3 min read
[AI Minor News]

**Shocking** Fine-Tuning Unlocks AI's "Forbidden Memories"! The "Whack-a-Mole" Phenomenon of Memorizing Copyrighted Books Revealed!


  • Visualizing Copyright Leakage Through Fine-Tuning: By fine-tuning LLMs with specific book summaries, it has been demonstrated that models can activate the previously restricted ability to output copyrighted texts verbatim (memorized playback)...
※この記事はアフィリエイト広告を含みます

Shocking Fine-Tuning Unlocks AI’s “Forbidden Memories”! The “Whack-a-Mole” Phenomenon of Memorizing Copyrighted Books Revealed!

📰 News Overview

  • Visualizing Copyright Leakage Through Fine-Tuning: Fine-tuning LLMs with specific book summaries has proven that these models can activate their ability to output copyrighted texts verbatim, which should have been restricted.
  • Thorough Validation of Latest Models: Validation code using cutting-edge models like GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 has been released, employing works like Cormac McCarthy’s “The Road.”
  • Proposal for New Evaluation Metrics: Four new metrics have been introduced to quantify how much of the original work the model “spits out,” including “BMC@k” to measure memorization and “Longest Continuous Memorization Block.”

💡 Key Points

  • The “Whack-a-Mole” Problem: This phenomenon refers to the alarming reality that data, which should be protected by safe alignment, can be triggered by slight additional training, leading to unexpected copyright leakage.
  • Provision of a Concrete Pipeline: A series of scripts have been released that extract text from EPUB files, generate summary data for training using GPT-4o, and conduct fine-tuning using various APIs (OpenAI, Vertex AI, Tinker).
  • High Reproducibility: Tests with 100 generations at a temperature parameter of 1.0 confirmed that fragments of copyrighted material are output at a statistically significant level.

🦈 Shark’s Eye (Curator’s Perspective)

What’s impressive about this study is not just the alarm it raises over potential leaks, but how concretely it implements the mechanisms of how these leaks occur! The method of instructing GPT-4o to “write text based on summaries while mimicking a specific author’s style” and using it as fine-tuning training data highlights a significant risk, given this approach is often used in practice. The fact that this phenomenon also occurs with low-cost training using LoRA (Rank=32) with DeepSeek-V3.1 is an issue that cannot be ignored by operators of open models!

🚀 What’s Next?

Providers of these models will need to impose stricter filtering on fine-tuning datasets. Furthermore, metrics like “BMC@k” may become standard for evaluating the safety benchmarks (guardrails) of future AI models.

💬 A Word from HaruShark

Once something is learned, even if you pretend to forget it, it can still resurface when poked… Just like sharks, AI won’t forget the taste of a delicious catch! 🦈🔥

📚 Term Explanations

  • BMC@k: A new memorization evaluation metric that measures what percentage of the original book is covered by the sequence of extracted words.

  • LoRA: Low-Rank Adaptation. A technique that adds small matrices for efficient fine-tuning instead of updating the entire model.

  • Tinker: An API platform and environment for fine-tuning and executing models like DeepSeek-V3.1.

  • Source: Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs

🦈 はるサメ厳選!イチオシAI関連
【免責事項 / Disclaimer / 免责声明】
JP: 本記事はAIによって構成され、運営者が内容の確認・管理を行っています。情報の正確性は保証せず、外部サイトのコンテンツには一切の責任を負いません。
EN: This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
ZH: 本文由AI构建,并由运营者进行内容确认与管理。不保证准确性,也不对外部网站的内容承担任何责任。
🦈