[AI Minor News]

Targeting LLM Training Crawlers! The 'Block' Against Old Browser Spoofing is Accelerating!


Note: This article contains affiliate advertising.

📰 News Overview

  • Countermeasures Against AI Data Collection: Operators of personal blogs like “Wandering Thoughts” have implemented measures to block User-Agents of old browsers (primarily outdated versions of Chrome) to fend off the massive crawlers aimed at training LLMs (Large Language Models).
  • Impact on Legitimate Services: RSS readers like Feedly and Inoreader fetch sites using old browser User-Agents, so their subscribers end up receiving “Access Denied” pages instead of the actual articles.
  • Restrictions on Archive Sites: Services like archive.today are also being treated as unwanted traffic, because their behavior (old UAs, IP spoofing) is indistinguishable from that of malicious actors.

💡 Key Points

  • Since 2025, there has been a surge in high-load crawling aimed at collecting training data for LLMs, prompting site operators to identify and block the crawler-specific tactic of “pretending to be an old browser.”
  • Some legitimate browsers like Vivaldi are also getting caught up in the block due to brand spoofing settings, requiring users to make adjustments on their end.
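
As a rough illustration of the kind of check described above, here is a minimal Python sketch that pulls the Chrome major version out of a User-Agent string and flags anything below a cutoff. The cutoff (`MIN_CHROME_MAJOR = 120`) and the function names are hypothetical choices for this example; the source does not say which versions or matching rules sites like “Wandering Thoughts” actually use.

```python
import re
from typing import Optional

# Hypothetical cutoff for this sketch: Chrome major versions below this
# are treated as "old". Real sites choose their own threshold.
MIN_CHROME_MAJOR = 120

# Matches the "Chrome/<major>." token found in Chrome-style User-Agents.
_CHROME_RE = re.compile(r"\bChrome/(\d+)\.")


def chrome_major(user_agent: str) -> Optional[int]:
    """Return the Chrome major version claimed by the UA, or None."""
    match = _CHROME_RE.search(user_agent)
    return int(match.group(1)) if match else None


def should_block(user_agent: str, min_major: int = MIN_CHROME_MAJOR) -> bool:
    """Block only when the UA claims to be a Chrome older than the cutoff."""
    major = chrome_major(user_agent)
    return major is not None and major < min_major


old_chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
)
print(should_block(old_chrome))  # True: claims Chrome 90, below the cutoff
```

Note that a check like this only sees the User-Agent string. Any client that sends an old Chrome UA, whether a crawler or a legitimate feed reader, gets the same answer, which is the collateral damage described in the overview.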

🦈 Shark’s Eye (Curator’s Perspective)

As AI is set on devouring every piece of information online like a hungry shark, the defense instincts of personal sites are reaching new heights! What’s notable is that crawlers are intentionally trying to sneak in by pretending to be “old Chrome.” It’s only natural for sites to retaliate with the strong measure of sending all “old UAs” straight to the trash! However, it’s quite ironic that long-standing services like Feedly are continuing their “old ways” while spewing errors. Services that can’t keep up with technological evolution are destined to be tossed aside by the AI age’s defenses!

🚀 What’s Next?

As AI-driven data scraping becomes even more sophisticated, the checks that websites use to tell humans from AI will get stricter. Services that keep running outdated environments, or whose behavior is ambiguous (as with archive sites), may soon find themselves dropped from the internet’s “whitelist.”

💬 A Word from Haru-Same

We sharks don’t miss our prey, but the rise of fake sharks (crawlers) is stirring up the ocean (the web)! If you’re a real human, it’s etiquette to swim with the latest gear (browsers)! Shark on, my friends!

📚 Terminology Explained

  • User-Agent: Like a business card that a browser hands to a web server, stating which browser and version is in use.

  • HTTP Crawler: A program that automatically navigates websites to collect data, often used for gathering AI training data these days.

  • Syndication Feed: Formats like RSS or Atom that deliver site update information. RSS readers use these to fetch articles.

  • Source: Notes about reading messages with the Python email packages

[Disclaimer]
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.