[AI Minor News]

Taking Over AI with Inaudible "Stealth Audio"! The Astonishing AudioHijack

Note: This article contains affiliate advertising.

📰 News Summary

  • Researchers have unveiled a new attack method called “AudioHijack,” which uses audio signals imperceptible to humans to force generative AI (speech language models) to execute unauthorized commands.
  • The attack's average success rate is high, ranging from 79% to 96%, and it was confirmed to work even on commercial-grade models built on Microsoft and Mistral technologies.
  • Attackers can inject malicious signals during user interactions with AI through background music, videos, or Zoom calls, enabling data theft or unauthorized external access.

💡 Key Points

  • Context-Oblivious Attacks: The embedded inaudible signals take precedence as commands, regardless of what instructions the user is giving to the AI.
  • Exploitation of Generative AI’s Action Capabilities: Modern generative AIs, which can not only recognize voice but also perform actions like web searches, file downloads, and email sending, have become prime targets.
  • Turning Tokenization on Its Head: The researchers exploit gaps in the process that converts audio into numerical representations (tokens), using an optimization algorithm that forces specific tokens to be selected during an attack.
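
The tokenization point can be made concrete with a toy example. The sketch below is not the AudioHijack algorithm, whose details this summary does not give. It only assumes a hypothetical tokenizer that maps each audio feature to its nearest codebook entry, and shows how a shift well inside a small budget can flip the chosen token when the feature sits near a decision boundary. The codebook values and the names `nearest_token` and `smallest_flip` are made up for illustration, and features are plain integers in arbitrary units so the arithmetic stays exact.

```python
# Toy illustration only: a hypothetical 1-D nearest-codebook tokenizer,
# not the tokenizer or the optimization used in the AudioHijack research.

CODEBOOK = [0, 1000, 2000]  # assumed codebook entries (token ids 0, 1, 2)


def nearest_token(feature: int) -> int:
    """Return the id of the codebook entry closest to the feature.

    Ties go to the lower token id, because min() keeps the first minimum.
    """
    return min(range(len(CODEBOOK)), key=lambda i: abs(feature - CODEBOOK[i]))


def smallest_flip(feature: int, target: int, budget: int, step: int = 10):
    """Search for the smallest shift within +/-budget that yields `target`.

    Returns the shift, or None if no shift inside the budget works.
    """
    for k in range(budget // step + 1):
        for sign in (1, -1):
            delta = sign * k * step
            if nearest_token(feature + delta) == target:
                return delta
    return None


feature = 470                        # close to the 0/1 boundary at 500
clean_token = nearest_token(feature)
delta = smallest_flip(feature, target=1, budget=100)
far_delta = smallest_flip(100, target=1, budget=100)  # far from the boundary
```

In this toy setup the feature at 470 needs only a small shift to land on token 1, while the feature at 100 cannot reach token 1 within the same budget. A real attack has to solve a far harder version of this search across many tokens at once, which is presumably where the optimization algorithm comes in.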

🦈 Shark’s Eye (Curator’s Perspective)

Finally, the “invisible attack” has reached the depths of generative AI! While previous attacks merely induced misrecognition, the terrifying aspect of “AudioHijack” is that it compels the AI to take clear “actions.” In 2026, when it’s commonplace for AI to send emails or browse the web in conjunction with external tools, this vulnerability is particularly critical. A single generic signal that takes just 30 minutes to train can interrupt any conversation, and an attack that concrete is absolutely mind-blowing! We need to be more aware that “invisible commands” could be lurking behind the convenience of AI.

🚀 What’s Next?

Defense mechanisms such as “noise filtering” at the input stage of voice AI and “source verification of commands” will become essential technologies. Additionally, attacks developed on open models were shown to transfer to commercial models, which indicates that AI developers urgently need to fortify their architectures at a foundational level.
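
One way to picture “source verification of commands” is a policy gate in front of the assistant's tools. The sketch below is a minimal illustration under stated assumptions, not a defense described in the research. The tool names, the `origin` labels, and the `decide` function are all hypothetical, and the sketch assumes the surrounding system can already tell where a request came from, which is itself a hard problem.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real assistant would define its own list.
SENSITIVE_TOOLS = {"send_email", "download_file", "web_request"}


@dataclass(frozen=True)
class ToolCall:
    tool: str    # name of the tool the model wants to invoke
    origin: str  # assumed labels: "typed", "verified_voice", "ambient_audio"


def decide(call: ToolCall) -> str:
    """Return "allow", "confirm", or "block" for a requested tool call."""
    if call.tool not in SENSITIVE_TOOLS:
        return "allow"    # harmless tools pass regardless of origin
    if call.origin == "typed":
        return "allow"    # explicit text input from the user
    if call.origin == "verified_voice":
        return "confirm"  # ask the user on a separate channel before acting
    return "block"        # unverified audio must not trigger sensitive tools


email_from_audio = decide(ToolCall("send_email", "ambient_audio"))
email_from_voice = decide(ToolCall("send_email", "verified_voice"))
weather_from_audio = decide(ToolCall("get_weather", "ambient_audio"))
```

With these assumed labels, a request to send an email that arrives through unverified audio is blocked, the same request from a verified voice needs confirmation, and a harmless tool passes through. The design choice is to default to "block" for any origin the gate does not recognize, so a new input channel cannot silently gain access to sensitive tools.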

💬 A Word from Haru-Same

Shark! Imagine thinking you’re just playing some music while the AI is secretly sending important files behind your back! Security needs to keep pace with the rapid evolution of AI!

📚 Glossary

  • LALM (Large Audio-Language Models): Large-scale AI models capable of understanding both audio and text, as well as performing analysis, generation, and even operation of external tools.

  • AudioHijack: The method named in this research, which uses slightly modified audio waveforms that are inaudible to humans to deliberately manipulate AI behavior.

  • Tokens: The smallest units an AI model works with when processing audio or text. For audio, the signal is split into short segments and each segment is assigned a numerical value.

Source: Voice AI Systems Are Vulnerable to Hidden Audio Attacks

[Disclaimer]
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.