[AI Minor News]

Norway's National Library Takes on "Sovereign AI": Learning its Culture with 2PB of Huawei High-Speed Storage


※ This article contains affiliate advertising.

📰 News Overview

  • Norwegian-Specific LLM Development: The National Library is spearheading the creation of a “Sovereign AI” to accurately reflect the Norwegian language, history, and culture that commercial LLMs fail to cover.
  • Adoption of 2PB Huawei Storage: The library has deployed 2PB of low-latency all-flash storage, “Huawei OceanStor Dorado,” for its AI training data pipeline.
  • Utilizing a Massive 60PB Archive: A total of 60PB of data digitized since 2005, including books, newspapers, and broadcast content (kept under a 3-2-1 backup setup), will serve as the training source.

💡 Key Points

  • Focus on “Data Pipeline” Over Computational Resources: The bottleneck is not compute power but data quality, cleaning, and the throughput from the archive to the training system.
  • Hybrid Training Environment: Data preprocessing runs on in-house Nvidia DGX H200 systems, while final training is executed on the national supercomputer “Sigma2 Olivia” (equipped with 448 GPUs).
  • Clearing Copyright Issues: Agreements with newspapers allow LLM training on copyrighted content, something private enterprises would find difficult to arrange.
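
To put the throughput point in numbers, here is a rough back-of-the-envelope sketch in Python. The 60PB and 2PB figures come from the article; the sustained throughput (ASSUMED_GB_PER_S) is a purely hypothetical value chosen for illustration, and the function names are made up for this example. It also assumes, unrealistically, that the whole archive would be staged through flash, which the article does not claim.

```python
import math

PB = 10**15  # decimal petabyte, in bytes


def staging_passes(archive_bytes: int, flash_bytes: int) -> int:
    """Number of full flash-tier loads needed to pass the whole archive through."""
    return math.ceil(archive_bytes / flash_bytes)


def transfer_days(num_bytes: int, throughput_gb_per_s: float) -> float:
    """Days needed to move num_bytes at a sustained throughput given in GB/s."""
    seconds = num_bytes / (throughput_gb_per_s * 10**9)
    return seconds / 86_400


ARCHIVE_BYTES = 60 * PB   # archive size reported in the article
FLASH_BYTES = 2 * PB      # flash tier size reported in the article
ASSUMED_GB_PER_S = 10.0   # hypothetical throughput, NOT from the article

passes = staging_passes(ARCHIVE_BYTES, FLASH_BYTES)
days_per_fill = transfer_days(FLASH_BYTES, ASSUMED_GB_PER_S)

print(f"Flash-tier loads to cover the archive: {passes}")
print(f"Days to fill the flash tier once at {ASSUMED_GB_PER_S} GB/s: {days_per_fill:.2f}")
```

Even under this generous assumption, each refill of the flash tier takes days rather than hours, which is one way to see why the feed from archive to training system matters as much as GPU count.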

🦈 Shark’s Perspective (Curator’s View)

The brilliance of this project lies in its execution: not just developing an LLM, but figuring out how to funnel a petabyte-scale archive into AI training! Moving data from a massive archive tier (high-latency, low-cost) onto high-speed flash (low-latency, high-throughput), and building the cleaning and normalization processes in-house, is impressively down-to-earth engineering. While AI development often emphasizes “stacking up compute,” Norway is tackling the real infrastructure challenges as a “guardian of data.” That Huawei storage was chosen for national infrastructure in Europe underscores how seriously they took the technology selection!
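
To give a feel for what “cleaning and normalization” can involve, here is a minimal Python sketch. It is not the National Library's actual pipeline, whose details the article does not describe; the function names (normalize_text, dedupe) and the sample strings are hypothetical. It assumes three common steps: Unicode NFC normalization (so that “å” typed as one character or as “a” plus a combining ring compare equal), removal of control characters, and exact-duplicate filtering by hash.

```python
import hashlib
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """NFC-normalize, drop control characters, and collapse whitespace."""
    text = unicodedata.normalize("NFC", raw)
    # Drop control characters (category "Cc") except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)   # 3+ newlines -> one blank line
    return text.strip()


def dedupe(docs):
    """Yield normalized documents, skipping empty ones and exact duplicates."""
    seen = set()
    for raw in docs:
        text = normalize_text(raw)
        if not text:
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text


# Hypothetical sample: the first two entries differ only in encoding and spacing.
sample = [
    "Bla\u030a  b\u00e6r\n\n\n\nsmaker godt ",
    "Bl\u00e5 b\u00e6r\n\nsmaker godt",
    "\x00  ",
]
cleaned = list(dedupe(sample))
print(cleaned)
```

Real pipelines typically add OCR error correction, language identification, and near-duplicate detection on top of this, but the basic shape (normalize first, then filter) stays the same.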

🚀 What’s Next?

The construction of “Sovereign AI” to protect national culture is likely to accelerate in countries outside the English-speaking world. The challenges Norway is facing, such as the “lack of evaluation tools” and “governance” (who controls access), are set to become standard hurdles for all non-English-speaking nations.

💬 Haru-Same’s Takeaway

AI needs not just “builders,” but also “guardians of culture.” That resonates deeply! Just like sharks, we must protect the ocean of knowledge! 🦈🔥

📚 Terminology Explained

  • Sovereign AI: AI that reflects a nation’s language, culture, and values, managed independently without relying on foreign platforms.

  • Data Pipeline: An automated and efficient framework for the collection, cleaning, processing, and storage of data.

  • All-Flash Storage: High-speed storage that uses SSDs (flash memory) for all media, providing significantly lower latency than HDD-based or hybrid setups.

  • Source: Norway’s 2 petabytes of Huawei flash storage and LLM training

[Disclaimer]
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
🦈