Unleash the Ultimate AI on Your PC with the Revolutionary Specs Detection Tool 'whichllm'!

#Local LLM #Qwen #GPU

Note: This article contains affiliate advertising.

📰 News Overview

Automatic Hardware Detection: Automatically analyzes the host PC’s GPU (NVIDIA/AMD), Apple Silicon, CPU, and RAM to identify the best LLM for your setup.
Benchmark-Prioritized Ranking: Instead of just picking a model that fits within VRAM limits, it ranks based on the latest benchmark scores from LiveBench and Aider, showcasing the “smartest” models at the top.
Instant Execution and Code Generation: Comes with features like whichllm run to start chatting with a single command, and whichllm snippet to generate Python implementation code.
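
The "fit first, then rank by benchmark" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the model names, the memory figures, and the scores are all hypothetical placeholders, and whichllm's actual data model and scoring are not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # hypothetical model label, not a real release
    vram_gb: float  # assumed memory needed to run the model
    score: float    # assumed benchmark score, higher is better


def rank_by_benchmark(candidates, available_vram_gb):
    """Keep only models that fit in memory, then sort by score (best first)."""
    fitting = [c for c in candidates if c.vram_gb <= available_vram_gb]
    return sorted(fitting, key=lambda c: c.score, reverse=True)


# Made-up numbers purely for illustration.
candidates = [
    Candidate("model-a-32b", vram_gb=20.0, score=61.0),
    Candidate("model-b-27b", vram_gb=17.5, score=68.0),
    Candidate("model-c-70b", vram_gb=42.0, score=72.0),
    Candidate("model-d-8b", vram_gb=6.0, score=48.0),
]

ranked = rank_by_benchmark(candidates, available_vram_gb=24.0)
for position, c in enumerate(ranked, start=1):
    print(position, c.name, c.score)
```

With these made-up numbers, the 70B entry is dropped because it does not fit in 24 GB, and the 27B entry ranks above the larger 32B entry because its score is higher.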

💡 Key Points

Emphasis on Freshness and Reliability: It supports the latest data as of May 2026 (like Qwen3.6) and employs a ranking algorithm that considers the freshness of scores so that outdated models don’t get unjustly high ratings.
High-Precision Memory Prediction: Accurately calculates VRAM usage as “weights + KV cache + activations.” It also predicts the inference speed of MoE models based on the number of active parameters.
Pre-Purchase Simulation: By specifying something like --gpu "RTX 5090", you can simulate how models will perform (in t/s) with parts you don’t yet own.
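
As a rough sketch of two of the ideas above, the Python below discounts an older benchmark score and estimates memory as "weights + KV cache + activations." Everything here is an assumption for illustration: the half-life decay, the flat activation allowance, the function names, and the architecture numbers (which are not the real Qwen3.6 configuration). The article does not say how whichllm actually computes either value.

```python
GIB = 1024 ** 3  # bytes per GiB


def freshness_weighted(score, age_days, half_life_days=180.0):
    """Discount a benchmark score by its age (assumed exponential half-life)."""
    return score * 0.5 ** (age_days / half_life_days)


def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_len, kv_bytes=2, activations_gib=1.0):
    """Estimate memory as weights + KV cache + activations, in GiB."""
    # Weights: parameter count times the (quantized) bits stored per weight.
    weights = n_params * bits_per_weight / 8 / GIB
    # KV cache: one K and one V vector per layer, per KV head, per position.
    kv_cache = (2 * n_layers * n_kv_heads * head_dim
                * context_len * kv_bytes) / GIB
    # Activations: a flat allowance here (an assumption, not a measurement).
    total = weights + kv_cache + activations_gib
    return {
        "weights_gib": weights,
        "kv_cache_gib": kv_cache,
        "activations_gib": activations_gib,
        "total_gib": total,
    }


# Hypothetical 27B dense model with ~4.5 bits per weight and an 8K context.
est = estimate_vram_gib(
    n_params=27e9, bits_per_weight=4.5,
    n_layers=64, n_kv_heads=8, head_dim=128, context_len=8192,
)
print({k: round(v, 2) for k, v in est.items()})

# A score of 70 measured one half-life ago counts as half as much.
print(freshness_weighted(70.0, age_days=180))
```

The KV cache term grows linearly with context length, which is why a model that fits comfortably at a short context can run out of memory at a long one.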

🦈 Shark’s Eye (Curator’s Perspective)

The true brilliance of this tool lies in using solid data to shatter the myth that “bigger models are always better”! For instance, on an RTX 4090, if the latest Qwen3.6-27B outperforms the 32B model in benchmarks, the tool confidently ranks the 27B model at the top. This “evidence-based ranking” cuts through the confusion of model selection in an instant! Furthermore, the way it uses uv to start a chat instantly in an isolated environment earns high marks from a developer’s perspective. In the local LLM battlefield of 2026, this is undoubtedly a “compass” tool that guides you to success!

🚀 What’s Next?

Expect sharing your specs together with your ‘whichllm’ results to become the community’s standard answer to the question, “Which AI do you recommend?” Moreover, because the tool fetches live data from the HuggingFace API, the “ultimate configuration” keeps updating every time a new model emerges!

💬 A Word from Haru Shark

The time has come to unleash Qwen3.6 with my prized fin (RTX 5090)! Awaken the hidden power of your PC with this tool! 🦈🔥

📚 Terminology Explained

t/s (tokens per second): The number of tokens (pieces of text) generated per second, indicating how fast the AI “speaks.”
MoE (Mixture of Experts): A technique that lets huge models run quickly and efficiently by activating only a subset of their parameters for each token. In the tool’s results, some MoE models even exceed 100 t/s!
GGUF: A file format designed for efficiently running AI on local CPUs and GPUs. Widely used through llama.cpp and similar tools.
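
To make the t/s and MoE entries concrete, here is a back-of-the-envelope Python sketch of why a model with fewer active parameters can generate faster. It uses a common rule of thumb, not whichllm's own formula: when decoding is limited by memory bandwidth, each generated token requires reading roughly all active weights once. The function name, the 1000 GB/s bandwidth, and the model sizes are hypothetical, and real throughput will be lower than this upper bound.

```python
def estimate_tps(active_params, bits_per_weight, bandwidth_gb_s, efficiency=1.0):
    """Rough upper bound on decode speed for a memory-bandwidth-bound model.

    tokens/s ~= usable bandwidth / bytes of active weights read per token
    """
    bytes_per_token = active_params * bits_per_weight / 8
    usable_bandwidth = bandwidth_gb_s * 1e9 * efficiency  # bytes per second
    return usable_bandwidth / bytes_per_token


# Hypothetical GPU with 1000 GB/s of memory bandwidth, 4-bit weights.
dense_tps = estimate_tps(active_params=30e9, bits_per_weight=4, bandwidth_gb_s=1000)
# Hypothetical MoE model: 30B total parameters, but only 3B active per token.
moe_tps = estimate_tps(active_params=3e9, bits_per_weight=4, bandwidth_gb_s=1000)

print(f"dense 30B:          ~{dense_tps:.0f} t/s (upper bound)")
print(f"MoE 30B, 3B active: ~{moe_tps:.0f} t/s (upper bound)")
```

Note that the MoE model still has to hold all 30B parameters in memory. Only the per-token read cost, and therefore the speed estimate, scales with the active parameter count.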

Source: Show HN: Find the best local LLM for your hardware, ranked by benchmarks
