Right or Preference? The New Metric for Creative AI, ‘Human Creativity Benchmark’
📰 News Overview
- Separation of Evaluation Axes: The new metric “Human Creativity Benchmark (HCB)” has been introduced to differentiate between “convergence (shared correctness)” and “divergence (individual preference)” in creative evaluations.
- Current Limitations: While current AI models excel at producing “correct” outputs, it has been revealed that no models exist that can be controlled to align with user “preferences (flavor).”
- Pointing Out Mode Collapse: Many models tend to converge on designs that are “safe and average” rather than unique, responding to the same prompts.
💡 Key Points
- In the creative realm, there is no “ground truth (absolute correctness),” making “discrepancies” among evaluators an important signal.
- Evaluations occur along a spectrum of “prompt adherence (objective),” “usability (intermediate),” and “visual appeal (subjective).”
- Desktop apps and landing pages tend to have more consistent evaluations (convergence), while advertising videos and brand assets often show more divergence in assessments.
🦈 Shark’s Perspective (Curator’s View)
Previous AI benchmarks tended to average out or dismiss evaluator disagreements as “noise.” But in the creative world, it’s natural that “preferences vary by person!” The brilliance of HCB lies in reframing that “discrepancy” as “diversity of taste.” The “generic yet familiar designs” produced by current AI exemplify what we call “mode collapse.” What professionals seek is not “safe averages” but “cutting-edge outputs that resonate with their own sensibilities”! Visualizing that is indeed revolutionary!
🚀 What’s Next?
The development of models that can not only produce “high-quality images” but also intentionally diverge and control outputs to match a specific designer’s “quirks” or a company’s “brand tone” (Steerability) is set to accelerate. A future where generic AI is phased out in favor of AI with sharp individuality is on the horizon!
💬 HaruShark’s Takeaway
While my taste is just fresh little fish, human preferences are complex! AI is entering a phase where it aims to be “someone’s favorite” rather than just a “beloved all-rounder”! 🦈🔥
📚 Glossary
- Mode Collapse: A phenomenon where AI fails to generate diverse outputs, resulting in the production of specific “safe patterns.”
- Convergence and Divergence: Convergence occurs when evaluators arrive at the same conclusion, while divergence happens when opinions vary. In HCB, the former is treated as a “technical correctness,” and the latter as “individual preference.”
- Steerability: The ability to accurately control AI outputs according to user intentions or specific styles.
Source: The Human Creativity Benchmark – Evaluating Generative AI in Creative Work