Model Taste TestingGuess what made it.
A spoiler-safe evaluation game for identifying model families, specific model versions, prompt patterns, and human-authored artifacts.
1
Question at a time
0
Hidden answers before commit
+1
Per correct first answer
Evaluation flight
A clean loop for blind review, reveal, and aggregate signal.
Players see one artifact, commit an answer, then review the correct answer, explanation, distribution, and their own score. First answers drive public scoring.
01 / Mode
Model family
Start broad with Claude, GPT, Gemini, Llama, Mistral, or human.
02 / Mode
Specific model
For selected rounds, narrow the answer to the exact model version.
03 / Mode
Prompt family
Identify the instruction style behind the artifact.
04 / Mode
Human or model
Separate human-authored work from generated output.
Artifact coverage
Built for mixed outputs
Spoiler-safe by default
Questions never ship hidden model or prompt metadata before a guess is committed.
First answers count
Leaderboards use first-answer scoring, while revised guesses stay useful for learning.
Admin-curated sets
Curate, import, generate, review, and publish artifacts with source and license metadata.