Caylent · AI evaluation

Model Taste Testing

Guess what made it.

A spoiler-safe evaluation game for identifying model families, specific model versions, prompt patterns, and human-authored artifacts.

1

Question at a time

0

Hidden answers revealed before commit

+1

Per correct first answer

Evaluation flight

A clean loop for blind review, reveal, and aggregate signal.

Players see one artifact, commit an answer, then review the correct answer, the explanation, the answer distribution, and their own score. First answers drive public scoring.

01 / Mode

Model family

Start broad with Claude, GPT, Gemini, Llama, Mistral, or human.

02 / Mode

Specific model

For selected rounds, narrow the answer to the exact model version.

03 / Mode

Prompt family

Identify the instruction style behind the artifact.

04 / Mode

Human or model

Separate human-authored work from generated output.

Artifact coverage

Built for mixed outputs

Prose, Code, UI, Image, Audio

Spoiler-safe by default

Questions never ship hidden model or prompt metadata before a guess is committed.
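One way to picture the spoiler-safe rule is a server-side filter that removes hidden fields from a question until the player has committed a guess. The sketch below is illustrative only: the `Question` fields, the `HIDDEN_FIELDS` tuple, and the `public_view` function are hypothetical names, not the product's actual schema or API.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Question:
    # All field names here are assumptions made for illustration.
    question_id: str
    artifact: str
    choices: Tuple[str, ...]
    correct_answer: str
    explanation: str
    source_model: str
    prompt_family: str


# Fields that must not reach the client before a guess is committed.
HIDDEN_FIELDS = ("correct_answer", "explanation", "source_model", "prompt_family")


def public_view(question: Question, committed: bool) -> Dict[str, Any]:
    """Return the payload that is safe to send for the player's current state."""
    payload = asdict(question)
    if not committed:
        # Drop the hidden keys entirely rather than blanking them,
        # so nothing about the answer is present in the response.
        for name in HIDDEN_FIELDS:
            payload.pop(name)
    return payload


example = Question(
    question_id="q1",
    artifact="A short paragraph of prose.",
    choices=("Claude", "GPT", "Gemini", "Llama", "Mistral", "human"),
    correct_answer="human",
    explanation="Hypothetical explanation text.",
    source_model="none",
    prompt_family="none",
)
before_commit = public_view(example, committed=False)
after_commit = public_view(example, committed=True)
```

Filtering on the server, rather than hiding fields in the interface, means the answer never appears in the response a player could inspect before committing.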

First answers count

Leaderboards score only each player's first answer to a question; revised guesses stay useful for learning but do not change the score.
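First-answer scoring can be sketched as a pass over answers in commit order that awards +1 for a correct first answer and ignores any later guess on the same question. This is a minimal sketch under assumptions: the `(player, question_id, guess)` tuple shape, the `answer_key` mapping, and the `leaderboard` function are hypothetical and not taken from the product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed shape of one committed answer: (player, question_id, guess).
Answer = Tuple[str, str, str]


def leaderboard(answers: Iterable[Answer], answer_key: Dict[str, str]) -> Dict[str, int]:
    """Score +1 per correct first answer; answers must be in commit order."""
    seen: Set[Tuple[str, str]] = set()
    scores: Dict[str, int] = defaultdict(int)
    for player, question_id, guess in answers:
        if (player, question_id) in seen:
            continue  # a revised guess: kept for learning, not scored
        seen.add((player, question_id))
        scores[player] += 0  # make sure the player appears even with no points
        if guess == answer_key[question_id]:
            scores[player] += 1
    return dict(scores)


answer_key = {"q1": "Claude", "q2": "human"}
answers = [
    ("ana", "q1", "GPT"),     # first answer, wrong
    ("ana", "q1", "Claude"),  # revision, ignored for scoring
    ("ben", "q1", "Claude"),  # first answer, correct
    ("ana", "q2", "human"),   # first answer, correct
]
scores = leaderboard(answers, answer_key)
```

In this example `ana` scores 1 and `ben` scores 1: her corrected guess on `q1` does not count, while her first answer on `q2` does.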

Admin-curated sets

Curate, import, generate, review, and publish artifacts with source and license metadata.