Benchmark Scoring

Manually and automatically score LLM performance on generated benchmarks.

Scoring for: llama3.2

Scoring for: gemma3

Scoring for: deepseek-r1

Scoring for: mistral small

Scoring for: qwen2.5

Scoring for: claude sonnet3.5

Scoring for: phi4

Overall Score Comparison

Overall Time Comparison