About Agentick

Agentick is a universal benchmark for evaluating generally capable AI agents.

Methodology

Agents are evaluated on standardized benchmark suites with fixed random seeds, so that results are reproducible across runs.
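
As a rough illustration of why fixed seeds matter, the sketch below runs a toy "suite" twice with the same seeds and gets identical episodes. It is not Agentick's actual harness: the names `SUITE_SEEDS`, `run_episode`, and `run_suite` are hypothetical, and the episode is just a stand-in sequence of random numbers.

```python
import random

# Hypothetical fixed seed list for a suite (an assumption, not Agentick's real seeds).
SUITE_SEEDS = (0, 1, 2)


def run_episode(seed):
    """Generate a toy episode from a private RNG seeded with `seed`."""
    rng = random.Random(seed)  # isolated generator, unaffected by global random state
    return [rng.randint(0, 9) for _ in range(5)]


def run_suite(seeds=SUITE_SEEDS):
    """Run one toy episode per seed and return the results keyed by seed."""
    return {seed: run_episode(seed) for seed in seeds}


if __name__ == "__main__":
    first = run_suite()
    second = run_suite()
    print(first == second)  # same seeds, same episodes
```

Using a separate `random.Random` instance per episode, rather than the module-level generator, keeps one episode's randomness from leaking into another.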

Scoring

Raw scores are normalized against a random baseline and an oracle baseline, then aggregated by capability.
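
One common way to do this kind of normalization, shown here as a minimal sketch rather than Agentick's documented formula, is to rescale each raw score so that the random baseline maps to 0 and the oracle baseline maps to 1, then average the normalized scores within each capability. The function names, the record fields (`capability`, `raw`, `random`, `oracle`), and the use of a plain mean are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def normalize(raw, random_score, oracle_score):
    """Rescale `raw` so the random baseline maps to 0 and the oracle to 1 (assumed formula)."""
    span = oracle_score - random_score
    if span == 0:
        raise ValueError("oracle and random baselines must differ")
    return (raw - random_score) / span


def aggregate_by_capability(results):
    """Average normalized scores per capability (a plain mean is an assumption)."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["capability"]].append(
            normalize(r["raw"], r["random"], r["oracle"])
        )
    return {capability: mean(scores) for capability, scores in grouped.items()}


if __name__ == "__main__":
    # Hypothetical task results, invented purely to exercise the functions.
    results = [
        {"capability": "planning", "raw": 5, "random": 0, "oracle": 10},
        {"capability": "planning", "raw": 3, "random": 1, "oracle": 5},
        {"capability": "memory", "raw": 10, "random": 0, "oracle": 10},
    ]
    print(aggregate_by_capability(results))
```

Under this scheme a score below 0 would mean the agent did worse than random, and a score above 1 would mean it beat the oracle; whether Agentick clips such values is not stated in the source.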

What are models saying about Agentick?