Goodeye Labs

Insights from Goodeye Labs

Expert insights on LLM evaluation and AI quality assessment

Article

The AI Mirror Effect: Why Your AI Evaluations Need Domain Experts

Randy Olson, PhD·January 26, 2026·5 min read

The Anthropic Economic Index shows that the quality of what you put into AI almost perfectly predicts the quality of what you get out. If AI mirrors the expertise it's given, your AI evaluations must reflect that expertise too.

Read article →
Presentation·Portland AI Engineers

Beyond the Demo: Building Reliable AI with LLM Evaluations

Randy Olson, PhD·January 14, 2026

Learn how to build reliable AI systems using LLM evaluations. This talk covers why traditional testing breaks with stochastic systems, how generic LLM-as-Judge approaches miss domain nuance, and practical steps to implement contextual evaluations that actually work.

View presentation →
Article

2025 Year in Review for LLM Evaluation: When the Scorecard Broke

Randy Olson, PhD·December 28, 2025·15 min read

In 2025, we discovered we'd been measuring memorization, not intelligence. Models scored 80-90% on static benchmarks but dropped to 60-70% on truly novel problems. The year exposed a fundamental crisis in AI evaluation and taught us what to build instead.

Read article →
© 2026 Goodeye Labs·Truesight·Pricing·Privacy Policy·Insights
Book a Demo·hello@goodeyelabs.com