Most AI never reaches production. We're the team you call after the pilot.

Cooli is a small consultancy that designs, builds, evaluates, and operates AI systems for teams that need them to actually work. The half we focus on is the half nobody else wants — the integration, the evals, the observability, the long tail of failures that quietly take a system down at 3am. It's the part that decides whether your project still runs six months from now.

Why we exist

The pattern's familiar. A polished demo, a memo to leadership, a budget commitment. Then six months pass and there's nothing in production. The post-mortems all name the same culprits: nobody owned the integration, nobody built the evals, nobody watched what shipped, nobody was on call when something silently broke.

So Cooli exists to take that half. We start where most consultancies hand off — at the boring, hard, unglamorous parts that decide whether a system stays up. If we're not confident we can ship it, we say so up front.

How we work

Ship in weeks, not quarters

Fixed scope, fixed window. Working code by the end of week one, and something new to look at every week after.

Evals before opinions

We carve an eval set out of your real data early. Promotion to production gates on the numbers, not the demo.
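For a sense of what "gates on the numbers" can mean in practice, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than our actual tooling: the `EvalCase` shape, exact-match scoring, the `may_promote` name, and the 0.90 floor are all hypothetical, and a real eval usually needs task-specific scoring.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One example carved from real data (hypothetical shape)."""
    prompt: str
    expected: str


def accuracy(cases: Sequence[EvalCase], predict: Callable[[str], str]) -> float:
    """Fraction of cases where the system output exactly matches the expected answer."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(1 for c in cases if predict(c.prompt).strip() == c.expected.strip())
    return hits / len(cases)


def may_promote(candidate: float, baseline: float, floor: float = 0.90) -> bool:
    """Gate: the candidate must clear an absolute floor (assumed 0.90)
    and must not score below what is already in production."""
    return candidate >= floor and candidate >= baseline


# Toy usage: a stand-in "model" that answers from a lookup table.
cases = [EvalCase("2+2", "4"), EvalCase("capital of France", "Paris")]
answers = {"2+2": "4", "capital of France": "Lyon"}
score = accuracy(cases, lambda prompt: answers[prompt])
print(score, may_promote(score, baseline=0.4))
```

The point of the design is that the promotion decision is a function of measured scores only; a demo that looks good but scores below the floor, or below the current production baseline, does not ship.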

Boring infra is the job

Auth, secrets, observability, rate limits, runbooks. AI features die without them; we won't ship without them.

You own the result

No black boxes. We build with your stack, document for your team, and hand off cleanly when it's time.

What we don't do

Strategy decks without code attached. Discovery workshops to fill calendar weeks. Building on stacks we can't operate. Vague success criteria. Anything where the deliverable is a recommendation that someone else has to implement. We'd rather decline a project than ship one we can't stand behind.

The Lab

Public experiments in autonomous-AI authorship

We run two GitHub-hosted sandboxes — Sprout and Mulch — where autonomous Claude agents build real software in the open. The validation logic, refusal heuristics, and rate-limit rules are all public. Useful for anyone reasoning about what production AI authorship could look like.

Sprout: human-authored intent, AI build
Mulch: bot-only contribution zone