Entrepreneur · CTO · Principal AI architect · Consulting

Will Frasier

Principal AI architect and novelist. Entrepreneur and CTO: consulting and multi-agent architecture for teams shipping AI to production.

Consulting openings · Q3 2026

Will Frasier — Multi-agent engineering

The tech is easy. Knowing what good looks like is the real work.

Most AI systems do not fail because the models are wrong. They fail because nobody decided what good looks like, and the system grew faster than anyone's ability to evaluate it.

Many of today's multi-agent systems are distributed systems wearing new clothes.

I have worn many hats in my twenty years at Microsoft: Principal AI Architect, Data Scientist, Front-End Developer, Engineering Lead, Quality Engineer, and more. I have seen many of the patterns that block small teams from shipping high-quality, reliable AI systems.

But they're easier to solve than you might think.

I spent my last year at Microsoft on a project that used machine learning to optimize data center training runs of foundation models. The same insights apply to multi-agent systems. The hard part is not the agents; it's deciding what to do with them.

  • Context management. This is the new frontier. Filling up a context window with junk is a great way to burn tokens and hallucinate. The more carefully curated your context, the better your system performs.

  • Observability that catches what matters. Latency and error rates are the easy half. The hard half is measuring quality of output, drift over time, and cost per outcome. If your dashboards only watch the things a traditional web service watches, you're flying blind on the things that actually break AI systems.

  • Graduated rollout, measurably better or not at all. New model, new prompt, new edge function. None of it ships to 100% until shadow traffic, A/B, and your eval suite all say it's an improvement. "Looks better" is not a deploy criterion. The point isn't caution. It's that confidence comes from data, not from the person pushing the change.
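A minimal sketch of what a "measurably better or not at all" gate could look like, assuming each evidence source (eval suite, shadow traffic, A/B test) has already been reduced to a single higher-is-better quality score. The `Evidence` type, the `should_promote` function, the source names, and both thresholds are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of one evidence source for a candidate change."""
    name: str         # "eval_suite", "shadow_traffic", or "ab_test"
    baseline: float   # quality score of what is live today (higher is better)
    candidate: float  # quality score of the proposed change
    samples: int      # number of examples or requests behind the scores


# Assumption: all three sources must report before anything ships to 100%.
REQUIRED_SOURCES = {"eval_suite", "shadow_traffic", "ab_test"}


def should_promote(
    evidence: list[Evidence],
    min_gain: float = 0.01,   # illustrative threshold, not a recommendation
    min_samples: int = 200,   # illustrative threshold, not a recommendation
) -> tuple[bool, list[str]]:
    """Return (decision, reasons). Promote only if every source shows a gain."""
    reasons = []
    present = {e.name for e in evidence}
    for name in sorted(REQUIRED_SOURCES - present):
        reasons.append(f"{name}: no data")
    for e in evidence:
        gain = e.candidate - e.baseline
        if e.samples < min_samples:
            reasons.append(f"{e.name}: only {e.samples} samples")
        elif gain < min_gain:
            reasons.append(f"{e.name}: gain {gain:+.3f} below {min_gain}")
    return (not reasons, reasons)


if __name__ == "__main__":
    evidence = [
        Evidence("eval_suite", 0.80, 0.85, 500),
        Evidence("shadow_traffic", 0.78, 0.82, 1000),
    ]
    # The A/B result is missing, so the gate refuses and says why.
    print(should_promote(evidence))
```

The gate returns its reasons alongside the decision, so a blocked change comes with the specific measurement that blocked it rather than someone's opinion.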

How I work

Three phases. Customer in the room for all of them.

01 Deeply understand

The actual problem, not the one in the slide deck. A few sessions, a lot of questions, and a written read of what I think you are trying to solve. You tell me where I am wrong. We do not move until we agree.

02 Innovate and review

A plan, reviewed with you before it touches code. A tight written proposal with the architecture choices, the tradeoffs, and the bets I am making. You push back. We adjust. Then we build.

03 Deliver

Incremental, fast, communicated. Small wins that stack into the end product. Weekly demos against real data. No surprises at the end because there is no "end" — just the next working slice.

Selected work

Story Stream

A multi-agent platform for novelists.

A platform that hundreds of writers now trust with their manuscripts. Specialist agents own pacing, theme, character, and cultural sensitivity. An orchestrator synthesizes their advice into feedback that holds up under a writer's scrutiny, with a real-time layer giving authors guidance as they write.

Founded May 2025. Took the product from concept to launch to paid users as sole founder, owning architecture, product, go-to-market, and the customer feedback loop.

Honest Tally · Founder

Tracing political money flow through public records.

An AI and data science system that traces causal chains across messy public datasets: company → parent company → PAC → politician → votes on specific bills. Users see which policies the companies they buy from are actually funding.

The bar is higher than Story Stream's. A wrong story-craft suggestion is just unhelpful. A wrong claim about who funded what is libel-adjacent. That changes how you design evals: every claim has to carry provenance back to the underlying records, and the system has to know when it does not know, and say so.
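As a rough illustration of "every claim carries provenance, and the system says so when it does not know", here is a minimal sketch. It is not Honest Tally's actual data model: the `Link` type, the `render_claim` function, and the record IDs are hypothetical, and a real system would need far richer evidence handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hop in a chain, e.g. company -> parent company."""
    source: str
    target: str
    record_ids: tuple[str, ...] = ()  # hypothetical IDs of the backing public records


def render_claim(chain: list[Link]) -> str:
    """State the chain only if it is connected and every hop has a record."""
    if not chain:
        return "Unknown: no chain found."
    # Each hop must start where the previous one ended.
    for prev, nxt in zip(chain, chain[1:]):
        if prev.target != nxt.source:
            return f"Unknown: chain breaks between {prev.target} and {nxt.source}."
    # Each hop must be backed by at least one underlying record.
    for link in chain:
        if not link.record_ids:
            return f"Unknown: no record supports {link.source} -> {link.target}."
    path = " -> ".join([chain[0].source] + [link.target for link in chain])
    cites = sorted({rid for link in chain for rid in link.record_ids})
    return f"{path} [records: {', '.join(cites)}]"


if __name__ == "__main__":
    chain = [
        Link("Acme", "Acme Holdings", ("rec-1",)),
        Link("Acme Holdings", "Acme PAC", ("rec-2", "rec-3")),
    ]
    print(render_claim(chain))
    # Drop the evidence for the second hop and the claim is withheld.
    print(render_claim([chain[0], Link("Acme Holdings", "Acme PAC")]))
```

The design choice worth noting is that abstention is a first-class output: an unsupported hop produces an explicit "Unknown" instead of a confident sentence with a gap in it.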

Edge · Microsoft

Inferring user intent from browser telemetry.

A behavioral analytics system that worked out what users were actually doing as they used the browser, so the product could help them better. Telemetry signals on one side, intent classification on the other, with several iterations of pattern matchers and classifiers in between, learning which signal combinations mapped to which user tasks.

Running this at the browser's install-base scale taught the lesson AI evals teach you, just earlier: at any meaningful scale, the thing you cannot measure is the thing that kills you. Same problem as eval drift in a multi-agent system, different decade.

Autonomous Coding Workflow

Production issue to merged PR, in hours.

A system where issues detected in deployment trigger agents that diagnose, write code changes, and submit them through a senior PR review agent — a Claude skill I wrote, encoding the same standards I'd apply on a real review — before reaching a human. The human validates against a deployed preview build before merge.

The point is not speed for its own sake. It is that fast and rigorous are compatible when you build the rigor into the system.

Microsoft · 2005-2025

Principal-level work across four disciplines and four orgs.

Twenty years shipping production systems at consumer scale. Principal AI Architect, Data Scientist, Front-End Developer, and Engineering Lead, with hands-on delivery across Azure, Edge, Music & TV Studios, and Visual Studio. The Edge behavioral platform alone ran across hundreds of millions of users in a top-tier privacy environment.

Microsoft Global Hackathon winner — 2021 (Real-Time Emotion Parsing) and 2023 (AI Digital Twins, later patented). Multi-agent work before "multi-agent" was the term.

Engagements

Three different shapes depending on where your build is stuck. Teams hire me when they are stuck, and stuck looks different for different teams.

Eval Sprint

2 weeks · fixed scope

Build your evals before you build anything else.

The thing most AI teams skip, and the thing that determines whether the system works. Two weeks. I leave you with a working eval harness, ground truth where it can exist, a calibration approach where it cannot, and a clear answer to the question "how do we know this got better?"

If you cannot measure the output, no architecture choice downstream will save you.
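To make "how do we know this got better?" concrete, here is a minimal sketch of a paired comparison, assuming ground truth exists as (input, expected output) pairs. The `compare` and `exact_match` functions and the toy cases are hypothetical; exact match stands in for whatever scorer a real domain needs, and this is not the harness delivered in an engagement.

```python
from typing import Callable

# Hypothetical ground-truth case: (input, expected output).
Case = tuple[str, str]
System = Callable[[str], str]


def exact_match(output: str, expected: str) -> float:
    """Simplest possible scorer; real domains usually need a richer one."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def compare(baseline: System, candidate: System, cases: list[Case],
            score: Callable[[str, str], float] = exact_match) -> dict:
    """Score both systems on the same cases and list what changed, by input."""
    if not cases:
        raise ValueError("need at least one ground-truth case")
    fixed, regressed = [], []
    base_total = cand_total = 0.0
    for text, expected in cases:
        b = score(baseline(text), expected)
        c = score(candidate(text), expected)
        base_total += b
        cand_total += c
        if c > b:
            fixed.append(text)
        elif c < b:
            regressed.append(text)
    n = len(cases)
    return {
        "baseline": base_total / n,
        "candidate": cand_total / n,
        "fixed": fixed,
        "regressed": regressed,
    }


if __name__ == "__main__":
    cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
    old = {"2+2": "4", "capital of France": "paris", "3*3": "6"}
    new = {"2+2": "5", "capital of France": "Paris", "3*3": "9"}
    print(compare(lambda q: old[q], lambda q: new[q], cases))
```

In the toy run both systems have the same average score, yet the candidate breaks a case the baseline got right. Reporting regressions per input, not just the mean, is what keeps an "upgrade" like that from slipping through.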

Best fit: teams that ship model upgrades without knowing whether they are improvements.

Build Review

1 week · on-site or remote

Should you keep building, or start over?

I come in for a week, look at what you have, and tell you. Architecture, evals, infrastructure choices, team capacity. An honest read on whether the next six months of effort is going to land or just compound the debt.

Most teams will not get the "start over" answer. The ones who need it really need it, and most consultants will not say it out loud.

Best fit: tech leads and founders whose AI build has grown faster than their conviction in it.

Custom Build

6-12 weeks · embedded

Multi-agent systems for domains where quality is subjective.

The places where a single LLM call gives you a plausible answer that is wrong in ways you cannot catch. Intake, ground truth, orchestration, eval, deploy. The same discipline I used building Story Stream, right-sized to your domain and data.

Best fit: teams where output quality matters more than throughput, and where "looks right" is not good enough.

Available for new engagements starting July 2026

Consulting · Seattle, WA
