Tip
Chip Huyen talks about system thinking in debugging: “I was using AI to deploy an application and it kept getting a bug, so I asked AI to fix it. And it kept changing things—maybe change the environment variable, fix the code, maybe change from this function to that function, maybe change the language. It didn’t work. I read documentation myself and it turns out I’m on the wrong tier—the feature I want is not available in this tier. The issue with AI was just trying to focus on fixing things from one component versus the issue is from a different component. Understanding how different components work together and where the source of issue might come from—you need to give a holistic view of it.”
Turns out debugging AI implementations works the same way.
Your team is deploying an AI feature, a RAG system for customer support. It’s been three weeks. Response quality is inconsistent, latency is terrible, and the AI keeps giving outdated answers. Your lead ML engineer has tried everything: fine-tuned the model, adjusted the prompts, increased the context window, changed the embedding model, experimented with different chunking strategies. Nothing works. He’s convinced the problem is the model architecture.
Younger technical leads debug locally—they assume the problem is in the component they’re touching. They drill deeper into model parameters, experiment with different frameworks, read research papers on better embedding techniques. They’re solving the wrong problem brilliantly.
You’ve deployed enough systems over decades to think holistically. You step back and ask basic questions: “When was the last time we updated the knowledge base?” Silence. Turns out the vector database hasn’t been refreshed in six weeks; the data is completely stale. “How are we handling the data pipeline?” The automated refresh job failed a month ago and nobody noticed. The problem wasn’t the model, the prompts, or the embeddings. It was the data pipeline nobody was looking at.
This judgment—knowing to step back and examine the whole system instead of drilling deeper into one component—comes from watching enough implementations fail for surprising reasons. Junior engineers think linearly: “model isn’t working → tune the model harder.” You’ve learned to think systemically: “model isn’t working → check data freshness, check pipeline health, check integration points, check access permissions, check infrastructure tier limits.” You know from experience that the problem is often two components away from where everyone’s looking.
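To make the contrast concrete, here is a minimal Python sketch of what such a pre-tuning system check might look like. The function name, timestamps, and seven-day staleness threshold are hypothetical stand-ins for whatever your own monitoring actually exposes; this is an illustration, not part of Huyen’s example or any particular stack.

```python
# Minimal sketch of a pre-tuning system health check (hypothetical names and thresholds).
from datetime import datetime, timedelta, timezone


def check_system_health(
    last_index_update: datetime,
    last_pipeline_success: datetime,
    max_staleness: timedelta = timedelta(days=7),
) -> list[str]:
    """Return system-level findings to review before anyone touches the model."""
    findings = []
    now = datetime.now(timezone.utc)

    # Data freshness: a stale vector index produces outdated answers no prompt tweak can fix.
    if now - last_index_update > max_staleness:
        days = (now - last_index_update).days
        findings.append(f"Vector index last updated {days} days ago; refresh the knowledge base first.")

    # Pipeline health: a silently failed refresh job is a common root cause.
    if now - last_pipeline_success > max_staleness:
        days = (now - last_pipeline_success).days
        findings.append(f"Data pipeline last succeeded {days} days ago; check the job logs.")

    return findings


if __name__ == "__main__":
    # Hypothetical timestamps standing in for values pulled from your own monitoring.
    findings = check_system_health(
        last_index_update=datetime(2025, 1, 1, tzinfo=timezone.utc),
        last_pipeline_success=datetime(2025, 1, 5, tzinfo=timezone.utc),
    )
    for finding in findings:
        print(finding)
    if not findings:
        print("System-level checks pass; component-level tuning is more likely to pay off.")
```

The code matters less than the habit it encodes: when the system-level checks are cheap and routine, “check the pipeline” happens before “tune the model harder.”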
Context
Chip Huyen is an AI researcher, educator, and author of “Designing Machine Learning Systems” and “AI Engineering.” She teaches ML systems at Stanford and works with companies on AI strategy. Her debugging example illustrates the critical difference between local optimization (fixing one component) and system thinking (understanding how components interact).
For experienced executives overseeing AI implementations, this holistic perspective is critical: you’ve watched enough complex systems fail to know that the problem is rarely where the junior engineers are looking. That wisdom comes from repeatedly seeing the full stack fail in surprising ways.