
This is the first in a series of articles where the Lean Enterprise Institute and I will explore the unfolding intersection of Lean and AI in the years ahead. Each month, we'll examine different facets of this convergence—the promising possibilities, the practical applications, and yes, the cautionary tales that come with any transformative technology. To be fully transparent, I began this journey as a deep skeptic. After two years of experimentation, learning, and sometimes surprising discoveries, I've largely flipped my position. I now believe AI can be a significant net positive for Lean practitioners, though certainly not in every dimension and not without careful consideration. In this inaugural article, let me share what changed my mind.
When I first encountered Large Language Models through an early version of ChatGPT, my reaction was dismissive: "cute but ineffective." The technology felt like a parlor trick—impressive on the surface but fundamentally unsuited for the rigorous demands of reasoning or problem-solving.
My disappointment centered on what I came to call the "70-80% problem." The AI would generate responses that seemed reasonable at first glance, but closer examination revealed critical flaws. It hallucinated facts with complete confidence, offering made-up statistics and non-existent case studies as if they were gospel truth. Its responses were frustratingly generic, providing the kind of surface-level analysis you might get from someone who had read about a topic like Lean in a business magazine but never lived it on the shop floor. Most troubling was its overconfidence—it never acknowledged uncertainty or the limits of its knowledge, presenting shallow insights with the same authoritative tone as genuine expertise. For someone serious about root cause analysis and systematic problem-solving, this combination was a non-starter. My initial verdict was clear: interesting technology perhaps, but not ready for real work.
Despite my doubts, I was intrigued by the potential of Large Language Models and how they might augment human thinking in areas like problem definition, analysis, and reflection. I threw myself into learning mode—devouring books, online courses from Stanford University, academic papers, YouTube videos, and technical tutorials. Topics like prompt engineering, Retrieval Augmented Generation (RAG), fine-tuning, reinforcement learning, transformer architecture, and AI coding began to dominate my daily news feed. The more I studied, the more I realized the potential scope was far larger than I'd initially grasped. I began to understand some of the "hype" flooding the news cycle, even if I remained skeptical about near-term practical applications for Lean work.
About a year ago, I decided to stop theorizing and start building. I needed to test things myself to see what might actually be possible. A timely message from a retired scientist at one of our national labs—where I've contracted for many years—added urgency to this exploration. He was a big fan of my book Four Types of Problems and urged me to investigate how AI could enhance basic problem-solving and skills development, seeing potential I hadn't yet recognized. I began creating small prototypes and running experiments. Each successive build left me more intrigued and less doubtful. The gap between what I expected and what I discovered was widening, and not in the direction I'd anticipated.
As my coding skills grew (with massive AI assistance, I should note), I could start controlling model outputs in more sophisticated ways. While the vast majority of people interact with tools like ChatGPT through the chat box—essentially using it as an improved Google Search—I discovered something far more powerful. By writing code that calls the models directly through their APIs, I could declare specific behaviors, have the AI consult databases of "preferred knowledge," and shape its reasoning process. The improvement in response quality was dramatic. Instead of generic answers, I was getting targeted, contextualized analysis that actually reinforced Lean principles.
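The mechanics of "declaring behaviors in code" are simpler than they may sound. Here is a minimal Python sketch of the idea: a coach persona and a few knowledge snippets (both invented for illustration) are assembled programmatically into the messages a chat-completion API would receive, rather than typed into a chat box.

```python
# Hypothetical sketch: shaping an LLM's behavior in code instead of a chat box.
# The coach persona, the "preferred knowledge," and the output rules are all
# declared before the model ever sees the user's problem. The persona text and
# knowledge snippets below are invented for illustration.

PREFERRED_KNOWLEDGE = [  # stand-in for a curated Lean knowledge base
    "A good 5 Why chain links each answer causally to the previous why.",
    "Countermeasures should address the root cause, not a symptom.",
]

def build_coach_messages(problem_statement: str) -> list[dict]:
    """Assemble system + user messages for a chat-completion style API."""
    system_prompt = (
        "You are a Lean problem-solving coach. Critique the user's thinking; "
        "do not solve the problem for them. Acknowledge uncertainty openly.\n"
        "Ground your feedback in this preferred knowledge:\n- "
        + "\n- ".join(PREFERRED_KNOWLEDGE)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem_statement},
    ]

messages = build_coach_messages("The machine stopped. Why? The fuse blew.")
# These messages would then be sent to a provider's chat endpoint; the system
# message, not the end user, controls how the model behaves.
```

The key design choice is that the system message is fixed in code, so every user gets the same disciplined coaching behavior regardless of how they phrase their question.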
Then came the convergence that changed everything. A year ago, just as I was getting proficient with these techniques, the big three LLM providers—OpenAI, Anthropic, and Google—all released significant capability upgrades almost simultaneously. GPT-4, Claude 3.5, and Gemini 2.0 represented not just incremental improvements but fundamental leaps in reasoning ability, and more recent models have continued to improve. These two forces—my newfound ability to control the models and their dramatically enhanced capabilities—created an unexpected multiplier effect.
I decided to test this convergence by building something substantial: "RootCoach," a 5 Why thinking aid for problem-solving. What emerged shocked me, and the people I shared it with in meetings. The tool was producing coaching assistance better than 90% of the coaching I'd observed in the real world. The quality was remarkable, the speed was nearly instantaneous, and the cost was negligible. For the first time, I wasn't comparing AI to some theoretical ideal—I was comparing it to actual human performance in the field. And the AI was winning on multiple fronts.
My "aha" moment came with RootCoach and the 5 Why staircase coaching feedback tool I'd built. But the implications went far beyond that single application. If the model could effectively coach and critique root cause thinking, I realized, it could handle the other steps of problem-solving as well. This sparked a six-month phase of intensive development, building apps that attempted to mimic each step of systematic problem-solving and generate A3-type reports.
The early versions were, frankly, unimpressive. But each iteration got noticeably better. Here I need to give a huge shout-out to Fabrice Bernhard, co-founder of Theodo, author of the Lean Tech Manifesto, and friend of the Lean Enterprise Institute. He patiently critiqued my early errant attempts, suggesting improvements to user interfaces and ways to handle charts and images that transformed clunky prototypes into usable tools.
Let me be clear about my approach: I never asked "can the models think for me and do my homework?" That's the wrong question entirely. Instead, I consistently asked "can the models critique my thinking?" and "would I value this advice?" This distinction is crucial. I wasn't trying to replace human judgment—I was trying to augment it with consistent, high-quality coaching feedback.
What finally made everything click was a combination of converging improvements: better models with enhanced reasoning capabilities, my improved ability to control outputs through code, the compound effect of multiple incremental improvements, and most importantly, applying Lean thinking to the problem itself. I wasn't just throwing technology at the issue; I was systematically working to get the AI to focus and respond the way an experienced, helpful coach would in the real world. When I finally arrived at a solution that "worked" to my satisfaction, I knew something fundamental had shifted—not just in the technology, but in what was now possible for Lean practitioners everywhere.
Leaving my skeptic stance behind, I started to reflect on what this could mean for Lean thinking in general, and problem-solving specifically. The implications were staggering.
There is a profound shortage of high-quality coaches in the Lean world. I was fortunate to work in Japan for Toyota Motor Corporation, surrounded by excellent coaches—but even there, I often had to wait until day's end or a convenient break in the work to get coaching feedback. The best coaching was always in short supply, always constrained, always delayed, even at Toyota. This is the reality everywhere: good coaches are overwhelmed, great coaches are rare, and most practitioners get little to no coaching at all.
One unique feature of AI models completely changes this equation: they're available 24/7 and can scale in ways humans simply cannot. With an AI model coded the right way, I was never more than 30 seconds away from solid coaching advice. No scheduling, no waiting, no hesitation about "bothering" a busy expert. And I could even give it the personality I desired.
During a phone call with LEI's management staff, I made what started as a joke: "I could put John Shook's coaching brain for problem-solving into an LLM." But even as I said it, I realized it wasn't really a joke anymore. By building a database with LEI's materials and creating a specialized knowledge base, I could essentially replicate effective problem-solving coaching at scale. The experiment isn't complete—it's not yet at the 100% standard I'm aiming for—but it's already delivering higher quality feedback than most practitioners are likely to receive in their organizations.
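Building a database of LEI materials for a coaching model boils down to a retrieval step: find the most relevant passages from a curated library and feed them to the model along with the practitioner's question. The sketch below illustrates the idea with simple word-overlap ranking; a production system would use embedding similarity instead, and the passages shown are invented stand-ins, not actual LEI material.

```python
# Hypothetical sketch of a "specialized knowledge base": retrieve the most
# relevant curated passages before asking the model to coach. Real systems
# rank by embedding similarity; word overlap is used here only to keep the
# sketch self-contained. The passages are invented stand-ins.

KNOWLEDGE_BASE = [
    "An A3 report states the problem, background, analysis, and countermeasures on one page.",
    "Go to the gemba and observe the actual process before forming conclusions.",
    "A 5 Why analysis should end at a root cause you can act on.",
]

def retrieve(query: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Rank passages by how many words they share with the query, best first."""
    query_words = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

context = retrieve("How deep should my 5 Why analysis go?", KNOWLEDGE_BASE)
# The retrieved passages are prepended to the coaching prompt, so the model's
# feedback reflects the curated material rather than generic internet advice.
```

This is what keeps the coaching grounded: the model is steered toward a vetted body of knowledge instead of whatever it absorbed during training.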
Even if stuck at this 90% level, the democratization potential became clear. This could eliminate one of Lean's most persistent obstacles: the coaching bottleneck. It wouldn't solve every challenge in problem-solving, but it could remove a barrier I'd previously seen no way around. For the first time, every employee, engineer, and manager could have access to consistent, high-quality coaching. This wasn't just an improvement—it could fundamentally change how organizations build problem-solving capability.
This is what ultimately flipped my opinion over 24 months from skeptic to believer. But let me be clear: I'm not hyping AI models the way the tech sector often does. I don't want AI to replace workers or think for people. I am not interested in the "$1 billion unicorn" companies that Silicon Valley and venture capital funds are enamored with. The basic formula I've started repeating almost daily is simple: "Humans + AI > Problems."
The primary value I see for Lean thinking—and problem-solving in particular—is that humans augmented by AI can think faster and better about their current work and associated challenges. This could shrink the lead time to solve problems and increase the quality of outputs, all at extremely low cost, while still honoring the "Respect for People" pillar of Lean thinking.
I could be totally naive or wrong in the long run. Predicting the future has a notoriously long list of impressive failures throughout recorded history. But I'm optimistic that AI models will help us remove one persistent barrier to improvement. I call it the "in-between coach"—available in all those critical gaps: after training but before a review session; after an initial team review but before the next trip to the gemba; after a team meeting but before an executive review; or just overnight while preparing for the next stage of analysis. We could all have a coach at our fingertips providing valuable feedback when the actual coach or manager isn't available.
I don't see AI as a replacement for training or workers, but as a powerful lever for improvement. What if we could all become 10X problem solvers in terms of capability? Or even 2X? More problems solved means less waste and more capable people. More capable people can develop new value-added products, processes, and services. The virtuous cycle accelerates.
Humans + AI > Problems.
This glimpse into a possible future has given me genuine reasons for optimism, and some for concern. Like a sharp knife, technology can always be misused. However, in the very near term, every employee could have access to world-class coaching. We could achieve standardization and scaling that human coaching alone can't deliver. We could remove one of the biggest barriers to Lean implementation. And rather than replacing human coaches, we could amplify their impact—freeing them to focus on the complex, nuanced work that requires human judgment while AI handles the foundational coaching that everyone needs.
The hallmarks of Lean thinking remain continuous improvement and respect for people. AI models will continue to grow in capability, and while I fully expect a tech crash on par with the dot-com bubble of 2000, the tools themselves are here to stay. I have no idea if or when they'll achieve the poorly defined goal of AGI—I'll leave that speculation to the experts. What I do know is that these models have already achieved a level of capability I didn't believe possible 24 months ago.
I'm going to continue building apps for problem-solving and adjacent fields in the years to come, if only as a hobby. At Toyota in Japan, we had a word for the spirit of making things at the heart of TPS: "monozukuri." In the past, I made engines, production processes, and management systems. Now I enjoy making applications that use AI in human-augmenting ways. The medium has changed, but the spirit of making things better remains the same.
If this topic interests you, I invite you to follow the LEI newsletter series we'll be publishing going forward. We'll explore both the potential and the pitfalls, cutting through hype to examine how AI will likely affect Lean practice in the real world. On September 17th, LEI is hosting a webinar on Lean and AI where my friend John Shook and I will discuss these themes and offer our preliminary thoughts on what this convergence might mean. John and I go back nearly 40 years to Toyota City in Japan, and we'll bring both the long incremental view and the shorter-term perspective on this sudden shift in what's possible.
If this resonates with you—whether you're skeptical, curious, or somewhere in between—please join us for the webinar and share your opinions and reactions. The future of Lean and AI will be shaped by practitioners like you, testing, learning, and finding what truly adds value.
Hope to see you there.