Skills Over Scripts

I’ve become pretty skeptical of heavily scripted approaches to AI-assisted software development. The more I work with modern models, the less convinced I am that the future is a giant chain of tightly controlled agents passing work to one another like some kind of assembly line.

Right now, I’m betting more on skills and on the model’s ability to recognize when those skills should be applied.

By “skills,” I mean reusable bits of organizational knowledge. Things like how we modify UI components, how we handle caching safely, how we validate accessibility, how we structure tests, how we deploy, or even how we update Jira tickets. They’re less like commands and more like guardrails and guidance that the model can pull in when it realizes they’re relevant.

That feels very different from the more deterministic style of AI development that’s becoming popular right now. In those systems, development tends to move through a predefined set of agents with very specific inputs and outputs. There’s definitely value there, especially for consistency, but I also think there’s a tradeoff. The more tightly you constrain the model, the less room there is for it to actually think.

And honestly, I approach management in almost the exact same way. I’ve never liked the idea that good leadership means prescribing every step for another engineer. Most strong engineers don’t need someone hovering over them telling them exactly how to solve a problem. They need context, boundaries, guidance, and room to operate. Sometimes I’ll suggest a direction or explain tradeoffs, but creativity usually comes from giving people enough freedom to think for themselves.

AI models feel surprisingly similar. If you over-script everything, you can end up with something that looks safe and predictable on paper but loses a lot of the flexibility that makes modern reasoning models useful in the first place. At some point the model stops behaving like a thinking system and starts behaving like a very expensive bash script.

Guideposts, Not Rails

That doesn’t mean this approach is perfect. A weak model can still make bad choices, and even strong models can wander into poor architectural decisions if the surrounding guidance isn’t mature enough yet. Sometimes you don’t even notice the mistake until days later when someone realizes a questionable pattern has started spreading through the codebase. That’s why review and validation still matter so much.

What’s interesting, though, is that you can shape a model over time without turning it into a fully deterministic system. You don’t have to define every move it makes. You just need enough guideposts to stop it from going wildly off course while still allowing it to reason and adapt.

At least from what I’ve seen, those guideposts usually fall into three buckets. The first is the model itself. Modern reasoning models are already pretty good at stepping back and checking their own work. They’ll reconsider earlier decisions, compare approaches, and sometimes catch mistakes before you ever see the output. That alone has changed the equation quite a bit over the last year or so.

The second bucket is skills and agent files. These are the reusable instructions that teach the model how your organization works. A caching skill might explain how to safely work with ISR or server components. A deployment skill might define validation steps before release. A testing skill might describe expectations around accessibility or regression coverage. Over time, these skills start acting less like isolated instructions and more like institutional memory that the AI can pull from whenever it encounters a familiar situation.
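To make the idea a little more concrete, here is a minimal sketch of how skills could be represented so the model decides when to use them. Everything in it is an assumption for illustration: the `Skill` fields, the `build_index` and `load_guidance` helpers, and the example skill text are hypothetical and not taken from any particular tool. The model only sees a short index up front, and the full guidance is loaded for whichever skills it asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """One reusable piece of organizational guidance (hypothetical shape)."""
    name: str
    description: str  # one-line summary the model always sees
    guidance: str     # full instructions, loaded only when requested


# Example skills, loosely based on the ones mentioned above (contents invented).
SKILLS = [
    Skill("caching", "How to work safely with ISR and server components",
          "Check revalidation settings before changing any cached route."),
    Skill("deployment", "Validation steps required before a release",
          "Run the full validation suite and confirm it passes before release."),
    Skill("testing", "Expectations for accessibility and regression coverage",
          "Every UI change needs an accessibility check and a regression test."),
]


def build_index(skills):
    """Return the short listing that would be placed in the model's context."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def load_guidance(skills, chosen_names):
    """Return full guidance for the skills the model chose, skipping unknown names."""
    by_name = {s.name: s for s in skills}
    return [by_name[n].guidance for n in chosen_names if n in by_name]


if __name__ == "__main__":
    print(build_index(SKILLS))
    # Pretend the model decided the caching skill is relevant to its task.
    print(load_guidance(SKILLS, ["caching"]))
```

The point of the split is that nothing here scripts the workflow. The code only makes the guidance available; choosing which skill applies stays with the model.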

The third bucket is hard validation. Linting. Type checking. Automated tests. Performance checks. Deployment validation. The stuff that doesn’t care how confident the model sounds. Either the output passes or it doesn’t.
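A hard validation gate can be sketched in a few lines. This is an illustrative assumption, not a description of any specific CI setup: `run_gate`, `gate_passes`, and the check names are hypothetical, and each check is just a function that returns true or false (in practice it would wrap a linter, type checker, or test runner).

```python
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool


def run_gate(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every check and record pass/fail; a check that raises counts as a failure."""
    results = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception:
            passed = False
        results.append(CheckResult(name, passed))
    return results


def gate_passes(results: List[CheckResult]) -> bool:
    """The output is accepted only if every single check passed."""
    return all(r.passed for r in results)


if __name__ == "__main__":
    # Hypothetical stand-ins for real tools such as a linter or a test suite.
    results = run_gate({
        "lint": lambda: True,
        "type_check": lambda: True,
        "tests": lambda: False,
    })
    for r in results:
        print(f"{r.name}: {'pass' if r.passed else 'FAIL'}")
    print("accepted" if gate_passes(results) else "rejected")
```

There is no weighting and no partial credit in this sketch, which is the whole idea: one failing check rejects the change no matter how plausible the rest looks.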

Early on, it’s pretty normal to realize the AI made a questionable architectural choice after the code has already landed. When that happens, we usually add a new skill, refine an agent file, or tighten up a validation step so the same mistake becomes less likely in the future. Over time, these guideposts start piling up, and the repository slowly becomes more than just source code. It starts becoming a system that teaches the AI how the organization builds software.

Somewhere In The Middle

The interesting thing is that there’s really a whole spectrum here. On one side, you have heavily procedural systems where everything is rigid and predefined. Specific agents. Specific workflows. Specific execution paths. On the other side, you basically just have a prompt and a codebase with no structure, no standards, and no guidance at all.

I doubt either extreme is the right answer.

My guess is that the sweet spot depends on the models you’re using, the maturity of the codebase, the risk tolerance of the business, and honestly just the personality of the engineering organization itself. But at least right now, I’m increasingly convinced that the future is less about forcing AI through rigid pipelines and more about building environments that quietly shape how the AI operates.

Not controlling every decision. Just helping it make better ones.

© Karim Shehadeh