
The Hour I Kept Losing - Starting from scratch with AI to build secure tools that work

  • May 14, 2026
rob
Employee

Investigating a path search API issue for one of my customers routinely took me about an hour per cycle — constructing the complex call, running it, and parsing the multi-path response. In a Customer Success role full of Slack pings, meetings, and other customer calls, those investigations were constantly interrupted, and because I had no way to save my place mid-investigation, every interruption meant starting over from scratch while a real customer waited for an answer.

The reasons I needed to find a better way came down to three things:

  • A single path search investigation was roughly 60 minutes of careful, focused work
  • Interruptions were a constant of the job, not the exception
  • With no way to save state, each interruption reset the work to zero, multiplying the time-to-resolution for the customer

 

It All Starts with Chess

 

I'm really into chess. I'd been trying to get better, but I kept making the same kinds of mistakes. I couldn't pin down what they were just by playing. I wanted a tool that could analyze my chess.com games and surface the patterns for me.

Last year I built that with ChatGPT, but very much in the driver's seat. I'd say, “I want to write this function. How do I do it?” and paste snippets back and forth. I was outsourcing specific pieces of the work, but I was still the one driving the code. It was rudimentary, but it worked.

Earlier this year I tried again, this time with Claude, and this time I flipped the seats. Instead of staying at the wheel, I described what I wanted and got out of the way.

The very first time I ran the file Claude generated, it worked.

That floored me.

Up until then, I'd mostly thought of AI as an assistant for isolated coding tasks. It was useful, but still something I had to tightly supervise. The chess project was the first time I saw a model take a loosely defined problem and produce a working system with surprisingly little intervention from me.

And once I saw that happen, I started looking differently at the repetitive work I was doing in my actual job.

The path search investigations had a lot in common with the chess project:

  • A structured data source
  • Repetitive manual analysis
  • A predictable workflow I was executing over and over again
  • And a problem frustrating enough that I genuinely wanted it gone

That was the moment the path search work stopped looking like “just part of the job” and started looking automatable.

Three things from the chess experiment carried over directly into the path search API project:

  • The first attempt can be surprisingly good. These tools can produce something genuinely useful very early in a project.
  • Handing over the wheel beats micromanaging. Staying in control of every function was what slowed me down with ChatGPT; letting Claude design the system, and reviewing what it produced, was both faster and more effective.
  • Personal frustration is a good place to start. The best automation candidates are often the repetitive problems that annoy you enough to act on and whose edge cases you already know by heart.

 

Where It Fell Short — And What I Had to Change

 

What worked surprisingly well on smaller projects started behaving differently as complexity increased. In both the chess analyzer and the path search toolkit, growth introduced new friction: inconsistencies became more common, bugs became harder to isolate, and keeping multiple pieces working together required a more deliberate process.

The shortcomings showed up in three ways.

  • The first was context. As the toolkit grew, Claude, like any model operating inside a finite context window, started forgetting things it had just done. It would propose a fix, I'd implement it, and the original bug would still be there. Asking it to diagnose its own prior work often became circular.
  • The second was cross-app memory, or the lack of it. By the time I'd built six tools, I'd learned the hard way that a fix in one app didn't automatically carry over to the next. Unless I explicitly pointed Claude at the working implementation elsewhere, it would often reinvent the solution from scratch.
  • The third was trust. The toolkit asks users to enter API credentials, and I wasn't willing to take any single model's word for “yes, this code handles them safely.” One reviewer is not enough when the cost of being wrong is somebody else's credentials sitting on disk.

The adaptations came down to three changes:

  • Pair the models. I started using Claude to write and ChatGPT to review. Giving the models each other's work from fresh contexts surfaced issues neither consistently caught alone. Claude would regularly read ChatGPT's critique and effectively respond, “Yes, that's a very good catch.” (A minimal sketch of this write-and-review loop follows this list.)
  • Make security a written contract, not an assumption. Before either AI touched a single feature in the cleanup pass, I wrote the rules down in plain English: credentials only sent to the configured Forward URL, never written to disk, never transmitted anywhere else. Then I asked Claude to audit against that contract, and asked ChatGPT to perform the same audit independently. Only when both came back clean did I move on.
  • Maintain the process itself. The prompts that worked when the project was small stopped working as it scaled, and so did the assumption that Claude would remember what had already been solved elsewhere. I had to start explicitly pointing it at working implementations: “Look at how this was solved in the other tool, and do that here.” I also had to step back periodically and ask whether I was trying to do too much in a single turn, or whether I'd quietly broken a workflow that had been working two features earlier.
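
To make the write-and-review pairing and the contract audit concrete, here's a minimal sketch of what that loop can look like. This is not the toolkit's actual code: the model names, prompt wording, and contract text are placeholders of my own, and it assumes the official anthropic and openai Python SDKs with API keys already set in the environment.

```python
# Illustrative only: one model writes, the other audits from a fresh context,
# and both are held to the same plain-English security contract.
import anthropic
from openai import OpenAI

# The security rules, written down before any feature work (placeholder text).
SECURITY_CONTRACT = """
1. API credentials are only ever sent to the user-configured Forward URL.
2. Credentials are never written to disk, logged, or cached.
3. Credentials are never transmitted to any other endpoint.
"""

claude = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
gpt = OpenAI()                   # reads OPENAI_API_KEY from the environment


def write_with_claude(task: str) -> str:
    """Ask the writing model to draft a feature, with the contract in the prompt."""
    msg = claude.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder model name
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": f"{task}\n\nThe code must satisfy this contract:\n{SECURITY_CONTRACT}",
        }],
    )
    return msg.content[0].text


def review_with_gpt(code: str) -> str:
    """Independent audit from a fresh context, against the same written contract."""
    resp = gpt.chat.completions.create(
        model="gpt-4o",                      # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                "Audit this code against the contract below. "
                "List every violation, or reply PASS.\n\n"
                f"Contract:\n{SECURITY_CONTRACT}\n\nCode:\n{code}"
            ),
        }],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    draft = write_with_claude("Add a 'save investigation' feature to the path search tool.")
    print(review_with_gpt(draft))   # iterate until the audit comes back clean
```

The specific models matter less than the structure: the reviewer starts from a fresh context and audits against the same written contract the writer was given, so neither model is grading its own homework.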

In the end, it’s not just the code that needs maintenance.

 

What Came Out of It

 

The toolkit didn't start as a toolkit. It started as a much smaller idea:

“Let me save a query so I can pick up where I left off.”

From there, it grew incrementally because each improvement solved a real operational problem:

  • Saving and resuming investigations (see the sketch after this list)
  • Reusing common query patterns
  • Reducing repetitive manual parsing
  • Moving between investigations without losing context

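To give a sense of what “save a query so I can pick up where I left off” means in practice, here's a minimal sketch of saving and resuming investigation state. The file layout, field names, and query shape are my own illustration rather than the toolkit's real format, and, in keeping with the security contract above, credentials never go into the saved state.

```python
# Illustrative sketch: persist an investigation locally so an interruption
# doesn't reset the work to zero. All names and the file format are hypothetical.
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

STATE_FILE = Path("investigations.json")   # hypothetical local state file


@dataclass
class Investigation:
    name: str                 # e.g. the customer or ticket this belongs to
    query: dict               # the path search request body, ready to re-run
    notes: list[str] = field(default_factory=list)   # findings so far


def save(inv: Investigation) -> None:
    """Write the investigation to disk; credentials are never part of this state."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[inv.name] = asdict(inv)
    STATE_FILE.write_text(json.dumps(state, indent=2))


def resume(name: str) -> Investigation:
    """Reload a saved investigation and pick up where the last session stopped."""
    state = json.loads(STATE_FILE.read_text())
    return Investigation(**state[name])


# Usage: save before the Slack ping wins, resume after the meeting.
save(Investigation(name="acme-ticket-1234",
                   query={"srcIp": "10.0.0.5", "dstIp": "10.1.2.3"},
                   notes=["first hop resolves as expected"]))
print(resume("acme-ticket-1234").notes)
```
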
The biggest change wasn't that the work disappeared. It was that I could step away from an investigation and return to it without losing momentum.

What had previously required uninterrupted hour-long blocks became manageable in smaller chunks, which dramatically reduced the operational friction of supporting customers during a busy day.

 

If You're Thinking About Getting Started

 

  • Start with the smallest possible version of what you want. Don’t build a platform or even a toolkit; just address the one friction point that keeps eating your day.
  • Catch yourself when you start mentally planning the implementation yourself. That habit is probably the single biggest tax on your time now; a lot of the boilerplate simply isn't yours to write anymore.
  • If the project touches anything sensitive, write your security requirements in plain English before you write any code, and have multiple systems audit against them independently. One reviewer isn't enough when the consequences belong to someone else.
  • And when you get stuck (because you will), don't just ask the model for another answer. Ask whether the problem is actually in the way you're framing the task.
  • In my experience, learning how to manage the collaboration matters more than learning how to generate the code.

 

 

The Forward path search toolkit is on GitHub if you want to see what came out of all this, or fork it.

https://github.com/donotquestionauthority/fwd-path-search-toolkit.git

If you build something with the same approach, I'd genuinely love to hear about it. Please share any questions and successes in the comments.