“Vibe coding” (using AI models to help write code) has become part of everyday development for many teams. It can be a huge time-saver, but it can also lead to over-trusting AI-generated code, opening the door to security vulnerabilities.
Intruder’s experience serves as a real-world case study in how AI-generated code can impact security. Here’s what happened and what other organizations should watch for.
When We Let AI Help Build a Honeypot
To deliver our Rapid Response service, we set up honeypots designed to collect early-stage exploitation attempts. For one of them, we couldn’t find an open-source option that did exactly what we wanted, so we did what plenty of teams do these days: we used AI to help draft a proof-of-concept.
It was deployed as intentionally vulnerable infrastructure in an isolated environment, but we still gave the code a quick sanity check before rolling it out.
A few weeks later, something odd started showing up in the logs. Files that should have been stored in directories named after attacker IP addresses were instead appearing under payload strings, which made it clear that user input was ending up somewhere we didn’t intend.
The Vulnerability We Didn’t See Coming
A closer inspection of the code showed what was going on: the AI had added logic to pull client-supplied IP headers and treat them as the visitor’s IP.
This would only be safe if the headers come from a proxy you control; otherwise they’re effectively under the client’s control.
This means the site visitor can easily spoof their IP address or use the header to inject payloads, which is a vulnerability we often find in penetration tests.
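To make the pattern concrete, here is a minimal Go sketch of the kind of logic described above. The article doesn’t show the original code, so the handler, the X-Forwarded-For header, and the “captures” directory are illustrative assumptions rather than Intruder’s actual implementation:

```go
package main

import (
	"net/http"
	"os"
	"path/filepath"
)

// handler is a hypothetical reconstruction of the risky pattern: whatever
// the client puts in X-Forwarded-For is treated as "the visitor's IP",
// and that value then flows, unvalidated, into a filesystem path.
func handler(w http.ResponseWriter, r *http.Request) {
	ip := r.Header.Get("X-Forwarded-For") // fully attacker-controlled
	if ip == "" {
		ip = r.RemoteAddr // TCP peer address, includes the port
	}

	// The spoofed "IP" becomes a directory name on disk.
	dir := filepath.Join("captures", ip)
	_ = os.MkdirAll(dir, 0o755)
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/", handler)
	_ = http.ListenAndServe(":8080", nil)
}
```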
In our case, the attacker had simply placed their payload into the header, which explained the unusual directory names. The impact here was low and there was no sign of a full exploit chain, but it did give the attacker some influence over how the program behaved.
It could have been much worse: if we had been using the IP address in another manner, the same mistake could have easily led to Local File Disclosure or Server-Side Request Forgery.
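For comparison, here is a hedged sketch of safer handling (again illustrative, not Intruder’s actual fix): prefer the connection’s peer address, only consult forwarded headers when the request arrives through a proxy you control, and reject anything that doesn’t parse as an IP before it reaches a file path, log line, or outbound request.

```go
package main

import (
	"net"
	"net/http"
	"strings"
)

// clientIP extracts a usable client address. Forwarded headers are only
// consulted when behindTrustedProxy is true, and the result must parse as
// an IP before it is returned.
func clientIP(r *http.Request, behindTrustedProxy bool) (string, bool) {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}

	if behindTrustedProxy {
		if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" {
			// With a single proxy we control, the right-most entry is the
			// address the proxy itself observed; everything to its left is
			// client-supplied and untrustworthy.
			parts := strings.Split(fwd, ",")
			host = strings.TrimSpace(parts[len(parts)-1])
		}
	}

	if net.ParseIP(host) == nil {
		return "", false // spoofed or malformed value: don't use it
	}
	return host, true
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		ip, ok := clientIP(r, false) // not behind a proxy in this sketch
		if !ok {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		_, _ = w.Write([]byte(ip))
	})
	_ = http.ListenAndServe(":8080", nil)
}
```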
Why SAST Missed It
We ran Semgrep OSS and Gosec on the code. Neither flagged the vulnerability, although Semgrep did suggest a few unrelated improvements. That’s not a failure of those tools; it’s a limitation of static analysis.
Detecting this particular flaw requires contextual understanding that the client-supplied IP headers were being used without validation, and that no trust boundary was enforced.
It’s the kind of nuance that’s obvious to a human pentester, but easily missed when reviewers place a little too much confidence in AI-generated code.
AI Automation Complacency
There’s a well-documented idea from aviation that supervising automation takes more cognitive effort than performing the task manually. The same effect seemed to show up here.
Because the code wasn’t ours in the strict sense — we didn’t write the lines ourselves — the mental model of how it worked wasn’t as strong, and review suffered.
The comparison to aviation ends there, though. Autopilot systems have decades of safety engineering behind them, whereas AI-generated code does not. There isn’t yet an established safety margin to fall back on.
This Wasn’t an Isolated Case
This wasn’t the only case where AI confidently produced insecure results. We used the Gemini reasoning model to help generate custom IAM roles for AWS, which turned out to be vulnerable to privilege escalation. Even after we pointed out the issue, the model politely agreed and then produced another vulnerable role.
It took four rounds of iteration to arrive at a safe configuration. At no point did the model independently recognize the security problem – it required human steering the entire way.
Experienced engineers will usually catch these issues. But AI-assisted development tools are making it easier for people without security backgrounds to produce code, and recent research has already found thousands of vulnerabilities introduced by such platforms.
Yet as we’ve shown, even experienced developers and security professionals can overlook flaws when the code comes from an AI model that looks confident and behaves correctly at first glance. And because end users have no way of telling whether the software they rely on contains AI-generated code, the responsibility sits firmly with the organizations shipping it.
Takeaways for Teams Using AI
At a minimum, we recommend not letting non-developers or staff without a security background rely on AI to write code.
And if your organization does allow experts to use these tools, it’s worth revisiting your code review process and CI/CD detection capabilities to make sure this new class of issues doesn’t slip through.
We expect AI-introduced vulnerabilities to become more common over time.
Few organizations will openly admit when an issue came from their use of AI, so the scale of the problem is probably larger than what’s reported. This won’t be the last example — and we doubt it’s an isolated one.
Book a demo to see how Intruder uncovers exposures before they become breaches.
Author
Sam Pizzey is a Security Engineer at Intruder. Previously a pentester a little too obsessed with reverse engineering, currently focused on ways to detect application vulnerabilities remotely at scale.
Sponsored and written by Intruder.





