The unnerving thing about AI-generated code is not that it looks strange.
It is that it often looks completely normal.
The function is named well. The error handling is plausible. The test passes. The comments are confident. The implementation borrows the shape of code that has worked before. A reviewer can read it quickly and feel the small relief of familiarity.
That relief is the trap.
AI code security is hard because the code can be syntactically clean and locally reasonable while being wrong for the system it has entered.
What is AI code security?
AI code security means managing the security risks created when AI systems help write, review, modify, or reason about software.
The phrase has two common meanings.
First, it can mean securing AI-generated code: reviewing the code that models produce before it reaches production.
Second, it can mean using AI for code security: applying models to code review, vulnerability triage, remediation, and AppSec workflows.
Both matter. This post focuses on the first problem: generated code that looks safe enough to merge but lacks the organizational context that makes software secure.
Plausibility is the vulnerability
Human reviewers are pattern-matchers. We skim for things that look off: raw SQL, missing auth middleware, suspicious dependencies, secrets in logs, strange parsing, obvious injection.
AI-generated code can avoid many of those visual alarms while still making a bad product assumption.
It might check that a user is logged in but not that the user owns the object. It might use the right validation library with the wrong schema. It might add a convenient export endpoint without preserving the audit trail. It might follow a public example instead of the local convention that carries the real security control.
The danger is not cartoonishly bad code. The danger is code that looks reviewable enough to pass.
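To make one of those failure modes concrete, here is a sketch using the zod validation library, with hypothetical field names. Both schemas are idiomatic. Only one matches the product.

const { z } = require('zod');

// Plausible generated schema: right library, wrong shape. It happily
// accepts a field the product must never take from the client.
const updateProfileGenerated = z.object({
  displayName: z.string(),
  role: z.string(),
});

// What the local convention actually requires (hypothetical): a tight
// allowlist that rejects unexpected fields instead of passing them on.
const updateProfile = z.object({
  displayName: z.string().max(80),
}).strict();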
A tiny plausible bug
Imagine a generated handler for downloading a project report.
app.get('/projects/:projectId/report', requireLogin, async (req, res) => {
  const report = await reports.buildProjectReport(req.params.projectId);
  if (!(await membership.canView(req.user.id, req.params.projectId))) {
    return res.status(403).send({ error: 'Forbidden' });
  }
  return res.send(report);
});

At a glance, the code has login, authorization, and a 403. It looks responsible.
The bug is order. The report is built before the membership check. If report generation reads sensitive data, writes an audit event, warms a cache, calls another service, or logs details, the sensitive work has already happened.
A scanner may or may not catch that. A reviewer comparing this code to local convention might notice that every other report endpoint checks membership before touching report data.
The secure version is not clever. It is boring:
app.get('/projects/:projectId/report', requireLogin, async (req, res) => {
  if (!(await membership.canView(req.user.id, req.params.projectId))) {
    return res.status(403).send({ error: 'Forbidden' });
  }
  const report = await reports.buildProjectReport(req.params.projectId);
  return res.send(report);
});

The test that matters is not “owner can download report.” The test that matters is “a user from another project never triggers report generation.”
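Here is a minimal sketch of that negative test, assuming a Jest-style runner and supertest, with the handler above mounted on app. The outsiderToken session and the reports module are stand-ins for whatever your project actually uses.

const request = require('supertest');

test('a non-member never triggers report generation', async () => {
  // Spy on the sensitive work so the test can prove it never ran.
  const buildSpy = jest.spyOn(reports, 'buildProjectReport');

  // outsiderToken is assumed to authenticate a user with no membership
  // in project 42.
  const res = await request(app)
    .get('/projects/42/report')
    .set('Authorization', `Bearer ${outsiderToken}`);

  expect(res.status).toBe(403);
  // The security property: a forbidden request does no sensitive work.
  expect(buildSpy).not.toHaveBeenCalled();
});

The assertion on buildSpy is the part a happy-path test never gives you.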
This is why AI code security needs context, not just pattern matching.
The risks that hide in plain sight
Missing product context
AI does not know which customers have access to which records unless the code, prompt, or retrieval context tells it. It does not know that a certain field is sensitive because of a contract. It does not know that a support role can view metadata but not content. It does not know that the billing workflow has a special audit requirement because of an incident two years ago.
Those facts live in the organization as much as in the repository.
When generated code touches business logic, reviewers should assume the model had less context than the production system requires.
Local convention drift
Security teams love policies. Codebases run on conventions.
Use this helper. Read through this layer. Redact with this logger. Emit this audit event. Put external calls behind this wrapper. Do not access that table directly. Do not return raw ORM objects from customer-facing APIs.
AI-generated code can drift from those conventions while still looking idiomatic. That drift is a security issue because the convention may be where the control lives.
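A hypothetical sketch of that drift. Every name here is invented; the point is where the control lives.

// Drifted: an idiomatic-looking direct query that skips the tenant
// filter and the audit event the data layer would have added.
async function listOpenInvoicesDrifted(db) {
  return db.query('SELECT * FROM invoices WHERE status = $1', ['open']);
}

// Conventional: the same read through the local data-access layer,
// which is where the tenant scope and audit hook actually live.
async function listOpenInvoices(dataLayer, tenantId) {
  return dataLayer.forTenant(tenantId).invoices.findByStatus('open');
}

A scanner sees two clean reads. A reviewer who knows the convention sees a missing control.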
Confident dependency choices
An AI assistant can recommend a package with the confidence of someone who will not be on call when it breaks.
Maybe the package is unnecessary. Maybe it is abandoned. Maybe it solves a problem your platform already solved. Maybe it pulls in a transitive dependency tree nobody reviewed. Maybe the example configuration is fine for a tutorial and bad for your production environment.
AI-generated code does not get a supply chain discount.
Tests that prove the wrong thing
Generated tests often prove that the generated implementation behaves as generated. That is not the same as proving the security property.
A test that says an owner can export a project is useful. A test that says a user cannot export someone else’s project is security evidence. A test that says malformed input fails safely is security evidence. A test that says sensitive fields do not appear in logs or responses is security evidence.
If the tests only walk the happy path, they may be theater.
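For contrast, a sketch of a test that is security evidence rather than theater, again assuming Jest and supertest, with hypothetical routes and field names.

test('project export never exposes sensitive fields', async () => {
  const res = await request(app)
    .get('/projects/42/export')
    .set('Authorization', `Bearer ${ownerToken}`);

  expect(res.status).toBe(200);
  // The property under test is absence, not presence.
  const body = JSON.stringify(res.body);
  expect(body).not.toContain('ssn');
  expect(body).not.toContain('internalNotes');
});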
Prompt and log exposure
AI code security is not only about the code that lands in the repository. It is also about the workflow used to produce and debug that code.
Developers may paste stack traces, schemas, customer examples, internal code, or production logs into AI tools. Generated features may pass user data into prompts or store model outputs in places nobody threat-modeled.
The prompt is now a data path. Treat it like one.
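A minimal sketch of treating it like one. The helper and field names are hypothetical; the structure is the point: decide what crosses the boundary before it crosses.

// Build a debugging prompt without forwarding fields you would not
// paste into a third-party support ticket.
function buildDebugPrompt(stackTrace, userRecord) {
  const safeContext = {
    id: userRecord.id,
    plan: userRecord.plan,
    // email, name, and free-text fields deliberately omitted
  };
  return [
    'Explain this stack trace:',
    stackTrace,
    'Redacted user context:',
    JSON.stringify(safeContext),
  ].join('\n');
}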
Scanning is necessary and insufficient
Static analysis, dependency scanning, secrets detection, and secure coding rules still matter. OWASP’s Code Review Guide and Top 10 remain useful baselines.
But AI-generated code puts pressure on the gap between known patterns and product-specific judgment. A scanner may not know that the new endpoint should be tenant-scoped. It may not know that a particular log field violates a customer promise. It may not know that generated code bypassed a service boundary that exists for audit reasons.
The answer is not to throw away scanners. The answer is to add review that understands change, ownership, and context.
A review habit for AI-generated code
When reviewing AI-generated code, ask a different set of questions.
What did the model probably not know?
Which local convention did this change bypass or reimplement?
What sensitive data moved, and where did it land?
What new dependency, permission, route, queue, prompt, or external call appeared?
Which security property should the tests prove?
Who owns the risk if this behaves differently in production than the author expects?
That last question is uncomfortable on purpose. AI assistance does not remove human ownership. It makes ownership more important.
Questions teams ask
Is AI-generated code less secure than human-written code?
Not automatically. The risk is that generated code can be produced faster, accepted with less understanding, and merged before someone checks product-specific security assumptions.
Can SAST catch AI-generated code vulnerabilities?
Sometimes. SAST is still valuable for known patterns. It is weaker when the vulnerability depends on business logic, local conventions, authorization order, tenant context, or data handling promises.
What is the first control to add?
Review AI-generated pull requests that touch sensitive paths before merge. Start with authorization, customer data, dependencies, infrastructure, prompts, logs, and external integrations.
How should AppSec talk about this without sounding anti-AI?
Frame it as ownership and evidence. Developers can use AI. The code still needs proof that it follows the system’s security model.
The useful posture
Do not treat AI-generated code as magical. Do not treat it as radioactive. Treat it as code written by a very fast contributor with no memory of your last incident review and no intuition for your customers.
That contributor can be useful. It can also be dangerously convincing.
If your team is trying to bring this review into the pull request instead of a late security queue, request an Enclave demo.
AI-generated code does not need to look dangerous to be dangerous. That is exactly the problem.
