Keep this in mind before you open the pull request:
AI-generated code is an untrusted dependency that arrived disguised as first-party code.
That does not mean the code is bad. It means the review should require proof.
This secure code review checklist is intentionally time-boxed. It is for the reviewer who has twenty minutes before standup, a pull request in front of them, and a vague feeling that “the AI wrote most of it” is not a security control.
The fast pass/fail map
Before the detailed review, scan for these gates.
Safe to keep reviewing: the author can explain the change, the code follows local security patterns, sensitive data paths are obvious, and tests cover at least one failure case.
Ask for changes: the PR touches authorization without negative tests, returns internal objects directly, adds a runtime dependency casually, logs or prompts sensitive data, or bypasses the usual safe path.
Escalate: the risk depends on business policy, customer promises, architecture, compliance, or a new trust boundary.
Pause the merge: nobody can explain why the generated diff is safe.
Now do the twenty-minute review.
Minute 0 to 2: name the change
Do not start with the code. Start with the claim.
Write one sentence:
This pull request lets which actor do which new thing?
Which account, tenant, repository, integration, workflow, or data set does it touch?
What would be bad if this behavior were available to the wrong person?
If nobody can answer that quickly, slow down. A pull request that cannot be explained cannot be securely reviewed.
Minute 2 to 5: check the trust boundary
Look for the point where the system stops trusting the caller.
Ask:
Is authentication required before work begins?
Is authorization checked against the right resource, not just the right role?
Does the code validate tenant, organization, workspace, repository, or project ownership?
Did an existing check move to a later point in the request path?
Did the change create a background job, webhook, export, or callback that bypasses the normal request middleware?
AI-generated code often includes a check that looks right at a glance. The review question is whether it is the right check in the right place, as in the sketch below.
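A minimal sketch of the shape to look for, in a FastAPI-style handler. The helpers get_current_user, is_project_member, and build_report are hypothetical stand-ins for whatever this codebase already provides:

```python
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()

def get_current_user():
    """Stand-in for the codebase's real authentication dependency."""
    raise NotImplementedError

def is_project_member(user, project_id: str) -> bool:
    """Stand-in for the codebase's real ownership check."""
    raise NotImplementedError

def build_report(project_id: str) -> dict:
    """Stand-in for the actual report query."""
    raise NotImplementedError

@app.get("/projects/{project_id}/report")
def download_report(project_id: str, user=Depends(get_current_user)):
    # The check binds this user to this project, and it runs before
    # any report work begins, not after the data is already built.
    if not is_project_member(user, project_id):
        raise HTTPException(status_code=403)
    return build_report(project_id)
```

The two things to verify are exactly the two things a plausible-looking check can get wrong: the check targets the resource, not just a role, and it sits before the work.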
Minute 5 to 8: follow the data
Trace the sensitive thing.
It might be a token, email address, source file, customer record, prompt, log line, report, secret, or internal identifier. Follow it from input to storage to output.
Red flags:
Sensitive data enters logs, analytics, error messages, traces, or model prompts.
A response includes fields copied from an internal object without a deliberate allowlist.
A cache key ignores tenant or permission context.
An export path uses an internal query that was never meant to be customer-facing.
A webhook sends more data than the integration needs.
If the pull request touches AI prompts or retrieval, include those in the data flow. The prompt is part of the system now.
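Two of those red flags have small, checkable fixes. A minimal sketch, with hypothetical field and key names:

```python
# Deliberate allowlist: only named fields ever reach a response,
# log line, or prompt. Anything not listed here -- internal actor
# IDs, support notes, soft-delete flags -- cannot leak by accident.
PUBLIC_ACTIVITY_FIELDS = ("id", "action", "created_at")

def to_public_dict(record: dict) -> dict:
    return {field: record[field] for field in PUBLIC_ACTIVITY_FIELDS}

# Tenant-aware cache key: tenant and user context are part of the key,
# so one tenant's cached result can never be served to another.
def report_cache_key(tenant_id: str, user_id: str, report: str) -> str:
    return f"report:{tenant_id}:{user_id}:{report}"
```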
Minute 8 to 11: compare against the house style
Security controls often live in boring consistency.
Find the closest existing feature and compare:
Same middleware?
Same permission helper?
Same data access layer?
Same logging and redaction pattern?
Same transaction boundaries?
Same error behavior?
Same tests for unauthorized access?
Generated code loves to invent a parallel path. Parallel paths are where controls go to die.
A useful reviewer sentence is: “This works, but it does not look like how this codebase normally does the safe version of this operation.”
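For concreteness, here is what a parallel path often looks like next to the house version. The helper names are hypothetical, but the shape is typical of generated diffs:

```python
# Parallel path: a fresh query that skips the shared data layer,
# and with it the tenant scoping and redaction that layer applies.
def activity_rows_parallel(db, project_id: str):
    return db.execute(
        "SELECT * FROM activity WHERE project_id = ?", (project_id,)
    ).fetchall()

# House style: the existing helper that already enforces the controls.
# (repo.activity.for_project is a stand-in for whatever this codebase has.)
def activity_rows_house_style(repo, project_id: str, user):
    return repo.activity.for_project(project_id, as_user=user)
```

Both functions return rows. Only one of them returns rows the caller was allowed to see.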
Minute 11 to 14: interrogate dependencies and snippets
AI assistants are very good at producing code that imports one more thing.
Ask:
Did this add a runtime dependency for something the platform already provides?
Did the dependency change arrive inside a feature branch where nobody was looking for supply chain risk?
Is the package maintained, necessary, and approved for this kind of use?
Did the generated code copy an example with permissive defaults?
Is the cryptography, authentication, parsing, sandboxing, or file-handling code hand-rolled?
A new dependency is not just an implementation detail. It is a new trust relationship.
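One concrete case of the hand-rolled pattern worth flagging: a generated token comparison that uses == instead of the constant-time primitive the standard library already ships. A minimal Python sketch:

```python
import hmac

# Generated pattern worth flagging: an early-exit string comparison
# leaks timing information about how much of the token matched.
def token_matches_unsafe(provided: str, expected: str) -> bool:
    return provided == expected

# The platform already provides the safe version; no new dependency,
# no hand-rolled crypto.
def token_matches(provided: str, expected: str) -> bool:
    return hmac.compare_digest(provided.encode(), expected.encode())
```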
Minute 14 to 17: read the tests like an attacker
Happy-path tests prove the feature works. They do not prove the feature is safe.
Look for tests that fail when:
The user is unauthenticated.
The user is authenticated but owns the wrong resource.
The tenant, organization, or repository does not match.
Input is missing, malformed, oversized, or hostile.
A downstream service fails.
A sensitive field is present but should not be returned.
If the change modifies authorization and only tests success, the review is incomplete.
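A minimal pytest sketch of what those negative tests can look like. The client and auth_headers fixtures are hypothetical stand-ins for this codebase's test harness:

```python
def test_unauthenticated_request_is_rejected(client):
    response = client.get("/projects/p1/report")
    assert response.status_code == 401

def test_user_from_another_project_is_rejected(client):
    response = client.get(
        "/projects/p1/report", headers=auth_headers("member-of-p2")
    )
    assert response.status_code == 403

def test_sensitive_fields_never_leave_the_api(client):
    response = client.get(
        "/projects/p1/report", headers=auth_headers("admin-of-p1")
    )
    assert "support_notes" not in response.text
    assert "internal_actor_id" not in response.text
```

Each test names a security property. If the property breaks, a test fails.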
Minute 17 to 20: choose the outcome
Do not leave a vague comment and hope the next reviewer understands your concern. Pick one of four outcomes.
Approve when the change is low-risk, follows local patterns, and has tests for the relevant failure modes.
Ask for changes when you found a concrete issue the author can fix before merge: missing ownership check, unsafe output, unnecessary dependency, weak validation, bad logging.
Escalate when the code might be okay, but the decision belongs to AppSec, architecture, legal, compliance, or the service owner.
Pause when nobody can explain the generated diff well enough to own it.
That last outcome is underrated. “The AI wrote it” is not a substitute for engineering accountability.
A filled-out example
A generated pull request adds an endpoint for downloading a project activity report.
The author can explain the feature: project admins need a CSV. Good.
The trust-boundary review finds the handler checks that the user is logged in but does not check project membership until after building the report. Bad.
The data-flow review finds the CSV includes internal actor IDs and support notes that are not shown in the UI. Bad.
The dependency review finds no new package. Fine.
The tests cover “admin downloads report” but not “user from another project attempts download.” Incomplete.
Outcome: ask for changes. Move the membership check before the report query, use an allowlist for CSV columns, and add a wrong-project test.
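A sketch of the first two fixes together, reusing the hypothetical app, get_current_user, and is_project_member helpers from the trust-boundary sketch above; query_activity is also a stand-in:

```python
import csv
import io

from fastapi import Depends, HTTPException
from fastapi.responses import PlainTextResponse

# Deliberate allowlist: internal actor IDs and support notes are not here.
CSV_COLUMNS = ("timestamp", "action", "actor_display_name")

@app.get("/projects/{project_id}/activity.csv")
def download_activity_csv(project_id: str, user=Depends(get_current_user)):
    # Fix 1: membership check before the report query, not after.
    if not is_project_member(user, project_id):
        raise HTTPException(status_code=403)
    rows = query_activity(project_id)  # stand-in for the internal query
    # Fix 2: only allowlisted columns reach the CSV; extra keys on each
    # row are dropped instead of leaking.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return PlainTextResponse(buffer.getvalue(), media_type="text/csv")
```

The third fix is the wrong-project negative test from the minute 14 to 17 sketch.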
That is a useful secure code review. It does not require a week-long audit. It requires enough evidence to avoid trusting plausible generated code by default.
Questions reviewers ask
Is this only for AI-generated code?
No. This checklist also works for human-written pull requests. AI-generated code gets special attention because the author may understand the desired behavior better than the implementation details.
Should every AI-generated pull request get AppSec review?
No. Route by risk. Documentation, tests, and isolated refactors do not need the same scrutiny as auth, billing, tenant isolation, exports, secrets, dependencies, infrastructure, or AI prompt/data paths.
What is the most important test to add?
The negative test that proves the security property. Wrong user, wrong tenant, missing permission, malformed input, sensitive field absent, or unsafe dependency path blocked.
What if the code looks fine?
Ask for evidence anyway. Plausible code is not proof. The companion post, AI Code Security: The Real Risk of AI-Generated Code Is Plausibility, explains why.
Keep the checklist short enough to use
A checklist that takes two hours will not survive contact with a real engineering org. The point is not to make every pull request feel like a security audit. The point is to catch the AI-generated changes that look harmless while crossing a boundary.
If your team wants this kind of evidence in the pull request instead of in a late security queue, talk to Enclave.
If nobody can explain why the generated diff is safe, the diff is not ready.
