What's happening
A user logs in. They see a dashboard with records belonging to another tenant. Or the opposite — they see an empty list when they should see fifty rows. There is no error in your logs. The SDK call returned successfully. The data is in the database. Something between the database and the user is filtering incorrectly.
This is the silent-failure mode unique to AI-builder platforms: the AI builder edited your schema, your queries, and your forms, but it did not edit the Row-Level Security policy that gates the data. The policy and the code are now out of sync. Every read goes through a stale predicate, and the predicate either lets too much through or too little. Either way, no error fires.
This is a new class of bug. Traditional teams using hand-written SQL migrations have reviewers who catch policy drift in pull requests. Teams using AI builders have no review surface — the agent shipped the change directly. Base44's AI builder will happily add a `tenant_id` column to a table, rewrite every read query to filter on it, generate a new admin form, and leave the existing RLS policy filtering on the now-deprecated `org_id` column. The schema migrates. The queries migrate. The policy does not. The result is a permission bug that lives in production until a human notices.
We have audited 11 Base44 apps for this pattern in the last six months. Every single one had at least one drift case. The median time-to-detection in production is 47 days — by which point the drift has compounded across multiple AI-builder turns.
Why AI builders create RLS drift
The AI builder operates on the surfaces it sees in your prompt. When you ask it to "add multi-tenant support to the orders table," it interprets that as a schema-and-code task. It reads the table definition, adds a column, finds every read in the codebase, rewrites the where clauses, regenerates the form, and reports success. The regeneration is impressive. It is also incomplete.
The RLS policy lives in a different surface — Base44's data settings panel, not the code editor. The agent does not always pull that surface into context. Even when it does, it cannot run the policy against a test user to verify the predicate is correct. There is no execution feedback loop for RLS the way there is for compilation errors or failed tests. The agent emits whatever predicate looks plausible from the prompt and moves on.
Three structural reasons compound the problem.
First, RLS evaluation is silent. A failed type-check produces an error the agent can read and correct. A failed RLS predicate produces an empty result set the agent has no way to interpret. Without a signal, there is no correction.
Second, RLS lives outside the code surface. The AI builder's training data is dominated by application code where the security policy is colocated with the query (decorators, middleware, ORM hooks). Base44's model separates them. The agent's instinct from training is "the security check is right there in the function" — but in Base44 the check is in a settings panel the agent may not be inspecting on this turn.
Third, AI builders optimize for visible progress. The user prompted for "multi-tenant support" and the agent has already produced a schema migration, a form, three new queries, and a passing build. From the agent's reward signal, the task looks done. Surfacing "by the way, you also need to update RLS policy ID 47 on table orders, here is the new predicate" is not in its training distribution.
The net effect: AI-builder edits introduce RLS drift in approximately 73 percent of feature additions involving new tables or new tenant boundaries (sample of 11 audits, 38 distinct AI-builder feature additions). The drift is silent. The bug is real. And every additional AI-builder turn makes the divergence harder to trace.
The 6 silent-failure patterns we see most
These are the patterns we hit in audit after audit. Each one has a specific signature, a specific cause, and a specific way to detect it.
Pattern 1: New foreign key, RLS still references old column.
- Symptom: Zero rows returned to a user who should see many. Admin view shows the rows exist.
- Cause: AI builder added `team_id` and migrated reads, but the RLS policy still filters `org_id = auth.org_id()`.
- Detect: Enumerate every foreign key on the table, then enumerate every column referenced in active RLS predicates. Any FK column not in any predicate is a candidate orphan (see the sketch below).
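The check is mechanical once you have both lists. A minimal sketch, assuming you have the FK column names from a schema export and the predicate texts from the policy panel; the function name and inputs are ours, purely illustrative:

```ts
// Orphan-candidate check: FK columns that appear in no active predicate.
function findOrphanFkColumns(fkColumns: string[], predicates: string[]): string[] {
  return fkColumns.filter(
    col => !predicates.some(p => new RegExp(`\\b${col}\\b`).test(p)),
  );
}

// team_id is the new FK, but the policy still filters on org_id:
findOrphanFkColumns(["org_id", "team_id"], ["org_id = auth.org_id()"]);
// => ["team_id"], a candidate orphan
```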
Pattern 2: New table, no RLS at all.
- Symptom: Cross-tenant data leakage. Users see other users' records.
- Cause: AI builder created the table with default visibility (open to all authed users) and never came back to lock it down.
- Detect: List every table. List every table with at least one RLS policy. The set difference is your unguarded surface. In 3 of our 11 audits, this set contained a table with PII.
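The set difference is a one-liner; a sketch with hypothetical inputs you would fill from the table list and the policy export:

```ts
// Unguarded surface: tables with no RLS policy at all.
function findUnguardedTables(allTables: string[], tablesWithPolicies: string[]): string[] {
  const guarded = new Set(tablesWithPolicies);
  return allTables.filter(t => !guarded.has(t));
}

findUnguardedTables(["orders", "invoices", "contacts"], ["orders", "invoices"]);
// => ["contacts"]; check this one for PII first
```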
Pattern 3: Predicate uses a session variable that is no longer populated.
- Symptom: Records visible immediately after login disappear after a session refresh.
- Cause: Policy filters on `auth.tenant_id()` but the AI builder migrated the auth context to `auth.org_context()` and only the new context is populated.
- Detect: Diff every `auth.*` call inside RLS predicates against the auth-context functions actually invoked at session start in your code (sketched below).
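A minimal sketch of that diff. It assumes both surfaces spell the call the same way, as `auth.something(...)`; if your session-start code wraps the auth context differently, adjust the code-side extraction:

```ts
// Stale-context check: auth.* functions referenced by predicates but
// never invoked by the code that establishes the session.
const AUTH_CALL_RE = /\bauth\.(\w+)\(/g;

function findStaleAuthFunctions(predicates: string[], sessionStartSource: string): string[] {
  const inPredicates = new Set(
    predicates.flatMap(p => [...p.matchAll(AUTH_CALL_RE)].map(m => m[1])),
  );
  const inCode = new Set(
    [...sessionStartSource.matchAll(AUTH_CALL_RE)].map(m => m[1]),
  );
  return [...inPredicates].filter(fn => !inCode.has(fn));
}
```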
Pattern 4: Write path bypasses the read predicate.
- Symptom: Users can write records they cannot subsequently read. Records appear orphaned.
- Cause: INSERT policy is more permissive than SELECT policy. AI builder relaxed INSERT to ship a new form, did not touch SELECT.
- Detect: For every table, compare the predicate complexity of INSERT vs SELECT. Asymmetry is a flag: not always a bug, but always worth checking.
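One rough proxy for "predicate complexity" is the set of columns each operation's predicate references. A sketch; the column extraction here is a crude regex, a starting point rather than a SQL parser:

```ts
type Op = "SELECT" | "INSERT" | "UPDATE" | "DELETE";
interface PolicyRow { table: string; operation: Op; predicate: string }

// Crude column extraction: identifiers on the left of a comparison.
const colsOf = (predicate: string): Set<string> =>
  new Set([...predicate.matchAll(/\b(\w+)\s*(?:=|<|>| IN )/g)].map(m => m[1]));

function insertSelectAsymmetry(policies: PolicyRow[]): string[] {
  const tables = [...new Set(policies.map(p => p.table))];
  return tables.filter(table => {
    const sel = policies.find(p => p.table === table && p.operation === "SELECT");
    const ins = policies.find(p => p.table === table && p.operation === "INSERT");
    if (!sel || !ins) return false; // a missing policy is Pattern 2, not asymmetry
    const selCols = colsOf(sel.predicate);
    const insCols = colsOf(ins.predicate);
    return selCols.size !== insCols.size || [...selCols].some(c => !insCols.has(c));
  });
}
```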
Pattern 5: Predicate column was renamed in the schema but not in the policy.
- Symptom: All queries to the table return zero rows. Admin view works.
- Cause: Schema rename succeeded; RLS policy still references the old name and silently fails the predicate evaluation (or the platform substitutes NULL, depending on version).
- Detect: Run every active RLS predicate as a raw SQL `EXPLAIN` and check for unresolved column references (see the sketch below).
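Where you have direct SQL access to the underlying database (an assumption; not every Base44 setup exposes it), the probe can be a plain `EXPLAIN`, which plans the query without executing it. A sketch using the `pg` client:

```ts
import { Client } from "pg";

// Returns null if the predicate resolves, or the planner's error message
// if it references a column that no longer exists. Inputs come from your
// own policy export, so the interpolation is over trusted audit data.
async function explainPredicate(
  client: Client,
  table: string,
  predicate: string,
): Promise<string | null> {
  try {
    await client.query(`EXPLAIN SELECT 1 FROM ${table} WHERE ${predicate}`);
    return null;
  } catch (err) {
    return (err as Error).message; // e.g. column "org_id" does not exist
  }
}
```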
Pattern 6: Role expansion not reflected in policy.
- Symptom: New role (e.g., "billing") gets blocked from records they should access.
- Cause: AI builder added a new role and the membership join, but every existing RLS policy enumerates allowed roles by name and does not include the new one.
- Detect: Enumerate the roles present in the role table. For each role, find the RLS policies that mention it. Roles missing from the policy set are likely under-permissioned.
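A sketch of that diff, treating each policy as raw text; the inputs are hypothetical:

```ts
// Roles present in the role table but named in no policy text are the
// under-permissioned candidates.
function findRolesMissingFromPolicies(roles: string[], policyTexts: string[]): string[] {
  return roles.filter(
    role => !policyTexts.some(text => new RegExp(`\\b${role}\\b`).test(text)),
  );
}

findRolesMissingFromPolicies(
  ["admin", "member", "billing"],
  ["role IN ('admin', 'member')"],
);
// => ["billing"]
```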
The symptom → pattern → detection mapping fits in one table:
| Symptom | Likely RLS pattern | Detection approach |
|---|---|---|
| Zero rows where many expected | Predicate references stale column | Diff FKs vs predicate columns |
| Cross-tenant leakage | No RLS on new table | List tables minus tables-with-policies |
| Records vanish after session refresh | Stale auth context function | Diff auth.* calls in predicates vs code |
| Write succeeds, read returns nothing | INSERT/SELECT asymmetry | Compare predicate complexity per op |
| All reads zero | Renamed column not updated in predicate | EXPLAIN every predicate |
| New role blocked | Role enumeration missing | Diff roles vs role-mentions in predicates |
The verification audit — 7 steps
This is the audit we run when we are called in. It works whether you have one drift case or twelve. Allow a full day for a mid-size app the first time through; subsequent runs are faster once the matrix is built.
Step 1: Export every RLS policy
Pull the full list out of Base44's data settings panel. Capture name, table, operation (SELECT/INSERT/UPDATE/DELETE), the predicate text verbatim, and any role qualifiers. Save it as JSON or CSV — you will diff it later.
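A minimal record shape for the export. The field names are our convention for this audit, not a Base44 API shape; `referencedColumns` is filled by a parse pass over the predicate text in Step 3:

```ts
// One exported policy. Adjust fields to whatever the panel actually exposes.
interface Policy {
  name: string;
  table: string;
  operation: "SELECT" | "INSERT" | "UPDATE" | "DELETE";
  predicate: string;            // verbatim predicate text
  roles: string[];              // role qualifiers; empty means all roles
  referencedColumns: string[];  // columns parsed out of the predicate
}
```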
Step 2: Enumerate every SDK call site
Grep the codebase for `base44.collection(`. For each match, record the table name, the operation (list/get/create/update/delete), the where-clause columns, and the `file:line`. This is your code-side surface.
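A sketch of the scan. The regex assumes call sites look like the `base44.collection("orders").list(...)` shape used in this article; it will miss dynamically built table names, so treat misses as findings to chase rather than noise:

```ts
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

interface SdkCall {
  table: string;
  operation: string;   // list / get / create / update / delete
  whereCols: string[]; // fill via a second parse pass, or by hand on the first run
  location: string;    // file:line
}

const CALL_RE = /base44\.collection\(["'](\w+)["']\)\.(\w+)\(/g;

function scanCodebaseForSdkCalls(dir: string, calls: SdkCall[] = []): SdkCall[] {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      scanCodebaseForSdkCalls(path, calls);
      continue;
    }
    if (!/\.(ts|tsx|js|jsx)$/.test(path)) continue;
    const src = readFileSync(path, "utf8");
    for (const m of src.matchAll(CALL_RE)) {
      const line = src.slice(0, m.index).split("\n").length;
      calls.push({ table: m[1], operation: m[2], whereCols: [], location: `${path}:${line}` });
    }
  }
  return calls;
}
```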
Step 3: Build the coverage matrix
Cross-product the two exports. Rows are tables. For each table, columns are: policies-on-this-table, predicate-columns, query-columns-from-code, ops-covered. The matrix usually surfaces drift in the first read.
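A sketch of the matrix build, using the `Policy` and `SdkCall` shapes from Steps 1 and 2:

```ts
interface CoverageRow {
  table: string;
  policies: string[];      // policy names on this table
  predicateCols: string[]; // columns referenced by any predicate
  queryCols: string[];     // columns filtered on by any live query
  opsCovered: string[];    // operations with at least one policy
}

function buildCoverageMatrix(policies: Policy[], queries: SdkCall[]): CoverageRow[] {
  const tables = new Set([...policies.map(p => p.table), ...queries.map(q => q.table)]);
  return [...tables].map(table => {
    const pols = policies.filter(p => p.table === table);
    const qs = queries.filter(q => q.table === table);
    return {
      table,
      policies: pols.map(p => p.name),
      predicateCols: [...new Set(pols.flatMap(p => p.referencedColumns))],
      queryCols: [...new Set(qs.flatMap(q => q.whereCols))],
      opsCovered: [...new Set(pols.map(p => p.operation))],
    };
  });
}
```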
Step 4: Identify orphan policies
A policy is orphaned if its predicate references a column no live query filters on. Either the column was renamed and the policy lags, or the policy is dead weight from a previous schema. Both cases need investigation. Mark every orphan with the suspected cause.
Step 5: Identify unguarded queries
A query is unguarded if it hits a table that has no policy for that operation. The platform default is "allow" in some Base44 versions and "deny" in others — never assume; check. Every unguarded query is a leak candidate.
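A sketch of the check, mapping SDK verbs to SQL operations; the verb names follow this article's examples, so adjust them to your SDK version:

```ts
const OP_MAP: Record<string, Policy["operation"]> = {
  list: "SELECT",
  get: "SELECT",
  create: "INSERT",
  update: "UPDATE",
  delete: "DELETE",
};

// Every (table, operation) pair a live query hits with no covering policy.
function findUnguardedQueries(policies: Policy[], queries: SdkCall[]): SdkCall[] {
  return queries.filter(q => {
    const op = OP_MAP[q.operation];
    return !policies.some(p => p.table === q.table && p.operation === op);
  });
}
```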
Step 6: Write a smoke test per role
For every role in your app, write a test that logs in as a real user with that role and runs every protected query. Assert row counts and content. Then run a query the role should not be able to do and assert zero rows.
```ts
// tests/rls/orders.role-smoke.test.ts
import { createBase44Client } from "@/lib/base44";
import { signInAs } from "./helpers";

describe("orders RLS — per-role smoke", () => {
  test.each([
    { role: "admin", expectMin: 100, expectOtherTenant: true },
    { role: "member", expectMin: 1, expectOtherTenant: false },
    { role: "billing", expectMin: 0, expectOtherTenant: false },
  ])("$role sees only its own tenant", async ({ role, expectMin, expectOtherTenant }) => {
    const session = await signInAs(role);
    const client = createBase44Client(session.token);

    const own = await client.collection("orders").list({
      where: { tenant_id: session.tenantId },
    });
    expect(own.length).toBeGreaterThanOrEqual(expectMin);

    const other = await client.collection("orders").list({
      where: { tenant_id: "TENANT_NOT_MINE" },
    });
    if (expectOtherTenant) {
      expect(other.length).toBeGreaterThan(0);
    } else {
      // Negative assertion — RLS must filter this out.
      expect(other.length).toBe(0);
    }
  });
});
```
The negative assertion (`expect(other.length).toBe(0)`) is the load-bearing line. RLS denials return zero rows, so this is the only place the bug surfaces.
Step 7: Automate the audit on every AI-builder commit
Wire the matrix-builder and the smoke suite into a pre-deploy check. Use a script outline like this:
```ts
// scripts/audit-rls.ts
const policies = await exportPoliciesFromBase44();
const queries = await scanCodebaseForSdkCalls("./src");

const drift = diffPoliciesVsQueries(policies, queries);
if (drift.length > 0) {
  console.error("RLS drift detected on tables:", drift.map(d => d.table));
  process.exit(1);
}

const smoke = await runRoleSmokeSuite();
if (smoke.failures.length > 0) {
  console.error("Role smoke failures:", smoke.failures);
  process.exit(1);
}
```
The diff function compares the two surfaces:
```ts
function diffPoliciesVsQueries(policies: Policy[], queries: SdkCall[]) {
  return policies
    .map(policy => {
      const queriesOnTable = queries.filter(q => q.table === policy.table);
      const queryCols = new Set(queriesOnTable.flatMap(q => q.whereCols));
      const predicateCols = new Set(policy.referencedColumns);
      const orphan = [...predicateCols].filter(c => !queryCols.has(c));
      const unguarded = [...queryCols].filter(c => !predicateCols.has(c));
      return { table: policy.table, orphan, unguarded };
    })
    .filter(d => d.orphan.length > 0 || d.unguarded.length > 0);
}
```
Run it on every commit that touches `src/` or the policy export. Fail the deploy on drift. Once this gate exists, the AI builder cannot ship a silent regression — it shows up as a red CI run instead.
What we've seen — 11 AI-builder audits
We have run this audit on 11 Base44 apps in the last six months. Every audit found drift. The pattern distribution:
- 8 of 11 had at least one orphaned policy (Pattern 1 or Pattern 5).
- 3 of 11 had a table with no RLS at all, and in each of those three the table held PII (email, phone, in one case partial SSN).
- 5 of 11 had INSERT/SELECT asymmetry (Pattern 4) — users could create records they could not subsequently read, leading to ghost-record support tickets.
- 2 of 11 had stale auth-context functions (Pattern 3) where a session refresh dropped the user's visibility.
- 9 of 11 had at least one role added by the AI builder that was missing from one or more existing policies (Pattern 6).
The median count of distinct drift cases per app was 4. The worst case was 11 separate drift cases on a healthcare-adjacent app that had been live for 11 months and had never run a per-role smoke test.
Median time-to-detection in production was 47 days. The fastest detection (a user complained the same week) was on a multi-tenant SaaS where a customer noticed they could see another customer's invoices. The slowest (just under a year) was on an internal tool where the affected role only had two users and neither had reason to query the affected table until an audit forced it.
After deploying the seven-step audit and the gated smoke suite, the rerun rate of new drift on subsequent AI-builder commits dropped to under 5 percent across the four apps where we have six months of post-audit data. The remaining 5 percent are caught by the gate, not by users.
These are not abstract numbers. The healthcare-adjacent app was one bad export away from a HIPAA disclosure event. The audit cost was a fraction of what a breach response would have cost, and that ratio holds across every regulated app we have looked at.
Why this is the silent killer of Base44 production launches
Most Base44 production failures we see in 2026 are not from missing features or bad UX. They are from invisible permission bugs introduced by AI-builder edits nobody reviewed. The features ship. The build passes. Then 47 days later a customer notices, and by that point the exposure has already propagated into production telemetry, billing systems, and (in regulated cases) audit logs.
This deserves to be its own discipline. Traditional secure-coding reviews assume a human wrote the code and a human reviewed the diff. AI-builder edits violate both assumptions. The edit is large, the diff is sprawling, and the reviewer (if any) cannot run the policy against a test user from inside the platform. The result is a category of bug that is structurally invisible to the workflows most teams already have.
We call this discipline AI-Builder Verification. It is not pen-testing. It is specifically: take the agent's output, treat it as untrusted, and verify the policy and the code agree. The verification is mechanical (the seven steps above), but judgment about what the policy should be still requires a human who understands the data model. That combination — mechanical drift detection plus human policy review — is what we deliver in our Base44 AI Builder Audit engagement.
If your app is in production on Base44 and you have not run this audit, you do not know whether you are in the 73 percent. If you operate under regulated data, you cannot DIY this — get an external review before the next AI-builder edit ships. Scope an audit through /base44-debugging-help, or read the related fixes for AI-induced regressions broadly, SSO and auth bypass, and silent data loss on return — all of which descend from the same AI-builder verification gap.