BASE44DEVS

ARTICLE · 9 MIN READ

Base44 Database Best Practices: Entity Schema, Queries, and Scale

The Base44 entity layer is a wrapped Postgres with opinions about how you can interact with it. The opinions matter: no transactions across entities, opaque indexing, no bulk delete, default-permissive reads. Designing entities for production means working with the platform's grain rather than against it. This article covers schema patterns that scale, query patterns that perform, and the platform-specific gotchas that bite teams once data grows past prototype scale.

Last verified
2026-05-01
Published
2026-05-01
Read time
9 min
Words
1,713
  • DATABASE
  • ENTITIES
  • SCHEMA
  • BEST-PRACTICES

Why this matters

The data layer is where most production-readiness questions become concrete. Schema decisions you make in week one shape what's possible in year three. On Base44, the platform's data-layer opinions push you toward certain patterns and away from others. Designing with the grain produces apps that scale cleanly. Designing against the grain produces apps that hit walls and require migration to fix.

This article covers the patterns we apply on every greenfield Base44 build and the patterns we recommend when auditing existing apps that have grown into pain.

What the entity layer is

The Base44 entity layer is wrapped Postgres with these properties:

  • Schema defined in the IDE, not as code or migrations.
  • Per-entity CRUD via the SDK; no JOINs, no aggregates.
  • Limited filter syntax: equality, $in, $ne, $gt, $gte, $lt, $or, $and.
  • Indexing managed by the platform; not user-configurable.
  • No transactions across entities.
  • 5,000-record-per-list cap as of November 2025.
  • Default-permissive reads; ownership and tenant filtering is your job.

Knowing this shape lets you design with it.
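As a concrete taste of that filter syntax, here is a hedged sketch assuming the Mongo-style operator shape the names suggest; the priority and due_date fields are hypothetical, used only to exercise the operators:

// Hypothetical fields (priority, due_date) used to exercise the operators
const urgentOrOverdue = await base44.entities.Todo.list({
  created_by: me.email,                                // equality
  $or: [
    { priority: { $in: ["high", "critical"] } },       // $in
    { due_date: { $lt: new Date().toISOString() } },   // $lt
  ],
});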

Pattern 1: every entity has created_by

Even if you don't think you need user ownership today, add the field. It's free if unused and essential if needed.

User entity:
  - id (string, auto)
  - email (string)
  - full_name (string)
  - role (string, default "user")
  - created_date (timestamp, auto)

Todo entity:
  - id (string, auto)
  - title (string)
  - done (boolean, default false)
  - created_by (string, indexed-by-platform)  // set automatically by the platform on create
  - created_date (timestamp, auto)

Every list query filters by created_by. Every update checks that the calling user owns the record. ESLint rules flag any query missing the filter.
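A minimal sketch of both checks, assuming an update(id, data) signature to match the delete(id) call used later in this article, and assuming id is filterable like any other field:

// List: always scoped to the caller
const myTodos = await base44.entities.Todo.list({ created_by: me.email });

// Update: confirm ownership before writing
async function completeTodo(id: string) {
  const [todo] = await base44.entities.Todo.list({ id, created_by: me.email });
  if (!todo) throw new Error("Todo not found or not owned by caller");
  await base44.entities.Todo.update(id, { done: true });
}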

This isn't optional. Base44's default behavior leaks data without it. We covered the security implications in the SDK reference and security hardening checklist.

Pattern 2: denormalize aggressively

Without JOINs, the cost of fetching related data is high. Denormalize what you display together.

// Bad: every list of todos requires a second list of projects
Todo:
  - project_id (string)

// Better: store the displayed fields on the Todo
Todo:
  - project_id (string)
  - project_name (string, denormalized)
  - project_color (string, denormalized)

When the project changes, update the denormalized fields on every todo. This is fine for apps where projects rarely change; for apps where they do, write a backend function that propagates project changes to all dependent todos.
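A hedged sketch of that propagation function, assuming the list(filter, sort, limit) and update(id, data) shapes used elsewhere in this article:

// backend/functions/propagateProjectChange.ts
export default async function handler(req: Request) {
  const { project } = await req.json();

  while (true) {
    // Fetch only rows with a stale name or color, so updated rows
    // drop out of the filter and the loop terminates
    const batch = await base44.entities.Todo.list(
      {
        project_id: project.id,
        $or: [
          { project_name: { $ne: project.name } },
          { project_color: { $ne: project.color } },
        ],
      },
      "-created_date",
      200
    );
    if (batch.length === 0) break;

    for (const todo of batch) {
      await base44.entities.Todo.update(todo.id, {
        project_name: project.name,
        project_color: project.color,
      });
    }
  }

  return new Response("OK", { status: 200 });
}

Because already-updated rows no longer match the filter, the function is idempotent and safe to re-run after a partial failure, which matters given the no-transactions constraint in Pattern 6.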

The trade: denormalization adds write complexity and storage cost; it dramatically reduces read complexity and latency. On Base44, where reads are the slow path, denormalization usually wins.

Pattern 3: tenant_id everywhere for multi-tenant apps

For B2B SaaS:

Organization entity:
  - id
  - name
  - plan
  - owner_user_id

User entity:
  - id
  - email
  - tenant_id (string)  // foreign key to Organization
  - role

Todo entity (or any tenant-scoped entity):
  - id
  - title
  - tenant_id (string)
  - created_by

Every list call filters by both tenant_id and (where appropriate) created_by:

const myTeamTodos = await base44.entities.Todo.list({
  tenant_id: currentUser.tenant_id,
});

Test with two-account scenarios. Account A in tenant X should never see anything from tenant Y, ever. This is the bug we find in over half of multi-tenant audits.
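A hedged sketch of that test; signInAs, createTodoAs, and listTodosAs are hypothetical harness helpers that exercise your app's real read and write paths as a given user:

// tests/tenantIsolation.test.ts
test("tenant X records never reach a tenant Y user", async () => {
  const alice = await signInAs("alice@tenant-x.test");   // hypothetical helper
  const bob = await signInAs("bob@tenant-y.test");       // hypothetical helper

  await createTodoAs(alice, { title: "tenant-x secret" });

  // Exercise every read path the app exposes, not just the happy one
  const bobTodos = await listTodosAs(bob);
  expect(bobTodos.some((t: { title: string }) => t.title === "tenant-x secret")).toBe(false);
});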

Pattern 4: paginate every list

Since November 2025, list() caps at 5,000 records per call. Even below that cap, large unpaginated lists destroy performance.

const PAGE_SIZE = 50;
const todos = await base44.entities.Todo.list(
  { tenant_id, created_by: me.email },
  "-created_date",
  PAGE_SIZE
);

For views that need more than the first page, use cursor-based pagination on a sortable field:

async function loadMoreTodos(cursorDate: string | null) {
  const filter: Record<string, unknown> = {
    tenant_id,
    created_by: me.email,
  };
  if (cursorDate) {
    filter.created_date = { $lt: cursorDate };
  }
  return base44.entities.Todo.list(filter, "-created_date", PAGE_SIZE);
}

Avoid offset-based pagination; the platform's list() doesn't support offset cleanly, and offset gets slow as you go deeper into the result set anyway.
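Usage is then a loop that advances the cursor off the last row of each page and stops on a short page; renderTodos is a hypothetical stand-in for whatever your UI does with a page:

let cursor: string | null = null;
while (true) {
  const page = await loadMoreTodos(cursor);
  renderTodos(page);  // hypothetical render step
  if (page.length < PAGE_SIZE) break;
  cursor = page[page.length - 1].created_date;
}

One caveat: a strict $lt cursor can skip rows that share the cursor's exact created_date; if that matters for your data, add a secondary tiebreaker field to the sort and filter.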

Pattern 5: aggregate via pre-computation

There's no SUM, COUNT, or GROUP BY. For aggregates, pre-compute and store.

TodoStats entity (one record per user per day):
  - user_id
  - date
  - total_count
  - completed_count
  - tenant_id

A scheduled function runs nightly:

// backend/functions/computeDailyStats.ts
export default async function handler(req: Request) {
  const today = new Date().toISOString().split("T")[0];
  // Use midnight tomorrow as the exclusive upper bound; a bound of
  // "T23:59:59Z" would drop anything created in the final second of the day.
  const tomorrow = new Date(Date.now() + 24 * 60 * 60 * 1000)
    .toISOString()
    .split("T")[0];

  // Note: list() caps at 5,000 records; paginate this once the user base grows.
  const users = await base44.entities.User.list();

  for (const user of users) {
    const todos = await base44.entities.Todo.list({
      created_by: user.email,
      created_date: { $gte: today + "T00:00:00Z", $lt: tomorrow + "T00:00:00Z" },
    });

    // Re-running this job creates a duplicate row for the same user and day;
    // for idempotency (Pattern 6), delete or update the day's record first.
    await base44.entities.TodoStats.create({
      user_id: user.id,
      date: today,
      total_count: todos.length,
      completed_count: todos.filter(t => t.done).length,
      tenant_id: user.tenant_id,
    });
  }

  return new Response("OK", { status: 200 });
}

For real-time aggregates (today's count up to this moment), increment counters on write. This is more code but avoids the daily-aggregate latency.
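A hedged sketch of the write-path increment, again assuming update(id, data), and assuming created_by is stamped by the platform so it is omitted from the create call. Note the read-modify-write is not atomic, so concurrent writers can under-count:

async function createTodoWithStats(title: string) {
  const todo = await base44.entities.Todo.create({ title, tenant_id });

  const today = new Date().toISOString().split("T")[0];
  const [stats] = await base44.entities.TodoStats.list({ user_id: me.id, date: today });

  if (stats) {
    // Not atomic: two concurrent creates can read the same total_count
    await base44.entities.TodoStats.update(stats.id, {
      total_count: stats.total_count + 1,
    });
  } else {
    await base44.entities.TodoStats.create({
      user_id: me.id,
      date: today,
      total_count: 1,
      completed_count: 0,
      tenant_id,
    });
  }

  return todo;
}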

Pattern 6: avoid cross-entity transactions

Base44 has no transaction() primitive. A multi-entity update can fail halfway, leaving inconsistent state.

Solutions:

Idempotent operations. Design so that re-running the same update produces the same result. If the update fails halfway, you can re-run safely.

Compensating writes. If updating entity A succeeds and entity B fails, write a compensating update to A to roll back. Document this explicitly (sketched below).

Single-entity sources of truth. If two entities have related state, make one canonical and derive the other. Stripe-as-source-of-truth for billing is a common example: the User entity stores a snapshot of the subscription state, but Stripe is canonical, and a reconciliation job catches drift.

Avoid the situation entirely. Often a "transaction across entities" is a sign of bad schema. If two entities always update together, they might be one entity.
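A minimal sketch of the compensating-write approach, using hypothetical Invoice and CreditBalance entities:

async function applyCredit(invoiceId: string, creditId: string, amount: number) {
  // Step 1: mark the credit spent
  await base44.entities.CreditBalance.update(creditId, { spent: true });

  try {
    // Step 2: apply it to the invoice
    await base44.entities.Invoice.update(invoiceId, { credit_applied: amount });
  } catch (err) {
    // Compensating write: undo step 1 so the two entities stay consistent
    await base44.entities.CreditBalance.update(creditId, { spent: false });
    throw err;
  }
}

The compensating write can itself fail, so log enough context to reconcile by hand. That fragility is exactly why the last option, avoiding the situation entirely, is often the right one.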

Pattern 7: indexed query patterns

You can't create indexes, but the platform indexes some fields automatically. Empirically, fields that are reliably fast to filter on:

  • id
  • created_date
  • created_by
  • Any field with high cardinality that's frequently filtered on

Fields that may not be indexed:

  • Boolean flags (low cardinality)
  • Recently-added fields (platform may not have re-indexed)
  • Fields used in $or or complex predicates

Design queries to filter on indexed-likely fields first, with low-cardinality filters as secondary refinements:

// Good: filter by user (high-cardinality, indexed) first
await base44.entities.Todo.list({
  created_by: me.email,
  done: false,  // secondary refinement
});

// Worse: filter primarily by boolean
await base44.entities.Todo.list({
  done: false,  // platform may scan more records
});

Test query latency in production. If a query is slow, restructure the filter.
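A quick way to measure, sketched as a timing wrapper:

async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const t0 = performance.now();
  const result = await fn();
  console.log(`${label}: ${Math.round(performance.now() - t0)}ms`);
  return result;
}

const openTodos = await timed("todos.open-by-user", () =>
  base44.entities.Todo.list({ created_by: me.email, done: false })
);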

Pattern 8: external Postgres mirror for production scale

For any app expected to grow past 100K records in a hot entity, mirror to Postgres:

// backend/functions/mirrorTodoChange.ts
// Triggered on every Todo create/update/delete via webhook or polling
export default async function handler(req: Request) {
  const { event, todo } = await req.json();

  // Push to Supabase via PostgREST. Deletes must target the specific row;
  // an unfiltered DELETE against /rest/v1/todos would touch the whole table.
  const base = `${SUPABASE_URL}/rest/v1/todos`;
  const url = event === "delete" ? `${base}?id=eq.${todo.id}` : base;

  await fetch(url, {
    method: event === "delete" ? "DELETE" : "POST",
    headers: {
      apikey: SUPABASE_SERVICE_KEY,
      authorization: `Bearer ${SUPABASE_SERVICE_KEY}`,
      "content-type": "application/json",
      prefer: "resolution=merge-duplicates", // upsert on the primary key
    },
    body: event === "delete" ? null : JSON.stringify(todo),
  });

  return new Response("OK", { status: 200 });
}

The mirror gives you:

  • SQL query capability (analytics, complex reports).
  • Real backup/restore.
  • Disaster recovery if Base44 has an extended outage.
  • Migration runway if you ever leave the platform.
  • Real indexing control.

The trade: more moving parts, more cost ($25/month Supabase), more places where state can drift. Worth it past a certain scale.

Pattern 9: archive old data

Apps that retain data forever hit performance walls. Move old data out:

// backend/functions/archiveOldTodos.ts (scheduled monthly)
export default async function handler(req: Request) {
  const sixMonthsAgo = new Date(Date.now() - 180 * 24 * 60 * 60 * 1000).toISOString();

  const old = await base44.entities.Todo.list(
    // completed_at is assumed to be stamped when a todo is marked done
    { done: true, completed_at: { $lt: sixMonthsAgo } },
    "completed_at",
    500
  );

  for (const todo of old) {
    // pushToArchive is your own helper that writes to the archive store
    // (external Postgres or a separate Archive entity); it is not an SDK call
    await pushToArchive(todo);
    // Delete from the active entity
    await base44.entities.Todo.delete(todo.id);
  }

  return new Response(JSON.stringify({ archived: old.length }), { status: 200 });
}

Archiving keeps the active entity small enough to query fast. Archived data goes somewhere queryable for compliance/legal needs (regulatory retention, customer data export).

Pattern 10: never let the agent run schema migrations

The AI agent treats schema changes as cosmetic. It will rename a field, drop a column, or change a type without warning that the operation is destructive. All three of those operations can lose data.

Rule: every schema change is propose-then-apply. Have the agent suggest. You snapshot the entity. You apply manually. You verify. We covered this in detail in schema migration best practices.
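A hedged sketch of the snapshot step, paging the entity out to JSON before any change is applied; saveSnapshot is a hypothetical sink (object storage, the Postgres mirror from Pattern 8, even a local file):

// backend/functions/snapshotTodos.ts
export default async function handler(req: Request) {
  const all: Array<{ created_date: string }> = [];
  let cursor: string | null = null;

  // Page by created_date to stay under the 5,000-record list cap
  while (true) {
    const filter: Record<string, unknown> = cursor
      ? { created_date: { $lt: cursor } }
      : {};
    const page = await base44.entities.Todo.list(filter, "-created_date", 1000);
    all.push(...page);
    if (page.length < 1000) break;
    cursor = page[page.length - 1].created_date;
  }

  await saveSnapshot(`todos-${Date.now()}.json`, JSON.stringify(all)); // hypothetical sink

  return new Response(JSON.stringify({ count: all.length }), { status: 200 });
}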

Common database mistakes on Base44

Trusting Entity.list() defaults. Returns global data. Always filter.

Skipping the second-account isolation test. Multi-tenant bugs hide from solo testing.

Designing schema as if JOINs exist. They don't. Denormalize.

Synchronous N+1 reads. Batch with $in (see the sketch after this list).

Unbounded list queries. Always paginate.

Trying to compute aggregates on the fly. Pre-compute and store.

Cross-entity transaction assumptions. Design idempotent operations and compensating writes.

Hot entity above 500K records. Mirror to Postgres or archive.

Letting the AI agent migrate. Always propose-then-apply.

Forgetting tenant_id on a new entity. Every multi-tenant app drifts here as new entities get added.
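The $in batching mentioned above, sketched against the Project entity implied in Pattern 2; this assumes id is filterable with $in like any other field:

// N+1: one query per todo
// for (const todo of todos) {
//   const [project] = await base44.entities.Project.list({ id: todo.project_id });
// }

// Batched: one query for all referenced projects
const projectIds = [...new Set(todos.map(t => t.project_id))];
const projects = await base44.entities.Project.list({ id: { $in: projectIds } });
const projectById = new Map(projects.map(p => [p.id, p]));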

Database design checklist

  • Every entity has created_by.
  • Multi-tenant entities have tenant_id.
  • Every list call has appropriate ownership and tenant filters.
  • Pagination on every list with realistic limits.
  • Denormalization where related data is displayed together.
  • Aggregate entities for any SUM/COUNT/GROUP BY needs.
  • Idempotent operations on every write path.
  • No cross-entity transactional assumptions.
  • External Postgres mirror for entities approaching 100K.
  • Archive job for old data on entities approaching scale limits.
  • Schema changes proposed by agent, applied manually with snapshot.
  • Second-account test as part of every release.

Want us to audit your database design?

Our $497 audit reviews every entity for ownership filters, tenant isolation, query patterns, and scale readiness. Most apps have 4–10 fixable issues. For larger redesigns or external-Postgres migration, we run a dedicated database engagement. Order an audit or book a free 15-minute call.


Frequently asked questions

Q.01 Does Base44 support SQL or relational queries across entities?
A.01

No. The Base44 SDK exposes per-entity CRUD and a limited filter syntax. There are no JOINs, no aggregate queries (SUM, COUNT with GROUP BY), no subqueries, and no raw SQL access. For multi-entity reads you do client-side or backend-function-side joins by listing entities and merging in code. For aggregates, either pre-compute and store the result in a separate entity, or mirror the data to a real database where SQL is available.

Q.02 Can I create indexes on Base44 entities?
A.02

Not directly. The platform manages indexing internally based on whatever queries it observes. You cannot create a composite index, a partial index, or a full-text index. For query performance, your only lever is to design schemas that match the platform's indexing assumptions: filter on individual fields rather than combinations, sort by indexed timestamp fields, and avoid filters that scan the entire entity. If you need explicit indexing control, mirror the data to Postgres via a backend function and query Postgres for complex reads.

Q.03 How big can a Base44 entity get before performance degrades?
A.03

Empirically, entities work well up to roughly 100,000 records, become noticeably slower above 500,000, and show frequent failures (timeouts on common operations) above 1 million. The exact thresholds depend on field count and filter complexity. The platform does not publish hard limits. Apps approaching 500K records in any single entity should plan for either aggressive archival of old data or a partial migration of that entity to an external database.

Q.04 How do I implement multi-tenancy on Base44?
A.04

Add a tenant_id field on every entity that needs tenant scoping, and filter every list query by tenant_id. There is no native tenant-isolation primitive on the platform. Multi-tenant isolation is your discipline. Add ESLint rules that flag list queries missing tenant_id. Test with two-account scenarios where each user is in a different tenant; verify that user A cannot see user B's data through any path. We see tenant-isolation bugs in roughly 60% of multi-tenant Base44 apps we audit.

Q.05 Can I run database transactions on Base44?
A.05

Not across entities. Each Entity.create, update, or delete is its own atomic operation. There is no Base44.transaction(() => { ... }) primitive. If a multi-entity update fails partway, you are in an inconsistent state with no automatic rollback. Design schemas to minimize cross-entity transactional needs: denormalize, use idempotent operations, or implement compensating writes that undo a partial update on failure. Critical financial flows should not run on Base44 entities — use Stripe as the source of truth and mirror state.

Q.06 Should I mirror Base44 data to an external database?
A.06

For any production app at meaningful scale: yes. A nightly or near-real-time mirror to Postgres (Supabase, RDS, self-hosted) gives you SQL query power, real backup/restore, real indexing, and a recovery path during platform outages. The cost is one backend function that streams entity changes plus a Postgres instance. Implementation is 4–12 hours; the value is significant disaster recovery improvement and dramatically better analytics capability.

NEXT STEP

Need engineers who actually know Base44?

Book a free 15-minute call or order a $497 audit.