Why do base44 functions hibernate without active users?

Base44 runs functions on a serverless Deno sandbox that scales to zero when idle. Cold starts add latency, and the platform aggressively reclaims idle resources. For interactive web apps this is fine: a user request wakes the runtime. For background webhooks, no recent user traffic means the runtime is cold when the webhook arrives, and the function may take 5–30 seconds to cold-start, by which time the calling service has already timed out.

Doesn't every serverless platform have cold starts? Why is base44 worse?

Cold starts are universal, but well-architected platforms (AWS Lambda, Cloudflare Workers, Vercel) typically keep cold-start latency under 1 second, and some offer warm-instance options such as provisioned concurrency. Base44 does not expose those controls, and its cold starts run in the 5–30 second range described above. With no provisioned concurrency option and no warm-up scheduling, webhook reliability is in practice materially worse than on industry-standard serverless platforms.

What's the most painful symptom of this issue?

Stripe subscription renewals at 3am that fail. Lowcode.agency captured it: 'Subscription renewal at 3am? Nope. Failed payment retry while customer's at lunch? Not happening.' The Stripe webhook arrives, base44 cold-starts too slowly or returns an error, Stripe retries with backoff, and after enough failures Stripe disables the endpoint entirely. Customers paid but never got service.

Can I work around this with a 'keep-alive' ping from outside?

Partially. Pinging the function every 1–5 minutes from a free service like Cron-Job.org or UptimeRobot keeps the runtime warmer. It does not fully solve the problem because base44 still scales down individual function instances and the keep-alive may hit a different instance than the actual webhook. It reduces failure rates but does not eliminate them.

What's the production-grade fix?

Move the webhook receiver out of base44 entirely. Run a tiny endpoint on Vercel, Cloudflare Workers, or a dedicated server that accepts the webhook, persists it to its own queue, and forwards to base44 via SDK or REST when base44 wakes up. The external service is always-on; base44 catches up asynchronously. This decouples your most critical event flows from base44's hibernation behavior.

Is this a sign I should migrate off base44?

If your business depends on background events (payments, scheduled jobs, third-party integrations), yes, this is a meaningful signal. Webhooks that only work reliably while users are active join the missing bulk-delete and bulk-update primitives as evidence that base44 is built for interactive use, not for background work. Production teams running revenue-critical webhooks tend to migrate within 6–12 months of hitting this issue.

Base44 Webhooks Only Fire While Users Are Active

What's happening

Your Stripe subscription renewals fail at 3am. Your scheduled email sends at 6am drop randomly. Your inbound webhook from a third-party service — Twilio, SendGrid, Hugging Face — returns 500 or times out when triggered while no users are logged in. By the time you check at 9am the next day, you have a handful of missed events, a few angry customers, and Stripe has disabled your webhook endpoint after consecutive failures.

The lowcode.agency review captured the user-facing pain in two sentences: "Subscription renewal at 3am? Nope. Failed payment retry while customer's at lunch? Not happening." The pattern is well-known in the base44 community but rarely surfaces until production volume is high enough that nighttime events matter.

The deceptive part is that during business hours everything works. You test the integration during the day, see it succeed, and ship to production. The failure mode appears only when traffic is low — which is exactly when uninterrupted background processing is most important.

Why this happens

Base44 runs your backend functions on a serverless Deno sandbox. Like most serverless platforms, it scales function instances to zero when idle, then cold-starts a fresh instance on the next request. The architecture is fine for interactive use — when a user clicks a button, the request wakes the runtime, the function runs, the user sees a response 200–800ms later.

For webhooks the architecture is broken. Three connected reasons.

First, cold-start latency. A cold base44 function takes 5–30 seconds to start, depending on dependencies and runtime conditions. Stripe webhooks have an effective timeout of 10 seconds: if you do not return 2xx within that window, Stripe records the delivery as failed and retries on its own backoff schedule. SendGrid, Twilio, Hugging Face, and most third parties are similarly strict. A cold start that exceeds the third-party timeout fails the delivery, and nothing on the base44 side alerts you that it happened.

Second, no warm-pool option. Better-equipped serverless platforms either let you pin warm instances for traffic-critical endpoints (AWS Lambda with provisioned concurrency) or run on lightweight runtimes with very short cold starts (Cloudflare Workers, Vercel Edge Functions). Base44 exposes no such control. Every function instance can scale to zero at any time. You cannot pay for warmth.

Third, hibernation feedback loops. When a webhook fails, the third-party service backs off. Backoff means longer delays between retries. Longer delays mean a higher chance the next retry hits a cold runtime again. The system spirals into more failures before any retry succeeds. By the time the runtime is warm, the third party may have given up.
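
To make the feedback loop concrete, the sketch below compares a sender's retry gaps with a runtime's idle window. The backoff parameters (60-second base, factor of 4) and the 5-minute idle window are illustrative assumptions, not documented values for Stripe or base44, and the function names are hypothetical.

```typescript
// Illustrative only: the backoff parameters and the idle window are
// assumptions, not documented values for any sender or for base44.

/** Gaps (in seconds) between consecutive retries under exponential backoff. */
function retryGaps(baseSeconds: number, factor: number, retries: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseSeconds * factor ** i);
}

/**
 * For each retry, report whether it would land on a cold runtime.
 * Model: every attempt (even a failed one) starts the runtime, the runtime
 * scales to zero after `idleSeconds` without traffic, and the retries are
 * the only traffic.
 */
function coldRetries(gapsSeconds: number[], idleSeconds: number): boolean[] {
  return gapsSeconds.map((gap) => gap > idleSeconds);
}

const gaps = retryGaps(60, 4, 5); // 60s, 240s, 960s, 3840s, 15360s
const cold = coldRetries(gaps, 300); // assumed 5-minute idle window
console.log(gaps.map((g, i) => `${g}s gap -> ${cold[i] ? "cold" : "warm"}`));
```

Under these assumed numbers the first two retries find a warm runtime and every later one finds a cold one, which is the spiral described above: the longer the sender waits, the more likely the next attempt pays the full cold-start cost.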

The deeper architectural issue is that base44 was designed around the idea that "the user is always present" — a fair assumption for an interactive web app but wrong for production-grade SaaS. Background processing is a first-class production need that base44's runtime model does not serve well.

Base44 has not announced a fix for this and is unlikely to ship one without rethinking the runtime contract. The mitigation lives outside the platform.

Source: lowcode.agency/blog/base44-not-working-errors-fixes; feedback.base44.com webhook threads; Stripe webhook reliability documentation; Deno Deploy cold-start benchmarks.

How to reproduce

1. Set up a base44 function that handles an inbound webhook (any third party; Stripe test mode is easiest).
2. During business hours, send a test webhook. Confirm it succeeds with low latency.
3. Wait until 2–4am local time when no users have touched the app for 6+ hours.
4. Send another test webhook from the third-party dashboard.
5. Inspect the third party's delivery logs. You will commonly see one of: a 5–30 second response time, a timeout, or a 500 error from a Deno cold-start failure.
6. Repeat across multiple nights. The failure rate is non-zero.
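
The timing check can also be scripted so that you are not relying on a dashboard alone. A minimal sketch, assuming a hypothetical endpoint URL and a 10-second budget; `probe` and `classify` are illustrative names, the "slow" threshold of half the budget is arbitrary, and a realistic test should still send a signed payload from the third party's test mode.

```typescript
// Sketch: time one request against a webhook endpoint and classify the result.
// The URL, the 10s budget, and the helper names are assumptions.

type Outcome = "ok" | "slow" | "error" | "no-response";

/** Classify a delivery roughly the way a strict sender would see it. */
function classify(statusCode: number | null, elapsedMs: number, budgetMs: number): Outcome {
  if (statusCode === null) return "no-response";
  if (statusCode < 200 || statusCode >= 300) return "error";
  // Arbitrary warning threshold: more than half the budget counts as slow.
  return elapsedMs > budgetMs / 2 ? "slow" : "ok";
}

async function probe(url: string, budgetMs = 10_000): Promise<Outcome> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  const started = Date.now();
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ type: "probe.test" }),
      signal: controller.signal, // aborts once the budget is spent
    });
    return classify(res.status, Date.now() - started, budgetMs);
  } catch {
    // Aborted or network failure: a sender would record a failed delivery.
    return classify(null, Date.now() - started, budgetMs);
  } finally {
    clearTimeout(timer);
  }
}

// Usage (hypothetical URL):
// probe("https://example.com/functions/stripe-webhook").then(console.log);
```

Running this from a scheduler at night and again during business hours gives a side-by-side record of how the endpoint behaves when idle versus warm.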

Step-by-step fix

The production-grade fix is to move the webhook receiver out of base44. Five steps.

1. Stand up an external webhook receiver

Pick a stable always-on platform. Recommendations in order of simplicity:

Vercel Edge Functions — free tier handles low webhook volume, near-zero cold start.
Cloudflare Workers — free tier, similar profile.
A small Node server on Fly.io or Render — for teams that want a long-running process.

The receiver does only two things: accept the webhook (verify the signature) and persist the raw event to durable storage, then return 2xx as fast as possible.
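
For the signature step, the sketch below shows Stripe-style verification, where the `Stripe-Signature` header carries a timestamp (`t=`) and one or more HMAC-SHA256 signatures (`v1=`) computed over `timestamp.rawBody`. It assumes a Node-compatible runtime with `node:crypto`; `verifyStripeSignature` is an illustrative name, not a library function. In production, the provider's official SDK helper is usually the safer choice.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of Stripe-style signature checking. The 300-second tolerance is an
// assumed default; `nowSeconds` is injectable to make the check testable.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  const parts = header.split(",").map((p) => p.trim().split("="));
  const timestamp = parts.find(([k]) => k === "t")?.[1];
  const candidates = parts.filter(([k]) => k === "v1").map(([, v]) => v);
  if (!timestamp || candidates.length === 0) return false;

  // Reject malformed or stale timestamps to limit replay of captured requests.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  // The signed payload is the timestamp, a dot, and the unparsed request body.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  return candidates.some((hex) => {
    const given = Buffer.from(hex ?? "", "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The check must run against the raw request body. Parsing the JSON and re-serializing it before verification can change the bytes and make a valid signature fail.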

2. Persist events to a durable queue

Use Cloudflare KV, a small Postgres table, Supabase, or Upstash Redis. The receiver writes the raw webhook payload, event type, source, and timestamp. This is your audit log and your retry queue.

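// Edge function sketch. `verifyWebhookSignature` and `db` are placeholders
// for your provider's signature check and your storage client.
export async function handler(request: Request): Promise<Response> {
  let event;
  try {
    // Verify against the raw body; reject anything that fails the check.
    event = await verifyWebhookSignature(request);
  } catch {
    return new Response('Invalid signature', { status: 400 });
  }
  // Persist first, acknowledge second. If the write throws, the unhandled
  // error surfaces as a 5xx and the sender retries, so the event is not lost.
  await db.insert('webhook_events', {
    id: event.id, // sender's event ID, useful for deduplicating retries
    source: 'stripe',
    type: event.type,
    payload: JSON.stringify(event),
    received_at: new Date().toISOString(),
    processed: false,
  });
  return new Response('OK', { status: 200 });
}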
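
Senders retry, so the same event ID can arrive more than once. One way to keep the queue clean is to make the write idempotent. The following is a minimal in-memory sketch of that idea; `WebhookEvent`, `EventStore`, and `insertIfAbsent` are hypothetical names, and a real store would more likely rely on a unique constraint on `id` in Postgres or a conditional write in Redis.

```typescript
// In-memory stand-in for the durable queue, keyed by the sender's event ID.
interface WebhookEvent {
  id: string;
  source: string;
  type: string;
  payload: string;
  received_at: string;
  processed: boolean;
}

class EventStore {
  private rows = new Map<string, WebhookEvent>();

  /** Returns true if stored, false if the ID was already present. */
  insertIfAbsent(evt: WebhookEvent): boolean {
    if (this.rows.has(evt.id)) return false;
    this.rows.set(evt.id, evt);
    return true;
  }

  /** Events still waiting to be forwarded. */
  unprocessed(): WebhookEvent[] {
    return Array.from(this.rows.values()).filter((row) => !row.processed);
  }

  /** Mark an event as handled; unknown IDs are ignored. */
  markProcessed(id: string): void {
    const row = this.rows.get(id);
    if (row) row.processed = true;
  }
}
```

With this shape the receiver can answer 2xx for a duplicate delivery without writing a second row, and the forwarder only ever sees each event once.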

3. Forward events to base44 asynchronously

A second worker reads unprocessed events from the queue and calls the base44 function via the SDK or a REST endpoint. The base44 function does the actual business logic — provision the customer, send the email, update the order.

If base44 is cold and slow, the worker waits and retries. The third party never sees the slowness.
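
A rough sketch of that forwarding worker follows. The `forward` callback stands in for whatever call reaches your base44 function (SDK or REST); its shape, the retry count, and the 2-second base delay are assumptions rather than anything base44 documents.

```typescript
// Sketch of the forwarding worker. `forward` is a placeholder for the call
// into base44; it should throw (reject) when the call fails or times out.
type QueuedEvent = { id: string; payload: string };
type Forward = (evt: QueuedEvent) => Promise<void>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Try to deliver one event, backing off exponentially between attempts. */
async function deliverWithRetry(
  evt: QueuedEvent,
  forward: Forward,
  maxAttempts = 5,
  baseDelayMs = 2_000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await forward(evt); // may be slow while base44 cold-starts
      return true;
    } catch {
      if (attempt === maxAttempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 2s, 4s, 8s, ...
    }
  }
  return false; // leave the event unprocessed for the next run
}

/** Drain a batch in order; returns the IDs that were delivered. */
async function drain(
  batch: QueuedEvent[],
  forward: Forward,
  baseDelayMs = 2_000,
): Promise<string[]> {
  const delivered: string[] = [];
  for (const evt of batch) {
    if (await deliverWithRetry(evt, forward, 5, baseDelayMs)) delivered.push(evt.id);
  }
  return delivered;
}
```

Because a retry can follow a call that actually succeeded on the base44 side but timed out on the way back, delivery here is at-least-once. The base44 function should therefore tolerate seeing the same event ID twice.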

4. Add a reconciliation job

Once a day, compare your queue against the third party's "list events" API (Stripe, SendGrid, Twilio all expose this). Any event in the third party that is not in your queue indicates a delivery failure on the receiver — rare but possible. Fetch and replay the missed events.
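
The comparison itself is a set difference. A minimal sketch, assuming you have already fetched the provider's event IDs and your queue's event IDs as plain string arrays; `findMissingEventIds` is a hypothetical helper, and the fetching and replay steps are left out.

```typescript
// Sketch: find events the provider lists that never reached the queue.
// Fetching the two ID lists (provider API, queue query) is not shown.
function findMissingEventIds(providerIds: string[], queuedIds: string[]): string[] {
  const seen = new Set(queuedIds);
  return providerIds.filter((id) => !seen.has(id));
}

const missing = findMissingEventIds(
  ["evt_1", "evt_2", "evt_3"], // what the provider says it sent
  ["evt_1", "evt_3"], // what the receiver stored
);
console.log(missing); // ["evt_2"]
```

Each ID in the result can then be fetched from the provider and written to the queue through the same path the receiver uses, so replayed events get the same deduplication and forwarding as live ones.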

5. Migrate one webhook at a time

Do not rewire every webhook in one push. Start with the highest-stakes one (typically Stripe), validate over a week, then move the next. This limits blast radius if you misconfigure something.

DIY vs hire decision

This fix sits at the boundary between "solo builder DIY" and "needs an engineer." The mechanics are not hard for someone comfortable with cloud infrastructure. The traps are subtle.

DIY if you have:

Comfort writing edge functions or small Node services.
A basic queue/persistence layer in mind.
Time to validate over a week before moving traffic.

Hire if any of these apply:

Stripe webhooks are involved and downtime means lost revenue.
You have multiple third-party integrations and need them all hardened in parallel.
Your team has no infrastructure experience outside base44.

We have built this exact pattern for ~20 base44 clients. Standard implementation takes 3–7 days and survives base44's hibernation indefinitely.

Need this fix shipped this week?

We treat this as a complex multi-bug fix because it touches infrastructure, third-party integrations, and reconciliation. Standard scope: external receiver, durable queue, async forwarder, reconciliation job, and migration of all critical webhooks. 1–2 week turnaround.

Order a complex fix or book a free 15-minute call to scope which webhooks to migrate first.

Stripe integration breaks after updates — the Stripe webhook is usually the first to fail and the most painful.
Functions stop working after hours — same cold-start root cause, different symptom.
No SLA — outage risk — base44 has no contractual commitment to background-event reliability.