What's happening
Your Stripe subscription renewals fail at 3am. Your scheduled email sends at 6am drop at random. Inbound webhooks from third-party services such as Twilio, SendGrid, or Hugging Face return 500 or time out when they arrive while no users are logged in. By the time you check at 9am the next day, you have a handful of missed events, a few angry customers, and a webhook endpoint that Stripe may disable if the failures keep repeating.
The lowcode.agency review captured the user-facing pain in two lines: "Subscription renewal at 3am? Nope. Failed payment retry while customer's at lunch? Not happening." The pattern is well known in the base44 community but rarely surfaces until production volume is high enough for overnight events to matter.
The deceptive part is that during business hours everything works. You test the integration during the day, see it succeed, and ship to production. The failure mode appears only when traffic is low — which is exactly when uninterrupted background processing is most important.
Why this happens
Base44 runs your backend functions on a serverless Deno sandbox. Like most serverless platforms, it scales function instances to zero when idle, then cold-starts a fresh instance on the next request. The architecture is fine for interactive use — when a user clicks a button, the request wakes the runtime, the function runs, the user sees a response 200–800ms later.
For webhooks, the same architecture breaks down, for three connected reasons.
First, cold-start latency. A cold base44 function takes 5–30 seconds to start, depending on dependencies and runtime conditions. Stripe webhooks have an effective timeout of 10 seconds: if you do not return 2xx within that window, Stripe records the delivery as failed and retries on its own backoff schedule. SendGrid, Twilio, Hugging Face, and most other third parties are similarly strict. A cold start that outlasts the third party's timeout becomes a failed delivery. Because your function never finished, nothing on your side records that the event ever arrived.
Second, no warm-pool option. Other serverless platforms give you a way around cold starts for traffic-critical endpoints. AWS Lambda lets you pay for provisioned concurrency that keeps instances initialized. Isolate-based runtimes such as Cloudflare Workers and Vercel Edge Functions start in milliseconds by design. Base44 exposes no such control. Any function instance can scale to zero at any time, and you cannot pay for warmth.
Third, hibernation feedback loops. When a webhook fails, the third-party service backs off. Backoff means longer delays between retries. Longer delays mean a higher chance the next retry hits a cold runtime again. The system spirals into more failures before any retry succeeds. By the time the runtime is warm, the third party may have given up.
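A toy model makes the loop concrete. The sketch below rests on several assumptions:

- There is no other overnight traffic.
- Every attempt, even a failed one, keeps the runtime warm for a fixed idle window.
- Any attempt that hits a cold runtime fails, because the cold start outlasts the provider's timeout.

The idle window and retry delays are hypothetical. Base44 does not publish its scale-to-zero timing, and real retry schedules differ by provider.

```typescript
// Toy model of the retry/hibernation feedback loop. All timings are
// hypothetical inputs, not published base44 or provider values.

export interface Attempt {
  atMinutes: number; // minutes since the original delivery
  cold: boolean;     // did this attempt hit a scaled-to-zero runtime?
}

// Assumes: no other traffic, each attempt keeps the runtime warm for
// `idleWindowMinutes`, and a cold attempt always misses the provider timeout.
export function simulateRetries(retryDelaysMinutes: number[], idleWindowMinutes: number): Attempt[] {
  const attempts: Attempt[] = [{ atMinutes: 0, cold: true }]; // first delivery lands cold
  let t = 0;
  for (const delay of retryDelaysMinutes) {
    t += delay;
    // Warm only if the previous attempt was recent enough to keep an instance alive.
    const cold = delay > idleWindowMinutes;
    attempts.push({ atMinutes: t, cold });
    if (!cold) break; // a warm attempt answers in time, so the provider stops retrying
  }
  return attempts;
}
```

Take a 15-minute idle window and gaps of 30, 60, 120, and 240 minutes. Under this model, all five attempts land cold. Backoff gaps only grow, so once one gap exceeds the idle window, every later retry lands cold as well.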
The deeper architectural issue is that base44 was designed around the idea that "the user is always present" — a fair assumption for an interactive web app but wrong for production-grade SaaS. Background processing is a first-class production need that base44's runtime model does not serve well.
Base44 has not announced a fix for this and is unlikely to ship one without rethinking the runtime contract. The mitigation lives outside the platform.
Source: lowcode.agency/blog/base44-not-working-errors-fixes; feedback.base44.com webhook threads; Stripe webhook reliability documentation; Deno Deploy cold-start benchmarks.
How to reproduce
- Set up a base44 function that handles an inbound webhook (any third party — Stripe test mode is easiest).
- During business hours, send a test webhook. Confirm it succeeds with low latency.
- Wait until 2–4am local time when no users have touched the app for 6+ hours.
- Send another test webhook from the third-party dashboard.
- Inspect the third party's delivery logs. You will commonly see one of: a 5–30 second response time, a timeout, or a 500 error from a Deno cold-start failure.
- Repeat across multiple nights. The failure rate is non-zero.
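You can also measure the cold path yourself instead of reading it off the provider's dashboard. A small probe, run from a scheduler at night and again at midday, gives you comparable numbers. This is a sketch with a few assumptions:

- The URL you pass is a placeholder for your own base44 function's HTTP endpoint.
- `probe`, `classify`, and the outcome labels are hypothetical names.
- The 10-second default mirrors the Stripe timeout cited above.

Your function's signature check will likely reject the unsigned probe request. That is fine, because what matters is how long the rejection takes to arrive.

```typescript
// Latency probe sketch for a base44 webhook function (names are hypothetical).

export type Outcome = "in_time" | "server_error" | "timeout" | "no_response";

// Classify one delivery the way a provider with a `timeoutMs` deadline sees it.
export function classify(status: number | null, elapsedMs: number, timeoutMs: number): Outcome {
  if (elapsedMs >= timeoutMs) return "timeout";     // too slow, whatever the status
  if (status === null) return "no_response";        // network failure before the deadline
  if (status >= 500) return "server_error";         // e.g. a failed cold start
  return "in_time";                                 // something answered before the deadline
}

export async function probe(url: string, timeoutMs = 10_000): Promise<{ outcome: Outcome; elapsedMs: number }> {
  const started = Date.now();
  let status: number | null = null;
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ probe: true, sentAt: new Date().toISOString() }),
      signal: AbortSignal.timeout(timeoutMs), // give up roughly where the provider would
    });
    status = res.status;
  } catch {
    status = null; // aborted by the timeout or a network error
  }
  const elapsedMs = Date.now() - started;
  return { outcome: classify(status, elapsedMs, timeoutMs), elapsedMs };
}
```

Log `elapsedMs` from a 3am run next to a midday run. A cold function tends to show up as multi-second `in_time` results or as `timeout`.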
Step-by-step fix
The production-grade fix is to move the webhook receiver out of base44. Five steps.
1. Stand up an external webhook receiver
Pick a stable always-on platform. Recommendations in order of simplicity:
- Vercel Edge Functions — free tier handles low webhook volume, near-zero cold start.
- Cloudflare Workers — free tier, similar profile.
- A small Node server on Fly.io or Render — for teams that want a long-running process.
The receiver does only two things: accept the webhook (verify signature, return 2xx fast) and persist the raw event to durable storage.
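For the Node option (Fly.io or Render), signature verification needs nothing beyond `node:crypto`. The sketch below follows Stripe's documented scheme. The `Stripe-Signature` header carries a timestamp `t` and one or more `v1` signatures. Each signature is an HMAC-SHA256 of `${t}.${rawBody}`, keyed with the endpoint's signing secret.

In production, prefer the official `stripe` package's `stripe.webhooks.constructEvent` (or `constructEventAsync` on edge runtimes), which performs the same checks. The function name and the 300-second tolerance here are illustrative choices.

Either way, verify against the raw request body before parsing it as JSON. Re-serialized JSON will not match the signature.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper implementing Stripe's documented v1 signing scheme.
export function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // Header format: "t=<unix seconds>,v1=<hex>[,v1=<hex>...]"
  let timestamp: number | null = null;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.trim().split("=", 2);
    if (key === "t") timestamp = Number(value);
    else if (key === "v1" && value) signatures.push(value);
  }
  if (timestamp === null || Number.isNaN(timestamp) || signatures.length === 0) return false;

  // Reject stale timestamps to limit replay of captured deliveries.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest();
  return signatures.some((sig) => {
    const given = Buffer.from(sig, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```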
2. Persist events to a durable queue
Use Cloudflare KV, a small Postgres table, Supabase, or Upstash Redis. The receiver writes the raw webhook payload, event type, source, and timestamp. This is your audit log and your retry queue.
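```typescript
// Edge receiver sketch. verifyWebhookSignature and db are placeholders for
// your provider's signature check and your storage client.
declare function verifyWebhookSignature(request: Request): Promise<{ id: string; type: string }>;
declare const db: { insert(table: string, row: Record<string, unknown>): Promise<void> };

export async function handler(request: Request): Promise<Response> {
  // Reject anything whose signature does not verify against the raw body.
  const event = await verifyWebhookSignature(request).catch(() => null);
  if (!event) return new Response('Invalid signature', { status: 400 });

  // Persist only; business logic runs later in base44. Key the table on id and
  // ignore conflicts so a provider retry of a stored event still gets a 2xx.
  await db.insert('webhook_events', {
    id: event.id,
    source: 'stripe',
    type: event.type,
    payload: JSON.stringify(event),
    received_at: new Date().toISOString(),
    processed: false,
  });

  // Acknowledge immediately so the provider never waits on base44.
  return new Response('OK', { status: 200 });
}
```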
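The table's contract matters more than the product you store it in. The in-memory class below is a stand-in for illustration, and its class and method names are hypothetical. It enforces two rules:

- Inserts are idempotent on the provider's event ID, because providers can deliver the same event more than once.
- Pending events come back oldest first.

In Postgres, the equivalent is a primary key on `id` plus `INSERT ... ON CONFLICT (id) DO NOTHING`.

```typescript
// In-memory stand-in for the webhook_events table (hypothetical API).

export interface WebhookEvent {
  id: string;          // provider's event ID, used as the idempotency key
  source: string;
  type: string;
  payload: string;
  received_at: string; // ISO-8601 UTC, so string order matches time order
  processed: boolean;
}

export class EventStore {
  private events = new Map<string, WebhookEvent>();

  // Returns false for a duplicate delivery instead of throwing.
  insert(event: WebhookEvent): boolean {
    if (this.events.has(event.id)) return false;
    this.events.set(event.id, { ...event });
    return true;
  }

  // Oldest first, so base44 sees events roughly in arrival order.
  unprocessed(): WebhookEvent[] {
    return [...this.events.values()]
      .filter((e) => !e.processed)
      .sort((a, b) => a.received_at.localeCompare(b.received_at));
  }

  markProcessed(id: string): void {
    const event = this.events.get(id);
    if (event) event.processed = true;
  }
}
```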
3. Forward events to base44 asynchronously
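A second worker reads unprocessed events from the queue and calls the base44 function via the SDK or a REST endpoint. It marks each event processed only after base44 confirms success. The base44 function does the actual business logic: provision the customer, send the email, update the order.
If base44 is cold and slow, the worker waits and retries. The third party never sees the slowness. A retry can repeat a call that actually succeeded but timed out on the worker's side, so make the base44 function idempotent, for example by skipping event IDs it has already handled.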
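A sketch of that forwarder rests on two assumptions:

- A queue can list unprocessed events and mark them done.
- A `send` callback wraps however you reach the base44 function and resolves `true` on success.

`PendingQueue`, `forwardPending`, and the option names are hypothetical. Each event gets a few attempts with exponential backoff, which gives a cold function time to wake. Anything that still fails stays in the queue for the next scheduled run.

```typescript
// Async forwarder sketch (hypothetical names throughout).

export interface PendingEvent { id: string; payload: string }

export interface PendingQueue {
  unprocessed(): PendingEvent[] | Promise<PendingEvent[]>;
  markProcessed(id: string): void | Promise<void>;
}

export interface ForwardOptions {
  maxAttempts: number;  // attempts per event, per run
  baseDelayMs: number;  // first backoff; doubles on each retry
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns the IDs that are still pending after this run.
export async function forwardPending(
  queue: PendingQueue,
  send: (event: PendingEvent) => Promise<boolean>,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: ForwardOptions,
): Promise<string[]> {
  const stillPending: string[] = [];
  for (const event of await queue.unprocessed()) {
    let delivered = false;
    for (let attempt = 0; attempt < maxAttempts && !delivered; attempt++) {
      if (attempt > 0) await sleep(baseDelayMs * 2 ** (attempt - 1));
      try {
        delivered = await send(event);
      } catch {
        delivered = false; // network error or client-side timeout: try again
      }
    }
    if (delivered) await queue.markProcessed(event.id);
    else stillPending.push(event.id); // left unprocessed for the next run
  }
  return stillPending;
}
```

Run it on a short schedule. Cron Triggers on Cloudflare or Cron Jobs on Vercel are one option. No third party is waiting on this call, so a `fetch`-based `send` can afford a generous client-side timeout, for example `AbortSignal.timeout(60_000)`.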
4. Add a reconciliation job
Once a day, compare your queue against the third party's own record of events. Stripe exposes this through its Events API, and Twilio and SendGrid offer log or activity APIs of varying depth. Any event the third party recorded that is missing from your queue indicates a failed delivery to the receiver. This is rare, but possible. Fetch and replay the missed events.
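The comparison itself is a set difference. In this sketch, `findMissedEvents` is a hypothetical helper. Its first argument yields event IDs from the provider, and `isStored` checks your queue.

With the official `stripe` Node package, list calls such as `stripe.events.list({ created: { gte: since } })` can be iterated across pages with `for await`. Mapping each result to `event.id` fits the first parameter. You can then fetch missed events with `stripe.events.retrieve` and write them to the queue like any other delivery.

```typescript
// Reconciliation sketch: which provider events never reached the queue?
export async function findMissedEvents(
  providerEventIds: AsyncIterable<string> | Iterable<string>,
  isStored: (id: string) => boolean | Promise<boolean>,
): Promise<string[]> {
  const missed: string[] = [];
  // for await also accepts plain arrays and other sync iterables.
  for await (const id of providerEventIds) {
    if (!(await isStored(id))) missed.push(id);
  }
  return missed;
}
```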
5. Migrate one webhook at a time
Do not rewire every webhook in one push. Start with the highest-stakes one (typically Stripe), validate over a week, then move the next. This limits blast radius if you misconfigure something.
DIY vs hire decision
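This fix sits at the boundary between "solo builder DIY" and "needs an engineer." The mechanics are not hard for someone comfortable with cloud infrastructure, but the traps are subtle. Examples include verifying signatures against the raw request body and handling events that arrive twice or out of order.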
DIY if you have:
- Comfort writing edge functions or small Node services.
- A basic queue/persistence layer in mind.
- Time to validate over a week before moving traffic.
Hire if any of these apply:
- Stripe webhooks are involved and downtime means lost revenue.
- You have multiple third-party integrations and need them all hardened in parallel.
- Your team has no infrastructure experience outside base44.
We have built this exact pattern for ~20 base44 clients. Standard implementation takes 3–7 days and survives base44's hibernation indefinitely.
Need this fix shipped this week?
We treat this as a complex multi-bug fix because it touches infrastructure, third-party integrations, and reconciliation. Standard scope: external receiver, durable queue, async forwarder, reconciliation job, and migration of all critical webhooks. 1–2 week turnaround.
Order a complex fix or book a free 15-minute call to scope which webhooks to migrate first.
Related problems
- Stripe integration breaks after updates — the Stripe webhook is usually the first to fail and the most painful.
- Functions stop working after hours — same cold-start root cause, different symptom.
- No SLA — outage risk — base44 has no contractual commitment to background-event reliability.