How I Would Fix webhooks failing silently in a Next.js and Stripe internal admin app Using Launch Ready

When Stripe webhooks fail silently in a Next.js internal admin app, the symptom is usually ugly and expensive: payments succeed, but your app never updates the customer record, invoice state, or admin dashboard. The most likely root cause is not "Stripe is down"; it is usually one of these: the endpoint is not reachable, the signature verification fails, the route returns 200 too early, or the event handler throws after the response is already gone.

The first thing I would inspect is Stripe's webhook delivery history and the exact Next.js route that receives the event. I want to see whether Stripe got a 2xx response, whether the request body was preserved for signature verification, and whether your logs show an exception that never made it back to Stripe.

Triage in the First Hour

1. Open Stripe Dashboard -> Developers -> Webhooks.

  • Check recent deliveries, response codes, retries, and timestamps.
  • If you see repeated 400s or 500s, this is not silent failure. It is a visible delivery failure.
  • If you see 2xx but no downstream app change, the bug is in your handler logic or async processing.

2. Inspect your production logs.

  • Look for webhook route requests by path and timestamp.
  • Search for `stripe-signature`, `StripeSignatureVerificationError`, JSON parse errors, or unhandled promise rejections.
  • If you have no structured logs, that is already part of the problem.

3. Check the exact deployed webhook URL in Stripe.

  • Confirm it matches production, not preview or localhost.
  • Confirm there are no redirect chains from HTTP to HTTPS or from non-www to www that might break delivery.

4. Verify environment variables in production.

  • `STRIPE_SECRET_KEY`
  • `STRIPE_WEBHOOK_SECRET`
  • Any admin API keys used after event receipt
  • Make sure values are present in the deployed environment and not only in local `.env`.

5. Inspect the Next.js route implementation.

  • Confirm it uses raw request body handling where required for Stripe signature verification.
  • Confirm it does not call `req.json()` or otherwise parse the body before verifying the signature; parsing and re-serializing changes the bytes Stripe signed.

6. Check deployment health and recent builds.

  • Did this start after a release?
  • Did a framework upgrade change route behavior?
  • Did someone move from Pages Router to App Router or change runtime settings?

7. Review Cloudflare or proxy rules if used.

  • Check WAF blocks, bot protection, caching rules, and SSL mode.
  • Webhook endpoints should not be cached or challenged.

8. Reproduce with a test event from Stripe CLI or Dashboard.

  • Send one known event type like `payment_intent.succeeded`.
  • Confirm the full path from Stripe to database write to admin UI update.
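```bash
stripe listen --forward-to localhost:3000/api/webhooks/stripe
stripe trigger payment_intent.succeeded
```

`stripe listen` prints its own webhook signing secret when it starts. The server receiving the forwarded events must use that value as `STRIPE_WEBHOOK_SECRET`, or signature verification will fail.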
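Step 4 can also be checked in code instead of by eye. The sketch below is minimal and assumes the variable names listed above. `ADMIN_API_KEY` is a hypothetical stand-in for whatever admin key your handler uses. The check reports only the names of missing variables and never prints their values.

```typescript
// Minimal sketch: fail fast when webhook-related secrets are missing.
// ADMIN_API_KEY is a hypothetical placeholder for your own admin key name.
const REQUIRED_ENV = ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "ADMIN_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the names (never the values) of required variables that are unset or blank.
function missingEnv(env: Env, required: readonly string[] = REQUIRED_ENV): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Catches a common mix-up: Stripe webhook signing secrets start with "whsec_",
// so an API key pasted into STRIPE_WEBHOOK_SECRET fails this check.
function looksLikeWebhookSecret(value: string | undefined): boolean {
  return typeof value === "string" && value.startsWith("whsec_");
}
```

For example, you could call `missingEnv(process.env)` in a startup script or an internal health route and fail the deploy if the returned list is not empty.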

Root Causes

| Likely cause | What it looks like | How I confirm it |
|---|---|---|
| Signature verification fails | Stripe shows 400 responses or retries | Compare endpoint secret, raw body handling, and request headers |
| Route returns before work finishes | Stripe gets 200 but data never updates | Check whether async DB writes are awaited before the response |
| Wrong deployed URL | Events go to staging or a dead endpoint | Compare the Dashboard endpoint URL with the actual production domain |
| Proxy or Cloudflare interference | Random failures, timeouts, blocked requests | Inspect WAF logs, caching rules, SSL mode, and bot protection |
| Unhandled error after ack | Stripe shows success but the business action is missing | Add error logging around DB writes and downstream calls |
| Event logic too narrow | Some event types work, others do nothing | Compare actual event names against switch cases and handlers |

1. Signature verification fails

This is common when the request body is parsed before the Stripe signature is verified. In practice, your code may look fine in development but fail in production, because the raw bytes were altered before verification.

I confirm this by checking whether the handler uses the correct raw body approach for its router style. If signatures fail intermittently, I also check for proxy compression or body transformations.
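Here is why the raw bytes matter. The sketch below is minimal and follows the check Stripe documents for its `Stripe-Signature` header: an HMAC-SHA256 of the timestamp, a `.`, and the raw body, keyed with the endpoint signing secret and compared with the `v1` value. It is for illustration only. In a real route, use `stripe.webhooks.constructEvent`, which also enforces a timestamp tolerance. The secret, timestamp, and payload are made-up test values.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute a v1 signature: HMAC-SHA256 over `${timestamp}.${payload}`, hex-encoded,
// keyed with the endpoint signing secret.
function computeV1(secret: string, timestamp: number, payload: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${payload}`, "utf8").digest("hex");
}

// Parse "t=...,v1=..." and check whether any v1 entry matches.
// Illustration only: there is no timestamp tolerance check here.
function verifySignature(secret: string, header: string, payload: string): boolean {
  const parts = header.split(",").map((p) => p.split("="));
  const t = Number(parts.find(([k]) => k === "t")?.[1]);
  if (!Number.isFinite(t)) return false;
  const expected = Buffer.from(computeV1(secret, t, payload), "hex");
  return parts
    .filter(([k]) => k === "v1")
    .some(([, sig]) => {
      const given = Buffer.from(sig ?? "", "hex");
      // timingSafeEqual throws on length mismatch, so compare lengths first.
      return given.length === expected.length && timingSafeEqual(given, expected);
    });
}

// Made-up values for the demonstration.
const secret = "whsec_test_example";
const rawBody = '{"id": "evt_123", "type": "payment_intent.succeeded"}';
const timestamp = 1700000000;
const header = `t=${timestamp},v1=${computeV1(secret, timestamp, rawBody)}`;

// Parsing and re-serializing drops the spaces, so the signed bytes no longer match.
const reserialized = JSON.stringify(JSON.parse(rawBody));
console.log(verifySignature(secret, header, rawBody)); // true
console.log(verifySignature(secret, header, reserialized)); // false
```

The JSON content is identical in both cases, but the bytes differ, and the signature only covers bytes.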

2. The handler acknowledges too early

A lot of internal apps do this: receive webhook -> fire off database update -> return 200 immediately -> hope the rest succeeds. That creates silent data loss if any later step fails.

I confirm it by checking whether every critical operation is awaited before sending success back to Stripe.
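The difference is easy to miss in review. In this minimal sketch, a hypothetical `updateCustomerRecord` stands in for your database write and always fails, to simulate an outage. The first handler reports success to Stripe even though the write failed. The second surfaces the failure so Stripe retries.

```typescript
// Hypothetical DB write that always fails, simulating an outage.
async function updateCustomerRecord(eventId: string): Promise<void> {
  throw new Error(`DB unavailable while processing ${eventId}`);
}

// Anti-pattern: start the write, log any error later, and acknowledge immediately.
// Stripe sees 200, never retries, and the failure only exists in a log line.
async function ackTooEarly(eventId: string): Promise<number> {
  updateCustomerRecord(eventId).catch((err) => console.error("late failure", err));
  return 200;
}

// Safer: await every critical step before choosing the status code.
async function ackAfterWork(eventId: string): Promise<number> {
  try {
    await updateCustomerRecord(eventId);
    return 200;
  } catch (err) {
    console.error("processing failed", err);
    return 500; // non-2xx tells Stripe to retry later
  }
}
```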

3. Wrong endpoint configuration

Founders often point Stripe at a preview URL during testing and forget to switch it later. Or they deploy behind a new domain and leave Stripe pointing at an old path.

I confirm this by comparing:

  • Stripe webhook endpoint URL
  • production domain
  • current deployment alias
  • any redirects on that path
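Part of that comparison can be scripted. This minimal sketch assumes you copy the endpoint URL from the Dashboard and know your production origin and route path. All URLs in the test are placeholders. The check flags a non-HTTPS scheme, a host mismatch such as www versus non-www, and a path that differs from the deployed route.

```typescript
import { URL } from "node:url";

interface EndpointCheck {
  ok: boolean;
  problems: string[];
}

// Compare the URL configured in Stripe with the production origin and route path.
function checkWebhookEndpoint(configured: string, productionOrigin: string, routePath: string): EndpointCheck {
  const problems: string[] = [];
  const url = new URL(configured);
  const prod = new URL(productionOrigin);

  if (url.protocol !== "https:") problems.push(`scheme is ${url.protocol}, expected https:`);
  // A host mismatch (preview domain, www vs non-www) often means a redirect or a dead endpoint.
  if (url.host !== prod.host) problems.push(`host ${url.host} does not match ${prod.host}`);
  // Compare paths exactly: a stray trailing slash can also trigger a redirect.
  if (url.pathname !== routePath) problems.push(`path ${url.pathname} does not match ${routePath}`);

  return { ok: problems.length === 0, problems };
}
```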

4. Cloudflare or reverse proxy interference

If Cloudflare challenges bots or caches API routes incorrectly, webhooks can fail without obvious app errors. A webhook should be treated as an origin-to-origin machine request, not normal browser traffic.

I confirm this by checking firewall events and making sure API routes are excluded from caching and challenge rules.

5. Event handling gaps

Sometimes only one event type was implemented during build-out. The app works for `checkout.session.completed` but ignores `invoice.paid`, `customer.subscription.updated`, or whatever your admin panel actually depends on.

I confirm this by comparing real delivered event names with code coverage in the switch statement or handler map.
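A handler map makes the gap visible. This minimal sketch assumes a simplified event shape (`id`, `type`, `data.object`) rather than the full `Stripe.Event` type. The handler bodies are placeholders for your own state updates. Unmapped event types are logged with their event ID instead of being dropped silently.

```typescript
// Simplified event shape for illustration; the real Stripe.Event type is richer.
interface WebhookEvent {
  id: string;
  type: string;
  data: { object: Record<string, unknown> };
}

type Handler = (event: WebhookEvent) => void;

// Placeholder handlers: replace the bodies with your own updates.
// A Map avoids accidental matches on inherited object keys.
const handlers = new Map<string, Handler>([
  ["checkout.session.completed", (e) => console.log("mark order paid", e.id)],
  ["invoice.paid", (e) => console.log("mark invoice paid", e.id)],
  ["customer.subscription.updated", (e) => console.log("sync subscription", e.id)],
]);

// Returns which path ran so callers and tests can tell handled from ignored events.
function dispatch(event: WebhookEvent): "handled" | "unmapped" {
  const handler = handlers.get(event.type);
  if (!handler) {
    // Log loudly so an ignored event type shows up in review, not in a support ticket.
    console.warn(`unmapped Stripe event type=${event.type} id=${event.id}`);
    return "unmapped";
  }
  handler(event);
  return "handled";
}
```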

The Fix Plan

My goal is to repair this without making a bigger mess. For an internal admin app built with Next.js and Stripe, I would keep the fix small, explicit, logged, and reversible.

1. Freeze changes to webhook code until we know what broke.

  • No refactors.
  • No new abstraction layers.
  • No "cleanup" while revenue events are failing.

2. Add explicit logging at each stage of processing.

  • Log receipt of event ID and type.
  • Log signature verification success or failure.
  • Log database write start and completion.
  • Log any downstream API call separately.

3. Make webhook processing idempotent.

  • Store processed Stripe event IDs in a table with a unique constraint.
  • If an event arrives twice because of retries, ignore duplicates safely.
  • This matters because Stripe retries by design when responses fail or time out.

4. Verify raw body handling for your Next.js router type.

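  • If using App Router route handlers, read the body with `await req.text()` and pass that exact string to `stripe.webhooks.constructEvent`; do not call `req.json()` first.
  • If using Pages Router API routes, disable the built-in body parser for that route with `export const config = { api: { bodyParser: false } }` and read the raw request stream into a `Buffer` before verifying.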

5. Move critical side effects behind durable persistence.

  • First write the event record to your database.
  • Then process business updates from that stored record if needed.
  • For higher reliability later: queue follow-up work instead of doing everything inline.

6. Return failure when processing fails before completion.

  • Do not send 200 if validation failed or DB writes failed.
  • Let Stripe retry rather than hiding broken state behind false success.

7. Tighten security controls around the endpoint.

  • Restrict secrets to server-side only.
  • Keep least privilege on any service role used by webhook processing.
  • Validate payload shape even after signature verification so malformed data cannot trigger bad writes.

8. Test against production-like conditions before redeploying to production traffic.

  • Use test mode events if possible.
  • Re-run known historical events through staging first if you have that setup.
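Steps 3 and 5 fit together. This minimal sketch of idempotent processing uses an in-memory `Map` as a stand-in for a hypothetical `processed_events` table with a unique constraint on the event ID. One detail matters: an event is marked done only after its work succeeds. Otherwise a retry after a failure would be skipped as a duplicate.

```typescript
type ClaimResult = "claimed" | "in_progress" | "done";

// In-memory stand-in for a processed_events table with a UNIQUE constraint on event_id.
// In a database, claim() would be an INSERT that fails on conflict, plus a status column.
class ProcessedEvents {
  private rows = new Map<string, "processing" | "done">();

  claim(eventId: string): ClaimResult {
    const existing = this.rows.get(eventId);
    if (existing === "done") return "done";
    if (existing === "processing") return "in_progress";
    this.rows.set(eventId, "processing");
    return "claimed";
  }

  markDone(eventId: string): void {
    this.rows.set(eventId, "done");
  }

  // Drop a failed claim so Stripe's next retry can run the work again.
  // A real table would also need a timeout for claims left behind by a crash.
  release(eventId: string): void {
    this.rows.delete(eventId);
  }
}

// Runs `work` at most once per event ID and returns the HTTP status to send to Stripe.
async function processOnce(store: ProcessedEvents, eventId: string, work: () => Promise<void>): Promise<number> {
  const claim = store.claim(eventId);
  if (claim === "done") return 200; // duplicate of finished work: acknowledge only
  if (claim === "in_progress") return 409; // concurrent duplicate: non-2xx so Stripe retries later
  try {
    await work();
    store.markDone(eventId);
    return 200;
  } catch (err) {
    store.release(eventId);
    console.error(`event ${eventId} failed`, err);
    return 500;
  }
}
```

The 409 for an in-flight duplicate is a design choice: any non-2xx makes Stripe try again later, when the first delivery has either finished or released its claim.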

A safe implementation pattern looks like this:

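```typescript
// Sketch for an App Router route handler (for example app/api/webhooks/stripe/route.ts).
// Verify first, then await all processing so the status code reflects the real outcome.
import Stripe from "stripe";

// Hypothetical helpers, declared so the sketch type-checks; import your real ones instead.
declare function wasProcessed(eventId: string): Promise<boolean>;
declare function markProcessed(eventId: string): Promise<void>;
declare function handleStripeEvent(event: Stripe.Event): Promise<void>;

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request): Promise<Response> {
  const sig = req.headers.get("stripe-signature");
  const rawBody = await req.text(); // raw text, never req.json(), before verification
  if (!sig) return new Response("Missing signature", { status: 400 });

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(rawBody, sig, process.env.STRIPE_WEBHOOK_SECRET!);
  } catch (err) {
    console.error("Webhook signature failed", err);
    return new Response("Invalid signature", { status: 400 });
  }

  try {
    if (await wasProcessed(event.id)) return new Response("Duplicate", { status: 200 });
    await handleStripeEvent(event);
    // Mark the event processed only after the work succeeds (ideally in the same transaction);
    // marking it first would make Stripe's retry look like a duplicate after a failure.
    await markProcessed(event.id);
    return new Response("OK", { status: 200 });
  } catch (err) {
    console.error(`Webhook processing failed for ${event.id}`, err);
    return new Response("Processing failed", { status: 500 });
  }
}
```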
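The `console.error` calls above do not tie every line to a single delivery. This minimal sketch is a structured logger that stamps each line with the Stripe event ID and a stage name, matching the stages listed in step 2 of the fix plan. The stage names are illustrative. The logger writes JSON strings to a sink (stdout by default) rather than to any particular logging service.

```typescript
type Stage = "received" | "verified" | "db_write_start" | "db_write_done" | "downstream_call" | "failed";

interface LogLine {
  ts: string;
  eventId: string;
  eventType: string;
  stage: Stage;
  detail?: string;
}

// Returns a logger bound to one event, so every line carries the same event ID.
// Log short messages only: never secrets, signatures, or full payloads.
function webhookLogger(eventId: string, eventType: string, sink: (line: string) => void = console.log) {
  return (stage: Stage, detail?: string): LogLine => {
    const line: LogLine = { ts: new Date().toISOString(), eventId, eventType, stage };
    if (detail !== undefined) line.detail = detail;
    sink(JSON.stringify(line));
    return line;
  };
}
```

In a route, you would create the logger right after verification, for example `const log = webhookLogger(event.id, event.type)`, and then call `log("db_write_start")` and `log("db_write_done")` around each write.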

Regression Tests Before Redeploy

I would not ship this fix until these checks pass:

1. Signature test

  • Send a real signed test event through Stripe CLI or Dashboard.
  • Acceptance criteria: valid signatures return 200; invalid signatures return 400; no secret appears in logs.

2. Duplicate delivery test

  • Deliver the same event twice.
  • Acceptance criteria: only one database mutation happens; second delivery is ignored safely; no duplicate admin records appear.

3. Failure path test

  • Force a temporary DB failure or mock downstream error in staging.
  • Acceptance criteria: handler returns non-200; Stripe retries; error appears in logs with event ID attached.

4. Event coverage test

  • Test every event type your admin UI depends on:
    • payment succeeded
    • invoice paid
    • subscription updated
    • refund created if relevant
  • Acceptance criteria: each mapped event changes state correctly; unmapped events are logged clearly for review.

5. Security test

  • Confirm endpoint rejects invalid signatures and malformed payloads cleanly.
  • Acceptance criteria: no auth bypass; no stack traces exposed to clients; secrets remain server-side only.

6. Observability test

  • Trigger one live-like event end-to-end while watching logs and dashboard state.
  • Acceptance criteria: you can trace one request from receipt to DB write in under 60 seconds during manual inspection.

7. UX sanity check for admins

  • Refresh affected screens after webhook processing completes.
  • Acceptance criteria: UI reflects updated payment state without manual cache clearing; loading/error states are understandable if sync lags briefly.
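The failure-path and security tests are easier to write when the handler takes its dependencies as arguments, so a test can swap in a database that throws. This minimal sketch uses hypothetical names (`Deps`, `handleDelivery`, `saveInvoicePaid`). Signature checking is stubbed as a boolean because the point here is the status-code contract, not verification itself. The sketch only covers the handler's side; confirming that Stripe actually retries still needs a staging run.

```typescript
// Dependencies injected so a test can swap in failing doubles.
interface Deps {
  verify: (rawBody: string, signature: string | null) => boolean;
  saveInvoicePaid: (eventId: string) => Promise<void>;
  log: (message: string) => void;
}

// Returns the event ID from a JSON body, or null if the body is malformed.
function parseEventId(rawBody: string): string | null {
  try {
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed !== "object" || parsed === null) return null;
    const id = (parsed as { id?: unknown }).id;
    return typeof id === "string" ? id : null;
  } catch {
    return null;
  }
}

// Maps each outcome to the status code the acceptance criteria expect.
async function handleDelivery(deps: Deps, rawBody: string, signature: string | null): Promise<number> {
  if (!deps.verify(rawBody, signature)) return 400;
  const eventId = parseEventId(rawBody);
  if (eventId === null) return 400; // malformed payload: reject cleanly, no stack trace to the client
  try {
    await deps.saveInvoicePaid(eventId);
    return 200;
  } catch (err) {
    deps.log(`processing failed event=${eventId}: ${err instanceof Error ? err.message : String(err)}`);
    return 500;
  }
}

// Failure-path check: a DB double that always throws and a logger that records messages.
async function runFailurePathChecks(): Promise<{ invalidSig: number; dbDown: number; logs: string[] }> {
  const logs: string[] = [];
  const deps: Deps = {
    verify: (_body, sig) => sig === "valid-signature",
    saveInvoicePaid: async () => {
      throw new Error("connection refused");
    },
    log: (message) => {
      logs.push(message);
    },
  };
  const body = JSON.stringify({ id: "evt_fail_1", type: "invoice.paid" });
  const invalidSig = await handleDelivery(deps, body, "tampered");
  const dbDown = await handleDelivery(deps, body, "valid-signature");
  return { invalidSig, dbDown, logs };
}
```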

For an internal admin app like this, I want at least basic coverage on webhook routes plus one integration test per critical business flow before redeploying anywhere near production traffic again.

Prevention

The real fix is not just "make webhooks work once". It is building guardrails so silent failures do not survive another release cycle.

1. Add monitoring with alerting on missed webhook outcomes.

  • Alert when expected events arrive but no DB update occurs within a set window, such as 2 minutes.
  • Alert on repeated non-2xx responses from your endpoint, for example 3 or more failures in 10 minutes.

2. Log with correlation IDs tied to Stripe event IDs.

  • Every log line should include `event.id`.
  • This makes postmortems faster and avoids guessing which request broke what.

3. Review webhook code as security-sensitive infrastructure code.

  • Check auth boundaries even though webhooks are server-to-server requests.
  • Validate input shape anyway: a verified signature proves the payload came from Stripe, not that it contains the fields your business logic expects.

4. Keep Cloudflare rules explicit for API routes.

  • No caching on webhook paths.
  • No browser challenge on machine endpoints.
  • No accidental redirect loops.

5. Use idempotency everywhere money touches state.

  • Unique constraint on processed events.
  • Safe retries on downstream operations.
  • Avoid double-charging side effects or duplicate provisioning.

6. Add rollback-safe deployment practices.

  • Small releases.
  • Feature flags for new handler logic.
  • One owner watching delivery metrics during rollout.

7. Document handover steps inside the repo.

  • Webhook secret location.
  • Endpoint URL.
  • Event types supported.
  • Recovery steps if failed deliveries spike.
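The two alert rules in step 1 can be written as simple checks. This minimal sketch assumes you already collect delivery outcomes and DB-update timestamps somewhere queryable. The thresholds (3 failures in 10 minutes, a 2-minute update window) are the examples from the list, not recommendations for every app.

```typescript
const MINUTE = 60_000;

// Rule 1: alert when non-2xx responses reach a threshold inside a sliding window.
function tooManyFailures(
  failureTimes: number[], // epoch ms of non-2xx responses from the webhook endpoint
  now: number,
  threshold = 3,
  windowMs = 10 * MINUTE,
): boolean {
  const recent = failureTimes.filter((t) => t <= now && now - t <= windowMs);
  return recent.length >= threshold;
}

interface Delivery {
  eventId: string;
  receivedAt: number; // epoch ms when the webhook arrived
  dbUpdatedAt?: number; // epoch ms when the matching business update landed, if it did
}

// Rule 2: list events that arrived but produced no DB update within the window.
function missingUpdates(deliveries: Delivery[], now: number, windowMs = 2 * MINUTE): string[] {
  return deliveries
    .filter((d) => {
      const deadline = d.receivedAt + windowMs;
      if (d.dbUpdatedAt !== undefined) return d.dbUpdatedAt > deadline; // updated, but too late
      return now > deadline; // still nothing after the window closed
    })
    .map((d) => d.eventId);
}
```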

When to Use Launch Ready

Use Launch Ready when you need me to get this stable fast without turning it into a drawn-out engineering project. It is built for exactly this kind of problem: domain setup, email, Cloudflare, SSL, deployment, secrets, and monitoring fixed inside a tight window.

  • DNS checks and redirects
  • Subdomain cleanup for staging vs production
  • Cloudflare configuration for SSL, caching, and DDoS protection
  • SPF, DKIM, and DMARC where email deliverability affects admin alerts
  • Production deployment validation
  • Environment variables and secrets review
  • Uptime monitoring setup
  • Handover checklist so you know what changed

What I need from you before I start:

1. Access to GitHub or your repo host.

2. Access to Vercel, Cloudflare, Stripe, and hosting accounts.

3. A list of which admin actions depend on webhooks.

4. One example of a broken flow, like "payment succeeds but user stays unpaid".

5. Any recent deploy links or commit hashes.

If you already have paying users, do not wait until support tickets pile up. A silent webhook bug quickly turns into manual reconciliation, damaged customer trust, and wasted founder hours.



---

Take the next step

If this is a problem in your product right now, here is what to do next:

  • [Use the free Cyprian tools](/tools) - estimate cost, score app risk, check launch readiness, or pick the right service sprint.
  • [Book a discovery call](/contact) - I will tell you honestly whether you need a sprint or if you can DIY the next step.

*Written by Cyprian Tinashe Aarons - senior full-stack and AI engineer helping founders rescue, launch, automate, and scale AI-built products.*
