How I Would Fix Webhooks Failing Silently in a Next.js and Stripe Subscription Dashboard Using Launch Ready

The symptom is usually ugly in a very specific way: the customer pays, Stripe shows the event was sent, but your Next.js app never updates the subscription state. The founder sees "payment succeeded" in Stripe and "still locked out" in the dashboard, which means support tickets, failed onboarding, and churn.

The most likely root cause is not "Stripe is broken". It is usually one of these: the webhook route is not reachable in production, the request body is being parsed before Stripe verifies it, the signing secret does not match the deployed environment, or errors are being swallowed after verification. The first thing I would inspect is the live webhook delivery log in Stripe, then the deployed Next.js route handler, then the environment variables in production.

Triage in the First Hour

1. Open Stripe Dashboard > Developers > Webhooks.

Check recent deliveries for `2xx`, `4xx`, and `5xx`.
Look for retries, timeout spikes, and event types like `checkout.session.completed`, `invoice.paid`, and `customer.subscription.updated`.

2. Inspect the exact endpoint URL.

Confirm it points to production, not localhost or a stale preview URL.
Verify there are no redirect chains caused by Cloudflare or domain changes.

3. Check your Next.js route file.

Find the webhook handler under `app/api/.../route.ts` or `pages/api/...`.
Confirm it uses raw request body handling for Stripe signature verification.

4. Review deployment logs.

Look at Vercel, Netlify, Render, or your host logs for runtime errors.
Search for signature verification failures, JSON parse errors, and timeouts.

5. Check environment variables in production.

Verify `STRIPE_WEBHOOK_SECRET`, `STRIPE_SECRET_KEY`, and any database keys are present.
Make sure `STRIPE_WEBHOOK_SECRET` is the signing secret of the live-mode endpoint, not one copied from a test-mode endpoint.

6. Inspect database writes.

Confirm webhook events are actually updating subscription records.
Check whether writes fail silently because of missing try/catch logging.

7. Review auth and permission logic.

Make sure webhook processing is not blocked by user session checks.
Webhooks should be server-to-server and not depend on browser auth.

8. Validate Cloudflare and caching settings.

Ensure webhook routes are excluded from caching and aggressive edge transforms.
Confirm no WAF rule is blocking Stripe IPs or POST requests.

9. Reproduce with a test event.

Send a Stripe test webhook event from the dashboard.
Compare what Stripe says was delivered with what your app logs show.

10. Capture one failing request end-to-end: the request ID from Stripe, the server log line, the database write result, and the final UI state.

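To reproduce locally, forward events with the Stripe CLI:

```shell
stripe listen --forward-to localhost:3000/api/stripe/webhook
```

Use this only to reproduce locally and confirm signature handling before you touch production again. The CLI prints its own `whsec_...` signing secret, so use that value for `STRIPE_WEBHOOK_SECRET` while testing locally.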

Root Causes

| Likely cause | What it looks like | How I confirm it |
|---|---|---|
| Raw body is being parsed | Signature verification fails or always returns 400 | Check if middleware or body parser touches the payload before `constructEvent` |
| Wrong signing secret | Webhook works in test but fails in prod | Compare production env var to Stripe dashboard endpoint secret |
| Wrong endpoint URL | Events show "delivered" to old domain or preview build | Inspect Stripe endpoint settings and deployment history |
| Silent exception after verification | Stripe gets 200 but DB never updates | Add structured logs around every write path |
| Caching or proxy interference | Random failures or unexpected redirects | Check Cloudflare rules, redirects, cache bypass on `/api/*` |
| Missing idempotency handling | Duplicate events create inconsistent state | Look for repeated event IDs and duplicate DB records |

1. Raw body parsing breaks signature checks

Stripe requires the exact raw request body to verify signatures. If Next.js parses JSON too early, verification can fail even when everything else is correct.

I confirm this by checking whether the handler uses `req.text()` in App Router or disables default body parsing in Pages Router where needed.
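To make the raw-body requirement concrete, here is a minimal sketch of the HMAC scheme Stripe documents for webhook signatures (an HMAC-SHA256 over `<timestamp>.<payload>`), written with Node's `crypto` module rather than the Stripe SDK. The secret, timestamp, payload, and the `sign` and `verify` helpers are made up for illustration. In a real handler, `stripe.webhooks.constructEvent` does this work and also checks the timestamp tolerance.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative values only; a real signing secret comes from the Stripe dashboard.
const secret = "whsec_example";
const timestamp = 1700000000;

// Sign the way Stripe documents it: HMAC-SHA256 over "<timestamp>.<raw payload>".
function sign(payload: string, ts: number, key: string): string {
  return createHmac("sha256", key).update(`${ts}.${payload}`).digest("hex");
}

// Compare the expected signature with the received one in constant time.
function verify(payload: string, ts: number, key: string, received: string): boolean {
  const expected = Buffer.from(sign(payload, ts, key), "hex");
  const actual = Buffer.from(received, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// The body exactly as it arrived on the wire (note the spaces after the colons).
const rawBody = '{"id": "evt_123", "type": "invoice.paid"}';
const signature = sign(rawBody, timestamp, secret);

// Parsing and re-serializing changes the bytes, so the same signature no longer matches.
const reserialized = JSON.stringify(JSON.parse(rawBody));

const rawOk = verify(rawBody, timestamp, secret, signature);
const reserializedOk = verify(reserialized, timestamp, secret, signature);
console.log({ rawOk, reserializedOk });
```

The raw body verifies and the re-serialized one does not, even though both parse to the same object. That is the failure you get when a body parser runs before verification.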

2. Production secret mismatch

This happens when a founder copies the test webhook secret into production or rotates secrets without redeploying. The result is a webhook that appears healthy from afar but never verifies correctly.

I confirm it by comparing the active Stripe endpoint secret with the environment variable value deployed in the hosting provider.
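A cheap first check is the shape of the values, since Stripe prefixes keys by mode (`sk_live_` or `sk_test_`, `rk_` for restricted keys) and signing secrets with `whsec_`. The sketch below is an assumption about how such a check might look; `checkStripeEnv` and its `expectLive` flag are hypothetical names. It cannot tell whether a secret belongs to the right endpoint, only whether something obviously wrong was pasted in.

```typescript
type EnvCheck = { ok: boolean; problems: string[] };

// Checks key shape only; it cannot prove a secret belongs to the right endpoint.
function checkStripeEnv(
  env: Record<string, string | undefined>,
  expectLive: boolean
): EnvCheck {
  const problems: string[] = [];
  const secretKey = env.STRIPE_SECRET_KEY;
  const webhookSecret = env.STRIPE_WEBHOOK_SECRET;

  if (!secretKey) problems.push("STRIPE_SECRET_KEY is missing");
  if (!webhookSecret) problems.push("STRIPE_WEBHOOK_SECRET is missing");

  // Secret keys carry their mode in the prefix.
  if (secretKey) {
    const isLive = /^(sk|rk)_live_/.test(secretKey);
    const isTest = /^(sk|rk)_test_/.test(secretKey);
    if (!isLive && !isTest) problems.push("STRIPE_SECRET_KEY has an unexpected prefix");
    if (expectLive && isTest) problems.push("test-mode key used where live mode is expected");
    if (!expectLive && isLive) problems.push("live-mode key used where test mode is expected");
  }

  // Webhook signing secrets start with whsec_ in both modes.
  if (webhookSecret && !webhookSecret.startsWith("whsec_")) {
    problems.push("STRIPE_WEBHOOK_SECRET does not look like a signing secret");
  }

  return { ok: problems.length === 0, problems };
}

// Example: a test key deployed to an environment that should be live.
const result = checkStripeEnv(
  { STRIPE_SECRET_KEY: "sk_test_abc", STRIPE_WEBHOOK_SECRET: "whsec_abc" },
  true
);
console.log(result);
```

Running something like this at boot with `process.env` turns a silent mismatch into a loud startup failure.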

3. Endpoint points at a dead route

A lot of teams change domains during launch and forget that Stripe still posts to an old URL. That creates false confidence because delivery attempts continue to appear in Stripe history.

I confirm it by checking every active webhook endpoint inside Stripe and matching each one to a real deployed route.

4. Errors are swallowed after verification

This is one of the worst failure modes because it looks successful from Stripe's side if you return 200 too early. The event verifies, but then a database write fails and nobody knows.

I confirm it by wrapping each downstream action with explicit logging and by checking whether your code returns success before persistence completes.

5. Cloudflare or hosting rules interfere

If you added WAF rules, bot protection, redirects, or cache rules during launch, they can break POST requests or alter headers enough to affect delivery. This shows up as random failures that feel impossible until you inspect edge behavior.

I confirm it by temporarily bypassing caching and security transforms for webhook paths only.

6. No idempotency protection

Stripe retries events. If your code processes duplicates badly, you get double upgrades, wrong billing states, or conflicting records that later look like silent failure.

I confirm it by checking whether event IDs are stored before processing and whether repeat deliveries are acknowledged and skipped rather than processed again.
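The idempotency check can be sketched in a few lines. This version uses an in-memory `Set` as a stand-in for a database table with a unique constraint on the Stripe event ID; `ProcessedEvents`, `handleOnce`, and the `upgrades` counter are hypothetical names for illustration.

```typescript
type StripeLikeEvent = { id: string; type: string };

// In-memory stand-in for a table with a unique constraint on the Stripe event ID.
class ProcessedEvents {
  private seen = new Set<string>();

  // Returns true the first time an ID is claimed, false on any repeat.
  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

const processed = new ProcessedEvents();
let upgrades = 0;

function handleOnce(event: StripeLikeEvent): "processed" | "duplicate" {
  if (!processed.claim(event.id)) {
    // The route should still answer 2xx here so Stripe stops retrying.
    return "duplicate";
  }
  upgrades += 1; // stand-in for the real subscription update
  return "processed";
}

// Simulate Stripe delivering the same event twice.
const event: StripeLikeEvent = { id: "evt_123", type: "invoice.paid" };
const first = handleOnce(event);
const second = handleOnce(event);
console.log({ first, second, upgrades });
```

In a real system the claim and the subscription update belong in the same transaction. Otherwise a failure after the claim would mark the event as done without applying it.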

The Fix Plan

First I would stop guessing and make the webhook path observable end to end. That means logging request arrival, signature verification result, event type, database action result, and final response code for every attempt.
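One way to get that visibility is a single structured log line per attempt. The field names and the `formatWebhookLog` helper below are assumptions for illustration, not a required schema.

```typescript
type WebhookLog = {
  eventId: string | null; // null when verification fails before an ID is known
  eventType: string | null;
  signatureValid: boolean;
  dbResult: "ok" | "failed" | "skipped";
  status: number; // HTTP status returned to Stripe
  latencyMs: number;
};

// One JSON line per attempt keeps logs searchable by event ID.
function formatWebhookLog(entry: WebhookLog): string {
  return JSON.stringify({ msg: "stripe_webhook", ...entry });
}

const line = formatWebhookLog({
  eventId: "evt_123",
  eventType: "customer.subscription.updated",
  signatureValid: true,
  dbResult: "ok",
  status: 200,
  latencyMs: 42,
});
console.log(line);
```

With every attempt logged in the same shape, a Stripe delivery can be matched to a server log line by event ID instead of by timestamp guesswork.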

Then I would fix only one layer at a time:

1. Verify routing.
2. Verify raw-body handling.
3. Verify secrets.
4. Verify persistence.
5. Verify response timing.

If this is a Next.js App Router project using Stripe webhooks, my preferred pattern is:

Put webhooks on a dedicated route like `/api/stripe/webhook`.
Read raw text from the request before parsing anything else.
Verify with `stripe.webhooks.constructEvent(...)`.
Process only known event types.
Store processed event IDs for idempotency.
Return `200` only after persistence succeeds.

A safe shape looks like this:

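```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// handleStripeEvent is your own function (not shown here).
export async function POST(req: Request) {
  const sig = req.headers.get("stripe-signature");
  // Read the raw text body; parsing JSON first would break signature verification.
  const rawBody = await req.text();

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      rawBody,
      sig ?? "",
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch (err) {
    // Bad or missing signature: reject without touching the database.
    console.error("stripe_webhook_signature_error", err);
    return new Response("bad request", { status: 400 });
  }

  try {
    await handleStripeEvent(event); // must log + persist safely
    return new Response("ok", { status: 200 });
  } catch (err) {
    // Processing failed after verification: return 5xx so Stripe retries.
    console.error("stripe_webhook_processing_error", err);
    return new Response("processing failed", { status: 500 });
  }
}
```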

Next I would harden storage behavior:

Add a unique constraint on Stripe event ID.
Wrap subscription updates in a transaction where possible.
Record failed events separately so they can be replayed manually.
Never depend on frontend state to decide billing truth.
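Recording failed events can be as simple as the sketch below. The `failedEvents` array stands in for a real table, and `applyOrRecord` is a hypothetical helper. The point is only that a failed write leaves a replayable record instead of disappearing.

```typescript
type StripeLikeEvent = { id: string; type: string };
type FailedEvent = { eventId: string; type: string; error: string };

// Stand-in for a failed_events table that someone can replay from later.
const failedEvents: FailedEvent[] = [];

// Applies the update; on failure, records the event instead of losing it.
function applyOrRecord(
  event: StripeLikeEvent,
  update: (e: StripeLikeEvent) => void
): boolean {
  try {
    update(event); // in real code: a transaction around the subscription write
    return true;
  } catch (err) {
    failedEvents.push({
      eventId: event.id,
      type: event.type,
      error: err instanceof Error ? err.message : String(err),
    });
    return false;
  }
}

// Simulate a database outage during the update.
const ok = applyOrRecord({ id: "evt_456", type: "invoice.paid" }, () => {
  throw new Error("db unavailable");
});
console.log(ok, failedEvents);
```

When this returns `false`, the route should respond with a non-2xx status so Stripe retries as well. The stored record is the fallback for when retries run out.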

Then I would remove accidental blockers:

Exclude webhook routes from caching.
Disable auth middleware on server-to-server endpoints if it exists there accidentally.
Confirm Cloudflare does not rewrite headers or challenge POST requests on that path.
Make sure deploy previews do not share production secrets unless intentionally isolated.
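For the auth-middleware blocker, a small path check makes the exemption explicit. This is a sketch; `shouldSkipAuth` and `PUBLIC_SERVER_PATHS` are hypothetical names, and the path assumes the dedicated route suggested earlier.

```typescript
// Assumed webhook path; match it to wherever your route actually lives.
const PUBLIC_SERVER_PATHS = ["/api/stripe/webhook"];

// True when a request should bypass session checks (server-to-server endpoints).
function shouldSkipAuth(pathname: string): boolean {
  return PUBLIC_SERVER_PATHS.some(
    (p) => pathname === p || pathname.startsWith(`${p}/`)
  );
}

const webhookSkipsAuth = shouldSkipAuth("/api/stripe/webhook");
const dashboardSkipsAuth = shouldSkipAuth("/dashboard");
console.log({ webhookSkipsAuth, dashboardSkipsAuth });
```

In a Next.js `middleware.ts`, this could be called with `request.nextUrl.pathname` and followed by an early `NextResponse.next()`. The Stripe signature check, not a user session, is what authenticates the webhook request.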

Finally I would verify business logic order:

Payment event arrives first?
Subscription record exists?
User mapping resolves correctly?
Access flags update?
UI refreshes from server truth?

If any step depends on stale client state, I would move that decision server-side immediately.
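Moving the decision server-side can mean deriving access from the subscription status stored by the webhook handler. The statuses below are the ones Stripe documents for subscriptions. The policy in `hasPaidAccess` (only `active` and `trialing` unlock the dashboard) is an assumption that each product should decide for itself.

```typescript
// Subscription statuses as documented by Stripe.
type SubscriptionStatus =
  | "incomplete"
  | "incomplete_expired"
  | "trialing"
  | "active"
  | "past_due"
  | "canceled"
  | "unpaid"
  | "paused";

// Assumed policy: only active and trialing subscriptions unlock the dashboard.
function hasPaidAccess(status: SubscriptionStatus): boolean {
  return status === "active" || status === "trialing";
}

const activeAccess = hasPaidAccess("active");
const pastDueAccess = hasPaidAccess("past_due");
console.log({ activeAccess, pastDueAccess });
```

The UI then reads this server-computed flag instead of inferring access from a checkout redirect or from client state.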

Regression Tests Before Redeploy

Before I ship this fix again, I want proof that payments update subscriptions reliably under real conditions.

Acceptance criteria:

1. A successful test payment updates subscription status within 30 seconds.
2. A duplicate webhook delivery does not create duplicate records.
3. A bad signature returns `400` and does not write to the database.
4. A missing env var fails loudly at startup or health check time.
5. Production logs show each received event ID once with clear outcome text.
6. The dashboard reflects paid access without needing more than one manual refresh per flow.

QA checks I would run:

Test checkout completion flow end to end in test mode.
Replay one known event from Stripe dashboard twice to verify idempotency.
Send an invalid-signature request locally to confirm rejection path works.
Simulate database failure and confirm it logs clearly instead of pretending success.
Check mobile dashboard refresh behavior after payment on iPhone Safari and Chrome Android if this affects login gating.

Risk-based regression areas:

New subscriptions
Renewals
Failed payments
Cancellations
Plan upgrades/downgrades
Trial conversions

I would also check observability before release:

Alert if webhook error rate exceeds 1 percent over 15 minutes
Alert if no successful webhooks arrive for 30 minutes during active sales hours
Alert if p95 webhook processing exceeds 2 seconds
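The error-rate and latency thresholds above can be expressed as a small check over a window of webhook attempts. This is a sketch under assumptions: `WindowStats` and `alertsFor` are hypothetical, the percentile uses the nearest-rank method, and a real setup would normally define these rules in the monitoring tool rather than in application code.

```typescript
// Nearest-rank percentile over latencies in milliseconds.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Counts and sampled latencies for one 15-minute window.
type WindowStats = { total: number; errors: number; latenciesMs: number[] };

function alertsFor(stats: WindowStats): string[] {
  const alerts: string[] = [];
  // Error rate above 1 percent of attempts in the window.
  if (stats.total > 0 && stats.errors / stats.total > 0.01) {
    alerts.push("error_rate_above_1_percent");
  }
  // p95 processing time above 2 seconds.
  if (percentile(stats.latenciesMs, 95) > 2000) {
    alerts.push("p95_above_2_seconds");
  }
  return alerts;
}

const alerts = alertsFor({
  total: 200,
  errors: 3,
  latenciesMs: [120, 180, 250, 300, 2600],
});
console.log(alerts);
```

Here 3 errors out of 200 is 1.5 percent and the slowest sampled request sets the p95, so both alerts fire.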

Prevention

The best prevention is boring infrastructure discipline.

I would put these guardrails in place:

Structured logging with event ID, type, environment, latency, and outcome
Unique constraint on processed Stripe events
Health checks for critical env vars at boot
Separate staging and production webhook endpoints
Route-level monitoring with alerting on failures
Code review checklist that includes raw-body handling and secret usage
Cloudflare rules that bypass caching on `/api/stripe/*`
Least privilege API keys with no unnecessary write access elsewhere

From a cyber security lens, I also care about these controls:

Validate all incoming data against expected event types only
Never trust client-reported subscription status
Keep secrets out of client bundles
Rotate leaked keys immediately
Log enough to debug without exposing PII or full payloads unnecessarily

From a UX angle, do not hide billing uncertainty behind vague copy like "processing". If access depends on payment confirmation longer than normal latency windows, tell users what is happening and give them a retry path after about 30 seconds rather than leaving them stuck wondering if they were charged twice.

From a performance angle, keep webhook processing fast:

Aim for p95 under 500 ms when possible
Push non-critical work into queues
Avoid heavy synchronous DB work inside the request path unless required for consistency
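Splitting critical from non-critical work might look like the sketch below. The `jobs` array stands in for a real queue, and the job kinds and `handleInvoicePaid` are hypothetical examples. Only the billing-state write stays in the request path.

```typescript
type Job = { kind: "send_receipt_email" | "sync_crm"; eventId: string };

// Stand-in for a real queue (for example a jobs table or a managed queue service).
const jobs: Job[] = [];

// Critical path: update billing state synchronously, defer everything else.
function handleInvoicePaid(eventId: string, markPaid: (id: string) => void): number {
  markPaid(eventId); // must succeed before the event is acknowledged
  jobs.push({ kind: "send_receipt_email", eventId });
  jobs.push({ kind: "sync_crm", eventId });
  return 200;
}

const paid: string[] = [];
const status = handleInvoicePaid("evt_789", (id) => paid.push(id));
console.log({ status, paid, queued: jobs.length });
```

If `markPaid` throws, nothing is queued and the caller can return a non-2xx status, so slow side effects never run for an event that was not persisted.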

When to Use Launch Ready

Launch Ready fits when you have a working product but launch plumbing is making revenue unreliable. If webhooks are failing silently alongside domain issues, email setup gaps, SSL confusion, broken redirects, missing monitoring, or messy environment variables, I can usually clean that up faster than an internal team juggling feature work.

It includes DNS setup, redirects, subdomains, Cloudflare configuration, SSL handling, caching rules, DDoS protection, SPF/DKIM/DMARC, production deployment, environment variables, secrets, uptime monitoring, and a handover checklist so you are not left guessing after go-live.

What I need from you before I start:

1. Admin access to hosting platform
2. Admin access to Cloudflare or DNS provider
3. Admin access to Stripe developer settings
4. A list of current domains and subdomains
5. Production env vars currently used by Next.js
6. One short note describing what should happen after payment succeeds

If your current issue is "payments work sometimes but support cannot explain why," this sprint gives me enough room to fix launch plumbing properly instead of patching around it again next week.

---

Take the next step

If this is a problem in your product right now, here is what to do next:

[Use the free Cyprian tools](/tools) - estimate cost, score app risk, check launch readiness, or pick the right service sprint.

[Book a discovery call](/contact) - I will tell you honestly whether you need a sprint or if you can DIY the next step.

*Written by Cyprian Tinashe Aarons - senior full-stack and AI engineer helping founders rescue, launch, automate, and scale AI-built products.*
