How I Would Fix webhooks failing silently in a Cursor-built Next.js internal admin app Using Launch Ready.
The symptom is usually ugly in a very specific way: the admin UI says "sent", the provider says "delivered", but nothing changes in your app. In practice, the most likely root cause is that the webhook endpoint is returning 200 too early, throwing inside an async handler, or failing behind a proxy and nobody is logging the failure.
The first thing I would inspect is the actual request path from provider to Next.js route. I want to see the raw incoming payload, the response status, and whether the event is being acknowledged before it is validated and processed.
Triage in the First Hour
I start with evidence, not guesses. Silent failures are usually a logging and observability problem first, then a code problem second.
1. Check the webhook provider dashboard.
   - Look for delivery attempts, response codes, retries, and timestamps.
   - Confirm whether the provider thinks delivery succeeded or failed.
2. Inspect server logs for the webhook route.
   - Search for request entries, thrown errors, and any `console.log` that only appears in local dev.
   - If there are no logs at all, suspect routing, deployment, or edge/runtime mismatch.
3. Check the deployed route file.
   - Confirm the webhook endpoint exists in the deployed build.
   - In Next.js App Router, verify `app/api/.../route.ts` matches what the provider is calling.
4. Review environment variables in production.
   - Verify secrets used for signature validation, database writes, queue URLs, and API keys exist in prod.
   - Confirm no secret was renamed during a Cursor-generated refactor.
5. Inspect Cloudflare or reverse proxy settings.
   - Look for caching on POST routes, WAF blocks, body size limits, or SSL termination issues.
   - Confirm webhook requests are reaching origin and not being challenged.
6. Check deployment health and recent builds.
   - Review the last successful deploy time against when failures began.
   - Compare preview vs production behavior if this only happens after release.
7. Open the database or job queue dashboard.
   - Confirm whether records are created but downstream processing fails later.
   - If jobs exist but never run, this is not a webhook issue anymore; it is a worker or queue issue.
8. Reproduce with one known event payload.
   - Send a test webhook from the provider or replay a captured payload into staging.
   - Compare response headers and body to production behavior.
```bash
# Quick local check for route behavior
curl -i https://your-domain.com/api/webhooks/provider \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{"type":"test","id":"evt_123"}'
```

Root Causes
Here are the causes I see most often in Cursor-built Next.js apps.
| Likely cause | How it fails | How I confirm it |
| --- | --- | --- |
| Handler returns before async work finishes | Provider gets 200 OK but DB write never completes | Add structured logs before and after each await |
| Signature verification broken by body parsing | Webhook rejected or silently skipped | Compare raw body handling against provider docs |
| Wrong runtime or deployment target | Code works locally but not on Vercel/Cloudflare | Check route config and build output |
| Missing prod secret or wrong env var name | Verification fails only in production | Compare env names across local and prod |
| Caching/proxy interference | POST request blocked or altered | Inspect Cloudflare rules and origin logs |
| Downstream DB/job failure | Webhook lands but side effects fail later | Check error logs, queue depth, dead letters |
1. Async work is not awaited
This is common in generated code. The route sends a success response before database writes or external API calls finish.
I confirm it by adding timestamps around each step and checking whether errors appear after the response has already been returned. If so, I move long-running work into a queue or at minimum await it before responding.
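To make the difference concrete, here is a minimal sketch of the two shapes this bug and its fix tend to take. The `HandlerResult` type and the `save` callback are hypothetical stand-ins for the HTTP response and the database write; they are not taken from any particular codebase.

```typescript
// Hypothetical stand-in for an HTTP response.
type HandlerResult = { status: number; body: string };

// Hypothetical async side effect, for example a database write.
type SideEffect = (eventId: string) => Promise<void>;

// Anti-pattern: the promise is not awaited, so a rejection happens
// after the 200 has already been returned to the provider.
function handleFireAndForget(eventId: string, save: SideEffect): HandlerResult {
  save(eventId).catch(() => {
    // The failure is lost here; the provider never learns about it.
  });
  return { status: 200, body: "ok" };
}

// Safer: await the critical step so a failure becomes a non-2xx response
// that the provider can see and retry.
async function handleAwaited(eventId: string, save: SideEffect): Promise<HandlerResult> {
  try {
    await save(eventId);
    return { status: 200, body: "ok" };
  } catch {
    // Log the error here before responding.
    return { status: 500, body: "processing failed" };
  }
}
```

With a `save` that rejects, the first handler still reports 200 while the second reports 500, which is the signal the provider needs to retry.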
2. Raw body handling breaks signature checks
Many providers require signature verification against the exact raw request body. If Next.js parses JSON first, signature validation can fail even though everything looks valid.
I confirm this by comparing the provider's signing docs with our handler implementation. If raw body access is missing, that is usually the bug.
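As an illustration, here is a minimal sketch of raw-body verification. It assumes the provider signs the exact request body with HMAC-SHA256 and sends a hex digest in a header; that scheme and the `verifySignature` helper name are assumptions, and many providers add timestamps or prefixes, so the provider's signing docs remain the source of truth.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256 over the exact raw body, hex-encoded.
// Check your provider's docs for the real scheme.
function verifySignature(
  rawBody: string,
  signatureHex: string | null,
  secret: string,
): boolean {
  if (!signatureHex) return false;
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Why parsing first breaks things: re-serialized JSON is not byte-identical.
const raw = '{"id": "evt_123",  "type": "test"}';
const reserialized = JSON.stringify(JSON.parse(raw)); // whitespace is gone
const signature = createHmac("sha256", "test_secret").update(raw, "utf8").digest("hex");

const rawMatches = verifySignature(raw, signature, "test_secret");
const reserializedMatches = verifySignature(reserialized, signature, "test_secret");
```

Here `rawMatches` is true and `reserializedMatches` is false, even though both strings describe the same event. In an App Router route, `await req.text()` returns the body as a string before any JSON parsing, which is what `rawBody` is meant to be.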
3. Production env vars do not match local env vars
Cursor often generates code that references `WEBHOOK_SECRET`, while production uses `STRIPE_WEBHOOK_SECRET` or another name entirely. That gives you clean builds and broken runtime behavior.
I confirm by listing every variable used by that route and checking them one by one in production settings. Missing secrets are one of the fastest ways to create silent failure.
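One way to turn that checklist into a runtime guard is a small helper that fails loudly when a variable is missing. This is a sketch; `requireEnv` is a hypothetical helper, not part of Next.js, and the variable names passed to it are examples.

```typescript
// Returns the requested variables, or throws a single error
// that names every missing one.
function requireEnv<K extends string>(
  env: Record<string, string | undefined>,
  names: readonly K[],
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}
```

Calling something like `requireEnv(process.env, ["WEBHOOK_SECRET"])` at the top of the route turns a renamed secret into an explicit error in the logs instead of a quiet verification failure.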
4. Cloudflare or hosting rules interfere
If Cloudflare sits in front of your app, it can cache incorrectly configured routes, block requests with WAF rules, or challenge automated traffic from providers. Webhooks should never be treated like normal browser traffic.
I confirm this by checking firewall events, page rules, caching headers, and origin access logs. A webhook endpoint should be excluded from caching and bot challenges.
5. The code catches errors and hides them
A lot of AI-generated code uses broad `try/catch` blocks that swallow errors and still return 200 OK. That creates false confidence because monitoring sees success while business logic fails quietly.
I confirm this by searching for empty catches, generic error messages without stack traces, and any handler that always returns success at the end regardless of earlier failures.
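The pattern I search for, and a possible replacement, look roughly like this. It is a sketch: `WebhookOutcome`, `ErrorReport`, and the `report` callback are hypothetical names standing in for the response and whatever logger or error tracker the app uses.

```typescript
type WebhookOutcome = { status: number };
type ErrorReport = {
  msg: string;
  eventId: string;
  error: string;
  stack: string | undefined;
};

// Anti-pattern: broad catch, nothing logged, success reported regardless.
async function swallowingHandler(work: () => Promise<void>): Promise<WebhookOutcome> {
  try {
    await work();
  } catch {
    // Swallowed: monitoring will only ever see 200.
  }
  return { status: 200 };
}

// Alternative: report the error with its stack and return a non-2xx status.
async function reportingHandler(
  eventId: string,
  work: () => Promise<void>,
  report: (r: ErrorReport) => void,
): Promise<WebhookOutcome> {
  try {
    await work();
    return { status: 200 };
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    report({ msg: "webhook processing failed", eventId, error: e.message, stack: e.stack });
    return { status: 500 };
  }
}
```

The second version gives both the provider and the monitoring stack a truthful signal when the business logic fails.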
6. The event is processed twice or out of order
Some apps treat duplicate deliveries as unique events because idempotency was never added. Others process dependent events out of order and then fail later without obvious symptoms.
I confirm this by checking event IDs in storage and comparing them to provider retry behavior. If duplicates exist without dedupe keys, I fix that before anything else.
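A dedupe check can be sketched as follows. The `EventStore` interface and the in-memory implementation are hypothetical; in production I would expect this to be a database table with a unique constraint on the event ID and async calls, not a `Set` in process memory.

```typescript
// Hypothetical store interface for seen event IDs.
interface EventStore {
  // Returns true if the ID was newly recorded, false if it was already seen.
  recordIfNew(eventId: string): boolean;
}

// In-memory version for illustration and tests only.
class InMemoryEventStore implements EventStore {
  private seen = new Set<string>();

  recordIfNew(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Runs the side effect at most once per event ID.
function processOnce(
  store: EventStore,
  eventId: string,
  sideEffect: () => void,
): "processed" | "duplicate" {
  if (!store.recordIfNew(eventId)) return "duplicate";
  sideEffect();
  return "processed";
}
```

Note that this sketch records the ID before the side effect runs, so a side effect that throws would leave the event marked as seen; a fuller version would track a processing status alongside the ID.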
The Fix Plan
My rule here is simple: fix observability first so we stop guessing, then fix correctness so we stop losing events.
1. Add structured logging to the webhook route.
   - Log request ID, event ID, event type, timestamp, verification result, DB write result, and final status.
   - Do not log full secrets or full payloads if they contain customer data.
2. Verify signatures against raw request bodies.
   - Read raw bytes exactly as required by your provider.
   - Only parse JSON after signature verification passes.
3. Make processing explicit and idempotent.
   - Store incoming event IDs before side effects run.
   - Reject duplicates cleanly with a known status path.
4. Stop returning success too early.
   - If processing must happen inline for now, await all critical steps before responding.
   - If processing may take time or fail intermittently, enqueue it and return only after durable acceptance.
5. Separate acceptance from execution if needed.
   - For internal admin apps especially, I prefer: receive -> validate -> persist -> queue -> respond.
   - That reduces user-facing delay and prevents long-running requests from timing out under load.
6. Harden API security controls around the route.
   - Restrict allowed methods to POST only.
   - Validate content type and payload shape.
   - Add rate limiting where appropriate without blocking legitimate provider retries.
   - Keep secrets server-side only; never expose them to client bundles.
7. Fix deployment config if runtime mismatch exists.
   - Ensure Node runtime vs Edge runtime matches what your code needs.
   - Confirm build output includes all route files after deployment.
8. Add alerting on failure paths immediately after patching.
   - Alert on non-2xx responses from your webhook endpoint.
   - Alert on zero processed events over a defined window if traffic normally exists.
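For the logging step, one structured line per webhook attempt might look like the sketch below. The field names and the `formatWebhookLog` helper are my own assumptions rather than a standard; the point is that every attempt emits the same machine-searchable shape with no payload or secret in it.

```typescript
// One log entry per webhook attempt. No secrets, no full payloads.
type WebhookLogEntry = {
  requestId: string;
  eventId: string | null; // null when the body could not be parsed
  eventType: string | null;
  timestamp: string; // ISO 8601
  signatureValid: boolean;
  dbWrite: "ok" | "failed" | "skipped";
  status: number; // final HTTP status returned to the provider
};

// Serializes the entry as a single JSON line and derives a log level,
// so failures can be filtered and alerted on.
function formatWebhookLog(entry: WebhookLogEntry): string {
  const level = entry.status >= 400 || entry.dbWrite === "failed" ? "error" : "info";
  return JSON.stringify({ level, ...entry });
}
```

Writing the returned string with `console.log` or `console.error` is usually enough for a hosted log viewer to index the fields.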
A safe pattern looks like this:
```typescript
export async function POST(req: Request) {
  const rawBody = await req.text();
  const signature = req.headers.get("x-webhook-signature");

  // verify signature using rawBody here
  // if invalid: return new Response("Invalid signature", { status: 401 });

  const event = JSON.parse(rawBody);

  // persist event ID first for idempotency
  // await saveEvent(event.id);

  // process critical work
  // await handleEvent(event);

  return new Response(JSON.stringify({ ok: true }), { status: 200 });
}
```

Regression Tests Before Redeploy
I do not ship this kind of fix without tests that prove both security and business behavior.
- Signature validation test
  - Valid signed payload returns 200.
  - Invalid signature returns 401 or equivalent rejection status.
- Idempotency test
  - Same event ID sent twice only creates one record or one side effect.
- Failure-path test
  - Database down or queue unavailable returns a controlled failure response and logs an error clearly enough to alert on-call staff.
- Payload shape test
  - Missing required fields return a non-2xx response with no partial write.
- Production parity test
  - Staging uses same runtime mode as production where possible.
  - Webhook URL matches final domain behind Cloudflare/SSL exactly as deployed.
- Observability test
  - Each webhook attempt produces one traceable log entry with request ID and outcome.
  - A failed delivery can be found in under 2 minutes by someone who did not write the code.
Acceptance criteria I would use:
- No silent failures across a sample of at least 20 replayed events.
- p95 webhook handling time under 500 ms for validation plus enqueueing paths.
- Zero duplicate side effects for repeated delivery of the same event ID during testing.
- Clear alert triggered within 5 minutes if failures spike above threshold.
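The alerting criterion can be exercised with a sliding-window counter. This is a sketch under assumptions: `createFailureWindow` is a hypothetical helper, the window and threshold are parameters you would tune, and a real setup would more likely express this as a rule in the monitoring tool than in application code.

```typescript
// Tracks failure timestamps and reports when the count inside the
// window reaches the threshold.
function createFailureWindow(windowMs: number, threshold: number) {
  let failures: number[] = [];
  return {
    // Record a failure at `nowMs`; returns true if the alert should fire.
    recordFailure(nowMs: number): boolean {
      failures.push(nowMs);
      // Keep only failures that are still inside the window.
      failures = failures.filter((t) => t > nowMs - windowMs);
      return failures.length >= threshold;
    },
  };
}

// Example parameters: 3 failures within 10 minutes.
const MINUTE_MS = 60_000;
const burst = createFailureWindow(10 * MINUTE_MS, 3);
const burstResults = [0, 1, 2].map((m) => burst.recordFailure(m * MINUTE_MS));

const spread = createFailureWindow(10 * MINUTE_MS, 3);
const spreadResults = [0, 11, 12].map((m) => spread.recordFailure(m * MINUTE_MS));
```

In this example `burstResults` ends with `true` because three failures land inside one window, while `spreadResults` stays `false` throughout because the first failure has aged out by the time the third arrives.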
Prevention
If you want this to stay fixed after launch week chaos dies down, I would add guardrails at four levels: code review, monitoring, security checks, and UX feedback loops for admins who rely on these events indirectly.
- Code review guardrails
  - Never approve webhook handlers without idempotency checks and explicit error handling.
  - Reject empty catches and "always return success" patterns unless there is a documented reason for them.
- Monitoring guardrails
  - Alert on non-2xx responses above a small threshold like 3 failures in 10 minutes.
  - Track processed count vs received count so silent drops show up fast.
- Security guardrails
  - Keep least privilege on database credentials used by webhook workers.
  - Rotate secrets after any suspected exposure during AI-assisted development cycles.
- UX guardrails
---
*Written by Cyprian Tinashe Aarons - senior full-stack and AI engineer helping founders rescue, launch, automate, and scale AI-built products.*