How I Would Fix Webhooks Failing Silently in a Supabase and Edge Functions Internal Admin App Using Launch Ready

The symptom is usually this: the admin UI says "sent", the downstream system never updates, and nobody notices until a customer complains or a report is wrong. In a Supabase and Edge Functions setup, the most likely root cause is not "the webhook provider is down". It is bad observability plus a missing error path: the function fails, nothing retries, and the app still shows success.

The first thing I would inspect is the full request path from the admin action to the Edge Function log line to the outbound response from the webhook target. I want to know whether the event was queued, whether the function actually ran, whether auth or secrets failed, and whether the external endpoint returned a 4xx or 5xx that your code swallowed.

Triage in the First Hour

1. Check Supabase Edge Function logs for the exact request timestamp.
2. Confirm whether the function was invoked at all, not just deployed.
3. Inspect the admin app network tab for request status, response body, and timeout.
4. Verify environment variables in Supabase project settings and local `.env`.
5. Check recent deploys for changes to webhook payload shape or secret names.
6. Review database rows for an event record, delivery status, retry count, and error field.
7. Open Cloudflare or platform logs if you route traffic through a proxy or custom domain.
8. Test one known-good webhook manually with a safe payload in staging.
9. Check whether CORS or auth middleware is blocking the request before it reaches the function.
10. Confirm whether any alerts exist for function errors, latency spikes, or 5xx rates.

If there is no log entry anywhere, I treat that as an instrumentation failure first, not a delivery failure.

```shell
supabase functions logs webhook-handler --project-ref YOUR_REF
```

Use that command, or the function's logs view in the Supabase Dashboard, while you reproduce one failed event from the admin app. If you cannot match a UI action to a log line within 60 seconds, your current setup is blind.
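Matching a UI action to a log line gets much easier when every delivery attempt writes one structured line keyed by an event ID. Below is a minimal sketch in TypeScript, the language Edge Functions are written in. It assumes you already generate an event ID for each admin action. `formatDeliveryLog` and the field names are hypothetical. They carry only the safe metadata from the fix plan: event ID, status code, duration, and error class.

```typescript
// Hypothetical log shape: safe metadata only, never payloads, headers, or secrets.
interface DeliveryLogFields {
  eventId: string;
  attempt: number;
  statusCode: number | null; // null when no response arrived (timeout, DNS failure, reset)
  durationMs: number;
  errorClass?: string;
}

// One JSON object per line, so searching the logs for an event ID finds every attempt.
function formatDeliveryLog(fields: DeliveryLogFields): string {
  const delivered =
    fields.statusCode !== null && fields.statusCode >= 200 && fields.statusCode < 300;
  return JSON.stringify({
    msg: "webhook_delivery",
    event_id: fields.eventId,
    attempt: fields.attempt,
    status_code: fields.statusCode,
    duration_ms: Math.round(fields.durationMs),
    outcome: delivered ? "delivered" : "failed",
    error_class: fields.errorClass ?? null,
  });
}

// Inside the function you would emit this with console.log so it lands in the function logs.
console.log(
  formatDeliveryLog({ eventId: "evt_123", attempt: 1, statusCode: 502, durationMs: 840.4, errorClass: "UpstreamError" }),
);
```

Searching the logs for `evt_123` should then return every attempt for that admin action.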

Root Causes

| Likely cause | What it looks like | How I confirm it |
|---|---|---|
| Function throws but caller ignores response | UI says success even when outbound call fails | Inspect response handling in frontend and function logs |
| Missing or wrong secret | Requests fail only in prod after deploy | Compare Supabase secrets with local env names |
| Payload mismatch | Target rejects data with 400/422 | Log sanitized payload schema and target response body |
| No retry or dead-letter path | One transient error causes permanent loss | Check database for retry fields and failed-event storage |
| Auth or CORS issue on internal admin call | Browser blocks request before function runs | Use network tab and server logs together |
| Timeout from slow downstream API | Works sometimes, fails under load | Measure p95 duration and compare with platform timeout |
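The first row usually comes down to a frontend that treats any completed request as success. Below is a minimal sketch of the opposite rule. It assumes the Edge Function replies with JSON such as `{ "state": "queued" }`, which is a hypothetical contract and not a Supabase convention. Under this rule the admin UI shows "sent" or "queued" only when the function explicitly says so, and anything else becomes a visible failure.

```typescript
// Hypothetical reply contract: the Edge Function answers with JSON such as
// { "state": "queued" } or { "state": "failed", "error": "target returned 422" }.
type ReplyBody = { state?: unknown; error?: unknown };

interface FunctionReply {
  httpStatus: number; // status of the admin UI's call to the Edge Function
  body: unknown; // parsed JSON body, or null if the body was not JSON
}

interface UiLabel {
  label: "queued" | "sent" | "failed";
  detail?: string;
}

// Only an explicit "queued" or "delivered" from the function counts as success.
// A non-2xx status, an empty body, or an unknown state is shown as a failure.
function labelForReply(reply: FunctionReply): UiLabel {
  if (reply.httpStatus < 200 || reply.httpStatus >= 300) {
    return { label: "failed", detail: `Edge Function returned HTTP ${reply.httpStatus}` };
  }
  const body: ReplyBody =
    typeof reply.body === "object" && reply.body !== null ? (reply.body as ReplyBody) : {};
  if (body.state === "delivered") return { label: "sent" };
  if (body.state === "queued") return { label: "queued" };
  const detail = typeof body.error === "string" ? body.error : "unrecognized reply from Edge Function";
  return { label: "failed", detail };
}
```

In the admin app you would pass in `res.status` and the parsed JSON from the `fetch` call to the function, or `null` if parsing throws. Then render `label` and `detail`.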

From a security standpoint, I also check whether overly broad catch blocks are hiding failures. Silent failure often means security-sensitive errors, such as rejected signatures or expired credentials, are being suppressed too early. That creates both data loss and false confidence.
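Fixing a broad catch block does not mean removing error handling. It means every catch records the failure and hands it back as an explicit result. Below is a minimal synchronous sketch. `runStep` and `describeError` are hypothetical helpers. In a real Edge Function the same shape would be async and would wrap the awaited `fetch` to the webhook target.

```typescript
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; errorClass: string; message: string };

// Keep the error type and a short message. Never serialize the whole error object:
// some client errors carry request headers or tokens.
function describeError(err: unknown): { errorClass: string; message: string } {
  if (err instanceof Error) {
    return { errorClass: err.name, message: err.message.slice(0, 200) };
  }
  return { errorClass: typeof err, message: "non-Error value thrown" };
}

// Catch narrowly around one step, log the failure, and return it as an explicit
// failure the caller has to handle, instead of falling through to a success reply.
function runStep<T>(step: string, fn: () => T, log: (line: string) => void): Outcome<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    const info = describeError(err);
    log(JSON.stringify({ step, ...info }));
    return { ok: false, errorClass: info.errorClass, message: info.message };
  }
}

// Example: a malformed body becomes a visible, logged failure rather than a silent success.
const parsed = runStep("parse_payload", () => JSON.parse("{not json"), (line) => console.log(line));
if (!parsed.ok) {
  console.log(`would respond 400: ${parsed.errorClass}`);
}
```

Because the caller gets back a result it must branch on, it can no longer forget the failure case and return success by default.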

The Fix Plan

My rule here is simple: stop losing events first, then improve reliability second. I would not start by rewriting everything.

1. Add explicit delivery state to your database.

  • Store `queued`, `sending`, `delivered`, `failed`, `retrying`.
  • Save `last_error`, `attempt_count`, `next_retry_at`, and `provider_status_code`.

2. Make every webhook attempt idempotent.

  • Generate an event ID once.
  • Send that ID in headers or payload.
  • Reject duplicate processing on the receiver side where possible.

3. Stop swallowing errors in Edge Functions.

  • Return non-2xx responses when delivery fails.
  • Log only safe metadata: event ID, status code, duration, error class.

4. Separate queueing from delivery.

  • The admin action should create an event record first.
  • A worker or scheduled function should deliver it afterward.
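  • This keeps slow or failing outbound calls off the admin request. A failure is then recorded on the event row instead of vanishing with a single browser request.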

5. Add retries with backoff.

  • Retry transient failures like 429s and 5xx responses.
  • Cap attempts at 3 to 5 tries over 15 to 60 minutes.
  • Move hard failures into a visible failed state.

6. Validate payloads before sending.

  • Use schema validation at the Edge Function boundary.
  • Reject missing required fields early with clear logs.
  • Do not send partial objects that trigger downstream rejection.

7. Tighten secret handling.

  • Keep webhook signing secrets in server-side environment variables (Supabase secrets), never in client code or the repository.
  • Rotate any exposed secret immediately if logs or client code leaked it.
  • Never return secrets in debug output.

8. Add monitoring around delivery outcomes.

  • Alert on failure rate above 2 percent over 15 minutes.
  • Alert on zero deliveries during business hours if events are expected.
  • Track p95 delivery time and error counts per endpoint.
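Step 5 can be written as one pure function that decides what happens after each attempt. Below is a minimal sketch. It assumes five attempts, with the wait doubling from 2 minutes: 2, 4, 8, and 16 minutes, about 30 minutes in total, which sits inside the 15 to 60 minute window above. Treating "no response at all" as retryable is also an assumption. `planNextStep` and its result shape are hypothetical.

```typescript
// Assumed policy: retry 429, 5xx, and requests that never got a response; at most 5 attempts.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 2 * 60 * 1000;

// statusCode is null when the request never got a response (timeout, DNS failure, reset).
function isRetryable(statusCode: number | null): boolean {
  return statusCode === null || statusCode === 429 || statusCode >= 500;
}

type NextStep =
  | { kind: "delivered" }
  | { kind: "retry"; nextRetryAt: Date }
  | { kind: "failed"; reason: string };

// attemptCount is the number of attempts made so far, including the one that just finished.
function planNextStep(statusCode: number | null, attemptCount: number, now: Date): NextStep {
  if (statusCode !== null && statusCode >= 200 && statusCode < 300) {
    return { kind: "delivered" };
  }
  if (!isRetryable(statusCode)) {
    return { kind: "failed", reason: `non-retryable status ${statusCode}` };
  }
  if (attemptCount >= MAX_ATTEMPTS) {
    return { kind: "failed", reason: `gave up after ${attemptCount} attempts` };
  }
  // Exponential backoff: 2, 4, 8, 16 minutes after attempts 1 to 4.
  const delayMs = BASE_DELAY_MS * 2 ** (attemptCount - 1);
  return { kind: "retry", nextRetryAt: new Date(now.getTime() + delayMs) };
}
```

Persist `nextRetryAt` into `next_retry_at` and the reason into `last_error`. Adding a little random jitter to the delay keeps a burst of failed events from all retrying at the same instant.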

If you want one safe architecture change that gives fast value: write every webhook event to Supabase first, then deliver asynchronously from Edge Functions with retries and visible status. That removes silent loss from your critical path.
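The queue-first change is mostly a table plus an idempotent insert. Below is a minimal in-memory sketch that uses the columns from step 1, written in camelCase. `EventStore`, `deliveryHeaders`, and the `Idempotency-Key` header name are hypothetical, and a `Map` stands in for the Supabase table.

```typescript
type DeliveryStatus = "queued" | "sending" | "delivered" | "failed" | "retrying";

// Mirrors the columns from step 1 of the fix plan.
interface WebhookEvent {
  eventId: string; // generated once, when the admin action happens
  endpoint: string;
  payload: Record<string, unknown>;
  status: DeliveryStatus;
  attemptCount: number;
  nextRetryAt: Date | null;
  lastError: string | null;
  providerStatusCode: number | null;
}

// In-memory stand-in for the events table. In Postgres the same guarantee comes from a
// unique constraint on the event ID plus INSERT ... ON CONFLICT DO NOTHING.
class EventStore {
  private rows = new Map<string, WebhookEvent>();

  // Idempotent: a refresh or double click that resubmits the same event ID gets the
  // existing row back instead of creating a second event.
  enqueue(eventId: string, endpoint: string, payload: Record<string, unknown>): { created: boolean; event: WebhookEvent } {
    const existing = this.rows.get(eventId);
    if (existing) return { created: false, event: existing };
    const event: WebhookEvent = {
      eventId,
      endpoint,
      payload,
      status: "queued",
      attemptCount: 0,
      nextRetryAt: null,
      lastError: null,
      providerStatusCode: null,
    };
    this.rows.set(eventId, event);
    return { created: true, event };
  }

  // What a scheduled worker would pick up: new rows, plus retries whose time has come.
  due(now: Date): WebhookEvent[] {
    return Array.from(this.rows.values()).filter(
      (e) =>
        e.status === "queued" ||
        (e.status === "retrying" && e.nextRetryAt !== null && e.nextRetryAt.getTime() <= now.getTime()),
    );
  }
}

// "Idempotency-Key" is a common convention, not a standard every receiver honors;
// send the event ID in whatever header or field your target documents.
function deliveryHeaders(event: WebhookEvent): Record<string, string> {
  return { "Content-Type": "application/json", "Idempotency-Key": event.eventId };
}
```

The admin action calls `enqueue` and returns `queued` right away. A scheduled worker loops over `due(now)` and makes the actual outbound call.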

Regression Tests Before Redeploy

I would not ship this fix without proving three things: events are recorded, failures are visible, and retries work without duplication.

Acceptance criteria:

  • A failed outbound call creates a persisted `failed` record within 1 minute.
  • The admin UI shows "queued", "sent", or "failed" instead of generic success text.
  • A valid webhook is processed by the target exactly once per unique event ID, even if a retry delivers it twice.
  • A temporary 500 from the target triggers retry logic within the configured window.
  • A missing secret causes a clear deployment-time failure, not silent runtime loss.
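For the missing-secret criterion, one approach is to resolve every secret once, when the function module loads, and throw a single error that names whatever is missing. Below is a minimal sketch. The secret names are hypothetical, and the reader function is injected so the sketch runs anywhere. In a Supabase Edge Function you would pass `(name) => Deno.env.get(name)`, so a missing secret should break the first invocation after a deploy loudly instead of failing quietly mid-request.

```typescript
// Hypothetical secret names; list whatever your function actually reads.
const REQUIRED_SECRETS = ["WEBHOOK_TARGET_URL", "WEBHOOK_SIGNING_SECRET"] as const;
type SecretName = (typeof REQUIRED_SECRETS)[number];

// Resolve every required secret once and fail with a single error naming all missing keys.
// The message lists names only, never values.
function loadSecrets(read: (name: string) => string | undefined): Record<SecretName, string> {
  const missing: string[] = [];
  const values: Partial<Record<SecretName, string>> = {};
  for (const name of REQUIRED_SECRETS) {
    const value = read(name);
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  return values as Record<SecretName, string>;
}

// Usage outside Deno, with a plain object standing in for the environment:
const fakeEnv: Record<string, string | undefined> = { WEBHOOK_TARGET_URL: "https://example.test/hook" };
try {
  loadSecrets((name) => fakeEnv[name]);
} catch (err) {
  console.log((err as Error).message); // names WEBHOOK_SIGNING_SECRET
}
```

Re-running one request after every deploy, as QA check 5 asks, is then enough to catch a renamed or unset secret.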

QA checks:

1. Trigger one happy-path event in staging and verify delivery end-to-end.
2. Trigger one forced 500 response from a test endpoint and confirm retry behavior.
3. Trigger one invalid payload and confirm validation blocks it before send.
4. Refresh the page during submission to ensure no duplicate event creation occurs.
5. Re-run after deploy to confirm environment variables still resolve correctly.
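QA check 3 needs something that actually blocks the invalid payload. Below is a hand-rolled sketch of step 6 with no dependencies. It assumes a hypothetical payload contract with the fields `eventId`, `type`, `occurredAt`, and `data`. A schema library such as Zod does the same job with less code.

```typescript
// Hypothetical payload contract; replace with the shape your target actually documents.
interface OutboundPayload {
  eventId: string;
  type: string;
  occurredAt: string; // timestamp string, e.g. ISO 8601
  data: Record<string, unknown>;
}

type Validation = { ok: true; payload: OutboundPayload } | { ok: false; errors: string[] };

// Collect every problem at once so the log line explains the whole rejection.
function validatePayload(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["payload must be an object"] };
  }
  const p = input as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof p.eventId !== "string" || p.eventId === "") errors.push("eventId is required");
  if (typeof p.type !== "string" || p.type === "") errors.push("type is required");
  if (typeof p.occurredAt !== "string" || Number.isNaN(Date.parse(p.occurredAt))) {
    errors.push("occurredAt must be a parseable timestamp");
  }
  if (typeof p.data !== "object" || p.data === null || Array.isArray(p.data)) {
    errors.push("data must be an object");
  }
  return errors.length === 0
    ? { ok: true, payload: p as unknown as OutboundPayload }
    : { ok: false, errors };
}
```

When the result is `ok: false`, mark the event `failed`, store the joined errors in `last_error`, and skip the outbound call entirely.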

I would also run exploratory tests around edge cases:

  • Network timeout after request send but before response receive
  • Duplicate clicks on submit
  • Partial outage of downstream service
  • Expired token or rotated secret
  • High-volume burst of 20 to 50 events
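Duplicate clicks and "timeout after send" both end the same way: the same event arrives twice. The receiver-side guard from step 2 is what keeps processing to once per event ID. Below is a minimal in-memory sketch. `ProcessedEvents` and `simulateBurst` are hypothetical. A real receiver would back the check with a unique constraint so that it survives restarts and multiple instances.

```typescript
// Receiver-side guard: process each event ID at most once (in memory, for illustration).
class ProcessedEvents {
  private seen = new Set<string>();

  // Returns true only the first time an event ID is claimed.
  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Simulate a burst where every event arrives twice: a double click, or a retry after a
// timeout where the first request had actually landed.
function simulateBurst(eventCount: number): { deliveries: number; processed: number } {
  const receiver = new ProcessedEvents();
  let deliveries = 0;
  let processed = 0;
  for (let i = 0; i < eventCount; i++) {
    const eventId = `evt_${i}`;
    for (let copy = 0; copy < 2; copy++) {
      deliveries++;
      if (receiver.claim(eventId)) processed++;
    }
  }
  return { deliveries, processed };
}

console.log(simulateBurst(50));
```

If the receiver claims an ID before doing the work, a crash in between drops the event. Doing the claim and the side effect in one database transaction avoids that.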

Prevention

This problem comes back when teams optimize for shipping speed over visibility. For an internal admin app, that is expensive because broken automation creates manual work quietly until support load spikes.

Guardrails I would put in place:

  • Code review checklist:
    • No swallowed exceptions
    • No success message without verified persistence
    • No direct secret usage in client code
    • No outbound call without timeout handling
  • Security checks:
    • Least privilege on Supabase service roles
    • Signed webhook verification where applicable
    • Secret rotation process documented
    • Rate limiting on public endpoints
  • Monitoring:
    • Delivery success rate dashboard
    • Error budget alerts
    • Function duration tracking, with p95 at least 30 percent below your timeout ceiling
    • Uptime monitoring on custom domains and key endpoints
  • UX improvements:
    • Show pending state while delivery is queued
    • Show failure reason in plain English for admins
    • Provide "retry now" only after validating permissions
  • Performance guardrails:
    • Keep Edge Function execution under about 300 ms for queue writes
    • Avoid blocking UI on external API calls
    • Cache non-sensitive config where appropriate
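The alert thresholds from the fix plan and the p95 headroom rule fit in one small check. This is a sketch under several assumptions. `checkDeliveryHealth` is hypothetical, the samples come from your events table, and p95 uses the nearest-rank method. The "zero deliveries" alert fires only when the caller says traffic is expected, for example during business hours.

```typescript
interface DeliverySample {
  at: Date;
  ok: boolean;
  durationMs: number;
}

// Nearest-rank 95th percentile; integer math keeps the rank exact.
function p95(durations: number[]): number {
  if (durations.length === 0) return 0;
  const sorted = durations.slice().sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Thresholds from the plan: alert above 2 percent failures over the last 15 minutes,
// alert on silence when traffic is expected, and keep p95 at least 30 percent below the timeout.
function checkDeliveryHealth(samples: DeliverySample[], now: Date, timeoutMs: number, expectTraffic: boolean) {
  const windowStart = now.getTime() - 15 * 60 * 1000;
  const recent = samples.filter((s) => s.at.getTime() >= windowStart && s.at.getTime() <= now.getTime());
  const failures = recent.filter((s) => !s.ok).length;
  const failureRate = recent.length === 0 ? 0 : failures / recent.length;
  const p95Ms = p95(recent.map((s) => s.durationMs));
  return {
    total: recent.length,
    failureRate,
    p95Ms,
    failureAlert: failureRate > 0.02,
    noTrafficAlert: expectTraffic && recent.length === 0,
    latencyAlert: p95Ms > 0.7 * timeoutMs,
  };
}
```

In production the same arithmetic usually lives in a SQL query or your monitoring tool. The sketch just makes the thresholds explicit and testable.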

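Here is how I would think about the flow:

1. The admin action writes an event row as `queued`.
2. A worker picks the row up and marks it `sending`.
3. A 2xx response marks it `delivered`.
4. A 429, a 5xx, or a timeout marks it `retrying` and sets `next_retry_at`.
5. A non-retryable error, or the final failed attempt, marks it `failed`, and the admin UI shows that state with the reason.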
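The flow can also be written as a small transition function over the delivery states from step 1 of the fix plan. Illegal jumps, such as `delivered` back to `sending`, then throw instead of silently overwriting state. This is a sketch: the event names and the five-attempt cap are assumptions.

```typescript
type DeliveryStatus = "queued" | "sending" | "delivered" | "failed" | "retrying";

// Illustrative event names for what can happen to a webhook record.
type DeliveryEvent =
  | "worker_picked_up"
  | "target_accepted" // 2xx from the target
  | "transient_error" // 429, 5xx, or timeout
  | "permanent_error" // other 4xx or failed validation
  | "admin_retry"; // "retry now" from an admin whose permissions were already checked

const MAX_ATTEMPTS = 5;

// Returns the next status, or throws on a transition the flow does not allow,
// so a bug surfaces instead of silently overwriting state.
function transition(status: DeliveryStatus, event: DeliveryEvent, attemptCount: number): DeliveryStatus {
  switch (status) {
    case "queued":
    case "retrying":
      if (event === "worker_picked_up") return "sending";
      break;
    case "sending":
      if (event === "target_accepted") return "delivered";
      if (event === "permanent_error") return "failed";
      if (event === "transient_error") return attemptCount >= MAX_ATTEMPTS ? "failed" : "retrying";
      break;
    case "failed":
      if (event === "admin_retry") return "queued";
      break;
    case "delivered":
      break;
  }
  throw new Error(`invalid transition: ${status} on ${event}`);
}
```

Keeping `delivered` terminal means a late duplicate response cannot flip a finished event back into the queue.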

That flow keeps failures visible instead of burying them inside one browser request.

When to Use Launch Ready

Launch Ready fits when you have a working internal admin app but delivery risk is costing time, trust, or operations capacity. If your webhooks are failing silently today, I would use this sprint to get domain setup, email deliverability basics, Cloudflare protection, SSL, deployment hygiene, secrets management, monitoring, and handover done in one controlled pass.

  • DNS setup and redirects
  • Subdomains wired correctly
  • Cloudflare config for caching and DDoS protection
  • SSL checked end-to-end
  • Production deployment verified
  • SPF/DKIM/DMARC set up if email is part of your workflow
  • Environment variables and secrets audited
  • Uptime monitoring added
  • Handover checklist so your team can maintain it

What you should prepare before booking:

1. Supabase project access with owner-level permission if possible.
2. Current Edge Function source code and deployment history.
3. List of webhook endpoints and expected payloads.
4. Any recent error screenshots or support tickets.
5. Staging credentials or test accounts for safe reproduction.

If you want me to fix this cleanly instead of guessing at it live under pressure: https://cal.com/cyprian-aarons/discovery

---

Take the next step

If this is a problem in your product right now, here is what to do next:

  • [Use the free Cyprian tools](/tools) - estimate cost, score app risk, check launch readiness, or pick the right service sprint.
  • [Book a discovery call](/contact) - I will tell you honestly whether you need a sprint or if you can DIY the next step.

*Written by Cyprian Tinashe Aarons - senior full-stack and AI engineer helping founders rescue, launch, automate, and scale AI-built products.*
