About Ganges.app

SelfHeal is an auto-remediation layer for SRE and platform teams. It turns recurring alerts into safe, pre-approved runbooks with guardrails, audit trails, and optional AI assistance, while keeping control inside your infrastructure.

The day-to-day problems we target

Most incidents follow familiar patterns: repeated alerts, repeated triage, and the same mitigation steps executed under pressure, often at night or on weekends. SelfHeal is built to reduce this toil without introducing new risk.

How SelfHeal works

SelfHeal is designed to be predictable and production-friendly.

  • 1) Listen: ingest alerts/events from your monitoring systems.
  • 2) Decide: apply guardrails, policies, and approvals.
  • 3) Act + verify: run pre-approved runbooks and validate outcomes.

The result: consistent response, faster mitigation, and less on-call disruption.
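To make the Listen → Decide → Act + verify loop concrete, here is a minimal Python sketch of how a single alert might flow through it. The `Alert` and `Runbook` types, the `handle` function, and the `ServiceDown` runbook are hypothetical names for illustration, not SelfHeal's actual API. The sketch assumes alerts are matched to runbooks by alert name and that targets are gated by a simple allowlist.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical alert shape; field names are illustrative, not SelfHeal's schema.
@dataclass
class Alert:
    name: str
    target: str

# Hypothetical runbook: one mitigation step plus a post-check that verifies it.
@dataclass
class Runbook:
    action: Callable[[str], None]
    post_check: Callable[[str], bool]

def handle(alert: Alert, runbooks: dict, allowed_targets: set, dry_run: bool = True) -> str:
    """Listen -> Decide -> Act + verify for a single incoming alert."""
    runbook = runbooks.get(alert.name)
    # Decide: only pre-approved runbooks on allowlisted targets may proceed.
    if runbook is None:
        return "escalate: no approved runbook"
    if alert.target not in allowed_targets:
        return "escalate: target not allowlisted"
    # DRY-RUN: report what would happen without touching the target.
    if dry_run:
        return f"dry-run: would run {alert.name} runbook on {alert.target}"
    # Act, then verify: a failed post-check hands the incident back to a human.
    runbook.action(alert.target)
    if runbook.post_check(alert.target):
        return "resolved"
    return "escalate: post-check failed"

# Example: a hypothetical "ServiceDown" runbook that records restarts in a list.
restarted: list[str] = []
runbooks = {
    "ServiceDown": Runbook(action=restarted.append,
                           post_check=lambda t: t in restarted),
}
print(handle(Alert("ServiceDown", "web-01"), runbooks, {"web-01"}))
```

Defaulting `dry_run` to `True` means a caller has to opt in explicitly before any real action runs, which mirrors the "validate first, then enable" posture described here.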

Outcomes teams care about

  • Reduce MTTR: execute verified mitigations quickly when conditions match.
  • Reduce on-call load: fewer escalations during nights and weekends.
  • Reduce ticket volume: handle common incidents before they become support work.
  • Improve consistency: standardized runbooks with less shift-to-shift variance.
  • Improve auditability: traceable actions and evidence for incident reviews.

What you can automate first (quick wins)

These are common repeat offenders that teams automate first, where verification is clear and risk is manageable.

  • Service recovery: restart a failing service with pre-checks and post-checks.
  • Disk pressure: safe cleanup paths, log rotation, reclaim space, verify recovery.
  • Stuck processes: detect hangs, restart workers, validate health endpoints.
  • Database health actions: verified checks, replica repair steps, controlled restarts.
  • Kubernetes/app ops: restart deployments, scale components, clear crash loops safely.
  • Configuration drift fixes: re-apply known-good config, verify service readiness.

Automation is only as good as its verification. SelfHeal is built around guardrails and post-check validation.

Safety and control by design

Automation has to be safe before it can earn trust. SelfHeal ships with the controls teams expect.

  • Guardrails: allowlists, blocklists, scoped targets, and controlled execution.
  • DRY-RUN mode: validate logic and outputs before enabling real actions.
  • Audit trail: record what happened, why it happened, and what changed.
  • Runs in your environment: no hidden agents or opaque control planes.
  • Offline-friendly licensing: a simple license.json file with no license-server callbacks.

AI assistance (kept under guardrails)

AI is used to accelerate diagnosis and reduce manual effort, not to bypass controls. You decide when AI can suggest, and when any action is allowed to run.

  • Faster triage: summarize alerts, highlight likely causes, reduce time spent digging.
  • Better recommendations: propose next steps aligned to your approved runbooks.
  • Less midnight work: reduce repetitive escalations that typically wake humans.
  • Explainability: rationale and outcomes captured for incident reviews.

Safe rollout path: Observe → Recommend → DRY-RUN → Allowlisted execution.
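The rollout path can be modeled as an ordered set of modes, where each stage grants more autonomy than the one before. The sketch below is hypothetical: the `Mode` enum and `plan` function are illustrative names, not a SelfHeal interface. It shows one way a matched alert could be handled at each stage, assuming execution is still limited to allowlisted targets at the final stage.

```python
from enum import IntEnum

class Mode(IntEnum):
    """Hypothetical rollout stages, ordered from least to most autonomous."""
    OBSERVE = 1
    RECOMMEND = 2
    DRY_RUN = 3
    ALLOWLISTED_EXECUTION = 4

def plan(mode: Mode, runbook: str, target: str, allowlist: set[str]) -> str:
    """Describe what may happen for one matched alert at a given stage."""
    if mode == Mode.OBSERVE:
        return f"log: alert on {target}"
    if mode == Mode.RECOMMEND:
        return f"suggest: {runbook} on {target}"
    if mode == Mode.DRY_RUN:
        return f"simulate: {runbook} on {target}"
    # Final stage: execute only on allowlisted targets, otherwise fall back to a suggestion.
    if target in allowlist:
        return f"execute: {runbook} on {target}"
    return f"suggest: {runbook} on {target} (not allowlisted)"

for mode in Mode:
    print(mode.name, "->", plan(mode, "restart-service", "web-01", {"web-01"}))
```

Using an `IntEnum` keeps the stages comparable (`Mode.DRY_RUN < Mode.ALLOWLISTED_EXECUTION`), so a team could promote a runbook one stage at a time as confidence grows.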

Integration posture

SelfHeal integrates using alerts and webhooks, so you can start without changing your monitoring strategy.

  • Monitoring inputs: any system that can send alerts/events (webhook).
  • Common sources: Prometheus/Alertmanager, Datadog, OpsRamp, and similar tools.
  • Ticketing/escalation: integrate with your existing workflows (PagerDuty, ServiceNow, etc.) via hooks.
  • ChatOps: optional notifications and approvals via Slack/Teams-style webhook patterns.

The goal is compatibility: adopt incrementally without a platform rewrite.
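As an illustration of the webhook input path, here is a minimal sketch that extracts firing alerts from a Prometheus Alertmanager webhook body. The payload fields used (`alerts`, `status`, `labels`, `annotations`, `fingerprint`) follow Alertmanager's webhook format; the `firing_alerts` function, the `selfheal` receiver name, and the sample values are hypothetical, and the `summary` annotation is a common convention rather than a guaranteed field. The HTTP server that would receive the POST is not shown.

```python
import json

def firing_alerts(body: bytes) -> list[dict]:
    """Extract firing alerts from an Alertmanager webhook payload."""
    payload = json.loads(body)
    alerts = []
    for alert in payload.get("alerts", []):
        # Resolved notifications need no remediation, so skip them.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        alerts.append({
            "name": labels.get("alertname", "unknown"),
            "instance": labels.get("instance"),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "fingerprint": alert.get("fingerprint"),
        })
    return alerts

# Hypothetical sample body with one firing and one resolved alert.
sample = b'''{"version": "4", "status": "firing", "receiver": "selfheal",
 "alerts": [
  {"status": "firing",
   "labels": {"alertname": "DiskPressure", "instance": "web-01:9100"},
   "annotations": {"summary": "Disk usage above 90%"}, "fingerprint": "a1b2"},
  {"status": "resolved",
   "labels": {"alertname": "ServiceDown", "instance": "web-02:9100"},
   "annotations": {}, "fingerprint": "c3d4"}]}'''
print(firing_alerts(sample))
```

Normalizing each alert into a small dictionary keeps the rest of the pipeline independent of the source, so other webhook senders (Datadog, OpsRamp, and similar) could be mapped into the same shape.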

Get started

Validate SelfHeal safely without disrupting production workflows.

Start with the Quickstart, get a trial, or view plans.

Who we are

US: Ganges LLC (Colorado, USA)

We build for operators: clear controls, predictable behavior, and simple deployment. SelfHeal is designed to be a dependable remediation layer, not another system to babysit.