About Ganges.app
SelfHeal is an auto-remediation layer for SRE and platform teams. It turns recurring alerts into safe, pre-approved runbooks with guardrails, audit trails, and optional AI assistance, while keeping control inside your infrastructure.
The day-to-day problems we target
Most incidents follow familiar patterns: repeated alerts, repeated triage, and the same mitigation steps executed under pressure, often at night or on weekends. SelfHeal is built to reduce toil without introducing new risk.
- Alert fatigue: high-volume pages and unclear signal-to-noise.
- Operational toil: repeat SSH checks, restarts, and cleanup across environments.
- Slow mitigation: human bottlenecks delay recovery even when the fix is known.
- Inconsistent execution: runbooks vary by person, shift, and tribal knowledge.
- Ticket sprawl: repetitive incidents become support queues and "always-on" work.
- Risk & compliance pressure: automation must be controlled, explainable, and auditable.
How SelfHeal works
SelfHeal is designed to be predictable and production-friendly.
- 1) Listen: ingest alerts/events from your monitoring systems.
- 2) Decide: apply guardrails, policies, and approvals.
- 3) Act + verify: run pre-approved runbooks and validate outcomes.
The result: consistent response, faster mitigation, and less on-call disruption.
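As a rough illustration of the Listen, Decide, Act + verify flow, here is a minimal Python sketch. It is not SelfHeal's actual API: the names Alert, Runbook, and handle, and the idea of a plain set of allowed targets, are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Alert:
    """Hypothetical shape of an ingested alert (the Listen step)."""
    name: str    # e.g. "ServiceDown"
    target: str  # host or service the alert refers to


@dataclass
class Runbook:
    """Hypothetical pre-approved runbook: one action plus one post-check."""
    action: Callable[[Alert], None]
    verify: Callable[[Alert], bool]


def handle(alert: Alert, runbooks: Dict[str, Runbook], allowed_targets: Set[str]) -> str:
    # Decide: only proceed when an approved runbook exists and the target is in scope.
    runbook = runbooks.get(alert.name)
    if runbook is None:
        return "escalate: no approved runbook"
    if alert.target not in allowed_targets:
        return "escalate: target not allowlisted"

    # Act + verify: run the mitigation, then confirm the outcome before closing.
    runbook.action(alert)
    if runbook.verify(alert):
        return "resolved"
    return "escalate: verification failed"
```

Anything that does not match an approved runbook and an in-scope target falls back to a human, so in this sketch automation never widens the set of things that can happen without review.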
Outcomes teams care about
- Reduce MTTR: execute verified mitigations quickly when conditions match.
- Reduce on-call load: fewer escalations during nights and weekends.
- Reduce ticket volume: handle common incidents before they become support work.
- Improve consistency: standardized runbooks with less shift-to-shift variance.
- Improve auditability: traceable actions and evidence for incident reviews.
What you can automate first (quick wins)
These are common "repeat offenders" that teams automate early, where verification is clear and risk is manageable.
- Service recovery: restart a failing service with pre-checks and post-checks.
- Disk pressure: safe cleanup paths, log rotation, reclaim space, verify recovery.
- Stuck processes: detect hangs, restart workers, validate health endpoints.
- Database health actions: verified checks, replica repair steps, controlled restarts.
- Kubernetes/app ops: restart deployments, scale components, clear crash loops safely.
- Configuration drift fixes: re-apply known-good config, verify service readiness.
Automation is only as good as its verification, so SelfHeal is built around guardrails and post-check validation.
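To make the pre-check and post-check idea concrete, here is a small Python sketch of a service-recovery step. The function name restart_with_checks and its parameters are hypothetical, and the health check and restart are passed in as plain callables rather than tied to any particular init system or orchestrator.

```python
import time
from typing import Callable


def restart_with_checks(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    attempts: int = 3,
    wait_seconds: float = 0.0,
) -> str:
    # Pre-check: if the service already looks healthy, the alert may be stale,
    # so do nothing rather than restart a working service.
    if is_healthy():
        return "skipped: already healthy"

    restart()

    # Post-check: poll the health check a bounded number of times.
    for _ in range(attempts):
        if is_healthy():
            return "recovered"
        time.sleep(wait_seconds)

    # The restart did not help; hand the incident back to a human.
    return "failed: escalate to on-call"
```

The bounded retry loop is a deliberate choice in this sketch: a runbook that cannot confirm recovery stops and escalates instead of restarting indefinitely.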
Safety and control by design
Automation must be safe to earn trust. SelfHeal ships with controls teams expect.
- Guardrails: allowlists, blocklists, scoped targets, and controlled execution.
- DRY-RUN mode: validate logic and outputs before enabling real actions.
- Audit trail: record what happened, why it happened, and what changed.
- Runs in your environment: no hidden agents or opaque control planes.
- Offline-friendly licensing: a simple license.json without callbacks.
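The sketch below shows one way allowlists, blocklists, DRY-RUN mode, and an audit trail could fit together. It is an assumption-laden example, not SelfHeal's implementation: guarded_run, its parameters, the glob-style target patterns, and the JSON audit record fields are all invented here for illustration.

```python
import json
from datetime import datetime, timezone
from fnmatch import fnmatch
from typing import Callable, List


def guarded_run(
    action: str,
    target: str,
    *,
    allowlist: List[str],
    blocklist: List[str],
    dry_run: bool,
    execute: Callable[[str, str], None],
    audit_log: List[str],
) -> str:
    # The blocklist wins over the allowlist, so a broad allow pattern
    # cannot accidentally expose a protected target.
    if any(fnmatch(target, pattern) for pattern in blocklist):
        decision = "blocked"
    elif not any(fnmatch(target, pattern) for pattern in allowlist):
        decision = "not_allowlisted"
    elif dry_run:
        # DRY-RUN: the guardrails are evaluated, but nothing is executed.
        decision = "dry_run"
    else:
        execute(action, target)
        decision = "executed"

    # Audit trail: every decision is recorded, including refusals.
    audit_log.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": target,
        "decision": decision,
    }))
    return decision
```

Recording refusals as well as executions means a reviewer can later see what the automation chose not to do and why.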
AI assistance (kept under guardrails)
AI is used to accelerate diagnosis and reduce manual effort, not to bypass controls. You decide when AI can suggest, and when any action is allowed to run.
- Faster triage: summarize alerts, highlight likely causes, reduce time spent digging.
- Better recommendations: propose next steps aligned to your approved runbooks.
- Less midnight work: reduce repetitive escalations that typically wake humans.
- Explainability: rationale and outcomes captured for incident reviews.
Safe rollout path: Observe → Recommend → DRY-RUN → Allowlisted execution.
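One way to picture the rollout path is as an ordered set of stages, each unlocking one more capability. The Python sketch below is purely illustrative: the Stage enum and the allowed_behaviour function are hypothetical names, not SelfHeal configuration.

```python
from enum import IntEnum
from typing import Dict, Set


class Stage(IntEnum):
    """Hypothetical rollout stages, ordered from least to most autonomy."""
    OBSERVE = 1
    RECOMMEND = 2
    DRY_RUN = 3
    EXECUTE = 4


def allowed_behaviour(stage: Stage, target: str, allowlist: Set[str]) -> Dict[str, bool]:
    return {
        # Alerts are always recorded, even in the most conservative stage.
        "record_alert": True,
        # Suggestions are shown to humans from RECOMMEND onwards.
        "show_suggestion": stage >= Stage.RECOMMEND,
        # Runbooks are simulated (no side effects) from DRY_RUN onwards.
        "simulate_runbook": stage >= Stage.DRY_RUN,
        # Real execution needs both the final stage and an allowlisted target.
        "run_runbook": stage == Stage.EXECUTE and target in allowlist,
    }
```

Under these assumptions, moving a team from one stage to the next is a single, reviewable change, and real execution still depends on the allowlist.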
Integration posture
SelfHeal integrates using alerts and webhooks, so you can start without changing your monitoring strategy.
- Monitoring inputs: any system that can send alerts/events (webhook).
- Common sources: Prometheus/Alertmanager, Datadog, OpsRamp, and similar tools.
- Ticketing/escalation: integrate with your existing workflows (PagerDuty, ServiceNow, etc.) via hooks.
- ChatOps: optional notifications and approvals via Slack/Teams-style webhook patterns.
The goal is compatibility: adopt incrementally without a platform rewrite.
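As an example of the webhook-based input side, the Python sketch below pulls the firing alerts out of a Prometheus Alertmanager webhook body. It relies only on the payload's alerts list and each alert's status, labels, and annotations fields; the function name firing_alerts and the simplified output shape are assumptions for this example, not a SelfHeal interface.

```python
import json
from typing import Dict, List, Optional


def firing_alerts(body: str) -> List[Dict[str, Optional[str]]]:
    """Extract a simplified view of firing alerts from an Alertmanager webhook body."""
    payload = json.loads(body)
    result = []
    for alert in payload.get("alerts", []):
        # Alertmanager also notifies on resolution; only firing alerts need action.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        result.append({
            "name": labels.get("alertname"),
            "instance": labels.get("instance"),
            "severity": labels.get("severity"),
            "summary": annotations.get("summary"),
        })
    return result
```

Other sources such as Datadog or OpsRamp use different payload shapes, so a similar small adapter per source would be needed under this approach.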
Get started
Validate SelfHeal safely without disrupting production workflows.
Start the Quickstart, get a trial, or view Plans.
Who we are
US: Ganges LLC (Colorado, USA)
We build for operators: clear controls, predictable behavior, and simple deployment. SelfHeal is designed to be a dependable remediation layer, not another system to babysit.