CalypsoAI Monitoring & Metrics
For On-Prem deployments, customers are responsible for monitoring the deployed solution themselves. To support this, CalypsoAI exposes a wide set of internal metrics that can be leveraged for observability and troubleshooting:
1. Prometheus-Compatible Metrics for cai-scanner
- Metrics are exposed at the /metrics endpoint in Prometheus-compatible format.
- Common scraping tools such as Prometheus or Dynatrace can collect this data.
- Example metrics include:
- Number of running/waiting requests (vllm:num_requests_running, vllm:num_requests_waiting)
- GPU cache metrics and memory usage
- Python garbage collection and memory stats
- CPU and process metrics (process_cpu_seconds_total, process_resident_memory_bytes)
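As a quick illustration of what a scrape of the /metrics endpoint contains, the sketch below parses a small sample of Prometheus exposition-format text. The metric names match those listed above, but the sample values and the `model_name` label are placeholders; a real cai-scanner scrape will carry different labels and values.

```python
# Minimal sketch: parse Prometheus exposition-format text into a dict.
# The sample below is illustrative only; label sets on a real scrape differ.

SAMPLE_SCRAPE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="default"} 3.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="default"} 7.0
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1234.5
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Map each metric name (labels stripped) to its last reported value."""
    values: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        name_part, _, value_part = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop the label set, if any
        values[name] = float(value_part)
    return values

metrics = parse_metrics(SAMPLE_SCRAPE)
print(metrics["vllm:num_requests_running"])  # 3.0
print(metrics["vllm:num_requests_waiting"])  # 7.0
```

In practice you would not parse this by hand; Prometheus or Dynatrace does it for you. The example only shows the shape of the data your scraper will see.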
2. GPU Monitoring
- CalypsoAI recommends deploying the NVIDIA DCGM Exporter as a DaemonSet for GPU telemetry.
- This allows collection of detailed GPU performance data.
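A Prometheus scrape job for the DCGM Exporter DaemonSet might look like the sketch below. The `app: dcgm-exporter` pod label is an assumption about how the DaemonSet is labeled in your cluster; adjust the selector to match your deployment. Port 9400 is the DCGM Exporter's default metrics port.

```yaml
# Hypothetical scrape job for DCGM Exporter pods; adjust labels/namespaces
# to match your actual DaemonSet.
scrape_configs:
  - job_name: "dcgm-exporter"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only DCGM Exporter pods (label selector is an assumption).
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: dcgm-exporter
        action: keep
      # DCGM Exporter serves metrics on port 9400 by default.
      - source_labels: [__address__]
        regex: "([^:]+)(?::\\d+)?"
        replacement: "$1:9400"
        target_label: __address__
```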
3. Moderator Component Telemetry
- Similar Prometheus-compatible metrics are available via the cai-moderator service.
- These include:
- Thread and DB connection availability
- Worker processing times
- General Python and process metrics
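Moderator metrics like these are natural inputs for alerting. The Prometheus alerting rule below is a sketch only: `cai_moderator_db_connections_available` is a placeholder name, not a confirmed metric, so substitute the actual metric name reported by your cai-moderator /metrics endpoint.

```yaml
groups:
  - name: cai-moderator
    rules:
      - alert: ModeratorDBConnectionsLow
        # Placeholder metric name -- replace with the real metric
        # exposed by your cai-moderator service.
        expr: cai_moderator_db_connections_available < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "cai-moderator is running low on database connections"
```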
4. Dashboarding and Visualization
- CalypsoAI can provide a pre-built Grafana dashboard (e.g., vllm-dashboard.json) to visualize these metrics effectively.
- Prometheus and Grafana can be set up using Helm charts, and we can guide customers through this process as needed.
5. Legacy Metrics (For Reference)
- The legacy /backend/v1/app/metrics endpoint is deprecated for modern deployments that use custom scanners, but it may still report some workerStats.
6. Automation and Response
- CalypsoAI does not currently offer an out-of-the-box automated remediation system for incidents; however, the telemetry infrastructure provided is designed to integrate with existing incident response tooling and workflows.
- Customers are encouraged to use alerting features in Grafana, Dynatrace, or other tools to automate actions as needed.