Monitoring

Siovos deploys a monitoring stack to help you track the health and performance of your infrastructure.

Overview

The monitoring stack consists of two main services:

Prometheus

Collects and stores metrics from your services. It automatically scrapes Kubernetes components and pods, and scrapes any service that exposes a /metrics endpoint and is registered through a ServiceMonitor.

Grafana

Visualizes metrics through dashboards. Comes pre-configured with dashboards for Kubernetes, nodes, and core services.

Accessing Grafana

After deployment, Grafana is available at:

URL: https://grafana.{suffix}
User: admin
Password: Set during deployment (check Siovos Desktop)

Remember to connect to the VPN and install the root certificate before accessing Grafana.

Pre-installed Dashboards

Grafana comes with several dashboards ready to use:

Dashboard	Description
Kubernetes / Cluster	Overall cluster health and resource usage
Kubernetes / Nodes	Per-node CPU, memory, disk, network
Kubernetes / Pods	Pod-level metrics and resource consumption
Node Exporter	Detailed host system metrics

To open a dashboard, go to Dashboards → Browse and select a folder.

Adding Custom Dashboards

You can import additional dashboards from Grafana's dashboard library:

1. Find a dashboard ID on grafana.com (e.g., 1860 for Node Exporter Full).
2. In Grafana, go to Dashboards → Import.
3. Enter the dashboard ID.
4. Select your Prometheus data source.
5. Click Import.

Monitoring Your Applications

To expose metrics from your own applications:

1. Add a /metrics endpoint to your app (using Prometheus client libraries).
2. Create a ServiceMonitor resource to tell Prometheus to scrape it.
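To make the first step concrete, here is a minimal sketch of what a /metrics endpoint returns, written with only the Python standard library. It is an illustration under assumptions, not how Siovos or the official client libraries implement it: the metric name my_app_requests_total, the REQUESTS_TOTAL counter, and port 8080 are all hypothetical. In a real application you would normally let a Prometheus client library produce this output for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would use a Prometheus client library.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP my_app_requests_total Total requests handled, by HTTP method.",
        "# TYPE my_app_requests_total counter",
    ]
    for method, value in sorted(counters.items()):
        lines.append(f'my_app_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values on GET /metrics and 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the metrics server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the server. The port it listens on is the one your Kubernetes Service should expose under a named port, so that the ServiceMonitor can refer to it.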

Example ServiceMonitor:

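apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    release: prometheus   # required so Prometheus discovers this ServiceMonitor
spec:
  selector:
    matchLabels:
      app: my-app         # must match the labels on your app's Service
  endpoints:
    - port: http          # name of the Service port that serves metrics
      path: /metrics
      interval: 30s       # how often Prometheus scrapes the endpoint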

The release: prometheus label is required for Prometheus to discover your ServiceMonitor.

Alerting

The monitoring stack includes Alertmanager, which sends notifications when alerts fire. Alert rules are configured by default, but notification channels (email, Slack, etc.) must be set up manually.

To configure alert notifications:

Access Alertmanager at https://alertmanager.{suffix}
Or edit the Alertmanager configuration via Rancher
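Once a notification channel is set up, one way to check that it works is to push a short-lived test alert to Alertmanager's HTTP API (POST /api/v2/alerts). The sketch below assumes the API is reachable at your Alertmanager URL without extra authentication and that your machine trusts the root certificate. The alert name SiovosTestAlert, the severity label, and the summary text are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name="SiovosTestAlert", severity="warning", minutes=5, now=None):
    """Build the JSON payload for one test alert that expires after `minutes`."""
    now = now or datetime.now(timezone.utc)
    return [
        {
            "labels": {"alertname": name, "severity": severity},
            "annotations": {"summary": "Test alert to verify notification routing"},
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
        }
    ]


def send_alerts(base_url, alerts, timeout=10):
    """POST alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, send_alerts("https://alertmanager.<your suffix>", build_test_alert()) should make the test alert appear in the Alertmanager UI and, if its labels match your routing rules, in the configured channel.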

Accessing Prometheus Directly

For advanced queries, you can access Prometheus directly:

URL: https://prometheus.{suffix}

Use the Graph tab to run PromQL queries. For example:

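up - Shows 1 for each target that was scraped successfully and 0 for each target whose scrape failed
container_memory_usage_bytes - Current memory usage per container, in bytes
rate(container_cpu_usage_seconds_total[5m]) - CPU time consumed per second, averaged over the last 5 minutes (1.0 equals one full core)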
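The same queries can also be run from a script through the Prometheus HTTP API (GET /api/v1/query). The following is a minimal sketch using only the Python standard library. The host prometheus.example.internal is a placeholder for your own https://prometheus.{suffix} address, and the sketch assumes you are connected to the VPN and that Python trusts the root certificate.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def extract_values(response):
    """Return (labels, value) pairs from an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    # Each series carries a [timestamp, "value"] pair; the value is a string.
    return [
        (series["metric"], float(series["value"][1]))
        for series in response["data"]["result"]
    ]


def run_query(base_url, promql, timeout=10):
    """Run a PromQL instant query and return its (labels, value) pairs."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return extract_values(json.load(resp))
```

For example, run_query("https://prometheus.example.internal", "up") returns one entry per scrape target, with a value of 1.0 for healthy targets and 0.0 for failing ones.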

Data Retention

By default, Prometheus retains metrics for 15 days. The retention period is configurable, but changing it requires modifying the Prometheus deployment directly.
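Before changing the retention period, it helps to estimate how much disk space the new value will need. The sketch below applies the rule of thumb from the Prometheus storage documentation: retention time in seconds × ingested samples per second × bytes per sample, where a sample typically takes about 1 to 2 bytes. The ingestion rate of 10,000 samples per second is a made-up figure. You can read your real rate from rate(prometheus_tsdb_head_samples_appended_total[5m]).

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds x samples/s x bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Default 15-day retention at a hypothetical 10,000 samples per second.
default_estimate = estimate_disk_bytes(15, 10_000)
print(f"{default_estimate / 1e9:.2f} GB")  # prints "25.92 GB"
```

Treat the result as an order-of-magnitude guide only. Actual usage depends on how well your series compress and on the overhead of the write-ahead log.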

Next Steps

Architecture Overview - Understand how monitoring fits in
Troubleshooting - Common monitoring issues