Overview
Effective monitoring is essential for keeping virtual clusters healthy. This guide covers monitoring strategies, metrics collection, and observability best practices for vCluster deployments.
Monitoring Architecture
vCluster monitoring operates at three levels:
Control Plane
Monitor the vCluster control plane pods, API server health, and syncer performance.
Virtual Resources
Track the resources running inside the virtual cluster.
Host Resources
Monitor the synced resources in the vCluster's namespace on the host cluster.
Quick Health Checks
vCluster Control Plane Status
Check that the vCluster control plane pods are running in the host namespace.
Virtual Cluster Connectivity
Test the connection to the virtual cluster.
Resource Syncing Status
Verify that resources created in the virtual cluster are syncing correctly to the host cluster.
Prometheus Integration
Enabling Metrics
Configure vCluster to expose Prometheus metrics.
ServiceMonitor for Prometheus Operator
If you use the Prometheus Operator, create a ServiceMonitor so that the vCluster metrics endpoint is scraped.
Key Metrics to Monitor
Control Plane Metrics
API Server Metrics
Important metrics:
- apiserver_request_total - Total API requests
- apiserver_request_duration_seconds - Request latency
- apiserver_request_errors_total - Failed requests
- apiserver_storage_objects - Number of stored objects
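To make the API server metrics listed above concrete, here is a minimal Python sketch that parses Prometheus text-format output and derives an error ratio from it. The metric names are taken from the list; the sample payload, label names, and the helper names `sum_metric_samples` and `error_ratio` are illustrative assumptions, and the parser only handles simple sample lines.

```python
from collections import defaultdict

# Hypothetical scrape output; values and labels are made up for illustration.
SAMPLE = """\
# HELP apiserver_request_total Total API requests
# TYPE apiserver_request_total counter
apiserver_request_total{verb="GET",code="200"} 900
apiserver_request_total{verb="POST",code="201"} 100
apiserver_request_errors_total{verb="GET"} 20
apiserver_storage_objects{resource="pods"} 42
"""


def sum_metric_samples(exposition_text):
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{labels} value [timestamp]
            name, _, rest = line.partition("{")
            value_part = rest.rpartition("}")[2]
        else:
            # name value [timestamp]
            name, _, value_part = line.partition(" ")
        totals[name.strip()] += float(value_part.split()[0])
    return dict(totals)


def error_ratio(totals,
                errors="apiserver_request_errors_total",
                requests="apiserver_request_total"):
    """Fraction of requests that failed, or 0.0 when no requests were seen."""
    total_requests = totals.get(requests, 0.0)
    if total_requests == 0:
        return 0.0
    return totals.get(errors, 0.0) / total_requests
```

Calling `error_ratio(sum_metric_samples(SAMPLE))` on the sample payload gives the share of failed requests. In a real setup this calculation would more likely live in a PromQL query; the sketch only shows how the counters relate to each other.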
Syncer Metrics
Important metrics:
- vcluster_syncer_sync_operations_total - Sync operations count
- vcluster_syncer_sync_errors_total - Sync failures
- vcluster_syncer_sync_duration_seconds - Sync operation duration
- vcluster_syncer_resources_synced - Number of synced resources
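Counters such as the sync operation and sync error totals above only become useful once they are turned into increases over a window. The sketch below shows one simplified way to do that in Python, treating any drop in a counter as a restart. It is a rough approximation of what a Prometheus `increase()` query does, not a reimplementation, and the function names are hypothetical.

```python
def counter_increase(samples):
    """Total increase across ordered counter samples.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a pod restart), so counting resumes from zero.
    """
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current
    return increase


def sync_failure_ratio(operation_samples, error_samples):
    """Share of sync operations that failed over the sampled window."""
    operations = counter_increase(operation_samples)
    if operations == 0:
        return 0.0
    return counter_increase(error_samples) / operations
```

For example, `sync_failure_ratio([100, 150, 10, 40], [5, 8, 1, 2])` covers a window in which the syncer restarted once; the reset is absorbed instead of producing a negative increase.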
Resource Usage
Important metrics:
- container_memory_usage_bytes - Memory usage
- container_cpu_usage_seconds_total - CPU usage
- container_network_receive_bytes_total - Network ingress
- container_network_transmit_bytes_total - Network egress
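The CPU and network metrics above are cumulative counters, while memory usage is a gauge, so they are read differently. As a small illustration, the following Python sketch converts two counter readings into a per-second rate and expresses memory usage as a fraction of a limit. The function names and the idea of comparing against a configured memory limit are assumptions made for the example.

```python
def per_second_rate(start_value, end_value, interval_seconds):
    """Average per-second rate of a cumulative counter between two readings.

    For container_cpu_usage_seconds_total the result is CPU cores in use;
    for the network byte counters it is bytes per second.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (end_value - start_value) / interval_seconds


def memory_utilization(usage_bytes, limit_bytes):
    """Memory usage as a fraction of the container's memory limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return usage_bytes / limit_bytes
```

With readings of 120 and 150 CPU seconds taken 60 seconds apart, `per_second_rate(120.0, 150.0, 60)` reports half a core in use on average over that minute.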
Grafana Dashboards
Pre-built Dashboard
Import the vCluster Grafana dashboard:
- Open Grafana
- Go to Dashboards → Import
- Use dashboard ID 15843 (community vCluster dashboard), or import the dashboard from a JSON file
Custom Dashboard Panels
- Health Status
- Resource Sync
- Performance
- Errors
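One way to build the four panels above is to generate the dashboard JSON programmatically and import it through Dashboards → Import. The Python sketch below assembles a minimal dashboard dictionary with one panel per category. The PromQL expressions, the `job="vcluster"` label, the panel types, and the grid layout are assumptions for illustration and would need adjusting to the metrics your setup actually exposes.

```python
import json

# (panel title, Grafana panel type, PromQL expression); all expressions are assumed.
PANELS = [
    ("Health Status", "stat", 'up{job="vcluster"}'),
    ("Resource Sync", "timeseries",
     "rate(vcluster_syncer_sync_operations_total[5m])"),
    ("Performance", "timeseries",
     "histogram_quantile(0.99, sum by (le) "
     "(rate(apiserver_request_duration_seconds_bucket[5m])))"),
    ("Errors", "timeseries", "rate(vcluster_syncer_sync_errors_total[5m])"),
]


def build_dashboard(title, panels):
    """Lay panels out two per row on Grafana's 24-column grid."""
    dashboard_panels = []
    for index, (panel_title, panel_type, expr) in enumerate(panels):
        dashboard_panels.append({
            "id": index + 1,
            "title": panel_title,
            "type": panel_type,
            "gridPos": {
                "h": 8,
                "w": 12,
                "x": (index % 2) * 12,
                "y": (index // 2) * 8,
            },
            "targets": [{"expr": expr, "refId": "A"}],
        })
    return {"title": title, "panels": dashboard_panels}


def render_dashboard(dashboard):
    """Serialize the dashboard so it can be saved to a file and imported."""
    return json.dumps(dashboard, indent=2)
```

Writing `render_dashboard(build_dashboard("vCluster Overview", PANELS))` to a file gives a starting point for the JSON import. Grafana fills in omitted fields with defaults on import.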
Logging
Centralized Log Collection
Using Fluentd/Fluent Bit
Using Loki
Viewing vCluster Logs
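Logs for the control plane are typically read with `kubectl logs` against the pods in the host namespace. The Python sketch below only builds the command line, so it can be reviewed or logged before being run. The label selector `app=vcluster,release=<name>` and the optional container name are assumptions about how the pods are labeled in your installation; check them with `kubectl get pods --show-labels` first.

```python
def kubectl_logs_command(name, namespace, container=None, tail=200,
                         previous=False, follow=False):
    """Build a `kubectl logs` argument list for a vCluster's pods.

    The selector below is an assumed labeling convention, not a guarantee.
    """
    command = [
        "kubectl", "logs",
        "--namespace", namespace,
        "--selector", f"app=vcluster,release={name}",
        f"--tail={tail}",
    ]
    if container:
        command += ["--container", container]
    if previous:
        # Logs from the previous container instance, useful after a crash.
        command.append("--previous")
    if follow:
        command.append("--follow")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. Keeping it as a list rather than a shell string avoids quoting problems with names and namespaces.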
Log Levels and Debugging
Increase log verbosity when troubleshooting.
Alerting
Prometheus Alertmanager Rules
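As a starting point for alerting, the Python sketch below assembles a small rule group in the layout Prometheus uses for alerting rules (`groups`, `rules`, `alert`, `expr`, `for`, `labels`, `annotations`). The alert names, thresholds, durations, and the `job="vcluster"` label are illustrative assumptions rather than recommended values, and the syncer metric name is taken from the metrics listed earlier in this guide.

```python
import json


def alert_rule(name, expr, duration, severity, summary):
    """Build one alerting rule in the Prometheus rule-file layout."""
    return {
        "alert": name,
        "expr": expr,
        "for": duration,  # condition must hold this long before firing
        "labels": {"severity": severity},
        "annotations": {"summary": summary},
    }


# Hypothetical rules; tune expressions and durations to your environment.
RULE_GROUP = {
    "groups": [
        {
            "name": "vcluster",
            "rules": [
                alert_rule(
                    "VClusterControlPlaneDown",
                    'up{job="vcluster"} == 0',
                    "5m",
                    "critical",
                    "vCluster control plane target is down",
                ),
                alert_rule(
                    "VClusterSyncErrors",
                    "rate(vcluster_syncer_sync_errors_total[5m]) > 0",
                    "10m",
                    "warning",
                    "vCluster syncer is reporting sync errors",
                ),
            ],
        }
    ]
}


def render_rules(rule_group):
    """Serialize the rule group as JSON."""
    return json.dumps(rule_group, indent=2)
```

Rule files are normally written as YAML. Because JSON is a subset of YAML, the rendered output should load as a rule file, but converting it to block-style YAML is the more conventional choice. The `for` durations keep short blips from paging anyone.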
Distributed Tracing
OpenTelemetry Integration
Configure vCluster to export traces.
Jaeger Configuration
Health Checks and Probes
Custom Liveness Probe
External Monitoring Script
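An external monitoring script can be as simple as polling a health endpoint and exiting non-zero on failure. The Python sketch below does this with the standard library only. The endpoint URL is supplied by the caller and is entirely an assumption (for example a port-forwarded API server `/healthz` path), as are the retry count and timeout; TLS settings and authentication depend on how the endpoint is exposed.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0, context=None):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, TLS error, and similar.
        return None


def check_health(url, fetch=fetch_status, attempts=3):
    """Report healthy if any of `attempts` tries returns HTTP 200."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
    return False


def main(argv):
    """Exit code for a cron job or external checker: 0 healthy, 1 unhealthy."""
    if len(argv) < 2:
        print("usage: check_vcluster.py <health-url>")
        return 2
    healthy = check_health(argv[1])
    print("healthy" if healthy else "unhealthy")
    return 0 if healthy else 1
```

To run it as a script, call `sys.exit(main(sys.argv))` at the bottom of the file. The `fetch` parameter is injectable so the retry logic can be tested without a live cluster.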
Performance Monitoring
Resource Usage Over Time
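Watching resource usage over time is mostly about spotting trends, such as memory that grows steadily between restarts. The Python sketch below fits a least-squares line through timestamped samples and projects when a limit would be reached. The function names and the linear-growth assumption are illustrative; real usage is often bursty, so treat the projection as a rough signal.

```python
def linear_trend(points):
    """Least-squares slope of (timestamp, value) points, in value units per second."""
    count = len(points)
    if count < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / count
    mean_v = sum(v for _, v in points) / count
    denominator = sum((t - mean_t) ** 2 for t, _ in points)
    if denominator == 0:
        raise ValueError("timestamps must not all be equal")
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return numerator / denominator


def seconds_until_limit(points, limit):
    """Projected seconds from the last sample until `limit` is reached.

    Returns None when usage is flat or shrinking.
    """
    slope = linear_trend(points)
    if slope <= 0:
        return None
    last_value = points[-1][1]
    return max(0.0, (limit - last_value) / slope)
```

For instance, samples of 100, 160, and 220 MiB taken one hour apart give a slope of 60 MiB per hour, and `seconds_until_limit` then estimates how long remains before a 400 MiB limit is hit.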
Network Monitoring
Debugging Tools
Debug Collect Command
Generate a comprehensive debug bundle. It includes:
- vCluster release info
- Pod logs (current and previous)
- Host cluster info and resources
- Virtual cluster info and resources
- Resource counts
Debug Shell
Open a shell directly in the vCluster pod.
Best Practices
Multi-Level Monitoring
Monitor at all levels: control plane, virtual resources, and host resources.
Set Meaningful Alerts
Configure alerts for actionable issues, not just symptoms. Avoid alert fatigue.
Retain Historical Data
Keep metrics and logs for at least 30 days for trend analysis and troubleshooting.
Dashboard for Each Environment
Create dedicated dashboards for production, staging, and development clusters.
Regular Reviews
Schedule weekly reviews of monitoring data to identify trends and issues early.
Document Baselines
Establish and document normal performance baselines for comparison.
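A documented baseline becomes actionable once new readings are compared against it. The Python sketch below records the mean and standard deviation of a set of known-good samples and flags values that fall more than a chosen number of standard deviations away. The three-sigma default and the function names are assumptions for the example, not a recommended policy.

```python
import statistics


def build_baseline(samples):
    """Summarize known-good samples as a mean and sample standard deviation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
    }


def deviates(value, baseline, threshold=3.0):
    """True when value is more than `threshold` standard deviations from the mean."""
    if baseline["stdev"] == 0:
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) / baseline["stdev"] > threshold
```

With a baseline built from request latencies of 100, 102, 98, 101, and 99 ms, a reading of 101 ms is within normal variation while 110 ms is flagged for a closer look.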
Next Steps
Troubleshooting
Learn how to diagnose and fix issues detected by monitoring
Managing vClusters
Return to general management operations