The dedicated nodes architecture isolates compute by assigning specific host cluster nodes to each virtual cluster. Workloads run only on their dedicated nodes, giving stronger isolation than shared nodes while still leveraging the host cluster's infrastructure.

How It Works

In dedicated nodes mode, the vCluster control plane runs in a host namespace, but workloads are scheduled only on labeled host nodes assigned to that virtual cluster:
┌───────────────────── Host Cluster ─────────────────────┐
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │ Namespace: vcluster-tenant-a                     │  │
│  │  ┌──────────────────────────────────────────┐    │  │
│  │  │ vCluster Control Plane (tenant-a)        │    │  │
│  │  └──────────────────────────────────────────┘    │  │
│  │  ┌──────────────────────────────────────────┐    │  │
│  │  │ Workload Pods (from tenant-a)            │    │  │
│  │  └──────────────────────────────────────────┘    │  │
│  └──────────────────────────────────────────────────┘  │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │ Dedicated Nodes for tenant-a                     │  │
│  │  - node-tenant-a-1 (label: tenant=tenant-a)      │  │
│  │    └─> Runs ONLY tenant-a workloads              │  │
│  │  - node-tenant-a-2 (label: tenant=tenant-a)      │  │
│  │    └─> Runs ONLY tenant-a workloads              │  │
│  └──────────────────────────────────────────────────┘  │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │ Dedicated Nodes for tenant-b                     │  │
│  │  - node-tenant-b-1 (label: tenant=tenant-b)      │  │
│  │    └─> Runs ONLY tenant-b workloads              │  │
│  └──────────────────────────────────────────────────┘  │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │ Shared Nodes (for host workloads)                │  │
│  │  - node-1, node-2, node-3                        │  │
│  └──────────────────────────────────────────────────┘  │
│                                                        │
└────────────────────────────────────────────────────────┘

Key Characteristics

  • Compute Isolation: Each vCluster's workloads run only on its assigned nodes
  • Real Node Visibility: Host nodes are synced into the virtual cluster
  • Shared Infrastructure: Still uses the host CNI, CSI, and control plane
  • Node Selectors: Node selectors and tolerations are injected automatically
  • Better Isolation: Physical separation of workload execution
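
For example, when a pod is created in the virtual cluster, the syncer's host-side copy carries the injected scheduling constraints. The sketch below is illustrative, not exact syncer output; the translated pod name and the `tenant=tenant-a` label are assumptions:

# Host-side copy of a virtual cluster pod (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: web-x-default-x-tenant-a   # translated name (assumed pattern)
  namespace: vcluster-tenant-a
spec:
  nodeSelector:
    tenant: tenant-a               # restricts scheduling to dedicated nodes
  tolerations:
    - key: tenant
      operator: Equal
      value: tenant-a
      effect: NoSchedule           # matches the taint on dedicated nodes
  containers:
    - name: web
      image: nginx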

Configuration

Basic Setup

The core configuration enables node syncing with label selectors:
dedicated-nodes.yaml
# Enable syncing of real nodes from the host cluster and
# automatically inject tolerations into synced pods
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant  # Only sync nodes with this label
  toHost:
    pods:
      enforceTolerations:
        - key: "tenant"
          operator: "Equal"
          value: "my-tenant"
          effect: "NoSchedule"

Prepare Host Nodes

Before creating the vCluster, label and taint the dedicated nodes:
# Label nodes for the tenant
kubectl label nodes node-1 node-2 tenant=my-tenant

# Taint nodes to prevent other workloads
kubectl taint nodes node-1 node-2 tenant=my-tenant:NoSchedule

# Verify
kubectl get nodes -l tenant=my-tenant

Create Dedicated Nodes vCluster

vcluster create my-tenant \
  --namespace vcluster-my-tenant \
  --values dedicated-nodes.yaml

Verify Node Isolation

Check that nodes are correctly synced:
# Connect to virtual cluster
vcluster connect my-tenant --namespace vcluster-my-tenant

# List nodes (should only show labeled nodes)
kubectl get nodes
# NAME     STATUS   ROLES    AGE   VERSION
# node-1   Ready    <none>   5d    v1.28.0
# node-2   Ready    <none>   5d    v1.28.0

# Create a test pod
kubectl run test --image=nginx

# Verify it runs on dedicated nodes
kubectl get pod test -o wide
# NAME   READY   STATUS    NODE
# test   1/1     Running   node-1

Use Cases

Production Multi-Tenancy

Perfect for: SaaS platforms, managed Kubernetes offerings, enterprise multi-tenancy
production-tenant.yaml
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: "{{.Values.tenantId}}"
          tier: production
  toHost:
    pods:
      enforceTolerations:
        - key: tenant
          operator: Equal
          value: "{{.Values.tenantId}}"
          effect: NoSchedule
        - key: tier
          operator: Equal
          value: production
          effect: NoSchedule

policies:
  resourceQuota:
    enabled: true
  limitRange:
    enabled: true
  networkPolicy:
    enabled: true
Benefits:
  • Strong compute isolation
  • Predictable performance
  • Clear resource attribution
  • Compliance-friendly
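
The `resourceQuota` toggle above can also carry explicit limits sized to the tenant's dedicated nodes. A sketch, with values that are assumptions (e.g. five nodes of 8 CPU / 32Gi each):

policies:
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "40"        # e.g. 5 nodes x 8 CPU
      requests.memory: 160Gi    # e.g. 5 nodes x 32Gi
      pods: "500"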

GPU Workloads

Perfect for: AI/ML platforms, rendering farms, data processing
gpu-tenant.yaml
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          gpu: "true"
          tenant: ai-team
  toHost:
    pods:
      enforceTolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        - key: tenant
          operator: Equal
          value: ai-team
          effect: NoSchedule

# Allow tenants to read node details (e.g. GPU capacity)
rbac:
  role:
    extraRules:
      - apiGroups: [""]
        resources: ["nodes"]
        verbs: ["get", "list"]
Setup GPU Nodes:
# Label GPU nodes
kubectl label nodes gpu-node-1 gpu-node-2 \
  gpu=true \
  tenant=ai-team

# Taint GPU nodes
kubectl taint nodes gpu-node-1 gpu-node-2 \
  nvidia.com/gpu=present:NoSchedule \
  tenant=ai-team:NoSchedule
Use in Virtual Cluster:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.0-base
      resources:
        limits:
          nvidia.com/gpu: 1  # Request GPU

Tiered Service Levels

Perfect for: Offering different performance tiers to customers
premium-tier.yaml
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tier: premium
          tenant: customer-123

# High resource limits
policies:
  resourceQuota:
    quota:
      requests.cpu: 64
      requests.memory: 256Gi
Node Characteristics:
  • Latest generation instances
  • NVMe SSDs
  • Higher network bandwidth

Compliance and Data Residency

Perfect for: Regulated industries, data sovereignty requirements
compliance-tenant.yaml
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          compliance: pci-dss
          region: us-west-2
          zone: us-west-2a
  toHost:
    pods:
      enforceTolerations:
        - key: compliance
          operator: Equal
          value: pci-dss
          effect: NoExecute

policies:
  networkPolicy:
    enabled: true
    workload:
      publicEgress:
        enabled: false  # Block internet egress
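
Conceptually, `publicEgress: false` behaves like a NetworkPolicy that only permits cluster-internal egress. A hand-written equivalent might look like the following; the CIDR is an assumption for illustration and must match your cluster's address space:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-public-egress
spec:
  podSelector: {}          # applies to all tenant pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8   # allow only cluster-internal traffic (example)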

Advanced Configuration

Multi-Zone Node Selection

Select nodes from specific availability zones:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant
          topology.kubernetes.io/zone: us-east-1a

Dynamic Node Selection

Use expressions for complex node selection:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant
        # These nodes must also have high-memory label
        matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values:
              - m5.4xlarge
              - m5.8xlarge
              - r5.4xlarge

Sync Node Changes Back

Allow users in virtual cluster to label/taint their nodes:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant
      syncBackChanges: true  # Sync labels/taints back to host
Security Risk: Enabling syncBackChanges allows virtual cluster users to modify host nodes. Only enable this for trusted tenants.

Hide Node Images

Prevent users from seeing what images are cached on nodes:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant
      clearImageStatus: true  # Remove image information

Networking

Pod Networking

Pods still use the host cluster’s CNI:
# Virtual cluster pod
kubectl get pod nginx -o wide
# NAME    IP           NODE
# nginx   10.244.2.5   node-1

# Uses host CNI (e.g., Calico, Cilium, etc.)

Network Policies

Network policies can isolate tenant traffic:
policies:
  networkPolicy:
    enabled: true
    workload:
      # Block cross-tenant communication
      egress:
        - to:
            - podSelector: {}  # Same vCluster only
      ingress:
        - from:
            - podSelector: {}  # Same vCluster only

Service Mesh Integration

Dedicated nodes work well with service meshes:
# Example: Istio integration
integrations:
  istio:
    enabled: true
    sync:
      toHost:
        virtualServices:
          enabled: true
        destinationRules:
          enabled: true

# Automatic sidecar injection
sync:
  toHost:
    pods:
      enabled: true
      # Preserve Istio annotations
      translateAnnotations:
        - sidecar.istio.io/*

Storage

Using Host Storage Classes

Storage classes are synced from the host:
sync:
  fromHost:
    storageClasses:
      enabled: true  # Sync all host storage classes

Dedicated Storage

You can provision storage on dedicated nodes:
# Use local storage on dedicated nodes
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-data
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
The volume will be created on one of the dedicated nodes.
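
A pod can then mount the claim as usual; because the backing volume is node-local, the pod is pinned to the node that holds the data. A minimal sketch (names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: local-data-user
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: local-data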

Performance Characteristics

Resource Overhead

Per vCluster:
  • Control Plane: 200-500MB memory, 100-200m CPU
  • Syncer Overhead: Minimal (watches real nodes)
  • Node Resources: Full dedicated node capacity available

Capacity Planning

Example Setup:
  • Host Cluster: 100 nodes (8 CPU, 32GB RAM each)
  • Dedicated allocation: 10 vClusters with 5 nodes each
  • Remaining: 50 nodes for host workloads and additional vClusters

Scaling Limits

| Metric                 | Dedicated Nodes             |
| ---------------------- | --------------------------- |
| Nodes per vCluster     | 1-1000+                     |
| vClusters per host     | Limited by available nodes  |
| Pods per node          | Same as host cluster        |
| Control plane overhead | ~200-500MB per vCluster     |

Pros and Cons

Advantages

  • Compute Isolation: Workloads cannot interfere with each other
  • Predictable Performance: Dedicated resources prevent "noisy neighbor" effects
  • Real Node Access: Users see and can manage real nodes
  • Better Troubleshooting: Clear resource attribution
  • Compliance Friendly: Physical separation aids audits
  • GPU Support: Ideal for GPU workloads requiring isolation

Limitations

  • Lower Density: Requires dedicating physical nodes
  • Higher Cost: More nodes needed vs the shared architecture
  • Shared CNI/CSI: Still uses host networking and storage
  • Shared Kernel: Workloads share the host OS kernel
  • Node Management: Must manage node labels and taints

vs Shared Nodes

| Aspect              | Shared   | Dedicated |
| ------------------- | -------- | --------- |
| Compute Isolation   | ✗        | ✓         |
| Cost Efficiency     | ✓        | ✗         |
| Setup Complexity    | Low      | Medium    |
| Noisy Neighbor      | Possible | Prevented |
| Resource Visibility | Limited  | Full      |

vs Private Nodes

| Aspect            | Dedicated | Private |
| ----------------- | --------- | ------- |
| Network Isolation | ✗         | ✓       |
| Storage Isolation | ✗         | ✓       |
| Setup Complexity  | Medium    | High    |
| Host Dependency   | High      | Low     |
| CNI Independence  | ✗         | ✓       |

Automation and Tooling

Automatic Node Provisioning

Use Cluster Autoscaler or Karpenter to provision nodes automatically:
# Node group for tenant
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
data:
  tenant-nodes: |
    {
      "name": "tenant-a-pool",
      "minSize": 2,
      "maxSize": 10,
      "labels": {
        "tenant": "tenant-a"
      },
      "taints": [
        {"key": "tenant", "value": "tenant-a", "effect": "NoSchedule"}
      ]
    }
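
With Karpenter, the same pool could instead be declared as a NodePool that applies the tenant label and taint at provisioning time. A sketch, assuming Karpenter v1 on AWS with an `EC2NodeClass` named `default`; adjust `nodeClassRef` for your provider:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: tenant-a
spec:
  template:
    metadata:
      labels:
        tenant: tenant-a            # matches the vCluster node selector
    spec:
      taints:
        - key: tenant
          value: tenant-a
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws    # assumed AWS provider
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "80"                       # cap the pool's total provisioned CPU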

GitOps Workflow

Manage vClusters and node assignments with GitOps:
# tenants/tenant-a/vcluster.yaml
apiVersion: storage.loft.sh/v1
kind: VirtualCluster
metadata:
  name: tenant-a
  namespace: vcluster-tenant-a
spec:
  config: |
    sync:
      fromHost:
        nodes:
          enabled: true
          selector:
            labels:
              tenant: tenant-a
---
# tenants/tenant-a/nodes.yaml
apiVersion: v1
kind: Node
metadata:
  name: node-tenant-a-1
  labels:
    tenant: tenant-a
spec:
  taints:
    - key: tenant
      value: tenant-a
      effect: NoSchedule

Troubleshooting

Pods Not Scheduling

Symptom: Pods stuck in Pending state
# Check events
kubectl describe pod <pod-name>

# Common issues:
# 1. No nodes match the tolerations
# 2. Insufficient resources on dedicated nodes
# 3. Node selector mismatch
Solution:
# Verify nodes are labeled correctly
kubectl get nodes -l tenant=my-tenant

# Check node resources
kubectl top nodes

# Verify pod has correct tolerations (on host)
kubectl get pod -n vcluster-my-tenant <pod-name> -o yaml | grep -A5 tolerations

Nodes Not Appearing in Virtual Cluster

Symptom: kubectl get nodes shows no nodes
# Check syncer logs
kubectl logs -n vcluster-my-tenant <vcluster-pod> -c syncer

# Verify node labels on host
kubectl get nodes --show-labels | grep tenant
Solution: Ensure node selector matches labeled nodes:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          tenant: my-tenant  # Must match actual node labels

Performance Issues

Symptom: Slow performance despite dedicated nodes
# Check node resources
kubectl top nodes

# Check for resource contention
kubectl describe node <node-name> | grep -A20 "Allocated resources"
Solutions:
  • Increase node size
  • Add more dedicated nodes
  • Review resource requests/limits
  • Check for CPU throttling or memory pressure

Best Practices

Use consistent labeling and tainting schemes:
# Standard pattern
kubectl label nodes <node> tenant=<tenant-id>
kubectl taint nodes <node> tenant=<tenant-id>:NoSchedule
This prevents scheduling conflicts and simplifies automation.
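
To keep the scheme consistent across many tenants, the commands can be generated rather than typed by hand. A hypothetical helper (function and node names are illustrative) that only prints the commands for review:

```shell
# Hypothetical helper: emit the standard label/taint commands for a
# tenant's nodes so every tenant follows the same scheme.
tenant_node_cmds() {
  tenant="$1"
  shift
  for node in "$@"; do
    echo "kubectl label nodes $node tenant=$tenant"
    echo "kubectl taint nodes $node tenant=$tenant:NoSchedule"
  done
}

# Print the commands for review (pipe to `sh` once verified)
tenant_node_cmds my-tenant node-1 node-2
```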
Track resource usage per tenant:
# Per-node metrics
kubectl top nodes -l tenant=my-tenant

# Pod resource usage
kubectl top pods -A --sort-by=memory | grep my-tenant
Use this data for capacity planning and cost attribution.
Always set quotas to prevent over-provisioning:
policies:
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "32"  # Match dedicated node capacity
      requests.memory: 128Gi
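
Rather than hard-coding quota values, they can be derived from the dedicated nodes' capacity minus a headroom percentage for system daemons. A sketch where every number is an assumption:

```shell
# Sketch of quota sizing: ResourceQuota values derived from node capacity.
NODE_COUNT=4
CPU_PER_NODE=16          # cores per dedicated node (assumed)
MEM_PER_NODE_GI=64       # GiB per dedicated node (assumed)
HEADROOM_PCT=10          # reserve for kubelet/system daemons

QUOTA_CPU=$(( NODE_COUNT * CPU_PER_NODE * (100 - HEADROOM_PCT) / 100 ))
QUOTA_MEM_GI=$(( NODE_COUNT * MEM_PER_NODE_GI * (100 - HEADROOM_PCT) / 100 ))

echo "requests.cpu: \"$QUOTA_CPU\""
echo "requests.memory: ${QUOTA_MEM_GI}Gi"
```

For the assumed four 16-CPU/64Gi nodes this prints `requests.cpu: "57"` and `requests.memory: 230Gi`.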
Automate node provisioning based on demand:
  • Cluster Autoscaler: For managed Kubernetes (EKS, GKE, AKS)
  • Karpenter: For more flexible provisioning
  • Custom Controllers: For on-prem or hybrid setups
Regularly test node failures:
# Drain a node
kubectl drain node-1 --ignore-daemonsets

# Verify pods reschedule to other dedicated nodes
kubectl get pods -o wide

# Uncordon when done
kubectl uncordon node-1
Maintain documentation of node-to-tenant mappings:
# nodes-inventory.yaml
tenants:
  tenant-a:
    nodes: [node-1, node-2, node-3]
    capacity:
      cpu: 24
      memory: 96Gi
  tenant-b:
    nodes: [node-4, node-5]
    capacity:
      cpu: 16
      memory: 64Gi

Migration

From Shared to Dedicated Nodes

1. Prepare the dedicated nodes:

kubectl label nodes node-1 node-2 tenant=my-tenant
kubectl taint nodes node-1 node-2 tenant=my-tenant:NoSchedule

2. Update the vCluster configuration:

vcluster upgrade my-tenant \
  --namespace vcluster-my-tenant \
  --values dedicated-nodes.yaml

3. Trigger pod recreation:

# Pods are rescheduled with the injected node selectors and tolerations
kubectl rollout restart deployment -A

4. Verify:

# All pods should now run on the dedicated nodes
kubectl get pods -A -o wide

Next Steps

Private Nodes

Take isolation further with complete CNI/CSI independence.

Shared Nodes

Compare with the shared nodes architecture.

Node Syncing

Advanced node syncing configuration options.

Auto Nodes

Combine dedicated nodes with automatic provisioning.