
Deploy resilient, scalable on-prem Kubernetes storage using Rook-Ceph. Learn architecture, setup steps, best practices, and examples for block, file, and object storage.
Organizations running Kubernetes on-premises often discover that managing persistent storage is harder than deploying applications themselves. Public cloud platforms provide turnkey block and file storage services, but baremetal and private datacenter environments usually require administrators to build and maintain their own storage layer. This challenge grows as clusters scale, stateful workloads multiply, and reliability expectations rise.
Rook-Ceph has emerged as one of the most reliable and production-proven ways to provide persistent, distributed, self-healing storage for Kubernetes—without forcing teams to run a separate storage platform outside the cluster. This post offers a thorough, practical introduction to deploying Rook-Ceph for on-prem Kubernetes clusters, covering the architecture, benefits, deployment choices, real-world tips, and example commands. It is written for platform engineers, DevOps practitioners, SREs, and anyone building resilient storage for Kubernetes.
Rook is an open-source storage orchestrator for Kubernetes. Ceph is a mature, battle-tested distributed storage system trusted in environments ranging from hyperscale data centers to edge clusters. Paired together, they deliver a storage solution that is:
- Kubernetes-native, deployed and managed entirely through the Rook operator and CRDs
- Distributed and self-healing, replicating data and recovering automatically from disk or node failures
- Highly available, with no single point of failure when spread across enough nodes
- Scalable, growing capacity and throughput by adding disks and nodes
- Multi-protocol, exposing block (RBD), shared file (CephFS), and S3-compatible object storage
- Fully open source, with no licensing costs
For on-prem clusters where cloud-native storage is unavailable, Rook-Ceph is one of the few fully open-source solutions that delivers enterprise-level functionality with community-driven transparency.
Understanding the architecture helps teams plan, troubleshoot, and optimize storage layouts.
🔄 Ceph Monitor (MON)
MONs maintain cluster maps and quorum. A healthy cluster requires at least three MONs for high availability.
🔄 Ceph Manager (MGR)
Managers provide monitoring, statistics, and the web dashboard. At least two instances are recommended.
🔄 Object Storage Daemons (OSDs)
OSDs store the actual data. Each disk or storage device maps to one OSD. These are the backbone of capacity and throughput.
🔄 Placement Groups (PGs)
PGs distribute and balance data across OSDs. Ceph automatically manages this, but capacity planning must consider PG count.
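Recent Ceph releases manage PG counts automatically through the pg_autoscaler module; you can review its recommendations at any time from the Rook toolbox (deployed later in this guide):
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool autoscale-status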
🔄 Ceph Block Pool / CephFS / RGW
These provide the different storage types Rook exposes to Kubernetes:
| Storage Mode | Ceph Component | Uses |
|---|---|---|
| Block Storage (RBD) | Ceph Block Pool | Databases, key–value stores, apps needing PVCs |
| Shared File Storage | CephFS | Web servers, shared workloads, CI caches |
| Object Storage | RGW (Rados Gateway) | S3-compatible backups, archives, application buckets |
Rook-Ceph fits well when:
- You run Kubernetes on bare metal or in a private datacenter where no managed storage service exists
- Stateful workloads (databases, message queues, CI systems) need replicated, dynamically provisioned volumes
- You want block, shared file, and S3-compatible object storage from a single platform
- You expect the cluster to grow and want to scale storage by adding nodes and disks
- You prefer a fully open-source stack over a proprietary SAN or storage appliance
Rook-Ceph is not ideal if:
- The cluster is very small (fewer than three storage nodes), making MON quorum and three-way replication impractical
- Your team cannot invest in learning and operating a distributed storage system
- An existing SAN, NAS, or simple NFS export already meets your performance and availability needs
A successful deployment begins with understanding how you want storage distributed across nodes.
🔄 Hardware Considerations
| Component | Recommendation |
|---|---|
| CPU | ≥4 cores per storage node |
| RAM | ≥16 GB; more for CephFS metadata workloads |
| OS | Modern Linux distribution with LVM2 and necessary kernel modules |
| Disks | SSD or NVMe for metadata; HDD acceptable for bulk capacity pools |
| Network | Dedicated 10GbE or faster for storage replication |
Use homogeneous disks when possible for predictable performance, but mixed pools are acceptable with proper tiering.
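Rook only creates OSDs on devices that carry no partitions or filesystem, so confirm candidate disks are raw before deployment. A quick check on each storage node:
lsblk -f
Devices destined for Ceph should show no FSTYPE and no mountpoint.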
Below is a streamlined but production-friendly installation workflow.
🔄 Deploy the Rook Operator
git clone --depth 1 https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
Verify that the operator is running:
kubectl -n rook-ceph get pods
🔄 Create a Cluster Custom Resource
The cluster.yaml defines how Ceph is deployed. A minimal example:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "1"
Apply it:
kubectl apply -f cluster.yaml
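The ceph commands used throughout the rest of this guide run inside the Rook toolbox pod, so deploy it as well (toolbox.yaml ships in the same deploy/examples directory):
kubectl apply -f toolbox.yaml
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools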
🔄 Validate the Ceph Cluster
Check cluster health:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
You should see HEALTH_OK, or at minimum HEALTH_WARN (acceptable while PGs are backfilling).
📦 Block Storage
Block storage is the most common use case.
🔄 Create a CephBlockPool:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
Apply:
kubectl apply -f blockpool.yaml
🔄 Create a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  # CSI secrets created by the Rook operator in the rook-ceph namespace
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
Use it in a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 20Gi
📦 Shared File Storage (CephFS)
CephFS enables shared, multi-writer workloads with POSIX semantics. Create a filesystem:
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: cephfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true
Create a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: cephfs
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
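Workloads can then request shared volumes with ReadWriteMany access, for example (the PVC name and size here are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-cache
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 10Gi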
📦 Object Storage (S3)
Rook can expose S3-compatible object storage via the Ceph RGW gateway.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: rook-s3
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80
    instances: 1
This is useful for backup tools (Velero), artifact storage, and internal application buckets.
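To obtain S3 credentials, create an object store user; Rook generates access and secret keys and stores them in a Kubernetes Secret in the rook-ceph namespace (the user name below is just an example):
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: backup-user
  namespace: rook-ceph
spec:
  store: rook-s3
  displayName: "Backup user"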
Running Ceph in production is powerful but requires thoughtful operation. These practices help maintain a healthy cluster.
🔄 Spread Storage Across Failure Domains
Use failureDomain: host (or rack, where nodes are labeled with their rack) so that replicas never share the same failure domain.
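For failureDomain: rack to work, Ceph must know which rack each node sits in; one approach is to apply Rook's topology labels to the nodes before OSDs are created (node and rack names below are placeholders):
kubectl label node worker-1 topology.rook.io/rack=rack-a
kubectl label node worker-2 topology.rook.io/rack=rack-b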
🔄 Monitor Cluster Health Regularly
Set up metrics and alerts:
- Enable the Ceph MGR Prometheus module and scrape it with Prometheus (see the sketch below)
- Use the MGR web dashboard for at-a-glance cluster health
- Alert on transitions to HEALTH_WARN or HEALTH_ERR
- Alert on capacity usage well before pools approach the 80% mark
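If the Prometheus Operator is installed, Rook can create the monitoring resources for you; a minimal sketch of the relevant CephCluster setting (assuming spec.monitoring is supported by your Rook version):
spec:
  monitoring:
    enabled: true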
🔄 Use Separate Networks If Possible
Ceph supports:
- A public network for client and Kubernetes (CSI) traffic
- A cluster network for OSD replication, recovery, and backfill traffic
Segregation lowers latency spikes and prevents noisy neighbors.
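A common starting point with Rook is host networking, which lets Ceph use the node's interfaces directly; a minimal sketch (fully separate public and cluster networks additionally require Ceph network settings or Multus, depending on your environment):
spec:
  network:
    provider: host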
🔄 Avoid Overfilling
Ceph performance degrades severely above 80% capacity. Automate alerts and cleanups well before reaching this threshold.
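Ceph itself starts warning at its nearfull ratio (85% by default). From the toolbox you can track utilization with:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph df
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df tree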
🔄 Prefer BlueStore on Raw Devices
BlueStore, the default backend for new OSDs, offers better performance and efficiency than the legacy FileStore backend, especially on NVMe; give it raw, unpartitioned devices rather than directories or pre-formatted filesystems.
🔄 Plan for Scaling
Adding nodes is straightforward, but shrinking clusters is more complex and time-consuming due to data migration. Plan cluster growth before workloads demand it.
🔄 Back Up Critical Metadata
Export Ceph configs and keyrings for disaster recovery:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth list > ceph-keys-backup.txt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config dump > ceph-config-backup.txt
🖥️ Example: Deploying PostgreSQL with Rook-Ceph
Once storage is configured, using it inside applications is straightforward.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  storageClassName: rook-ceph-block
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            # Use a subdirectory so initdb is not tripped up by lost+found on a fresh ext4 volume
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-data
Because the PVC uses Rook-Ceph, this database pod now benefits from replicated, resilient, dynamically provisioned storage.
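The Deployment references a postgres-secret that must exist beforehand; one way to create it (the password value is only a placeholder):
kubectl create secret generic postgres-secret --from-literal=password='change-me'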
| Feature | Rook-Ceph | NFS | Longhorn | OpenEBS | Proprietary SAN |
|---|---|---|---|---|---|
| Replication | Yes | No | Yes | Yes (varies) | Yes |
| Self-healing | Yes | Limited | Yes | Varies | Yes |
| Block storage | Yes | No | Yes | Yes | Yes |
| File storage | Yes | Yes | Limited | Limited | Yes |
| Object storage | Yes | No | No | No | Add-on |
| Scalability | High | Medium | Medium | Medium | High |
| Cost | Free | Free | Free | Free | High |
| Operational complexity | Moderate | Low | Low | Medium | Low |
Rook-Ceph offers the richest set of capabilities but requires administrators comfortable with distributed storage concepts.
🔧 Cluster reports HEALTH_WARN or HEALTH_ERR
- Run ceph health detail in the toolbox to see the specific warnings
- Check for down OSDs, nearfull or full pools, and clock skew between MONs
- Review the operator and affected daemon logs with kubectl -n rook-ceph logs
🔧 Too few MONs / lost quorum
Restart MON pods:
kubectl -n rook-ceph delete pod -l app=rook-ceph-mon
Make sure your nodes have stable network connectivity and time synchronization.
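To inspect quorum state from the toolbox:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph quorum_status --format json-pretty
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon stat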
🔧 PVC stuck in “Pending”
Check that the CSI driver pods are running:
kubectl get pods -n rook-ceph | grep csi
Check StorageClass correctness:
kubectl describe storageclass rook-ceph-block
🔧 PGs stuck in “activating / recovering”
- Check ceph health detail and ceph pg stat in the toolbox to identify the affected PGs
- Make sure enough OSDs are up and in to satisfy the pool’s replication size
- Let backfill and recovery finish before intervening; if a PG is reported inconsistent, repair it:
ceph pg repair <pg-id>
Rook and Ceph support rolling, incremental upgrades; follow the official upgrade guides and upgrade Rook one minor release at a time.
🔧 Upgrade Rook
Apply the updated manifests from the target Rook release:
kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
🔧 Upgrade Ceph
Update the image in the CephCluster spec:
cephVersion:
  image: quay.io/ceph/ceph:v18.2.1
Ceph performs rolling upgrades of MONs, MGRs, and OSDs with minimal disruption.
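After the rollout completes, confirm that every daemon reports the new release:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph versions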
🔧 Adding New Storage Nodes
- Join the node to the Kubernetes cluster and make sure its data disks are raw and unformatted
- With useAllNodes: true and useAllDevices: true, the operator discovers the new devices and creates OSDs automatically; otherwise, list the node explicitly under the CephCluster storage section (see the sketch below)
- Watch the new OSD pods come up in the rook-ceph namespace
- Ceph then rebalances placement groups onto the new OSDs; expect temporary recovery traffic
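If you manage devices explicitly rather than using useAllNodes/useAllDevices, the CephCluster storage section accepts per-node device lists; a minimal sketch (node and device names are placeholders):
storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
    - name: worker-4
      devices:
        - name: sdb
        - name: nvme0n1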
Rook-Ceph is one of the most complete, flexible, and production-ready storage solutions for on-prem Kubernetes clusters. It brings the power of Ceph’s distributed architecture directly into Kubernetes, enabling teams to provide block, file, and object storage with a unified, self-healing, scalable system.
From stateful applications to CI platforms, analytics clusters, and internal developer tooling, organizations that need resilient storage without relying on external appliances can gain enormous value from Rook-Ceph. With thoughtful planning, strong observability, and adherence to best practices, it delivers cloud-like storage capabilities using your own hardware, fully under your control.
Did you find this article helpful? Your feedback is invaluable to us! Feel free to share this post with those who may benefit, and let us know your thoughts in the comments section below.
