In the earlier, if not the earliest, days of Kubernetes, the general wisdom was to run only ephemeral workloads on Kubernetes clusters. This was with good reason, as support for persistent workloads was not yet mature.
Today, with the work the community has put into CloudNativePG (CNPG), it has become much more feasible to run production PostgreSQL databases in Kubernetes. I will go over the setup, backup and restore, migration, and overall maintenance of CNPG clusters in Kubernetes.
One of the main benefits of running your entire stack in Kubernetes on open source systems is avoiding vendor lock-in. When your whole stack is deployed on Kubernetes with the right practices, it becomes very easy to switch cloud providers or even move to your own on-premise or self-hosted servers. The trade-off is really one of convenience vs. flexibility. Cloud-managed RDBMSs provide convenient setup and are often a very good choice if you are just starting out and need a simple way to have a database. A self-managed database on your own cluster offers flexibility and full control over your stack, with the added burden of managing the actual database.
Setting Up
There are two components required to run CNPG clusters in a Kubernetes cluster: the CNPG Operator and the CNPG Cluster.
The Operator installs the CRDs and other related resources that allow a CNPG Cluster to be deployed. The Cluster is where the actual database is deployed.
Before beginning, it is always a good idea to have at least a quick read through the documentation.
CNPG Operator
The simplest way to deploy the CNPG Operator is to use the helm chart.
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
--namespace cnpg-system \
--create-namespace \
cnpg/cloudnative-pg
CNPG Cluster
There is also a helm chart for the cluster. It is a great option so long as your setup follows what the chart expects. For my use-case I decided to set up my Clusters manually, since I wanted a more minimal setup and wanted to learn how each component of the cluster works.
A sample cluster configuration:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-cluster-name
  namespace: my-namespace
spec:
  instances: 3
  # PostgreSQL configuration
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "128MB"
      effective_cache_size: "512MB"
      log_statement: "all"
      log_min_duration_statement: "1000"
  # Service account configuration for permissions
  # Most cloud providers use annotations and labels
  # for setting permissions
  serviceAccountTemplate:
    metadata:
      annotations: {}
  # Pod template configuration
  inheritedMetadata:
    labels: {}
  # Setting resources for the PG instances
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  # Node affinity and tolerations
  affinity:
    nodeSelector: {}
    # Assumes you have nodes set up with the database taint.
    # This allows CNPG pods to be scheduled on those nodes.
    tolerations:
      - key: workload
        operator: Equal
        value: database
        effect: NoSchedule
  # Topology spread constraints for AZ distribution
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          cnpg.io/cluster: my-cluster-name
  # Storage configuration
  storage:
    size: 10Gi
    storageClass: my-database-storage-class
  # Plugin configuration for backups
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: my-backup-store
  # Monitoring
  monitoring:
    # Exports database metrics to Prometheus
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: cnpg-default-monitoring
        key: queries
  # Bootstrap configuration
  # Only run on a brand new Cluster
  # Once there is existing data in the PVC,
  # this config is ignored.
  bootstrap:
    initdb:
      database: my_database_name
      owner: app_user
      secret:
        name: secret-with-database-password
      postInitApplicationSQL:
        # Note: initdb above already creates my_database_name owned
        # by app_user, so there is no need to CREATE DATABASE here.
        - "ALTER USER app_user CREATEDB;"
Since we are using the Barman plugin for backups and WAL archiving, we also need to install the Barman Cloud CNPG-I plugin and then set up the Barman ObjectStore.
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: my-backup-store
  namespace: my-namespace
spec:
  configuration:
    destinationPath: ""
    # Insert the credentials configs based on your cloud provider
    # or your self-hosted setup
    wal:
      maxParallel: 8
    data:
      jobs: 2
  retentionPolicy: "30d"
With this object store in place, WAL files are archived continuously, and the retention policy automatically prunes backups and WAL files older than 30 days.
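Base backups themselves still have to be requested explicitly; the usual approach is a ScheduledBackup resource pointing at the cluster. A minimal sketch, assuming the plugin-based backup method (the resource name and the six-field cron schedule here are my own choices):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-cluster-daily-backup
  namespace: my-namespace
spec:
  # Six-field cron expression (with seconds): daily at midnight
  schedule: "0 0 0 * * *"
  cluster:
    name: my-cluster-name
  # Delegate the backup to the Barman Cloud plugin configured above
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
```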
With this setup, you get a highly available three-instance cluster spread across AZs. CNPG automatically manages one primary and two replica instances, and handles failover if the primary goes down.
The database is accessible for read-write connections at my-cluster-name-rw.my-namespace, and for read-only connections at my-cluster-name-ro.my-namespace.
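For applications, CNPG also generates a connection secret for the initdb user, named after the cluster with an -app suffix. A sketch of consuming it from a container spec in a Deployment (the env var name is my own choice):

```yaml
# Container env snippet; my-cluster-name-app is the secret CNPG
# generates for the app_user created during the initdb bootstrap.
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: my-cluster-name-app
        key: uri  # full PostgreSQL connection URI
```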
Multi-Cloud Setups
I had the opportunity to move my database between cloud providers, from AWS to Azure. Since I was already running CNPG on both Kubernetes clusters, it was a simple process of standing up the new cluster in Azure in recovery mode, pointing it at the backup object store of the existing cluster.
bootstrap:
  recovery:
    source: s3-backup
    database: database_name
    owner: app_user
replica:
  enabled: true
  source: s3-backup
externalClusters:
  - name: s3-backup
    barmanObjectStore:
      serverName: aws-db
      destinationPath: s3://backup-bucket/database_name
      endpointURL: https://s3.us-east-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: credential-secret-name
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: credential-secret-name
          key: ACCESS_SECRET_KEY
      wal:
        maxParallel: 8
This starts the database in replica mode, streaming changes from the S3 bucket. It continues doing so for as long as replica.enabled is true. To promote the cluster, just set replica.enabled to false and CNPG will automatically handle the promotion.
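Concretely, promotion is just flipping that one flag in the replica cluster's spec:

```yaml
replica:
  enabled: false  # was true; flipping this promotes the cluster to primary
  source: s3-backup
```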
Multi-Region/Cloud CNPG Database
It is now a relatively simple process to run a CNPG database across multiple regions or cloud providers. In case the primary region or the primary cloud goes down, switching to the replica is a matter of changing its configuration to replica.enabled=false.
Disaster recovery now hinges on how quickly you can detect the primary failing and switch over. The usual concerns about data loss still apply. This does not somehow make your database more resilient than other deployment options; it just reduces your vendor lock-in.