In the earlier, though not earliest, days of Kubernetes, the general wisdom was to run only ephemeral workloads on Kubernetes clusters. This was with good reason: support for persistent workloads was not yet mature.

Today, with the work the community has put into CloudNativePG (CNPG), it has become much more feasible to run production PostgreSQL databases in Kubernetes. I will go over the setup, backup and restore, migration, and overall maintenance of CNPG clusters in Kubernetes.

One of the main benefits of running your entire stack in Kubernetes and using open source systems is avoiding vendor lock-in. When you’ve deployed your whole stack on Kubernetes, with the right practices, it becomes very easy to switch cloud providers or even move to your own on-premise or self-hosted servers. The trade-off is really one of convenience vs. flexibility. Cloud-managed RDBMSs provide convenient setup and are often a very good choice if you are just starting out and need a simple way to stand up a database. Self-managed databases on your own cluster offer flexibility and full control over your stack, with the added burden of managing the database itself.

Setting Up

There are two components required to run CNPG clusters in a Kubernetes cluster: the CNPG Operator and the CNPG Cluster.

The Operator installs the CRDs and other related resources that allow a CNPG Cluster to be deployed. The Cluster is where the actual database runs.

Before beginning, it is always a good idea to have at least a quick read through the documentation.

CNPG Operator

The simplest way to deploy the CNPG Operator is with its Helm chart.

helm repo add cnpg https://cloudnative-pg.github.io/charts

helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

CNPG Cluster

There is also a Helm chart for the cluster. It is a great option as long as your setup follows what the chart expects. For my use case I decided to set up my Clusters manually, since I wanted a more minimal setup and to learn how each component of the cluster works.

A sample cluster configuration:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-cluster-name
  namespace: my-namespace
spec:
  instances: 3
  
  # PostgreSQL configuration
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "128MB"
      effective_cache_size: "512MB"
      log_statement: "all"
      log_min_duration_statement: "1000"
  
  # Service account configuration for permissions
  # Most cloud providers use annotations and labels
  # for setting permissions
  serviceAccountTemplate:
    metadata:
      annotations: {}
  
  # Pod template configuration
  inheritedMetadata:
    labels: {}
  
  # Setting resources for the PG instances
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  # Node affinity and tolerations
  affinity:
    nodeSelector: {}
    # Assumes you have nodes set up with the database taint.
    # This allows CNPG pods to be scheduled on those nodes.
    tolerations:
    - key: workload
      operator: Equal
      value: database
      effect: NoSchedule

  # Topology spread constraints for AZ distribution
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        cnpg.io/cluster: my-cluster-name

  # Storage configuration
  storage:
    size: 10Gi
    storageClass: my-database-storage-class
  
  # Plugin configuration for backups
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: my-backup-store
  
  # Monitoring
  monitoring:
    # Exports database metrics to Prometheus
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: cnpg-default-monitoring
        key: queries
  
  # Bootstrap configuration
  # Only run on a brand new Cluster
  # Once there is existing data in the PVC,
  # this config is ignored.
  bootstrap:
    initdb:
      database: my_database_name
      owner: app_user
      secret:
        name: secret-with-database-password
      postInitApplicationSQL:
        # initdb already creates my_database_name owned by
        # app_user, so creating it again here would fail.
        - "ALTER USER app_user CREATEDB;"
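
The bootstrap section above references a password secret. CNPG expects this to be a basic-auth secret whose username matches the database owner. A minimal sketch, with placeholder values matching the configuration above:

apiVersion: v1
kind: Secret
metadata:
  name: secret-with-database-password
  namespace: my-namespace
type: kubernetes.io/basic-auth
stringData:
  username: app_user
  password: change-me  # replace with a strong generated password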

Since we are using the Barman plugin for backups and WAL archiving, we also need to install the Barman Cloud CNPG-I plugin and then set up the Barman ObjectStore.

apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: my-backup-store
  namespace: my-namespace
spec:
  configuration:
    destinationPath: ""
    # Insert the credentials configs based on your cloud provider
    # or your self-hosting set up
    wal:
      maxParallel: 8
    data:
      jobs: 2
  retentionPolicy: "30d"

With this ObjectStore in place, WAL segments are archived continuously and backups are retained for 30 days. Note that the ObjectStore alone does not take base backups on a schedule; for that, you define a ScheduledBackup resource.
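
A minimal ScheduledBackup sketch, assuming the cluster and plugin names from above (note that CNPG uses a six-field cron expression, with seconds first; this one runs daily at midnight):

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-cluster-daily-backup
  namespace: my-namespace
spec:
  # Six-field cron: seconds, minutes, hours, day, month, weekday
  schedule: "0 0 0 * * *"
  backupOwnerReference: self
  cluster:
    name: my-cluster-name
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io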

With this setup, you get a highly available three-instance cluster spread across AZs. CNPG automatically handles electing a leader, keeping follower instances in sync, and failing over if the leader goes down. The database is accessible read-write at my-cluster-name-rw.my-namespace. For read-only connections, use my-cluster-name-ro.my-namespace.

Multi-Cloud Setups

I had the opportunity to move my database between cloud providers, from AWS to Azure. Since I was already running CNPG on both Kubernetes clusters, it was a simple process of standing up the new cluster in Azure in recovery mode, pointing it at the backup object store of the existing cluster.

  bootstrap:
    recovery:
      source: s3-backup
      database: database_name
      owner: app_user

  replica:
    enabled: true
    source: s3-backup

  externalClusters:
    - name: s3-backup
      barmanObjectStore:
        serverName: aws-db
        destinationPath: s3://backup-bucket/database_name
        endpointURL: https://s3.us-east-1.amazonaws.com
        s3Credentials:
          accessKeyId:
            name: credential-secret-name
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: credential-secret-name
            key: ACCESS_SECRET_KEY
        wal:
          maxParallel: 8

This starts the database in replica mode, continuously replaying WAL from the S3 bucket. It continues doing so for as long as replica.enabled is true. To promote the cluster, set replica.enabled to false and CNPG will automatically handle the promotion.
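
The promotion step is just a configuration change on the replica cluster:

  replica:
    enabled: false  # flipping this to false promotes the cluster to primary
    source: s3-backup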

Multi-Region/Cloud CNPG Database

It is now a relatively simple process to run a CNPG database across multiple regions or cloud providers. If the primary region or cloud goes down, switching to the replica is a matter of setting replica.enabled to false.

Disaster recovery now hinges on how quickly you can detect the primary failing and switch over. The usual concerns about data loss still apply. This does not make your database somehow more resilient than other deployment options; it just reduces your vendor lock-in.

Scaling

Vertical Scaling

Vertical scaling involves bringing up a larger follower and then promoting it to be the new leader. It does involve a short downtime during the promotion window, which can be mitigated with proper planning and scheduling.
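
In practice this is a change to the Cluster spec: CNPG rolls the new size out to followers first and finishes with a switchover. A sketch, doubling the values from the earlier example (size these for your actual workload):

  # CNPG performs a rolling update: followers are recreated
  # with the new resources first, then a switchover replaces
  # the old leader.
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi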

Horizontal Scaling

Horizontal scaling here only adds more followers to the cluster, allowing for more concurrent reads. It has no effect on the write performance of the cluster. It is a good option for read-heavy workloads and involves simply increasing the number of instances in the cluster configuration.
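
For example, going from three to five instances is a one-line change in the Cluster spec; the new pods join as streaming replicas and serve reads behind the -ro service:

spec:
  instances: 5  # was 3; the two new pods join as read-only replicas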

Scaling Storage

CNPG clusters are backed by PVCs, so increasing storage depends on the storage class of the provisioned PVCs. Many cloud providers support resizing PVCs while they are still attached and in use, which allows scaling up the provisioned storage. Scaling down is usually a more involved process: bring up new followers on smaller PVCs, run the promotion process, and then delete the instances running on the larger PVCs.
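
For in-place expansion, the storage class must allow volume expansion; then growing the cluster's volumes is just a spec change. A sketch, assuming the storage class name from the earlier example (the provisioner shown is an example and depends on your environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-database-storage-class
provisioner: ebs.csi.aws.com  # example; use your environment's CSI driver
allowVolumeExpansion: true
---
# Then, in the Cluster spec, request the larger size:
#   storage:
#     size: 20Gi  # was 10Gi
#     storageClass: my-database-storage-class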

Final Thoughts

The CloudNativePG project has really modernized the storage and persistence layer in Kubernetes. It allows Kubernetes clusters to run almost entirely without dependencies on managed RDBMSs, which in turn reduces a lot of the vendor lock-in risk. For fully self-hosted and managed clusters, it offers a Kubernetes-native way to operate and manage PostgreSQL databases.