Backup Storage Locations and Volume Snapshot Locations

Ark v0.10 introduces a new way of configuring where Ark backups and their associated persistent volume snapshots are stored.

Motivations

In Ark versions prior to v0.10, the configuration for where to store backups and volume snapshots is specified in a Config custom resource. The backupStorageProvider section captures where all Ark backups should be stored. It is defined by a provider (e.g. aws, azure, gcp, minio), a bucket, and possibly some additional provider-specific settings (e.g. region). Similarly, the persistentVolumeProvider section captures where all persistent volume snapshots taken as part of Ark backups should be stored, and is defined by a provider and additional provider-specific settings (e.g. region).

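There are a number of use cases that this basic design does not support, such as:

  1. Take snapshots of more than one kind of persistent volume in a single Ark backup (e.g. in a cluster with both EBS volumes and Portworx volumes)

  2. Have some Ark backups go to a bucket in an eastern USA region, and others go to a bucket in a western USA region

  3. For volume providers that support it (e.g. Portworx), have some snapshots be stored locally on the cluster and have others be stored in the cloud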

Additionally, as we look ahead to backup replication, a major feature on our roadmap, we know that we’ll need Ark to be able to support multiple possible storage locations.

Overview

Ark v0.10 removes the Config custom resource and replaces it with two new custom resources: BackupStorageLocation and VolumeSnapshotLocation. The new resources directly replace the legacy backupStorageProvider and persistentVolumeProvider sections of the Config resource, respectively.

Now, the user can pre-define more than one possible BackupStorageLocation and more than one VolumeSnapshotLocation, and can select at backup creation time the location in which the backup and associated snapshots should be stored.

A BackupStorageLocation is defined as a bucket, a prefix within that bucket under which all Ark data should be stored, and a set of additional provider-specific fields (e.g. AWS region, Azure storage account). The API documentation captures the configurable parameters for each in-tree provider.
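As a rough illustration of the fields just described, a BackupStorageLocation manifest might look like the sketch below. The apiVersion, the heptio-ark namespace, the bucket name, the prefix value, and the file name are assumptions chosen for illustration; the API documentation remains the authority on the exact schema for each provider.

```shell
# Write a sample BackupStorageLocation manifest (all values are illustrative).
cat > backup-location.yaml <<'EOF'
apiVersion: ark.heptio.com/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: heptio-ark      # assumed install namespace
spec:
  provider: aws
  objectStorage:
    bucket: ark-backups
    prefix: cluster-1        # hypothetical prefix within the bucket
  config:
    region: us-east-1        # provider-specific setting
EOF

# With cluster access, the manifest could then be applied with:
#   kubectl apply -f backup-location.yaml
```

The bucket and prefix sit under objectStorage, while everything provider-specific goes under config.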

A VolumeSnapshotLocation is defined entirely by provider-specific fields (e.g. AWS region, Azure resource group, Portworx snapshot type). The API documentation captures the configurable parameters for each in-tree provider.
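A VolumeSnapshotLocation manifest is correspondingly smaller, since it carries only a provider and its settings. The sketch below assumes the same apiVersion and heptio-ark namespace as an Ark install would typically use; the location name and file name are illustrative.

```shell
# Write a sample VolumeSnapshotLocation manifest (all values are illustrative).
cat > snapshot-location.yaml <<'EOF'
apiVersion: ark.heptio.com/v1
kind: VolumeSnapshotLocation
metadata:
  name: ebs-us-east-1
  namespace: heptio-ark      # assumed install namespace
spec:
  provider: aws
  config:
    region: us-east-1        # provider-specific setting
EOF

# With cluster access, the manifest could then be applied with:
#   kubectl apply -f snapshot-location.yaml
```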

Additionally, since multiple VolumeSnapshotLocations can be created, the user can now configure locations for more than one volume provider, and if the cluster has volumes from multiple providers (e.g. AWS EBS and Portworx), all of them can be snapshotted in a single Ark backup.

Limitations / Caveats

Examples

Let’s look at some examples of how we can use this new mechanism to address each of our previously unsupported use cases:

Take snapshots of more than one kind of persistent volume in a single Ark backup (e.g. in a cluster with both EBS volumes and Portworx volumes)

During server configuration:

ark snapshot-location create ebs-us-east-1 \
    --provider aws \
    --config region=us-east-1

ark snapshot-location create portworx-cloud \
    --provider portworx \
    --config type=cloud

During backup creation:

ark backup create full-cluster-backup \
    --volume-snapshot-locations ebs-us-east-1,portworx-cloud

Alternatively, since this example has only one volume snapshot location configured for each of the two providers (ebs-us-east-1 for aws, and portworx-cloud for portworx), Ark doesn’t require them to be specified explicitly when creating the backup:

ark backup create full-cluster-backup

Have some Ark backups go to a bucket in an eastern USA region, and others go to a bucket in a western USA region

During server configuration:

ark backup-location create default \
    --provider aws \
    --bucket ark-backups \
    --config region=us-east-1

ark backup-location create s3-alt-region \
    --provider aws \
    --bucket ark-backups-alt \
    --config region=us-west-1

During backup creation:

# The Ark server will automatically store backups in the backup storage location named "default" if
# one is not specified when creating the backup. You can alter which backup storage location is used
# by default by setting the --default-backup-storage-location flag on the `ark server` command (run
# by the Ark deployment) to the name of a different backup storage location.
ark backup create full-cluster-backup

Or:

ark backup create full-cluster-alternate-location-backup \
    --storage-location s3-alt-region

For volume providers that support it (e.g. Portworx), have some snapshots be stored locally on the cluster and have others be stored in the cloud

During server configuration:

ark snapshot-location create portworx-local \
    --provider portworx \
    --config type=local

ark snapshot-location create portworx-cloud \
    --provider portworx \
    --config type=cloud

During backup creation:

# Note that since this example has two possible volume snapshot locations for the Portworx
# provider, we need to explicitly specify which one to use when creating a backup. Alternatively,
# you can set the --default-volume-snapshot-locations flag on the `ark server` command (run by
# the Ark deployment) to specify which location should be used for each provider by default, in
# which case you don't need to specify it when creating a backup.
ark backup create local-snapshot-backup \
    --volume-snapshot-locations portworx-local

Or:

ark backup create cloud-snapshot-backup \
    --volume-snapshot-locations portworx-cloud
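
The selection rule described in the comments above can be sketched as a small shell function. This is only an illustration of the behavior as this document describes it, not Ark's actual implementation: the function name, its arguments, and the precedence between a server default and a single configured location are assumptions.

```shell
# Hypothetical helper, not part of Ark: picks the snapshot location for ONE provider.
#   $1      location named on `ark backup create` (may be empty)
#   $2      server-side default for this provider (may be empty)
#   $3...   names of all locations configured for this provider
resolve_snapshot_location() {
    local requested="$1" server_default="$2"
    shift 2
    if [ -n "$requested" ]; then
        echo "$requested"          # an explicit choice is used as given
    elif [ -n "$server_default" ]; then
        echo "$server_default"     # otherwise use the configured default
    elif [ "$#" -eq 1 ]; then
        echo "$1"                  # a single candidate leaves no ambiguity
    else
        echo "error: no unambiguous location for this provider" >&2
        return 1
    fi
}

# One aws location configured, nothing specified: it is picked automatically.
resolve_snapshot_location "" "" ebs-us-east-1
# Two portworx locations: an explicit choice or a default is needed.
resolve_snapshot_location portworx-local "" portworx-local portworx-cloud
resolve_snapshot_location "" portworx-cloud portworx-local portworx-cloud
```

With two Portworx locations and neither an explicit choice nor a default, the function returns an error, which mirrors why the backup commands above name a location.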

One location is still easy

If you don’t have a use case for more than one location, it’s still just as easy to use Ark. Let’s assume you’re running on AWS, in the us-west-1 region:

During server configuration:

ark backup-location create default \
    --provider aws \
    --bucket ark-backups \
    --config region=us-west-1

ark snapshot-location create ebs-us-west-1 \
    --provider aws \
    --config region=us-west-1

During backup creation:

# Ark will automatically use your configured backup storage location and volume snapshot location.
# Nothing new needs to be specified when creating a backup.
ark backup create full-cluster-backup

Additional Use Cases

  1. If you’re using Azure’s AKS, you may want to store your volume snapshots outside of the “infrastructure” resource group that is automatically created when you create your AKS cluster. This is now possible using a VolumeSnapshotLocation, by specifying a resourceGroup under the config section of the snapshot location. See the Azure volume snapshot location documentation for details.

  2. If you’re using Azure, you may want to store your Ark backups across multiple storage accounts and/or resource groups. This is now possible using a BackupStorageLocation, by specifying a storageAccount and/or resourceGroup, respectively, under the config section of the backup location. See the Azure backup storage location documentation for details.
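
The two Azure cases above might be expressed as the following manifests. This is a sketch only: the resourceGroup and storageAccount keys come from the text above, while the apiVersion, the heptio-ark namespace, and every name and value (ark-snapshots-rg, arkbackupsalt, and so on) are hypothetical. The Azure location documentation is the reference for the supported keys.

```shell
# Write sample Azure location manifests (all names and values are hypothetical).
cat > azure-locations.yaml <<'EOF'
apiVersion: ark.heptio.com/v1
kind: VolumeSnapshotLocation
metadata:
  name: azure-snapshots
  namespace: heptio-ark                # assumed install namespace
spec:
  provider: azure
  config:
    resourceGroup: ark-snapshots-rg    # snapshots land outside the AKS infrastructure group
---
apiVersion: ark.heptio.com/v1
kind: BackupStorageLocation
metadata:
  name: azure-alt
  namespace: heptio-ark
spec:
  provider: azure
  objectStorage:
    bucket: ark-backups
  config:
    resourceGroup: ark-backups-rg
    storageAccount: arkbackupsalt
EOF

# With cluster access, the manifests could then be applied with:
#   kubectl apply -f azure-locations.yaml
```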