Tanzu Backup and Restore with Standalone Velero

Overview

In this post we'll take a look at backup and restore of workloads inside vSphere with Tanzu. Since this environment runs vSphere with Tanzu with vSphere networking, we'll use the standalone Velero and Restic solution, as opposed to the more integrated Velero operator that can be utilized when using NSX-T as the networking stack.

Backing up and restoring vSphere with Tanzu

First, a note on backups of a vSphere with Tanzu environment.

vSphere with Tanzu consists of multiple components. Currently there's no solution that does a complete backup of all these components in one go (and maybe we'll never have one).

We have vCenter itself, and its configuration, which has its own backup process. If you're using the vSphere networking stack, the load balancer should have its own backup process as well; NSX-T has its own process; and then there's the Tanzu side, which is handled differently based on the type of object (and a few other things).

VMware provides a table of the different components and the tools that can be used, which explains at a high level what considerations we must take into account when implementing a backup and restore strategy.

Note that, as mentioned, there are differences in the plugins/tools that can be used depending on whether we're using vSphere networking or NSX-T in our environment. The latter uses a Velero Plugin for vSphere which is more integrated than the standalone version we'll have to use without NSX-T, but this also ties it to Tanzu, and backups cannot necessarily be ported to other clusters. For a run-through of the Velero plugin, check out this blog post by Cormac Hogan.

The focus of this post will be on backup and restore of workloads running in vSphere with Tanzu, i.e. what is running inside a Tanzu Kubernetes Cluster (TKC).

The environment

The environment used in my setup consists of a cluster of three hosts running on vSphere 7.0 U2. There's a couple of vSphere Namespaces configured, and there's also a couple of Tanzu Kubernetes Clusters (guest clusters) running in the environment. The load balancer used in the environment is the NSX Advanced Load Balancer (Avi Vantage).

vSphere with Tanzu environment

Velero

The tool used for doing backup and restore inside the TKC clusters is Velero. This is an open-source, Cloud Native Computing Foundation-backed project for performing disaster recovery of Kubernetes cluster resources. It is also used for migrating resources and persistent volumes.

Velero supports a few storage providers, but again, since the vSphere provider requires NSX-T as the networking stack, we'll go for an object store. In this setup we'll use MinIO as the object store provider.

Note that to run the Velero commands we'll see later in this post, we'll need the Velero CLI installed. Be aware that to be eligible for support from VMware, we'll have to use the VMware-signed binary. Check the VMware documentation for more information.

Installing MinIO

MinIO can be installed in a container, on bare metal, or on virtual machines. In this setup we'll install it on an Ubuntu 20.04 VM.

VMware provides documentation on how to configure it, but we'll go through it step-by-step with a few additions and clarifications.

First we'll download the MinIO executable

wget https://dl.min.io/server/minio/release/linux-amd64/minio

We'll create a directory for storing the MinIO data

mkdir /MINIO-DATA

We'll start MinIO manually first; later on we'll create a service for it

chmod +x minio
./minio server /MINIO-DATA

The output will give us the root user and password, which we'll need. Note that these can be set with environment variables, which is also true for the console endpoint port. In my lab environment I'll leave the defaults. In a production environment this should be taken care of.
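For reference, setting the credentials and console port explicitly could look something like this (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD and the --console-address flag are standard MinIO options; the values here are just placeholders):

export MINIO_ROOT_USER=minioadmin
export MINIO_ROOT_PASSWORD=<a-strong-password>
# Pin the console to port 9001 instead of a random port
./minio server /MINIO-DATA --console-address ":9001"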

Starting MinIO

We can verify the MinIO server by navigating to the console address (if you use port 9000 you'll be redirected). Use the credentials recorded from the console output to log in

MinIO UI

Let's stop the server and prepare a service for it.

First we'll create a user to run the service

useradd -M minio-user

Set the minio-user as the owner of the data directory

chown -R minio-user:minio-user /MINIO-DATA

Copy the executable to /usr/local/bin

cp minio /usr/local/bin

Create a service file

touch minio.service

Add the following to the service file

[Unit]
Description=MinIO
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio

[Service]
WorkingDirectory=/usr/local/

User=minio-user
Group=minio-user
ProtectProc=invisible

ExecStart=/usr/local/bin/minio server /MINIO-DATA

# Let systemd restart this service always
Restart=always

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65536

# Specifies the maximum number of threads this process can create
TasksMax=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=infinity
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Copy the service file to /etc/systemd/system

cp minio.service /etc/systemd/system

Reload the systemd daemon

systemctl daemon-reload

Start the MinIO service and check that it's running

systemctl start minio
systemctl status minio

MinIO server status
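If you also want MinIO to come back up after a reboot, enabling the unit is the usual extra step (standard systemd, not covered in the VMware docs):

systemctl enable minio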

Verify that you can access the MinIO UI, and then let's create a user account and a bucket to be used for our Tanzu backups!

Create MinIO user and bucket

Note that this could easily be done with the MinIO CLI tools, but we'll use the UI for demonstration purposes.
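For reference, a rough CLI equivalent of the next few steps could look like this with the mc client (the alias name and secret key are placeholders, and newer mc releases use mc admin policy attach rather than set):

# Register the MinIO server under an alias using the root credentials
mc alias set minio http://192.168.100.41:9000 <root-user> <root-password>
# Create the velero user and give it the built-in readwrite policy
mc admin user add minio velero <secret-key-for-velero-user>
mc admin policy set minio readwrite user=velero
# Create a bucket named after the TKC cluster
mc mb minio/tkc-cluster-2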

First we'll create a user named velero and give it readwrite access. We could (and in production should) restrict it to only the buckets used for backups, but for now I'm fine with generic access

Create MinIO user

Now, we'll create a bucket for our Tanzu backups.

Before you go ahead and create the buckets, you might want to put some thought into your bucket structure.

In this post I'm doing a Velero install per TKC cluster, so a bucket per cluster could be a strategy. Likewise, if you want to utilize the MinIO server for more than just backups, then maybe you want to separate backups from other data and will end up with a specific backup bucket per cluster. In short, do some thinking before creating buckets for all of your clusters!

In this case I'm creating one bucket per cluster and I'll give the bucket the same name as the TKC cluster

Create MinIO bucket

Also, verify that the bucket access is R/W

Verify bucket access

And lastly that the user account we created has access to it

Verify user access

Now we can carry on with installing Velero and Restic on our Tanzu Kubernetes Cluster

Install Velero

First let's take a quick look at the environment. We have two vSphere namespaces in this setup, as we saw earlier in this blog post, and in one of the namespaces we have two TKC clusters

TKC clusters

Let's install Velero in the tkc-cluster-2 cluster

First we'll switch to (or verify) the correct context, since the Velero CLI automatically works with the current context

kubectl config use-context tkc-cluster-2
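To double-check which context is active before running any Velero commands (standard kubectl):

kubectl config current-context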

Let's also create a credentials file for the Velero installer to use where we specify the details for the user we created earlier

[default]
aws_access_key_id = velero
aws_secret_access_key = <secret-key-for-velero-user>

I'm saving this in a file called credentials-minio

Note that the VMware documentation doesn't include the [default] header in the credentials file. In my setup I got error messages like "Error getting a backup store" and "NoCredentialProviders: no valid providers in chain" if I left that header out. I got a clue on how to solve it from this discussion on GitHub.

Now we're ready to install Velero and Restic in our TKC cluster. (Make sure you change the bucket, secret-file, s3Url and publicUrl parameters to reflect your environment)

velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket tkc-cluster-2 \
  --secret-file ./credentials-minio \
  --use-volume-snapshots=false \
  --use-restic \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.100.41:9000,publicUrl=http://192.168.100.41:9000

Installing Velero in a Tanzu Kubernetes Cluster

After a few minutes we can run the kubectl logs deployment/velero -n velero command to check the status of the install. The message we want to see is Backup storage location valid, marking as available
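A couple of quick sanity checks along those lines (standard kubectl; the restic pods come from the --use-restic flag in the install command):

# Watch the Velero server log for the "Backup storage location valid" message
kubectl logs deployment/velero -n velero
# Verify that the velero deployment and the restic daemonset pods are running
kubectl get pods -n velero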

With Velero installed we can try to run a backup in our cluster.

Create a backup with Velero

First, let's check out a simple workload that is running in our cluster

Workload running in TKC
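For reference, a workload along these lines (an nginx deployment fronted by a load balanced service in a my-app namespace, matching what we'll delete and restore later) could be created like this:

kubectl create ns my-app
kubectl -n my-app create deployment nginx --image=nginx
kubectl -n my-app expose deployment nginx --port=80 --type=LoadBalancer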

Now we'll create a backup

velero backup create my-app-backup

Create Velero backup
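To follow the backup and confirm that it completed, the Velero CLI has describe and get subcommands (the --details flag is optional and also lists the resources included):

velero backup describe my-app-backup --details
velero backup get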

Over in our MinIO server we can check our bucket and see that our backup objects have been created there

MinIO bucket

Delete app and do a restore

Now let's delete our namespace, mimicking that our app has been deleted and needs to be restored

kubectl delete ns my-app

Delete Namespace and verify that our app stops working
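A quick way to verify that the namespace really is gone (this should now fail with a NotFound error):

kubectl get ns my-app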

Now let's run a restore and see if we can get our app back

velero restore create --from-backup my-app-backup

Restore namespace from backup

As we can see, our namespace is back, along with our nginx deployment and the load-balanced service!
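We can verify the same from the CLI (standard velero and kubectl commands):

# The restore should show as Completed
velero restore get
# And the deployment and service should be back in the namespace
kubectl get deployment,svc -n my-app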

Ok, so what about losing an entire cluster? Or maybe we want to migrate our app to a different cluster?

Restore to different cluster

Well, let's take a look at our tkc-cluster-1 cluster

New cluster

Now, if we want to restore the backup taken from the other cluster, we just install Velero and point it at our MinIO bucket. In fact, we can use the exact same install command as we did in the first cluster.

The cool thing now is that Velero will automatically sync with our object store, which we can verify by listing the available backups

List backups
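From the CLI, the same check is just a backup listing:

velero backup get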

So, let's try a restore of that backup in this cluster!
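The restore command is identical to the one we used in the original cluster:

velero restore create --from-backup my-app-backup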

Restore to new cluster

It works!

Summary

There's a lot more to Velero (and Restic) than what we've looked at here, but hopefully this has been an intro to some of the capabilities of the product, and has also shown that it's quite easy to get started with it.

Thanks for reading, please reach out if you have any questions or comments!
