Running Plex on a Tanzu Community Edition Kubernetes Cluster

Update 2022-10-21: After just one year in the wild, VMware announced on October 21 2022 that they will no longer update or maintain the TCE project, and that the GitHub project will be removed by the end of 2022. For more information, check out my blog post here

Most of the things mentioned in this post (outside of installing TCE) should still be valid for other Kubernetes distributions

As part of the VMware vExpert Application Modernization group I was invited to be a part of a Tanzu Challenge this month. The task was to get a Plex solution up and running on a Tanzu Community Edition (TCE) cluster. The solution should be "production grade", including management and protection of the data and configuration, monitoring, logging, and recovery/high availability.

As I wanted to get my hands dirty with TCE this was a golden opportunity and something I was happy to take on.

The work towards delivering on the challenge has resulted in many different blog posts

Getting Plex to run on Docker in general and Kubernetes in particular was not an easy task, and there are still things I'd want to get in place for this solution: more security measures for the pods and other Kubernetes resources (network policies, service accounts, etc.), SSL integrated with Vault or an external PKI like Let's Encrypt, some kind of pipeline to orchestrate the deployment of the solution, and so on.

Please note that when I'm talking about the struggles to get this up and running, I'm primarily thinking of how to have the PMS, Plex Media Server, running on Kubernetes while accessing my own media. The Plex web server works just fine from the Docker image and can stream online content.

In my case I can summarize my struggles with one word: Plex account (OK, that's actually two words, I guess). Or at least I think this is what made me pull my hair out the last couple of days. To tell the truth, I've tested so much back and forth that there might be a few things I've forgotten and/or misunderstood in the process. I'll discuss that shortly.

Let's first take a look at how I've set up the environment where I'll run my solution

Solution

All yaml and config files mentioned below can be found on this GitHub repository

Kubernetes environment

The Kubernetes environment will be running as a Tanzu Community Edition cluster, as the challenge dictated. TCE can be deployed on public cloud, on vSphere, and even in Docker.

I decided to deploy it on vSphere which I've written about here. In short we'll have a Management cluster deployed and from this Management cluster we can deploy multiple Workload clusters.

My management cluster is set up as a dev cluster with one control plane node and one worker node to save resources in my lab. The workload cluster is also deployed with the dev plan, giving us one control plane node, but I have explicitly specified that I want three worker nodes. For a real production cluster we would obviously have three control plane nodes as well, but since this is my lab I need to save where I can.
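As a sketch, the node counts above map to variables in the Tanzu cluster configuration file. An illustrative excerpt (the cluster name is a placeholder, and the rest of the vSphere settings are covered in my TCE deployment post):

```yaml
# Excerpt of a workload cluster config file (values match the setup above)
CLUSTER_NAME: tce-wld-1          # illustrative name
CLUSTER_PLAN: dev                # dev plan -> one control plane node
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 3          # explicitly request three worker nodes
```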

TCE cluster nodes

Network

The network setup for the environment is quite simple. The physical hosts are running on a management VLAN, while the Kubernetes nodes are run on another specific VLAN. Check out my previous post on how to deploy TCE on vSphere for details.

Storage

Storage is a common struggle in Kubernetes and there's a myriad of options, guides, and walkthroughs out there. I have previously worked with NFS storage in Kubernetes, and of course utilized vSAN when working with vSphere with Tanzu.

For Plex you need at least two volumes, one for the configuration and database, /config, and one for the media files, /data. There is a third volume in play for transcoding, but I've not set that up in my solution.

The media files in my solution are delivered from a separate NAS server which shares the files through NFS.

Initially I tried to get Plex running with multiple replicas and a ReadWriteMany volume. I couldn't get that working in my current vSphere environment, and although I wanted to set up something other than a plain old NFS server, after a lot of trial and error I gave up. I also found that Plex didn't work that well with multiple replicas, so I decided to revert to a vSphere volume for the config directory and use the CSI driver that gets deployed with TCE on vSphere.

There is a clusterized Plex community solution out there, ClusterPlex, which you might want to check out if you really want to have Plex running as a proper cluster

For the supporting software discussed below we'll also use vSphere storage in the cases where persistent volumes are required. Although there's a default storage class created at deployment time, I've created a specific one pointing to a specific datastore:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sc-vmdk
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.vsphere.vmware.com
parameters:
  datastoreurl: "ds:///vmfs/volumes/618d662e-9ab79b60-16ee-1c697a01b903/"

With this we can create PVCs pointing to the name of the storage class, and the vSphere CSI driver will automatically provision a persistent volume backed by vSphere.
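As a sketch, a PVC consuming this storage class only needs to reference it by name (the claim name and size here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example          # illustrative name
  namespace: plex
spec:
  storageClassName: sc-vmdk  # references the storage class above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```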

With that said, and as mentioned already, storage in Kubernetes can be difficult, and this is where a solution like vSphere with Tanzu and vSAN really helps out. That is one of the strongest learning points from this whole exercise: a platform with an easy way to integrate storage saves a lot of time and lets your developers focus on other things.

Supporting software

One of the cool things with TCE (and the other Tanzu platforms) is that we can utilize the Tanzu Package repository to deploy common supporting software. For our solution we'll use quite a few of these.

  • cert-manager will be used to provide the ability to issue certificates
  • Contour with Envoy will provide Ingress capabilities
  • ExternalDNS will integrate with an external MS DNS server to register our services in DNS
  • Prometheus will scrape metrics from the different parts of our environment
  • Grafana will let us visualize the metrics in dashboards
  • FluentBit will be used to push logs to an external syslog server
  • Velero will be used to provide backup and restore functionality

In addition we'll install

  • MetalLB for providing load balancing services

I have individual posts created for setting up all of these, links can be found at the top of this blog post.

Note that I am also utilizing a Harbor image registry running outside of the TCE cluster, but for the examples following below I've used the Docker Hub image. For deploying Harbor on TCE check out my blog post here

The Plex solution

Now, let's discuss the Plex components, starting with the one thing I've spent most of my time on.

Plex.tv accounts, claims and advertised IP

First things first. When you get Plex up and running you'll need to log in to the Plex Web server. This login is a plex.tv login and your account can be linked to multiple Plex Media Servers. This is all fine, I already have a plex.tv account (actually I have two, one for testing things like this setup) and was ready to use that.

In the specifics for the Plex Docker image there's a section for the recommended (but not mandatory) PLEX_CLAIM parameter. This parameter allows for the server to be automatically logged in.

In my many tests I've found no way around specifying this. But that's not all. There's one more parameter I've had to set to get it working, ADVERTISE_IP. And even though it says IP, we actually have to specify both the http scheme and the media server port.

One very important thing to note with the Plex claim is that it's only valid for four - 4 - minutes.
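Because of this short validity window, it can help to script the substitution so the manifest can be applied immediately after fetching the claim. A minimal sketch (the file names and the sed placeholder are my own; the claim value is a dummy, and the printf line stands in for the real deployment manifest):

```shell
#!/bin/sh
# Stamp a freshly obtained claim token into a manifest. The claim from
# https://plex.tv/claim is only valid for four minutes, so apply the
# result right away (e.g. kubectl apply -f plex-claimed.yaml).
CLAIM="claim-abc123"  # paste the real value from https://plex.tv/claim

# Stand-in for the real deployment manifest containing the placeholder
printf '          value: <plex.tv/claim>\n' > plex-deployment.yaml

# Replace the placeholder with the actual claim token
sed "s|<plex.tv/claim>|${CLAIM}|" plex-deployment.yaml > plex-claimed.yaml
cat plex-claimed.yaml
```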

Persistent volumes

As mentioned, we need two persistent volumes, one of which references my already existing share of media files.

For this I've created a Persistent volume mounting the NFS share, and a corresponding Persistent volume claim which the Plex Pod can mount.

The second volume is for the /config directory and will be created dynamically through the previously mentioned Storage class:

apiVersion: v1
kind: Namespace
metadata:
  name: plex
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-plex-data
spec:
  storageClassName: "pvc-plex-data-nfs"
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /nfs/plex/tce/data
    server: x.x.x.x
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-plex-config
  namespace: plex
spec:
  storageClassName: sc-vmdk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-plex-data
  namespace: plex
spec:
  storageClassName: pvc-plex-data-nfs
  volumeName: nfs-plex-data
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi

After creating the storage resources we should have them ready to use for our Plex deployment. Again, note that the config PVC was created with the storage class, which created a Persistent volume for us.

Plex Persistent volume claims

Plex configuration

We've already discussed the PLEX_CLAIM and ADVERTISE_IP parameters. There are a couple more to set; again, they are optional according to the Plex documentation. Most of them are only used at the first start of the container, when building out the Plex database backed by the /config volume.

Depending on how, and from which devices, you'll connect to the Plex server, you'll have to open and make available a few ports. I've used both the Plex Docker image reference and this blog post for getting a list of ports to define. There are ports I've omitted, for instance for Roku devices, which I don't have. If a port won't be used it shouldn't be made available.

What I've ended up with is the following deployment spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex
  namespace: plex
  labels:
    app: plexserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: plexserver
  template:
    metadata:
      labels:
        app: plexserver
      annotations:
        backup.velero.io/backup-volumes: plex-config
    spec:
      volumes:
      - name: plex-config
        persistentVolumeClaim:
          claimName: pvc-plex-config
      - name: plex-data
        persistentVolumeClaim:
          claimName: pvc-plex-data
      containers:
      - name: plex
        image: plexinc/pms-docker
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 1
            memory: 512Mi
        ports:
        - containerPort: 32400 #PMS
          name: plex
        - containerPort: 32400
          name: plex-udp
          protocol: UDP
        - containerPort: 32469 #Plex DLNA
          name: plex-dlna
        - containerPort: 5353
          name: plex-discovery
          protocol: UDP
        - containerPort: 1900 #Plex DLNA
          name: plex-dlna-udp
          protocol: UDP
        env:
        - name: UID
          value: "1003" #Corresponds to user id on NFS server
        - name: GID
          value: "1003" #Corresponds to group id on NFS server
        - name: PLEX_CLAIM
          value: <plex.tv/claim>
        - name: ADVERTISE_IP
          value: https://plex-p.tce.rhmlab.local
        - name: HOSTNAME
          value: plex-p.tce.rhmlab.local
        - name: ALLOWED_NETWORKS
          value: 192.168.0.0/16,100.96.0.0/11
        - name: TZ
          value: Europe/Oslo
        volumeMounts:
        - mountPath: /config
          name: plex-config
        - mountPath: /data
          name: plex-data

Note the annotation specifying which volume to include in Velero snapshot backups. I'm only specifying the /config volume since the media files sit on an external NFS server which has its own backup.
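Beyond ad hoc backups, Velero can run these on a schedule. A hedged sketch (the schedule name, timing, and retention are my own choices):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: plex-nightly       # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"    # every night at 02:00
  template:
    includedNamespaces:
    - plex
    ttl: 720h              # keep backups for 30 days
```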

Service

To connect to a Kubernetes deployment we have to use Services, and to connect to Plex we'll have to create a service for the previously mentioned ports. The blog post mentioned earlier helped a lot here. Note that we need to do a special thing with the load balancer IP so that the TCP and UDP services can share the same one.

kind: Service
apiVersion: v1
metadata:
  name: plex-udp
  namespace: plex
  annotations:
    metallb.universe.tf/allow-shared-ip: plexserver #Must match the annotation in the TCP service below so the services can share the IP
spec:
  selector:
    app: plexserver
  ports:
  - port: 1900
    targetPort: 1900
    name: dlna-udp
    protocol: UDP
  - port: 5353
    targetPort: 5353
    name: discovery-udp
    protocol: UDP
  type: LoadBalancer
  loadBalancerIP: 192.168.151.62
---
kind: Service
apiVersion: v1
metadata:
  name: plex-tcp
  namespace: plex
  annotations:
    metallb.universe.tf/allow-shared-ip: plexserver
spec:
  selector:
    app: plexserver
  ports:
  - port: 32400
    targetPort: 32400
    name: pms-web
    protocol: TCP
  - port: 32469
    targetPort: 32469
    name: dlna-tcp
    protocol: TCP
  type: LoadBalancer
  loadBalancerIP: 192.168.151.62

Now we can access the Plex server on the load balancer's IP address and port 32400, which is the default port. We'll do just that to verify that things are working, and run through the initial configuration before proceeding

Initial Plex configuration

Point library to data folder

Plex library ready

TLS certificate and HTTPProxy

Normally we want to be able to access the service by a DNS name, without having to specify the port. For that we'll use Contour and an HTTPProxy resource.

We'll also add ExternalDNS annotations to this resource so it gets registered in the external DNS server, and we'll specify a certificate which we'll create in the same yaml.

Note that the certificate will be self-signed, but if we had an integration with an external service like Vault, or even better Let's Encrypt, we could easily have this create a trusted certificate for us.
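For illustration, a Let's Encrypt integration would be a cert-manager ACME issuer along these lines (a sketch only; the names and email are placeholders, and the HTTP-01 solver requires the cluster to be reachable from the internet, which my lab isn't):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod            # illustrative name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com        # replace with a real contact address
    privateKeySecretRef:
      name: letsencrypt-account-key # illustrative name
    solvers:
    - http01:
        ingress:
          class: contour
```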

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: plex-issuer
  namespace: plex
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: plex-tls
  namespace: plex
spec:
  secretName: plex-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
      - rhmlab
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  dnsNames:
    - plex.tce.rhmlab.local
  ipAddresses:
    - 192.168.151.62
  issuerRef:
    name: plex-issuer
    kind: Issuer
    group: cert-manager.io

This gives us a self-signed certificate which we can reference in our HTTPProxy specification:

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: plex.tce.rhmlab.local.
  labels:
    app: plexserver
  name: plex-httpproxy
  namespace: plex
spec:
  routes:
  - conditions:
    - prefix: /
    pathRewritePolicy:
      replacePrefix:
      - prefix: /
        replacement: /
    services:
    - name: plex-tcp #corresponds to service object
      port: 32400
  virtualhost:
    fqdn: plex.tce.rhmlab.local
    tls:
      secretName: plex-tls

This lets us access Plex on the specified FQDN, and we have a certificate in place, albeit self-signed.

Plex web working with SSL certificate and FQDN

With this we actually have our Plex solution running and we can enjoy our media.

Now to mimic some kind of a production setting we'll add in a few additional capabilities from the supporting software discussed earlier.

Monitoring

In my TCE cluster I have already set up Prometheus and Grafana based on Tanzu packages. I have a blog post on how to set that up here

Plex doesn't offer any Prometheus metrics out of the box, but there are a few community exporters out there. I went with this image on Docker Hub, which is based on this project

This needs a config file which we'll present to the exporter through a Config map. You'll note that the token sits directly inside the config map; ideally it should be a Secret. To get hold of a token, check out this Plex support article.
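As a hedged sketch of that Secret approach, the same config file could be delivered from a Secret instead (the Secret name is my own; the config body matches the one below):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: plex-exporter-config   # illustrative name
  namespace: plex
type: Opaque
stringData:
  config.yaml: |
    address: ":9594"
    logLevel: "info"
    logFormat: "text"
    autoDiscover: false
    token: "<your-token>"
    servers:
    - baseUrl: http://plex-tcp:32400
```

The deployment's config volume would then use `secret: {secretName: plex-exporter-config}` as its source instead of the `configMap` reference.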

I couldn't get autodiscovery to work, but since this is Kubernetes we can at least reference the Plex service by its Kubernetes service name.

To have our Prometheus installation scrape these metrics we'll annotate the plex_exporter pod with the following

prometheus.io/scrape: "true"
prometheus.io/scheme: "http"
prometheus.io/path: "/metrics"
prometheus.io/port: "9594"

The full yaml for the plex_exporter deployment and service is here.

apiVersion: v1
kind: ConfigMap
metadata:
  name: plex-exporter
  namespace: plex
data:
  config.yaml: |
    address: ":9594"
    logLevel: "info"
    logFormat: "text"
    autoDiscover: false
    token: "<your-token>"
    servers:
    - baseUrl: http://plex-tcp:32400
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex-exporter
  namespace: plex
  labels:
    app: plex-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: plex-exporter
  template:
    metadata:
      labels:
        app: plex-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/scheme: "http"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "9594"
    spec:
      containers:
        - name: plex-exporter
          image: granra/plex_exporter:v0.2.3
          imagePullPolicy: Always
          ports:
            - containerPort: 9594
              name: metrics
          volumeMounts:
            - name: config
              mountPath: /etc/plex_exporter/config.yaml
              subPath: config.yaml
      volumes:
        - name: config
          configMap:
            name: plex-exporter
---
kind: Service
apiVersion: v1
metadata:
  name: plex-exporter
  namespace: plex
spec:
  selector:
    app: plex-exporter
  ports:
  - port: 9594
    name: metrics
    protocol: TCP
  type: ClusterIP #Change to LoadBalancer if you want to check out the metrics page directly

With this in place Prometheus picks it up and adds it to its targets

Prometheus target

And we can verify that we can see our metrics

Prometheus metrics

Over in Grafana we would like a dashboard with these metrics. The Grafana installation from the Tanzu package repo runs a sidecar that periodically picks up dashboards added as Config maps, so we simply have to add a Config map with the label grafana-dashboard: "1". Be sure to put it in the same namespace as the Grafana installation.
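If you export the dashboard JSON from the Grafana UI, wrapping it in a Config map like the one that follows can be scripted. A minimal sketch using only the Python standard library (the resource names match this setup; the dashboard dict is a stand-in for a real export):

```python
import json
import textwrap

# Stand-in for a dashboard exported from the Grafana UI
dashboard = {"title": "Plex", "panels": []}

# json.dumps also validates that the export is well-formed JSON
body = json.dumps(dashboard, indent=2)

# Wrap the JSON in the ConfigMap the Grafana sidecar watches for
manifest = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: plex-grafana
  namespace: grafana
  labels:
    grafana-dashboard: "1"
data:
  plex-prometheus.json: |
""" + textwrap.indent(body, "    ")

print(manifest)
```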

apiVersion: v1
kind: ConfigMap
metadata:
  name: plex-grafana
  namespace: grafana #Needs to match the Grafana installation namespace
  labels:
    grafana-dashboard: "1" #Required to have the Grafana sidecar container pick up the new dashboard
data:
  plex-prometheus.json: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "editable": true,
      "gnetId": null,
      "graphTooltip": 0,
      "id": 3,
      "links": [],
      "panels": [
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 0,
            "y": 0
          },
          "id": 2,
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "auto",
            "orientation": "auto",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "text": {},
            "textMode": "auto"
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(plex_library_section_size_count)",
              "instant": true,
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Total Library items",
          "type": "stat"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 8,
            "y": 0
          },
          "id": 5,
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "auto",
            "orientation": "auto",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "text": {},
            "textMode": "auto"
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(plex_sessions_active_count)",
              "instant": true,
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Active sessions",
          "type": "stat"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 16,
            "y": 0
          },
          "id": 6,
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "auto",
            "orientation": "auto",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "text": {},
            "textMode": "auto"
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(plex_transcode_sessions_active_count)",
              "instant": true,
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Transcode sessions",
          "type": "stat"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "graph": false,
                  "legend": false,
                  "tooltip": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "id": 7,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom"
            },
            "tooltipOptions": {
              "mode": "single"
            }
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum (plex_library_section_size_count) by (name)",
              "interval": "",
              "legendFormat": "{{name}}",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Library items by Name",
          "type": "timeseries"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "graph": false,
                  "legend": false,
                  "tooltip": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "short"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 8
          },
          "id": 4,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom"
            },
            "tooltipOptions": {
              "mode": "single"
            }
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum (plex_sessions_active_count)",
              "interval": "",
              "legendFormat": "Sessions",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Active sessions",
          "type": "timeseries"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "graph": false,
                  "legend": false,
                  "tooltip": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": true
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 7,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "id": 8,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom"
            },
            "tooltipOptions": {
              "mode": "single"
            }
          },
          "pluginVersion": "7.5.6",
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(rate(container_cpu_usage_seconds_total{pod=~\"plex.*\",pod!~\"plex-exporter.*\"}[1m]))by(pod)",
              "interval": "",
              "legendFormat": "{{container}}",
              "refId": "A"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Pod CPU usage",
          "type": "timeseries"
        },
        {
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "graph": false,
                  "legend": false,
                  "tooltip": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": true
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "bytes"
            },
            "overrides": []
493          },
494          "gridPos": {
495            "h": 7,
496            "w": 12,
497            "x": 12,
498            "y": 16
499          },
500          "id": 9,
501          "options": {
502            "legend": {
503              "calcs": [],
504              "displayMode": "list",
505              "placement": "bottom"
506            },
507            "tooltipOptions": {
508              "mode": "single"
509            }
510          },
511          "pluginVersion": "7.5.6",
512          "targets": [
513            {
514              "exemplar": true,
515              "expr": "sum (container_memory_working_set_bytes{pod=~\"plex.*\",pod!~\"plex-exporter.*\"}) by (container_name, pod_name)",
516              "interval": "",
517              "legendFormat": "Memory",
518              "refId": "A"
519            }
520          ],
521          "timeFrom": null,
522          "timeShift": null,
523          "title": "Pod Memory usage",
524          "type": "timeseries"
525        }
526      ],
527      "refresh": "10s",
528      "schemaVersion": 27,
529      "style": "dark",
530      "tags": [],
531      "templating": {
532        "list": []
533      },
534      "time": {
535        "from": "now-6h",
536        "to": "now"
537      },
538      "timepicker": {},
539      "timezone": "",
540      "title": "Plex",
541      "uid": "w8ttuLbnk",
542      "version": 10
543    }    

When the Grafana dashboard sidecar container picks this up, we can see our very simple dashboard in Grafana

Plex dashboard in Grafana

Logging

For logging we have Fluent Bit running in the cluster, pushing logs to an external vRealize Log Insight server.
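As a rough illustration of how that forwarding is wired up, a Fluent Bit output section using the built-in syslog plugin might look something like the sketch below. The hostname is a placeholder, and the exact values file layout depends on the TCE Fluent Bit package version, so treat this as an example rather than the exact configuration used here.

```ini
[OUTPUT]
    # Forward all Kubernetes container logs to vRealize Log Insight over syslog.
    Name                 syslog
    Match                kube.*
    # Placeholder: replace with your Log Insight server address
    Host                 vrli.example.local
    Port                 514
    Mode                 tcp
    Syslog_Format        rfc5424
    Syslog_Message_Key   log
```

Log Insight also exposes an HTTP ingestion API as an alternative to syslog; which one to use mostly comes down to what is already open in the network.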

Fluent Bit logs

Backup

Velero is running in my cluster and takes scheduled backups of the Plex namespace. I have annotated the Plex deployment with backup-volumes pointing to the config volume so that it is included in the backup. The media files in the data directory have their own backup outside of this solution.

backup.velero.io/backup-volumes=plex-config
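For context, the annotation belongs on the pod template inside the Deployment, not on the Deployment object itself, since Velero's file-system backup (restic) reads it from the running pods. A minimal sketch, assuming the deployment and namespace are both named plex and the volume is named plex-config:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex        # assumed deployment name
  namespace: plex
spec:
  template:
    metadata:
      annotations:
        # Tells Velero's restic integration to back up this pod volume
        backup.velero.io/backup-volumes: plex-config
```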

With Velero we can create a schedule like this to do hourly backups of our Plex namespace.

velero create schedule plex-hourly --schedule="@every 1h" --include-namespaces plex
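The same schedule can also be declared as a Velero Schedule resource and applied with kubectl, which fits nicely if the rest of the solution is managed as manifests. A sketch of the equivalent object, assuming Velero is installed in the velero namespace:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: plex-hourly
  namespace: velero   # Schedules live in the Velero install namespace
spec:
  # Cron or @every syntax, same as the CLI flag
  schedule: "@every 1h"
  template:
    includedNamespaces:
      - plex
```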

For more details on how to configure Velero on TCE, check out this blog post

Summary

Wow, this has been a fun, but at times also frustrating, project.

The learning point from this is that there is a lot of exciting stuff that can be done with TCE, and that Tanzu packages make it easy to deploy integrated building blocks into our solution, as we've seen with external DNS, monitoring, logging and so forth.

But I think one of the biggest things I got out of this was the reinforced insight that while Kubernetes is a great way of deploying apps, there are lots of moving parts in play. Coming from an infrastructure and sysadmin background, I strongly believe we should not expect (or even let) our developers be in charge of all of this; the Ops in DevOps is just as important. Platforms and solutions like those delivered in the Tanzu portfolio help both parties achieve their goals, creating value for their business and its customers

Thanks for reading, and feel free to reach out if you have any comments

This page was modified on October 22, 2022: Added TCE retirement info