Send vCenter Events to Alertmanager With VEBA

Overview

In this post we will see how we can integrate Alertmanager with vCenter using the VMware Event Broker Appliance, a.k.a. VEBA.

The example in this blog post uses the event raised when a DRS Fault happens, e.g. when a VM is violating an affinity rule.

First let's get some background on the components used in this post. If you're already familiar with Alertmanager, vCenter events and VEBA, feel free to skip ahead to the "Let's get to work" section.

Background and use case

vCenter events and VEBA

In vCenter there are 1900+ events available, a number that grows with every release. For a list of events please refer to William Lam's event mapping repo.

These events can be seen in vCenter, but there's not much more that can be done from there.

With the introduction of the VMware Event Broker Appliance fling in 2019 a whole new world of event-driven automation was unlocked.

VEBA lets us create "functions" that can be linked to events. These "functions" run scripts that can integrate with pretty much anything.

A lot more information on VEBA can be found over on their website.

Alertmanager

Alertmanager is an open-source tool to handle alerts. It's heavily integrated with Prometheus, but it can also handle alerts from other sources (clients).

In this post we will make use of the Alertmanager API to create alerts based on events processed by VEBA.

Alertmanager integrates with a lot of 3rd party solutions for ticket creation, messaging and more, and there are also generic webhooks available to use.

Why VEBA and Alertmanager?

Or, why couldn't we have VEBA integrate directly with whatever the end consumer/platform is? After all, there are already lots of examples of VEBA sending to Slack or PagerDuty directly.

There's a couple of reasons for this.

First, Alertmanager is a tool for handling alerts. It can group alerts into a single notification and suppress notifications based on other alerts. It also offers silencing, where we can mute alerts based on filters, for instance when doing maintenance.

Secondly, and building on the first, Alertmanager can serve as the initial platform for receiving alerts from multiple sources, and then, based on rules, redirect these to whatever platform/tool we want.

There are probably a lot of other tools out there that do similar stuff, but this is the route we're evaluating.

Let's get to work

Now with the background and use case handled, let's see how we can combine these two great tools.

Posting to Alertmanager

We'll post alerts to the v2 /alerts endpoint in Alertmanager. There's a schema model that describes how alerts need to be formatted and which properties need to be included.

We'll post alerts that include a label set with the name of the alert, the VM, host and cluster (if these are available) and the message in the event.

Note that we will make a generic function that is meant to handle all types of events, so we'll have to keep the labels pretty generic. Also note that we will only trigger on a few specific events, so we won't be flooded with alerts.

In addition to the labels we'll add the timestamp of the alert and a few other annotations.

The body of our alert will look like this:

[
  {
    "generatorURL": "https://vcenter.local/sdk",
    "startsAt": "2023-06-08T10:01:16.407Z",
    "annotations": {
      "message": "vCenter event",
      "type": "com.vmware.event.router/event"
    },
    "labels": {
      "vm": null,
      "message": "Some kind of message that comes from the event",
      "cluster": null,
      "alertname": "<Event subject>",
      "host": null
    }
  }
]
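Before wiring up VEBA it can be useful to check that the endpoint accepts a body of this shape. The following is a minimal sketch, not part of the original function: it writes a hand-made alert to a file and posts it with curl. The ALERTMANAGER_URL variable, the alert.json file name and the "Manual test alert" message are assumptions for illustration, and labels without a value are simply left out of this test alert.

```shell
# Sketch: send one hand-written alert to the Alertmanager v2 API.
# ALERTMANAGER_URL is an assumed variable holding the full endpoint,
# e.g. http://<alertmanager-host>:9093/api/v2/alerts

# Write a minimal alert body modeled on the example above.
cat > alert.json <<'EOF'
[
  {
    "generatorURL": "https://vcenter.local/sdk",
    "startsAt": "2023-06-08T10:01:16.407Z",
    "annotations": {
      "message": "vCenter event",
      "type": "com.vmware.event.router/event"
    },
    "labels": {
      "alertname": "DrsSoftRuleViolationEvent",
      "message": "Manual test alert"
    }
  }
]
EOF

post_alert() {
  # -f makes curl exit non-zero on HTTP errors, e.g. when the body is rejected.
  curl -sS -f -X POST -H "Content-Type: application/json" \
    --data @alert.json "$1"
}

# Only call the API when a URL has actually been provided.
if [ -n "${ALERTMANAGER_URL:-}" ]; then
  post_alert "$ALERTMANAGER_URL"
fi
```

If the request succeeds, the test alert should show up in the Alertmanager UI and disappear again once it is no longer re-sent.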

PowerShell handler script

To handle the event and post to the Alertmanager API we will use PowerShell. There are plenty of example scripts that send webhooks / requests to APIs, so we won't go into details in this blog post.

For this script I started with the Slack function from the VEBA examples repo and modified it to work with the Alertmanager API.

The VEBA PowerShell examples use a cmdlet that takes care of decoding the event data pushed from the Event Router in VEBA. This cmdlet is included in the base container we'll build our function containers from.

param(
  [Parameter(Position=0,Mandatory=$true)][CloudNative.CloudEvents.CloudEvent]$CloudEvent
)

# Decode CloudEvent
try {
  $cloudEventData = $cloudEvent | Read-CloudEventJsonData -Depth 10
} catch {
  throw "`nPayload must be JSON encoded"
}

The Alertmanager specific part of this is actually nothing more than how we create the payload. We'll use the $cloudEventData variable which includes the event specific stuff, as well as the outer $cloudEvent which has some metadata about the event (e.g. the Subject and timestamp).

$payload = @(
    @{
      labels = @{
          alertname = $cloudEvent.subject;
          vm = $cloudEventData.Vm.Name;
          host = $cloudEventData.Host.Name;
          cluster = $cloudEventData.ComputeResource.Name;
          message = $cloudEventData.FullFormattedMessage;
      };
      annotations = @{
          message = "vCenter event";
          type = $cloudEvent.Type
      };
      startsAt = $cloudEventData.CreatedTime
      generatorURL = $cloudEvent.Source
    }
)

The payload will be posted to the API URL of Alertmanager, which we'll keep in a Kubernetes secret. This could have been in a config map or ENV variable as well, but we're doing it the same way as the Slack example to make the process easier to follow, and since we might add some user details going forward.

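# Assumed, following the Slack example: the secret is exposed as the ALERTMANAGER_SECRET variable
$jsonSecrets = ${env:ALERTMANAGER_SECRET} | ConvertFrom-Json
# Assumed: serialize with -InputObject so the single-element array stays a JSON array
$body = ConvertTo-Json -InputObject $payload -Depth 5
try {
  Invoke-WebRequest -Uri $(${jsonSecrets}.ALERTMANAGER_URL) -Method POST -ContentType "application/json" -Body $body
} catch {
  throw "$(Get-Date) - Failed to send Alertmanager Message: $($_)"
}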

The PowerShell script will be copied into a container image, which we will build with a Dockerfile and then push to a container registry.

FROM us.gcr.io/daisy-284300/veba/ce-ps-base:1.4

COPY handler.ps1 handler.ps1

CMD ["pwsh","./server.ps1"]

We will not go through the process of building, testing and pushing the container image to a registry. An example and guide of that process can be found here and here. Also be sure to check out the full VEBA series from Patrick Kremer.

The Knative function and triggers

The function we'll deploy to the VEBA Kubernetes cluster is a Knative construct.

First we have a Knative service (which is sometimes referred to as a function):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: kn-ps-alertmanager
  labels:
    app: veba-ui
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "1"
        autoscaling.knative.dev/minScale: "1"
    spec:
      containers:
        - image: <container-repo>/veba/kn-ps-alertmanager:<tag>
          envFrom:
            - secretRef:
                name: alertmanager-secret
          env:
            - name: FUNCTION_DEBUG
              value: "true"

Note that we're setting the FUNCTION_DEBUG variable to true, and we're using a secret for adding the Alertmanager URL. We're also adding a label for it to show in the VEBA UI.

This Service resource will be used by all events that we want to forward to Alertmanager.

We'll also create different Triggers based on which events we want VEBA to forward.

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: veba-ps-alertmanager-trigger-drsfault
  labels:
    app: veba-ui
spec:
  broker: default
  filter:
    attributes:
      type: com.vmware.event.router/event
      subject: DrsSoftRuleViolationEvent
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: kn-ps-alertmanager

So for each event we want to forward/handle, we'll create a Trigger that filters on the event subject. Notice that the trigger points to the Knative service mentioned above.
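Since the triggers differ only in their name and subject filter, the manifests can be generated rather than written by hand. The following is a rough sketch under a few assumptions: the file naming (trigger-<subject>.yaml) and the lowercase name suffix are made up for illustration and differ from the hand-written drsfault trigger above, and the second subject, DrsRuleViolationEvent, is the hard rule counterpart of the event used in this post.

```shell
# Sketch: generate one Trigger manifest per event subject.
# File names and the name suffix are illustrative assumptions.
for subject in DrsSoftRuleViolationEvent DrsRuleViolationEvent; do
  # Kubernetes resource names must be lowercase, so derive the suffix from the subject.
  suffix=$(printf '%s' "$subject" | tr '[:upper:]' '[:lower:]')
  cat > "trigger-${suffix}.yaml" <<EOF
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: veba-ps-alertmanager-trigger-${suffix}
  labels:
    app: veba-ui
spec:
  broker: default
  filter:
    attributes:
      type: com.vmware.event.router/event
      subject: ${subject}
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: kn-ps-alertmanager
EOF
done
```

Each generated file can then be applied with kubectl in the same way as a hand-written trigger.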

Deploying to Kubernetes

Let's stitch things together and deploy to the VEBA Kubernetes cluster.

First we'll add the secrets file with the Alertmanager specifics:

kubectl -n vmware-functions create secret generic alertmanager-secret --from-file=ALERTMANAGER_SECRET=alertmanager_secret.json

Create secret
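The contents of alertmanager_secret.json are not shown in this post. Since the handler reads an ALERTMANAGER_URL property from the decoded secret, a minimal version of the file could be created as sketched below. The host name and port are placeholders, and the assumption is that the value holds the full path to the v2 alerts endpoint because the handler posts to it as-is.

```shell
# Sketch: create the JSON file the secret is built from.
# The URL is a placeholder; the key name matches what the handler reads.
cat > alertmanager_secret.json <<'EOF'
{
  "ALERTMANAGER_URL": "http://alertmanager.example.local:9093/api/v2/alerts"
}
EOF
```

The file needs to exist before the kubectl create secret command is run.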

Now let's create the Knative service. Note that you'll have to update the path to the image, and add imagePullSecrets if your registry requires them.

kubectl -n vmware-functions apply -f service.yaml

Create Knative service

With that in place we can deploy a Trigger that will run the service with the Cloud Event coming from the Event Router

kubectl -n vmware-functions apply -f trigger-drsfault.yaml

Create Knative trigger

Let's see it in action

The event used as an example in this blog post is the event that is being raised when a VM is violating an affinity rule, i.e. a DRS Fault.

From the vCenter side of things we'll see a DRS Fault

DRS Fault in vCenter

vCenter event

The Subject in this case is DrsSoftRuleViolationEvent. Note that, depending on the DRS rule, you might instead see the corresponding hard rule event, DrsRuleViolationEvent.

In VEBA we'll see that we catch this event

VEBA event

And with the event firing we should now also see it in Alertmanager

Alertmanager event

Alertmanager expects the clients to re-send alerts as long as they are active. This DRS event fires every three minutes which is perfect since by default Alertmanager will resolve alerts after they're not seen for five minutes.

Summary

In this blog post we've seen how to integrate vCenter with Alertmanager through the VMware Event Broker Appliance. This integration allows us to utilize Alertmanager's alert handling capabilities, such as grouping, suppression and silencing.

The files used in this blog post can be found in this GitHub repo.

Thanks for reading and feel free to reach out with any comments or questions.

This page was modified on June 9, 2023: Fixed usecase