Last year we started building our own solution for performance monitoring of our infrastructure platform, with a focus on the VMware vSphere environment. The components used for this solution are PowerCLI for extracting the metrics, InfluxDB for storing the metrics, and Grafana for presenting the metrics. I did a blog series on this project which explains in detail what we did when building the solution. The solution has been very well received and is used daily by many of my colleagues, and we frequently update it with new metrics and dashboards.
In a previous post I’ve talked about issues in the StoragePolicy and Tag cmdlets in PowerCLI. I found a workaround by ignoring certificate warnings and setting my date format to en-US. Today I tried to replicate some Storage Policies from one vCenter to another and found that I got new errors… I can export the policies without issues, but when I try to import a policy to the new vCenter I get the following error: “Object reference not set to an instance of an object”.
We all love today’s modern web with lots of APIs available, both for retrieving information from various sources and gaining additional insights, and for transforming and enriching your data. Most APIs today are RESTful, meaning that they should follow the REST principles. REST is not a standard; it’s more of a guideline for how to design your API. With the REST guidelines in place many APIs share the same or a similar structure, and with that it gets easier to work with APIs as you can make use of the same techniques.
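To illustrate the point about reusing the same techniques, here is a minimal Python sketch of a request builder that follows the common REST pattern of a base URL, a resource collection, and an optional item ID. The function name `build_request`, the `api.example.com` URL and the `hosts` resource are hypothetical and only there for illustration; a real API may differ in paths, headers and authentication.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_request(base_url, resource, item_id=None, params=None,
                  method="GET", body=None):
    """Build (but do not send) a request for a typical REST resource.

    Hypothetical helper: assumes the API uses /<resource>/<id> paths
    and exchanges JSON, which is common but not guaranteed.
    """
    # Collection path, e.g. "hosts", or item path, e.g. "hosts/42"
    path = resource.strip("/")
    if item_id is not None:
        path = f"{path}/{item_id}"

    # Make sure the base URL ends with "/" so urljoin appends the path
    url = urljoin(base_url.rstrip("/") + "/", path)
    if params:
        url = f"{url}?{urlencode(params)}"

    # JSON-encode the body for POST/PUT/PATCH style calls
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"

    return Request(url, data=data, headers=headers, method=method)


# The same helper works for a collection, a single item, or a create call
list_req = build_request("https://api.example.com/v1", "hosts", params={"limit": 10})
item_req = build_request("https://api.example.com/v1", "hosts", item_id=42)
new_req = build_request("https://api.example.com/v1", "hosts",
                        method="POST", body={"name": "esx01"})
```

The request objects could then be passed to `urllib.request.urlopen`. The point is only that one small helper can cover several APIs as long as they stick to the same structure.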
HPE released the 4th version of its OneView management appliance late in December. While version 3 was a great deal better than v2 and v1, I have some expectations for this release as well. I think all versions have had value, and the new features and functionality presented have been for the better. Still, it wasn’t until version 3 that I really felt it was a solid product. We’ve run v3 for almost as long as it has been available and have been happy with it.
Those of you who have read my blog probably know I’ve done a series on performance monitoring of infrastructure with the help of InfluxDB. InfluxDB is a part of the TICK stack delivered by InfluxData. All components are open-sourced and available. The TICK stack consists of Telegraf, InfluxDB, Chronograf and Kapacitor. This post will give a quick review of the components and some examples of how I have started exploring them in my performance monitoring project.
In our environment we have several HPE Blade Chassis systems. Each chassis is managed with the Onboard Administrator (OA), which consists of one or two management modules. Like all other hardware these modules have components that need firmware to run. And firmware needs to be kept updated to fix bugs, add features and new hardware compatibility, and mitigate security risks. It’s also a good thing to keep it pretty close to the iLO version updates on your blades, as I suspect HPE might not test newer iLO versions against a lot of old OA versions.
This weekend our primary OneView appliance crashed. This particular OneView appliance handles 10 blade chassis and over 120 blade servers. As OneView handles only the management side of the hardware, nothing in production was affected by this crash. TL;DR: There is a bug in version 3.10.04 which doesn’t delete expired sessions. This is fixed in version 3.10.07. A few troubleshooting steps were taken initially. First we restarted the appliance; it took a while, but it stopped when loading its resource managers and threw the same error. We also gave it some more CPUs and more RAM to see if it was a resource issue; after powering on the VM it eventually threw the same error. Unfortunately we are not doing backups of the appliance from OneView.
If you’ve followed my vSphere performance data blog series you have probably noted that I used InfluxDB as the database for storing the performance data. With over 4 months of performance data in InfluxDB I’ve picked up some gotchas along the way (there are probably more lying around which I’ve not come across yet). In this blog post I’ll outline what I’ve learned so far. (Save) Disk space: One of them is of course, and this is an obvious one, the amount of data and the corresponding disk space needed to store it.
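As a rough way of reasoning about the disk space point, the arithmetic can be sketched in a few lines of Python. All numbers below are made-up placeholders and not measurements from my environment; in particular the bytes-per-point figure depends heavily on the data and on compression, so measure it on your own database before trusting the result. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate_points_per_day(entities, metrics_per_entity, interval_seconds):
    """Number of data points written per day for a given sampling interval."""
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    return entities * metrics_per_entity * samples_per_day


def estimate_disk_bytes(points_per_day, retention_days, bytes_per_point):
    """Disk space needed to keep `retention_days` of data.

    `bytes_per_point` is an assumption supplied by the caller.
    """
    return points_per_day * retention_days * bytes_per_point


# Placeholder example: 100 VMs, 10 metrics each, sampled every 20 seconds,
# kept for 120 days at an assumed 3 bytes per stored point.
points = estimate_points_per_day(entities=100, metrics_per_entity=10,
                                 interval_seconds=20)
total_bytes = estimate_disk_bytes(points, retention_days=120, bytes_per_point=3)
print(f"{points:,} points/day, about {total_bytes / 1e9:.2f} GB for 120 days")
```

Even with rough inputs, a calculation like this shows how quickly the sampling interval and the retention period drive the total.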
Following up on my last post on Automating DRS Groups with PowerCLI, I found that we also need to automatically remove VMs and Hosts from a given DRS Group. Although I could have included this in the previous script, which creates the groups and adds members, I wanted to separate them. There could, for instance, be times when you would like to run the removal script on a different interval than the one that adds members, and there may be other scenarios as well.
In vCenter we have lots of DRS functionality. I won’t go into all of it here; you probably know about most of it already. This blog post will talk about the VM/Host affinity functionality, i.e. rules to keep VMs running on one or more specific host(s). There are multiple use cases for this. You might want to keep some VMs together on the same host to minimize latency, maybe you are doing some port mirroring, and so on.