Tuesday, April 16 • 2:40pm - 3:20pm
Fine-grained monitoring Swift in the HP Cloud

One of the great challenges of monitoring any large cluster is deciding how much data to collect and how often to collect it. Those responsible for managing the cloud infrastructure want to see everything collected centrally, which places limits on how much can be collected and how often. Developers, on the other hand, want to see as much detail as they can, at as high a frequency as is reasonable, without impacting overall cloud performance.

To address what seem to be conflicting requirements, we've chosen a hybrid model at HP. Like many others, we have a centralized monitoring system that records a set of key system metrics for all servers at a granularity of 1 minute. At the same time, we do fine-grained local monitoring of hundreds of metrics every second on each server, so when a problem needs more detail than is available centrally, one can go to the servers in question and see exactly what was going on at any specific time.

The tool of choice for this fine-grained monitoring is the open source tool collectl, which also has an extensible API. Through this API we've developed a Swift monitoring capability that not only captures the number of GETs, PUTs, etc. every second but also, using collectl's colmux utility, displays them in a top-like format to show exactly what all the object and/or proxy servers are doing in real time.

We've also developed a second capability that allows one to see what the Virtual Machines are doing on each compute node in terms of CPU, disk and network traffic. This data can also be displayed in real time with colmux.

This talk will briefly introduce the audience to collectl's capabilities but more importantly show how it's used to augment any existing centralized monitoring infrastructure. 

Mark Seger

Mark has spent the last 15 years working first in High Performance Computing, both building monitoring tools and troubleshooting performance problems on large compute clusters, and then applying those experiences to OpenStack clouds. He is also the author of collectl, an open source...

Tuesday April 16, 2013 2:40pm - 3:20pm
C123+C124 (Portland Convention Center)
