On a beefy machine (24 cores, 96 GB RAM, SSD), booting a single instance (from "nova boot" to ACTIVE) takes seconds. However, when you try booting 20 instances in parallel, the last instance might not be ACTIVE for minutes! While you wait, you notice that the host's CPUs and disk are mostly idle and there's plenty of free RAM. With your instances still BUILDING, you wonder what's going on: why is this taking so long?
It turns out that lengthy portions of the boot process are serialized by contention for software resources, like iptables, database connections, libvirt, and the Python interpreter! In this talk, we show how tools like strace and Tracelytics can be used to identify bottlenecks in OpenStack. We present techniques for eliminating these bottlenecks, such as coalescing updates to iptables and avoiding greenthread pitfalls, and demonstrate how boot can scale!
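To give a flavor of what coalescing updates can look like, here is a minimal sketch in Python. It is not the code from the talk or from OpenStack: the `CoalescedUpdater` class, the `apply_batch` callback, and the rule strings are all hypothetical names made up for illustration. The idea is that many callers queue their rule changes, and a single flush applies the whole batch with one expensive call instead of one call per change.

```python
import threading


class CoalescedUpdater:
    """Queue rule changes and apply them in one batch (illustrative only)."""

    def __init__(self, apply_batch):
        # apply_batch is a hypothetical callback standing in for the one
        # expensive operation, e.g. rewriting the whole rule set at once.
        self._apply_batch = apply_batch
        self._lock = threading.Lock()
        self._pending = []

    def add(self, rule):
        # Cheap: only records the change, does not touch the shared resource.
        with self._lock:
            self._pending.append(rule)

    def flush(self):
        # Swap out the pending list under the lock, then apply outside it so
        # callers of add() are not blocked while the batch is being applied.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._apply_batch(batch)
        return len(batch)


# Record each batch instead of touching a real firewall.
calls = []
updater = CoalescedUpdater(calls.append)

# Twenty instances each contribute one (made-up) rule.
for i in range(20):
    updater.add(f"-A example-instance-{i} -j ACCEPT")

applied = updater.flush()
```

In this sketch the twenty queued rules reach `apply_batch` as a single list, so the expensive step runs once rather than twenty times. A real implementation would also need to decide when to flush and how to serialize concurrent flushes, which this sketch leaves out.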