According to Wikipedia, Disaster Recovery (DR) is "the process, policies and procedures . . . for recovery . . . of technology infrastructure . . . after a natural or human-induced disaster." The ability to recover quickly and with minimal data loss after a disaster such as a fire or hurricane can make the difference between an organization staying in business and vanishing. In an OpenStack environment there are multiple approaches to this recovery, which differ in how much work is lost (the recovery point objective, RPO) and how long it takes to recover (the recovery time objective, RTO). These approaches trade off up-front effort and cost (incurred even when there is no disaster) against greater data loss (RPO) and much longer recovery times (RTO) after a disaster. The appropriate approach depends on the organization's objectives.
In this presentation, after a brief background on DR concepts, we will survey the approaches that can be used to provide DR for an OpenStack cloud, showing how the up-front investment affects RPO and RTO. We will start with solutions that work in any OpenStack environment, independent of the underlying physical infrastructure; while these solutions are relatively simple, they lead to long recovery times and significant data loss. We will then consider solutions integrated with the application, i.e., provided from within the guest; these solutions typically provide a higher quality of service, at the cost of being application-specific. Finally, we will consider approaches that take advantage of advanced functions provided by storage controllers; these approaches can avoid all (or most) data loss and can often recover quickly, but require up-front investment.
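To make the RPO side of this trade-off concrete, the following minimal sketch estimates the worst-case data loss of a simple periodic-backup approach. The class name, its fields, and the numbers in the usage note are hypothetical and chosen only for illustration; they are not taken from the presentation or from any OpenStack API. The sketch assumes each backup is a point-in-time copy taken when the backup starts.

```python
from dataclasses import dataclass


@dataclass
class PeriodicBackupPolicy:
    """Hypothetical policy: a point-in-time backup is started every interval."""

    interval_hours: float  # time between the starts of successive backups
    transfer_hours: float  # time for a backup to arrive at the recovery site
    restore_hours: float   # time to restore the latest backup after a disaster

    def worst_case_rpo_hours(self) -> float:
        # If the disaster strikes just before the newest backup finishes
        # transferring, only the previous backup is usable, so everything
        # written since that previous backup started is lost.
        return self.interval_hours + self.transfer_hours

    def estimated_rto_hours(self) -> float:
        # Assumption: recovery time is dominated by the restore itself.
        return self.restore_hours


def backups_per_week(policy: PeriodicBackupPolicy) -> float:
    # A rough proxy for the ongoing cost paid when there is no disaster.
    return 7 * 24 / policy.interval_hours
```

With these illustrative values, `PeriodicBackupPolicy(interval_hours=24, transfer_hours=2, restore_hours=6)` has a worst-case RPO of 26 hours at 7 backups per week. Shortening the interval to 4 hours lowers the worst-case RPO to 6 hours but raises the ongoing effort to 42 backups per week, which is the kind of up-front cost versus data-loss trade-off described above.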