
17.5. Gradual Failures

When replicating extremely large data stores between two distant locations, it may become infeasible to achieve real-time or even near-real-time replication delays. A write that hits one datacenter may take up to several hours to be copied to the other, along with all the other simultaneous writes carpooling along in the replication stream. This can make it quite difficult to achieve a tight RPO (recovery point objective). If one site were to suddenly fall into the ocean, you would lose several hours' worth of data! Fortunately, datacenters don't fall into oceans very often. Sudden outages generally occur only when a utility power failure combines with a UPS or generator failure (with the odd explosion from time to time). However, there are plenty of other ways for datacenters to fail, and a surprising number of them happen slowly. If you detect the problem and react quickly, this can give you precious time to sync up your replication stream and save the data.
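To make that race against the clock concrete, here is a minimal sketch of the arithmetic, not a description of any real replication system. It assumes you can measure the size of the unreplicated backlog, the usable bandwidth between the sites, and the rate at which new writes are still arriving; the function names and the numbers are hypothetical.

```python
from typing import Optional


def drain_time_seconds(backlog_bytes: float,
                       link_bytes_per_s: float,
                       new_write_bytes_per_s: float = 0.0) -> Optional[float]:
    """Estimate how long it takes to flush a replication backlog.

    Returns None when incoming writes arrive at least as fast as the
    link can ship them, meaning the backlog would never drain.
    """
    if backlog_bytes <= 0:
        return 0.0
    # Only the bandwidth left over after new writes reduces the backlog.
    net_rate = link_bytes_per_s - new_write_bytes_per_s
    if net_rate <= 0:
        return None
    return backlog_bytes / net_rate


def can_sync_before(deadline_s: float,
                    backlog_bytes: float,
                    link_bytes_per_s: float,
                    new_write_bytes_per_s: float = 0.0) -> bool:
    """True if the backlog can be flushed before the site is expected to go dark."""
    t = drain_time_seconds(backlog_bytes, link_bytes_per_s, new_write_bytes_per_s)
    return t is not None and t <= deadline_s


# Illustrative numbers only: a 90 GB backlog over a 100 MB/s link.
backlog = 90 * 10**9
link = 100 * 10**6

paused = drain_time_seconds(backlog, link)            # writes stopped at the failing site
busy = drain_time_seconds(backlog, link, 40 * 10**6)  # writes still arriving at 40 MB/s
```

With these made-up figures, stopping new writes at the failing site cuts the estimated flush time from 1,500 seconds to 900 seconds, which is why diverting traffic away early can be worth as much as extra bandwidth.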

A few years back, we had an HVAC failure in one of our datacenters (an HVAC is basically an air conditioner the size of a Mack truck). This caused the temperature to start rising in one part of the facility. It shouldn't have been that big a deal; the datacenter was designed to be able to lose an HVAC and still keep the ambient temperature at a reasonable level with the remaining units. Unfortunately, there was a fire sensor in the area that was getting hot. It sent a false positive, triggering the alarm. Now, the first thing you do in a datacenter fire (besides dumping Halon, which didn't happen in this case) is to shut off the flow of fresh oxygen to the area. The fancy automated system happily complied by shutting off all the remaining HVAC units. Linux servers are hot. Multiply hot by fifty thousand or so, and a datacenter with no HVAC will become dangerously hot within minutes. But while the servers were still running, all was not lost. We were able to swiftly flush our replication data and start clean shutdowns. The air inside was so hot that we had to send in our site operations people in five-minute shifts to shut down as many machines as they could before they needed to evacuate.


  



  