VMware ESXi Server Keeps Running with Failed RAID Array

We at Corner Edge Solutions LOVE VMware. It’s not too hard to tell that based on our blog, but this past week we found a new reason to fall in love all over again. One of the ESXi 4 servers in our cluster had a double drive failure on its local RAID 5 array. RAID 5 can only survive the loss of a single drive, so on a typical setup this would have completely crashed the server. But since this machine runs VMware ESXi with all the VM datastores on an iSCSI storage device, we had ZERO impact on our environment. ESXi is the lightweight version of the original ESX server. It runs entirely in memory and does not require disk access once it has been loaded at startup.

Since this machine was part of a cluster, we simply migrated the VMs on the failed server to the other working ESXi server through vSphere vCenter Server. The working server took on the extra load by overcommitting its available physical memory by almost 50%, with room to spare. We then took down the server with the bad drives to rebuild it. We also took this opportunity to install ESXi onto a USB flash drive mounted inside the server, and to remove the two remaining working hard drives so the server runs in a completely diskless configuration. After a small amount of VMware configuration, the newly rebuilt server was ready to join the cluster again, and the VMs were then evenly redistributed across the cluster, all without ever having to power a VM off. That means never having to send out maintenance notices to customers that their hosted servers will be offline, and keeping our uptime intact. The whole process took only about 5 hours as well. When was the last time you had a total failure of a system RAID array where nothing went down, and everything was repaired and upgraded in 5 hours?
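
For anyone wondering what "overcommitting physical memory by almost 50%" means in numbers, here is a rough sketch of the arithmetic. The function name and the figures (a 32 GB host with 47 GB of memory configured across its VMs) are made up for illustration and are not our actual host specs.

```python
def overcommit_percent(allocated_gb: float, physical_gb: float) -> float:
    """Return how far the memory configured for VMs exceeds the host's
    physical memory, as a percentage (0 means no overcommit)."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return (allocated_gb / physical_gb - 1.0) * 100.0


# Hypothetical example values, not measurements from our cluster:
# 47 GB configured across all VMs on a host with 32 GB of physical RAM.
pct = overcommit_percent(allocated_gb=47, physical_gb=32)
print(f"Memory overcommitted by about {pct:.0f}%")
```

A result near 50 means the VMs are configured with roughly one and a half times the RAM that is physically in the host, which only works because the VMs rarely use all of their configured memory at the same moment.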