Hi folks:
I'm suddenly having problems with VMware ESXi 6 on an S2600CWTS motherboard. Everything worked for the first month or so after I set it up, but in the last two weeks I've been seeing errors in the VMware event log saying that the host lost access to the datastore. The datastore lives on a RAID 5 array connected to the onboard LSI controller. A couple of times the RAID controller had to rebuild one of the array's drives, but each time it was a different drive. That leads me to believe the problem isn't a failing drive but rather the controller itself, or a software/driver issue with ESXi 6. On one occasion the controller took two drives offline at once, which crashed ESXi. The drives are Samsung SSD 850 Pro 1 TB (MZ-7KE1T0BW). After each rebuild, or after bringing the drives back online through the RAID controller BIOS, everything worked fine and no data was lost, which further suggests the drives aren't the problem. I've also been able to copy VMs off the system without problems or timeout errors, so it doesn't appear that specific areas of the drives are at fault.
I'm not quite sure how to proceed at this point. I've moved some of the more critical VMs off this server and back onto the old one until I can get this figured out. In the meantime, can anyone point me to the BIOS manual for the LSI controller? I haven't tried running an integrity check yet, and I don't want to fool with that until I have the manual in front of me and know what I'm doing. Without the manual I did manage to bring the drives back online, and once to force a rebuild on a drive the controller had marked "Failed", but it took me a while to figure out how. I haven't had any luck finding the BIOS manual on the Intel or LSI websites.
Also, does anyone know of software I can add to ESXi 6 that would let me access the RAID controller's functions without having to bring ESXi down and work from the BIOS screen? Dell has software VIBs you can install into ESXi that give you access to their RAID controllers, so you can do things like run an integrity check or hot-swap a drive without taking down the server. I see settings in the RAID controller BIOS for things like limiting the resources consumed during an integrity check, which leads me to believe there must be a way to run one without taking the system down.
Suggestions welcome.
Thanks,
SRW