When I got home I started running SeaTools, Seagate’s disk diagnostics utility for Windows, on ad4, the drive which had begun failing earlier. It reported that both ad4 and the other hard drive were just fine. However, when booting into FreeBSD after the tests I found that both drives were now indicating that Seek_Error_Rate was past threshold. The OS booted very slowly, then kicked ad6 out of the mirror set.
I tried connecting the drives to another, standalone SATA controller (some plain old Maxtor bundled-in one) with new SATA cables and saw the same problem.
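For anyone wanting to check the same thing on their own drives, here is a minimal sketch of how "past threshold" can be read out of `smartctl -A` output: an attribute counts as failed when its normalized VALUE is less than or equal to its THRESH. The sample text below is made up for illustration and is not output from my drives, and the function name `failing_attributes` is just something I picked for this example.

```python
import re

# Illustrative sample only; these numbers are NOT from the drives in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       61441384
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   029   028   030    Pre-fail  Always   FAILING_NOW 8641527
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34
"""

# One attribute row: ID, name, flag, then normalized VALUE, WORST, THRESH, TYPE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def failing_attributes(text):
    """Return (name, value, thresh, type) for attributes at or below threshold."""
    failing = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header or unrelated line
        _attr_id, name, value, _worst, thresh, kind = match.groups()
        value, thresh = int(value), int(thresh)
        # A threshold of 0 means the attribute can never be flagged as failed.
        if thresh > 0 and value <= thresh:
            failing.append((name, value, thresh, kind))
    return failing


if __name__ == "__main__":
    for name, value, thresh, kind in failing_attributes(SAMPLE):
        print(f"{name}: value {value} <= threshold {thresh} ({kind})")
```

To use it on a real disk, capture the output of something like `smartctl -A /dev/ad4` and pass the text to `failing_attributes` in place of `SAMPLE`.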
So, I’m not sure what to do. Here’s every issue I’ve had with the new server and its resolution:
Issue: Server locking up hard, unexpectedly. MCEs on console.
Resolution: Ensure that only matched RAM is used and that all RAM tests good during burn-in.
Issue: Slow performance / absurd latency while using 3ware disk controller.
Resolution: Identified GIANT-LOCK on driver, moved to using software mirroring.
Issue: One of the original two Western Digital disks used, which were part of a gmirror set, has started giving block errors.
Resolution: Replace disks with brand new Seagate pair.
Issue: Both of the new Seagate drives began failing with excessive Seek_Error_Rate within a few hours of each other after extensive burn-in.
Resolution: Unsure.
I can’t help but wonder if one of the Seagates beginning to fail was contributing to the latency observed with the 3ware controller, but as neither was throwing SMART errors at the time, I discount this.
My current thought is that I should order a pair of server-grade disks, burn them in as before (~50 hours of constant activity), copy the data to them, then see if things keep working. The failed disks and the unwanted 3ware controller will go back to Newegg, and hopefully things will work right.
I don’t know what other option I have besides scrapping the whole idea of moving servers, but I’d really rather not do that. If anyone else has any ideas, I’d love to hear them…