nuxx.net

darktrain.nuxx.net Server Issues and Disk Replacement

My current webserver, darktrain.nuxx.net, has been working well for a couple of years, despite the normal quirks and a proactive motherboard replacement (due to a bad BIOS chip). This past Saturday morning, at about 10am, one of the hard drives failed. Because the root filesystem lives on a ZFS mirror pool this shouldn’t have caused any problems, but it did. On top of that, because the server hadn’t been rebooted in 600-some days, I ran into a few other quirks. Here’s everything that happened, in chronological order, to get it running stably again:

At this point the server was stable and I was able to replace the failed disk. The previous setup was two Seagate ST1000DM003 disks (the mirror pool) and one Crucial M4 SSD (L2ARC). The biggest difficulty in replacing the disk is not the $54.44 cost of the replacement; it’s scheduling time to access the server in the data center. Since there was still one free disk bay in the server, instead of just replacing the one failed disk I decided to put two new ones in. These will then be configured into a three-way mirror pool with the SSD as L2ARC. It cost a bit more, but now when the next magnetic disk dies (remember, all parts die eventually) I can drop it from the pool and still have two properly working drives, all without another data center visit.

During lunch today I headed over to the facility housing the server in Southfield (conveniently, only 15-20 minutes from work) and within the span of 12 minutes I’d met the escorts, downed the server, swapped the disks, and brought it back up, confirming that the new disks were in place and functional.

Once the server was back up with the new disks in it, I used hints from the FreeBSD Root on ZFS (Mirror) using GPT article to get the new disks partitioned for swap and boot, then added the /dev/ada1p3 and /dev/ada2p3 partitions to the mirror pool and made sure the L2ARC was working. Now everything’s (essentially) back to functioning normally, hopefully with better reliability than before.
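For anyone wanting to do something similar, here is a rough sketch of what those partition-and-attach steps can look like. It is not a transcript of the exact commands I ran. The disk names ada1 and ada2 match the partitions mentioned above, but the pool name (zroot), the surviving mirror member (ada0p3), the 4G swap size, and the GPT labels are all assumptions for illustration, and partition alignment is not addressed. By default the script only prints the commands; it runs them only if DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch of partitioning two new disks and attaching them to an
# existing ZFS mirror. Prints commands unless DRY_RUN=0.
# ASSUMPTIONS (not from the post): pool name "zroot", surviving mirror
# member "ada0p3", 4G swap partitions, and the swapN/diskN GPT labels.
POOL="${POOL:-zroot}"
EXISTING="${EXISTING:-ada0p3}"
SWAP_SIZE="${SWAP_SIZE:-4G}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a GPT scheme with boot (p1), swap (p2), and ZFS (p3) partitions,
# then install the boot code so the disk is bootable on its own.
prepare_disk() {
    _disk="$1"
    _idx="$2"
    run gpart create -s gpt "$_disk"
    run gpart add -s 64k -t freebsd-boot "$_disk"
    run gpart add -s "$SWAP_SIZE" -t freebsd-swap -l "swap$_idx" "$_disk"
    run gpart add -t freebsd-zfs -l "disk$_idx" "$_disk"
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$_disk"
}

# Attach the new disk's ZFS partition as an additional mirror member.
attach_disk() {
    run zpool attach "$POOL" "$EXISTING" "${1}p3"
}

for pair in ada1:1 ada2:2; do
    disk="${pair%%:*}"
    idx="${pair##*:}"
    prepare_disk "$disk" "$idx"
    attach_disk "$disk"
done

# Shows resilver progress and whether the cache (L2ARC) device is listed.
run zpool status "$POOL"
```

Attaching a partition to an existing mirror member kicks off a resilver, so `zpool status` is the thing to watch until both new disks show as ONLINE. The L2ARC device should appear under a separate "cache" heading in that same output.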

So, what’s next? Probably a FreeBSD 10.0-RELEASE upgrade, and staying on top of patch levels better so I don’t suffer the same fate as last time. Since it’s a whole version upgrade it’ll need a good bit more planning and testing than this go-around, but so long as I’m doing it less urgently, all should be good.
