{"id":17890,"date":"2014-05-21T23:25:30","date_gmt":"2014-05-22T03:25:30","guid":{"rendered":"https:\/\/nuxx.net\/blog\/?p=17890"},"modified":"2014-05-21T23:25:30","modified_gmt":"2014-05-22T03:25:30","slug":"darktrain-nuxx-net-server-issues-and-disk-replacement","status":"publish","type":"post","link":"https:\/\/nuxx.net\/blog\/2014\/05\/21\/darktrain-nuxx-net-server-issues-and-disk-replacement\/","title":{"rendered":"darktrain.nuxx.net Server Issues and Disk Replacement"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"zpool status output showing a three-way mirror and L2ARC on an SSD.\" src=\"https:\/\/nuxx.net\/gallery\/d\/105984-1\/Screen+Shot+2014-05-21+at+11_21_45+PM.png\" alt=\"\" width=\"570\" height=\"337\" \/><\/p>\n<p>My current webserver, <a href=\"https:\/\/nuxx.net\/gallery\/v\/computers\/darktrain_nuxx_net\/\">darktrain.nuxx.net<\/a>, has been working well for a couple years, despite needing a proactive (due to bad BIOS chip) motherboard replacement and the normal quirks. This past\u00a0Saturday morning, about 10am, one of the hard drives failed. Due to the use of\u00a0a <a href=\"http:\/\/en.wikipedia.org\/wiki\/ZFS\">ZFS<\/a> mirror pool for the root filesystem this shouldn&#8217;t have caused any problems, but it did. On top of that, due to not rebooting the server in 600-some days I ran into a few other quirks. Here&#8217;s what all happened, in chronological order, to get it running stable\u00a0again:<\/p>\n<ul>\n<li>Second hard disk, <code>\/dev\/ada1<\/code>, fails. ZFS throws up on itself and the storage basically falls out from under the OS. As a result, everything not in memory and database-backed websites fail.<\/li>\n<li>An OS initiated reboot wouldn&#8217;t work (seemed to loop\u00a0during sync) I powered off the server manually.<\/li>\n<li>Upon powering the server up\u00a0disk performance was really bad until \/dev\/ada1 was removed from the mirror pool. 
After this point the disks settled out and all was good.<\/li>\n<li>Outbound email from the server wasn&#8217;t\u00a0working due to DKIM-Milter \/ <a href=\"http:\/\/opendkim.org\/\">OpenDKIM<\/a> failing to start. This could be bypassed, but that wasn&#8217;t a good solution because the <a href=\"http:\/\/mmba.org\/forum\">MMBA Forum<\/a> sends a fair number of email notifications.\u00a0DKIM-Milter failed to start because OpenSSL had been rebuilt due to the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Heartbleed\">Heartbleed<\/a>\u00a0bug, but as I hadn&#8217;t restarted DKIM-Milter since upgrading OpenSSL, I didn&#8217;t notice the issue.<\/li>\n<li>DKIM-Milter couldn&#8217;t be upgraded from <a href=\"http:\/\/www.freebsd.org\/ports\/\">Ports<\/a>\u00a0because FreeBSD 9.0-RELEASE (which was still running) had been deprecated and Ports intentionally broken on that release.<\/li>\n<li>The OS was upgraded to <a href=\"http:\/\/www.freebsd.org\/releases\/9.2R\/announce.html\">FreeBSD 9.2-RELEASE<\/a>-p6 using <a href=\"http:\/\/www.freebsd.org\/doc\/handbook\/updating-upgrading-freebsdupdate.html\">freebsd-update<\/a>.\u00a0DNS and mail broke, but this was fairly easy to fix. The update otherwise went smoothly.<\/li>\n<li>Ports updated, OpenDKIM rebuilt, mail working again.<\/li>\n<li>Upgraded ZFS on the remaining disk with the <code>zpool upgrade -a<\/code> command, then wrote new bootcode to <code>ada0<\/code> using\u00a0<code>gpart bootcode -b \/mnt2\/boot\/pmbr -p \/mnt2\/boot\/gptzfsboot -i 1 ada0<\/code>.<\/li>\n<\/ul>\n<p>At this point the server was stable and I was able to replace the failed disk. The previous setup used two <a href=\"http:\/\/www.seagate.com\/gb\/en\/internal-hard-drives\/desktop-hard-drives\/desktop-hdd\/?sku=ST1000DM003\">Seagate\u00a0ST1000DM003<\/a>\u00a0disks (the mirror pool) and one Crucial M4 SSD (<a href=\"http:\/\/www.zfsbuild.com\/2010\/04\/15\/explanation-of-arc-and-l2arc\/\">L2ARC<\/a>). 
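<\/p>\n<p>For reference, the ZFS recovery steps in the list above boil down to a handful of commands. This is only a rough sketch: the pool name <code>zroot<\/code> and the failed mirror member <code>ada1p3<\/code> are assumptions for illustration (the real names come from <code>zpool status<\/code> output); the <code>gpart<\/code> command and its <code>\/mnt2<\/code> paths are verbatim from the list above.<\/p>\n<pre><code># Assumed names: pool zroot, failed mirror member ada1p3.\n# Drop the failed device from the two-way mirror so I\/O stops stalling on it.\nzpool detach zroot ada1p3\n\n# After the OS upgrade, bring all pools up to the new ZFS version.\nzpool upgrade -a\n\n# Rewrite boot code on the surviving disk so it can boot the upgraded pool.\ngpart bootcode -b \/mnt2\/boot\/pmbr -p \/mnt2\/boot\/gptzfsboot -i 1 ada0<\/code><\/pre>\n<p>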
The biggest difficulty in replacing the disk wasn&#8217;t the $54.44 cost of the replacement; it was arranging time to access the server in the data center. Since there was still one free disk bay in the server, instead of just replacing\u00a0the one failed disk I decided to put two new ones in. These would then be configured into a three-way mirror pool alongside the SSD L2ARC. It cost a bit more, but now when the next magnetic disk dies (remember, all parts die eventually) I can drop it from the pool and still have two properly working drives, all without another data center visit.<\/p>\n<p><a href=\"https:\/\/nuxx.net\/gallery\/v\/computers\/darktrain_nuxx_net\/darktrain_nuxx_net_camcontrol_devinfo_2014-May-21.png.html\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" title=\"camcontrol devinfo output after replacing a failed hard drive and adding a second.\" src=\"https:\/\/nuxx.net\/gallery\/d\/105983-2\/darktrain_nuxx_net_camcontrol_devinfo_2014-May-21.png\" alt=\"\" width=\"300\" height=\"171\" \/><\/a>During lunch today I headed over to the facility housing the server in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Southfield,_Michigan\">Southfield<\/a>\u00a0(conveniently, only 15-20 minutes from work), and within the span of 12 minutes I&#8217;d met the escorts, downed the server, swapped the disks, and brought it back up, confirming that the new disks were in place and functional.<\/p>\n<p>After getting the disks in place I used hints from\u00a0the <a href=\"https:\/\/wiki.freebsd.org\/RootOnZFS\/GPTZFSBoot\/Mirror\">FreeBSD Root on ZFS (Mirror) using GPT<\/a>\u00a0article to get the new disks partitioned for swap and boot, then added the <code>\/dev\/ada1p3<\/code> and <code>\/dev\/ada2p3<\/code> partitions to the mirror pool and made sure the L2ARC was working. Now everything&#8217;s (essentially) functionally back to normal, hopefully with better reliability than before.<\/p>\n<p>So, what&#8217;s next? 
Probably a <a href=\"http:\/\/www.freebsd.org\/releases\/10.0R\/announce.html\">FreeBSD 10.0-RELEASE<\/a> upgrade, and better staying on top of patch levels so I don&#8217;t suffer the same fate as last time. Being a whole-version upgrade, it&#8217;ll need a good bit more planning and testing than this go-around, but so long as I&#8217;m doing it less urgently, all should be good.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My current webserver, darktrain.nuxx.net, has been working well for a couple years, despite needing a proactive (due to bad BIOS chip) motherboard replacement and the&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/nuxx.net\/blog\/2014\/05\/21\/darktrain-nuxx-net-server-issues-and-disk-replacement\/\">Continue reading<span class=\"screen-reader-text\">darktrain.nuxx.net Server Issues and Disk Replacement<\/span><\/a><\/div>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-17890","post","type-post","status-publish","format-standard","hentry","category-computers","entry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/comments?post=17890"}],"version-history":[{"count":8,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890\/revisions"}],"predecessor-version":[{"id":17898,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890\/revisions\/17898"}],"wp:attachment":[{"href":"https:\/\/nuxx.net\/blog\/wp-json
\/wp\/v2\/media?parent=17890"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/categories?post=17890"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/tags?post=17890"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}