{"id":17890,"date":"2014-05-21T23:25:30","date_gmt":"2014-05-22T03:25:30","guid":{"rendered":"https:\/\/nuxx.net\/blog\/?p=17890"},"modified":"2014-05-21T23:25:30","modified_gmt":"2014-05-22T03:25:30","slug":"darktrain-nuxx-net-server-issues-and-disk-replacement","status":"publish","type":"post","link":"https:\/\/nuxx.net\/blog\/2014\/05\/21\/darktrain-nuxx-net-server-issues-and-disk-replacement\/","title":{"rendered":"darktrain.nuxx.net Server Issues and Disk Replacement"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"zpool status output showing a three-way mirror and L2ARC on an SSD.\" src=\"https:\/\/nuxx.net\/gallery\/d\/105984-1\/Screen+Shot+2014-05-21+at+11_21_45+PM.png\" alt=\"\" width=\"570\" height=\"337\" \/><\/p>\n<p>My current webserver, <a href=\"https:\/\/nuxx.net\/gallery\/v\/computers\/darktrain_nuxx_net\/\">darktrain.nuxx.net<\/a>, has been working well for a couple years, despite needing a proactive (due to bad BIOS chip) motherboard replacement and the normal quirks. This past\u00a0Saturday morning, about 10am, one of the hard drives failed. Due to the use of\u00a0a <a href=\"http:\/\/en.wikipedia.org\/wiki\/ZFS\">ZFS<\/a> mirror pool for the root filesystem this shouldn&#8217;t have caused any problems, but it did. On top of that, due to not rebooting the server in 600-some days I ran into a few other quirks. Here&#8217;s what all happened, in chronological order, to get it running stable\u00a0again:<\/p>\n<ul>\n<li>Second hard disk, <code>\/dev\/ada1<\/code>, fails. ZFS throws up on itself and the storage basically falls out from under the OS. As a result, everything not in memory and database-backed websites fail.<\/li>\n<li>An OS initiated reboot wouldn&#8217;t work (seemed to loop\u00a0during sync) I powered off the server manually.<\/li>\n<li>Upon powering the server up\u00a0disk performance was really bad until \/dev\/ada1 was removed from the mirror pool. 
After this point the disks settled out and all was good.<\/li>\n<li>Outbound email from the server wasn&#8217;t\u00a0working due to DKIM-Milter \/ <a href=\"http:\/\/opendkim.org\/\">OpenDKIM<\/a> failing to start. This could be bypassed, but that wasn&#8217;t a good solution because the <a href=\"http:\/\/mmba.org\/forum\">MMBA Forum<\/a> sends a fair number of email notifications.\u00a0DKIM-Milter failed to start because OpenSSL had been rebuilt due to the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Heartbleed\">Heartbleed<\/a>\u00a0bug, but as I hadn&#8217;t restarted DKIM-Milter since upgrading OpenSSL, I didn&#8217;t notice the issue.<\/li>\n<li>DKIM-Milter couldn&#8217;t be upgraded from <a href=\"http:\/\/www.freebsd.org\/ports\/\">Ports<\/a>\u00a0because FreeBSD 9.0-RELEASE (which was still running) had been deprecated and Ports intentionally broken on that release.<\/li>\n<li>The OS was upgraded to <a href=\"http:\/\/www.freebsd.org\/releases\/9.2R\/announce.html\">FreeBSD 9.2-RELEASE<\/a>-p6 using <a href=\"http:\/\/www.freebsd.org\/doc\/handbook\/updating-upgrading-freebsdupdate.html\">freebsd-update<\/a>.\u00a0DNS and mail broke, but this was fairly easy to fix. The update otherwise went smoothly.<\/li>\n<li>Ports updated, OpenDKIM rebuilt, mail working again.<\/li>\n<li>Upgraded ZFS on the remaining disk with the <code>zpool upgrade -a<\/code> command, then wrote new bootcode to <code>ada0<\/code> using\u00a0<code>gpart bootcode -b \/mnt2\/boot\/pmbr -p \/mnt2\/boot\/gptzfsboot -i 1 ada0<\/code>.<\/li>\n<\/ul>\n<p>At this point the server was stable and I was able to replace the failed disk. The previous setup used two <a href=\"http:\/\/www.seagate.com\/gb\/en\/internal-hard-drives\/desktop-hard-drives\/desktop-hdd\/?sku=ST1000DM003\">Seagate\u00a0ST1000DM003<\/a>\u00a0disks (the mirror pool) and one Crucial M4 SSD (<a href=\"http:\/\/www.zfsbuild.com\/2010\/04\/15\/explanation-of-arc-and-l2arc\/\">L2ARC<\/a>). 
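<\/p>\n<p>For reference, the ZFS recovery steps in the list above boil down to a handful of commands. This is only a rough sketch: the pool name <code>zroot<\/code> and the failed mirror member <code>ada1p3<\/code> are assumptions for illustration (the real names come from <code>zpool status<\/code> output); the <code>gpart<\/code> command and its <code>\/mnt2<\/code> paths are verbatim from the list above.<\/p>\n<pre><code># Assumed names: pool zroot, failed mirror member ada1p3.\n# Drop the failed device from the two-way mirror so I\/O stops stalling on it.\nzpool detach zroot ada1p3\n\n# After the OS upgrade, bring all pools up to the new ZFS version.\nzpool upgrade -a\n\n# Rewrite boot code on the surviving disk so it can boot the upgraded pool.\ngpart bootcode -b \/mnt2\/boot\/pmbr -p \/mnt2\/boot\/gptzfsboot -i 1 ada0<\/code><\/pre>\n<p>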
The biggest difficulty in replacing the disk wasn&#8217;t the $54.44 cost of the replacement; it was arranging time to access the server in the data center. Since there was still one free disk bay in the server, instead of just replacing\u00a0the one failed disk I decided to put two new ones in. These would then be configured into a three-way mirror pool alongside the SSD L2ARC. It cost a bit more, but now when the next magnetic disk dies (remember, all parts die eventually) I can drop it from the pool and still have two properly working drives, all without another data center visit.<\/p>\n<p><a href=\"https:\/\/nuxx.net\/gallery\/v\/computers\/darktrain_nuxx_net\/darktrain_nuxx_net_camcontrol_devinfo_2014-May-21.png.html\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" title=\"camcontrol devinfo output after replacing a failed hard drive and adding a second.\" src=\"https:\/\/nuxx.net\/gallery\/d\/105983-2\/darktrain_nuxx_net_camcontrol_devinfo_2014-May-21.png\" alt=\"\" width=\"300\" height=\"171\" \/><\/a>During lunch today I headed over to the facility housing the server in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Southfield,_Michigan\">Southfield<\/a>\u00a0(conveniently, only 15-20 minutes from work), and within the span of 12 minutes I&#8217;d met the escorts, downed the server, swapped the disks, and brought it back up, confirming that the new disks were in place and functional.<\/p>\n<p>After getting the disks in place I used hints from\u00a0the <a href=\"https:\/\/wiki.freebsd.org\/RootOnZFS\/GPTZFSBoot\/Mirror\">FreeBSD Root on ZFS (Mirror) using GPT<\/a>\u00a0article to get the new disks partitioned for swap and boot, then added the <code>\/dev\/ada1p3<\/code> and <code>\/dev\/ada2p3<\/code> partitions to the mirror pool and made sure the L2ARC was working. Now everything&#8217;s (essentially) functionally back to normal, hopefully with better reliability than before.<\/p>\n<p>So, what&#8217;s next? 
Probably a <a href=\"http:\/\/www.freebsd.org\/releases\/10.0R\/announce.html\">FreeBSD 10.0-RELEASE<\/a> upgrade, and better staying on top of patch levels so I don&#8217;t suffer the same fate as last time. Being a whole-version upgrade, it&#8217;ll need a good bit more planning and testing than this go-around, but so long as I&#8217;m doing it less urgently, all should be good.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My current webserver, darktrain.nuxx.net, has been working well for a couple years, despite needing a proactive (due to bad BIOS chip) motherboard replacement and the&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/nuxx.net\/blog\/2014\/05\/21\/darktrain-nuxx-net-server-issues-and-disk-replacement\/\">Continue reading<span class=\"screen-reader-text\">darktrain.nuxx.net Server Issues and Disk Replacement<\/span><\/a><\/div>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-17890","post","type-post","status-publish","format-standard","hentry","category-computers","entry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/comments?post=17890"}],"version-history":[{"count":8,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890\/revisions"}],"predecessor-version":[{"id":17898,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/17890\/revisions\/17898"}],"wp:attachment":[{"href":"https:\/\/nuxx.net\/blog\/wp-json
\/wp\/v2\/media?parent=17890"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/categories?post=17890"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/tags?post=17890"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}