{"id":16899,"date":"2012-11-29T19:03:16","date_gmt":"2012-11-30T00:03:16","guid":{"rendered":"https:\/\/nuxx.net\/blog\/?p=16899"},"modified":"2012-11-29T23:04:30","modified_gmt":"2012-11-30T04:04:30","slug":"fixing-nuxx-nets-backups","status":"publish","type":"post","link":"https:\/\/nuxx.net\/blog\/2012\/11\/29\/fixing-nuxx-nets-backups\/","title":{"rendered":"Fixing nuxx.net&#8217;s Backups"},"content":{"rendered":"<p>nuxx.net has a backup system which uses <a href=\"http:\/\/en.wikipedia.org\/wiki\/Rsync\">rsync<\/a> to back up the server (and everything it hosts) to my desktop iMac, deduplicating the files by way of rsync&#8217;s <tt>--link-dest<\/tt> argument to create hard links (<a href=\"https:\/\/nuxx.net\/blog\/2009\/12\/06\/time-machine-for-freebsd\/\">detailed info here<\/a>). Maintaining a 30 day backup history it only uses the space of one full backup plus 29 days of change, which isn&#8217;t much given the utility allowed. It&#8217;s been running fine for a few years, but about a month or so ago I began getting errors where the target disk for the backups was full, which seemed a bit odd to me. As a stopgap I was able to clean up some other things and keep backups working, but the space was a lot more than normal.<\/p>\n<p>Today I managed to get this resolved, but it took a bit of digging. Below the fold you can see what I found.<\/p>\n<p><!--more--><br \/>\nAfter a bit of initial investigation, this is what I knew:<\/p>\n<blockquote><p>&middot; Some files, generally those written before a certain date, were appropriately being deduplicated via hard links and thus had the same <a href=\"http:\/\/en.wikipedia.org\/wiki\/Inode\">inode<\/a>.<br \/>\n&middot; Newer files weren&#8217;t getting deduplicated and had different inodes.<br \/>\n&middot; The lack of deduplication (especially with the large nightly database dumps) was contributing to the volume becoming full.<br \/>\n&middot; Looking at the <tt>Number of files transferred:<\/tt> entries in each of the daily rsync logs the numbers weren&#8217;t increasing, as one would expect if existing files weren&#8217;t being found locally, so the backup must have been referencing the files in <tt>--link-dest<\/tt>. They generally stayed at 100-300 per day, for the last many months.<\/p><\/blockquote>\n<p>By setting up two test cases running <tt>rsync -vvvv<\/tt> (a very verbose mode) against one file that wasn&#8217;t being deduplicated and one that was, I saw these lines in the not-getting-deduplicated log file needed it&#8217;s UID and GID changed:<\/p>\n<blockquote><p><tt>set uid of IMG_7118.jpg from 99 to 1001<\/tt><br \/>\n<tt>set gid of IMG_7118.jpg from 99 to 1001<\/tt><\/p><\/blockquote>\n<p>In both cases I saw that the file wasn&#8217;t being transferred. Both looked similar to this:<\/p>\n<blockquote><p><tt>Number of files: 1<\/tt><br \/>\n<tt>Number of files transferred: 0<\/tt><br \/>\n<tt>Total file size: 4779530 bytes<\/tt><br \/>\n<tt>Total transferred file size: 0 bytes<\/tt><\/p><\/blockquote>\n<p>Checking this file on the target disk showed that it&#8217;s UID and GID were not changed as described in the verbose log, so that gave me a place to start.<\/p>\n<p>I then looked at the disk from the machine itself, and while I could create files, I couldn&#8217;t change their ownership:<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# touch blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root# ls -alsn blah<\/tt><br \/>\n<tt>0 -rw-r--r--  1 99  99  0 Nov 29 13:17 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root# chown 1001:1001 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root# ls -alsn blah<\/tt><br \/>\n<tt>0 -rw-r--r--  1 99  99  0 Nov 29 13:17 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Based on this, I figured I&#8217;d see what <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fsck\">fsck<\/a> had to say:<\/p>\n<blockquote><p><tt>quicksilver:~ root# fsck_hfs -f \/dev\/disk3s3<\/tt><br \/>\n<tt>** \/dev\/rdisk3s3 (NO WRITE)<\/tt><br \/>\n<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Using cacheBlockSize=32K cacheTotalBlock=65536 cacheSize=2097152K.<\/tt><br \/>\n<tt>&nbsp;&nbsp;&nbsp;Executing fsck_hfs (version diskdev_cmds-557~393).<\/tt><br \/>\n<tt>** Verifying volume when it is mounted with write access.<\/tt><br \/>\n<tt>&nbsp;&nbsp;&nbsp;Journal need to be replayed but volume is read-only<\/tt><br \/>\n<tt>** Checking Journaled HFS Plus volume.<\/tt><br \/>\n<tt>** Detected a case-sensitive volume.<\/tt><br \/>\n<tt>&nbsp;&nbsp;&nbsp;The volume name is Cruft<\/tt><br \/>\n<tt>** Checking extents overflow file.<\/tt><br \/>\n<tt>** Checking catalog file.<\/tt><br \/>\n<tt>** Checking multi-linked files.<\/tt><br \/>\n<tt>&nbsp;&nbsp;&nbsp;Incorrect number of file hard links<\/tt><br \/>\n<tt>** Checking catalog hierarchy.<\/tt><br \/>\n<tt>** Checking extended attributes file.<\/tt><br \/>\n<tt>** Checking volume bitmap.<\/tt><br \/>\n<tt>** Checking volume information.<\/tt><br \/>\n<tt>** The volume Cruft was found corrupt and needs to be repaired.<\/tt><br \/>\n<tt>quicksilver:~ root#<\/tt><\/p><\/blockquote>\n<p>Well, that&#8217;s something&#8230; Maybe <a href=\"http:\/\/developer.apple.com\/library\/mac\/#documentation\/Darwin\/Reference\/ManPages\/man8\/diskutil.8\">diskutil<\/a>, Apple&#8217;s utility for repairing disks, can help out:<\/p>\n<blockquote><p><tt>quicksilver:~ root# diskutil repairVolume \/Volumes\/Cruft\/<\/tt><br \/>\n<tt>Started file system repair on disk3s3 Cruft<\/tt><br \/>\n<tt>Checking file system<\/tt><br \/>\n<tt>Checking Journaled HFS Plus volume<\/tt><br \/>\n<tt>Detected a case-sensitive volume<\/tt><br \/>\n<tt>Checking extents overflow file<\/tt><br \/>\n<tt>Checking catalog file<\/tt><br \/>\n<tt>Checking multi-linked files<\/tt><br \/>\n<tt>Incorrect number of file hard links<\/tt><br \/>\n<tt>Checking catalog hierarchy<\/tt><br \/>\n<tt>Checking extended attributes file<\/tt><br \/>\n<tt>Checking volume bitmap<\/tt><br \/>\n<tt>Checking volume information<\/tt><br \/>\n<tt>Repairing volume<\/tt><br \/>\n<tt>Orphaned file inode (id = 108285)<\/tt><br \/>\n<tt>Orphaned file inode (id = 3609024)<\/tt><br \/>\n<tt>Look for missing items in lost+found directory<\/tt><br \/>\n<tt>Rechecking volume<\/tt><br \/>\n<tt>Checking Journaled HFS Plus volume<\/tt><br \/>\n<tt>Detected a case-sensitive volume<\/tt><br \/>\n<tt>Checking extents overflow file<\/tt><br \/>\n<tt>Checking catalog file<\/tt><br \/>\n<tt>Checking multi-linked files<\/tt><br \/>\n<tt>Checking catalog hierarchy<\/tt><br \/>\n<tt>Incorrect folder count in a directory (id = 2)<\/tt><br \/>\n<tt>(It should be 13 instead of 12)<\/tt><br \/>\n<tt>Checking extended attributes file<\/tt><br \/>\n<tt>Checking volume bitmap<\/tt><br \/>\n<tt>Checking volume information<\/tt><br \/>\n<tt>Repairing volume<\/tt><br \/>\n<tt>Rechecking volume<\/tt><br \/>\n<tt>Checking Journaled HFS Plus volume<\/tt><br \/>\n<tt>Detected a case-sensitive volume<\/tt><br \/>\n<tt>Checking extents overflow file<\/tt><br \/>\n<tt>Checking catalog file<\/tt><br \/>\n<tt>Checking multi-linked files<\/tt><br \/>\n<tt>Checking catalog hierarchy<\/tt><br \/>\n<tt>Checking extended attributes file<\/tt><br \/>\n<tt>Checking volume bitmap<\/tt><br \/>\n<tt>Checking volume information<\/tt><br \/>\n<tt>The volume Cruft was repaired successfully<\/tt><br \/>\n<tt>Volume repair complete<\/tt><br \/>\n<tt>Updating boot support partitions for the volume as required<\/tt><br \/>\n<tt>Finished file system repair on disk3s3 Cruft<\/tt><br \/>\n<tt>quicksilver:~ root#<\/tt><\/p><\/blockquote>\n<p>Unfortunately, that didn&#8217;t make it better either. After reading up a bit on Apple&#8217;s <em>Ignore ownership<\/em> setting for external drives (which I explicitly had to disable when setting this system up and had somewhat forgotten about) I found this setting is stored in <tt>\/var\/db\/volinfo.database<\/tt>:<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# cat \/var\/db\/volinfo.database<\/tt><br \/>\n<tt>AB8A61B83C455765: 00000001<\/tt><br \/>\n<tt>D3572298D5E48D13: 00000001<\/tt><br \/>\n<tt>66408CB4FDEFB639: 00000001<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Hmm, okay. Only three entries&#8230; That seems odd. I have more drives than that. Apparently this setting is also be reflected as the <tt>noowners<\/tt> option on mount:<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# mount<\/tt><br \/>\n<tt>\/dev\/disk0s2 on \/ (hfs, local, journaled)<\/tt><br \/>\n<tt>devfs on \/dev (devfs, local, nobrowse)<\/tt><br \/>\n<tt>\/dev\/disk1s2 on \/Volumes\/Thunder (hfs, local, journaled)<\/tt><br \/>\n<tt>map -hosts on \/net (autofs, nosuid, automounted, nobrowse)<\/tt><br \/>\n<tt>map auto_home on \/home (autofs, automounted, nobrowse)<\/tt><br \/>\n<tt>\/dev\/disk2s2 on \/Volumes\/WIN7 (ntfs, local, nodev, nosuid, read-only, noowners)<\/tt><br \/>\n<tt>\/dev\/disk5s2 on \/Volumes\/Time Machine (hfs, local, nodev, nosuid, journaled)<\/tt><br \/>\n<tt>\/dev\/disk3s3 on \/Volumes\/Cruft (hfs, local, nodev, nosuid, journaled, noowners)<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Ah hah! Now to get rid of noowners on <tt>\/Volumes\/Cruft<\/tt>&#8230;<\/p>\n<blockquote><p><tt>quicksilver:~ root# diskutil enableOwnership \/Volumes\/Cruft\/<\/tt><br \/>\n<tt>File system user\/group ownership enabled<\/tt><br \/>\n<tt>quicksilver:~ root#<\/tt><\/p><\/blockquote>\n<p>Are there now more entries in <\/tt>\/var\/db\/volinfo.database<\/tt>?<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# cat \/var\/db\/volinfo.database<\/tt><br \/>\n<tt>AB8A61B83C455765: 00000001<\/tt><br \/>\n<tt>D3572298D5E48D13: 00000001<\/tt><br \/>\n<tt>66408CB4FDEFB639: 00000001<\/tt><br \/>\n<tt>A345AB3A480EFF2A: 00000001<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Yep! So, did it work?<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# mount<\/tt><br \/>\n<tt>\/dev\/disk0s2 on \/ (hfs, local, journaled)<\/tt><br \/>\n<tt>devfs on \/dev (devfs, local, nobrowse)<\/tt><br \/>\n<tt>\/dev\/disk1s2 on \/Volumes\/Thunder (hfs, local, journaled)<\/tt><br \/>\n<tt>map -hosts on \/net (autofs, nosuid, automounted, nobrowse)<\/tt><br \/>\n<tt>map auto_home on \/home (autofs, automounted, nobrowse)<\/tt><br \/>\n<tt>\/dev\/disk2s2 on \/Volumes\/WIN7 (ntfs, local, nodev, nosuid, read-only, noowners)<\/tt><br \/>\n<tt>\/dev\/disk5s2 on \/Volumes\/Time Machine (hfs, local, nodev, nosuid, journaled)<\/tt><br \/>\n<tt>\/dev\/disk3s3 on \/Volumes\/Cruft (hfs, local, nodev, nosuid, journaled)<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Mount looks good&#8230;<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# ls -alsn blah<\/tt><br \/>\n<tt>0 -rw-r--r--  1 0  80  0 Nov 29 16:53 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root# chown 1001:1001 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root# ls -alsn blah<\/tt><br \/>\n<tt>0 -rw-r--r--  1 1001  1001  0 Nov 29 16:53 blah<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>Looks like it did! Now to run two backup jobs (one to ensure the data is all there with the right permissions, and the second to reference it) and see if the those two most recent backups (<tt>backup.0<\/tt> and <tt>backup.1<\/tt>) have the previously-problematic file now sharing the same inode:<\/p>\n<blockquote><p><tt>quicksilver:darktrain.nuxx.net root# ls -i backup.*\/c0nsumer\/gallery2_data\/nuxx.net\/albums\/acquired_stuff\/IMG_7118.jpg<\/tt><br \/>\n<tt>80384029 backup.0\/c0nsumer\/gallery2_data\/nuxx.net\/albums\/acquired_stuff\/IMG_7118.jpg<\/tt><br \/>\n<tt>80384029 backup.1\/c0nsumer\/gallery2_data\/nuxx.net\/albums\/acquired_stuff\/IMG_7118.jpg<\/tt><br \/>\n<tt>80144307 backup.2\/c0nsumer\/gallery2_data\/nuxx.net\/albums\/acquired_stuff\/IMG_7118.jpg<\/tt><br \/>\n<tt>79905924 backup.3\/c0nsumer\/gallery2_data\/nuxx.net\/albums\/acquired_stuff\/IMG_7118.jpg<\/tt><br \/>\n<tt>[...snip...]<\/tt><br \/>\n<tt>quicksilver:darktrain.nuxx.net root#<\/tt><\/p><\/blockquote>\n<p>They do! So, that&#8217;s what the problem was&#8230; Because the ownership of the files on the destination (the iMac) couldn&#8217;t be changed, rsync apparently created a copy instead of a hard link. Thus the increase in space. But, apparently because the checksum of the files matched it was never re-transferred, which is why there was not a consistently increasing <tt>Number of files transfered<\/tt> in the rsync log.<\/p>\n<p>That brings about the next question: how did this happen? I suspect that when I replaced my failed SSD in the iMac (where the OS resides) <tt>\/var\/db\/volinfo.database<\/tt> didn&#8217;t get restored by Time Machine. Or, maybe it did and the external drive was given a new identifier? I&#8217;m not sure what the entries in <tt>volinfo.database<\/tt> refer to, as they aren&#8217;t the volume GUIDs, so I can&#8217;t really check that.<\/p>\n<p>Still, I&#8217;m glad this problem is sorted out and I know why it wasn&#8217;t working. I&#8217;ll now just wait 30 days for the old, only-partially-deduplicated backups to rotate out and everything will be good again.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>nuxx.net has a backup system which uses rsync to back up the server (and everything it hosts) to my desktop iMac, deduplicating the files by<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/nuxx.net\/blog\/2012\/11\/29\/fixing-nuxx-nets-backups\/\">Continue reading<span class=\"screen-reader-text\">Fixing nuxx.net&#8217;s Backups<\/span><\/a><\/div>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-16899","post","type-post","status-publish","format-standard","hentry","category-computers","entry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/16899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/comments?post=16899"}],"version-history":[{"count":43,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/16899\/revisions"}],"predecessor-version":[{"id":16942,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/16899\/revisions\/16942"}],"wp:attachment":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/media?parent=16899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/categories?post=16899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/tags?post=16899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}