About a week back I did a round of updates at home, including updating the Pi-hole container (running in Docker on a Synology DS1019+) to the latest version, v4.2.2. Not long after this I noticed that backups to Backblaze, via Arq running on my main Mac, were stuck with a Caching existing backup metadata (this may take a while) message.
Since it said it might take a while I gave it a few days, but after a week it was clear something was wrong. It turns out the cause wasn’t any of my updates, but two versions of the HOSTS block list (v3.5.3 and v3.6.0), the default block list in Pi-hole, which had in turn picked up the problem from the Polish block list KAD.
How’d I figure it out? Here goes:
First, a wee bit of digging led to this Reddit thread on /r/Arqbackup, and a quick look at Pi-hole showed that yes, f000.backblazeb2.com was being blocked over and over.
Whitelisting this site allowed backups to resume working. But… why?
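For anyone wanting to confirm this sort of block from the affected client, here is a minimal Python sketch. It assumes Pi-hole is answering blocked names with the unspecified address (0.0.0.0 or ::), which I believe is its default "null" blocking mode; other blocking modes would need a different check. The function names are mine, not anything from Pi-hole or Arq.

```python
import ipaddress
import socket


def looks_blocked(address: str) -> bool:
    """Return True if the address is the unspecified address (0.0.0.0 or ::).

    Assumption: the resolver answers blocked names with the unspecified address.
    """
    return ipaddress.ip_address(address).is_unspecified


def resolve(hostname: str) -> list[str]:
    """Resolve a hostname through the system resolver and return unique addresses."""
    infos = socket.getaddrinfo(hostname, None)
    # info[4][0] is the address string; drop any IPv6 scope suffix such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


if __name__ == "__main__":
    host = "f000.backblazeb2.com"
    try:
        addresses = resolve(host)
    except OSError as err:
        print(f"{host}: lookup failed ({err})")
    else:
        for address in addresses:
            status = "looks blocked" if looks_blocked(address) else "ok"
            print(f"{host} -> {address}: {status}")
```

Running it before and after adding the whitelist entry should show the answer change from an unspecified address to a real one, if the assumption about blocking mode holds.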
I then disabled the whitelist entry and updated gravity in Pi-hole (pulling down and compiling a new copy of the blocklists) and everything kept working. So it seemed like a block list might have been the source of the problem.
I only use two block lists, one the Pi-hole default and the other from the COVID-19 Cyber Threat Coalition. A quick look through the current versions (1 · 2) didn’t show anything blocking this site as of this morning, which made sense given that the blocklist update had fixed things. Local DNS for this client is via Pi-hole, which in turn points to my firewall, which is running Unbound to handle all resolution itself. So, it shouldn’t have been caused by a DNS provider blocking things.
Pi-hole automatically updates gravity early every Sunday morning, which roughly lines up with when the Arq problems started. So maybe this is it? With the last gravity updates happening on 2021-Apr-04 and 2021-Mar-28, we’ve got a window in which to look for f000.backblazeb2.com in blocklists.
The COVID-19 Cyber Threat Coalition domain blocklist was updated this morning, and doesn’t have any obvious version control, so I skipped over this one for now. The other, the Pi-hole default HOSTS, is hosted on GitHub and has regular releases. So let’s look through there…
The last four releases (v3.5.2, v3.5.3, v3.6.0, and v3.6.1) span the last 18 days, which should cover the window during which this broke. After grabbing them, a quick unzip and grep showed www.f000.backblazeb2.com in the fakenews, gambling + social, gambling + porn, and social categories in versions 3.5.3 and 3.6.0, but in nothing before or after.
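I did this with plain unzip and grep, but the same search can be sketched in Python for anyone who wants per-file counts. This is a rough equivalent, not what I actually ran: `find_domain` is a name I made up, and it assumes the release archives have already been extracted and that the files use the usual hosts format of an address followed by one or more hostnames.

```python
from pathlib import Path


def find_domain(root: Path, domain: str) -> dict[str, int]:
    """Map each file under root to the number of hosts entries mentioning domain.

    Matching is by substring (like grep), so "f000.backblazeb2.com" also
    catches "www.f000.backblazeb2.com". Comments after "#" are ignored.
    """
    hits: dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        count = 0
        with path.open(encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                # Strip trailing comments, then split "<address> <hostname> ...".
                fields = line.split("#", 1)[0].split()
                if len(fields) >= 2 and any(domain in name for name in fields[1:]):
                    count += 1
        if count:
            hits[str(path.relative_to(root))] = count
    return hits
```

Pointing it at each extracted release in turn, for example `find_domain(Path("hosts-3.5.3"), "f000.backblazeb2.com")` where the directory name is hypothetical, gives a quick list of which files in that release carry the entry.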
There we go: the reason for the block, and it’s all within the observed timeframe. This isn’t a hostname one would normally want to block, as it’s part of Backblaze’s CDN (PDF). Sounds like an overzealous addition to a blocklist got sucked up into the HOSTS list.
Looking further through the grep output, this was part of the .../KADhosts/hosts file from the KAD list. It turns out that f000.backblazeb2.com was added to the KAD list on 2021-Mar-26 and then removed on 2021-Apr-01. HOSTS pulled from KAD for v3.5.3 on 2021-Mar-28 and for v3.6.0 on 2021-Mar-31, which caused both versions to inherit the block.
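The date logic is simple enough to do in your head, but written out as a small Python sketch it looks like this. The dates are the ones above; the assumptions are mine: I work at day granularity and treat the removal day itself as already clean.

```python
from datetime import date

# Dates taken from the timeline above.
KAD_ADDED = date(2021, 3, 26)
KAD_REMOVED = date(2021, 4, 1)
HOSTS_PULLS = {
    "v3.5.3": date(2021, 3, 28),
    "v3.6.0": date(2021, 3, 31),
}


def inherits_block(pulled: date, added: date, removed: date) -> bool:
    """True if a pull on `pulled` would have picked up the entry.

    Assumption: the entry is present from `added` up to, but not including, `removed`.
    """
    return added <= pulled < removed


affected = [
    version
    for version, pulled in HOSTS_PULLS.items()
    if inherits_block(pulled, KAD_ADDED, KAD_REMOVED)
]
print(affected)  # ['v3.5.3', 'v3.6.0']
```

Both pulls land inside the six days the entry was in KAD, which matches what the grep turned up.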
Quite an interesting chain, eh? A Polish ad blocking group makes a change that ends up in the default list for one of the most common DIY adblockers, which in turn breaks access to a fairly common CDN, in turn breaking data backups. It’s dependencies all the way down…
It’s now fixed, and everything would have resolved itself had I waited until Sunday, but at least now I know why.