

HOSTS v3.5.3 and v3.6.0 Broke Backblaze Backups in Arq

About a week back I did a round of updates at home, including updating the Pi-hole container (running in Docker on a Synology DS1019+) to the latest version, v4.2.2. Not long after, I noticed that backups to Backblaze, via Arq running on my main Mac, were stuck at a “Caching existing backup metadata (this may take a while)” message.

Since it said it might take a while I gave it a few days, but after a week it seemed likely that something was wrong. It turned out not to be caused by any of my updates, but by two versions of the HOSTS block list (v3.5.3 and v3.6.0), the default block list in Pi-hole, which had in turn picked up a bad entry from the Polish block list KAD.

How’d I figure it out? Here goes:

First, a wee bit of digging led to this Reddit thread on /r/Arqbackup, and a quick look at Pi-hole showed that yes, f000.backblazeb2.com was being blocked over and over.

Whitelisting this hostname allowed backups to resume working. But… why?

I then disabled the whitelist entry and updated gravity in Pi-hole (pulling down and compiling a new copy of the blocklists), and everything kept working. So it seemed like a block list might have been the source of the problem.

I only use two block lists: the Pi-hole default and one from the COVID-19 Cyber Threat Coalition. A quick look through the current versions (1 · 2) didn’t show anything blocking this site as of this morning, which made sense, since the blocklist update had fixed things. Local DNS for this client goes through Pi-hole, which in turn points to my firewall, which runs Unbound to handle all resolution itself. So it shouldn’t have been caused by an upstream DNS provider blocking things.

Pi-hole automatically updates gravity every Sunday early in the morning, which roughly correlates with when the Arq problems started. So maybe this is it? With the last gravity updates happening on 2021-Apr-04 and 2021-Mar-28, we’ve got a window in which to look for f000.backblazeb2.com in the blocklists.

The COVID-19 Cyber Threat Coalition domain blocklist was updated this morning and doesn’t have any obvious version control, so I skipped over it for now. The other, the Pi-hole default HOSTS list, is hosted on GitHub and has regular releases. So let’s look through there…

The last four releases, v3.5.2, v3.5.3, v3.6.0, and v3.6.1, spanned the last 18 days, which should cover the window during which this broke. A quick unzip and grep showed f000.backblazeb2.com and www.f000.backblazeb2.com in the fakenews, gambling + social, gambling + porn, and social categories in versions 3.5.3 and 3.6.0, but not in anything before or after.
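The “quick unzip and grep” step can also be scripted. This is a minimal Python sketch, assuming each release zip was extracted into its own subdirectory (for example releases/v3.5.3/) and that the blocklists are files named hosts, as appears to be the case in the HOSTS repository layout. Unlike a plain grep, it matches the hostname as a whole field, so www.f000.backblazeb2.com doesn’t also count as a hit for f000.backblazeb2.com.

```python
from pathlib import Path


def releases_blocking(root, hostname):
    """Map each release directory under root to the hosts files that list hostname.

    Assumes one extracted release per subdirectory of root and hosts-format
    files ("0.0.0.0 name" lines, optional # comments).
    """
    hits = {}
    for release in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for hosts_file in release.rglob("hosts"):
            if not hosts_file.is_file():
                continue
            text = hosts_file.read_text(encoding="utf-8", errors="replace")
            for line in text.splitlines():
                # Drop comments, then split into sink address + hostnames.
                fields = line.split("#", 1)[0].split()
                if len(fields) >= 2 and hostname in fields[1:]:
                    rel = hosts_file.relative_to(release).as_posix()
                    hits.setdefault(release.name, []).append(rel)
                    break  # one hit per file is enough
    return hits
```

Calling releases_blocking("releases", "f000.backblazeb2.com") returns a dict of release name to matching files; releases that don’t block the host simply don’t appear.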

There we go: the reason for the block, and it’s all within the observed timeframe. This isn’t a hostname one would normally want to block, as it’s part of Backblaze’s CDN (PDF). It sounds like an overzealous addition to a blocklist got sucked up into the HOSTS list.

Looking further through the grep output, this came from the .../KADhosts/hosts file, which is sourced from the KAD list. It turns out that f000.backblazeb2.com was added to the KAD list on 2021-Mar-26 and then removed on 2021-Apr-01. HOSTS pulled from KAD for v3.5.3 on 2021-Mar-28 and for v3.6.0 on 2021-Mar-31, so it inherited the block in those versions.
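The timeline above boils down to an interval check. Here’s a small Python sketch using only the dates from this post; treating the removal day as exclusive (a pull on 2021-Apr-01 would no longer see the entry) is my assumption, since the times of day aren’t known.

```python
from datetime import date

# When f000.backblazeb2.com was present in the KAD list (dates from the post).
KAD_ADDED = date(2021, 3, 26)
KAD_REMOVED = date(2021, 4, 1)

# When HOSTS pulled from KAD for the two affected releases (dates from the post).
HOSTS_PULLS = {"v3.5.3": date(2021, 3, 28), "v3.6.0": date(2021, 3, 31)}


def inherited_block(pull_date, added=KAD_ADDED, removed=KAD_REMOVED):
    """True if a pull on pull_date would have seen the entry.

    Assumes a half-open window: added <= pull_date < removed.
    """
    return added <= pull_date < removed
```

Both affected releases fall inside the six-day window, while a pull on either side of it would not have picked up the entry.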

Quite an interesting chain, eh? A Polish ad blocking group makes a change that ends up in the default list for one of the most common DIY adblockers, which in turn breaks access to a fairly common CDN, in turn breaking data backups. It’s dependencies all the way down…

It’s now fixed, and everything would have resolved itself had I waited until Sunday, but at least now I know why.

Comments closed

A Home Network Troubleshooting Journey

This week I moved from UniFi to a new setup that includes OPNsense on the edge to handle firewall, NAT, and other such tasks on the home network. Built into OPNsense is a basic NetFlow traffic analyzer called Insight. With Reverse lookup turned on, something strange popped out: ~22% of the traffic coming in from the internet over the last two hours was from just two hosts: dynamic-75-76-44-147.knology.net and dynamic-75-76-44-149.knology.net.

While reverse DNS resolved the IPs to hostnames (75.76.44.147 to dynamic-75-76-44-147.knology.net and 75.76.44.149 to dynamic-75-76-44-149.knology.net), forward lookup of those hostnames didn’t work. This didn’t really surprise me, as the whole DNS situation on the WOW/Knology network is poor, but it did make me more curious. Particularly strange was that the IPs were so close together.
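What’s described above (a PTR record exists, but the name doesn’t resolve back to the address) is exactly what a forward-confirmed reverse DNS check detects. A minimal Python sketch: by default it uses the standard library’s socket.gethostbyaddr and socket.gethostbyname_ex, and the resolver functions are parameters only so the logic can be exercised offline.

```python
import socket


def fcrdns(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS for an IPv4 address.

    Returns (hostname or None, confirmed), where confirmed is True only if
    the PTR name resolves back to ip.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        name = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None, False
    try:
        addresses = forward(name)
    except OSError:  # PTR exists but no forward record, as seen here
        return name, False
    return name, ip in addresses
```

For the two Knology addresses this would be expected to return the hostname with confirmed set to False, matching what I saw by hand.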

To be sure this is Knology (ruling out intentionally-misleading reverse DNS) I used whois to confirm the addresses are owned by them:

NetRange: 75.76.0.0 - 75.76.46.255
CIDR: 75.76.46.0/24, 75.76.40.0/22, 75.76.0.0/19, 75.76.44.0/23, 75.76.32.0/21
NetName: WIDEOPENWEST
NetHandle: NET-75-76-0-0-1
Parent: NET75 (NET-75-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS12083
Organization: WideOpenWest Finance LLC (WOPW)
RegDate: 2008-02-13
Updated: 2018-08-27
Ref: https://rdap.arin.net/registry/ip/75.76.0.0
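Python’s ipaddress module can show which of the CIDR blocks in that whois record actually covers each address. The block list below is copied from the whois output; the function name is just illustrative.

```python
import ipaddress

# CIDR blocks copied from the ARIN whois record above (NET-75-76-0-0-1).
WOW_BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("75.76.46.0/24", "75.76.40.0/22", "75.76.0.0/19",
                 "75.76.44.0/23", "75.76.32.0/21")
]


def covering_block(ip):
    """Return the whois CIDR block containing ip, or None if it's outside all of them."""
    addr = ipaddress.ip_address(ip)
    return next((net for net in WOW_BLOCKS if addr in net), None)
```

Both 75.76.44.147 and 75.76.44.149 land in 75.76.44.0/23. The five blocks together span exactly the NetRange of 75.76.0.0 - 75.76.46.255; since that range is 47 /24s, it can’t be written as a single prefix, which is why whois lists several.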

My home ISP is Wide Open West (WOW), and Knology is an ISP that they bought in 2012. While I use my ISP directly for internet access (no VPN tunnel to elsewhere), I run my own DNS to avoid their service announcement redirections, so why would I be talking to something else on my ISP’s network?

Could this be someone doing a bunch of scanning of my house? Or just something really misconfigured doing a bunch of broadcasting? Let’s dig in and see…

First I used the Packet capture function in OPNsense to grab a capture on the WAN interface filtered to these two IPs. Looking at it in Wireshark showed it was all HTTPS. Hmm, that’s weird…

A couple of coworkers and I share Plex libraries with each other; maybe that’s it? The port isn’t right (Plex usually uses 32400), but maybe one of them is running it on 443 (HTTPS)… But why two IPs so close to each other? Maybe one of them is getting multiple IPs from their cable modem, has dual WAN links configured on their firewall, and it’s bouncing between them… (This capture only showed the middle of a session, so there was no certificate exchange present to get any service information from.)

Next I did another packet capture on the LAN interface to see whether the local endpoint was a computer on the network or OPNsense itself. This showed the traffic was coming from my main personal computer, a 27″ iMac at 192.168.0.8 / myopia.--------.nuxx.net, so let’s look there. (Plex doesn’t run on the iMac, so that’s ruled out.)

Conveniently the -k argument to tcpdump on macOS adds packet metadata, such as process name, PID, etc. A basic capture/display on myopia with tcpdump -i en0 -k NP host 75.76.44.149 or 75.76.44.147 to show all traffic going to and from those hosts identified Firefox as the source:

07:39:57.873076 pid firefox.97353 svc BE pktflags 0x2 IP myopia.--------.nuxx.net.53515 > dynamic-75-76-44-147.knology.net.https: Flags [P.], seq 19657:19696, ack 20539524, win 10220, options [nop,nop,TS val 3278271236 ecr 1535621504], length 39
07:39:57.882070 IP dynamic-75-76-44-147.knology.net.https > myopia.--------.nuxx.net.53515: Flags [P.], seq 20539524:20539563, ack 19696, win 123, options [nop,nop,TS val 1535679857 ecr 3278271236], length 39
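For a longer capture, tallying which processes show up in the -k metadata is easier in a script than by eye. This Python sketch assumes the pid field looks like the lines above (pid followed by name.PID; here only the outbound packet carried it) and only extracts that one field, not everything -k adds.

```python
import re
from collections import Counter

# "pid firefox.97353": name is everything up to the last ".digits".
PID_RE = re.compile(r"\bpid (?P<proc>\S+)\.(?P<pid>\d+)(?=\s|$)")


def tally_processes(lines):
    """Count packets per (process name, PID) in macOS `tcpdump -k` text output."""
    counts = Counter()
    for line in lines:
        match = PID_RE.search(line)
        if match:  # inbound packets may carry no pid field
            counts[(match["proc"], int(match["pid"]))] += 1
    return counts
```

One way to use it: save the tcpdump output to a file, then pass the open file to tally_processes and print counts.most_common().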

Well, okay… Odd that my browser would be talking so much HTTPS to my ISP directly. I double-checked that DNS-over-HTTPS was disabled, so it’s not that…

Maybe I can see what these servers are? Pointing curl at one of them to show the headers revealed a server header of proxygen-bolt, which is a Facebook framework:

c0nsumer@myopia Desktop % curl --insecure -I https://75.76.44.147
HTTP/2 400
content-type: text/plain
content-length: 0
server: proxygen-bolt
date: Sat, 16 Jan 2021 13:22:57 GMT
c0nsumer@myopia Desktop %

Now we’re getting somewhere…

Finally, I pointed openssl at the IP to see which certificate it was presenting, and it’s a wildcard cert for a portion of Facebook’s CDN:

c0nsumer@myopia Desktop % openssl s_client -showcerts -connect 75.76.44.149:443 </dev/null
CONNECTED(00000003)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
verify return:1
depth=0 C = US, ST = California, L = Menlo Park, O = "Facebook, Inc.", CN = *.fdet3-1.fna.fbcdn.net
verify return:1
[SNIP]

As a final test I restarted tcpdump on the iMac then closed the Facebook tab I had open in Firefox and the traffic stopped.

So there’s our answer. All this traffic is to Facebook CDN instances on the Wide Open West / Knology network. It sure seems like a lot for a tab just sitting open in the background, but hey… welcome to the modern internet.


I could have gotten more information from OPNsense’s Insight by clicking on the pie slice shown above to look at that host in the Details view, but it seems to have an odd quirk. When the Reverse lookup box is checked, clicking the pie slice to jump to the Details view automatically puts the hostname in the (src) Address field, which returns no results (it needs an IP address). I thought this was the tool failing, so I looked to captures for most of the info.

Later on I realized that filtering on the IP showed a bunch more useful information, including two other endpoints within the network talking to these servers (mobile phones), and that HTTPS was also running over UDP, indicating QUIC.

(Bug 4609 was submitted for this issue and AdSchellevis fixed it within a couple hours via commit c797bfd.)


Pi-hole via Docker on Synology DSM with Bonded Network Interface

As part of consolidating and upgrading my home network, I’m moving Pi-hole from a stand-alone Raspberry Pi to running under Docker on my Synology DS1019+ running DiskStation Manager (DSM) v6.2.3.

This was a little bit confusing at first as the web management UI would work, but DNS queries weren’t getting answered. This ended up being caused by the bonded network interface, which is ovs_bond0 instead of the normal default of eth0.

Using the official Pi-hole Docker image, set to run with Host networking (Use the same network as Docker host in the Synology UI), setting the following variables will set up Pi-hole to work from first boot, configured to:

  • Listen on ovs_bond0 (instead of the default eth0).
  • Answer DNS queries on the same IP as DSM (192.168.0.2).
  • Run the web-based management interface on port 8081 with password piholepassword.
  • Send internal name resolutions to the internal DNS/DHCP server at 192.168.0.1 for clients *.internal.example.com within 192.168.0.0/24.
  • Set the displayed temperature to Fahrenheit and time zone to America/Detroit.
  • Listen for HTTP requests on http://diskstation.internal.example.com:8081 alongside the default pi.hole hostname.

DNS=127.0.0.1
INTERFACE=ovs_bond0
REV_SERVER=True
REV_SERVER_CIDR=192.168.0.0/24
REV_SERVER_DOMAIN=internal.example.com
REV_SERVER_TARGET=192.168.0.1
ServerIP=192.168.0.2
TEMPERATUREUNIT=f
TZ=America/Detroit
VIRTUAL_HOST=diskstation.internal.example.com
WEB_PORT=8081
WEBPASSWORD=piholepassword

Additionally, setting up volumes for /etc/dnsmasq.d/ and /etc/pihole/ will ensure changes made in the UI persist across restarts and container upgrades. I do this as shown here:

Note: If you stop the Pi-hole container, clear out the contents of these directories, and then restart the container, Pi-hole will set itself up again from the environment variables. This allows tweaking the variables without recreating the container each time.
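For anyone not using the Synology UI, roughly the same setup can be expressed as a docker run command. This Python sketch only assembles the arguments: the container name and the /volume1/docker/pihole/... host paths are hypothetical placeholders, and the environment entries are copied verbatim from the list above (including DNS, which I’ve simply passed through as an environment variable).

```python
# Environment copied from the variable list above (example values).
PIHOLE_ENV = {
    "DNS": "127.0.0.1",
    "INTERFACE": "ovs_bond0",
    "REV_SERVER": "True",
    "REV_SERVER_CIDR": "192.168.0.0/24",
    "REV_SERVER_DOMAIN": "internal.example.com",
    "REV_SERVER_TARGET": "192.168.0.1",
    "ServerIP": "192.168.0.2",
    "TEMPERATUREUNIT": "f",
    "TZ": "America/Detroit",
    "VIRTUAL_HOST": "diskstation.internal.example.com",
    "WEB_PORT": "8081",
    "WEBPASSWORD": "piholepassword",
}

# Hypothetical host paths; map them to the two directories that need to persist.
VOLUMES = {
    "/volume1/docker/pihole/dnsmasq.d": "/etc/dnsmasq.d",
    "/volume1/docker/pihole/pihole": "/etc/pihole",
}


def docker_run_args(env, volumes, image="pihole/pihole:latest", name="pihole"):
    """Build a `docker run` argument list using host networking."""
    args = ["docker", "run", "-d", "--name", name, "--network", "host"]
    for host_path, container_path in volumes.items():
        args += ["-v", f"{host_path}:{container_path}"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args
```

Passing the result to shlex.join() gives a copy-pasteable command line, or it can be handed directly to subprocess.run().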


nginx for HTTPS Request Logging

Consider the following situation: You have a web app from a vendor and during a security scan it crashes. The web app is running over HTTPS with your certificates, but neither the scanning tool nor web app offer sufficient logging to see exactly which request caused the crash.

Because you can’t decrypt HTTPS without access to a client key log file (or making a bunch of TLS changes), and the client is a security scanning tool, Wireshark is not an option to see the triggering request. Fiddler is also likely out, as that’d require the security scanner to trust a new root cert. So what can you do? Stick something else in the way to proxy the connection, logging all the requests!

With access to the server’s certificate and private key, this is quite easy: set up nginx as a proxy. The only wrinkle is that getting access to all of the request headers requires Lua, so you’ll need to ensure your nginx install supports it. On macOS this was easy using Homebrew to install nginx from denji’s GitHub repository (the default nginx doesn’t support Lua):

brew tap denji/nginx
brew install nginx-full --with-lua-module --with-set-misc-module

This configuration uses the web app’s certificates in nginx to proxy the requests it receives to your main site, logging the client IP, request, headers, body, and response status to intercept.log. Each field goes on its own line for easy visual reading; you may wish to move it all onto one line to make parsing easier:

events {
}

http {
    log_format custom 'Time: $time_local'
                      '\n'
                      'Remote Addr: $remote_addr'
                      '\n'
                      'Request: $request'
                      '\n'
                      'Request Headers: $request_headers'
                      '\n'
                      'Body: $request_body'
                      '\n'
                      'Status: $status'
                      '\n'
                      '-----';

    server {
        listen 443 ssl;
        server_name example.com;
        access_log /path/to/intercept.log custom;
        ssl_certificate /path/to/cert.pem;
        ssl_certificate_key /path/to/privkey.pem;

        location / {
            proxy_pass https://example.com;
            proxy_set_header Accept-Encoding '';
            set_by_lua_block $request_headers {
                local h = ngx.req.get_headers()
                local request_headers_all = ""
                for k, v in pairs(h) do
                    -- a header sent more than once arrives as a table of values
                    if type(v) == "table" then
                        v = table.concat(v, ", ")
                    end
                    request_headers_all = request_headers_all .. k .. ": " .. v .. "; "
                end
                return request_headers_all
            }
        }
    }
}
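Once the scan has run, the multi-line records in intercept.log can be split back into one record per request. This Python sketch assumes the log_format above (“Key: value” lines, each record ending with a “-----” line). nginx escapes control characters inside logged variable values by default, so each field should stay on a single line.

```python
def parse_intercept_log(text):
    """Split intercept.log into a list of dicts, one per logged request."""
    records, current = [], {}
    for line in text.splitlines():
        if line == "-----":  # end-of-record marker from the log_format
            if current:
                records.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ": " only; header values contain more.
            key, value = line.split(": ", 1)
            current[key] = value
    if current:  # tolerate a final record without its marker
        records.append(current)
    return records
```

With the records in hand, looking at the last few before the crash (or filtering on Status) is a quick way to narrow down the triggering request.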

To put this in place, ensure that requests from the scanner go to nginx instead of the web app and then nginx will forward and log the requests. There are a few ways you could do this:

  • Run nginx on the same server as the web app, move the web app to listen on another port for HTTPS, and set proxy_pass to the other port: proxy_pass https://example.com:4430
  • Run nginx on a new server, change the DNS records for the site to point to the new server, and point nginx to the old server by IP: proxy_pass https://192.168.10.10
  • If the scanner tool’s name resolution can be adjusted, such as via a HOSTS file or custom configuration, point it to the nginx proxy for the site name.

To test, you can use a web browser on a client computer and a HOSTS file to point the original hostname at nginx. To get the screenshot above I ran nginx on an iMac running macOS, then in a Windows VM I changed the HOSTS file to map nuxx.net to the iMac’s IP. Firefox on the Windows VM then sent requests for nuxx.net to nginx on macOS, which logged and proxied them out to the real nuxx.net.


Pi-hole (and PiVPN) with Ubiquiti UniFi

Pi-hole

My home network is based around Ubiquiti’s UniFi, with a Security Gateway (USG) handling the NAT/firewall/routing duties. For ad blocking and to have better control over DNS I use Pi-hole running on a Raspberry Pi.

With the following settings you can have the two working well together with UniFi doing DHCP and Pi-hole doing DNS. Internal forward and reverse resolution will work, which means hostnames will appear properly for internal devices on both consoles while requests are still appropriately Pi-hole’d.

Here’s how:

  • Set up the Pi-hole and put it on the network at a static IP.
  • In Pi-hole, under Settings → DNS, turn on:
    • Never forward non-FQDNs
    • Never forward reverse lookups for private IP ranges
    • Conditional forwarding, with the USG’s IP as the IP address of your DHCP server (router)
    • Local domain name (optional) as your internal DNS suffix
  • In the USG, set DHCP to hand out the Pi-hole’s IP for DHCP Name Server.
  • In the USG, under Services → DHCP → DHCP Server, set Register client hostname from DHCP requests in USG DNS forwarder to On.
  • Leave the WAN interface’s DNS set to something public, such as what the ISP provides or Google’s 8.8.8.8/8.8.4.4 or whatever. This ensures that if the Pi-hole goes down then the USG can still resolve DNS.

After setting this up, clients will use Pi-hole for DNS, as configured via DHCP. Requests for hostnames and addresses on the local network (shortnames or local suffix) will get forwarded to the USG, ensuring that internal requests work properly.
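Pi-hole’s resolver is built on dnsmasq, and conditional forwarding amounts to sending the local domain and the matching reverse (in-addr.arpa) zone to the router. This Python sketch emits dnsmasq server=/domain/address lines for that idea; it illustrates the concept rather than reproducing the exact file Pi-hole writes, and the 192.168.0.0/24, internal.example.com, and 192.168.0.1 values are placeholders.

```python
import ipaddress


def conditional_forward_lines(cidr, domain, router):
    """dnsmasq lines sending forward and reverse lookups for a LAN to the router.

    Only whole-octet IPv4 prefixes (/8, /16, /24) map cleanly onto a single
    in-addr.arpa zone, so anything else is rejected.
    """
    net = ipaddress.ip_network(cidr)
    if net.version != 4 or net.prefixlen not in (8, 16, 24):
        raise ValueError("expected an IPv4 /8, /16 or /24")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    reverse_zone = ".".join(reversed(octets)) + ".in-addr.arpa"
    return [f"server=/{domain}/{router}", f"server=/{reverse_zone}/{router}"]
```

For a 192.168.0.0/24 LAN this yields a rule for internal.example.com plus one for 0.168.192.in-addr.arpa, which is what lets hostnames show up in the Pi-hole query log.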

PiVPN

Taking this a step further, I also have PiVPN running on the same Pi, to provide an endpoint for connecting into my home network via WireGuard. Pi-hole and PiVPN integrate very nicely and are designed to work together, making the setup very smooth.

By default, PiVPN sets the Pi-hole as the DNS server via a DNS option in the [Interface] section of the client config. To ensure appropriately geolocated search results when connected to the VPN, use an upstream DNS server that supports EDNS Client Subnet (ECS) (under Settings → DNS) on the Pi-hole.

(For reference, I’m running Pi-hole on a Raspberry Pi 4 Model B with 2GB of RAM, and it has plenty of headroom for both Pi-hole serving ~20 devices and sustaining 50 MByte/sec via WireGuard. The Pi-hole section of this was originally written up here on Reddit.)


Mail-Hijacking Malicious Profile on iOS

I was recently asked to look at a family member’s iPad because it was no longer sending email. Turns out that it had been set up to use an additional email account that steals copies of all their outgoing mail. Unfortunately, they didn’t notice until the attacker’s system stopped working and the iPad started showing an error message. Besides the irritating (or worse) spam they saw, their stolen emails could have been used for anything from spear phishing to accessing their online accounts, impersonating them, phishing others, delivering targeted spam, fake news / propaganda, etc.

So how did this get set up?

Apparently at some point this person installed the My Accurate Forecast app [1]. Included in this app was a Profile — or a set of settings for Apple devices — that added a second email account with address lazaroburst@my.minbox.email. This account was also set as the outgoing server for their Hotmail (Outlook.com) account.

This person would then have seen all messages in this account, with notifications just like their normal Hotmail email. Worse, everything they sent, from any email account, went to the attacker first. As it’s a separate email account, all the normal spam and malware protections from a normal email provider don’t apply… It’s a firehose of junk straight to their mailbox, with outgoing mail theft frosting on top.

This is bad because not only does it end up with them getting more spam, it allows the attacker to know exactly what they sent and to whom, and to modify those messages before delivering them to the intended recipients.
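To check a profile before (or instead of) installing it, the mail accounts it would add can be listed with Python’s plistlib. This is a sketch under assumptions: it reads an unsigned profile, or one already unwrapped from its signature (signed profiles are CMS containers; openssl smime -verify -noverify -inform DER -in signed.mobileconfig -out profile.plist is one way to extract the plist), and it only looks at top-level com.apple.mail.managed payloads using the EmailAddress, IncomingMailServerHostName, and OutgoingMailServerHostName keys from Apple’s configuration profile format.

```python
import plistlib


def mail_payloads(profile_bytes):
    """List email accounts an unsigned .mobileconfig would add, with their servers."""
    profile = plistlib.loads(profile_bytes)  # accepts XML or binary plists
    found = []
    for payload in profile.get("PayloadContent", []):
        if payload.get("PayloadType") == "com.apple.mail.managed":
            found.append({
                "address": payload.get("EmailAddress"),
                "incoming": payload.get("IncomingMailServerHostName"),
                # An unexpected outgoing server here is the red flag.
                "outgoing": payload.get("OutgoingMailServerHostName"),
            })
    return found
```

A “weather alerts” profile that returns anything from this function deserves a very close look.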

I think this was likely delivered via geolocated advertising, but it’s possible this individual was specifically targeted. The signed Profile had a name of “WEATHER ALERTS” and a description of “Tap ‘Install’ above to get your local radar forecasts and weather alerts in 48062”, showing its intent to deceive: it tries to make the normal Profile installation security alert, which is supposed to warn the user of a change to important settings, look like part of an application install.

I’m unsure when this first got installed, but judging by the Profile signing certificate expiring on December 8, 2016, it was likely within a year or two before that date. (Unfortunately I didn’t check the issuance date before deleting the profile.) The Profile which made these changes was signed by secure5g.com, an “advertising” company which has ties to minbox.email (the Unsubscribe link at the bottom of the page is a generic link to a minbox.email page).

A post from June 2018 on Medium, Unwanted Profiles Pop Up in iOS Devices, Inviting Spam and Malware, reported the same problem almost two and a half years ago. Curiously, the handful of other posts I read about this (ref: 1, 2) didn’t mention (or maybe didn’t notice) the outgoing server change. Perhaps they only noticed before things broke, or maybe this iPad somehow ended up different? (It does seem that at least one other app, Daily Bible Verse, included similar email hijacking.)

Cleaning up these settings was easy: just a matter of removing the malicious Profile and outgoing mail account, and setting the Hotmail account back to using the appropriate servers. But who knows what damage was done with the theft of the sent mail and receipt of spammy stuff.


[1] The My Accurate Forecast website still shows screenshots of the app, but does not link to any app stores. It also no longer appears in the Apple App Store, implying that it’s been pulled.


SharkTapUSB Gen2 Review and PCB Details

For years I’ve used an eBay-purchased Net Optics TP-CU3 (now called Ixia TP-CU3-ST) copper 10/100/1000 Ethernet tap along with a StarTech USB 3.0 to Dual Gigabit NIC for getting external network captures from client computers [1]. The fan in the tap is dying and making a lot of noise. Beyond being irritating, I believe this is causing the tap to overheat, resulting in occasional weirdness in the data [2].

As a replacement I now have a SharkTapUSB Gen2 from midBit Technologies, LLC, and so far it’s working great. Being a simpler device, with a USB NIC built in, it’s much more appropriate for my needs: smaller, simpler to connect, quieter (no fans), and easier to teach coworkers to use. At $249.95 (sold solely via Amazon) it’s also priced fairly.

The SharkTapUSB is a single unit about the size of a deck of cards that is inserted between two Ethernet devices and outputs the captured data to either an Ethernet connection or its built-in USB 3.0 gigabit NIC. It also gets power from USB 3.0, eliminating an external power supply. This is perfect for what I’m usually doing, which is watching data going in and out of a computer and analyzing it in Wireshark.

While the TP-CU3 is excellent and served me well, it also was overkill. It has a bunch of features intended for permanent install / data center use, such as bypass relays to maintain connectivity during power failures, forced air cooling, redundant power supplies, and dual gigabit egress links to support monitoring saturated full duplex connections. Even when the built-in cooling fans are working properly, it’s loud enough to be irritating in a normal office (the SharkTapUSB is silent).

Compared to the TP-CU3 there are three downsides to the SharkTapUSB, but for my needs I don’t see them being a problem:

  • Cannot Capture Sustained Full Duplex Traffic: The SharkTapUSB merges the network traffic between two ports and outputs it to a single gigabit NIC. If the traffic being captured is a sustained, full-duplex gigabit flow, this is too much for the capture interface and data will be lost. For me this amount of traffic is rare in practice, especially in situations where I need an external tap. (The SharkTapUSB has a 256KB buffer to accommodate short bursts of high bandwidth traffic.)
  • Link Electrical Status Not Propagated Between Ports: The TP-CU3 uses relays so that when one of the network ports is disconnected electrically the other one is shut down. For example, when the client PC is disconnected, the TP-CU3 drops the electrical link to the switch, so the switch sees the disconnect. The SharkTapUSB does not do this, and keeps the electrical link up on one side when the other is disconnected. Should this be a problem, such as when working with a switch that takes action on link state change, this can be sidestepped by unplugging cables.
  • Link Speed Autodetection: The SharkTapUSB cannot be forced to a particular port speed. However, it does set both ports to the lowest autodetected speed, so port speed can be controlled via settings on a connected device.
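Some rough arithmetic shows just how short those “short bursts” are. This sketch assumes the 256KB buffer means 256 × 1024 bytes, that both directions run at full gigabit line rate (2 Gbit/s arriving for a 1 Gbit/s capture link), and ignores Ethernet framing overhead; under those assumptions the buffer fills in about 2.1 ms.

```python
def overflow_time_ms(buffer_bytes=256 * 1024, in_bps=2e9, out_bps=1e9):
    """Milliseconds until the tap's buffer fills when input exceeds the capture link.

    in_bps: combined bits/s arriving from both monitored directions.
    out_bps: bits/s the single capture NIC can drain.
    """
    excess_bytes_per_s = (in_bps - out_bps) / 8
    if excess_bytes_per_s <= 0:
        return float("inf")  # the capture link keeps up; the buffer never fills
    return buffer_bytes / excess_bytes_per_s * 1000
```

Halving the load to 1.5 Gbit/s combined only stretches that to roughly 4.2 ms, which is why sustained full-duplex gigabit traffic is out of reach for this kind of tap.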

After looking at the SharkTapUSB’s block diagram I got curious about how it’s actually implemented, so I opened it up and grabbed some photos of the Rev F PCB (top, bottom, jumper wires on bottom).

Here are the notable components:


[1] While captures can be done locally (from within the OS), using tools like Packet Monitor or Wireshark or tcpdump, there are times when an external capture is more useful or the only option, such as:

  • Troubleshooting Intel AMT related issues, as AMT sits between the normal NIC and the external port.
  • Monitoring PXE.
  • OSes where getting a local capture is complicated, such as Windows PE, embedded systems in televisions, or mobile OSes (eg: Android, iOS).
  • Investigating hardware offloads, as a local capture will show invalid data for things like TCP checksum as it’s not calculated before reaching the NIC.

[2] I looked into replacing the fan, but this doesn’t seem practical. The fan is a Sunon GB0535AEV1-8.B2445.GN, which is a combination heatsink and fan, and appears to be epoxied in place. While I can get one via eBay sent from China, I’m unsure if I’ll be able to remove the fan without damaging the chip. Instead I’ll keep the mostly-working tap around for rare occasions when full-duplex monitoring is needed, using the SharkTapUSB for day-to-day use. Perhaps in the future I’ll give a heatsink/fan swap a go…


Bypassing Reolink SSID Length Limitation

I purchased a Reolink E1 Zoom camera for occasional around-the-house use. It turns out that my SSID, Smart Meter Surveillance Network, is too long for their setup app. While the standard allows 32 octets (32 ASCII characters), and my SSID is exactly that long, some things, such as the Reolink app, only accept 31 characters. In this case the app pulled the SSID from my phone (the network in use) and then truncated it. †

So, I set out to find a workaround, and I did.

During setup the Reolink app walks you through scanning a serial number QR code on the camera, prompts for the wireless network info, and then generates a QR code and displays it on the mobile device’s screen. The camera is then pointed at the screen, this QR code is read, and the camera configures its WiFi settings based on the code.

I figured that maybe if I generated a new QR code with the correct info I’d be able to configure the camera with a longer SSID and it turns out that worked.

After a couple of minutes of generating codes I found the configuration QR code is text, formatted as follows, with #### as the last four characters of the camera’s serial number:

<QR><S>ssid</S><P>password</P><C>####</C></QR>

Using the first free online QR code generator I could find, I created a new QR code containing the following text:

<QR><S>Smart Meter Surveillance Network</S><P>notmyrealpassword</P><C>M77L</C></QR>
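Building that text programmatically makes it harder to typo. This Python sketch follows the format I found above; it applies no escaping, because how the camera handles characters like < or > in an SSID or password is unknown, and it checks the SSID against the 32-octet limit in bytes rather than characters.

```python
MAX_SSID_OCTETS = 32  # the 802.11 limit is in octets, not characters


def reolink_qr_text(ssid, password, serial_suffix):
    """Build the text the Reolink setup QR code appeared to contain.

    Format observed: <QR><S>ssid</S><P>password</P><C>####</C></QR>,
    where #### is the last four characters of the camera's serial number.
    """
    if len(ssid.encode("utf-8")) > MAX_SSID_OCTETS:
        raise ValueError("SSID is longer than 32 octets")
    if len(serial_suffix) != 4:
        raise ValueError("expected the last four characters of the serial number")
    return f"<QR><S>{ssid}</S><P>{password}</P><C>{serial_suffix}</C></QR>"
```

The resulting string can go into any QR code generator; the third-party qrcode package’s qrcode.make(text) is one option if you’d rather not paste a Wi-Fi password into a website.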

I reset the camera, had it scan the new QR code, and it connected to the wireless network. It worked! The camera was now on the wireless network and I was able to connect to it in the app.

There did seem to be a bit of quirkiness in the app, possibly because of the long SSID, but it’s working fine with the desktop app, so all is good. It’s also really nice to now have a way of reconfiguring the camera without having to install and use their app.

† The standard maximum for SSIDs is 32 octets, or 32 ASCII characters. It appears some companies treat this as 31 characters, reserving the 32nd for the string termination character. It sort of makes me wonder how I’ve been able to use this one for so long… It was fine with my old Apple AirPorts, and I’ve had it running this way for a couple of years on Ubiquiti UniFi. However, it looks like the UniFi v6 UI now refuses to save changes with this SSID, so I guess I’m going to have to change it…
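The off-by-one is easy to reproduce. This hypothetical Python illustration mimics copying an SSID into a 32-byte C buffer that keeps room for the NUL terminator; I don’t know how Reolink’s app is actually written, but this kind of handling produces exactly the truncation I saw.

```python
def truncate_like_c_buffer(ssid, buf_size=32):
    """Mimic storing an SSID in a char[buf_size] that reserves one byte for NUL.

    Hypothetical illustration: at most buf_size - 1 octets survive.
    """
    raw = ssid.encode("utf-8")[: buf_size - 1]
    # errors="ignore" drops a multi-byte character split by the cut.
    return raw.decode("utf-8", errors="ignore")
```

Run on Smart Meter Surveillance Network, which is exactly 32 octets, it returns the SSID minus its final “k”.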


Simple PAC File Pilot Testing (including WPAD)

In a network that’s isolated from the public internet, such as many enterprise networks, proxy servers are typically used to broker internet access for client computers. Configuring the client computers to use these proxies is often done via a Proxy Auto-Config (PAC) file, code that steers requests so traffic for internal sites stays internal, and public sites go through the proxies.

Commonly these PAC files are made available via Web Proxy Auto-Discovery Protocol (WPAD) as well, because some systems need to automatically discover them. Specifically, in a Windows 10 environment which uses proxies, WPAD is needed because many components of Windows (including the Microsoft Store and Azure Device Registration) will not use the browser’s PAC file settings; it’s dependent on WPAD to find a path to the internet.

WPAD is typically configured via DNS, with a hostname of wpad.companydomain.com (or wpad under anything in the DNS Search Suffix List) resolving to the IP of a web server [1]. This server must then answer an HTTP request for http://x.x.x.x/wpad.dat (where x.x.x.x is the server’s IP) or http://wpad.companydomain.com/wpad.dat with a PAC file, using a Content-Type of application/x-ns-proxy-autoconfig [2].

Because WPAD requires DNS, something which can’t easily be changed for a subset of users, putting together a mechanism to perform a pilot deployment of a new PAC file can be a bit complicated. When attempting to perform a pilot deployment engineers will often send out a test PAC file URL to be manually configured, but this misses WPAD and does not result in a complete system test.

In order to satisfy WPAD, one can set up a simple web server to host the new PAC file and a DNS server to answer the WPAD queries. This DNS server forwards all requests except those for the PAC file servers to the enterprise DNS, so everything else works as normal. Testing users then only need to change their DNS server to receive the pilot PAC file, and everything else will work the same: a true pilot deployment.

Below I’ll detail how I use simplified configurations of Unbound and nginx to pilot a PAC file deployment. This can be done from any Windows machine, or with very minor config changes from something as simple as a Raspberry Pi running Linux.

[1] WPAD can be configured via DHCP, but this is only supported by a handful of Microsoft applications. DNS-based WPAD works across all modern OSes.

[2] Some WPAD clients put the server’s IP in the Host: field of the HTTP request.

DNS via Unbound

Unbound is a DNS server that’s straightforward to run and is available on all modern platforms. It’s perfect for our situation where we need to forward all DNS queries to the production infrastructure, modifying only the WPAD/PAC related queries to point to our web server. While it’s quite robust and has a lot of DNSSEC validation options, we don’t need any of that.

This simple configuration forwards all requests to the corporate Active Directory-based DNS servers (10.0.1.2 and 10.0.2.2) except those for the PAC file servers. For these, pacserver.example.com and wpad.example.com, it intercepts the request and returns our web server’s address of 10.0.3.25.

server:
    interface: 0.0.0.0
    access-control: 0.0.0.0/0 allow
    module-config: "iterator"

    local-zone: "wpad.example.com." static
    local-data: "wpad.example.com. IN A 10.0.3.25"

    local-zone: "pacserver.example.com." static
    local-data: "pacserver.example.com. IN A 10.0.3.25"

stub-zone:
    name: "."
    stub-addr: 10.0.1.2
    stub-addr: 10.0.2.2

This configuration allows recursive queries from any host, but you can restrict where it is usable by specifying one or more subnets in access-control clauses. The stub-zone clause sends all requests up to the two upstream DNS servers. If these upstream servers handle recursion for the client, a forward-zone clause can be used instead.
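If the pilot grows to more hostnames or different upstreams, generating the config avoids copy/paste mistakes. This Python sketch renders an unbound.conf in the same shape as the one above; the function name and parameters are my own, and setting forward=True swaps the stub-zone for a forward-zone as just described.

```python
def unbound_pilot_conf(overrides, upstreams, allow="0.0.0.0/0", forward=False):
    """Render an unbound.conf answering `overrides` locally, sending the rest upstream.

    overrides: {hostname: IPv4 address}; upstreams: list of resolver IPs.
    allow: subnet permitted to query (narrow this to the pilot users' range).
    """
    lines = [
        "server:",
        "    interface: 0.0.0.0",
        f"    access-control: {allow} allow",
        '    module-config: "iterator"',
    ]
    for name, ip in overrides.items():
        fqdn = name.rstrip(".") + "."  # unbound expects fully qualified names
        lines += [
            "",
            f'    local-zone: "{fqdn}" static',
            f'    local-data: "{fqdn} IN A {ip}"',
        ]
    zone, key = ("forward-zone", "forward-addr") if forward else ("stub-zone", "stub-addr")
    lines += ["", f"{zone}:", '    name: "."']
    lines += [f"    {key}: {ip}" for ip in upstreams]
    return "\n".join(lines) + "\n"
```

Calling it with the two example.com overrides pointing at 10.0.3.25 and the two AD servers reproduces the configuration above.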

PAC File via nginx

For serving up the PAC file, both for direct queries and those from WPAD, we'll use nginx, a powerful but easy-to-use web server that needs only a minimal config.

Put a copy of your PAC file at …/html/wpad.dat under nginx’s install directory so the server can find it. (There is great information on writing PAC files at FindProxyForUrl.com.)

This simple configuration sets up a web server which serves all files as MIME type application/x-ns-proxy-autoconfig, offering up the wpad.dat file by default (e.g. http://pacserver.example.com) or when directly referenced (e.g. http://10.0.3.25/wpad.dat or http://wpad.example.com/wpad.dat), satisfying both standard PAC file and WPAD requests.

events {
    worker_connections 1024;
}

http {
    default_type application/x-ns-proxy-autoconfig;
    sendfile on;
    keepalive_timeout 65;

    server {
        listen 80;
        server_name localhost;

        location / {
            root html;
            index wpad.dat;
        }
    }
}
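To illustrate what this config does (every response carries the PAC MIME type, and / falls back to wpad.dat regardless of the Host: header), here is a rough stand-in using only Python's standard-library http.server. It is a sketch for a throwaway test box, not a replacement for nginx. The html docroot mirrors root html; above, and binding port 80 typically needs administrator/root rights.

```python
import os
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAC_MIME = "application/x-ns-proxy-autoconfig"  # nginx: default_type
DOCROOT = "html"                                # nginx: root html;
INDEX = "wpad.dat"                              # nginx: index wpad.dat;


def resolve_request(path, docroot=DOCROOT):
    """Map a request path to a file under docroot, falling back to INDEX for
    directory paths. Returns None if the path would escape docroot."""
    rel = path.split("?", 1)[0].lstrip("/")
    if rel == "" or rel.endswith("/"):
        rel += INDEX
    full = os.path.normpath(os.path.join(docroot, rel))
    root = os.path.abspath(docroot)
    if os.path.commonpath([os.path.abspath(full), root]) != root:
        return None
    return full


class PacHandler(BaseHTTPRequestHandler):
    # The Host: header is never consulted, so requests by IP or by any name
    # are answered the same way (compare footnote [2]).
    def do_GET(self):
        target = resolve_request(self.path)
        if target is None or not os.path.isfile(target):
            self.send_error(404)
            return
        with open(target, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", PAC_MIME)  # every file gets the PAC type
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pac_server.py 80   (run from the directory containing html/)
    ThreadingHTTPServer(("0.0.0.0", int(sys.argv[1])), PacHandler).serve_forever()
```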

Putting It All Together

With all the files in place and Unbound and nginx running, you're ready to go. Instruct pilot users to manually configure the new DNS server, or push this setting out via Group Policy, VPN settings, or some other means. These users will then get the special DNS responses for your PAC and WPAD servers, fetch the pilot PAC file from your web server, and be able to test.
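Before handing the DNS setting to pilot users, a quick check from a machine pointed at the new DNS server can confirm both halves work. This Python sketch resolves the two example names, expecting the address passed on the command line (10.0.3.25 in the Unbound config above), then fetches http://wpad.example.com/wpad.dat and checks the Content-Type and that the body defines FindProxyForURL, the function every PAC file must provide. The names and URL are the examples from this post; substitute your own.

```python
import socket
import sys
import urllib.request

PAC_MIME = "application/x-ns-proxy-autoconfig"
NAMES = ("wpad.example.com", "pacserver.example.com")  # example names from above
PAC_URL = "http://wpad.example.com/wpad.dat"


def looks_like_pac(content_type, body):
    """True if the MIME type is the PAC type and the body defines FindProxyForURL."""
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    return mime == PAC_MIME and "FindProxyForURL" in body


def check_pilot(expected_ip, names=NAMES, pac_url=PAC_URL):
    """Resolve names via this machine's configured DNS and fetch the PAC file."""
    ok = True
    for name in names:
        ip = socket.gethostbyname(name)
        print(f"{name} -> {ip}")
        ok = ok and ip == expected_ip
    with urllib.request.urlopen(pac_url, timeout=5) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        ok = ok and looks_like_pac(resp.headers.get("Content-Type"), body)
    return ok


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pilot_check.py 10.0.3.25
    good = check_pilot(sys.argv[1])
    print("pilot OK" if good else "pilot NOT OK")
    sys.exit(0 if good else 1)
```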


Archiving Gallery 2 with HTTrack

Along with the static copy of the MediaWiki, I've been wanting to make a static, archival copy of the Gallery 2 install at nuxx.net/gallery that I've used to share photos for 15+ years. Using HTTrack I was able to do so, after a bit of work, resulting in a copy at the same URL, with images accessed using the same paths, served from static files.

The result is that I no longer need to run the aging Gallery 2 software, yet links and embedded images that point to my photo gallery did not break.

In the last few years traffic has dropped off, I haven't posted many new things there, and it seems like the old Internet of pointing people to a personal photo gallery is nearly dead. I believe that blog posts such as this, with links to specific photos, are where effort should be put. While there are 18+ years of personal history in digital images in my gallery, it doesn't get used the same way it did 10 years ago.

On the technical side, the relatively ancient (circa 2008) Gallery 2 and the ~90GB of data in it have occasionally been a burden. I had to maintain an old copy of PHP just for this app, which made updating things a pain. While there is a recent project, Gallery the Revival, which aims to update Gallery to newer versions of PHP, it is based around Gallery 3, and a migration to that brings its own problems, including breaking static links.

I’m still not sure if I want to keep the gallery online but static as it is now, put the web app back up, completely take it off the internet and host it privately at home, or what… but figuring out how to create an archive has given me options.

What follows are my notes on how I used HTTrack, a package specifically designed to mirror websites, to archive nuxx.net’s Photo Gallery. I encountered a few bumps along the way, so this details each and how it was overcome, resulting in the current static copy. To find each of these I’d start HTTrack, let it run for a while, see if it got any errors, fix them, then try again. Eventually I got it to archive cleanly with zero errors:

Gallery Bug 83873

During initial runs, HTTrack finished after ~96MB (out of ~90GB of images) saved, reporting that it was complete. The main portions of the site looked good, but many sub-albums or original-resolution images were zero-byte HTML files on disk and displayed blank in the browser. This was caused by Gallery bug 83873, triggered by using HTTPS on the site. It seems to be fixed by adding the following line just before line 780 in .../modules/core/classes/GallerySession.class:

GalleryCoreApi::requireOnce('modules/core/classes/GalleryTranslator.class');
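If you'd rather script the change than edit by hand, here is a minimal Python sketch. It assumes, as above, that the fix belongs just before line 780; that number may differ between Gallery versions, so confirm it against the line reported in your error log first. The file is read and written as latin-1 with newline translation off so every other byte is left untouched, and a .orig backup is kept.

```python
import shutil
import sys

FIX = "GalleryCoreApi::requireOnce('modules/core/classes/GalleryTranslator.class');\n"


def patch_gallery_session(path, before_line=780):
    """Insert FIX just before 1-based line `before_line`, keeping a .orig backup.

    Returns False, changing nothing, if the fix is already present.
    """
    # latin-1 with newline="" round-trips every byte and line ending unchanged.
    with open(path, encoding="latin-1", newline="") as f:
        lines = f.readlines()
    if any(line.strip() == FIX.strip() for line in lines):
        return False
    shutil.copy2(path, path + ".orig")
    lines.insert(before_line - 1, FIX)
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.writelines(lines)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python patch_session.py <gallery>/modules/core/classes/GallerySession.class
    print("patched" if patch_gallery_session(sys.argv[1]) else "already patched")
```

The already-present check makes it safe to re-run, which is handy when repeating the start, fix, retry loop described above.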

I found this error via the following entry in Apache's error log:

AH01071: Got error 'PHP message: PHP Fatal error: Class 'GalleryTranslator' not found in /var/www/vhosts/nuxx.net/gallery/modules/core/classes/GallerySession.class on line 780\n', referer: http://nuxx.net/gallery/
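Because the symptom was zero-byte HTML files in the mirror, a quick scan of the HTTrack output after each run helps show whether a fix took. This Python sketch just lists empty regular files under a directory. Any legitimately empty file will be listed too, and a clean result doesn't by itself prove the mirror is complete.

```python
import os
import sys


def zero_byte_files(root):
    """Yield the paths of empty regular files anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                yield path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_empty.py "<HTTrack project directory>"
    empties = sorted(zero_byte_files(sys.argv[1]))
    for path in empties:
        print(path)
    print(f"{len(empties)} zero-byte file(s)")
```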

Minimize External Links / Footers

To further minimize external links and keep the static copy of the site as simple as possible, I also removed the external Gallery links and version from the footer by commenting them out in .../themes/themename/templates/local/theme.tpl and .../themes/themename/templates/local/error.tpl:

<div id="gsFooter">
  {*
  * {g->logoButton type="validation"}
  * {g->logoButton type="gallery2"}
  * {g->logoButton type="gallery2-version"}
  * {g->logoButton type="donate"}
  *}
</div>

Remove Details from EXIF/IPTC Plugin

The EXIF/IPTC Plugin for Gallery is excellent because it shows embedded metadata from the original photo, including things like date/time, camera model, and location. This is presented as a simple Summary view and a lengthier Details view. Unfortunately, when the site is indexed by HTTrack, selecting the Details view (done via JavaScript) returns a server error. This shows up in the HTTrack UI as an increasing error count, and as server errors when those pages are requested.

To not have a broken link on every page I modified the plugin to remove the Summary and Details view selector so it’d only display Summary, and used the plugin configuration to ensure that every field I wanted was shown in the summary.

To make this change copy .../modules/exif/templates/blocks/ExifInfo.tpl to .../modules/exif/templates/blocks/local/ExifInfo.tpl (to create a local copy, per the Editing Templates doc). Then edit the local copy and comment out lines 43 through 60 so that only the Summary view is displayed:

{* {if ($exif.mode == 'summary')}
* {g->text text="summary"}
* {else}
* <a href="{g->url arg1="controller=exif.SwitchDetailMode"
* arg2="mode=summary" arg3="return=true"}" onclick="return exifSwitchDetailMode({$exif.blockNum},{$item.id},'summary')">
* {g->text text="summary"}
* </a>
* {/if}
* &nbsp;&nbsp;
* {if ($exif.mode == 'detailed')}
* {g->text text="details"}
* {else}
* <a href="{g->url arg1="controller=exif.SwitchDetailMode"
* arg2="mode=detailed" arg3="return=true"}" onclick="return exifSwitchDetailMode({$exif.blockNum},{$item.id},'detailed')">
* {g->text text="details"}
* </a>
* {/if}
*}
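The copy-and-comment step can also be scripted. This Python sketch copies a template into the sibling local/ directory and wraps a 1-based, inclusive line range of the copy in a Smarty {* ... *} comment; for ExifInfo.tpl that range is 43 through 60 as described above, though the numbers may differ in other plugin versions. Because Smarty comments don't nest, it refuses a range that already contains *}. It overwrites any existing local copy.

```python
import os
import sys


def make_local_override(tpl_dir, name, first, last):
    """Copy tpl_dir/name to tpl_dir/local/name, then wrap lines first..last
    (1-based, inclusive) of the copy in a Smarty {* ... *} comment.

    Overwrites any existing local copy. Refuses a range already containing
    '*}', because Smarty comments do not nest and it would end the comment early.
    """
    local_dir = os.path.join(tpl_dir, "local")
    os.makedirs(local_dir, exist_ok=True)
    dst = os.path.join(local_dir, name)
    # latin-1 with newline="" keeps the original bytes and line endings.
    with open(os.path.join(tpl_dir, name), encoding="latin-1", newline="") as f:
        lines = f.readlines()
    block = lines[first - 1:last]
    if any("*}" in line for line in block):
        raise ValueError("range already contains '*}'")
    if block and not block[-1].endswith("\n"):
        block[-1] += "\n"  # keep the closing *} on its own line at end of file
    lines[first - 1:last] = ["{*\n"] + block + ["*}\n"]
    with open(dst, "w", encoding="latin-1", newline="") as f:
        f.writelines(lines)
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python local_override.py <gallery>/modules/exif/templates/blocks
    print(make_local_override(sys.argv[1], "ExifInfo.tpl", 43, 60))
```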

Disable Extra Plugins

Finally, I disabled a bunch of plugins which wouldn't be useful in a static copy of the site and which create many interconnected links that would make a mirror of the site overly complicated:

  • Search: Can’t search a static site.
  • Google Map Module: Requires a maps API key, which I don’t want to mess with.
  • New Items: There’s nothing new getting posted to a static site.
  • Slideshow: Not needed.

Fix Missing Files

My custom theme, which was based on matrix, linked to some images in the matrix directory which are no longer present in newer versions of that theme, so HTTrack would get 404 errors on these. To fix this, I copied these files from my custom theme to the .../themes/matrix/images directory.
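A small Python sketch of that copy step, assuming the custom theme keeps the needed images under the same file names the templates reference: it copies only files missing from the destination, so images that newer matrix versions still ship are never overwritten. The directory arguments are placeholders for your own theme paths.

```python
import os
import shutil
import sys


def copy_missing(src_dir, dst_dir):
    """Copy regular files from src_dir to dst_dir only where dst_dir lacks them.

    Existing files in dst_dir are never overwritten. Returns the names copied.
    """
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python copy_missing.py <custom theme>/images <gallery>/themes/matrix/images
    for name in copy_missing(sys.argv[1], sys.argv[2]):
        print("copied", name)
```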

Clear Template / Page Cache

After making changes to templates it's a good idea to clear all the template caches so all pages are rendered with the above changes. While all these steps may be overkill, I do this by going into Site Admin → Performance and setting Guest Users and Registered Users to No acceleration. I then uncheck Enable template caching and click Save. Next I click Clear Saved Pages to clear any cached pages, then re-enable template caching and Full acceleration for Guest Users (which is how HTTrack will be accessing the site).

PANIC! : Too many URLs : >99999

If your Gallery has a lot of images, HTTrack may quit with the error PANIC! : Too many URLs : >99999. Mine did, so I had to run it with the -#L1000000 argument, which raises the limit from the default of 99,999 URLs to 1,000,000.

Run HTTrack

After all of this, I ran the httrack binary with the security (bandwidth, etc) limits disabled (--disable-security-limits) and used its wizard mode to set up the mirror. The URL to be archived was https://nuxx.net/gallery/, stored in an appropriately named project directory, with no other settings.

CAUTION: Do not disable security limits unless you have good control over both the site you are mirroring and the bandwidth between the two. HTTrack has very sane default rate limits that keep its mirroring polite, and it's not wise to override them unless you control both the source and destination sites.

When httrack begins it shows no progress on screen, so I quit with Ctrl-C, switched to the project directory, and ran httrack --continue to allow the mirror to continue and show status info on the screen (the screenshot above). The argument --continue can be used to restart an interrupted mirror, and --update can be used to freshen up a complete mirror.

Alternately, the following command puts this all together, without the wizard:

httrack https://nuxx.net/gallery/ -W -O "/home/username/websites/nuxx.net Photo Gallery" -%v --disable-security-limits -#L1000000
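The same invocation can be assembled from Python, which sidesteps shell quoting of the space in the project path. This sketch only reproduces the flags in the command above plus the --continue and --update resume commands described earlier; it adds no new httrack options, and the URL and project path are the example values from the command.

```python
import subprocess
import sys

URL = "https://nuxx.net/gallery/"
PROJECT = "/home/username/websites/nuxx.net Photo Gallery"


def new_mirror_args(url=URL, project=PROJECT, max_urls=1_000_000):
    """The httrack command line above, as an argument list."""
    return [
        "httrack", url,
        "-W",                         # wizard (semi-automatic) mode, which asks questions
        "-O", project,                # project/output directory
        "-%v",                        # show downloaded file names on screen
        "--disable-security-limits",  # see the CAUTION above first
        f"-#L{max_urls}",             # raise the URL limit above the 99,999 default
    ]


def resume_args(update=False):
    """--continue resumes an interrupted mirror; --update freshens a complete one."""
    return ["httrack", "--update" if update else "--continue"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mirror.py new | continue | update
    mode = sys.argv[1]
    if mode == "new":
        subprocess.run(new_mirror_args(), check=True)
    elif mode in ("continue", "update"):
        # Resuming is run from inside the project directory, as described above.
        subprocess.run(resume_args(mode == "update"), cwd=PROJECT, check=True)
    else:
        sys.exit(f"unknown mode: {mode}")
```

Passing a list rather than a shell string means the project path needs no quoting and nothing in the arguments is interpreted by a shell.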

As HTTrack spiders the site it comes across external links and needs to know what to do with them. Because I didn't specify an action for external links on the command line, it prompts with the question “A link, [linkurl], is located beyond this mirror scope.” Since I'm not interested in mirroring any external sites (mostly links to recipes or company websites), I answer *, which is “Ignore all further links and do not ask any more questions” (text in httrack.c). (I was unable to figure out how to suppress this via a command-line option before getting a complete mirror, although it's likely possible.)

Running from a Dedicated VM

I ran this mirror task from a Linode VM located in the same region as the VM hosting nuxx.net. This results in all traffic flowing over the private network, avoiding bandwidth charges.

Because of the ~90GB of images, I set up a Linode 8GB, which has 160GB of disk, 8GB of RAM, and 4 CPUs. This should provide plenty of space for the mirror, with enough resources to allow the tool to work. This VM costs $40/mo (or $0.06/hr), which I find plenty affordable for getting this project done. The mirror took N days to complete, after which I tar’d it up and copied it a few places before deleting the VM.
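For budgeting, a rough Python sketch of the cost arithmetic at the rates above. It assumes hourly billing, with partial hours rounded up, that stops at the $40 monthly price; that cap is an assumption about Linode's billing (0.06 × 720 hours would otherwise exceed $40), not something stated here. It works in whole cents to avoid floating-point drift.

```python
import math

HOURLY_CENTS = 6          # $0.06/hr, from the text
MONTHLY_CAP_CENTS = 4000  # $40/mo, from the text


def vm_cost_dollars(days):
    """Estimated cost of running the VM for `days` within one billing month.

    Assumes per-hour billing (partial hours rounded up) that stops at the
    monthly price; whole cents avoid floating-point drift until the final division.
    """
    hours = math.ceil(days * 24)
    return min(hours * HOURLY_CENTS, MONTHLY_CAP_CENTS) / 100


if __name__ == "__main__":
    for d in (1, 3, 7, 30):
        print(f"{d} day(s): ${vm_cost_dollars(d):.2f}")
```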

By using a separate VM I didn't have to worry about any dependencies or package problems, and could delete it once the work was done. All I needed to do on this VM was create a user, put it in the sudoers file, install screen (sudo apt-get install screen) and httrack (sudo apt-get install httrack), and get things running.

Wrapping It All Up

After the mirror was complete I replaced my .../gallery directory with the .../gallery directory from the HTTrack output directory and all was good.
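A cautious variant of that swap, as a Python sketch: rather than deleting the old install outright, it renames it to a timestamped backup before moving the mirrored directory into place, so rolling back is a single rename. Both paths are placeholders. If the mirror sits on a different filesystem, the move becomes a copy and the gallery is missing until it finishes.

```python
import os
import shutil
import sys
import time


def swap_in_mirror(live_dir, mirror_dir):
    """Rename live_dir to a timestamped backup, then move mirror_dir to live_dir.

    Returns the backup path so the old install can be restored or removed later.
    """
    live = live_dir.rstrip(os.sep)
    backup = f"{live}.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    os.rename(live, backup)        # fast: same directory, same filesystem
    shutil.move(mirror_dir, live)  # becomes a slow copy across filesystems
    return backup


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python swap.py <docroot>/gallery "<path to mirrored gallery dir>"
    print("old install moved to", swap_in_mirror(sys.argv[1], sys.argv[2]))
```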
