Press "Enter" to skip to content

Category: computers

Pi-hole via Docker on Synology DSM with Bonded Network Interface

While consolidating and upgrading my home network I’m moving Pi-hole from a stand-alone Raspberry Pi to running under Docker on my Synology DS1019+ running DiskStation Manager (DSM) v6.2.3.

This was a little bit confusing at first as the web management UI would work, but DNS queries weren’t getting answered. This ended up being caused by the bonded network interface, which is ovs_bond0 instead of the normal default of eth0.

Using the official Pi-hole Docker image, set to run with Host networking (Use the same network as Docker host in the Synology UI), setting the following variables will bring up a working Pi-hole from first boot, configured to:

  • Listen on ovs_bond0 (instead of the default eth0).
  • Answer DNS queries on the same IP as DSM (192.168.0.2).
  • Run the web-based management interface on port 8081 with password piholepassword.
  • Send internal name resolutions to the internal DNS/DHCP server at 192.168.0.1 for clients *.internal.example.com within 192.168.0.0/24.
  • Set the displayed temperature to Fahrenheit and time zone to America/Detroit.
  • Listen for HTTP requests on http://diskstation.internal.example.com:8081 alongside the default pi.hole hostname.

DNS=127.0.0.1
INTERFACE=ovs_bond0
REV_SERVER=True
REV_SERVER_CIDR=192.168.0.0/24
REV_SERVER_DOMAIN=internal.example.com
REV_SERVER_TARGET=192.168.0.1
ServerIP=192.168.0.2
TEMPERATUREUNIT=f
TZ=America/Detroit
VIRTUAL_HOST=diskstation.internal.example.com
WEB_PORT=8081
WEBPASSWORD=piholepassword

Additionally, setting up volumes for /etc/dnsmasq.d/ and /etc/pihole/ will ensure changes made in the UI persist across restarts and container upgrades.
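
For reference, here’s roughly what the whole setup looks like expressed as a docker-compose file instead of the Synology UI. This is a sketch: the container name and the /volume1/docker/pihole host paths are assumptions, so adjust them to wherever you keep Docker data on your NAS.

version: "3"
services:
  pihole:
    container_name: pihole
    image: pihole/pihole:latest
    network_mode: host               # "Use the same network as Docker host"
    environment:
      DNS: 127.0.0.1
      INTERFACE: ovs_bond0
      REV_SERVER: "True"
      REV_SERVER_CIDR: 192.168.0.0/24
      REV_SERVER_DOMAIN: internal.example.com
      REV_SERVER_TARGET: 192.168.0.1
      ServerIP: 192.168.0.2
      TEMPERATUREUNIT: f
      TZ: America/Detroit
      VIRTUAL_HOST: diskstation.internal.example.com
      WEB_PORT: "8081"
      WEBPASSWORD: piholepassword
    volumes:
      - /volume1/docker/pihole/etc-pihole:/etc/pihole
      - /volume1/docker/pihole/etc-dnsmasq.d:/etc/dnsmasq.d
    restart: unless-stopped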

Note: If you stop the Pi-hole container, clear out the contents of these directories, and then restart the container, Pi-hole will set itself up again from the environment variables. This allows tweaking the variables without recreating the container each time.
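
From an SSH session on the NAS that reset amounts to something like this (a sketch; the container name and host-side paths match the assumptions in the compose example above):

sudo docker stop pihole
sudo rm -rf /volume1/docker/pihole/etc-pihole/* /volume1/docker/pihole/etc-dnsmasq.d/*
sudo docker start pihole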


nginx for HTTPS Request Logging

Consider the following situation: you have a web app from a vendor, and during a security scan it crashes. The web app is running over HTTPS with your certificates, but neither the scanning tool nor the web app offers sufficient logging to see exactly which request caused the crash.

Because you can’t decrypt HTTPS without access to a client key log file (or making a bunch of TLS changes), and the client is a security scanning tool, Wireshark is not an option to see the triggering request. Fiddler is also likely out, as that’d require the security scanner to trust a new root cert. So what can you do? Stick something else in the way to proxy the connection, logging all the requests!

With access to the server’s private certificates this is quite easy: set up nginx as a proxy. The only wrinkle is that getting access to all of the request headers requires Lua, so you’ll need to ensure your nginx install supports it. On macOS this was easy using Homebrew to install nginx from denji’s GitHub repository (the default nginx doesn’t support Lua):

brew tap denji/nginx
brew install nginx-full --with-lua-module --with-set-misc-module

This configuration uses the web app’s certificates in nginx to proxy requests it receives to your main site, logging the client IP, request, headers, body, and request status to intercept.log. Requests are broken out across lines for easy visual reading; you may wish to move this all onto one line to make parsing easier:

events {
}

http {
    log_format custom 'Time: $time_local' '\n'
                      'Remote Addr: $remote_addr' '\n'
                      'Request: $request' '\n'
                      'Request Headers: $request_headers' '\n'
                      'Body: $request_body' '\n'
                      'Status: $status' '\n'
                      '-----';

    server {
        listen 443 ssl;
        server_name example.com;
        access_log /path/to/intercept.log custom;
        ssl_certificate /path/to/cert.pem;
        ssl_certificate_key /path/to/privkey.pem;

        location / {
            proxy_pass https://example.com;
            proxy_set_header Accept-Encoding ''; 
            set_by_lua_block $request_headers {
                local h = ngx.req.get_headers()
                local request_headers_all = ""
                for k, v in pairs(h) do
                    request_headers_all = request_headers_all .. ""..k..": "..v..";"
                end
                return request_headers_all
            }
        }
    }
}

To put this in place, ensure that requests from the scanner go to nginx instead of the web app and then nginx will forward and log the requests. There are a few ways you could do this:

  • Run nginx on the same server as the web app, move the web app to listen to another port for HTTPS, and set proxy_pass to the other port: proxy_pass https://example.com:4430
  • Run nginx on a new server, change the DNS records for the site to point to the new server, and point nginx to the old server by IP: proxy_pass https://192.168.10.10
  • If the scanner tool’s name resolution can be adjusted, such as via a HOSTS file or custom configuration, point it to the nginx proxy for the site name.

To test, you can use a web browser on a client computer and a HOSTS file to point the original hostname at nginx. To get the screenshot above I ran nginx on an iMac running macOS, then in a Windows VM I changed the HOSTS file to map nuxx.net to the iMac’s IP. Firefox on the Windows VM then sent requests for nuxx.net to nginx on macOS, which logged and proxied the requests out to the real nuxx.net.
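
The same check works from any client with a hosts entry and curl; the IP below is a placeholder for whatever machine is running nginx:

# On the test client, add to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts):
#   192.168.10.50   nuxx.net

curl -sk https://nuxx.net/ -o /dev/null    # -k in case the client doesn't trust the cert chain

# On the nginx host, watch the logged request:
tail -f /path/to/intercept.log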


Pi-hole (and PiVPN) with Ubiquiti UniFi

Pi-hole

My home network is based around Ubiquiti’s UniFi, with a Security Gateway (USG) handling the NAT/firewall/routing duties. For ad blocking and to have better control over DNS I use Pi-hole running on a Raspberry Pi.

With the following settings you can have the two working well together with UniFi doing DHCP and Pi-hole doing DNS. Internal forward and reverse resolution will work, which means hostnames will appear properly for internal devices on both consoles while requests are still appropriately Pi-hole’d.

Here’s how:

  • Set up the Pi-hole and put it on the network at a static IP.
  • In Pi-hole, under Settings → DNS, turn on:
    • Never forward non-FQDNs
    • Never forward reverse lookups for private IP ranges
    • Conditional forwarding, with the IP address of your DHCP server (router) set to the USG
    • Local domain name (optional) set to your internal DNS suffix
  • In the USG, set DHCP to hand out the Pi-hole’s IP for DHCP Name Server.
  • In the USG, under Services → DHCP → DHCP Server, set Register client hostname from DHCP requests in USG DNS forwarder to On.
  • Leave the WAN interface’s DNS set to something public, such as what the ISP provides or Google’s 8.8.8.8/8.8.4.4 or whatever. This ensures that if the Pi-hole goes down then the USG can still resolve DNS.

After setting this up clients will use Pi-hole for DNS, as configured via DHCP. Requests for hostnames and addresses on the local network (shortnames or the local suffix) will be forwarded to the USG, ensuring that internal requests work properly.
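
For reference, Pi-hole’s conditional forwarding boils down to dnsmasq directives along these lines; the domain and addresses here are placeholders, and the exact file Pi-hole writes them to is an implementation detail:

server=/internal.example.com/192.168.1.1      # forward the local suffix to the USG
rev-server=192.168.1.0/24,192.168.1.1         # forward reverse lookups for the LAN to the USG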

PiVPN

Taking this a step further, I also have PiVPN running on the same Pi, to provide an endpoint for connecting into my home network via Wireguard. Pi-hole and PiVPN integrate very nicely and are designed to work together, making the setup very smooth.

By default, PiVPN sets the Pi-hole as the DNS via a DNS option in the [Interface] section of the config. To ensure appropriately geolocated search results when connected to VPN, use a DNS which supports EDNS Client Subnet (ECS) (under Settings → DNS) on the Pi-hole.

(For reference, I’m running Pi-hole on a Raspberry Pi 4 Model B with 2GB of RAM and it has plenty of headroom for both Pi-hole serving ~20 devices and sustaining 50 MByte/sec via WireGuard. The Pi-hole section of this was originally written up here on Reddit.)


Mail-Hijacking Malicious Profile on iOS

I was recently asked to look at a family member’s iPad because it was no longer sending email. Turns out that it had been set up to use an additional email account that steals copies of all their outgoing mail. Unfortunately, they didn’t notice until the attacker’s system stopped working and the iPad started showing an error message. Besides the irritating (or worse) spam they saw, their stolen emails could have been used for anything from spear phishing to accessing one’s online accounts, impersonating them, phishing others, delivering targeted spam, fake news / propaganda, etc.

So how did this get set up?

Apparently at some point this person installed the My Accurate Forecast app [1]. Included in this app was a Profile — or a set of settings for Apple devices — that added a second email account with address lazaroburst@my.minbox.email. This account was also set as the outgoing server for their Hotmail (Outlook.com) account.

This person would then have seen all messages in this account, with notifications just like their normal Hotmail email. Worse, everything they sent, from any email account, went to the attacker first. As it’s a separate email account, all the normal spam and malware protections from a normal email provider don’t apply… It’s a firehose of junk straight to their mailbox, with outgoing mail theft frosting on top.

This is bad because not only does it end up with them getting more spam, it allows the attacker to know exactly what they sent and to whom, and to modify those messages before delivering them to the intended recipients.

I think this was likely generated based on geolocated advertising, but it’s possible this individual was specifically targeted. The signed Profile had a name of “WEATHER ALERTS” and a description of “Tap ‘Install’ above to get your local radar forecasts and weather alerts in 48062”, showing its intent to deceive; trying to make the normal Profile installation security alert — which is supposed to warn the user of a change to important settings — look like part of an application install.

I’m unsure when this first got installed, but judging by the Profile signing certificate expiring on December 8, 2016 it was likely within a year or two prior. (Unfortunately I didn’t check the issuance date before deleting the profile.) The Profile which made these changes was signed by secure5g.com, an “advertising” company which has ties to minbox.email (the Unsubscribe link at the bottom of the page is a generic link to a minbox.email page).

A post from June 2018 on Medium, Unwanted Profiles Pop Up in iOS Devices, Inviting Spam and Malware, reports the same problem almost two and a half years ago. Curiously, the handful of other posts I read about this (ref: 1, 2) didn’t mention (or maybe didn’t notice) the outgoing server change? Perhaps because they only noticed before things broke, or maybe this iPad somehow ended up different? (It does seem that at least one other app: Daily Bible Verse, included similar email hijacking.)

Cleaning up these settings was easy: just a matter of removing the malicious Profile and the extra outgoing mail account, and setting the Hotmail account back to using the appropriate servers. But who knows what damage was done with the theft of the sent mail and receipt of spammy stuff.


[1] The My Accurate Forecast website still shows screenshots of the app, but does not link to any app stores. It also no longer appears in the Apple App Store, implying that it’s been pulled.


SharkTapUSB Gen2 Review and PCB Details

For years I’ve used an eBay-purchased Net Optics TP-CU3 (now called Ixia TP-CU3-ST) copper 10/100/1000 Ethernet tap along with a StarTech USB 3.0 to Dual Gigabit NIC for getting external network captures from client computers [1]. The fan in the tap is dying and making a lot of noise. Beyond being irritating, I believe this is causing the tap to overheat, resulting in occasional weirdness in the data [2].

As a replacement I now have a SharkTapUSB Gen2 from midBit Technologies, LLC, and so far it’s working great. Being a simpler device with a USB NIC built in, it’s much more appropriate for my needs: smaller, simpler to connect, quieter (no fans), and easier to teach coworkers to use. At $249.95 (sold solely via Amazon) it’s also priced fairly.

The SharkTapUSB is a single unit about the size of a deck of cards that is inserted between two Ethernet devices and outputs the captured data to either an Ethernet connection or its built-in USB 3.0 gigabit NIC. It also gets power from USB 3.0, eliminating an external power supply. This is perfect for what I’m usually doing, which is watching data going in and out of a computer and analyzing it in Wireshark.

While the TP-CU3 is excellent and served me well, it also was overkill. It has a bunch of features intended for permanent install / data center use, such as bypass relays to maintain connectivity during power failures, forced air cooling, redundant power supplies, and dual gigabit egress links to support monitoring saturated full duplex connections. Even when the built-in cooling fans are working properly, it’s loud enough to be irritating in a normal office (the SharkTap USB is silent).

Compared to the TP-CU3 there are three downsides to the SharkTapUSB, but for my needs I don’t see them being a problem:

  • Cannot Capture Sustained Full Duplex Traffic: The SharkTapUSB merges the network traffic between two ports and outputs it to a single gigabit NIC. If the traffic being captured is a sustained, full-duplex gigabit flow, this is too much for the capture interface and data will be lost. For me this amount of traffic is rare in practice, especially in situations where I need an external tap. (The SharkTapUSB has a 256KB buffer to accommodate short bursts of high bandwidth traffic.)
  • Link Electrical Status Not Propagated Between Ports: The TP-CU3 uses relays so that when one of the network ports is disconnected electrically the other one is shut down. For example, when the client PC is disconnected, the TP-CU3 drops the electrical link to the switch, so the switch sees the disconnect. The SharkTapUSB does not do this, and keeps the electrical link up on one side when the other is disconnected. Should this be a problem, such as when working with a switch that takes action on link state change, this can be sidestepped by unplugging cables.
  • Link Speed Autodetection: The SharkTapUSB cannot be forced to a particular port speed. However, it does set both ports to the lowest autodetected speed, so port speed can be controlled via settings on a connected device.

After looking at the SharkTapUSB’s block diagram I got curious how it’s actually implemented, so I opened it up to see and grabbed some photos of the Rev F PCB (top, bottom, jumper wires on bottom).

Here are the notable components:


[1] While captures can be done locally (from within the OS), using tools like Packet Monitor or Wireshark or tcpdump, there are times when an external capture is more useful or the only option, such as:

  • Troubleshooting Intel AMT related issues, as AMT sits between the normal NIC and the external port.
  • Monitoring PXE.
  • OS’ where getting a local capture is complicated, such as Windows PE, embedded stuff in televisions, or mobile OS’ (eg: Android, iOS).
  • Investigating hardware offloads, as a local capture will show invalid data for things like TCP checksum as it’s not calculated before reaching the NIC.

[2] I looked into replacing the fan, but this doesn’t seem practical. The fan is a Sunon GB0535AEV1-8.B2445.GN, which is a combination heatsink and fan, and appears to be epoxied in place. While I can get one via eBay sent from China, I’m unsure if I’ll be able to remove the fan without damaging the chip. Instead I’ll keep the mostly-working tap around for rare occasions when full-duplex monitoring is needed, using the SharkTapUSB for day-to-day use. Perhaps in the future I’ll give a heatsink/fan swap a go…


Bypassing Reolink SSID Length Limitation

I purchased a Reolink E1 Zoom camera for occasional around-the-house use. It turns out that my SSID, Smart Meter Surveillance Network, is too long for their setup app. While the standard is 32 octets (32 ASCII characters) — and my SSID is exactly this — some things, such as the Reolink app, only accept 31 characters. In this case it pulled the SSID from my phone (the network in use) and then truncated it. †

So, I set out to find a workaround, and I did.

During setup the Reolink app walks you through scanning a serial number QR code on the camera, prompts for the wireless network info, and then generates a QR code and displays it on the mobile device’s screen. The camera is then pointed at the screen, this QR code is read, and the camera configures its WiFi settings based on the code.

I figured that maybe if I generated a new QR code with the correct info I’d be able to configure the camera with a longer SSID and it turns out that worked.

After a couple minutes of generating codes I found the configuration QR code is text, formatted as follows, with #### as the last four characters of the camera’s serial number:

<QR><S>ssid</S><P>password</P><C>####</C></QR>

Using the first free online QR code generator I could find, I created a new QR code containing the following text:

<QR><S>Smart Meter Surveillance Network</S><P>notmyrealpassword</P><C>M77L</C></QR>
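
If you’d rather not paste your WiFi password into a random website, a tool like qrencode can generate the same code locally. A sketch using the example values above:

qrencode -o reolink-config.png '<QR><S>Smart Meter Surveillance Network</S><P>notmyrealpassword</P><C>M77L</C></QR>'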

I reset the camera, had it scan the new QR code, and it connected to the wireless network. It worked! The camera was now on the wireless network and I was able to connect to it in the app.

There did seem to be a bit of quirkiness in the app, possibly because of the long SSID. It’s working fine with the desktop app, so all is good. It’s also really nice to now have a way of reconfiguring the camera without having to install and use their app.

† The standard maximum for SSIDs is 32 octets, or 32 ASCII characters. It appears some companies treat this as 31 characters, reserving the 32nd for the string termination character. Sort-of makes me wonder how I’ve been able to use this one for so long… It was fine with my old Apple AirPorts and I’ve had it running this way for a couple of years on Ubiquiti UniFi. Although it looks like the UniFi v6 UI now refuses to save changes with this SSID, so I guess I’m going to have to change it…


Simple PAC File Pilot Testing (including WPAD)

In a network that’s isolated from the public internet, such as many enterprise networks, proxy servers are typically used to broker internet access for client computers. Configuring the client computers to use these proxies is often done via a Proxy Auto-Config (PAC) file, code that steers requests so traffic for internal sites stays internal, and public sites go through the proxies.

Commonly these PAC files are made available via the Web Proxy Auto-Discovery Protocol (WPAD) as well, because some systems need to discover them automatically. Specifically, in a Windows 10 environment which uses proxies, WPAD is needed because many components of Windows (including the Microsoft Store and Azure Device Registration) will not use the browser’s PAC file settings; they depend on WPAD to find a path to the internet.

WPAD is typically configured via DNS, with a hostname of wpad.companydomain.com (or anything in the DNS Search Suffix List) resolving to the IP of a webserver [1]. This server must then answer an HTTP request for http://x.x.x.x/wpad.dat (where x.x.x.x is the server’s IP) or http://wpad.company.com/wpad.dat with a PAC file, served with a Content-Type of application/x-ns-proxy-autoconfig [2].

Because WPAD requires DNS, something which can’t easily be changed for a subset of users, putting together a mechanism to perform a pilot deployment of a new PAC file can be a bit complicated. When attempting to perform a pilot deployment engineers will often send out a test PAC file URL to be manually configured, but this misses WPAD and does not result in a complete system test.

In order to satisfy WPAD, one can set up a simple webserver to host the new PAC file and a DNS server to answer the WPAD queries. This DNS server forwards all requests except for those for the PAC file to the enterprise DNS, so everything else works as normal. Testing users then only need to change their DNS to receive the pilot PAC file and everything else will work the same; a true pilot deployment.

Below I’ll detail how I use simplified configurations of Unbound and nginx to pilot a PAC file deployment. This can be done from any Windows machine, or with very minor config changes from something as simple as a Raspberry Pi running Linux.

[1] WPAD can be configured via DHCP, but this is only supported by a handful of Microsoft applications. DNS-based WPAD works across all modern OS’.

[2] Some WPAD clients put the server’s IP in the Host: field of the HTTP request.

DNS via Unbound

Unbound is a DNS server that’s straightforward to run and is available on all modern platforms. It’s perfect for our situation where we need to forward all DNS queries to the production infrastructure, modifying only the WPAD/PAC related queries to point to our web server. While it’s quite robust and has a lot of DNSSEC validation options, we don’t need any of that.

This simple configuration forwards all requests to corporate Active Directory-based DNS’ (10.0.1.2 and 10.0.2.2) for everything except the PAC file servers. For these, pacserver.example.com and wpad.example.com, it’ll intercept the request and return our webserver’s address of 10.0.3.25.

server:
    interface: 0.0.0.0
    access-control: 0.0.0.0/0 allow
    module-config: "iterator"

    local-zone: "wpad.example.com." static
    local-data: "wpad.example.com. IN A 10.0.3.25"

    local-zone: "pacserver.example.com." static
    local-data: "pacserver.example.com. IN A 10.0.3.25"

stub-zone:
    name: "."
    stub-addr: 10.0.1.2
    stub-addr: 10.0.2.2

This configuration allows recursive queries from any host, but by specifying one or more subnets in access-control clauses you can restrict where it can be used from. The stub-zone clause sends all requests up to the two corporate DNS’. If those upstream DNS’ handle recursion for the client, a forward-zone clause can be used instead.
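
For example, if the upstream DNS’ are recursive resolvers, swap the stub-zone block for a forward-zone like this:

forward-zone:
    name: "."
    forward-addr: 10.0.1.2
    forward-addr: 10.0.2.2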

PAC File via nginx

For serving up the PAC file, both for direct queries and those from WPAD, we’ll use nginx, a powerful but easy to use web server to which we can give a minimal config.

Put a copy of your PAC file at …/html/wpad.dat under nginx’s install directory so the server can find it. (There is great information on writing PAC files at FindProxyForUrl.com.)
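
If you don’t already have a PAC file to pilot, a minimal wpad.dat looks something like the following; the proxy host/port and internal domain are placeholders:

function FindProxyForURL(url, host) {
    // Keep plain hostnames and internal sites off the proxy
    if (isPlainHostName(host) || dnsDomainIs(host, ".example.com")) {
        return "DIRECT";
    }
    // Everything else goes out through the proxy
    return "PROXY proxy.example.com:8080";
}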

This simple configuration will set up a web server which serves all files as MIME type application/x-ns-proxy-autoconfig, offering up the wpad.dat file by default (eg: http://pacserver.example.com) or when directly referenced (eg: http://10.0.3.25/wpad.dat or http://wpad.example.com/wpad.dat), satisfying both standard PAC file and WPAD requests.

events {
    worker_connections 1024;
}

http {
    default_type application/x-ns-proxy-autoconfig;
    sendfile on;
    keepalive_timeout 65;

    server {
        listen 80;
        server_name localhost;

        location / {
            root html;
            index wpad.dat;
        }
    }
}

Putting It All Together

With all the files in place and unbound and nginx running, you’re ready to go. Instruct pilot users to manually configure the new DNS, or push this setting out via Group Policy, VPN settings, or some other means. These users will then get the special DNS response for your PAC and WPAD servers, get the pilot PAC file from your web server, and be able to test.
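
A quick sanity check from a pilot client, using the example addresses from above (intranet.example.com stands in for any other internal name):

nslookup wpad.example.com 10.0.3.25       # should answer with 10.0.3.25
nslookup intranet.example.com 10.0.3.25   # any other name should be answered via the corporate DNS
curl -v http://wpad.example.com/wpad.dat  # should return the PAC file with
                                          # Content-Type: application/x-ns-proxy-autoconfig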


Archiving Gallery 2 with HTTrack

Along with the static copy of the MediaWiki, I’ve been wanting to make a static, archival copy of the Gallery 2 install that I’ve used to share photos at nuxx.net/gallery for 15+ years. Using HTTrack I was able to do so, after a bit of work, resulting in a copy at the same URL, with images accessible via the same paths, served from static files.

The result is that I no longer need to run the aging Gallery 2 software, yet links and embedded images that point to my photo gallery did not break.

In the last few years traffic has dropped off, I haven’t posted many new things there, and it seems like the old Internet habit of pointing people to a personal photo gallery is nearly dead. I believe that blog posts, such as this, with links to specific photos, are where effort should be put. While there is 18+ years of personal history in digital images in my gallery, it doesn’t get used the way it was 10 years ago.

On the technical side, the relatively ancient (circa 2008) Gallery 2 and the ~90GB of data in it have occasionally been a burden. I had to maintain an old copy of PHP just for this app, which made updating things a pain. While there is a recent project, Gallery the Revival, which aims to bring Gallery to newer versions of PHP, it is based around Gallery 3 and a migration to that brings its own problems, including breaking static links.

I’m still not sure if I want to keep the gallery online but static as it is now, put the web app back up, completely take it off the internet and host it privately at home, or what… but figuring out how to create an archive has given me options.

What follows are my notes on how I used HTTrack, a package specifically designed to mirror websites, to archive nuxx.net’s Photo Gallery. I encountered a few bumps along the way, so this details each and how it was overcome, resulting in the current static copy. To find each of these I’d start HTTrack, let it run for a while, see if it got any errors, fix them, then try again. Eventually I got it to archive cleanly with zero errors:

Gallery Bug 83873

During initial runs, HTTrack finished after ~96MB (out of ~90GB of images) saved, reporting that it was complete. The main portions of the site looked good, but many sub-albums or original-resolution images were zero-byte HTML files on disk and displayed blank in the browser. This was caused by Gallery bug 83873, triggered by using HTTPS on the site. It seems to be fixed by adding the following line just before line 780 in .../modules/core/classes/GallerySession.class:

GalleryCoreApi::requireOnce('modules/core/classes/GalleryTranslator.class');

This error was found via the following in Apache’s error log:

AH01071: Got error 'PHP message: PHP Fatal error: Class 'GalleryTranslator' not found in /var/www/vhosts/nuxx.net/gallery/modules/core/classes/GallerySession.class on line 780\n', referer: http://nuxx.net/gallery/

Minimize External Links / Footers

To clean things up further, minimize external links, and make the static copy of the site as simple as possible, I commented out the external Gallery links and version number in the footer, via .../themes/themename/templates/local/theme.tpl and .../themes/themename/templates/local/error.tpl:

<div id="gsFooter">
{*
{g->logoButton type="validation"}
*{g->logoButton type="gallery2"}
*{g->logoButton type="gallery2-version"}
*{g->logoButton type="donate"}
*}
</div>

Remove Details from EXIF/IPTC Plugin

The EXIF/IPTC Plugin for Gallery is excellent because it shows embedded metadata from the original photo, including things like date/time, camera model, and location. This presents as a simple Summary view and a lengthier Details view. Unfortunately, when the site is being indexed by HTTrack, selecting the Details view — done via JavaScript — returns a server error. This shows up as an increasing error count in the HTTrack UI and server errors as some pages are queried.

To not have a broken link on every page I modified the plugin to remove the Summary and Details view selector so it’d only display Summary, and used the plugin configuration to ensure that every field I wanted was shown in the summary.

To make this change copy .../modules/exif/templates/blocks/ExifInfo.tpl to .../modules/exif/templates/blocks/local/ExifInfo.tpl (to create a local copy, per the Editing Templates doc). Then edit the local copy and comment out lines 43 through 60 so that only the Summary view is displayed:

{* {if ($exif.mode == 'summary')}
* {g->text text="summary"}
* {else}
* <a href="{g->url arg1="controller=exif.SwitchDetailMode"
* arg2="mode=summary" arg3="return=true"}" onclick="return exifSwitchDetailMode({$exif.blockNum},{$item.id},'summary')">
* {g->text text="summary"}
* </a>
* {/if}
* &nbsp;&nbsp;
* {if ($exif.mode == 'detailed')}
* {g->text text="details"}
* {else}
* <a href="{g->url arg1="controller=exif.SwitchDetailMode"
* arg2="mode=detailed" arg3="return=true"}" onclick="return exifSwitchDetailMode({$exif.blockNum},{$item.id},'detailed')">
* {g->text text="details"}
* </a>
* {/if}
*}

Disable Extra Plugins

Finally, I disabled a bunch of plugins which wouldn’t be useful in a static copy of the site and which create a number of interconnected links that would make a mirror overly complicated:

  • Search: Can’t search a static site.
  • Google Map Module: Requires a maps API key, which I don’t want to mess with.
  • New Items: There’s nothing new getting posted to a static site.
  • Slideshow: Not needed.

Fix Missing Files

My custom theme, which was based on matrix, linked to some images in the matrix directory which were no longer present in newer versions of the themes, so HTTrack would get 404 errors on these. I copied these files from my custom theme to the .../themes/matrix/images directory to fix this.

Clear Template / Page Cache

After making changes to templates it’s a good idea to clear all the template caches so all pages are rendering with the above changes. While all these steps may be overkill, I do this by going into Site Admin → Performance and setting Guest Users and Registered Users to No acceleration. I then uncheck Enable template caching and click Save. I then click Clear Saved Pages to clear any cached pages, then re-enable template caching and Full acceleration for Guest Users (which HTTrack will be working as).

PANIC! : Too many URLs : >99999

If your Gallery has a lot of images, HTTrack could quit with the error PANIC! : Too many URLs : >99999. Mine did, so I had to run it with the -#L1000000 argument so that it’s limited to 1,000,000 URLs instead of the default 99,999.

Run HTTrack

After all of this, I ran the httrack binary with the security (bandwidth, etc) limits disabled (--disable-security-limits) and used its wizard mode to set up the mirror. The URL to be archived was https://nuxx.net/gallery/, stored in an appropriately named project directory, with no other settings.

CAUTION: Do not disable security limits if you don’t have good controls around the site you are mirroring and the bandwidth between the two. HTTrack has very sane rate-limiting defaults that keep its behavior polite; it’s not wise to override them unless you have good control of both the source and destination sites.

When httrack begins it shows no progress on screen, so I quit with Ctrl-C, switched to the project directory, and ran httrack --continue to allow the mirror to continue and show status info on the screen (the screenshot above). The argument --continue can be used to restart an interrupted mirror, and --update can be used to freshen up a complete mirror.

Alternately, the following command puts this all together, without the wizard:

httrack https://nuxx.net/gallery/ -W -O "/home/username/websites/nuxx.net Photo Gallery" -%v --disable-security-limits -#L1000000

As HTTrack spiders the site it comes across external links and needs to know what to do with them. Because I didn’t specify an action for external links on the command line, it prompts with the question “A link, [linkurl], is located beyond this mirror scope.”. Since I’m not interested in mirroring any external sites (mostly links to recipes or company websites) I answer * which is “Ignore all further links and do not ask any more questions” (text in httrack.c). (I was unable to figure out how to suppress this via a command line option before getting a complete mirror, although it’s likely possible.)

Running from a Dedicated VM

I ran this mirror task from a Linode VM located in the same region as the VM hosting nuxx.net. This results in all traffic flowing over the Private network, avoiding bandwidth charges.

Because of the ~90GB of images, I set up a Linode 8GB, which has 160GB of disk, 8GB of RAM, and 4 CPUs. This should provide plenty of space for the mirror, with enough resources to allow the tool to work. This VM costs $40/mo (or $0.06/hr), which I find plenty affordable for getting this project done. The mirror took N days to complete, after which I tar’d it up and copied it a few places before deleting the VM.

By having a separate VM I didn’t need to worry about any dependencies or package problems, and I could delete it after the work was done. All I needed to do on this VM was create a user, put it in the sudoers file, install screen (sudo apt-get install screen) and httrack (sudo apt-get install httrack), and get things running.
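
Put together, the preparation amounted to something like this (a sketch; the username is a placeholder, and adding the user to the sudo group is one way of handling the sudoers step):

adduser mirroruser                  # create the working user
usermod -aG sudo mirroruser         # or edit /etc/sudoers directly
apt-get update
apt-get install screen httrack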

Wrapping It All Up

After the mirror was complete I replaced my .../gallery directory with the .../gallery directory from the HTTrack output directory and all was good.


Archiving MediaWiki with mwoffliner and zimdump

For a number of years on nuxx.net I used MediaWiki to host technical content. The markup language is nearly perfect for this sort of content, but in recent years I haven’t been doing as much of this and maintaining the software became a bit of a hassle. In order to still make the content available but get rid of the actual software, I moved all the content to static HTML files.

These files were created by building a ZIM file — a format commonly used for offline copies of a website — and then extracting it. The extracted files, a static copy of the MediaWiki-based site, were then made available using Apache.

You can get the ZIM file here, or browse the new static pages here.

Here are the general steps I used to make it happen.

Create ZIM file: mwoffliner --mwUrl="https://nuxx.net/" --adminEmail=steve@nuxx.net --redis="redis://localhost:6379" --mwWikiPath="/w/" --customZimFavicon=favicon-32x32.png

Create HTML Directory from ZIM File: zimdump -D mw_archive outfile.zim

Note: There are currently issues with zimdump and putting %2f HTML character codes in filenames instead of creating paths. This is openzim/zim-tools issue #68, and will need to be fixed by hand.

Consider using find . -name "*%2f*" to find problems with files, then use rename 's/.{4}(.*)/$1/' * (or so) to fix the filenames after moving them into appropriate subdirectories.

If using Apache (as I am), create an .htaccess to set MIME types appropriately, turning off the rewrite engine so higher-level redirects don’t affect things:

<FilesMatch "^[^.]+$">
ForceType text/html
</FilesMatch>

RewriteEngine Off

Link to http://sitename.com/outdir/A/Main_Page to get to the original main wiki page. In my case, http://nuxx.net/wiki_archive/A/Main_Page.

 


BorgBackup Repository on Synology DSM 6.2.2

Lately I’ve become enamored with BorgBackup (Borg) for backups of remote *NIX servers, so after acquiring a Synology DS1019+ for home I wanted to make it the destination repository for Borg-based backups of nuxx.net. While setting up Borg is usually quite straightforward (a package or stand-alone binary), it’s not so cut and dried on Synology DiskStation Manager (DSM), the OS which runs on the DS1019+ and most other Synology NAS’.

What follows are the steps I used to make this work and the reason for each step. In the end it was fairly simple, but a few of the steps are obtuse and only relevant to DSM.

These steps were written for DSM 6.2.2; I have not checked whether they apply to other versions. Also, I leave out all details of setting up public key authentication for SSH as this is thoroughly documented elsewhere.

  1. Enable “User Home Service” via Control Panel → User → Advanced → User Home → Enable user home service: This creates a home directory for each user on the machine and thus a place to store .ssh/authorized_keys for the backup user account.
  2. Create a backup user account and make it part of the administrators group: Accounts must be part of administrators in order to log in via SSH. Starting with DSM 6.2.2 non-admin users do not have SSH access.
  3. Change the permissions on the backup user’s home directory to 755: By default users’ home directories have an ACL applied which has too broad of permissions and SSH will refuse to use the key, instead prompting for a password. Home directories are located under /var/services/homes and this can be set via chmod 755 /var/services/homes/backupuser. (See this thread for details.)
  4. Put ~/.ssh/authorized_keys, containing the remote user’s public key, in place under the backup user’s home directory and ensure that the file is set to 700: If permissions are too open, sshd will refuse to use the key.
  5. Test that you can log in remotely with ssh and public key authentication.
  6. Place the borg-linux64 binary (named borg) in the user’s home directory and confirm that it’s executable: Binaries available here.
  7. Create a directory on the NAS to be used as the backup destination and give the backup user read and write permissions.
  8. Modify the backup user’s ~/.ssh/authorized_keys to prevent remote interactive logins and restrict how borg is run: This is optional, but a good idea.

    In this example only the borg serve command (the Borg repository server) can be run remotely; it is restricted to a 120GB storage quota, to the repository under /volume2/Backups/borg on DSM, and to connections from the remote IP 192.168.0.23:

    command="/var/services/homes/backupuser/borg serve --storage-quota 120G --restrict-to-repository /volume2/Backups/borg",restrict,from="192.168.0.23" ssh-rsa AAAA[...restofkeygoeshere...] remoteuser@remoteserver.example.com

Please note, there are a number of articles about enabling public key authentication for SSH on DSM which mention uncommenting and setting PubkeyAuthentication yes and AuthorizedKeysFile .ssh/authorized_keys in /etc/ssh/sshd_config and restarting sshd. I did not need to do this. The settings, as commented out, are the defaults and thus already set that way (see sshd_config(5) for details).

At this point DSM should allow a remote user, authenticating with a public key and restricted to a particular source IP address, to use the Synology NAS as a BorgBackup repository. For more information about automating backups check out this article about how I use borg for backing up nuxx.net, including a wrapper script that can be run automatically via cron.
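
Once that’s in place, the remote server can initialize and use the repository over SSH like any other Borg remote. A quick sketch (the NAS hostname, source paths, and encryption mode here are placeholders):

borg init --encryption=repokey backupuser@nas.example.com:/volume2/Backups/borg
borg create --stats backupuser@nas.example.com:/volume2/Backups/borg::{hostname}-{now} /etc /home /var/www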
