
Day: June 23, 2020

Archiving MediaWiki with mwoffliner and zimdump

For a number of years on nuxx.net I used MediaWiki to host technical content. The markup language is nearly perfect for this sort of content, but in recent years I haven’t been doing as much of this and maintaining the software became a bit of a hassle. In order to still make the content available but get rid of the actual software, I moved all the content to static HTML files.

I produced these files by creating a ZIM file (a format commonly used for offline copies of a website) and then extracting it. The extracted files, a static copy of the MediaWiki-based site, were then made available using Apache.

You can get the ZIM file here, or browse the new static pages here.

Here are the general steps I used to make it happen.

Create ZIM file: mwoffliner --mwUrl="https://nuxx.net/" --adminEmail=steve@nuxx.net --redis="redis://localhost:6379" --mwWikiPath="/w/" --customZimFavicon=favicon-32x32.png

Create HTML directory from ZIM file: zimdump -D mw_archive outfile.zim

Note: zimdump currently writes URL-encoded slashes (%2f) into filenames instead of creating subdirectories. This is openzim/zim-tools issue #68; until it is fixed, the affected filenames have to be corrected by hand.

Consider using find . -name "*%2f*" to find the affected files. After moving them into the appropriate subdirectories, use rename 's/.{4}(.*)/$1/' * (or so), which strips the first four characters of each name, to fix the filenames.
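As an alternative to moving and renaming by hand, here is a minimal sketch that decodes every %2f in a filename back into a real directory separator. The function name fix_encoded_slashes is my own invention and not part of zim-tools, and the sketch assumes each %2f stands for exactly one path separator. It skips a file rather than overwrite anything, so run it on a copy of the dump first.

```shell
# fix_encoded_slashes DIR
# Hypothetical helper (not part of zim-tools): for every file under DIR whose
# name contains a URL-encoded slash (%2f or %2F), create the implied
# subdirectories and move the file to the decoded path.
fix_encoded_slashes() {
    find "$1" -type f -name '*%2[fF]*' -exec sh -c '
        for path do
            dir=${path%/*}
            name=${path##*/}
            # Turn each %2f in the filename into a real path separator.
            target="$dir/$(printf "%s" "$name" | sed "s|%2[fF]|/|g")"
            if [ -e "$target" ]; then
                echo "skipped, target exists: $path" >&2
            elif mkdir -p -- "${target%/*}" 2>/dev/null; then
                mv -- "$path" "$target"
            else
                # Usually a page and its subpage: the parent name is already a file.
                echo "skipped, cannot create directory for: $path" >&2
            fi
        done
    ' sh {} +
}
```

Running fix_encoded_slashes mw_archive would process the directory created in the previous step. Anything reported as skipped on stderr still needs to be handled by hand, for example when a wiki page and one of its subpages would both need the same name.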

If using Apache (as I am), create an .htaccess file that sets the MIME type to text/html for files without an extension, and turns off the rewrite engine so higher-level redirects don’t affect things:

<FilesMatch "^[^.]+$">
ForceType text/html
</FilesMatch>

RewriteEngine Off
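The FilesMatch pattern above only matches filenames that contain no dot at all, so a page whose title includes a dot would not be forced to text/html. The sketch below applies the same pattern to a dumped directory to list the files that fall outside it. The function names matches_forcetype and list_not_forced are hypothetical, and the check only approximates Apache's matching by testing the last path component with grep.

```shell
# matches_forcetype PATH
# Succeeds if the file's name (last path component) matches the same
# pattern as the FilesMatch directive: one or more characters, none a dot.
matches_forcetype() {
    printf '%s\n' "${1##*/}" | grep -Eq '^[^.]+$'
}

# list_not_forced DIR
# Prints every file under DIR that the pattern does NOT match, so you can
# check that each one has an extension Apache already maps correctly.
list_not_forced() {
    find "$1" -type f | while IFS= read -r f; do
        matches_forcetype "$f" || printf '%s\n' "$f"
    done
}
```

For example, list_not_forced mw_archive/A would show any article files with a dot in their name. This assumes article pages live under the A directory, as the Main_Page link suggests.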

Link to http://sitename.com/outdir/A/Main_Page to get to the original main wiki page. In my case, http://nuxx.net/wiki_archive/A/Main_Page.

 
