{"id":18945,"date":"2020-06-23T12:33:54","date_gmt":"2020-06-23T16:33:54","guid":{"rendered":"https:\/\/nuxx.net\/blog\/?p=18945"},"modified":"2020-06-23T12:33:54","modified_gmt":"2020-06-23T16:33:54","slug":"archiving-mediawiki-with-mwoffliner-and-zimdump","status":"publish","type":"post","link":"https:\/\/nuxx.net\/blog\/2020\/06\/23\/archiving-mediawiki-with-mwoffliner-and-zimdump\/","title":{"rendered":"Archiving MediaWiki with mwoffliner and zimdump"},"content":{"rendered":"\n<p><a href=\"https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-18949\" src=\"https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-300x148.png\" alt=\"\" width=\"300\" height=\"148\" srcset=\"https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-300x148.png 300w, https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-1024x506.png 1024w, https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-768x379.png 768w, https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-1536x758.png 1536w, https:\/\/nuxx.net\/blog\/wp-content\/uploads\/2020\/06\/Screenshot_2020-06-23-nuxx-net-2048x1011.png 2048w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>For a number of years on <a href=\"https:\/\/nuxx.net\">nuxx.net<\/a> I used <a href=\"https:\/\/www.mediawiki.org\/wiki\/MediaWiki\">MediaWiki<\/a> to host technical content. The markup language is nearly perfect for this sort of content, but in recent years I haven&#8217;t been doing as much of this and maintaining the software became a bit of a hassle. 
To keep the content available while retiring the software itself, I moved all of it to static HTML files.<\/p>\n<p>I generated these files by creating a <a href=\"https:\/\/wiki.openzim.org\/wiki\/ZIM_file_format\">ZIM file<\/a> (a format commonly used for offline copies of a website) and then extracting it. The extracted files, a static copy of the MediaWiki-based site, were then made available using Apache.<\/p>\n<p>You can get the ZIM file <a href=\"https:\/\/nuxx.net\/wiki_archive\/nuxx_en_all_2020-06.zim\">here<\/a>, or browse the new static pages <a href=\"https:\/\/nuxx.net\/mw_archive\/A\/Main_Page\">here<\/a>.<\/p>\n<p>Here are the general steps I used to make it happen.<\/p>\n<p>Create ZIM file: <code>mwoffliner --mwUrl=\"https:\/\/nuxx.net\/\" \n--adminEmail=steve@nuxx.net --redis=\"redis:\/\/localhost:6379\" \n--mwWikiPath=\"\/w\/\" --customZimFavicon=favicon-32x32.png<\/code><\/p>\n<p>Create HTML directory from ZIM file: <code>zimdump -D mw_archive outfile.zim<\/code><\/p>\n<p>Note: zimdump currently writes the URL-encoded slash (%2f) into filenames instead of creating subdirectories. This is <a href=\"https:\/\/github.com\/openzim\/zim-tools\/issues\/68\">openzim\/zim-tools issue #68<\/a>, and the affected files will need to be fixed by hand.<\/p>\n<p>Consider using <code>find . -name \"*%2f*\"<\/code> to find the affected files, then, after moving them into the appropriate subdirectories, use <code>rename 's\/.{4}(.*)\/$1\/' *<\/code> (or so) to fix the filenames. As written, that expression strips the first four characters of each name, so adjust the count to match the prefix being removed.<\/p>\n<p>If using Apache (as I am), create an .htaccess file that serves files without an extension as text\/html and turns off the rewrite engine so higher-level redirects don&#8217;t affect things:<\/p>\n<p><code>&lt;FilesMatch \"^[^.]+$\"&gt;<\/code><br \/><code>ForceType text\/html<\/code><br \/><code>&lt;\/FilesMatch&gt;<\/code><\/p>\n<p><code>RewriteEngine Off<\/code><\/p>\n<p>Link to <code>http:\/\/sitename.com\/outdir\/A\/Main_Page<\/code> to get to the original main wiki page. 
In my case, <a href=\"https:\/\/nuxx.net\/wiki_archive\/A\/Main_Page\">http:\/\/nuxx.net\/wiki_archive\/A\/Main_Page<\/a>.<\/p>\n<p>\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,4],"tags":[],"class_list":["post-18945","post","type-post","status-publish","format-standard","hentry","category-computers","category-nuxxnet","entry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/18945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/comments?post=18945"}],"version-history":[{"count":5,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/18945\/revisions"}],"predecessor-version":[{"id":18951,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/18945\/revisions\/18951"}],"wp:attachment":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/media?parent=18945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/categories?post=18945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/tags?post=18945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}