This targets pages captured by the Wayback Machine whose URLs end in '?do=edit'. Those captures give us the markdown source.
See the incomplete script "archive_crawler" for my working.
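As a rough illustration of the targeting rule above, a minimal Python sketch might look like the following. The function name `is_edit_capture` and the example.org URLs are made up for illustration, and this is not taken from "archive_crawler".

```python
def is_edit_capture(url: str) -> bool:
    """Return True when a captured URL ends with the '?do=edit' suffix."""
    return url.endswith("?do=edit")


# Hypothetical capture list, for illustration only.
captures = [
    "https://example.org/wiki/start",
    "https://example.org/wiki/start?do=edit",
    "https://example.org/wiki/syntax/arrays?do=edit",
]

# Keep only the captures that expose the page source.
edit_captures = [url for url in captures if is_edit_capture(url)]
```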
- TODO: Filter out all the non-markdown garbage. It looks like dropping everything up to `<div class="editBox" role="application">`, and everything after `</div><!-- /content --></div>`, is a good first cull.
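The first cull described in the TODO could be sketched roughly as below. This is only a sketch under assumptions: the function name `first_cull` is hypothetical, the two marker strings are assumed to appear exactly as written above, and the markers themselves are dropped along with the surrounding page. If a marker is missing, that side of the page is left untouched.

```python
START_MARKER = '<div class="editBox" role="application">'
END_MARKER = '</div><!-- /content --></div>'


def first_cull(html: str) -> str:
    """Drop everything up to START_MARKER and everything from END_MARKER on.

    Assumption: both markers are removed too. A missing marker leaves
    that side of the input unchanged.
    """
    start = html.find(START_MARKER)
    if start != -1:
        html = html[start + len(START_MARKER):]

    # Search for the end marker only in what remains after the start cut.
    end = html.find(END_MARKER)
    if end != -1:
        html = html[:end]

    return html
```

What remains will most likely still contain some form markup around the source text, so this is only a first pass.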
The original copyright belongs to Jan Schampera (TheBonsai) and subsequent contributors, 2007 - 2023.
It's extremely important to me that copyright and attribution are given where required - the original contributors deserve their due, and IIRC I'm one of them.
If you're one of the original contributors and you believe I've violated your copyright in any way, please let me know in the first instance.