This is targeting pages that have been captured by the Wayback Machine that specifically have `?do=edit` on the end of their URL. These pages give us the Dokuwiki Markup source, relatively unmolested.
See the incomplete script "archive_crawler" to see my working. I would not recommend blindly running it - it's beta quality at best. Just read it and this page to follow the logic... or just fork this repo... or whatever, I'm not your Dad.
## Getting the latest capture URL from archive.org
archive.org does not appear to have an exhaustive API, but it does have at least one API call that is relevant to our needs. It's called `available`.
How it works is you run a `curl -X GET` against `https://archive.org/wayback/available?url=[YOUR URL HERE]` and it returns to you with a bit of JSON. For example:
```bash
curl -s -X GET "https://archive.org/wayback/available?url=https://wiki.bash-hackers.org/howto/mutex?do=edit" | jq -r '.'
So basically, we remove everything from the first line to the line that contains `name="sectok"`, and then we remove everything after `</textarea>`, and what's left should be the Dokuwiki Markup that we want.
The original copyright belongs to Jan Schampera (TheBonsai) and subsequent contributors, 2007 - 2023.
It's extremely important to me that copyright and attribution are given where required - the original contributors are worth their dues, and IIRC I'm one of them.
If you're one of the original contributors and you believe I've violated your copyright in anyway, please let me know in the first instance.