# wiki.bash-hackers.org
Extraction of wiki.bash-hackers.org from the Wayback Machine
This targets pages captured by the Wayback Machine whose URLs end in `'?do=edit'`. Those captures give us the markdown source.
See the incomplete script `archive_crawler` for my working.
- TODO: Markdown linting
- TODO: Markdown conversion from Dokuwiki "Markup" to GitHub "Markdown" using pandoc
- TODO: Parse the already downloaded files for any missing links
- TODO: Rinse and repeat
## Extracting the markdown
The pages that have `'?do=edit'` on the end of their URL appear to have a reliable and predictable structure:
```html
[ LINES ABOVE REMOVED FOR BREVITY ]
<div class="toolbar group">
<div id="draft__status" class="draft__status"></div>
<div id="tool__bar" class="tool__bar"></div>
</div>
<form id="dw__editform" method="post" action="" accept-charset="utf-8" class=" form-inline"><div class="no">
<input type="hidden" name="sectok" value=""/><input type="hidden" name="id" value="wishes"/>[REST OF LINE REMOVED FOR BREVITY]
[ TARGET MARKDOWN CODE EXISTS HERE]
</textarea>
<div id="wiki__editbar" class="editBar">
<div id="size__ctl">
</div>
[ LINES BELOW REMOVED FOR BREVITY ]
```
So basically, we remove everything from the first line through the line that contains `name="sectok"`, then remove the line containing `</textarea>` and everything after it. What's left should be the markdown that we want.
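
The two removals described above can be sketched with `sed`. This is a minimal illustration, not necessarily how `archive_crawler` does it: the function name `extract_source` is hypothetical, and it assumes the layout shown above, where the wanted text starts on the line after the one containing `name="sectok"` and ends on the line before the first `</textarea>`.

```bash
# Hypothetical helper: print only the wiki source from a saved '?do=edit' page.
# Assumption: the source begins on the line after the one containing
# name="sectok" and ends on the line before the first </textarea>.
extract_source() {
  # 1st expression: delete from line 1 through the name="sectok" line.
  # 2nd expression: delete from the </textarea> line through the end of file.
  sed -e '1,/name="sectok"/d' -e '/<\/textarea>/,$d' "$1"
}
```

For example, `extract_source saved_page.html > page.txt`, where both file names are placeholders. The sketch does not decode HTML entities such as `&lt;`, which may still appear in the extracted text.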
## LICENSE
As per the original wiki.bash-hackers.org:
> Except where otherwise noted, content on this wiki is licensed under the following license:
> [GNU Free Documentation License 1.3](https://web.archive.org/web/20220930131429/http://www.gnu.org/licenses/fdl-1.3.html)
## COPYRIGHT
The original copyright belongs to Jan Schampera (TheBonsai) and subsequent contributors, 2007 - 2023.
It's extremely important to me that copyright and attribution are given where required - the original contributors deserve their due, and IIRC I'm one of them.
If you're one of the original contributors and you believe I've violated your copyright in any way, please let me know in the first instance.