mirror of https://github.com/rawiriblundell/wiki.bash-hackers.org
synced 2024-11-01 16:43:08 +01:00
Update README with comments about non-markdown filtering
parent 50e7ca2385
commit cae5b29740
@@ -5,8 +5,8 @@ This is targeting pages that have been captured by the Wayback Machine that spec
 
 See the incomplete script "archive_crawler" to see my working.
 
-TODO: Second crawl
-TODO: Filter out all the non-markdown garbage.
+- TODO: Second crawl
+- TODO: Filter out all the non-markdown garbage. It looks like everything up to `<div class="editBox" role="application">`, and everything after `</div><!-- /content --></div>` is a good first cull.
 
 # LICENSE
 
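The first cull described in the second TODO could be sketched roughly as below. This is a minimal Python illustration, not the repository's "archive_crawler" script. The function name `first_cull` is hypothetical, and two behaviours are assumptions the README does not state: the markers themselves are dropped along with the surrounding text, and a page missing either marker is returned unchanged.

```python
# Markers quoted from the README TODO.
START_MARKER = '<div class="editBox" role="application">'
END_MARKER = '</div><!-- /content --></div>'


def first_cull(html: str) -> str:
    """Keep only the text between the two markers.

    Assumption: the markers themselves are discarded, and a page that
    lacks either marker is returned untouched so nothing is lost silently.
    """
    start = html.find(START_MARKER)
    if start == -1:
        return html
    start += len(START_MARKER)

    # Look for the end marker only after the start marker.
    end = html.find(END_MARKER, start)
    if end == -1:
        return html

    return html[start:end]


# Tiny made-up page, only to show the shape of the input and output.
sample_page = (
    "<html><body>header junk"
    + START_MARKER
    + "page source"
    + END_MARKER
    + "footer junk</body></html>"
)
culled = first_cull(sample_page)
```

Whatever remains after this cull would still need the further filtering the TODO asks for; the sketch only covers the "good first cull" step.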