From cae5b297409200937632fde3bc885c7f223f7d6d Mon Sep 17 00:00:00 2001
From: Rawiri Blundell
Date: Sat, 15 Apr 2023 00:02:52 +1200
Subject: [PATCH] Update README with comments about non-markdown filtering

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 69fa8d8..a6aa23e 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ This is targeting pages that have been captured by the Wayback Machine that spec
 
 See the incomplete script "archive_crawler" to see my working.
 
-TODO: Second crawl
-TODO: Filter out all the non-markdown garbage.
+- TODO: Second crawl
+- TODO: Filter out all the non-markdown garbage. It looks like everything up to ``, and everything after `` is a good first cull.
 
 # LICENSE