From c5f1d52ae103bb52f40f53b9862022c29725a047 Mon Sep 17 00:00:00 2001
From: Lauren Ko
Date: Mon, 20 Jan 2025 09:56:48 -0600
Subject: [PATCH] Updating README

---
 seed-lists/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/seed-lists/README.md b/seed-lists/README.md
index e02d989..e66226b 100644
--- a/seed-lists/README.md
+++ b/seed-lists/README.md
@@ -132,3 +132,4 @@ The End of Term Web Archive team and other contributors compiled a list of sourc
 * fda-warnings-letters.csv - URLs derived from the FDA's sitemap.xml file with warning letters content in HTML. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
 * nih-urls.csv - URLs derived from three NIH sitemap files: https://www.nih.gov/sitemap.xml, https://newsinhealth.nih.gov/sitemap.xml and https://nihrecord.nih.gov/sitemap.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
 * cdc-dataset-urls.csv - CDC dataset URLs derived from this sitemap file: https://s3.amazonaws.com/sa-socrata-sitemaps-us-east-1-fedramp-prod/sitemaps/sitemap-datasets-data.cdc.gov0.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
+* cdc-dataset-download-urls.txt - Extracted list of /api/views URLs from the pages in cdc-dataset-urls.csv.