Merge branch 'end-of-term:main' into main

James R. Jacobs 2025-01-22 17:58:09 -08:00 committed by GitHub
commit ea603195da
GPG Key ID: B5690EEEBB952194
2 changed files with 11717 additions and 0 deletions

@@ -132,3 +132,4 @@ The End of Term Web Archive team and other contributors compiled a list of sourc
* fda-warnings-letters.csv - URLs derived from the FDA's sitemap.xml file with warning letters content in HTML. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* nih-urls.csv - URLs derived from three NIH sitemap files: https://www.nih.gov/sitemap.xml, https://newsinhealth.nih.gov/sitemap.xml and https://nihrecord.nih.gov/sitemap.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* cdc-dataset-urls.csv - CDC dataset URLs derived from this sitemap file: https://s3.amazonaws.com/sa-socrata-sitemaps-us-east-1-fedramp-prod/sitemaps/sitemap-datasets-data.cdc.gov0.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* cdc-dataset-download-urls.txt - Extracted list of /api/views URLs from the pages in cdc-dataset-urls.csv.
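
As a rough illustration of how these files could be consumed, the sketch below reads the two-column CSV layout described above (sitemap file, then URL) and pulls /api/views links out of a page's HTML. It is not the script used to build cdc-dataset-download-urls.txt. The function names, the regular expression, and the example.gov URLs are all assumptions made for illustration, and the real /api/views URLs may have a different shape.

```python
import csv
import io
import re

# Assumed pattern: the README only says "/api/views URLs", so the exact
# URL shape matched here is a guess, not the project's actual rule.
API_VIEWS_RE = re.compile(r"""https?://[^\s"'<>]+/api/views/[^\s"'<>]+""")


def read_url_rows(csv_text):
    """Yield (sitemap, url) pairs from rows shaped like 'sitemap,url'."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:  # skip blank or malformed rows
            yield row[0].strip(), row[1].strip()


def extract_api_views_urls(html):
    """Return the unique /api/views URLs in html, in order of first appearance."""
    found = []
    for url in API_VIEWS_RE.findall(html):
        if url not in found:
            found.append(url)
    return found


# Hypothetical sample data; none of these URLs come from the real files.
sample_csv = (
    "https://example.gov/sitemap.xml,https://example.gov/dataset-1\n"
    "https://example.gov/sitemap.xml,https://example.gov/dataset-2\n"
)
sample_html = (
    '<a href="https://data.example.gov/api/views/abcd-1234/rows.csv?accessType=DOWNLOAD">CSV</a>'
    '<a href="https://data.example.gov/api/views/abcd-1234/rows.csv?accessType=DOWNLOAD">CSV again</a>'
)

rows = list(read_url_rows(sample_csv))
download_urls = extract_api_views_urls(sample_html)
```

In practice each URL from the CSV would be fetched and its HTML passed to `extract_api_views_urls`. Fetching is left out here so the sketch runs without network access.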

File diff suppressed because it is too large