mirror of https://github.com/end-of-term/eot2024, synced 2025-01-18 05:03:44 +01:00
Add cdc-dataset-urls.csv to README
parent 5a83e824d4
commit c35eb3240e
@@ -115,3 +115,4 @@ The End of Term Web Archive team and other contributors compiled a list of sourc
* fda-no-downloads-no-warning-letters.csv - URLs derived from the FDA's sitemap.xml file, with URLs of the form /media/\*/download and the warning letters filtered out, so in theory everything in this file is HTML. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* fda-warnings-letters.csv - URLs derived from the FDA's sitemap.xml file that point to warning letter content in HTML. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* nih-urls.csv - URLs derived from three NIH sitemap files: https://www.nih.gov/sitemap.xml, https://newsinhealth.nih.gov/sitemap.xml and https://nihrecord.nih.gov/sitemap.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
* cdc-dataset-urls.csv - CDC dataset URLs derived from this sitemap file: https://s3.amazonaws.com/sa-socrata-sitemaps-us-east-1-fedramp-prod/sitemaps/sitemap-datasets-data.cdc.gov0.xml. The format of this CSV file is: sitemap file the URL is sourced from,the URL.
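
The two-column format described above (sitemap file the URL is sourced from,the URL) can be illustrated with a minimal Python sketch. This is not the tooling the End of Term team used; the function names, the `MEDIA_DOWNLOAD` pattern, and the `example.gov` sample data are all hypothetical, and the sketch assumes the input is a standard sitemaps.org `urlset` document.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org urlset documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical pattern for URLs of the form /media/*/download.
MEDIA_DOWNLOAD = re.compile(r"/media/[^/]+/download/?$")

# Made-up sample data, for illustration only.
SAMPLE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.gov/page-one</loc></url>
  <url><loc>https://example.gov/media/123/download</loc></url>
</urlset>"""


def sitemap_rows(sitemap_url, sitemap_xml):
    """Yield (sitemap_url, page_url) pairs for every <loc> element."""
    root = ET.fromstring(sitemap_xml)
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        if loc.text:
            yield sitemap_url, loc.text.strip()


def write_rows(rows, out):
    """Write rows as 'sitemap file,URL' lines to a text file object."""
    csv.writer(out, lineterminator="\n").writerows(rows)


def read_urls(csv_text):
    """Return only the URL column (second field) of the CSV text."""
    return [row[1] for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2]


if __name__ == "__main__":
    rows = sitemap_rows("https://example.gov/sitemap.xml", SAMPLE_XML)
    # Drop media downloads, keeping what should be HTML pages.
    html_rows = [r for r in rows if not MEDIA_DOWNLOAD.search(r[1])]
    buffer = io.StringIO()
    write_rows(html_rows, buffer)
    print(buffer.getvalue(), end="")
```

To consume one of the listed files, `read_urls` can be given the file's text; it returns the second column, which is the URL to crawl.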