CDC dataset URLs from sitemap (#26)

CDC dataset URLs derived from this sitemap file: https://s3.amazonaws.com/sa-socrata-sitemaps-us-east-1-fedramp-prod/sitemaps/sitemap-datasets-data.cdc.gov0.xml. Each row of this CSV file has two fields: the sitemap file the URL was sourced from, then the URL.
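
As a rough illustration of the two-column layout described above, the following sketch turns sitemap XML into `sitemap file,URL` rows. It is not the script used to produce this commit's seed file: the function name `sitemap_to_csv` and the sample XML are hypothetical, and it assumes the sitemap follows the standard sitemaps.org `urlset`/`url`/`loc` schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to apply here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_csv(sitemap_url: str, sitemap_xml: str) -> str:
    """Return CSV text with one 'sitemap_url,page_url' row per <loc> entry."""
    root = ET.fromstring(sitemap_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Each <url> element holds one <loc> with the page address.
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            writer.writerow([sitemap_url, loc.text.strip()])
    return out.getvalue()


# Hypothetical sample input for illustration only, not real CDC data.
SAMPLE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/a</loc></url>
  <url><loc>https://example.org/dataset/b</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(sitemap_to_csv("https://example.org/sitemap.xml", SAMPLE_XML), end="")
```

Repeating the sitemap URL on every row keeps each line self-describing, so rows from several sitemaps could be concatenated into one seed file without losing provenance.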

Note that the pages these URLs point to usually include a download button for a CSV of the dataset, but the CSVs themselves aren't included in this seed file and won't be retrieved in the crawl. Even so, the archive will at least document the existence of each dataset and the links to its metadata.
YakShaver 2025-01-06 08:05:19 -08:00 committed by GitHub
parent ee6cba5868
commit 5a83e824d4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

File diff suppressed because it is too large.