mirror of https://github.com/end-of-term/eot2024
synced 2025-01-18 05:03:44 +01:00
Adds FDA and NIH HTML URLs for seed list (#25)
* FDA HTML urls for seed list

  This is a file of URLs derived from the FDA's sitemap.xml file. It has URLs of the form /media/*/download and the warning-letters filtered out, so in theory everything in this file is an HTML link. I plan on submitting the PDF content separately.

  The format of this CSV file is:

  <sitemap file the URL is sourced from>,<the URL>

* FDA warning letters from sitemap.xml

  This is a file of URLs derived from the FDA's sitemap.xml file with warning letters content. I thought this was going to be PDF content, but it turns out to be HTML content.

  The format of this CSV file is:

  sitemap file the URL is sourced from,the URL

* FDA download links from sitemaps

  This is a file of URLs derived from the FDA's sitemap.xml file, where the link is of the form /media/id/download. These resolve to a PDF file rendered in a HTML wrapper using Mozilla's pdfjs library -- so the download in the path name is a little misleading.

  The format of this CSV file is:

  sitemap file the URL is sourced from,the URL

* NIH urls from three sitemaps.

  This is a file of URLs derived from three NIH sitemap files: https://www.nih.gov/sitemap.xml, https://newsinhealth.nih.gov/sitemap.xml and https://nihrecord.nih.gov/sitemap.xml.

  The format of this CSV file is:

  sitemap file the URL is sourced from,the URL
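The filtering described in the commit message could be sketched roughly as follows. This is not the script used to produce the seed lists (the commit does not include one); it is a minimal illustration, assuming the sitemaps follow the standard sitemaps.org `urlset` schema, that download links can be recognised by a `/media/<id>/download` path, and that warning letters can be recognised by the substring `warning-letters` in the path. The function names `extract_urls`, `classify`, `split_rows`, and `to_csv` are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# /media/<id>/download, per the commit message. Treating <id> as any single
# path segment is an assumption made for this sketch.
DOWNLOAD_RE = re.compile(r"/media/[^/]+/download/?$")


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found under <url> in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def classify(url: str) -> str:
    """Assign a URL to one of three buckets: download, warning-letters, html."""
    path = urlsplit(url).path
    if DOWNLOAD_RE.search(path):
        return "download"
    # Assumption: warning letters are identified by this path substring.
    if "warning-letters" in path:
        return "warning-letters"
    return "html"


def split_rows(sitemap_name: str, sitemap_xml: str) -> dict[str, list[tuple[str, str]]]:
    """Group (sitemap, url) rows by bucket for one sitemap file."""
    buckets: dict[str, list[tuple[str, str]]] = {
        "download": [],
        "warning-letters": [],
        "html": [],
    }
    for url in extract_urls(sitemap_xml):
        buckets[classify(url)].append((sitemap_name, url))
    return buckets


def to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialise rows in the two-column form: sitemap file, URL."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

Each bucket would then be written to its own file, giving the three FDA lists; for the NIH list the same `extract_urls` step would apply to each of the three sitemaps without any filtering.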
parent 526cb84dd0
commit a3d96841db
49386
seed-lists/fda-download-urls.csv
Normal file
File diff suppressed because it is too large
30432
seed-lists/fda-no-downloads-no-warning-letters.csv
Normal file
File diff suppressed because it is too large
6073
seed-lists/fda-warnings-letters.csv
Normal file
File diff suppressed because it is too large
15044
seed-lists/nih-urls.csv
Normal file
File diff suppressed because it is too large