PDFs from the CDC website - single file (#17)

This is a csv file of PDF links obtained from webpages found on the US CDC website. It contains 46,873 links, with the format: the source HTML file containing the PDF link; the time in UTC in which the accessibility of the PDF file was confirmed; and a URL pointing to the PDF file itself.
    
This file replaces the two previous files. This file has had the PDF links deduped, so if multiple pages point to the same PDF, you'll only see an entry for the first reference. PDF links that point to non-gov domains have been omitted as well.If the PDF link contains a fragment, the fragment will be removed from the path (e.g.  "/a/path/mypdf.pdf#page=3" will get turned into "/a/path/mypdf.pdf"). All the PDF files have had their accessibility and content type verified with a HTTP HEAD request on Dec. 09 2024.
This commit is contained in:
YakShavingAsAService 2024-12-10 12:51:36 -08:00 committed by GitHub
parent 5a9195431e
commit f4b194553a
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

File diff suppressed because it is too large Load Diff