* Update README.md
added another bulk list from Gary Price/Infodocket. File is USDA_FIS_ERS.xlsx
* Add files via upload
USDA_FIS_ERS.xlsx from Gary Price/infodocket. About 1,700 URLs from the USDA, specifically the Food Inspection Service and the Economic Research Service.
This is a CSV file of PDF links obtained from webpages found on the US CDC website. It contains 46,873 links. Each row has three fields: the source HTML page containing the PDF link; the time in UTC at which the accessibility of the PDF file was confirmed; and a URL pointing to the PDF file itself.
This file replaces the two previous files. The PDF links have been deduplicated, so if multiple pages point to the same PDF, you'll only see an entry for the first reference. PDF links that point to non-gov domains have been omitted as well. If a PDF link contains a fragment, the fragment is removed from the URL (e.g. "/a/path/mypdf.pdf#page=3" becomes "/a/path/mypdf.pdf"). All the PDF files had their accessibility and content type verified with an HTTP HEAD request on Dec. 09 2024.
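A minimal sketch of the cleanup described above, not the script that actually produced the file. It assumes rows arrive as (source page, PDF URL) pairs and that "non-gov" means the hostname does not end in ".gov"; the function names `is_gov` and `dedupe_pdf_links` are hypothetical. The HEAD-request verification step is left out.

```python
from urllib.parse import urldefrag, urlsplit


def is_gov(url):
    """Assumption: a link is 'gov' when its hostname ends in .gov."""
    host = urlsplit(url).hostname or ""
    return host == "gov" or host.endswith(".gov")


def dedupe_pdf_links(rows):
    """Strip fragments, drop non-gov links, keep the first reference to each PDF.

    rows is an iterable of (source_page, pdf_url) pairs in crawl order.
    """
    seen = set()
    kept = []
    for source_page, pdf_url in rows:
        # "/a/path/mypdf.pdf#page=3" -> "/a/path/mypdf.pdf"
        clean_url, _fragment = urldefrag(pdf_url)
        if not is_gov(clean_url):
            continue
        if clean_url in seen:
            # A later page pointing at an already-listed PDF is skipped.
            continue
        seen.add(clean_url)
        kept.append((source_page, clean_url))
    return kept
```

Two pages that link to the same PDF, one with "#page=3" and one without, collapse to a single row attributed to the first page.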
* Update README.md
* Update README.md
* Add files via upload
* Update README.md
added bulk file from EnergyFundsForAll.org
* Bulk list from EnergyFundsForAll
* Remove extra whitespace
Signed-off-by: Lauren Ko <lauren.ko@unt.edu>
* Remove duplicate listing of infodocket-11-21-2024.xls
---------
Signed-off-by: Lauren Ko <lauren.ko@unt.edu>
Co-authored-by: James R. Jacobs <freegovinfo@gmail.com>
* adding info docket bulk seed list
* Update README.md
* Update README.md
* Add files via upload
Bulk lists from Gary Price and Kelly Smith. Seed list readme updated with file names.
* Common Crawl Foundation seeds
* clean mil list to just hostnames
* doc: add location of ccf repo that generated these files
---------
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>