Bentley Hensel
e2054da35c
Merge pull request #3 from end-of-term/main
...
update
2024-12-10 20:33:49 -05:00
Lauren Ko
3b3bf304b9
Update README for CDC PDFs
2024-12-10 14:55:01 -06:00
YakShavingAsAService
f4b194553a
PDFs from the CDC website - single file ( #17 )
...
This is a CSV file of PDF links obtained from webpages found on the US CDC website. It contains 46,873 links, one per row, with three fields: the source HTML file containing the PDF link; the time in UTC at which the accessibility of the PDF file was confirmed; and a URL pointing to the PDF file itself.
This file replaces the two previous files. The PDF links have been deduped, so if multiple pages point to the same PDF, you'll only see an entry for the first reference. PDF links that point to non-gov domains have been omitted as well. If the PDF link contains a fragment, the fragment is removed from the path (e.g. "/a/path/mypdf.pdf#page=3" becomes "/a/path/mypdf.pdf"). All the PDF files have had their accessibility and content type verified with an HTTP HEAD request on Dec. 09 2024.
2024-12-10 14:51:36 -06:00
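The cleanup described in the CDC PDF commit above (f4b194553a) can be sketched roughly as follows. This is a minimal illustration, not the script that produced the file: the function names, the tuple layout `(source_page, checked_at_utc, pdf_url)`, the ".gov" hostname test, and the expectation of an `application/pdf` content type are all assumptions made for the example.

```python
import urllib.error
import urllib.request
from urllib.parse import urldefrag, urlsplit


def normalize_pdf_link(url: str) -> str:
    """Drop any fragment (e.g. '#page=3') from a PDF URL."""
    return urldefrag(url).url


def is_gov_host(url: str) -> bool:
    """Assumed rule: keep only links whose hostname ends in '.gov'."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(".gov")


def dedupe_pdf_links(rows):
    """Keep the first reference to each .gov PDF.

    rows: iterable of (source_page, checked_at_utc, pdf_url) tuples
    (hypothetical layout mirroring the three fields in the commit message).
    """
    seen = set()
    kept = []
    for source_page, checked_at, pdf_url in rows:
        clean = normalize_pdf_link(pdf_url)
        if not is_gov_host(clean) or clean in seen:
            continue
        seen.add(clean)
        kept.append((source_page, checked_at, clean))
    return kept


def head_confirms_pdf(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request; True if it answers 200 with a PDF type.

    The 'application/pdf' check is an assumption about what
    "content type verified" meant in the original process.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get_content_type()
            return response.status == 200 and content_type == "application/pdf"
    except (urllib.error.URLError, OSError):
        return False
```

Because deduplication runs after fragment removal, two links that differ only by `#page=...` collapse to a single row, which matches the "first reference" behavior the commit message describes.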
James R. Jacobs
5a9195431e
bulk list of NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt ( #16 )
...
* Update README.md
added NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt
* Add files via upload
NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt
* Update README.md
edited the contact section.
2024-12-10 12:48:14 -06:00
Bentley Hensel
4bef8b223d
Merge pull request #2 from end-of-term/main
...
Add some Bureau of Land Management and EnergyFundsForAll.org seeds (#15 )
2024-12-09 15:59:46 -05:00
Lauren Ko
ed4d0f0d8a
Add some Bureau of Land Management and EnergyFundsForAll.org seeds ( #15 )
...
* Update README.md
* Update README.md
* Add files via upload
* Update README.md
added bulk file from EnergyFundsForAll.org
* Bulk list from EnergyFundsForAll
* Remove extra whitespace
Signed-off-by: Lauren Ko <lauren.ko@unt.edu>
* Remove duplicate listing of infodocket-11-21-2024.xls
---------
Signed-off-by: Lauren Ko <lauren.ko@unt.edu>
Co-authored-by: James R. Jacobs <freegovinfo@gmail.com>
2024-12-09 14:28:43 -06:00
Bentley Hensel
7a74ece080
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:41:07 -05:00
Bentley Hensel
bd3fdbde47
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:38:34 -05:00
Bentley Hensel
49aee9c7bc
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:37:48 -05:00
Bentley Hensel
bf267e339e
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:37:09 -05:00
Bentley Hensel
c015b8b98d
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:35:50 -05:00
Bentley Hensel
980fa37e2a
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:33:46 -05:00
Bentley Hensel
4042707213
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:33:25 -05:00
Bentley Hensel
0535ad7cf2
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:32:01 -05:00
Bentley Hensel
73719faa91
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:29:05 -05:00
Bentley Hensel
4d70936a23
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 18:19:10 -05:00
Bentley Hensel
aeea7beac2
Batch commit of sitemap URL seeds under 500MB or 250 files
2024-12-05 17:56:32 -05:00
Lauren Ko
a6e38c7311
Add CDC .html seed list
2024-12-03 15:53:17 -06:00
Lauren Ko
47e8f8eb67
Add Bluesky URL
...
Co-authored-by: Melody Joy Kramer <melodykramer@gmail.com>
2024-12-03 12:34:32 -06:00
James R. Jacobs
d633f6965c
uploaded new bulk seed files from Gary Price and Kelly Smith ( #11 )
...
* adding info docket bulk seed list
* Update README.md
* Update README.md
* Add files via upload
Bulk lists from Gary Price and Kelly Smith. Seed list readme updated with file names.
2024-12-02 12:11:12 -06:00
Lauren Ko
4519cb1ee8
Add NLM seed list
2024-11-22 13:18:59 -06:00
Lauren Ko
37b32203c5
Add irs.gov seeds from Gary Price
2024-11-21 13:24:36 -06:00
James R. Jacobs
58e14710e3
pull requests for info docket bulk list 11-21-2024 ( #5 )
...
* adding info docket bulk seed list
* Update README.md
2024-11-21 12:56:50 -06:00
Lauren Ko
7e3d04ed8c
Update README for bsky_gov_urlverified.txt
2024-11-21 08:54:52 -06:00
Antoine McGrath
01662e4c87
Create bsky_gov_urlverified.txt ( #4 )
...
URLs for official US Senate.gov and House.gov bluesky accounts
2024-11-21 08:49:00 -06:00
Lauren Ko
3a14a8fb3f
Add seed list from EDGI
2024-11-14 09:54:06 -06:00
Lauren Ko
8e8c22e358
Add updated govspeak list
2024-11-08 09:36:35 -06:00
Lauren Ko
a325cf3f79
Add two lists supplied by James Jacobs
2024-10-25 16:48:38 -05:00
Lauren Ko
99460625a9
Add usagov.csv seed list
2024-09-23 11:10:35 -05:00
Greg Lindahl
ba124bec62
Common Crawl seeds ( #3 )
...
* Common Crawl Foundation seeds
* clean mil list to just hostnames
* doc: add location of ccf repo that generated these files
---------
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
2024-09-16 09:33:58 -05:00
Lauren Ko
4392d90188
Add more files from web resources
2024-09-12 15:54:23 -05:00
Lauren Ko
b9dfb4f189
Add seeds from https://touchpoints.app.cloud.gov/registry
2024-09-12 12:19:09 -05:00
Lauren Ko
1b1b4736b4
Add NARA's 118th House Seeds
2024-09-09 16:24:56 -05:00
Lauren Ko
e49378d304
Adding seed lists from NARA and in-scope non gov/mil PURL target domain csv
2024-09-06 16:10:50 -05:00
Lauren Ko
a7cf90dd34
Add GovSpeak seeds
2024-08-01 11:59:00 -05:00
Lauren Ko
b79e23eac5
Add Library of Congress bulk seed list
2024-08-01 09:50:06 -05:00
Lauren Ko
5fe4a4136e
Add CRS reports seeds
2024-06-04 10:42:30 -05:00
Lauren Ko
62a97a9d60
Merge pull request #1 from antoinemcgrath/main
...
Nominated URLs for CRS Reports
2024-06-04 10:35:37 -05:00
Antoine McGrath
05cc45f319
Nominated URLs for CRS Reports
...
Nominated URLs to government hosted CRS Reports from Daniel Schuman with the American Governance Institute
2024-05-29 21:05:02 -05:00
Lauren Ko
7a9154ae73
Add spreadsheet for James Jacobs
2024-05-08 16:37:50 -05:00
Lauren Ko
a355cdf1f4
Add seed lists from GPO
2024-02-16 14:13:29 -06:00
Lauren Ko
3738c930be
Create location for seed-lists and provenance README.md
2024-02-09 13:38:48 -06:00
Lauren Ko
7a9765f462
Update README.md
...
Adds initial text for the 2024 project
2024-02-09 13:35:16 -06:00
Lauren Ko
adada98bee
Initial commit
2024-02-09 13:13:01 -06:00