Commit Graph

53 Commits

Author SHA1 Message Date
James R. Jacobs
bf7cf89659 Update README.md
added bulk list Sustainability-gov-Hermann-Wu-20241220.xlsx
2024-12-20 09:19:42 -06:00
Lauren Ko
86a3364700 Add Defenders of Wildlife seeds 2024-12-19 15:30:47 -06:00
James R. Jacobs
0f565c94e4
bulk list submitted 20241219 by Ailsa Hermann-Wu (#21)
* Update README.md

bulk list on performance.gov by Ailsa Hermann-Wu

* Add files via upload

Bulk list submitted by Ailsa Hermann-Wu re performance.gov 20241219
2024-12-19 15:00:38 -06:00
James R. Jacobs
f1694a635c
bulk seed list of GAO seeds by Ailsa Hermann-Wu (#20)
* Update README.md

bulk seed list of GAO seeds sent by Ailsa Hermann-Wu

* Add files via upload

bulk seed list of GAO seeds sent by Ailsa Hermann-Wu
2024-12-18 10:14:24 -06:00
James R. Jacobs
ef3bd7d5f9
3 new bulk lists submitted by Gary Price (#19)
* Update README.md

added another bulk list from Gary Price/Infodocket. File is USDA_FIS_ERS.xlsx

* Add files via upload

USDA_FIS_ERS.xlsx from Gary Price/infodocket. About 1,700 URLs from the USDA, specifically the Food Inspection Service and Economic Research Service.

* Update README.md

3 more bulk lists from Gary Price sent on 12/14/2024

* Add files via upload

3 new bulk lists from Gary Price submitted 12/14/2024
2024-12-16 14:11:03 -06:00
James R. Jacobs
ed7cabab8e
another bulk seed list from Gary Price (USDA) (#18)
* Update README.md

added another bulk list from Gary Price/Infodocket. File is USDA_FIS_ERS.xlsx

* Add files via upload

USDA_FIS_ERS.xlsx from Gary Price/infodocket. About 1,700 URLs from the USDA, specifically the Food Inspection Service and Economic Research Service.
2024-12-12 15:38:05 -06:00
Lauren Ko
97a727fc4e Update README for sitemaps.txt and sitemap-url-seeds directory 2024-12-11 13:21:44 -06:00
Lauren Ko
94e610e8e1
Merge pull request #14 from TheBoatyMcBoatFace/main
* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Batch commit of sitemap URL seeds under 500MB or 250 files

* Forgot to add sitemaps.txt

Signed-off-by: Bentley Hensel <bentleyhensel@gmail.com>

---------

Signed-off-by: Bentley Hensel <bentleyhensel@gmail.com>
2024-12-11 12:58:16 -06:00
Bentley Hensel
bad24fe745
Forgot to add sitemaps.txt
Signed-off-by: Bentley Hensel <bentleyhensel@gmail.com>
2024-12-10 20:36:47 -05:00
Bentley Hensel
e2054da35c
Merge pull request #3 from end-of-term/main
update
2024-12-10 20:33:49 -05:00
Lauren Ko
3b3bf304b9 Update README for CDC PDFs 2024-12-10 14:55:01 -06:00
YakShavingAsAService
f4b194553a
PDFs from the CDC website - single file (#17)
This is a csv file of PDF links obtained from webpages found on the US CDC website. It contains 46,873 links, with the format: the source HTML file containing the PDF link; the time in UTC in which the accessibility of the PDF file was confirmed; and a URL pointing to the PDF file itself.
    
This file replaces the two previous files. This file has had the PDF links deduped, so if multiple pages point to the same PDF, you'll only see an entry for the first reference. PDF links that point to non-gov domains have been omitted as well. If the PDF link contains a fragment, the fragment will be removed from the path (e.g. "/a/path/mypdf.pdf#page=3" will get turned into "/a/path/mypdf.pdf"). All the PDF files have had their accessibility and content type verified with an HTTP HEAD request on Dec. 09 2024.
2024-12-10 14:51:36 -06:00
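The cleanup described in commit f4b194553a (fragment removal, omission of non-gov links, first-reference dedup) could look roughly like the sketch below. This is not the contributor's actual script: the function names, the three-field row layout as tuples, and the rule that a "gov" link is one whose hostname ends in ".gov" are all assumptions made for illustration. The HTTP HEAD verification step is left out so the sketch runs without network access.

```python
from urllib.parse import urldefrag, urlsplit


def is_gov_host(url):
    """Return True when the hostname ends in '.gov' (assumed definition of a gov link)."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "gov" or host.endswith(".gov")


def clean_pdf_links(rows):
    """Strip fragments, drop non-.gov links, and keep the first reference to each PDF.

    rows: iterable of (source_page, checked_at_utc, pdf_url) tuples (hypothetical layout).
    """
    seen = set()
    cleaned = []
    for source_page, checked_at, pdf_url in rows:
        # "/a/path/mypdf.pdf#page=3" becomes "/a/path/mypdf.pdf"
        pdf_url, _fragment = urldefrag(pdf_url)
        if not is_gov_host(pdf_url):
            continue
        # Only the first page that references a given PDF is kept.
        if pdf_url in seen:
            continue
        seen.add(pdf_url)
        cleaned.append((source_page, checked_at, pdf_url))
    return cleaned
```

Because fragments are stripped before the dedup check, two links that differ only by "#page=3" collapse into a single entry credited to the first source page.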
James R. Jacobs
5a9195431e
bulk list of NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt (#16)
* Update README.md

added NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt

* Add files via upload

NPS seeds submitted by Hermann-Wu - Hermann-Wu-nps-20241209.txt

* Update README.md

edited the contact section.
2024-12-10 12:48:14 -06:00
Bentley Hensel
4bef8b223d
Merge pull request #2 from end-of-term/main
Add some Bureau of Land Management and EnergyFundsForAll.org seeds (#15)
2024-12-09 15:59:46 -05:00
Lauren Ko
ed4d0f0d8a
Add some Bureau of Land Management and EnergyFundsForAll.org seeds (#15)
* Update README.md

* Update README.md

* Add files via upload

* Update README.md

added bulk file from EnergyFundsForAll.org

* Bulk list from EnergyFundsForAll

* Remove extra whitespace

Signed-off-by: Lauren Ko <lauren.ko@unt.edu>

* Remove duplicate listing of infodocket-11-21-2024.xls

---------

Signed-off-by: Lauren Ko <lauren.ko@unt.edu>
Co-authored-by: James R. Jacobs <freegovinfo@gmail.com>
2024-12-09 14:28:43 -06:00
Bentley Hensel
7a74ece080
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:41:07 -05:00
Bentley Hensel
bd3fdbde47
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:38:34 -05:00
Bentley Hensel
49aee9c7bc
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:37:48 -05:00
Bentley Hensel
bf267e339e
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:37:09 -05:00
Bentley Hensel
c015b8b98d
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:35:50 -05:00
Bentley Hensel
980fa37e2a
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:33:46 -05:00
Bentley Hensel
4042707213
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:33:25 -05:00
Bentley Hensel
0535ad7cf2
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:32:01 -05:00
Bentley Hensel
73719faa91
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:29:05 -05:00
Bentley Hensel
4d70936a23
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 18:19:10 -05:00
Bentley Hensel
aeea7beac2
Batch commit of sitemap URL seeds under 500MB or 250 files 2024-12-05 17:56:32 -05:00
Lauren Ko
a6e38c7311 Add CDC .html seed list 2024-12-03 15:53:17 -06:00
Lauren Ko
47e8f8eb67 Add Bluesky URL
Co-authored-by: Melody Joy Kramer <melodykramer@gmail.com>
2024-12-03 12:34:32 -06:00
James R. Jacobs
d633f6965c
uploaded new bulk seed files from Gary Price and Kelly Smith (#11)
* adding info docket bulk seed list

* Update README.md

* Update README.md

* Add files via upload

Bulk lists from Gary Price and Kelly Smith. Seed list readme updated with file names.
2024-12-02 12:11:12 -06:00
Lauren Ko
4519cb1ee8 Add NLM seed list 2024-11-22 13:18:59 -06:00
Lauren Ko
37b32203c5 Add irs.gov seeds from Gary Price 2024-11-21 13:24:36 -06:00
James R. Jacobs
58e14710e3
pull requests for info docket bulk list 11-21-2024 (#5)
* adding info docket bulk seed list

* Update README.md
2024-11-21 12:56:50 -06:00
Lauren Ko
7e3d04ed8c Update README for bsky_gov_urlverified.txt 2024-11-21 08:54:52 -06:00
Antoine McGrath
01662e4c87
Create bsky_gov_urlverified.txt (#4)
URLs for official US Senate.gov and House.gov bluesky accounts
2024-11-21 08:49:00 -06:00
Lauren Ko
3a14a8fb3f Add seed list from EDGI 2024-11-14 09:54:06 -06:00
Lauren Ko
8e8c22e358 Add updated govspeak list 2024-11-08 09:36:35 -06:00
Lauren Ko
a325cf3f79 Add two lists supplied by James Jacobs 2024-10-25 16:48:38 -05:00
Lauren Ko
99460625a9 Add usagov.csv seed list 2024-09-23 11:10:35 -05:00
Greg Lindahl
ba124bec62
Common Crawl seeds (#3)
* Common Crawl Foundation seeds

* clean mil list to just hostnames

* doc: add location of ccf repo that generated these files

---------

Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
2024-09-16 09:33:58 -05:00
Lauren Ko
4392d90188 Add more files from web resources 2024-09-12 15:54:23 -05:00
Lauren Ko
b9dfb4f189 Add seeds from https://touchpoints.app.cloud.gov/registry 2024-09-12 12:19:09 -05:00
Lauren Ko
1b1b4736b4 Add NARA's 118th House Seeds 2024-09-09 16:24:56 -05:00
Lauren Ko
e49378d304 Adding seed lists from NARA and in-scope non gov/mil PURL target domain csv 2024-09-06 16:10:50 -05:00
Lauren Ko
a7cf90dd34 Add GovSpeak seeds 2024-08-01 11:59:00 -05:00
Lauren Ko
b79e23eac5 Add Library of Congress bulk seed list 2024-08-01 09:50:06 -05:00
Lauren Ko
5fe4a4136e
Add CRS reports seeds 2024-06-04 10:42:30 -05:00
Lauren Ko
62a97a9d60
Merge pull request #1 from antoinemcgrath/main
Nominated URLs for CRS Reports
2024-06-04 10:35:37 -05:00
Antoine McGrath
05cc45f319
Nominated URLs for CRS Reports
Nominated URLs to government hosted CRS Reports from Daniel Schuman with the American Governance Institute
2024-05-29 21:05:02 -05:00
Lauren Ko
7a9154ae73 Add spreadsheet for James Jacobs 2024-05-08 16:37:50 -05:00
Lauren Ko
a355cdf1f4 Add seed lists from GPO 2024-02-16 14:13:29 -06:00