See [commoncrawl/ccf-eot-seeds-2024](https://github.com/commoncrawl/ccf-eot-seeds-2024) for details.
* ccf-gov-federal-web-graph-2024-jun-jul-aug.txt -- all .gov federal hostnames from current-federal.csv domains in CCF's 2024 June/July/August web graph
* ccf-mil-web-graph-2024-jun-jul-aug.txt -- all .mil hostnames from CCF's 2024 June/July/August web graph
Seeds supplied by Dorothy Bower of the U.S. Government Publishing Office:
* FDLP_WEb_Archiveseed_list_20240212.csv - list of seeds from the FDLP Web Archive with one page only seeds deleted, that were mainly embedded youtube videos.
* PURL_server_domains_20240214.csv - report of all target domains from the PURL server; some determined to be out of scope were not included in the Nomination Tool.
* PURL_server_domains_20240214_non_gov_mil.csv - non .gov/.mil seeds from the PURL_server_domains_20240214.csv list that were determined to be in scope by Mark Phillips of UNT.
### National Archives and Records Administration seeds
Seeds supplied by Elizabeth England of the U.S. National Archives and Records Administration (NARA):
* 117th_House_Seeds.xlsx - contains five sheets, one each for: House members, majority committees, minority committees, caucuses, and leadership/support/other.
* 118th_House_Seed_List.xlsx - contains five sheets, one each for: House members, majority committees, minority committees, caucuses, and leadership/support/other.
* FOIA_Libraries_Dataset_Oct_3_2023_Final.xlsx - spreadsheet with seeds for all of the federal FOIA libraries. Lisa DeLuca, who collated the list, said it would be fine to use her spreadsheet from https://works.bepress.com/lisa_deluca/59/.
Seeds supplied by Kelly L. Smith, Government Information Librarian and Librarian for Urban Studies & Planning / Environmental Studies at UC San Diego Library (via James Jacobs):
* govspeakeot080124.xlsx - list of all the live URLs from Smith's [GovSpeak acronym and abbreviation guide](https://ucsd.libguides.com/govspeak/home).
* govspeakurls1124.txt - updated list of govspeak links, about 250 new items added since the August list; also, CDC, ED, and a couple other agencies have done significant reorganization of their websites since then
* US_Digital_Registry.csv - CSV file generated on 9/11/2024 by Praneeth Rikka at UNT from the data at the [Touchpoints U.S. Digital Registry](https://touchpoints.app.cloud.gov/registry).
* Military-Departments-A-Z-List.csv - CSV file generated on 9/11/2024 by Lauren Ko at UNT from the data of the [U.S. Department of Defense's A-Z List](https://www.defense.gov/Resources/Military-Departments/A-Z-List/).
* current-federal.csv - Pulled from Cybersecurity and Infrastructure Security Agency's [dotgov-data repo](https://github.com/cisagov/dotgov-data) (via https://raw.githubusercontent.com/cisagov/dotgov-data/main/current-federal.csv on 9/12/2024).
* site-scanning-target-url-list.csv - Pulled from [GSA's federal-website-index repo](https://github.com/GSA/federal-website-index) (via https://github.com/GSA/federal-website-index/raw/main/data/site-scanning-target-url-list.csv on 9/12/2024).
* us-government-website-directory.csv - Pulled from [GSA's federal-website-directory repo](https://github.com/GSA/federal-website-directory) (via https://raw.githubusercontent.com/GSA/federal-website-directory/main/us-government-website-directory.csv on 9/12/2024). The repo README indicates "The Federal Website Directory is a comprehensive list of the public-facing websites of the U.S. Federal Government, spanning all three branches".
* dotmil_websites.csv - Pulled from [GSA's federal-website-index repo](https://github.com/GSA/federal-website-index) (via https://github.com/GSA/federal-website-index/blob/main/data/dataset/dotmil_websites.csv on 9/12/2024)
* 2_govt_urls_federal_only.csv - Pulled from [GSA's govt-urls repo](https://github.com/GSA/govt-urls/) (via https://raw.githubusercontent.com/GSA/govt-urls/main/2_govt_urls_federal_only.csv on 9/12/2024). The README indicates the repo "contains the list of public government managed domains that exist outside of the top-level .gov and .mil domains."