From 86a336470004d87f6b0b5a1f0e7c5addda80df01 Mon Sep 17 00:00:00 2001 From: Lauren Ko Date: Thu, 19 Dec 2024 15:30:47 -0600 Subject: [PATCH] Add Defenders of Wildlife seeds --- .../EoT archive submission - DoW 12-19-24.txt | Bin 0 -> 18154 bytes seed-lists/README.md | 16 ++++++++++------ 2 files changed, 10 insertions(+), 6 deletions(-) create mode 100644 seed-lists/EoT archive submission - DoW 12-19-24.txt diff --git a/seed-lists/EoT archive submission - DoW 12-19-24.txt b/seed-lists/EoT archive submission - DoW 12-19-24.txt new file mode 100644 index 0000000000000000000000000000000000000000..7b477503087e554b1eee35d3a509c8b7ce57369f GIT binary patch literal 18154 zcmd6v+in|25{CPFfqjPni{x%RD<5KK7lx4o670opf*8SG3KU6EOel)s!HV~3@}!&m zUzaK~Lw1iWk_sCFIp$1H_g{zptGcJgzy1Fzto64DtMHFd>u(T7`Yd#ApwBeign78t z_cHtxE_A%n@k#tWQR}~kzl7Z$mSL!~)9@~g)po4j20Bi3*E`*_(3w^2d#E;}_^Xb0 zjKj6g4Apw37GoW)r`@Y{xKwXbwWx(|EM#@WajiRR-Nn5lU7u^zbNvSQwJ?B{HYwZ} zI!_zJnQ8&DmGCWMk3VXZq550t9`u7g{-)1bzZOxBm44BFs+Myd$DxW>Z=!tv)W6?H zZM+WOb$a|YiEpTW=TR?9>9Mg+A7T*RM1+^p6G-#Y<=UpLoNvvQ#hn!vQ8R1(%Av2+ zb|z_%4DFakD(Pb^tmc0yTReRHrFz4o&A%%7!a#n*IdplPF*d>t^0|6M2IQJVyW`Vo zzk8@8%pc_N*@@m56aTU~H#&+mtLU5Ny^Mk0=5bcR5bG!v<6g_g%=D4kFb^tu8tATc z1`XWmu48C-8_(sPw3U3uxWe@CPUpuO^>N0$37e>e$LJlM&clz7g4}K5zG608N_XuU zcpRx#viie%dMn9cwJ)P*)q&p)`+qJ^g7J()M`Dw+@G>(3F`yL9|K-ZPV27dki76*$uvt?8}5m#5*`2lX=2C@=-15xdb|Iu0(P_Ghxi zFxqe>RAeStgZrN78)>&~jNEnjTCw(X`PsRSKZGxI#`%H~{}aB~wYU1DV=Zxdpm8mZ zU}NLNZ=$7{y;r)Ek;ma1U0a2J3n#57`sIvWC(pjseb@R!4j3%$HF%UqRb=((vCzoO zAu_IRtVibmN~Z5xdi%c9OD(C}z1HDhl7!6cm+&U`=_AgC?po(xOLN3#tcCwuO#N2K z$hjoeD!{@RbJX zuNi)6?n`q((?gr-ZC)^Qu^E{x^O5L*u90Xedu)WIis+3Mj1$P7(pH z-9yhy$;H^9%SGQq$06=?SBH22YY|_=_sx`?bbhbTx+a%|*;wP+sQ5lU&o@57*-93< ziF3lN(my`=mBlfOfW!eJd)p_m)kp+tLM7E9a(All}7uJwud9?u9MkKY3B-# zrN7nTD6siJ8uV44;qcteQS`Su9tzK0@c(`!`Id*H!4fbJ@x%Pf_flLR`4-*LyK8H_ z(&p@iq=Wy7^6WaC%R9+6K8Z7M8P8nEvtJ7bW5G4-1ROrgn_aY8Jrcuw<-s^av@T{? 
zCmpYC71m{p{wSDUBet^>SQU+sr4sd6XAN~E3m^mPm!JF(&uOnf`+E2M$Jdh&%YSmx zwAlk0V0CzlY>Akd&N=syjO+`Jo<<+B)e}BT?AWJ2G)yc)A4CT*(;NM3y>kXikt@9ohaYfLyFP6)+Xpcl%Z+(#`)!YdmYb&ujfG#x z2EKmW-4T5(OWco+=N=%#O1>_}SDC|Pj+XJmRL9mFS(ve~G4^}mARi(_-*+t5`>gM6 zwPpMC?7tC(;o)QV%Xq#^!c42~^v3MwNein1W&v~E(clxi_odtUZb>^|-GA2Pv}0(< zPfZ3i)r~)_tgfO4En>2^BBp&vq&v3G;TFb5W9-3!Y+LUZ*$kcM*vH1~AX#yxB4b;*J-+Q7@9q=&I--K(n9DPvOg{zd$n;Uaw{-Q<-2q9(qwVb45U?YNSKt%G zYgSL_&7LVWyLM>I_VMs;_J}-WiT+YpY>fn z!|u-nYPBFcwP!>;oBK?EuXKFY%i(`MCU(cm!DK5H)+(i8Ix7&<(*iL)D-hH30x`WP z5YwjxVtQF1rm}e4II}z+KZ|=#W$`#HURrE9)Bap}n9AdEYH!NZQ+Yg2wL^J&Dv!s{ z3d_f>zN|bRH(X`$IE-1Ek3TJt$Eh?dPfw@jaocsY$`^K{OIe@iaGjbHc5#)lPS4>w zH8Y>-Ze5GWq`W>-n_4PQRwl z;VP5w$p=rH6Xx_)Cf}c_vb0QnmC5(KhgPP(%H(^V4VI{{)35u}JaPK9e-2lfe1G~q zfSkS#<9n-T7~R#C@SMK;R#UrDfC>yMn(S>ztJUDlG*%5*&7|!IP-n2v72a~~zQ@y7 z`_T8?eix@#>G38Mb$edB-Nr$^1a-W;H`T5s;9VK&Ar|-45!k)Wds)cN`<7Ia{ZB`) z>hd}ru2VCFw8qXH{ub0sbPHt>VMxi6Xdox$!zrQ`%Bgf zo;#FQkYoKvMW7?!s84EDnR%}J!?EMR%uKXH|I`OjTkJOX|9@or{}^G$BWD%;tIZ5l sWwqb2ewYn+wR?Po{w|BZ*K^-HnzyRN!^6iq`g_-n*WpV)%jb~ucSXhci~s-t literal 0 HcmV?d00001 diff --git a/seed-lists/README.md b/seed-lists/README.md index 4a26fe7..c27ed48 100644 --- a/seed-lists/README.md +++ b/seed-lists/README.md @@ -8,8 +8,12 @@ Provenance notes are included below. These lists will be uploaded into the See [commoncrawl/ccf-eot-seeds-2024](https://github.com/commoncrawl/ccf-eot-seeds-2024) for details. 
-* ccf-gov-federal-web-graph-2024-jun-jul-aug.txt -- all .gov federal hostnames from current-federal.csv domains in CCF's 2024 June/July/August web graph
-* ccf-mil-web-graph-2024-jun-jul-aug.txt -- all .mil hostnames from CCF's 2024 June/July/August web graph
+* ccf-gov-federal-web-graph-2024-jun-jul-aug.txt - all .gov federal hostnames from current-federal.csv domains in CCF's 2024 June/July/August web graph
+* ccf-mil-web-graph-2024-jun-jul-aug.txt - all .mil hostnames from CCF's 2024 June/July/August web graph
+
+### Defenders of Wildlife seeds
+Seeds submitted by Andrew Carter on behalf of Defenders of Wildlife:
+* EoT archive submission - DoW 12-19-24.txt
 
 ### Environmental Data & Governance Initiative (EDGI) seeds
 Seeds supplied by Gretchen Gehrke of EDGI:
@@ -34,9 +38,9 @@ Seed lists produced by Gary Price, editor of infoDOCKET:
 * HRSA (2020-).xlsx
 * BLM 2020-2024.xlsx - 2544 entries from the Bureau of Land Management. Most but not all PDFs. Along with the usual techniques, a number of extra searches were done to find documents that include terms like ANWR, oil, fracking, etc.
 * USDA_FIS_ERS.xlsx. 1700 or so urls from the USDA. Specifically, the Food Inspection Service and Economic Research Service. A few xlsx urls too.
-* IARPA.gov 406 seeds HTML and PDF. 2020-Present IARPA 2020-Present.xlsx
-* APRA-H.gov 412 Seeds HTML and PDF 2020-Present. ARPA-H.xlsx
-* Medicaid.gov 1983 seeds PDF and a few XLSX 2020-Present MEDICAID 2020-2024.xlsx
+* IARPA 2020-Present.xlsx - IARPA.gov 406 seeds HTML and PDF 2020-Present.
+* ARPA-H.xlsx - APRA-H.gov 412 Seeds HTML and PDF 2020-Present.
+* MEDICAID 2020-2024.xlsx - Medicaid.gov 1983 seeds PDF and a few XLSX 2020-Present.
 
 ### Internet Archive seeds
 Seeds supplied by Antoine McGrath of Internet Archive:
@@ -82,7 +86,7 @@ Seeds supplied by Kelly L. Smith, Government Information Librarian and Librarian
 * Federal URLs linked to on EnergyFundsForAll.org.xlsx - Submitted by Sally Robertson, EnergyFundsForAll.org
 * Hermann-Wu-nps-20241209.txt - NPS seeds submitted by Ailsa Hermann-Wu
 * GAO-hermann-wu-20241218.xlsx - GAO seeds submitted by Ailsa Hermann-Wu
-* Performance.gov-equity-hermann-wu-20241219.xlsx - seeds submitted by Ailsa Hermann-Wu on 20241219 centered around Performance.gov -- these are all PDFs of agency equity action plans or AANHPI plans.
 
 ### Seeds sourced from Web resources
 The End of Term Web Archive team and other contributors compiled a list of sources on the Web from which to source seeds: