/AAD/ - Archiving And Donating computer resources general - /g/ (#105744667) [Archived: 746 hours ago]

Anonymous
6/29/2025, 5:59:13 PM No.105744667
Useful archiving efforts and other projects to help with, for people who are new to or interested in archiving:

HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):

1. http://warrior.archiveteam.org/
Help automatically archive sites that are being shut down right now by running the ArchiveTeam Warrior program (or specific project containers) in the background.
Requirements: A few GB of disk space, some bandwidth and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

If you learn that a site or any online data is in danger of shutting down, read through this page and contact ArchiveTeam on their IRC if required in order to have it archived: https://wiki.archiveteam.org/index.php/Projects

2. Automatically forward URLs you browse that are not yet archived on https://archive.org to it for archival, using a browser extension:
https://github.com/internetarchive/wayback-machine-webextension
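
To get a feel for what "forwarding unarchived URLs" means in practice, here is a minimal Python sketch. It is not the extension's actual code. It queries the Wayback Machine availability endpoint (https://archive.org/wayback/available) to see whether a page has a snapshot, and builds a https://web.archive.org/save/ URL for pages that don't. The function names are made up for illustration, and the JSON layout read in `closest_snapshot` is an assumption about what that endpoint returns.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_url(page_url):
    """Build the availability query URL for one page."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None.

    Assumed layout: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(page_url):
    """URL that asks the Wayback Machine to capture the page when it is requested."""
    return SAVE_PREFIX + page_url


def check(page_url, timeout=30):
    """Network call: return the snapshot URL for page_url, or None if there is none."""
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

If `check(url)` returns None, requesting `save_url(url)` is the manual equivalent of what the extension automates. Be gentle with request rates if you loop over many URLs.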
Replies: >>105744796
Anonymous
6/29/2025, 5:59:46 PM No.105744670
MEDIUM priority (Important overall)

3. Seed torrents for as long as possible, rare data forever. Make sure to look up a guide for your router to PORT FORWARD your torrent client port, to substantially increase your upload (and your download) speed. In low population torrent swarms, if no one is port forwarded then you might not be able to connect to each other at all and exchange any data despite having it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)

4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile

5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube

6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX

7. Archive entire websites you want to have a local copy of
https://www.httrack.com/
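
For item 5: Tartube is a GUI, but the yt-dlp downloader underneath can also be scripted directly if you prefer. A minimal sketch, assuming the third-party yt-dlp Python package is installed. The function names, the folder layout in the output template and the downloaded.txt file name are my own choices for illustration, not anything Tartube does.

```python
def archive_options(output_dir, archive_file="downloaded.txt"):
    """Build an options dict for yt_dlp.YoutubeDL aimed at archival use."""
    return {
        # Uploader / upload date / title / video ID keeps files sortable and unique.
        "outtmpl": f"{output_dir}/%(uploader)s/%(upload_date)s - %(title)s [%(id)s].%(ext)s",
        # Record finished video IDs so repeated runs skip them.
        "download_archive": f"{output_dir}/{archive_file}",
        # Keep metadata, thumbnail and subtitles next to the video.
        "writeinfojson": True,
        "writethumbnail": True,
        "writesubtitles": True,
        # Keep going when one video in a playlist fails.
        "ignoreerrors": True,
    }


def download(urls, output_dir):
    """Download the given URLs (needs the yt-dlp package and network access)."""
    import yt_dlp  # imported lazily so archive_options works without it

    with yt_dlp.YoutubeDL(archive_options(output_dir)) as ydl:
        return ydl.download(urls)
```

Running `download([...], "/some/folder")` on a schedule against the same channel or playlist URL only fetches what is new, because the download archive file remembers what was already saved.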
Anonymous
6/29/2025, 6:00:46 PM No.105744674
8. Publish data you have archived that is hard or impossible to find online. You can easily create a torrent yourself in your torrent client and then share its magnet link anywhere online for anyone to access. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without the need for any specific tracker) is enabled in settings (on by default), your files will be searchable on DHT by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents).
(archive.org also creates torrents for all uploads automatically but their torrents shouldn't be relied on because of an error-prone implementation and since they can also break when more files are uploaded or if the item's metadata changes, which includes even getting a new comment on the item)
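
To show what a magnet link actually contains, here is a rough Python sketch for a single-file (v1) torrent: the infohash is the SHA-1 of the bencoded info dictionary, and the link is that hash plus an optional display name and trackers. Your torrent client does all of this for you, so this is only an illustration. The helper names and the 16 KiB piece size are placeholders I picked; real clients choose the piece size based on the total size.

```python
import hashlib
import urllib.parse


def bencode(value):
    """Encode ints, bytes/str, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_info(name, data, piece_length=16384):
    """Build the v1 'info' dictionary for a torrent holding one file."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data), "piece length": piece_length, "pieces": pieces}


def magnet_link(info, trackers=()):
    """Return a magnet URI; the infohash is SHA-1 of the bencoded info dict."""
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    parts = ["xt=urn:btih:" + infohash, "dn=" + urllib.parse.quote(info["name"])]
    parts += ["tr=" + urllib.parse.quote(t, safe="") for t in trackers]
    return "magnet:?" + "&".join(parts)
```

Calling `magnet_link(single_file_info("notes.txt", data))` gives a link of the form magnet:?xt=urn:btih:<40 hex characters>&dn=notes.txt. With no trackers in the link, peers can only find each other through DHT, which is why keeping DHT enabled matters.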


OTHER useful things:

- In your torrent client settings add the best trackers to be automatically added for all of your newly added torrents (helps more easily connect to peers, especially in obscure torrents):
https://github.com/ngosang/trackerslist

- Look into running a node for I2P (anonymous private network within the global internet):
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/

- Look into running Tor/Hyphanet(Freenet)/IPFS/YaCy/SearXNG nodes.

- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet

- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox

- Look into donating your PC resources to be used more intensively in projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php
GIMPS (Great Internet Mersenne Prime Search): https://www.mersenne.org/
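
Going back to the tracker list tip above: if you combine several tracker lists, it helps to deduplicate and filter them before pasting the result into your client. A small Python sketch, assuming each list is plain text with one tracker URL per line and possibly blank lines in between (check the actual files first). The function names and the example hostnames in the usage note are hypothetical.

```python
import urllib.parse


def parse_tracker_list(text):
    """Return unique tracker URLs from text, keeping first-seen order."""
    seen = set()
    trackers = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        if url not in seen:
            seen.add(url)
            trackers.append(url)
    return trackers


def filter_by_scheme(trackers, schemes):
    """Keep only trackers whose URL scheme (udp, http, https, ...) is in schemes."""
    return [t for t in trackers if urllib.parse.urlsplit(t).scheme in schemes]


def merge_lists(*texts):
    """Merge several list files into one deduplicated list."""
    return parse_tracker_list("\n".join(texts))
```

For example, `merge_lists(list_a, list_b)` on two lists that both contain udp://a.example.invalid:1337/announce returns that tracker once, and `filter_by_scheme(..., ("udp",))` keeps only the UDP trackers.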
Anonymous
6/29/2025, 6:01:42 PM No.105744680
Based. I like going to libraries and scanning old books to upload that aren't available on mainstream archival platforms.
Replies: >>105745144
Anonymous
6/29/2025, 6:01:47 PM No.105744681
- Additional archiving tools: https://github.com/iipc/awesome-web-archiving

- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/DataHoarder/wiki/index/ - Hardware and software for data hoarding FAQ
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://annas-archive.se/torrents
>>>/t/

What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
Anonymous
6/29/2025, 6:05:04 PM No.105744705
Thread changelog:

Added quick info for
>YaCy/SearXNG nodes

YaCy: free and open-source distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance

SearXNG: free and open-source federated metasearch engine
Anonymous
6/29/2025, 6:11:56 PM No.105744762
>No mention of archive.is
Go back tranny
Replies: >>105744895
Anonymous
6/29/2025, 6:16:48 PM No.105744796
>>105744667 (OP)
Good thread
Anonymous
6/29/2025, 6:29:35 PM No.105744895
>>105744762
>archive.is
1. In my experience it literally never had a page saved that wasn't in the Wayback Machine already
2. It has orders of magnitude less data than the Wayback Machine, especially old data
3. It doesn't have the important ability to "Save outlinks" when saving a page
4. It doesn't save any Flash files
5. It doesn't save any PDFs
6. It doesn't save any videos
7. It doesn't save any sounds
8. It has a 50MB limit per page, which is a big problem for a lot of websites, especially nowadays, when a single page can easily be hundreds of MB if all images are included

Aside from this, it doesn't have a single unique and useful feature compared to the Wayback Machine: it's still centralized, there's no extension to automatically forward unarchived pages there for archival, there's no way to search all the text inside the pages, and there's no other feature that would make it worth going out of your way to use it. It's simply another place with some copies of a limited amount of data already available elsewhere.
So there is no point in singling it out here, it's already mentioned in the linked further reading in https://github.com/iipc/awesome-web-archiving
Replies: >>105745202 >>105745239 >>105745551
Anonymous
6/29/2025, 6:59:39 PM No.105745144
>>105744680
very based
Anonymous
6/29/2025, 7:04:57 PM No.105745202
>>105744895
In my experience, point 1 is wrong. Maybe for questionable reasons, like archive.today (or is or ph or whatever) not giving a shit about takedown requests from regular people. I agree with the rest though.
Anonymous
6/29/2025, 7:08:37 PM No.105745239
>>105744895
>What are subscription news articles
>What is twitter, instagram, reddit etc.
Replies: >>105745308
Anonymous
6/29/2025, 7:15:48 PM No.105745308
>>105745239
For proper archival, these sites still basically require burner accounts, which anyone who archives anything on them will already have. For subscription news articles I do agree, though: I think archive.is does make that easier, even though I believe that any news site that requires a subscription to read doesn't have anything useful to say anyway.
Replies: >>105745469
Anonymous
6/29/2025, 7:31:48 PM No.105745469
>>105745308
>For proper archival, these sites still basically require burner accounts, which anyone who archives anything on them will already have
What the fuck are you talking about? You are wrong on all counts. Let alone the fact that some random retard's personal archive can't be used as a trusted source.
>any news site that requires a subscription to read doesn't have anything useful to say anyway.
Holy shit, you may actually be retarded.
Replies: >>105745551
Anonymous
6/29/2025, 7:39:11 PM No.105745551
>>105745469
>Let alone the fact that some random retard's personal archive can't be used as a trusted source.
I didn't say it can't be used as a trusted source, just that any actual archival of news article websites or reddit/twitter/instagram accounts won't be done by you inputting thousands of URLs into archive.is and then waiting for it to be done. You will use local tools for that, where you control all of the parameters and aren't severely gimped by the heavy limitations of archive.is as outlined in >>105744895
And those local tools will basically require a login to prevent harsh rate limits or to allow access to the content in the first place.

>>any news site that requires a subscription to read doesn't have anything useful to say anyway.
And this is correct, any website that requires you to pay before you can read something doesn't have anything useful to say and is most probably also just regurgitating information from elsewhere. I'm not saying you shouldn't archive it, just that this is an edge case that doesn't point to archive.is requiring a separate mention in my OP, which the discussion was about.
Replies: >>105745909
Anonymous
6/29/2025, 8:13:07 PM No.105745909
>>105745551
>doesn't have anything useful to say and is most probably also just regurgitating information from elsewhere
Where in the world did you get this idea? Do you think journalism is free? Every major newspaper runs exclusive interviews and in depth coverage of random niche fields. It sounds like you just don't care about anything that isn't some 8bit retro synth wave faggotry.
What exactly do you think is worth archiving? Random trannyporn you watch? Le bing bang wahoo executables that are all already archived on thousands of computers, rip sites and private trackers? The most important things to archive are books - which archive.org does, but zlib and other sites also do, prehaps better - and journals which it also does to some degree. Then news articles, of which only archive.is does and no one else.
>inputting thousands of URLs into archive.is and then waiting for it to be done
More than 99% of social media posts are useless. These sites only need specific (aka influential) posts to be archived.
Replies: >>105746159 >>105746796 >>105746938
Anonymous
6/29/2025, 8:38:52 PM No.105746159
>>105745909
The initial argument was against archive.is not being included directly in the OP list. I am arguing against the usefulness of archive.is for the average new archivist or someone wanting to mass archive a lot of data, not that archive.is is itself bad.

These niche exceptions of its usefulness for niche, mostly single-use cases prove the rule. There are hundreds of more useful tools that are also not listed because this is a basic start guide with further linked readings just to get people going.
Anonymous
6/29/2025, 9:45:18 PM No.105746796
>>105745909
The initial argument was against archive.is not being included directly in the OP list. I am arguing against the usefulness of archive.is for the average new archivist or someone wanting to mass archive a lot of data, not that archive.is is itself bad.

These niche exceptions of its usefulness for niche, mostly single-use cases prove the rule. There are hundreds of more useful tools that are also not listed because this is a basic start guide with further linked readings just to get people going.
Anonymous
6/29/2025, 10:03:55 PM No.105746938
>>105745909
The initial argument was against archive.is not being included directly in the OP list. I am arguing against the usefulness of archive.is for the average new archivist or someone wanting to mass archive a lot of data, not that archive.is is itself bad.

These niche exceptions of its usefulness for niche, mostly single-use cases prove the rule. There are hundreds of more useful tools that are also not listed because this is a basic start guide with further linked readings just to get people going.