Thread 105688615

27 posts 2 images /g/
Anonymous No.105688615 >>105689517 >>105689581 >>105699168 >>105701818
/AAD/ - Archiving And Donating computer resources general
Useful archiving efforts and other projects to help out with for people new to and interested in archiving:

HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):

1. http://warrior.archiveteam.org/
Help automatically archive sites that are being shut down right now by running the ArchiveTeam Warrior program (or specific containers) in the background:
Requirements: A few GB of disk space, some bandwidth and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

If you learn that a site or any online data is in danger of shutting down, read through this page and contact ArchiveTeam on their IRC if required in order to have it archived: https://wiki.archiveteam.org/index.php/Projects

2. Use a browser extension to automatically forward URLs you browse that are not yet archived on https://archive.org so they get archived there:
https://github.com/internetarchive/wayback-machine-webextension
Anonymous No.105688617
MEDIUM priority (Important overall)

3. Seed torrents for as long as possible, rare data forever. Make sure to look up a guide for your router to PORT FORWARD your torrent client port, to substantially increase your upload (and your download) speed. In low population torrent swarms, if no one is port forwarded then you might not be able to connect to each other at all and exchange any data despite having it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)

4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile

5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube

6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX

7. Archive entire websites you want to have a local copy of
https://www.httrack.com/
Anonymous No.105688624
8. Publish the data you have archived that isn't easily available online, or isn't available at all. You can create torrents yourself in your torrent client and then share the magnet link anywhere online for anyone to access. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without the need for any specific tracker) is enabled in settings (on by default), your files will be findable by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents).
(archive.org also creates torrents for all uploads automatically but their torrents shouldn't be relied on because of an error-prone implementation and since they can also break when more files are uploaded or if the item's metadata changes, which includes even getting a new comment on the item)
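For anyone wondering what a magnet link actually contains, here is a minimal Python sketch that assembles one from a torrent's info hash. The function name `magnet_link`, the example hash and the tracker URL are all made up for illustration; in practice your torrent client generates the link for you. It assumes a BitTorrent v1 (40 hex character) info hash.

```python
import re
from urllib.parse import quote


def magnet_link(info_hash, name=None, trackers=()):
    """Assemble a magnet URI from a v1 info hash (hypothetical helper)."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", info_hash):
        raise ValueError("expected a 40-character hex BitTorrent v1 info hash")

    # xt ("exact topic") identifies the torrent; it is the only required part.
    parts = ["xt=urn:btih:" + info_hash.lower()]

    # dn ("display name") is optional and only a hint for the client UI.
    if name:
        parts.append("dn=" + quote(name, safe=""))

    # tr (tracker) entries are optional; without them peers are found via DHT.
    for tracker in trackers:
        parts.append("tr=" + quote(tracker, safe=""))

    return "magnet:?" + "&".join(parts)


if __name__ == "__main__":
    # Placeholder hash and tracker, not a real torrent.
    print(magnet_link(
        "0123456789abcdef0123456789abcdef01234567",
        name="rare scans",
        trackers=["udp://tracker.example.org:1337/announce"],
    ))
```

A link with only the `xt` part has no tracker to ask, so it depends on DHT to find peers, which is why keeping DHT enabled matters for torrents you publish this way.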


OTHER useful things:

- In your torrent client settings add the best trackers to be automatically added for all of your newly added torrents (helps more easily connect to peers, especially in obscure torrents):
https://github.com/ngosang/trackerslist

- Look into running a node for I2P (anonymous private network within the global internet):
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/

- Look into running Tor/Hyphanet(Freenet)/IPFS nodes.

- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet

- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox

- Look into donating your PC resources to be used more intensively in projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php
GIMPS (Great Internet Mersenne Prime Search): https://www.mersenne.org/

- Additional archiving tools: https://github.com/iipc/awesome-web-archiving
Anonymous No.105688633
- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/DataHoarder/wiki/index/ - Hardware and software for data hoarding FAQ
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://annas-archive.se/torrents
>>>/t/

What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
Anonymous No.105689517
>>105688615 (OP)
BBUMP
Anonymous No.105689581 >>105696421
>>105688615 (OP)
simple command to completely mirror a site (replace URL at the end with the site's address):
wget --continue --mirror --execute robots=off --convert-links --wait 1 --random-wait URL

if you just want to download as fast as possible, remove the "--wait 1" and "--random-wait": together they make wget pause a randomized 0.5 to 1.5 seconds between requests to lessen the load on the server you are mirroring.
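If you want to run this from a script, a rough Python sketch of the same command could look like the following. The helper name `mirror_command`, the `polite` switch and the example.com address are assumptions made for illustration; the wget flags are the ones from the post.

```python
import shlex


def mirror_command(url, polite=True):
    """Build the wget argument list for mirroring `url` (hypothetical helper)."""
    cmd = [
        "wget",
        "--continue",                # resume partially downloaded files
        "--mirror",                  # recursive download with timestamping
        "--execute", "robots=off",   # ignore robots.txt
        "--convert-links",           # rewrite links for local browsing
    ]
    if polite:
        # Randomized pause between requests to go easy on the server.
        cmd += ["--wait", "1", "--random-wait"]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Only prints the command; nothing is downloaded here.
    print(shlex.join(mirror_command("https://example.com/")))
```

To actually start the mirror, pass the list to `subprocess.run(cmd, check=True)`; calling `mirror_command(url, polite=False)` drops the two wait flags.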
Anonymous No.105691708
bump
Anonymous No.105692576
have a bump
Anonymous No.105693175 >>105693595
Bump and recommending adding yacy to the OP
Anonymous No.105693595
>>105693175
Will add YaCy and SearXNG.
Anonymous No.105695760
A bump from me as well and a reminder that archives often have browser extensions, useful whether you want to archive or just browse.
Anonymous No.105696421 >>105696898 >>105699698
>>105689581
What if it's a big site and I don't want it to get literally everything on the domain, only stuff from a subdomain or from the site I'm on or whatever?
How well does this work for JS-heavy sites?
Anonymous No.105696508 >>105696540 >>105696581
Is there any convenient way to download entire TikTok/Instagram channels? The sites themselves are absolutely awful to browse, but there are some decent tech channels on there.
Anonymous No.105696540
>>105696508
yt-dlp and gallery-dl respectively are all you need.
IG blocks VPNs and bans you after a while anyway, so it's best avoided, but you can download for a bit. If I ever need to view these I don't even open the site, I just download. Works well.
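A rough sketch of what those two invocations might look like, built as argument lists in Python. The function name, the usernames, the archive file names and the output template are placeholders picked for illustration; `--download-archive` is there so that re-running the command skips files that were already fetched.

```python
import shlex


def channel_commands(tiktok_user, instagram_user):
    """Return (yt-dlp argv, gallery-dl argv) for two accounts (hypothetical helper)."""
    tiktok = [
        "yt-dlp",
        # Record finished downloads so later runs only fetch new videos.
        "--download-archive", "tiktok-archive.txt",
        # One folder per uploader; file name from upload date and video id.
        "-o", "%(uploader)s/%(upload_date)s %(id)s.%(ext)s",
        f"https://www.tiktok.com/@{tiktok_user}",
    ]
    instagram = [
        "gallery-dl",
        # gallery-dl keeps its own archive file of already downloaded items.
        "--download-archive", "instagram-archive.sqlite3",
        f"https://www.instagram.com/{instagram_user}/",
    ]
    return tiktok, instagram


if __name__ == "__main__":
    # Placeholder account names; this only prints the commands.
    for cmd in channel_commands("some_tech_channel", "some_tech_channel"):
        print(shlex.join(cmd))
```

Both sites change often, so whether a given profile URL still works depends on the current yt-dlp and gallery-dl releases, and Instagram may require logged-in cookies.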
Anonymous No.105696581
>>105696508
If you're looking for convenience, just go for the paid shit like "4K Video Downloader". I use that one for its auto download feature on youtube channels. Otherwise, your other best option is just to get gud with yt-dlp.
Anonymous No.105696898
>>105696421
>How well does this work for js heavy sites?
Depends how heavy, you will probably have to look into Playwright, Puppeteer, Selenium...
Anonymous No.105697794 >>105697821 >>105700892
You have now made this thread 216 times.
Anonymous No.105697821
>>105697794

>>105671707
>Great, that's probably at least a few thousand people who are now more knowledgeable about the tools they have to archive things they care about.
Anonymous No.105698585
bump
Anonymous No.105699168
>>105688615 (OP)
bump
Anonymous No.105699698
>>105696421
check the man pages, with a quick search I could find the following 2 which may or may not be of use to you

-D domain-list
--domains=domain-list
Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
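Putting those two options together with the mirror command from earlier in the thread, a rough Python sketch might look like this. The helper name `scoped_mirror_command` and the docs.example.com address are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def scoped_mirror_command(start_url):
    """Build a wget argv that stays on one host and below one directory."""
    host = urlsplit(start_url).hostname
    if not host:
        raise ValueError("start_url must be an absolute http(s) URL")
    return [
        "wget",
        "--continue",
        "--mirror",
        "--execute", "robots=off",
        "--convert-links",
        "--wait", "1", "--random-wait",
        "--no-parent",       # never climb above the starting directory
        "--domains", host,   # only follow links on this host name
        start_url,
    ]


if __name__ == "__main__":
    # Placeholder address; this only prints the argument list.
    print(scoped_mirror_command("https://docs.example.com/manual/"))
```

As far as I know wget does not leave the starting host unless -H is given, so `--domains` here is mostly an explicit guard, while `--no-parent` does the real narrowing. Keep the trailing slash on the start URL, since wget seems to treat the last path component as a file otherwise and would then consider the directory above it the starting point.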
Anonymous No.105700892
>>105697794
You are a nigger
Anonymous No.105701818
>>105688615 (OP)
based
Anonymous No.105701979
Bump
Anonymous No.105703094 >>105703839
https://www.publicdomaintorrents.info/

all torrents on this site are in the public domain, meaning there's no copyright on them. just thought I'd bump and post a cool link.
Anonymous No.105703579
fags!
Anonymous No.105703839
>>105703094
Thanks. I've been meaning to investigate legal torrenting for a while now. I believe there are a few more sites out there like the one you posted.