Thread 106115204 - /g/ [Archived: 251 hours ago]

Anonymous
8/2/2025, 1:42:20 PM No.106115204
1749649864803088
md5: c21306024209878960f3d4357c1b7b48
now that the internet is getting locked up, what's the best way to grab a pristine offline copy of a whole website?
Replies: >>106115227 >>106115273 >>106115312 >>106115711 >>106115774 >>106115978 >>106116060
Anonymous
8/2/2025, 1:45:53 PM No.106115227
>>106115204 (OP)
Wget is in my experience easier to use than httrack and does the job much better without any unnecessary fuss.
wget -r --page-requisites --html-extension --convert-links example.com

and voila you've got a copy
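A minimal sketch of the same command in long-option form, assuming GNU Wget 1.12 or later, where --html-extension was renamed --adjust-extension (-E); the old name is still accepted but deprecated. The mirror_site function and the DRY_RUN switch are hypothetical names for illustration. --no-parent is an addition not in the post above: it stops wget from climbing above the starting directory.

```shell
#!/bin/sh
# Sketch: download a browsable offline copy of one site with GNU Wget.
# mirror_site and DRY_RUN are hypothetical names used only for illustration.
mirror_site() {
  # Rebuild the positional parameters as the full wget command line.
  set -- wget \
    --recursive \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --no-parent \
    "$1"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (wget saves into ./example.com/ by default):
#   mirror_site https://example.com/
#   DRY_RUN=1 mirror_site https://example.com/
```

Keeping the flags one per line makes it easy to add or drop options without retyping the whole command.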
Replies: >>106115580 >>106116076
Anonymous
8/2/2025, 1:50:49 PM No.106115273
>>106115204 (OP)
You just answered your own question, but you'll need to alter the defaults for a more encompassing approach.
Anonymous
8/2/2025, 1:56:40 PM No.106115312
>>106115204 (OP)
there is a download button for the entire internet. But I forgot where it is.
Anonymous
8/2/2025, 2:03:06 PM No.106115363
wget -r https://*
Replies: >>106115580
Anonymous
8/2/2025, 2:29:27 PM No.106115580
>>106115227
>>106115363
That doesn't work with a lot of "modern" websites. I use pywb, and it's pretty good, although there are bugs that never get fixed even though people submitted PRs.
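For JS-heavy sites, pywb can record pages while you browse them in a real browser, so requests made by scripts get captured too. A minimal sketch based on the recording-mode steps in the pywb documentation, assuming pywb is installed (pip install pywb) and listening on its default port 8080; the collection name my-archive and the helper function names are placeholders.

```shell
#!/bin/sh
# Sketch of pywb's recording workflow; function names are hypothetical.
# Assumes pywb is installed and listens on its default port, 8080.
COLL=my-archive   # placeholder collection name

# One-time setup: creates ./collections/my-archive/
init_collection() {
  wb-manager init "$COLL"
}

# Start pywb with recording enabled; captures are written as WARC files
# into the collection and auto-indexed every 10 seconds.
start_recorder() {
  wayback --record --live -a --auto-interval 10
}

# Open this URL in a browser to record a page through pywb.
record_url() {
  printf 'http://localhost:8080/%s/record/%s\n' "$COLL" "$1"
}

# Open this URL later to replay the recorded copy.
replay_url() {
  printf 'http://localhost:8080/%s/%s\n' "$COLL" "$1"
}
```

Typical order: init_collection once, start_recorder, then open record_url https://example.com/ in the browser and click through the pages you want kept.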
Replies: >>106115698 >>106115807
Anonymous
8/2/2025, 2:45:59 PM No.106115698
>>106115580
ntas, but i've been looking for an alternative to httrack since it also doesn't work for javascript-heavy sites. looks like what i've been looking for this whole time
Replies: >>106115807
Anonymous
8/2/2025, 2:48:04 PM No.106115711
>>106115204 (OP)
A lot of good websites implement anti-scraping measures, including ones that block wget.
Just read a book.
Fuck the governments.
Anonymous
8/2/2025, 2:57:02 PM No.106115774
>>106115204 (OP)
most modern websites use javascript and talk to the server db all the time, so you can't run them offline
Anonymous
8/2/2025, 3:02:20 PM No.106115807
>>106115698
>>106115580
had issues with httrack on some simple cosmetic js too
Anonymous
8/2/2025, 3:24:38 PM No.106115978
>>106115204 (OP)
man, the ancient tools for offline websites. rare to see them here. lmao
Replies: >>106117693
Anonymous
8/2/2025, 3:35:38 PM No.106116060
>>106115204 (OP)
Cyotek WebCopy is what you want
Anonymous
8/2/2025, 3:37:49 PM No.106116076
>>106115227
lol retardo you're supposed to use --mirror, also the command can be shortened to "wget -mkEp"
have to say httrack gui is better, it grabs stuff that wget can't
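What -mkEp bundles, per the GNU Wget manual: -m (--mirror, currently equivalent to -r -N -l inf --no-remove-listing), -k (--convert-links), -E (--adjust-extension) and -p (--page-requisites). The helper below is a hypothetical illustration that only knows those four letters; it spells a bundled group out in long form so you can check what a short command actually does.

```shell
#!/bin/sh
# Hypothetical helper: spell out a bundled wget short-option group,
# one long option per line. Only m, k, E and p are mapped.
expand_wget_flags() {
  group=${1#-}                      # drop the leading dash
  while [ -n "$group" ]; do
    rest=${group#?}                 # everything after the first letter
    letter=${group%"$rest"}         # the first letter itself
    case $letter in
      m) printf '%s\n' "--mirror" ;;  # = -r -N -l inf --no-remove-listing
      k) printf '%s\n' "--convert-links" ;;
      E) printf '%s\n' "--adjust-extension" ;;
      p) printf '%s\n' "--page-requisites" ;;
      *) printf '%s\n' "unknown: -$letter" ;;
    esac
    group=$rest
  done
}
```

Running expand_wget_flags -mkEp prints --mirror, --convert-links, --adjust-extension and --page-requisites, one per line.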
Anonymous
8/2/2025, 4:54:25 PM No.106116792
im kinda interested in Squid Cache proxy + SSL bumping. anyone ever tried it? how does the cache work? is it stored in an 'opaque' database, or can you just open the cache and browse it like normal files?
Anonymous
8/2/2025, 6:34:26 PM No.106117693
>>106115978
>ancient
considering how old it is, it still does the job fine for some sites