Thread 105680498 - /g/ [Archived: 841 hours ago]

Anonymous
6/23/2025, 3:42:41 PM No.105680498
1734312052510262
md5: 44443d8f60736989c9915ec344ff4e49
>website has 100 pages
>cf showing 10,000 unique hits per hour, increasing
these are bots. but why are they bots? what the fuck is going on
Replies: >>105680754 >>105680819 >>105680823 >>105680829 >>105681482 >>105682966
Anonymous
6/23/2025, 3:56:15 PM No.105680598
Jeets programming botnet scrapers, getting rewarded by jeet managers for scrapes per minute.
Replies: >>105680726 >>105680745
Anonymous
6/23/2025, 4:15:09 PM No.105680726
>>105680598
>botnet scrapers
what is that
> rewarded by jeet managers for scrapes per minute
i could probably do it better than they can how do i do this for a job
Replies: >>105681042
Anonymous
6/23/2025, 4:17:30 PM No.105680745
>>105680598
>scraper
ok. so at 100 pages, my entire website is being scraped 72,000 times over every month? perpetually? are there 72,000 different scrapers all looking at my important website?
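For reference, the 72,000 figure follows from the numbers in the OP. A quick sketch of the arithmetic, assuming a 30-day month and that the 10,000 hits per hour are spread evenly over the 100 pages (both are assumptions, not something Cloudflare reported here):

```python
# Figures quoted in the thread; the 30-day month is an assumption.
PAGES = 100
HITS_PER_HOUR = 10_000
DAYS_PER_MONTH = 30

hits_per_day = HITS_PER_HOUR * 24                  # total requests per day
hits_per_month = hits_per_day * DAYS_PER_MONTH     # total requests per month

# How many times the whole 100-page site is fetched, if hits are spread evenly.
full_passes_per_day = hits_per_day // PAGES
full_passes_per_month = hits_per_month // PAGES

print(f"hits/day:          {hits_per_day:,}")
print(f"hits/month:        {hits_per_month:,}")
print(f"full passes/day:   {full_passes_per_day:,}")
print(f"full passes/month: {full_passes_per_month:,}")
```

So 72,000 counts full passes over the site per month, not individual requests; the request count is 100 times larger.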
Anonymous
6/23/2025, 4:18:29 PM No.105680754
>>105680498 (OP)
People's minds have been uploaded to the matrix but most people were robots anyway so now they are more robots than humans.
Replies: >>105680813
Anonymous
6/23/2025, 4:27:20 PM No.105680813
>>105680754
there's no universe in which matrix robots care this much about my website
Anonymous
6/23/2025, 4:28:12 PM No.105680819
>>105680498 (OP)
What are you going to do about it?
Replies: >>105680831
Anonymous
6/23/2025, 4:28:41 PM No.105680823
>>105680498 (OP)
AI garbage looking for useful content.
Replies: >>105680831 >>105680860
Anonymous
6/23/2025, 4:29:39 PM No.105680829
>>105680498 (OP)
Your site is being scraped by AI companies to feed their LLMs.
Block them with Anubis: https://github.com/TecharoHQ/anubis
Replies: >>105680860 >>105680936 >>105681625 >>105682633
Anonymous
6/23/2025, 4:29:47 PM No.105680831
>>105680823
scrapers looking to feed their AI garbage*

fixed.

>>105680819
PoW seems to be the only effective tool in slowing it down, sort of like Xe's Anubis.
Replies: >>105680860
Anonymous
6/23/2025, 4:33:06 PM No.105680856
internet coffee phone meme
md5: b0dc9555f4afbcf262cc171357e22ccb
why not DDoS the entire web so that everybody puts up annoying captchas so people end up using ChatGPT instead of searching?
Replies: >>105682353
Anonymous
6/23/2025, 4:33:33 PM No.105680860
>>105680829
>>105680823
>>105680831
>its scrapers!
72,000 times they scrape my whole site every month. i can't understand this philosophy of scraping im sorry i don't get it
Replies: >>105680878 >>105680889 >>105680969 >>105681397
Anonymous
6/23/2025, 4:35:09 PM No.105680878
>>105680860
literally:
a) chink botnets looking for backdoors (all the /phpMyAdmin like requests)
b) AI companies desperate for training data
c) some faggot doing a research paper for his bachelors' thesis or some other edu institution.
Anonymous
6/23/2025, 4:35:54 PM No.105680883
look son you're doing great just make sure we keep scraping that small website with the erotic pokemon fanfiction 24 hours a day. put more servers on it
Anonymous
6/23/2025, 4:36:33 PM No.105680889
>>105680860
They look for updates in the pages.
If there is updated text they will save the new text.
Replies: >>105680893
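A rough sketch of the "check for updates, save the new text" loop described above, assuming the scraper keeps a fingerprint per URL. The function names and the in-memory dict are hypothetical; a well-behaved crawler could also avoid the download entirely with HTTP conditional requests (`If-None-Match` / `If-Modified-Since`).

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def save_if_changed(store: dict, url: str, text: str) -> bool:
    """Store `text` for `url` only when it differs from the saved copy.

    `store` maps url -> (fingerprint, text). Returns True if something
    new was saved, False if the page was unchanged.
    """
    new_fp = fingerprint(text)
    saved = store.get(url)
    if saved is not None and saved[0] == new_fp:
        return False
    store[url] = (new_fp, text)
    return True


if __name__ == "__main__":
    store = {}
    print(save_if_changed(store, "https://example.com/a", "hello"))  # first visit
    print(save_if_changed(store, "https://example.com/a", "hello"))  # unchanged
    print(save_if_changed(store, "https://example.com/a", "hello v2"))  # edited
```

Note that this still costs the site one full request per page per check, which is why checking many times a day adds up.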
Anonymous
6/23/2025, 4:37:02 PM No.105680893
>>105680889
if they did this once a month, there would still be 72,000 of them
Replies: >>105680966 >>105681048
Anonymous
6/23/2025, 4:43:23 PM No.105680936
>>105680829
>enbyware
>CoCked

At least it's MIT
Replies: >>105680966
Anonymous
6/23/2025, 4:47:34 PM No.105680966
>>105680893
Jeet coders. They are not efficient at scraping.
Let's say there are 50 AI companies, they all scrape 1 time a day. If there are 72 000 requests a month:
72000 ÷ 30 = 2400 requests a day.
2400 ÷ 50 = 48.
An average of 48 requests per day per AI company is not that much desu.

Use Anubis, it's not hard to set up. If you use a load balancer deploy it there.

>>105680936
Yeah, the author is a tranny but at least he built something useful.
Anonymous
6/23/2025, 4:47:52 PM No.105680969
>>105680860
it's called being based and making webshits seethe. if you can't deal with the ancient problem of "HIGH INCOMING TRAFFIC" then you don't deserve to host a website.
Anonymous
6/23/2025, 4:58:34 PM No.105681042
>>105680726
>what is that
Some scrapers use rented IP ranges for scraping so they don't get blocked. It's basically a botnet, almost impossible to block.
Replies: >>105682885
Anonymous
6/23/2025, 4:59:36 PM No.105681048
>>105680893
>if they did this once a month, there would still be 72,000 of them
You underestimate their incompetence.
Replies: >>105681058
Anonymous
6/23/2025, 5:00:34 PM No.105681058
>>105681048
incompetence is fine but that's a waste of money
Anonymous
6/23/2025, 5:02:15 PM No.105681069
Holy shit enable ads and make some money.
Replies: >>105681116
Anonymous
6/23/2025, 5:09:00 PM No.105681116
>>105681069
yea i should do it. i bet the thousands of scraper bots would click on all the ads and make me rich
Anonymous
6/23/2025, 5:45:01 PM No.105681397
>>105680860
it's AI scrapers. they hit up everything they can as frequently as they can to harvest data for LLM training. It gets worse when your site has "expensive" endpoints (e.g. git blame on the web). They do it a few times a day: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html
Anonymous
6/23/2025, 5:56:57 PM No.105681482
1734733429149168
md5: ce7a2f0b606e1070a66ecfeb55bdca4e
>>105680498 (OP)
Anonymous
6/23/2025, 6:16:19 PM No.105681625
>>105680829
>may inhibit "good bots" like the Internet Archive
ah that won't work in my case unfortunately. need to be friendly to search engines and IA
Replies: >>105681767
Anonymous
6/23/2025, 6:31:28 PM No.105681767
>>105681625
I believe Internet Archive is whitelisted by default now via their hardcoded IP addresses.
Anonymous
6/23/2025, 7:51:06 PM No.105682353
1750701065375
md5: f55c4110b5cc73a3647479c768a51327
>>105680856
yea
Anonymous
6/23/2025, 8:24:07 PM No.105682633
>>105680829
lmao another snake oil "solution" that can be circumvented with a few lines of code?
Replies: >>105682885
Anonymous
6/23/2025, 8:54:03 PM No.105682885
>>105681042
You can block the IP ranges of ASNs that belong to cheap VPS services. The lists are publicly available; just be sure to update them from time to time, or set up a cron job that pulls the list from an API and updates the ranges automatically.

>>105682633
Seems to be working.
Git forge instances and wiki instances are the most affected and they all praise how well Anubis is protecting their services.
https://anubis.techaro.lol/docs/user/known-instances
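The ASN range blocking mentioned above can be sketched with Python's standard `ipaddress` module. This assumes the list is already downloaded as plain text with one CIDR range per line; the ranges shown are reserved documentation prefixes, not a real VPS provider's, and the helper names are made up for the example. Fetching the list from an API and wiring it into a firewall or cron job is left out.

```python
import ipaddress


def parse_ranges(lines):
    """Parse CIDR ranges, one per line; blank lines and '#' comments are skipped."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(ip: str, networks) -> bool:
    """True if `ip` falls inside any listed range (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # Placeholder list using documentation-only prefixes (assumption).
    blocklist = parse_ranges([
        "# hypothetical cheap-VPS ranges",
        "192.0.2.0/24",
        "198.51.100.0/24",
        "2001:db8::/32",
    ])
    for ip in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(ip, is_blocked(ip, blocklist))
```

A linear scan like this is fine for a few thousand ranges; for larger lists it is usually better to push the ranges into the firewall (e.g. an ipset) rather than check them in application code.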
Anonymous
6/23/2025, 9:00:41 PM No.105682966
>>105680498 (OP)
240k hits a day?
Tf are you running, userbenchmark?
Just get adsense and get yourself a threadripper