
Thread 105680498

32 posts 8 images /g/
Anonymous No.105680498 [Report] >>105680754 >>105680819 >>105680823 >>105680829 >>105681482 >>105682966
>website has 100 pages
>cf showing 10,000 unique hits per hour, increasing
these are bots. but why are they bots? what the fuck is going on
Anonymous No.105680598 [Report] >>105680726 >>105680745
Jeets programming botnet scrapers, getting rewarded by jeet managers for scrapes per minute.
Anonymous No.105680726 [Report] >>105681042
>>105680598
>botnet scrapers
what is that
> rewarded by jeet managers for scrapes per minute
i could probably do it better than they can how do i do this for a job
Anonymous No.105680745 [Report]
>>105680598
>scraper
ok. so at 100 pages, my entire website is being scraped 72,000 times over every month? perpetually? are there 72,000 different scrapers all looking at my important website?
Anonymous No.105680754 [Report] >>105680813
>>105680498 (OP)
People's minds have been uploaded to the matrix but most people were robots anyway so now they are more robots than humans.
Anonymous No.105680813 [Report]
>>105680754
there's no universe in which matrix robots care this much about my website
Anonymous No.105680819 [Report] >>105680831
>>105680498 (OP)
What are you going to do about it?
Anonymous No.105680823 [Report] >>105680831 >>105680860
>>105680498 (OP)
AI garbage looking for useful content.
Anonymous No.105680829 [Report] >>105680860 >>105680936 >>105681625 >>105682633
>>105680498 (OP)
Your site is being scraped by AI companies to feed their LLMs.
Block them with Anubis: https://github.com/TecharoHQ/anubis
Anonymous No.105680831 [Report] >>105680860
>>105680823
scrapers looking to feed their AI garbage*

fixed.

>>105680819
PoW seems to be the only effective tool in slowing it down, sort of like Xe's Anubis.
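The proof-of-work idea is that each visitor must burn a little CPU before the server answers, which is cheap for one human but expensive for a scraper hitting thousands of sites. Anubis's real scheme is more involved (a JavaScript challenge in the browser plus a signed cookie); this is only a minimal sketch of the general idea, with made-up function names:

```python
import hashlib
import secrets

def make_challenge() -> str:
    """Server issues a random challenge string per visitor."""
    return secrets.token_hex(16)

def solve(challenge: str, difficulty: int) -> int:
    """Client brute-forces a nonce until the hash has `difficulty`
    leading zero hex digits. Expected work grows 16x per digit."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server checks the answer with a single hash -- verification
    is cheap, only solving is expensive."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry (thousands of hashes to solve, one hash to verify) is what makes this viable for a small site.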
Anonymous No.105680856 [Report] >>105682353
why not DDoS the entire web so that everybody puts up annoying captchas so people end up using ChatGPT instead of searching?
Anonymous No.105680860 [Report] >>105680878 >>105680889 >>105680969 >>105681397
>>105680829
>>105680823
>>105680831
>its scrapers!
72,000 times they scrape my whole site every month. i can't understand this philosophy of scraping im sorry i don't get it
Anonymous No.105680878 [Report]
>>105680860
literally:
a) chink botnets looking for backdoors (all the /phpMyAdmin like requests)
b) AI companies desperate for training data
c) some faggot doing a research paper for his bachelor's thesis or some other edu institution.
Anonymous No.105680883 [Report]
look son you're doing great just make sure we keep scraping that small website with the erotic pokemon fanfiction 24 hours a day. put more servers on it
Anonymous No.105680889 [Report] >>105680893
>>105680860
They look for updates in the pages.
If there is updated text they will save the new text.
Anonymous No.105680893 [Report] >>105680966 >>105681048
>>105680889
if they did this once a month, there would still be 72,000 of them
Anonymous No.105680936 [Report] >>105680966
>>105680829
>enbyware
>CoCked

At least it's MIT
Anonymous No.105680966 [Report]
>>105680893
Jeet coders. They are not efficient at scraping.
Let's say there are 50 AI companies all scraping. If there are 72,000 requests a month:
72,000 ÷ 30 = 2,400 requests a day.
2,400 ÷ 50 = 48.
An average of 48 requests per day per AI company is not that much desu.

Use Anubis, it's not hard to set up. If you use a load balancer deploy it there.

>>105680936
Yeah, the author is a tranny but at least he built something useful.
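Deploying it "at the load balancer" just means Anubis sits as a reverse proxy in front of your site and forwards verified traffic upstream. A hedged sketch of what that could look like with docker compose — the image path and environment variable names are from my recollection of the Anubis docs, so verify them against the project README before using:

```yaml
services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest   # assumed image path, check the docs
    environment:
      BIND: ":8923"             # where Anubis listens for public traffic
      TARGET: "http://site:80"  # the upstream it protects
    ports:
      - "8923:8923"
  site:
    image: nginx:alpine         # stand-in for your actual website
```

Only port 8923 is exposed, so every request has to pass the challenge before reaching the site container.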
Anonymous No.105680969 [Report]
>>105680860
it's called being based and making webshits seethe. if you can't deal with the ancient problem of "HIGH INCOMING TRAFFIC" then you don't deserve to host a website.
Anonymous No.105681042 [Report] >>105682885
>>105680726
>what is that
Some scrapers use rented IP ranges for scraping so they don't get blocked. It's basically a botnet, almost impossible to block.
Anonymous No.105681048 [Report] >>105681058
>>105680893
>if they did this once a month, there would still be 72,000 of them
You underestimate their incompetence.
Anonymous No.105681058 [Report]
>>105681048
incompetence is fine but that's a waste of money
Anonymous No.105681069 [Report] >>105681116
Holy shit enable ads and make some money.
Anonymous No.105681116 [Report]
>>105681069
yea i should do it. i bet the thousands of scraper bots would click on all the ads and make me rich
Anonymous No.105681397 [Report]
>>105680860
it's AI scrapers. they hit up everything they can as frequently as they can to harvest data for LLM training. It gets worse when your site has "expensive" endpoints (e.g. git blame on the web). They do it a few times a day: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html
Anonymous No.105681482 [Report]
>>105680498 (OP)
Anonymous No.105681625 [Report] >>105681767
>>105680829
>may inhibit "good bots" like the Internet Archive
ah that won't work in my case unfortunately. need to be friendly to search engines and IA
Anonymous No.105681767 [Report]
>>105681625
I believe Internet Archive is whitelisted by default now via their hardcoded IP addresses.
Anonymous No.105682353 [Report]
>>105680856
yea
Anonymous No.105682633 [Report] >>105682885
>>105680829
lmao another snake oil "solution" that can be circumvented with a few lines of code?
Anonymous No.105682885 [Report]
>>105681042
You can block the ASN IP ranges that belong to cheap VPS services; the lists are publicly available, just be sure to update them from time to time, or set a cron job that pulls the list from an API and updates the ranges automatically.
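The cron-job approach boils down to: fetch the current prefix list for the ASNs you distrust (from whichever feed you use — that part is left out here), then regenerate a firewall ruleset from it. A minimal sketch of the rule-generation step, targeting an nftables set:

```python
def prefixes_to_nftables(prefixes: list[str], set_name: str = "cheap_vps") -> str:
    """Render a list of IPv4 CIDR prefixes as an nftables config that
    drops all traffic from those ranges. The caller is expected to
    fetch `prefixes` from an ASN/prefix feed and re-run this from cron,
    then load the output with `nft -f`."""
    elements = ", ".join(prefixes)
    return (
        "table inet filter {\n"
        f"  set {set_name} {{\n"
        "    type ipv4_addr\n"
        "    flags interval\n"           # allows CIDR ranges as elements
        f"    elements = {{ {elements} }}\n"
        "  }\n"
        "  chain input {\n"
        "    type filter hook input priority 0; policy accept;\n"
        f"    ip saddr @{set_name} drop\n"
        "  }\n"
        "}\n"
    )
```

Regenerating the whole set each run is simpler and less error-prone than diffing against the previous list.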

>>105682633
Seems to be working.
Git forge instances and wiki instances are the most affected and they all praise how well Anubis is protecting their services.
https://anubis.techaro.lol/docs/user/known-instances
Anonymous No.105682966 [Report]
>>105680498 (OP)
240k hits a day?
Tf are you running, userbenchmark?
Just get adsense and get yourself a threadripper