Thread 106127949 - /g/ [Archived: 416 hours ago]

Anonymous
8/3/2025, 6:39:26 PM No.106127949
anyone here got experience with writing helium/selenium web scraping scripts? my project has several standalone scripts that scrape data from different websites. the problem is that each script currently has to launch its own browser instance just to open a website and scrape data from it. what I want is to open one browser instance when the program starts and have all my scripts use that one instance instead, since that would make the whole thing a lot faster. is there any way to do that? any advice is appreciated
Replies: >>106128483 >>106128562 >>106130644 >>106132563
Anonymous
8/3/2025, 6:47:43 PM No.106128017
Which programming language are you using? I think most languages let Selenium scripts share just one browser instance, but wouldn't multiple browser instances be faster? Almost like parallel processing...
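A minimal sketch of the "one browser per parallel worker" idea in Python (the language is an assumption, since OP never says; helium and Selenium both have Python APIs). It assumes each standalone script has been turned into a function that takes a driver; `run_parallel`, `make_driver`, and `Scraper` are hypothetical names. In real use `make_driver` would be something like `selenium.webdriver.Chrome`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

Scraper = Callable[[Any], Any]        # hypothetical: takes a driver, returns scraped data
DriverFactory = Callable[[], Any]     # assumption: e.g. selenium.webdriver.Chrome

def run_parallel(scrapers: Sequence[Scraper], make_driver: DriverFactory,
                 workers: int = 4) -> List[Any]:
    """Run scrapers concurrently, one browser per worker thread.

    Each thread starts its own driver on first use and reuses it for every
    later job, so no browser is ever touched by two threads at once.
    """
    local = threading.local()
    started: List[Any] = []             # every driver created, so each gets quit()
    started_lock = threading.Lock()

    def driver_for_this_thread() -> Any:
        if not hasattr(local, "driver"):
            local.driver = make_driver()
            with started_lock:
                started.append(local.driver)
        return local.driver

    def run_one(scrape: Scraper) -> Any:
        return scrape(driver_for_this_thread())

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() keeps results in input order and re-raises worker errors
            return list(pool.map(run_one, scrapers))
    finally:
        for driver in started:
            driver.quit()
```

With real Selenium this would be called as `run_parallel(jobs, webdriver.Chrome)`. The trade-off is memory: at most `workers` browsers run at once, each a full browser process, but no two scrapers ever share one.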
Anonymous
8/3/2025, 7:38:28 PM No.106128483
>>106127949 (OP)
you can do this in puppeteer.
just grab the first tab when the browser opens and keep reusing it
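The closest Selenium equivalent of "keep reusing the first tab" is to start one driver at program start and hand it to every scraper in turn. A minimal Python sketch, assuming OP's standalone scripts are refactored into functions that accept a driver; `run_serially`, `scrape_title`, and `main` are hypothetical names, and `https://example.com` is just a placeholder URL.

```python
from typing import Any, Callable, Dict

Scraper = Callable[[Any], Any]   # hypothetical: a former standalone script, now a function

def run_serially(scrapers: Dict[str, Scraper], driver: Any) -> Dict[str, Any]:
    """Run every scraper one after another in the same browser window."""
    results = {}
    for name, scrape in scrapers.items():
        results[name] = scrape(driver)   # same driver/tab reused for each job
    return results

def scrape_title(driver: Any) -> str:
    # Example scraper using real Selenium WebDriver calls: get() and .title
    driver.get("https://example.com")
    return driver.title

def main() -> None:
    from selenium import webdriver   # requires selenium to be installed
    driver = webdriver.Chrome()      # the one browser, started once
    try:
        print(run_serially({"example": scrape_title}, driver))
    finally:
        driver.quit()                # closed once, when everything is done
```

Calling `main()` from the program's entry point starts the browser once. Because the jobs run strictly one after another, there is no risk of two scrapers navigating the same tab at the same time.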
Anonymous
8/3/2025, 7:47:23 PM No.106128562
>>106127949 (OP)
Couldn't you just
wget -qO- your_site | xmllint --html --xpath 'your_stuff' - ?
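For pages that don't need JavaScript, the same fetch-then-extract idea works with only Python's standard library. This is a stand-in, not XPath: `html.parser` has no XPath support, so the hypothetical `extract` helper just collects the text of every element with a given tag name. `fetch` does a plain GET with `urllib.request` and, like wget, runs no JavaScript; the `User-Agent` value is an arbitrary assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request, urlopen

class TagTextCollector(HTMLParser):
    """Collect the text inside every <tag> element (a tag-name match, not XPath)."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag.lower()        # HTMLParser reports tag names in lowercase
        self.depth = 0                # > 0 while inside a matching element
        self.texts: List[str] = []
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:       # closed the outermost match
                self.texts.append("".join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)

def extract(html: str, tag: str) -> List[str]:
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.texts

def fetch(url: str) -> str:
    # Plain HTTP GET; no JavaScript is executed (same limitation as wget).
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be something like `extract(fetch(url), "h2")`. For real XPath queries a third-party parser such as lxml is the usual choice.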
Anonymous
8/3/2025, 11:05:41 PM No.106130644
>>106127949 (OP)
maybe I'm late, but I have experience with Selenium. it is generally not advised to share a single browser between automation tasks, because they can conflict with each other if they run in parallel

if the automation jobs run serially then there's no problem. you just have to open the browser with automation extensions and a remote debugging port, then tell your script to connect to that port and voila. I can provide the chrome command line options I use for this
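The poster's exact flags aren't given, so this is only a minimal sketch of the general "start once, attach from each script" approach in Python. It assumes a `google-chrome` binary on the PATH, port 9222, and a dedicated profile directory (all placeholders); the helper names are hypothetical. The separate `--user-data-dir` matters because recent Chrome releases ignore the debugging flag on the default profile. `debuggerAddress` is the ChromeOptions setting Selenium uses to attach to an already-running Chrome.

```python
import subprocess
from typing import List

DEBUG_HOST = "127.0.0.1"
DEBUG_PORT = 9222                      # assumption: any free port works
PROFILE_DIR = "/tmp/scraper-profile"   # hypothetical dedicated profile directory

def chrome_command(binary: str = "google-chrome", port: int = DEBUG_PORT,
                   profile_dir: str = PROFILE_DIR) -> List[str]:
    """Command line for the single long-lived browser started at program start."""
    return [
        binary,
        f"--remote-debugging-port={port}",
        # separate profile: recent Chrome won't enable remote debugging on the default one
        f"--user-data-dir={profile_dir}",
    ]

def launch_browser(**kwargs) -> subprocess.Popen:
    # Start Chrome once; it keeps running while individual scripts attach to it.
    return subprocess.Popen(chrome_command(**kwargs))

def attach(host: str = DEBUG_HOST, port: int = DEBUG_PORT):
    """Connect Selenium to the already-running browser instead of launching a new one."""
    from selenium import webdriver     # imported here so the helpers above need only stdlib
    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", f"{host}:{port}")
    return webdriver.Chrome(options=options)
```

The launcher calls `launch_browser()` once, and each scraping script then calls `attach()` instead of creating its own browser. As the post says, this is only safe if the scripts take turns.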
Anonymous
8/4/2025, 2:43:37 AM No.106132563
>>106127949 (OP)
i don't understand how retards use selenium for web scraping. it's overkill like you wouldn't believe. just use curl-impersonate if the site has some cancerous ddos protection. if you have to resort to selenium/puppeteer you're doing something wrong
Replies: >>106132584 >>106134177
Anonymous
8/4/2025, 2:45:34 AM No.106132584
>>106132563
clearly you haven't scraped enough to realize some sites require selenium for scraping
Replies: >>106132931
Anonymous
8/4/2025, 3:37:31 AM No.106132931
>>106132584
example use case?
Replies: >>106132940
Anonymous
8/4/2025, 3:38:14 AM No.106132940
>>106132931
gooning
Anonymous
8/4/2025, 7:11:52 AM No.106134177
>>106132563
trvthnvke. i encountered my first stubborn website that had some javascript redirection magic, and curl-impersonate bypassed it