Thread 105846425 - /g/ [Archived: 411 hours ago]

Anonymous
7/9/2025, 11:33:52 AM No.105846425
dothis
why do redditors get mad when you parse html with regex?
Replies: >>105846715 >>105846802 >>105847260
Anonymous
7/9/2025, 11:37:05 AM No.105846439
Wrong tool for the job.
Replies: >>105846510
Anonymous
7/9/2025, 11:40:27 AM No.105846462
Because they think you're always supposed to write generic code that can be reused anywhere, instead of just doing the specific thing you want to do
Replies: >>105846586
Anonymous
7/9/2025, 11:41:31 AM No.105846470
I've seen this exact same pepe image on /adv/, /o/, and on /g/ twice now. and that's just in a span of two weeks.
Replies: >>105846505
Anonymous
7/9/2025, 11:47:19 AM No.105846505
654657653656764
>>105846470
repeat after me:
- this is not the image i have seen before
Anonymous
7/9/2025, 11:47:33 AM No.105846510
>>105846439
What's the correct tool, anon?
Replies: >>105846529 >>105846571 >>105846611 >>105846643 >>105846669 >>105847131 >>105847788
Anonymous
7/9/2025, 11:51:44 AM No.105846529
>>105846510
a 20mb html parsing library
Anonymous
7/9/2025, 11:58:11 AM No.105846561
I still don't understand why you're not supposed to make up your own tags
a) i can't keep track of all the shit they introduced since "semantic html"
b) browsers have different defaults and styles for official tags so you're fighting an uphill battle
c) style and content are supposed to be separate, behavior should be too
Replies: >>105846577
Anonymous
7/9/2025, 11:59:48 AM No.105846571
>>105846510
a giant if-else statement
Anonymous
7/9/2025, 12:01:15 PM No.105846577
>>105846561
I've never heard this advice before, but I imagine the reasoning is that if the tag is ever used in the future, it will retroactively alter the behavior of your website
Anonymous
7/9/2025, 12:02:20 PM No.105846586
1000010534
>>105846462
many such cases
Anonymous
7/9/2025, 12:05:13 PM No.105846611
>>105846510
Any actual HTML parsing library. It doesn't need to be a full-featured one that weighs several megabytes, as the other anon suggested.

The problem with using regular expressions for HTML is that HTML is not a regular language; it's context-free. Regular expressions cannot, for instance, tell which opening tag a closing tag belongs to, or whether tags are balanced. That sharply limits what you can reliably pull out of a given HTML document with them.
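A minimal sketch of the nesting problem, using only Python's standard library. The sample markup and the names `DepthTracker` and `check_nesting` are made up for illustration, and the depth counter assumes every start tag has a matching end tag (void elements like `<br>` would need special handling).

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: one div nested inside another.
NESTED = "<div>outer <div>inner</div> tail</div>"

# A non-greedy regex stops at the FIRST closing tag, so the outer div is cut short.
regex_guess = re.search(r"<div>(.*?)</div>", NESTED).group(1)


class DepthTracker(HTMLParser):
    """Tracks nesting depth while the parser walks the markup."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        self.depth -= 1
        if self.depth < 0:
            # A closing tag appeared with nothing open.
            self.balanced = False


def check_nesting(markup):
    """Return (maximum depth, whether open and close tags balanced out)."""
    tracker = DepthTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.max_depth, tracker.balanced and tracker.depth == 0


print(repr(regex_guess))       # 'outer <div>inner'
print(check_nesting(NESTED))   # (2, True)
```

The regex has no counter, so it cannot know that the first `</div>` belongs to the inner element. The parser-based version keeps one, which is the part a plain regular expression cannot express for arbitrary depth.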
Replies: >>105846674 >>105847720
Anonymous
7/9/2025, 12:09:56 PM No.105846643
1723451734746451
>>105846510
beautifulsoup
Replies: >>105847556
Anonymous
7/9/2025, 12:13:57 PM No.105846669
>>105846510
something that takes more processing power because... because you just le have to, okay?????
Anonymous
7/9/2025, 12:14:43 PM No.105846674
>>105846611
>bro trust me, you HAVE to import an entire html parsing library to locate the text within <span class="nigger"></span> and nothing else
>you cannot just regex match that exact text because... well you just can't, okay?
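For what it's worth, the narrow case described here does work as long as the markup never changes shape. A rough sketch, with a hypothetical class name (`price`) and a made-up page snippet standing in for the real thing:

```python
import re

# Hypothetical snippet and class name, used only for illustration.
PAGE = '<p>Total: <span class="price">19.99</span></p>'

# Matches only this exact layout: double quotes, a single class, no other
# attributes, and no nested tags inside the span.
PRICE_SPAN = re.compile(r'<span class="price">([^<]*)</span>')


def extract_price(markup):
    """Return the span's text, or None if the exact pattern is absent."""
    match = PRICE_SPAN.search(markup)
    return match.group(1) if match else None


print(extract_price(PAGE))  # 19.99
```

The trade-off is spelled out in the comment on the pattern: it is a match against one specific string shape, not against the element.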
Anonymous
7/9/2025, 12:21:44 PM No.105846715
>>105846425 (OP)
Because the same HTML formatted differently will fail to get selected
I’ve started to format some of my basic no-build-step sites with Prettier and I’m thinking I could get away with it for common elements that need search-and-replace because the form of the HTML is locked in
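A small sketch of the formatting problem, assuming three hypothetical spellings of the same link. The regex is written against the first spelling only; the parser version uses the standard library's `html.parser` and the made-up names `HrefCollector`, `regex_hrefs`, and `parser_hrefs`.

```python
import re
from html.parser import HTMLParser

# Three spellings of the same element; a browser treats them identically.
VARIANTS = [
    '<a href="/docs" class="nav">Docs</a>',
    "<a class='nav' href='/docs'>Docs</a>",
    '<A\n  HREF="/docs"\n  CLASS="nav">Docs</A>',
]

# Written against the first spelling only.
HREF_RE = re.compile(r'<a href="([^"]*)" class="nav">')


class HrefCollector(HTMLParser):
    """Collects href values from <a class="nav"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and strips the quotes.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "nav":
            self.hrefs.append(attrs.get("href"))


def regex_hrefs(markup):
    return HREF_RE.findall(markup)


def parser_hrefs(markup):
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


for variant in VARIANTS:
    print(regex_hrefs(variant), parser_hrefs(variant))
```

Running everything through one formatter, as suggested above, amounts to forcing all markup into the first spelling so the regex only ever sees one form.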
Replies: >>105846788
Anonymous
7/9/2025, 12:31:37 PM No.105846788
>>105846715
if a website changed something that causes your regex to break, then that is good because it means you can go and verify that the information is still correct. your parsing library continuing to work would not alert you to go and make sure it's actually getting the right data. who knows, maybe the website developer wanted to switch attribute values without changing the names
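The "breaking is a feature" idea only helps if the failure is loud. A minimal sketch, assuming a hypothetical stock counter in a table cell; `STOCK_RE`, `LayoutChanged`, and `read_stock` are illustrative names, not from any real scraper:

```python
import re

# Hypothetical pattern; the point is the explicit failure path.
STOCK_RE = re.compile(r'<td id="stock">(\d+)</td>')


class LayoutChanged(Exception):
    """Raised when the page no longer matches the layout the scraper expects."""


def read_stock(markup):
    match = STOCK_RE.search(markup)
    if match is None:
        # Fail loudly instead of returning a default that hides the change.
        raise LayoutChanged("stock cell not found; re-check the page by hand")
    return int(match.group(1))


print(read_stock('<table><tr><td id="stock">7</td></tr></table>'))  # 7
```

The same guard can be put around parser-based code: raise when the lookup finds nothing, rather than carrying on with an empty result.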
Anonymous
7/9/2025, 12:35:11 PM No.105846802
>>105846425 (OP)
because regex is retarded and if I ever have to work on a codebase thats reliant on regex I will kill myself and the person who wrote it
Anonymous
7/9/2025, 1:10:31 PM No.105847003
redditors are the ones using regex though
Anonymous
7/9/2025, 1:33:29 PM No.105847131
>>105846510
a web browser, it will parse html into a beautiful web page as a feast for your eyes
Anonymous
7/9/2025, 1:55:33 PM No.105847260
>>105846425 (OP)
You can do some HTML queries with regex.
Anonymous
7/9/2025, 2:40:00 PM No.105847556
:)
>>105846643
There are various PowerShell packages that include HTML parsers, so no Python needed
Anonymous
7/9/2025, 3:04:34 PM No.105847720
741557ab-3ebc-422a-abb9-4e720bc91e0c
>>105846611
the kind of subexpression that I'm looking for is context-free THOUGH
Anonymous
7/9/2025, 3:10:53 PM No.105847766
It's a dangerous way to do it: it's easy to introduce unexpected bugs. Since HTML is a tree structure, you should traverse it as one, not grab strings at random.
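A rough sketch of what traversing it as a tree can look like with nothing but Python's standard library. `Node`, `TreeBuilder`, `parse`, `find_all`, and `text_of` are made-up names, the `VOID` set is a partial list, and this builder is far simpler than what a browser does with malformed markup.

```python
from html.parser import HTMLParser

# Partial list of elements that never take a closing tag, so they are not
# pushed onto the open-element stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects and text strings, in order


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Depth-first walk yielding every descendant element named `tag`."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                yield child
            yield from find_all(child, tag)


def text_of(node):
    """Concatenate all text below a node, ignoring the tags in between."""
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


doc = parse("<ul><li>one<br>two</li><li><b>three</b></li></ul>")
print([text_of(li) for li in find_all(doc, "li")])  # ['onetwo', 'three']
```

Once the tree exists, queries are phrased in terms of elements and their children, so inline tags or line breaks inside an item do not change what gets selected.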
Replies: >>105849068
Anonymous
7/9/2025, 3:14:12 PM No.105847788
>>105846510
AI, of course
Anonymous
7/9/2025, 5:28:40 PM No.105848797
1748094567317653
>2025
>websites literally transmit raw html rather than compiling it into some kind of bytecode language
YOU CAN'T MAKE THIS SHIT UP
Replies: >>105849090
Anonymous
7/9/2025, 5:57:25 PM No.105849068
>>105847766
>Source: my butthole (peer reviewed)
Anonymous
7/9/2025, 5:59:52 PM No.105849090
>>105848797
It can be compressed with gzip doe
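A quick sketch of what gzip does to a page body, using Python's standard `gzip` module. The page content is made up; real savings depend on the markup.

```python
import gzip

# Hypothetical page body; repetitive markup compresses well.
PAGE = (
    "<ul>"
    + "".join(f"<li class='row'>item {i}</li>" for i in range(200))
    + "</ul>"
).encode("utf-8")

compressed = gzip.compress(PAGE)
restored = gzip.decompress(compressed)

# Fewer bytes go over the wire, but the receiver gets the same HTML text back
# and still has to parse it.
print(len(PAGE), len(compressed), restored == PAGE)
```

The round trip returns the original bytes unchanged, so gzip shrinks the transfer without changing what the browser has to parse.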
Replies: >>105849120
Anonymous
7/9/2025, 6:02:57 PM No.105849120
>>105849090
Bytecode isn't a compression scheme, newfren.