Thread 105846425 - /g/ [Archived: 411 hours ago]

Anonymous

7/9/2025, 11:33:52 AM No.105846425

why do redditors get mad when you parse html with regex?

Replies: >>105846715 >>105846802 >>105847260

Anonymous

7/9/2025, 11:37:05 AM No.105846439

Wrong tool for the job.

Replies: >>105846510

Anonymous

7/9/2025, 11:40:27 AM No.105846462

Because they think you're always supposed to write generic code that can be reused anywhere, instead of just doing the specific thing you want to do

Replies: >>105846586

Anonymous

7/9/2025, 11:41:31 AM No.105846470

I've seen this exact same pepe image on /adv/, /o/, and on /g/ twice now, and that's just in the span of two weeks.

Replies: >>105846505

Anonymous

7/9/2025, 11:47:19 AM No.105846505

>>105846470
repeat after me:
- this is not the image i have seen before

Anonymous

7/9/2025, 11:47:33 AM No.105846510

>>105846439
What's the correct tool, anon?

Replies: >>105846529 >>105846571 >>105846611 >>105846643 >>105846669 >>105847131 >>105847788

Anonymous

7/9/2025, 11:51:44 AM No.105846529

>>105846510
a 20mb html parsing library

Anonymous

7/9/2025, 11:58:11 AM No.105846561

I still don't understand why you're not supposed to make up your own tags
a) i can't keep track of all the shit they introduced since "semantic html"
b) browsers have different defaults and styles for official tags so you're fighting an uphill battle
c) style and content are supposed to be separate, behavior should be too

Replies: >>105846577

Anonymous

7/9/2025, 11:59:48 AM No.105846571

>>105846510
a giant if-else statement

Anonymous

7/9/2025, 12:01:15 PM No.105846577

>>105846561
I've never heard this advice before, but I imagine the reasoning is that if the tag is ever used in the future, it will retroactively alter the behavior of your website

Anonymous

7/9/2025, 12:02:20 PM No.105846586

>>105846462
many such cases

Anonymous

7/9/2025, 12:05:13 PM No.105846611

>>105846510
Any actual HTML parsing library. It doesn't need to be a full-featured one that weighs several megabytes, as the other anon suggested.

The problem with using regular expressions for HTML is that HTML is not a regular language; it's context-free, because elements nest to arbitrary depth. A regular expression cannot, for instance, tell which opening tag a closing tag belongs to, or whether opening and closing tags are evenly matched. What you can reliably parse out of a given HTML document with regular expressions alone is quite limited.
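
A minimal sketch of the nesting problem, using only Python's standard library. The sample markup and the names `regex_first_div`, `DivDepth`, and `parser_outer_div_text` are made up for illustration; this is one way to show the point, not the only one.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: one <div> nested inside another.
SAMPLE = "<div>outer <div>inner</div> tail</div>"

def regex_first_div(html):
    """Non-greedy regex: stops at the FIRST </div>, ignoring nesting."""
    match = re.search(r"<div>(.*?)</div>", html, re.S)
    return match.group(1) if match else None

class DivDepth(HTMLParser):
    """Collects text inside <div> elements while counting nesting depth."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.text.append(data)

def parser_outer_div_text(html):
    parser = DivDepth()
    parser.feed(html)
    parser.close()
    return "".join(parser.text), parser.max_depth

print(regex_first_div(SAMPLE))        # the capture is cut off at the inner </div>
print(parser_outer_div_text(SAMPLE))  # full text plus the nesting depth seen
```

The regex capture ends at the inner closing tag because the pattern has no notion of depth. The parser keeps a counter, which is exactly the piece of state a regular expression cannot carry.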

Replies: >>105846674 >>105847720

Anonymous

7/9/2025, 12:09:56 PM No.105846643

>>105846510
beautifulsoup

Replies: >>105847556

Anonymous

7/9/2025, 12:13:57 PM No.105846669

>>105846510
something that takes more processing power because... because you just le have to, okay?????

Anonymous

7/9/2025, 12:14:43 PM No.105846674

>>105846611
>bro trust me, you HAVE to import an entire html parsing library to locate the text within <span class="nigger"></span> and nothing else
>you cannot just regex match that exact text because... well you just can't, okay?

Anonymous

7/9/2025, 12:21:44 PM No.105846715

>>105846425 (OP)
Because the same HTML, formatted differently, will no longer match your pattern.
I’ve started to format some of my basic no-build-step sites with Prettier, and I’m thinking I could get away with regex for common elements that need search-and-replace, because the form of the HTML is locked in.
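
A small sketch of the formatting problem, assuming a hypothetical `<span class="price">` element. The snippets, the pattern, and the helper names are illustrative only. The same element is written two ways; the literal regex matches one, while the standard-library parser reads both.

```python
import re
from html.parser import HTMLParser

# The same element, once compact and once reformatted with an extra
# attribute, single quotes, and line breaks inside the tag.
COMPACT = '<span class="price">42</span>'
REFORMATTED = "<span\n  id=\"p\"\n  class='price'\n>42</span>"

# A pattern written against the compact form only.
PATTERN = re.compile(r'<span class="price">(.*?)</span>')

def regex_price(html):
    match = PATTERN.search(html)
    return match.group(1) if match else None

class PriceFinder(HTMLParser):
    """Collects text inside any <span> whose class list contains 'price'."""
    def __init__(self):
        super().__init__()
        self.inside = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.found.append(data)

def parser_price(html):
    parser = PriceFinder()
    parser.feed(html)
    parser.close()
    return "".join(parser.found) or None

print(regex_price(COMPACT), regex_price(REFORMATTED))
print(parser_price(COMPACT), parser_price(REFORMATTED))
```

If a formatter pins the markup to one canonical shape, the regex stays valid for as long as that shape holds, which is the trade-off described above.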

Replies: >>105846788

Anonymous

7/9/2025, 12:31:37 PM No.105846788

>>105846715
if a website changes something that causes your regex to break, that's good, because it means you go and verify that the information is still correct. a parsing library that keeps working would not alert you to check that it's actually getting the right data. who knows, maybe the website developer swapped attribute values without changing the names

Anonymous

7/9/2025, 12:35:11 PM No.105846802

>>105846425 (OP)
because regex is retarded and if I ever have to work on a codebase that's reliant on regex I will kill myself and the person who wrote it

Anonymous

7/9/2025, 1:10:31 PM No.105847003

redditors are the ones using regex though

Anonymous

7/9/2025, 1:33:29 PM No.105847131

>>105846510
a web browser, it will parse html into a beautiful web page as a feast for your eyes

Anonymous

7/9/2025, 1:55:33 PM No.105847260

>>105846425 (OP)
You can do some HTML queries with regex.

Anonymous

7/9/2025, 2:40:00 PM No.105847556

>>105846643
There are various PowerShell packages that include HTML parsers, so no Python needed

Anonymous

7/9/2025, 3:04:34 PM No.105847720

>>105846611
the kind of subexpression that I'm looking for is context-free THOUGH

Anonymous

7/9/2025, 3:10:53 PM No.105847766

It's a dangerous way to do it, and it makes it easy to introduce unexpected bugs. Since HTML is a tree structure, you should be traversing it as such, not randomly grabbing strings.
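
A rough sketch of what traversing the tree can look like with nothing but Python's standard library. `Node`, `TreeBuilder`, `parse`, `walk`, and `text_of` are hypothetical helpers written for this example, and the builder assumes well-formed input with no unclosed void tags such as a bare `<br>`; a real parsing library handles those cases for you.

```python
from html.parser import HTMLParser

class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order

class TreeBuilder(HTMLParser):
    """Builds a simple element tree; assumes every start tag is closed."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Assumption: the end tag closes the node we are currently inside.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def walk(node):
    """Depth-first traversal over element nodes."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)

def text_of(node):
    """Concatenates the text of a node and its descendants."""
    parts = []
    for child in node.children:
        parts.append(text_of(child) if isinstance(child, Node) else child)
    return " ".join(parts)

SAMPLE = "<ul><li><a href='/a'>first</a></li><li><a href='/b'>second</a></li></ul>"
links = [(n.attrs["href"], text_of(n)) for n in walk(parse(SAMPLE)) if n.tag == "a"]
print(links)
```

Because the query runs over nodes rather than raw text, it does not depend on whitespace, quote style, or attribute order in the source.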

Replies: >>105849068

Anonymous

7/9/2025, 3:14:12 PM No.105847788

>>105846510
AI, of course

Anonymous

7/9/2025, 5:28:40 PM No.105848797