Thread 105846425

29 posts 14 images /g/
Anonymous No.105846425 [Report] >>105846715 >>105846802 >>105847260
why do redditors get mad when you parse html with regex?
Anonymous No.105846439 [Report] >>105846510
Wrong tool for the job.
Anonymous No.105846462 [Report] >>105846586
Because they think you're always supposed to write generic code that can be reused anywhere, instead of just doing the specific thing you want to do
Anonymous No.105846470 [Report] >>105846505
I've seen this exact same pepe image on /adv/, /o/, and on /g/ twice now. and that's just in a span of two weeks.
Anonymous No.105846505 [Report]
>>105846470
repeat after me:
- this is not the image i have seen before
Anonymous No.105846510 [Report] >>105846529 >>105846571 >>105846611 >>105846643 >>105846669 >>105847131 >>105847788
>>105846439
What's the correct tool, anon?
Anonymous No.105846529 [Report]
>>105846510
a 20mb html parsing library
Anonymous No.105846561 [Report] >>105846577
I still don't understand why you're not supposed to make up your own tags
a) i can't keep track of all the shit they introduced since "semantic html"
b) browsers have different defaults and styles for official tags so you're fighting an uphill battle
c) style and content are supposed to be separate, behavior should be too
Anonymous No.105846571 [Report]
>>105846510
a giant if-else statement
Anonymous No.105846577 [Report]
>>105846561
I've never heard this advice before, but I imagine the reasoning is that if the tag is ever used in the future, it will retroactively alter the behavior of your website
Anonymous No.105846586 [Report]
>>105846462
many such cases
Anonymous No.105846611 [Report] >>105846674 >>105847720
>>105846510
Any actual HTML parsing library. It doesn't need to be a full-featured one that weighs several megabytes, as the other anon suggested.

The problem with using regular expressions for HTML is that HTML is not a regular language; its nested tags make it at least context-free. Regular expressions in the formal sense cannot, for instance, tell which opening tag a closing tag belongs to, or whether tags are balanced at arbitrary nesting depth. That sharply limits what you can reliably extract from an HTML document with regex alone.
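
A minimal sketch of the nesting problem, assuming Python's standard `re` and `html.parser` modules. The sample markup and the `DepthTracker` class are made up for illustration. A non-greedy pattern stops at the first closing tag, while a parser that counts depth knows which closing tag ends the outer element.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: a <div> nested inside another <div>.
HTML = "<div>outer <div>inner</div> tail</div>"

# Non-greedy regex: stops at the FIRST </div>, which closes the inner div.
regex_result = re.search(r"<div>(.*?)</div>", HTML).group(1)


class DepthTracker(HTMLParser):
    """Collects text found inside any <div> by counting nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth >= 1:
            self.chunks.append(data)


tracker = DepthTracker()
tracker.feed(HTML)
tracker.close()
parser_result = "".join(tracker.chunks)

print(repr(regex_result))   # 'outer <div>inner'
print(repr(parser_result))  # 'outer inner tail'
```

The regex result is cut off in the middle of the outer element. A greedy `.*` would fix this one input and then break on two sibling divs, which is the usual trade-off.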
Anonymous No.105846643 [Report] >>105847556
>>105846510
beautifulsoup
Anonymous No.105846669 [Report]
>>105846510
something that takes more processing power because... because you just le have to, okay?????
Anonymous No.105846674 [Report]
>>105846611
>bro trust me, you HAVE to import an entire html parsing library to locate the text within <span class="nigger"></span> and nothing else
>you cannot just regex match that exact text because... well you just can't, okay?
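
For what it's worth, the narrow case described here does work as long as the markup is exactly what you expect. A minimal sketch, assuming Python's `re` module; the page fragment and the class name `price` are hypothetical stand-ins.

```python
import re

# Hypothetical page fragment; the class name "price" is an assumption.
PAGE = '<p>Total: <span class="price">19.99</span> USD</p>'

# Matches only this exact spelling of the opening tag, and assumes the
# span holds plain text with no nested tags.
PATTERN = re.compile(r'<span class="price">([^<]*)</span>')

match = PATTERN.search(PAGE)
price = match.group(1) if match else None
print(price)  # 19.99
```

The `if match else None` branch is the part that matters in practice: when the site changes its markup, the pattern returns nothing instead of wrong data.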
Anonymous No.105846715 [Report] >>105846788
>>105846425 (OP)
Because the same HTML formatted differently will fail to get selected
I’ve started to format some of my basic no-build-step sites with Prettier and I’m thinking I could get away with it for common elements that need search-and-replace because the form of the HTML is locked in
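
A rough sketch of that failure mode, assuming Python's standard `re` and `html.parser` modules. The three variants, the class name `price`, and the `SpanText` helper are all hypothetical. The same element written three ways matches the exact-string regex only once, while a parser-based lookup finds it each time.

```python
import re
from html.parser import HTMLParser

PATTERN = re.compile(r'<span class="price">([^<]*)</span>')

# Three hypothetical spellings of the same element.
VARIANTS = [
    '<span class="price">19.99</span>',
    "<span class='price'>19.99</span>",
    '<span id="total"\n      class="price">19.99</span>',
]


class SpanText(HTMLParser):
    """Collects text inside <span> elements whose class is exactly 'price'.

    Assumes the target span has no nested <span> children.
    """

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text.append(data)


def with_regex(html):
    m = PATTERN.search(html)
    return m.group(1) if m else None


def with_parser(html):
    p = SpanText()
    p.feed(html)
    p.close()
    return "".join(p.text) or None


regex_hits = [with_regex(v) for v in VARIANTS]
parser_hits = [with_parser(v) for v in VARIANTS]
print(regex_hits)   # ['19.99', None, None]
print(parser_hits)  # ['19.99', '19.99', '19.99']
```

If the HTML always passes through the same formatter, only the first spelling ever appears, which is why locking the format in can make the regex approach tolerable.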
Anonymous No.105846788 [Report]
>>105846715
if a website changed something that causes your regex to break, then that is good because it means you can go and verify that the information is still correct. your parsing library continuing to work would not alert you to go and make sure it's actually getting the right data. who knows, maybe the website developer wanted to switch attribute values without changing the names
Anonymous No.105846802 [Report]
>>105846425 (OP)
because regex is retarded and if I ever have to work on a codebase thats reliant on regex I will kill myself and the person who wrote it
Anonymous No.105847003 [Report]
redditors are the ones using regex though
Anonymous No.105847131 [Report]
>>105846510
a web browser, it will parse html into a beautiful web page as a feast for your eyes
Anonymous No.105847260 [Report]
>>105846425 (OP)
You can do some HTML queries with regex.
Anonymous No.105847556 [Report]
>>105846643
There are various PowerShell packages that include HTML parsers, so no Python needed
Anonymous No.105847720 [Report]
>>105846611
the kind of subexpression that I'm looking for is context-free THOUGH
Anonymous No.105847766 [Report] >>105849068
It's a dangerous way to do it; it's easy to introduce unexpected bugs. Since HTML is a tree structure, you should be traversing it as a tree, not randomly grabbing strings.
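
A naive sketch of what traversing it as a tree can look like, assuming only Python's standard `html.parser`. The `Node`, `TreeBuilder`, and `walk` names are invented for this example, and the builder assumes well-formed markup with no void elements such as `<br>`; a real parsing library handles those cases.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a naive tree; assumes well-formed markup, no void elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Step back up to the parent; never climb above the root.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def walk(node, depth=0):
    """Yields (depth, tag) for every element, depth-first."""
    for child in node.children:
        if isinstance(child, Node):
            yield depth, child.tag
            yield from walk(child, depth + 1)


builder = TreeBuilder()
builder.feed("<ul><li><a href='/a'>A</a></li><li>B</li></ul>")
builder.close()
outline = list(walk(builder.root))
print(outline)  # [(0, 'ul'), (1, 'li'), (2, 'a'), (1, 'li')]
```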
Anonymous No.105847788 [Report]
>>105846510
AI, of course
Anonymous No.105848797 [Report] >>105849090
>2025
>websites literally transmit raw html rather than compiling it into some kind of bytecode language
YOU CAN'T MAKE THIS SHIT UP
Anonymous No.105849068 [Report]
>>105847766
>Source: my butthole (peer reviewed)
Anonymous No.105849090 [Report] >>105849120
>>105848797
It can be compressed with gzip doe
Anonymous No.105849120 [Report]
>>105849090
Bytecode isn't a compression scheme, newfren.