Thread 105873324

99 posts 38 images /g/
Anonymous No.105873324 >>105873392 >>105873496 >>105873516 >>105873540 >>105874047 >>105875694 >>105876260 >>105879673 >>105883207 >>105893513 >>105893819 >>105899900
>REGEX
>REGEX REGEX
>REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
>REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
Anonymous No.105873348 >>105899869
more like regre(t)x
Anonymous No.105873392 >>105873608 >>105874433 >>105882701 >>105888729 >>105889965 >>105890316 >>105891174 >>105893360
>>105873324 (OP)
whoever made regex is a fucking genius. Everyone uses it at some point but there's probably 5 people who can write regex without google.
Anonymous No.105873478 >>105873563 >>105875686
@echo off
setlocal enabledelayedexpansion

set "myREGEX=%1"
set "REGEX="
set /a count=0

:loop_start
set /a count=!count! + 1

if !count! leq 10 (
    if !count! equ 1 (
        set "REGEX=!myREGEX!"
    ) else (
        set "REGEX=!REGEX! !myREGEX!"
    )
    echo !REGEX!
    goto :loop_start
)

endlocal
pause
Anonymous No.105873496 >>105873510
>>105873324 (OP)
Anonymous No.105873510
>>105873496
Anonymous No.105873516 >>105889774
>>105873324 (OP)
The one thing AI is based for
Anonymous No.105873540 >>105873563
>>105873324 (OP)
Best thread on /g/ right now
Anonymous No.105873563 >>105888692
>>105873478
>>105873540
Anonymous No.105873567
regex but with unicode
Anonymous No.105873608 >>105873623 >>105877360
>>105873392
It's simple enough to explain each separate thing you can do with it, but once you start combining patterns into a single expression you might as well be writing in binary. Shit is unreadable.
Anonymous No.105873623
>>105873608
Also, combining changes function of previous solo patterns. Soup sandwich comes to mind...
Anonymous No.105873725
The best thing about LLMs is that I'll never have to remember or manually write regex ever again
Anonymous No.105874039 >>105874394 >>105877825
Anonymous No.105874047 >>105874299 >>105874994 >>105877353 >>105877375 >>105895730
>>105873324 (OP)
I love Regex, but unfortunately I found out that most implementations of Regex use the really slow perl implementation
https://swtch.com/~rsc/regexp/regexp1.html
The company I work for (allegedly) have implemented this regex engine into an FPGA to minimize latency, but they won't let me touch it.
I'm surprised I haven't heard of someone implementing this in software, because this paper is old. If I remember right, this doesn't support some more advanced Regex features like capturing(?), but still. find should not take minutes at a time.
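For reference, a rough sketch of the kind of pathological case that article is about: the pattern a? repeated n times followed by a repeated n times, matched against n copies of a. This assumes CPython's re module, which is a backtracking engine; pathological and time_match are made-up helper names, and the timings depend entirely on your machine.

```python
import re
import time

def pathological(n):
    """Build the pattern a?a?...aa... (n optional a's, then n required a's) and the input a^n."""
    return "a?" * n + "a" * n, "a" * n

def time_match(n):
    """Return (matched, seconds) for the pathological pattern of size n."""
    pattern, text = pathological(n)
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # The match always succeeds, but the time should grow quickly with n
    # on a backtracking engine.
    for n in (5, 10, 15, 18):
        matched, seconds = time_match(n)
        print(n, matched, f"{seconds:.4f}s")
```

Raising n by a few steps at a time is the easiest way to see whether the engine you are using is affected.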
Anonymous No.105874299 >>105874333
>>105874047
This is exactly how regexes in go work. If you look at the author of this paper you’ll understand why.
Anonymous No.105874333
>>105874299
based cox
(verification not required)
Anonymous No.105874394 >>105874474
>>105874039
sed s 's///g ; s///'
neat, ty
Anonymous No.105874433 >>105879169
>>105873392
The trick, like everything tedious, is to learn it in small pieces and use it as much as possible, then go back and learn another piece.
Regex puzzles are also fun if you're bored.
Anonymous No.105874474 >>105874523 >>105883240
>>105874394
You can use sed syntax in vim too.
Anonymous No.105874523
>>105874474
I was equally humbled the other day regarding use of the grep command.
Anonymous No.105874994
>>105874047
>The james bond submarine-car is 100000000000000000% faster in water than a regular car, therefore all non-submarine cars are inferior
I read about 30 seconds of that. So it's significantly faster at a very very specific kind of regex pattern that you are never going to use in practice?
.
Anonymous No.105875686 >>105888692
>>105873478
move the if !count! equ 1 before the loop for better performance sir
Anonymous No.105875694
>>105873324 (OP)
>you have a problem
>you decide to use regex to solve it
>you have two problems
Anonymous No.105876260
>>105873324 (OP)
fucking frogposters
Anonymous No.105876930 >>105876975
Can't believe I got filtered by this language.
Anonymous No.105876975
>>105876930
It's easy, just play regex puzzles. Tons online
Anonymous No.105877353 >>105877701
>>105874047
you are the dumbest gorilla nigger of this board, everything you say is bullshit
Anonymous No.105877360 >>105882865
>>105873608
>once you start combining patterns into a single expression you might as well be writing in binary.
skill issue
Anonymous No.105877375
>>105874047
>backtracking regex is bad except for this laundry list of extremely useful things it can do but NFAs can't
exactly the kind of sophistry you'd expect from a go dev
Anonymous No.105877701 >>105877758 >>105877758
>>105877353
Give me a banana then faggot because I want to know why I get to enjoy lightning fast Regex at work and when I come home and use perl Regex for anything more than a small text file I have to wait. Commit suicide or educate this gorilla
Anonymous No.105877758 >>105877778 >>105878039
>>105877701
>>105877701
here you go nigger

perl's regex engine is not fast,
making a backtracking regex (regex program/string) fast requires understanding backtracking and parsing at a level that you don't,
NFA-simulating-a-DFA and backtracking regex engines are both called regex engines but are 2 completely different things with different purposes and different semantics (longest match vs user specified ordered match)
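A small sketch of the semantic difference, assuming Python's re (a backtracking engine with ordered alternation). The longest-match side is only emulated here by trying each top-level alternative separately and keeping the longest hit; ordered_match and longest_match are made-up names.

```python
import re

def ordered_match(alternatives, text):
    """Backtracking semantics: the first alternative, in written order, that matches wins."""
    m = re.match("|".join(alternatives), text)
    return m.group(0) if m else None

def longest_match(alternatives, text):
    """Longest-match semantics, emulated by trying every alternative on its own."""
    hits = [m.group(0) for alt in alternatives if (m := re.match(alt, text))]
    return max(hits, key=len) if hits else None

print(ordered_match(["a", "ab"], "ab"))   # a
print(longest_match(["a", "ab"], "ab"))   # ab
```

Same pattern, same input, different answer, which is why the two families of engines are not drop-in replacements for each other.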
Anonymous No.105877778
>>105877758 (me)
>perl's regex engine is not fast,
*but a backtracking engine CAN be crazy fast, for the kind of parsing that it does
Anonymous No.105877825
>>105874039
now do it with awk instead
Anonymous No.105878039 >>105878439 >>105879262
>>105877758
>backreferences
nice to have, but bloat
>backtracking
bloat
Surely it would be better to default to the NFA engine until needing a DFA feature? This may be just how I use regex, attempting to find any specific match as fast as I can, but I don't see why DFA is the default. Things like backreferences are very useful when I need them, but I don't need them that often, and with the speed of the NFA model, greedy matching doesn't seem necessary.
What are the different purposes you mention?
Anonymous No.105878439 >>105878442 >>105892263
>>105878039
>>backreferences
Everyone always cites this, but backreferences are irrelevant. IMO they are rarely useful, if ever. The extended features that really make Perl regexes what they are are things like lookarounds, backtracking control (quantifier modifiers, independent patterns, backtracking control verbs), and recursive patterns. Next to that there are useful little features like named capturing groups, branch reset patterns, word boundaries, and anchors (the last two can require a lookahead, semantically, so I don't see how they can be part of CS(tm) regular expressions).

>Surely it would be better to default to the NFA engine until needing a DFA feature?
I think you mean the opposite? An NFA is ambiguous; backtracking regex engines use an NFA (with a single state position), and your blog article presents an NFA that simulates a DFA using multiple "concurrent" states. But anyway, I understand your question.

DFA or NFA-simulating-DFA regex engines are good for matching things where there can be overlaps (the only use case I can think of right now is matching DNA sequences, but there must be other ones) and, more importantly, when trying to match an input string against many simple patterns. For example, you have a URL and you want to see whether it needs to be filtered against hundreds or thousands of domains. For this use case this kind of regex engine really shines and will outperform a backtracking engine by a lot, no question. There are a lot of use cases like that, packet filtering for example, and many others.
Anonymous No.105878442 >>105878450 >>105879388
>>105878439
Backtracking regex engines are good when you are really parsing something, for example a URL, where you would extract the protocol, domain, path, fragment, etc. For this kind of "parsing" purpose, the grammar of the thing you're trying to match is usually LL(1) or LL(k) with a small k.
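A minimal sketch of that kind of parsing with named groups, assuming Python's re. The pattern is deliberately simplified (no userinfo, no percent-encoding checks) and URL_RE and parse_url are made-up names. Every optional part starts with its own delimiter, so the engine can decide what comes next from a single character.

```python
import re

URL_RE = re.compile(
    r"""
    (?P<scheme>[a-z][a-z0-9+.-]*) ://     # protocol
    (?P<host>[^/?\#:]+)                    # domain
    (?: : (?P<port>\d+) )?                 # optional port, introduced by ':'
    (?P<path>/[^?\#]*)?                    # optional path, introduced by '/'
    (?: \? (?P<query>[^\#]*) )?            # optional query, introduced by '?'
    (?: \# (?P<fragment>.*) )?             # optional fragment, introduced by '#'
    \Z
    """,
    re.VERBOSE | re.IGNORECASE,
)

def parse_url(url):
    """Return a dict of URL parts, or None if the string does not match."""
    m = URL_RE.match(url)
    return m.groupdict() if m else None

print(parse_url("https://example.com:8080/a/b?x=1#top"))
```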

I claim that a backtracking regex engine where the regex (program) is matching a LL(1)/LL(k) grammar will run in linear time when matching a valid input string. I can't give you a proof but it's not hard to see if you follow the control flow of the matching of a regex like that.

Now, if the input string is not a valid string from the grammar and there is basically a syntax error in the middle, the runtime can be anywhere from linear to exponential depending on how much backtracking there is in the regex. If you remove the unnecessary backtracking, the regex engine will backtrack linearly (similarly to how a recursive descent parser would) until the root of the regex and fail. But if you didn't, yeah, it can backtrack to hell depending on the input string and on the regex. To me the important thing is that it's possible to write regexes in such a way that you make the runtime linear.

That's the very important part for using a backtracking regex engine efficiently: just because backtracking regex engines can backtrack to hell (exponentially) doesn't mean your regex should allow this to happen. Perl/PCRE backtracking regex engines allow you to remove unnecessary backtracking (with independent patterns (?> ..), quantifier modifiers and backtracking control verbs), and for the kind of LL(k) parsing tasks ***you don't need*** lots of backtracking; you almost don't need any backtracking, in fact.
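A tiny sketch of what removing backtracking looks like, assuming Python's re, which (as far as I know) only gained atomic groups and possessive quantifiers in version 3.11, hence the version guard. The matches helper is a made-up name.

```python
import re
import sys

# Atomic groups (?>...) and possessive quantifiers (++, *+, ?+) need Python 3.11+.
HAS_ATOMIC = sys.version_info >= (3, 11)

def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Ordinary greedy quantifier: \d+ gives back one digit so the final \d can match.
print(matches(r"\d+\d", "1234"))            # True

if HAS_ATOMIC:
    # Atomic group: once \d+ has eaten all the digits it never gives any back.
    print(matches(r"(?>\d+)\d", "1234"))    # False
    # Possessive quantifier: same idea, shorter spelling.
    print(matches(r"\d++\d", "1234"))       # False
```

The failing cases fail immediately instead of retrying every shorter split, which is the behaviour you want when the grammar never needs the retry.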
Anonymous No.105878450
>>105878442
This is why the defaults of backtracking regex engines are all wrong and should be changed. Quantifiers should be possessive by default and optionally backtracking (lazily or greedily) using a quantifier modifier to do so. Same thing with alternation, I guess: / (A | B | C) D / should really be / ((?> A | B | C)) D /, and if backtracking is needed the user should explicitly express it.

This brings me to your question:
>Surely it would be better to default to the NFA engine until needing a DFA feature?
If you're matching an LL(k) grammar (with or without recursion, this doesn't change anything), and the regex engine (which doesn't exist yet) has non-backtracking defaults or you explicitly remove the unnecessary backtracking, then the "backtracking" NFA one should always be better than the NFA-simulating-DFA one because it doesn't have the overhead of the latter. There will simply be fewer instructions to execute, fewer jumps and fewer conditionals in the "backtracking" NFA. And if the regex engine natively compiles the regex, the machine code should/could be as good as, if not better than, a manually written program in C that does the equivalent matching/parsing.

Meanwhile, the NFA-simulating-DFA would have all the overhead of looping over all the simultaneous states for every input character. This is way more CPU instructions and branches to execute, and this can't really be compiled to native code afaik; it can only be interpreted, with all the overhead this implies.
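To make "looping over all the simultaneous states for every input character" concrete, here is a minimal sketch in Python. It assumes a made-up four-instruction bytecode (char, split, jmp, match) and a hand-assembled program for a+b+; none of this is taken from a real engine.

```python
# Hand-assembled program for a+b+ (hypothetical bytecode).
PROG = [
    ("char", "a"),      # 0
    ("split", 0, 2),    # 1: loop back for more a's, or move on
    ("char", "b"),      # 2
    ("split", 2, 4),    # 3: loop back for more b's, or move on
    ("match",),         # 4
]

def add_thread(prog, pc, threads, seen):
    """Follow jmp/split without consuming input; keep only char/match positions."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, op[1], threads, seen)
    elif op[0] == "split":
        add_thread(prog, op[1], threads, seen)
        add_thread(prog, op[2], threads, seen)
    else:
        threads.append(pc)

def simulate(prog, text):
    """True if the whole text is accepted; every live state is advanced per character."""
    current, seen = [], set()
    add_thread(prog, 0, current, seen)
    for ch in text:
        following, seen = [], set()
        for pc in current:                      # the per-character loop over states
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, pc + 1, following, seen)
        current = following
        if not current:
            return False
    return any(prog[pc][0] == "match" for pc in current)

print(simulate(PROG, "aabbb"))   # True
print(simulate(PROG, "ba"))      # False
```

The work per input character is bounded by the program size, so the total is linear in the input, but the inner loop is exactly the overhead described above.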
Anonymous No.105879169
>>105874433
>Regex puzzles
this actually sounds like fun desu
Anonymous No.105879262
>>105878039
>Surely it would be better to default to the NFA engine until needing a DFA feature?
No, it's stupid and needlessly complex. Russ Cox only argued for it in his paper because he accidentally destroyed his own central argument against backtracking regex.
Anonymous No.105879388
>>105878442
>To me the important thing is that it's possible to write regexes in such a way that you make the runtime linear.
*that you can make the runtime linear.
Anonymous No.105879461 >>105879563
Today I think I'll try implementing regex in C, or maybe Zig.
Anonymous No.105879563 >>105879621
>>105879461
Nice, you can do it anon. Backtracking or the other kind?
Anonymous No.105879621 >>105880230 >>105891476
>>105879563
I'll need to learn the fundamental difference between the two first, but I'm just going to start with the easier stuff (literals, character classes, quantifiers) and build up from there.
Anonymous No.105879673
>>105873324 (OP)
REGEX IS NOT CODING REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Anonymous No.105880230 >>105880802
>>105879621
I see. Well, essentially, for both of them, first you compile the regex string to the bytecode that your regex engine (the switch case in the match function) will recognize. Using the AST directly seems like a good idea but isn't; it makes some things a lot more tricky to implement. The same goes for interpreters for general-purpose languages.

Then the match function takes as argument the input string, optionally the start position for matching, and the regex bytecode.
For the backtracking regex engine, the state you have is the pointer into the input string, the PC (the pointer into the bytecode array) and a stack used for backtracking. It executes bytecode instructions until it runs into an "accept" instruction, it tries to backtrack but the backtrack stack is empty, or (optionally) it runs into a "reject" instruction. For each assertion, you try to match it against the current character; if the bounds check or the assertion fails you backtrack, and if it matches you increment the PC and, if need be, the input string pointer. When encountering the start of an alternation or quantifier, you need to push the current input string pointer and PC on the stack. When backtracking you pop those to restore the state and resume execution.
The backtrack stack is where everything happens. Quantifiers are more tricky than alternations and require more state, but you can imagine that you are implementing them using recursive functions and think about how the stack would be affected.
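The loop described above can be sketched roughly like this in Python. It assumes a made-up four-instruction bytecode (char, split, jmp, match) with hand-assembled programs, and it pushes the PC of the untried branch together with the input position, which is one of several possible ways to lay out the stack. There is no protection against loops that match the empty string.

```python
# Hand-assembled programs (hypothetical bytecode).
A_PLUS_B_PLUS = [          # a+b+
    ("char", "a"),         # 0
    ("split", 0, 2),       # 1: prefer another a, else continue
    ("char", "b"),         # 2
    ("split", 2, 4),       # 3: prefer another b, else continue
    ("match",),            # 4
]
A_B_OR_C_D = [             # a(b|c)d
    ("char", "a"),         # 0
    ("split", 2, 4),       # 1: try b first, c is the fallback
    ("char", "b"),         # 2
    ("jmp", 5),            # 3
    ("char", "c"),         # 4
    ("char", "d"),         # 5
    ("match",),            # 6
]

def backtrack_match(prog, text):
    """Match at the start of text; return the end position of the match, or None."""
    stack = [(0, 0)]                      # saved (pc, input position) pairs to retry
    while stack:
        pc, sp = stack.pop()              # restore a saved state
        while True:
            op = prog[pc]
            if op[0] == "char":
                if sp < len(text) and text[sp] == op[1]:
                    pc, sp = pc + 1, sp + 1
                else:
                    break                 # bounds check or assertion failed: backtrack
            elif op[0] == "jmp":
                pc = op[1]
            elif op[0] == "split":
                stack.append((op[2], sp)) # remember the other branch
                pc = op[1]                # and try the preferred one first
            else:                         # "match": the accept instruction
                return sp
    return None                           # backtrack stack is empty: no match

print(backtrack_match(A_PLUS_B_PLUS, "aabbbx"))   # 5
print(backtrack_match(A_B_OR_C_D, "acd"))         # 3
```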
Anonymous No.105880802
>>105880230
Thanks, I wasn't sure where to begin and didn't realise that it would involve bytecode.
Anonymous No.105882701
>>105873392
What do you mean by genius, it's just a DFA interface. Try writing a lexer without regex and whatever you end up using to represent state and state transitions will effectively be a primitive version of regex.
Anonymous No.105882865 >>105882975
>>105877360
But if they can write in binary wouldn't that make them more skilled than someone who only knows RegEx?
Anonymous No.105882973 >>105883115 >>105888856 >>105893043
>regex is unreadable
raku solved this problem
grammar URL
{
regex TOP { ? ? ? ? }
regex SchemeW { }
regex SchemeS { ':' }
regex Scheme { <[a..z]><[a..z 0..9 + . : \-]>* }
regex Hostinfo { '//' ? ? }
regex UserinfoW { }
regex Userinfo { .*[\:.+]? }
regex UserinfoS { '@' }
regex Host { <[\w \. \-]>* }
regex PortW { }
regex PortS { ':' }
regex Port { \d+ }
regex Path { '/'? <[\w \d -] - [#?]>+ }
regex QueryW { }
regex QueryS { '?' }
regex Query { <[\w \d \- =]>* }
regex FragmentW { }
regex FragmentS { '#' }
regex Fragment { .+ }
}
Anonymous No.105882975 >>105883001
>>105882865
except that by definition they don't regex
Anonymous No.105883001
>>105882975
*except that by definition they don't know regex
Anonymous No.105883115
>>105882973
>>regex is unreadable
>raku solved this problem
the syntax is alright, but the semantics and the implementation are both fucked
Anonymous No.105883207
>>105873324 (OP)
>tfw no one ITT realized that OP put a frog in the pic because REGEX is supposed to resemble the calls made by frogs
never change, /g/
Anonymous No.105883240 >>105883999 >>105884417
>>105874474
The syntaxes are similar but not the same.

I believe it lacks grouping call backs.

Vim also uses \< \> for word delineation, as opposed to \b

Hey /g/ what does this do?

sed -E 's/^\s+//;s/.+/\L\0/;s/(\W|^)(ht|sm|f)tps?:\/\///;s/(\W|^)www.//;s/\/.*//g;s/.+\W((\w+.)+\w+)\W.+/\1/'
Anonymous No.105883999 >>105888422
>>105883240
Looks like some url link cleaner.
You can break it down and study..
sed -E ' s/^\s+//;
s/.+/\L\0/;
s/(\W|^)(ht|sm|f)tps?:\/\///;
s/(\W|^)www.//;
s/\/.*//g;
s/.+\W((\w+.)+\w+)\W.+/\1/'
Anonymous No.105884417 >>105888422 >>105888422
>>105883240
You should change www.//; to www\.//;
Anonymous No.105885503
I like regex threads but I feel like everything has already been said
Anonymous No.105887393
>trying to make an autohotkey script, but for some reason I can't get the result I need by combining SubStr(), InStr(); tested the logic and it works in MS Excel with MID() and FIND/SEARCH() and it's just not working
>fuck it, I'll just use RegExMatch() even if it's less performant
>"(\S+)" between delimiters works immediately
God I love regex.
Anonymous No.105888422
>>105884417
I meant to respond to this earlier, but was not able to.

You are right. This mistake was actually in several places including in the last portion. I am aware that dot is wildcard, but the way I copied it deleted several escapes.

>>105883999
>>105884417
Nice job btw!
Anonymous No.105888692 >>105888818
>>105875686
Then it doesn't do what I want, which is to mimic OP >>105873563
Anonymous No.105888729
>>105873392
The concept is fine and useful, it's just the bloated standard groups filling the implementations with trash beyond their reasonable use case. It's like adding built-in wheels to lego blocks. You're supposed to build more scalable solutions, not hacks like look-ahead and capture groups.
Anonymous No.105888818 >>105888850 >>105889416
>>105888692
There are script languages with better syntax than bat
Anonymous No.105888850
>>105888818
>ruby
>using a semicolon
fo shizzle, my nizzle
Anonymous No.105888856 >>105889047
>>105882973
RakuGODS haven't stopped winning since its inception
Anonymous No.105889047
>>105888856
Raku has always sucked balls
Anonymous No.105889104 >>105889298
$ perl -E 'join(" ", ("REGEX") x 10) =~ /\A ((?:REGEX\s*+)+?) (?{ say $1 }) (?!)/x'
REGEX
REGEX REGEX
REGEX REGEX REGEX
REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX REGEX
Anonymous No.105889298 >>105889895
>>105889104
Noise, i must go back to python
temp = ""
for n in range(10):
    print (">","regex " * n)
    temp += ("regex " * n + "\n")
print (temp.replace("regex", "Python ")
Anonymous No.105889374 >>105889640
>python print 1 to 10
>prints 1 to 9, doesn't print for 0
Yikes, if you do go back, bring an extra ) with you.
Anonymous No.105889416 >>105889656
>>105888818
I legit forgot Ruby still exists.
Anonymous No.105889640 >>105889647
>>105889374
Python's range() starts at 0 by default.
A string multiplied by 0 is the empty string.

range(1, 10) starts at 1 and yields 9 values
range(5, 10) starts at 5 and yields 5 values
range(100, 10) yields no values
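A minimal sketch of a loop that reproduces OP's ten lines, assuming range(1, 11) and joining with single spaces; triangle is a made-up helper name.

```python
def triangle(word="REGEX", rows=10):
    """Return the greentext lines: one word on the first line, `rows` words on the last."""
    return [">" + " ".join([word] * n) for n in range(1, rows + 1)]

for line in triangle():
    print(line)
```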
Anonymous No.105889647
>>105889640
>doesn't print for 0
Thanks, anon.
Anonymous No.105889656
>>105889416
webshits raped it and now the language is as dead as lisp outside of rails
Anonymous No.105889774
>>105873516
Exactly. You'd have to be a complete retard to write regex from scratch. Even before AI I've used regex generators.
Anonymous No.105889895 >>105891111
>>105889298
admit it, perl is cooler than python
Anonymous No.105889965 >>105890196
>>105873392
I use it a lot on Excel, well to a lesser extent now that CTRL+E exists, but it's still super useful.
Anonymous No.105890196 >>105890254
>>105889965
Since when does Excel have regex?
Or wait are you talking about the Python library stuff they added?
Anonymous No.105890254
>>105890196
>Since when does Excel have regex?
2024 or so
Anonymous No.105890316
>>105873392
Yes, it's like magnets, downloading mp3 files or acid. Impossible to understand!
Anonymous No.105891111 >>105891145 >>105891475
>>105889895
perl looks like an absolute shit, no wonder it's dead
Anonymous No.105891145
>>105891111
quads of absolute truth

perl is horrific, I've written exactly one script in it and then never again
Anonymous No.105891174
>>105873392
you can learn 99% of problem solving regex in a few minutes
Anonymous No.105891475 >>105892384
>>105891111
you're judging it on the wrong things, it's the semantics that matters
I know it's not suited to make any serious program but I think you can learn a few things programming-wise from it, in particular from the regex engine. take the little piece of code I've written for example, you can't really express it in any other language and it's a damn shame. it's like a completely distinct programming paradigm that doesn't exist anywhere else (except in Prolog) but you can feel the expressive power it has
Anonymous No.105891476
>>105879621
Implement NFA first and then write the algorithm to transform it to DFA.
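A rough sketch of that second step (the subset construction) in Python, assuming the NFA is given as a dict of dicts with "" standing for epsilon edges. The example NFA for (a|b)*abb is built by hand, and all names here are made up.

```python
from collections import deque

def epsilon_closure(nfa, states):
    """All states reachable from `states` using only epsilon ("") edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa, queue = {}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in alphabet:
            moved = {t for s in current for t in nfa.get(s, {}).get(sym, ())}
            if moved:
                target = epsilon_closure(nfa, moved)
                dfa[current][sym] = target
                queue.append(target)
    dfa_accepting = {state for state in dfa if state & accepting}
    return dfa, start_set, dfa_accepting

def dfa_accepts(dfa, start, accepting, text):
    """Run the DFA: exactly one current state, one table lookup per character."""
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting

# Hand-built NFA for (a|b)*abb: state 0 loops on a/b and guesses where "abb" starts.
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
DFA, START, ACCEPT = nfa_to_dfa(NFA, 0, {3}, "ab")

print(len(DFA))                                  # 4
print(dfa_accepts(DFA, START, ACCEPT, "babb"))   # True
```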
Anonymous No.105892263
>>105878439
>packet filtering
This is the smoking gun. This is what I do at my job most of the time, so it seems I've got a hammer view of a screw. I was the gorilla all along, I'm not the one who designed the system, just the wagie who uses it. I've got much to learn.
Anonymous No.105892384
>>105891475
What, are you trolling?
Perl is not only suitable, but would in many cases do better than other languages.

Perl Compatible Regular Expressions (PCRE) has been implemented in a variety of languages.
Anonymous No.105893043
>>105882973
seems quite incomplete -- there's no support for %HH?

here's a regex per rfc verbatim: https://github.com/zhong-j-yu/rekex/blob/main/rekex-example/src/main/java/org/rekex/exmple/regexp/ExampleRegExp_Uri.java
Anonymous No.105893360 >>105897913
>>105873392
just like everything the only people that need to google regex are the ones that use it once a decade. i use it weekly and have no problems
Anonymous No.105893513 >>105893819
>>105873324 (OP)
Hi I do regex at work on bi-daily basis.
It's not that bad desu.
At first it's hard to remember, but you get used to it.
Anonymous No.105893819 >>105897942
>>105873324 (OP)
>>105893513
Is there a use case for learning regex besides impressing the nerds at IT?
Anonymous No.105895730
>>105874047
Look into Intel hyperscan/chimera, it'll solve your throughput regex problems
Anonymous No.105897913
>>105893360
Wow, what are the odds that one of only 5 people who don't need to google regex are here on /g/.
Anonymous No.105897942 >>105898996
>>105893819
syntax highlighting
Anonymous No.105898996
>>105897942
I don't rely on it, and I have never seen a perfect implementation of it, I do enjoy the reduced strain it allows.
Anonymous No.105899834 >>105899875
>they didn't take computational theory class
you guys need to get educated about the shit you spout opinions about
Anonymous No.105899869 >>105899934
>>105873348
regrex
regretx

You got filtered (pun intended)
Anonymous No.105899875
>>105899834
Thanks for the advice, stranger.
Anonymous No.105899900
>>105873324 (OP)
(>REGEX (REGEX )*\n)* || >REGEX
Empty string not included!
Anonymous No.105899934
>>105899869