Thread 106899677

31 posts 24 images /g/
Anonymous No.106899677 [Report] >>106899788 >>106899886 >>106900276 >>106900573 >>106902081 >>106902485 >>106903773 >>106903805 >>106903815
Leftpad considered complicated
The absolute state of computer science.

https://lukeplant.me.uk/blog/posts/breaking-provably-correct-leftpad/
Anonymous No.106899788 [Report] >>106899832 >>106903887
>>106899677 (OP)
kekd
otherwise, what does that have to do with computer science?
it's just bad implementations
Anonymous No.106899832 [Report]
>>106899788
>He doesn't code in hieroglyphs
ngmi
Anonymous No.106899886 [Report] >>106899960 >>106903424
>>106899677 (OP)
>java fails
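in java it's just:
private static String LeftPad(String topad, char pad) {
    // pads to a fixed width of 10; clamp at 0 so longer strings don't
    // trigger a NegativeArraySizeException
    int charstoadd = Math.max(0, 10 - topad.length());
    char[] padding = new char[charstoadd];
    Arrays.fill(padding, pad); // needs: import java.util.Arrays;
    return new String(padding) + topad;
}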
Anonymous No.106899911 [Report]
>Entry 6 is not a mistake, by the way, it just does “e acute” in a different way to entry 5. Nothing to see here, move along…
Since I know nobody will read the article to find out why 'résumé' is in there twice.
(not OP desu)
Anonymous No.106899916 [Report] >>106899960
>Human writing is incredibly complicated and things cannot always be easily divided into characters
>This is somehow the fault of computer science
The solution is to ignore Unicode and do everything in ASCII. Why waste effort supporting people who don't speak English?
Anonymous No.106899960 [Report] >>106899980
>>106899886
This was the implementation used. It even has obnoxiously pedantic over-commenting so I assume this is the official implementation

https://github.com/hwayne/lets-prove-leftpad/blob/ea9c0f09a2d3e981d82118497c307844fc7b1f49/java/LeftPad.java#L7

public class LeftPad {
    //@ requires n >= 0;
    //@ requires s != null;
    //@ ensures \result.length == Math.max(n, s.length);
    //@ ensures \forall int i; i >= 0 && i < Math.max(n - s.length, 0); \result[i] == c;
    //@ ensures \forall int i; i >= 0 && i < s.length; \result[Math.max(n - s.length, 0) + i] == s[i];
    static char[] leftPad(char c, int n, char[] s) {
        int pad = Math.max(n - s.length, 0);
        char[] v = new char[pad + s.length];
        int i = 0;

        //@ maintaining i >= 0 && i <= pad;
        //@ maintaining \forall int j; j >= 0 && j < i; v[j] == c;
        for(; i<pad; i++) v[i] = c;

        //@ maintaining i >= pad;
        //@ maintaining \forall int j; j >= 0 && j < pad; v[j] == c;
        //@ maintaining \forall int j; j >= pad && j < i; v[j] == s[j - pad];
        for(i = pad; i < v.length; i++) v[i] = s[i - pad];

        return v;
    }
}
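
For comparison, the three `ensures` clauses above can be turned into plain runtime assertions. This is only a sketch in Python, not the repo's code: `left_pad` and `check_spec` are made-up names, and lengths are counted in code points rather than Java chars.

```python
def left_pad(c: str, n: int, s: str) -> str:
    """Pad s on the left with c up to length n, counting code points."""
    pad = max(n - len(s), 0)
    return c * pad + s


def check_spec(c: str, n: int, s: str) -> None:
    """Runtime version of the three `ensures` clauses (hypothetical helper)."""
    result = left_pad(c, n, s)
    pad = max(n - len(s), 0)
    # ensures \result.length == Math.max(n, s.length)
    assert len(result) == max(n, len(s))
    # ensures the first `pad` positions all hold the pad character
    assert all(result[i] == c for i in range(pad))
    # ensures the original string follows the padding unchanged
    assert all(result[pad + i] == s[i] for i in range(len(s)))


check_spec("-", 10, "abc")
check_spec("-", 2, "abc")
check_spec("-", 10, "re\u0301sume\u0301")  # decomposed form: 8 code points
```

All three checks pass, including the decomposed 'résumé'. The spec only talks about element counts, so it is satisfied even when the result does not line up on screen.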


>>106899916
The number of characters in a string is 'complicated' and 'nuanced' in unicode? Too tough for computer scientists and rust programmers to crack?
Anonymous No.106899980 [Report]
>>106899960
>notes down the real test data
>zero width chars and UTF codepoints
"working as intended"
Anonymous No.106900149 [Report] >>106900214
made it pass the UTF codepoints, as i've dealt with them in java before with minecraft chat color plugin bs, still fails the zero width test cuz that would end up being a giant switch statement of every zero width char
>will "fail" large chars such as U+FDFD
Anonymous No.106900214 [Report]
>>106900149
what if the pad character is whacky unicode, like just a combining mark or some supersize double byte thing
Anonymous No.106900276 [Report]
>>106899677 (OP)
>Rust
>I vibe-coded some shit ass implementation
>because Rust doesn't provide such a function for the exact reasons of my article
>This is somehow Rust's fault
Anonymous No.106900573 [Report] >>106903469
>>106899677 (OP)
>his toy language needs to download 500 packages to determine the width of a string
kek
leftpad() {
    local i
    printf '\e[?1049h%s\e[6n' "$1"
    IFS=';' read -d R _ i
    printf '\e[?1049l'
    for ((; i <= $2; i++)); do
        printf -- -
    done
    printf '%s\n' "$1"
}

leftpad 𝄞 10
leftpad Å 10
leftpad 10
leftpad אֳֽ֑ 10
leftpad résumé 10
leftpad résumé 10
Anonymous No.106902081 [Report] >>106903516
>>106899677 (OP)
>swift just destroys the competition
based apple
Anonymous No.106902485 [Report] >>106903514
>>106899677 (OP)
nigga in python this is just

>" " * n + string
Anonymous No.106903424 [Report]
>>106899886
that will fail with surrogate pairs, as a java char can only hold a single UTF-16 code unit.

that's why in his article java succeeds with 'résumé': é is in the basic multilingual plane (BMP), while codepoints outside it require a surrogate pair to encode (making them two chars)
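
A quick way to see the BMP vs. surrogate difference without a JVM, sketched in Python. `utf16_units` is a made-up helper that counts UTF-16 code units, which is the quantity Java's `String.length()` reports.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units (each unit is 2 bytes in utf-16-le)."""
    return len(s.encode("utf-16-le")) // 2


# U+00E9 (é) is inside the BMP; U+1D11E (𝄞) is outside it.
for text in ["\u00e9", "\U0001d11e"]:
    print(repr(text), "code points:", len(text), "UTF-16 units:", utf16_units(text))
```

The é comes out as one unit, the clef as two, so any padding based on a count of chars is off by one for every character outside the BMP.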
Anonymous No.106903469 [Report] >>106904373
>>106900573
toss in a few wide characters, some wide grapheme cluster or some zero-width codepoint and see how that works
Anonymous No.106903510 [Report]
Usecase for properly left padding stuff?
Anonymous No.106903514 [Report]
>>106902485
Dumbass.
>>> "abc".rjust(10, '-')
'-------abc'
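
Worth noting that `rjust` counts code points, so the two spellings of 'résumé' from the article pad differently. A small sketch; normalizing to NFC first is one possible workaround and only helps where a precomposed form exists.

```python
import unicodedata

composed = "r\u00e9sum\u00e9"      # é as a single code point each
decomposed = "re\u0301sume\u0301"  # e followed by U+0301 combining acute

print(composed.rjust(10, "-"))    # 4 dashes
print(decomposed.rjust(10, "-"))  # 2 dashes, because rjust sees 8 code points

# NFC folds e + U+0301 back into the precomposed é
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)
print(normalized.rjust(10, "-"))
```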
Anonymous No.106903516 [Report] >>106903607
>>106902081
if you read the whole article, you'll find that it has a cost. swift string handling is done not on a per-byte basis, nor on a per-codepoint basis, but on a per-grapheme basis. this assumes that whatever application consumes the output of your program will also perform that same handling of grapheme clusters. that has measurable costs because building grapheme clusters is way more complicated than simply iterating over codepoints.
while this is what should be done for proper unicode support, this isn't always done by all programs.
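
To make the byte / codepoint / grapheme distinction concrete, here is a rough Python sketch. `approx_clusters` is a made-up helper and only an approximation: it treats every non-combining code point as the start of a cluster and ignores the rest of the real segmentation rules (ZWJ sequences, Hangul jamo, flags and so on).

```python
import unicodedata


def approx_clusters(s: str) -> int:
    """Rough grapheme count: every non-combining code point starts a cluster."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


text = "re\u0301sume\u0301"  # decomposed 'résumé'
print("utf-8 bytes:    ", len(text.encode("utf-8")))
print("code points:    ", len(text))
print("approx clusters:", approx_clusters(text))
```

Three different answers for the same six visible letters, and only the last one needs a lookup in the Unicode tables for every character.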

unicode doesn't tell you how your grapheme will end up being displayed. Some applications have bogus unicode handling and will display non-zero width for stray invisible characters, e.g. U+200D. Web browsers are notable for doing that.
some applications reserve larger width for CJK ideograms.
unicode even has private-use ('user-defined') ranges where you can put glyphs of undefined sizes that have to be specified in the font used by the underlying application.

if he had run his swift program with `现` it would have likely failed due to the fact that swift doesn't control how his terminal will render that character, and therefore can't know its visual width in the end.
Anonymous No.106903607 [Report] >>106903680 >>106903880
>>106903516
you're not wrong about the first part, but the padding has nothing to do with rendered width and instead with character count in order to know how many padding characters you need to add to get to the desired size. see pic rel
Anonymous No.106903680 [Report] >>106904115
>>106903607
try padding with a wide character and see for yourself. swift doesn't handle wide characters in the same way as your terminal does. in fact wide characters have nothing to do with the unicode specs and are terminal specific features
Anonymous No.106903773 [Report]
>>106899677 (OP)
I say he doesn't go far enough. The ~5 "tiers" of correctness here would be:
>0: basic ascii
>1: BMP
>2: codepoint
>3: wide characters
>4: combining characters
>5: zwj
his tests skip 3, include 4, and don't make it to 5. the order to do 3 and 4 in is debatable: 3 gets you correct behaviour for east asian languages (and most emoji), 4 gets you semitic languages i guess (and european languages whenever precombined characters aren't used (rarely)). regardless, wcwidth is enough to get you every tier but 5, which should handle everything except the latest emojis correctly. Anything beyond that is really impossible in practice, because it's unlikely that any given terminal will support it correctly, and even if one does, it likely won't display it the same as any other terminal. Hence your output would at best work in a single environment and fail everywhere else.
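
A minimal sketch of what tiers 3 and 4 could look like in Python, using only `unicodedata` as a stand-in for a real wcwidth. `cell_width`, `display_width` and `left_pad_cells` are made-up names, the width rules are a simplification, and tier 5 (ZWJ sequences) is deliberately not handled.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Rough terminal cell width of one code point (simplified wcwidth)."""
    if unicodedata.combining(ch) != 0:
        return 0  # tier 4: combining marks stack on the previous character
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # other zero-width marks and format chars such as U+200D
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # tier 3: wide and fullwidth characters
    return 1


def display_width(s: str) -> int:
    """Sum of per-code-point widths; ZWJ sequences are NOT merged."""
    return sum(cell_width(ch) for ch in s)


def left_pad_cells(s: str, n: int, pad: str = "-") -> str:
    """Left-pad s to n terminal cells; pad is assumed to be one narrow char."""
    return pad * max(n - display_width(s), 0) + s


for text in ["abc", "\u73b0", "re\u0301sume\u0301"]:
    print(left_pad_cells(text, 10))
```

Whether the output actually lines up still depends on the terminal and font agreeing with these widths, which is the point made above.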
Anonymous No.106903805 [Report] >>106903898
>>106899677 (OP)
TL;DR: he had ChatGPT tell him how to do the Rust string conversion.
>As I know nothing about Rust, I got ChatGPT to tell me how to convert from a string to that. It gave me two options, I picked the one that looked simpler and less <<angry>>. I didn’t deliberately pick the one which made Rust look even worse than all the others, out of peevish resentment for every time someone has rewritten some Python code (my go-to language) in Rust and made it a million times faster – that’s a ridiculous suggestion.
KEK
Anonymous No.106903815 [Report]
>>106899677 (OP)
>Rust does also have an easily accessible chars method/concept, which corresponds to a Unicode code point. I didn’t use this above - Rust would have behaved the same as Haskell/Lean if I had.
Anonymous No.106903880 [Report] >>106903895
>>106903607
That poster was unaware of things like proportionally spaced fonts, kerning, negative space, and ligatures.
Anonymous No.106903887 [Report] >>106904152
>>106899788
Anonymous No.106903895 [Report]
>>106903880
fuck off nocoder
Anonymous No.106903898 [Report]
>>106903805
Most rust code was written with ChatGPT and/or c2rust transliteration.
Anonymous No.106904115 [Report]
>>106903680
the pad has nothing to do with the rendered width, man
Anonymous No.106904152 [Report]
>>106903887
>python for under 18
lel. accurate.
Anonymous No.106904373 [Report]
>>106903469
like this? lmao