Thread 106899677

31 posts 24 images /g/

Anonymous 10/15/2025, 8:35:49 PM No.106899677 [Report] >>106899788 >>106899886 >>106900276 >>106900573 >>106902081 >>106902485 >>106903773 >>106903805 >>106903815

Leftpad considered complicated

leftpad_implementations.png md5: 53d4c6aa...

The absolute state of computer science.

https://lukeplant.me.uk/blog/posts/breaking-provably-correct-leftpad/

Anonymous 10/15/2025, 8:48:29 PM No.106899788 [Report] >>106899832 >>106903887

Screenshot from 2025-10-15 20-45-36.png md5: ea73c398...

>>106899677 (OP)
kekd
otherwise, what does that have to do with computer sciences?
its just bad implementations

Anonymous 10/15/2025, 8:53:13 PM No.106899832 [Report]

>>106899788
>He doesn't code in hieroglyphs
ngmi

Anonymous 10/15/2025, 8:58:58 PM No.106899886 [Report] >>106899960 >>106903424

>>106899677 (OP)
>java fails
in java its just:
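private static String LeftPad(String topad, char pad) {
    // pads to a fixed width of 10; clamp at 0 so inputs longer than 10 don't throw NegativeArraySizeException
    int charstoadd = Math.max(0, 10 - topad.length());
    char[] padding = new char[charstoadd];
    Arrays.fill(padding, pad); // needs import java.util.Arrays
    return new String(padding) + topad;
}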

Anonymous 10/15/2025, 9:02:04 PM No.106899911 [Report]

>Entry 6 is not a mistake, by the way, it just does “e acute” in a different way to entry 5. Nothing to see here, move along…
Since I know nobody will read the article to find out why 'résumé' is in there twice.
(not OP desu)
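
A minimal Python sketch of why 'résumé' can appear twice, assuming the two entries differ only in Unicode normalization form (precomposed é versus e followed by a combining acute accent). The variable names are made up for illustration; `unicodedata.normalize` and `str.rjust` are standard library.

```python
import unicodedata

# Build the word from escapes so the form does not depend on how this text was saved.
nfc = unicodedata.normalize("NFC", "r\u00e9sum\u00e9")  # é as one code point (U+00E9)
nfd = unicodedata.normalize("NFD", "r\u00e9sum\u00e9")  # é as e + U+0301 combining acute

print(len(nfc))    # 6
print(len(nfd))    # 8
print(nfc == nfd)  # False, although both render as the same word

# rjust counts code points, so the two forms get different amounts of padding.
print(nfc.rjust(10, "-"))
print(nfd.rjust(10, "-"))
```

Both strings look identical on screen, but the second one receives two fewer pad characters.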

Anonymous 10/15/2025, 9:03:37 PM No.106899916 [Report] >>106899960

>Human writing is incredibly complicated and things cannot always be easily divided into characters
>This is somehow the fault of computer science
The solution is to ignore Unicode and do everything in ASCII. Why waste effort supporting people who don't speak English?

Anonymous 10/15/2025, 9:08:03 PM No.106899960 [Report] >>106899980

Duke_(Java_mascot)_waving.svg.png md5: 217421aa...

>>106899886
This was the implementation used. It even has obnoxiously pedantic over-commenting so I assume this is the official implementation

https://github.com/hwayne/lets-prove-leftpad/blob/ea9c0f09a2d3e981d82118497c307844fc7b1f49/java/LeftPad.java#L7

public class LeftPad {
    //@ requires n >= 0;
    //@ requires s != null;
    //@ ensures \result.length == Math.max(n, s.length);
    //@ ensures \forall int i; i >= 0 && i < Math.max(n - s.length, 0); \result[i] == c;
    //@ ensures \forall int i; i >= 0 && i < s.length; \result[Math.max(n - s.length, 0) + i] == s[i];
    static char[] leftPad(char c, int n, char[] s) {
        int pad = Math.max(n - s.length, 0);
        char[] v = new char[pad + s.length];
        int i = 0;

        //@ maintaining i >= 0 && i <= pad;
        //@ maintaining \forall int j; j >= 0 && j < i; v[j] == c;
        for(; i<pad; i++) v[i] = c;

        //@ maintaining i >= pad;
        //@ maintaining \forall int j; j >= 0 && j < pad; v[j] == c;
        //@ maintaining \forall int j; j >= pad && j < i; v[j] == s[j - pad];
        for(i = pad; i < v.length; i++) v[i] = s[i - pad];

        return v;
    }
}
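
The three `ensures` clauses above can be read as runtime checks. Here is a rough Python sketch of that reading; `left_pad` and `check_spec` are hypothetical names, the pad `c` is assumed to be a single code point, and Python strings index by code point rather than by UTF-16 `char` as in the Java version.

```python
def left_pad(c: str, n: int, s: str) -> str:
    """Pad s on the left with c up to length n (length counted in code points)."""
    pad = max(n - len(s), 0)
    return c * pad + s


def check_spec(c: str, n: int, s: str) -> bool:
    """Check the three postconditions from the annotations for one input."""
    result = left_pad(c, n, s)
    pad = max(n - len(s), 0)
    return (
        len(result) == max(n, len(s))                            # total length
        and all(result[i] == c for i in range(pad))              # prefix is all padding
        and all(result[pad + i] == s[i] for i in range(len(s)))  # suffix is the input
    )


print(check_spec("-", 10, "abc"))      # True
print(check_spec("-", 10, "e\u0301"))  # True, even though this draws as a single 'é'
```

The spec is about element counts, so it can hold while the visible result still comes out one column short.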

>>106899916
The number of characters in a string is 'complicated' and 'nuanced' in unicode? Too tough for computer scientists and rust programmers to crack?

Anonymous 10/15/2025, 9:10:02 PM No.106899980 [Report]

lp.png md5: 0ea83420...

>>106899960
>notes down the real test data
>zero width chars and UTF codepoints
"working as intended"

Anonymous 10/15/2025, 9:26:45 PM No.106900149 [Report] >>106900214

lputf.png md5: 5bd82ce2...

made it pass the UTF codepoints, as ive dealt with them in java before with minecraft chat color plugin bs, still fails the zero width test cuz that would end up being a giant switch statement of every zero width char
>will "fail" large chars such as U+FDFD

Anonymous 10/15/2025, 9:32:28 PM No.106900214 [Report]

>>106900149
what if the pad character is whacky unicode, like just a combining mark or some supersize double byte thing
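
A quick sketch of the combining-mark case, assuming padding is done by code point count as `str.rjust` does. U+0301 (combining acute accent) is just an example pad character, and counting characters whose canonical combining class is 0 is only a rough stand-in for what is visible.

```python
import unicodedata

# Pad "abc" to 6 code points using a combining mark as the fill character.
padded = "abc".rjust(6, "\u0301")

# Count only characters that are not combining marks (combining class 0).
visible = sum(1 for ch in padded if unicodedata.combining(ch) == 0)

print(len(padded))  # 6
print(visible)      # 3
```

The length check passes, but the padding adds no columns because the marks stack onto whatever precedes them.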

Anonymous 10/15/2025, 9:38:55 PM No.106900276 [Report]

>>106899677 (OP)
>Rust
>I vibe-coded some shit ass implementation
>because Rust doesn't provide such a function for the exact reasons of my article
>This is somehow Rust's fault

Anonymous 10/15/2025, 10:11:49 PM No.106900573 [Report] >>106903469

Screenshot_20251015_230804.png md5: a85ebd88...

>>106899677 (OP)
>his toy language needs to download 500 packages to determine the width of a string
kek
leftpad() {
    local i
    printf '\e[?1049h%s\e[6n' "$1"
    IFS=';' read -d R _ i
    printf '\e[?1049l'
    for ((; i <= $2; i++)); do
        printf -- -
    done
    printf '%s\n' "$1"
}

leftpad 𝄞 10
leftpad Å 10
leftpad 10
leftpad אֳֽ֑ 10
leftpad résumé 10
leftpad résumé 10

Anonymous 10/16/2025, 12:24:58 AM No.106902081 [Report] >>106903516

>>106899677 (OP)
>swift just destroys the competition
based apple

Anonymous 10/16/2025, 1:16:35 AM No.106902485 [Report] >>106903514

>>106899677 (OP)
nigga in python this is just

>" " * n + string

Anonymous 10/16/2025, 2:47:11 AM No.106903424 [Report]

2025-10-16-02:45:06.png md5: 8f0db641...

>>106899886
that will fail with surrogate pairs, as a java char can only hold a single UTF-16 code unit.

that's why in his article java succeeds with 'résumé': é is in the basic multilingual plane (BMP), while codepoints outside it require a surrogate pair to encode (making them two chars)
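
To make the BMP point concrete, here is a small Python sketch that counts UTF-16 code units, which is what Java's `String.length()` reports. `utf16_units` is a hypothetical helper name.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units needed for s (2 bytes per unit)."""
    return len(s.encode("utf-16-le")) // 2


for text in ("\u00e9", "\U0001D11E"):  # é and 𝄞 (musical symbol G clef)
    print(f"U+{ord(text):04X}", len(text), utf16_units(text))
# U+00E9 1 1
# U+1D11E 1 2
```

Each is one code point, but the clef lies outside the BMP and takes two code units, so a pad computed from a `char` count comes out one short.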

Anonymous 10/16/2025, 2:53:57 AM No.106903469 [Report] >>106904373

>>106900573
toss in a few wide characters, some wide grapheme cluster, or some zero-width codepoint and see how that works

Anonymous 10/16/2025, 3:01:42 AM No.106903510 [Report]

Usecase for properly left padding stuff?

Anonymous 10/16/2025, 3:02:14 AM No.106903514 [Report]

>>106902485
Dumbass.
>>> "abc".rjust(10, '-')
'-------abc'

Anonymous 10/16/2025, 3:02:26 AM No.106903516 [Report] >>106903607

>>106902081
if you read the whole article, you'll find that it has a cost. swift string handling is done not on a byte-basis, nor on a codepoint-basis, but on a grapheme-basis. this implies that whatever application you use the output of your program with will also have to perform that same handling of grapheme clusters. that has measurable costs because building grapheme clusters is way more complicated than simply iterating over codepoints.
while this is what should be done for proper unicode support, this isn't always done by all programs.
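
A rough idea of the extra work grapheme handling takes, sketched in Python. This is a deliberately simplified approximation and not the Unicode segmentation algorithm (UAX #29) that Swift implements: it only glues mark characters and zero-width-joiner sequences onto the preceding code point. `rough_clusters` is a hypothetical name.

```python
import unicodedata

ZWJ = "\u200d"  # zero width joiner


def rough_clusters(s: str) -> list[str]:
    """Group code points into approximate user-perceived characters."""
    clusters: list[str] = []
    for ch in s:
        joins_previous = bool(clusters) and (
            unicodedata.category(ch).startswith("M")  # combining marks
            or ch == ZWJ                              # joiner attaches to the left
            or clusters[-1].endswith(ZWJ)             # and pulls in what follows it
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


decomposed = "re\u0301sume\u0301"        # 'résumé' with combining accents
print(len(decomposed))                   # 8
print(len(rough_clusters(decomposed)))   # 6
```

Even this toy version needs a property lookup per code point plus some state, where counting code points is a plain loop.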

unicode doesn't tell you how your grapheme will end up being displayed. Some applications have bogus unicode handling and will display non-zero width for stray invisible characters, e.g. U+200D. Web browsers are notable for doing that.
some applications reserve larger width for CJK ideograms.
even unicode has 'user-defined' ranges where you can put glyphs of undefined sizes that have to be specified in the font used by the underlying application.

if he had run his swift program with `现` it would have likely failed due to the fact that swift doesn't control how his terminal will render that character, and therefore can't know its visual width at the end.

Anonymous 10/16/2025, 3:21:06 AM No.106903607 [Report] >>106903680 >>106903880

1750193332375717.png md5: 97364d88...

>>106903516
you're not wrong about the first part, but the padding has nothing to do with rendered width and instead with character count in order to know how many padding characters you need to add to get to the desired size. see pic rel

Anonymous 10/16/2025, 3:32:04 AM No.106903680 [Report] >>106904115

>>106903607
try padding with a wide character and see for yourself. swift doesn't handle wide characters in the same way as your terminal does. in fact wide characters have nothing to do with the unicode specs and are terminal specific features

Anonymous 10/16/2025, 3:43:02 AM No.106903773 [Report]

>>106899677 (OP)
I say he doesn't go far enough. The ~5 "tiers" of correctness here would be:
>0: basic ascii
>1: BMP
>2: codepoint
>3: wide characters
>4: combining characters
>5: zwj
his tests skip 3, include 4, and don't make it to 5. the order to do 3 and 4 in is debatable, 3 gets you correct behaviour for east asian languages (and most emoji), 4 gets you semitic languages i guess (and european languages whenever precombined characters aren't used (rarely)). regardless of that wcwidth is enough to get you every tier but 5, which should handle everything except the latest emojis correctly. Anything beyond that is really impossible in practice, because it's unlikely that any given terminal will support it correctly, and even if they do they likely won't display it the same as any other terminal. Hence your output would at best work in a single environment and fail everywhere else.
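
For tiers 3 and 4, a wcwidth-style calculation can be approximated with Python's standard library alone. This is a sketch under stated assumptions, not the real `wcwidth`: it treats East Asian Wide/Fullwidth characters as 2 columns, nonspacing/enclosing marks and format characters as 0, and everything else as 1. `display_width` and `left_pad_columns` are hypothetical names, and the pad character is assumed to be one column wide.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate the number of terminal columns s occupies."""
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format chars (e.g. U+200D): 0 columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def left_pad_columns(s: str, n: int, pad: str = "-") -> str:
    """Pad s on the left until it fills n columns (pad assumed 1 column wide)."""
    return pad * max(n - display_width(s), 0) + s


print(display_width("abc"))            # 3
print(display_width("\u73b0"))         # 2  (现, East Asian Wide)
print(display_width("e\u0301"))        # 1  (e + combining acute)
print(left_pad_columns("\u73b0", 10))  # 8 dashes, then 现
```

As the post says, this stops short of tier 5: ZWJ emoji sequences are counted as the sum of their parts, and how a terminal actually draws them varies.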

Anonymous 10/16/2025, 3:47:40 AM No.106903805 [Report] >>106903898

Screenshot_20251015_214547.png md5: c481e3c9...

>>106899677 (OP)
TLDR; he's a retarded faggot. He had ChatGPT
>As I know nothing about Rust, I got ChatGPT to tell me how to convert from a string to that. It gave me two options, I picked the one that looked simpler and less <<angry>>. I didn’t deliberately pick the one which made Rust look even worse than all the others, out of peevish resentment for every time someone has rewritten some Python code (my go-to language) in Rust and made it a million times faster – that’s a ridiculous suggestion.
KEK

Anonymous 10/16/2025, 3:48:42 AM No.106903815 [Report]

>>106899677 (OP)
>Rust does also have an easily accessible chars method/concept, which corresponds to a Unicode code point. I didn’t use this above - Rust would have behaved the same as Haskell/Lean if I had.

Anonymous 10/16/2025, 3:57:15 AM No.106903880 [Report] >>106903895

>>106903607
That poster was unaware of things like proportionally spaced fonts, kerning, negative space, and ligatures.

Anonymous 10/16/2025, 3:57:52 AM No.106903887 [Report] >>106904152

1759081524913764.jpg md5: 7fc14a4c...