
Thread 106133377

325 posts 156 images /g/
Anonymous No.106133377 [Report] >>106133533 >>106133661 >>106134205
/ldg/ - Local Diffusion General
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>106130699

https://rentry.org/ldg-lazy-getting-started-guide

>UI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://github.com/Wan-Video
2.2 Guide: https://rentry.org/wan22ldgguide
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base/tree/main
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
Anonymous No.106133401 [Report] >>106133777
>no genjam
it's over
Anonymous No.106133410 [Report] >>106133415
why wanvideo video have no preview?
Anonymous No.106133415 [Report]
>>106133410
the org simply wanted us not to have it
Anonymous No.106133428 [Report] >>106133438 >>106133468 >>106133487
I'm completely lost for what's the best and fastest workflows and tricks to use wan 2.2.
Anonymous No.106133438 [Report] >>106133505
>>106133428
just use the all in one and use it like 2.1
Anonymous No.106133468 [Report] >>106133505
>>106133428
for what it's worth i was I2Ving a picture of a woman sitting with her legs spread trying to get her to turn around, and with the rapid AIO workflow I got a dozen gens of her reverting to the starting position or weird body morphing. with kijai's it got it on the first try but it's kinda bright and washed out
Anonymous No.106133487 [Report] >>106133505
>>106133428
just wait for the new light lora for 2.2. for now, t2v gens are meh. only i2v is worth it
Anonymous No.106133492 [Report] >>106133500
Anonymous No.106133500 [Report]
>>106133492
why post a fuck-up? do you need help or something?
Anonymous No.106133503 [Report] >>106133704 >>106134424
>>106133101
>>106133308
Well I got deformed output when I enabled v-param as well (same seed, nothing else changed).
So does this mean scaled v pred loss is mandatory? Or is it conflicting with stuff like Min SNR gamma, pyramid noise, etc?
I am about to test the first hypothesis now but wanted to ask in case it doesn't result in any fix.
Anonymous No.106133505 [Report] >>106133557
>>106133468
>>106133438
>>106133487
Guess I'll try kijai, hopefully it's not completely obtuse with weird nodes.
I'm mostly i2v anyway.
Anonymous No.106133514 [Report] >>106133528 >>106134197
EVERY DAY UNTIL I DIE
Anonymous No.106133517 [Report] >>106133531
Anonymous No.106133528 [Report]
>>106133514
You know vegans are a minority when you see them constantly telling everyone else how veganism is amazing.
Anonymous No.106133531 [Report] >>106133538
>>106133517
we can't help you if you don't type the issues you're having anon. I get you are struggling to get a good output but you need to show us the nodes so we can fix it
Anonymous No.106133532 [Report] >>106134193
Anonymous No.106133533 [Report]
>>106133377 (OP)
checkd
Anonymous No.106133538 [Report]
>>106133531
that's wanschizo. he actually thinks that's good
Anonymous No.106133550 [Report]
Anonymous No.106133557 [Report] >>106133566
>>106133505
if you like i2v, the comfyui wan 2.2 workflow is enough for fun
Anonymous No.106133558 [Report]
why does this hobby attract so many mentally ill schizos?
Anonymous No.106133566 [Report]
>>106133557
Yeah, I'm mostly thrown off by the latest speed-enhancing lora thing, and the fact wan2.2 is actually 2 models.
Anonymous No.106133578 [Report] >>106133631 >>106133704
>torch.OutOfMemoryError: CUDA out of memory.
I'm getting this message on both Easy Scripts and Kohya_ss while trying to train an Illustrious Lora. Any ideas as to what might be causing it? I have 12GB of VRAM and I feel like this error shouldn't be happening.
Anonymous No.106133609 [Report] >>106133761
Anonymous No.106133631 [Report] >>106133686
>>106133578
It could be so many things it makes my head spin just thinking about it.

The first and most likely issue is that you're training at too high a resolution, rank, and/or batch size for your card.
After that it could be anything. Could be your torch version, could be your driver, could be anything.
Anonymous No.106133658 [Report] >>106133664
Man the WanVideoWrapper really doesn't like my 12GB VRAMlet ass. I always get an OOM even with GGUFs and block swap set high.
Anonymous No.106133661 [Report] >>106133672 >>106133912
>>106133377 (OP)
anything that can run on AMD hardware?
Anonymous No.106133664 [Report] >>106133867
>>106133658
>really doesn't like my 12GB VRAMlet ass
I don't like your vramlet ass either.
Anonymous No.106133672 [Report]
>>106133661
I'll get you a box of crayons and you can pretend to gen with us.
Anonymous No.106133686 [Report] >>106133700 >>106133704
>>106133631
For once I am thankful for redditors because I just stumbled on a thread that solved the problem. I had to turn on Gradient Checkpointing and Cache text encoder outputs in Kohya and it started training. I spent almost all day on this trying to learn how to train a lora as well as troubleshooting that fucking OOM error message.
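For anyone hitting the same OOM: a rough sketch of how those two settings map onto kohya sd-scripts flags for an SDXL/Illustrious LoRA run. Script name and paths here are placeholders, not the anon's actual command.

import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "illustrious.safetensors",  # placeholder checkpoint path
    "--network_module", "networks.lora",
    "--gradient_checkpointing",        # recompute activations in the backward pass instead of storing them
    "--cache_text_encoder_outputs",    # encode captions once up front so the text encoders can leave VRAM
]
# subprocess.run(cmd, check=True)  # uncomment to actually launch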
Anonymous No.106133700 [Report]
>>106133686
Yeah that would almost certainly be an issue if those weren't turned on.
Anonymous No.106133704 [Report] >>106134424
>>106133503
Well it is still bad.
I hope it doesn't take too long to trial and error what is causing this...
>>106133578
Batch size? Shouldn't be higher than 2 I think.
LR predicting Prodigy uses extra VRAM.
Should enable Xformers and gradient checkpointing probably.
Also 12gb vramlet. I am also a noob trying to figure this out
>>106133686
Oh well glad you figured it out.
Anonymous No.106133710 [Report] >>106133733 >>106133734 >>106133784 >>106139114
I have put every word ever that describes lip movement and talking into the negative prompt but she still moves her mouth. I got the boobs smaller but it seems talking is not possible to fix with just prompts.
Anonymous No.106133714 [Report]
i will suck off a chinaman if they release a model that can extend videos with no degradation or color shift
Anonymous No.106133733 [Report] >>106133766
>>106133710
but did you do it in chinese?
Anonymous No.106133734 [Report] >>106133766
>>106133710
Did you put closed mouth, muted or any synonym of it? Dunno, never used wan, but have you tried?
Anonymous No.106133743 [Report] >>106133917 >>106134395
3090 bros i didn't get any boost from moving from cuda 12.8 to cuda 12.6...
Anonymous No.106133761 [Report]
>>106133609
Why do you like Asuka? I like Rei because stoic, unemotional women melt my heart. But Asuka's fans make me curious.
Anonymous No.106133766 [Report]
>>106133733
yes I tried chinese translation too
>>106133734
pos: closed mouth, nonverbal, silent, holds her breath
various things like that
neg: talking, speaking, mouth, mouth movement, lips moving, inside mouth, throat, lips, teeth, tongue, open mouth, open smile, mouth animation, moving mouth, gums, screaming, shouting, etc.
The neg does seem to get rid of things like teeth and make the mouth a bit smaller but it never stops a seed that has mouth animation.
Anonymous No.106133777 [Report] >>106133787 >>106133797 >>106133814 >>106134069
GenJam2 is GO.

Album: https://e.pcloud.link/publink/show?code=kZox1EZMxwWS1tRTwhF3jQElno88yLqtv97

>>106133401
Slept in.
Anonymous No.106133780 [Report] >>106133807 >>106134401
>>106133088
made a comparison with new WF, non light is 30 steps total

https://files.catbox.moe/3rxj1k.json
Anonymous No.106133784 [Report]
>>106133710
Did you try shifting the AI's focus of attention to something else?
Anonymous No.106133787 [Report]
>>106133777
nice
Anonymous No.106133797 [Report]
>>106133777
Also welcoming any volunteer collage makers.
Anonymous No.106133799 [Report]
Misato best girl
Anonymous No.106133807 [Report] >>106133834 >>106133848
>>106133780
Am I retard if I say I like the light one better?
Anonymous No.106133809 [Report]
Should I spin for GenJam 3 now or wait for euros to wake up? I'm thinking the latter.
Anonymous No.106133813 [Report] >>106133817 >>106133850
where does Wan2.2 save files?
also this line should be changed in the wan_autoinstall.bat file, this is an old version of triton that didn't work with my PyTorch
Anonymous No.106133814 [Report]
>>106133777
Also by the way you can still submit post-deadline. No hard deadlines on any of these; I'll just add it to the album.
Anonymous No.106133817 [Report] >>106133919
>>106133813
nigga that depends on what ui you're using
Anonymous No.106133821 [Report]
oo oo there should be prizes for winning gen jam like amazon gift cards and GPUs
Anonymous No.106133834 [Report]
>>106133807
no?
Anonymous No.106133848 [Report]
>>106133807
On average quick gen distills will always be worse by their very nature but they can occasionally gen better than normal steps by sheer chance.
So no, you are not retarded.
Anonymous No.106133850 [Report] >>106133863 >>106133919
>>106133813
output folder, or "temp" folder if you're using kijai's workflow (save_output should be switched on or you'll lose your gens)
Anonymous No.106133860 [Report] >>106133864 >>106135272
yt women b like
Anonymous No.106133863 [Report]
>>106133850
The gens in my heart are never lost.
Anonymous No.106133864 [Report]
>>106133860
delete this
Anonymous No.106133867 [Report]
>>106133664
Wait until you get a hold of me.
Anonymous No.106133869 [Report] >>106133877 >>106133890
>most character loras on civitai are 200mbs+
>mine is 40mb
Can someone explain? Did they do like 500 images with like 20 repeats and 5 batches or something?
Anonymous No.106133877 [Report] >>106133887
>>106133869
what rank did you train it at? that is all that matters for size, for smaller / not complicated loras a lower rank is ok
Anonymous No.106133887 [Report] >>106133895
>>106133877
rank?

Sorry I'm new to this.
Anonymous No.106133890 [Report]
>>106133869
Rank dictates the size of the LoRA.
Anonymous No.106133895 [Report] >>106133902
>>106133887
Basically how much of the model's layers you actually trained. For most simple stuff rank 32 / 64 is plenty, some people do 128 or even 256 for small gains in quality
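To make "rank dictates size" concrete, a rough back-of-the-envelope; the module mix and dims below are made up purely for illustration (real SDXL module counts differ), the point is just that size grows linearly with rank.

# Each adapted Linear(d_in, d_out) stores two low-rank matrices, A: (rank, d_in) and
# B: (d_out, rank), so parameter count (and file size) scales linearly with rank.
def lora_megabytes(rank: int, modules: list[tuple[int, int]], bytes_per_param: int = 2) -> float:
    params = sum(rank * (d_in + d_out) for d_in, d_out in modules)
    return params * bytes_per_param / 1e6

hypothetical_modules = [(640, 640)] * 400 + [(1280, 1280)] * 400  # made-up layer mix
for r in (8, 32, 128):
    print(f"rank {r}: ~{lora_megabytes(r, hypothetical_modules):.0f} MB")  # ~25 / ~98 / ~393 MB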
Anonymous No.106133902 [Report] >>106133910 >>106133922
>>106133895
This setting?
Anonymous No.106133910 [Report] >>106133935
>>106133902
yes, 8 should be ok if its just something like a single subject, for styles / concepts / multi concepts you would want more
Anonymous No.106133912 [Report]
>>106133661
Wanvideo on ComfyUI works on a 7900 XTX based on my testing. You need to install ROCm Pytorch on a Linux distro however, and VAE Decode is super slow unless you run tiled decode on the minimum possible tile size.
Anonymous No.106133917 [Report] >>106134395
>>106133743
3090 on 12.9 here & saw that about 12.6, whats a good baseline test people are using?

14b fp8 848x480 81 frame 8 step vids here are 150 seconds (though it depends on the sampler/scheduler)
Anonymous No.106133919 [Report] >>106133932
>>106133850
>>106133817
OK I found it, for some reason it made a ComfyUI/ComfyUI/output folder
Anonymous No.106133922 [Report] >>106133933
>>106133902
Just be aware, bigger =/= better. For most purposes 32 - 64 is more than enough, 128 if you're sure about it. The higher you go, the more likely you are to just deep fry (furk) your model.
Anonymous No.106133923 [Report] >>106133957
i'm trying kijai's 2.2 workflow with both e4m3fn and e5m2 models on the same seed with my prompt of "woman turns around". the outputs are similar but sometimes significantly different. in one case she turns clockwise on one model and counter-clockwise with the other model. in another case the outputs are very similar but with e4m3fn her butt jiggled more. i don't know if one is necessarily better than the other but i'm leaning towards e4m3fn
Anonymous No.106133927 [Report]
still working on the WF, might be better to trade some high noise steps for some low noise ones
Anonymous No.106133932 [Report]
>>106133919
what does your video combine/save video node look like?
Anonymous No.106133933 [Report] >>106133944
>>106133922
not if you adjust alpha accordingly; set alpha to about the same as the rank, up to double it
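For context, the standard LoRA convention (used by kohya-style trainers) is that the trained update gets scaled by alpha / rank, so raising rank with alpha fixed weakens the effective update while raising alpha alongside keeps it comparable. Quick illustration:

def effective_scale(alpha: float, rank: int) -> float:
    # the low-rank update is applied as (alpha / rank) * B @ A
    return alpha / rank

print(effective_scale(16, 16), effective_scale(16, 32), effective_scale(32, 32))  # 1.0 0.5 1.0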
Anonymous No.106133935 [Report]
>>106133910
hmm kk thanks. I'll have to experiment further tomorrow. Trying to make a lora for a character that looked fine in Pony models but looks like an ultra generic chinese doll in Illustrious models. The first lora I made is making a small difference but I need it to be a stronger change. I may also just need to create a better dataset.
Anonymous No.106133938 [Report] >>106133952
Any other 3090 users here worried about the trend of higher cuda versions just being a straight downgrade?
I had to downgrade my cuda for diffusion pipe yesterday because it wouldn't let me train on multiple GPUs on 12.8
Anonymous No.106133944 [Report] >>106133972
>>106133933
Using the alpha to account for the rank means you probably should have just used half the rank in the first place.
Anonymous No.106133952 [Report]
>>106133938
I doubt there's much point in upgrading CUDA versions anymore for a 3090 anyway. The most you could optimize is using SageAttention instead of default attention for inferencing (video gen).
Anonymous No.106133957 [Report] >>106133979
>>106133923
They are different trade-offs in representing 8-bit float precision. In general neither should be strictly "better" than the other.
>i don't know if one is necessarily better than the other but i'm leaning towards e4m3fn
This will typically require a lot of testing to say confidently, but roll with it if you like it.
I don't see why it will matter across many gens, just roll with one.
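For the curious, the split is the whole difference: e4m3fn spends more bits on mantissa (finer precision, narrower range), e5m2 more on exponent (wider range, coarser precision). A quick check, assuming a PyTorch build new enough to have the float8 dtypes:

import torch

for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
    info = torch.finfo(dtype)
    print(dtype, "max:", float(info.max), "eps:", float(info.eps))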
Anonymous No.106133972 [Report] >>106134001
>>106133944
higher rank really is higher quality though if you dont burn it, also faster training
Anonymous No.106133979 [Report]
>>106133957
From what I remember, there's a "e4m3fn fast" option available too for quantization. There's a negligible difference in performance, so I just go with whatever generates the fastest.
Anonymous No.106133994 [Report]
Can we get a TI2V 5B workflow and guide in the new wan rentry OP?
Anonymous No.106134001 [Report]
>>106133972
That is a big if.
Anonymous No.106134002 [Report] >>106134008 >>106134009 >>106134016
use case for krea?
Anonymous No.106134008 [Report]
>>106134002
Schizo collages.
Anonymous No.106134009 [Report]
>>106134002
shilling for bfl but that's about it
Anonymous No.106134016 [Report]
>>106134002
diy stock images
Anonymous No.106134027 [Report]
ok, giving the low noise model 1 starting step without light / with cfg was also helpful, still 12 total steps

https://files.catbox.moe/fmlcrd.json
Anonymous No.106134049 [Report]
new comparison, the gen time is like 10 secs off now but whatever
Anonymous No.106134063 [Report] >>106134081 >>106134166
any tips for making your wan 2.2 gens not so frantic? im trying to make a girl shake her ass but she turns into like a bunny demon that shakes at like 500x the normal speed, basically all the motion is way too frantic and fast ??
Anonymous No.106134069 [Report]
>>106133777
fucking waldorf
Anonymous No.106134081 [Report] >>106134210
>>106134063
just set the fps lower
Anonymous No.106134094 [Report] >>106134103 >>106134125
>chroma v48
when will it finish training?
Anonymous No.106134103 [Report]
>>106134094
Chroma is done. This is it.
Anonymous No.106134125 [Report] >>106134130 >>106134139 >>106134163
>>106134094
v49 & v50 will take months due to 1024x1024 training. it most likely won't be finished until october.
Anonymous No.106134130 [Report]
>>106134125
training is quadratic, so if it's 2x as big it will take 4x as long, so if it was about every 4 days it will be about every 16 days per epoch. But that is if he uses the same amount of compute
Anonymous No.106134139 [Report] >>106134143
>>106134125
why not 2048x2048
Anonymous No.106134143 [Report]
>>106134139
that would take 16x as long / 16x as much compute
Anonymous No.106134149 [Report] >>106134164 >>106134170 >>106134183
This epoch for sure guys!
Anonymous No.106134163 [Report]
>>106134125
Very much doubt it, they will likely just throw more compute at the last two epochs, I would expect Chroma v49 to drop within days, and the final release to be this month
Anonymous No.106134164 [Report]
>>106134149
i mean its good at its res as is, just not as good as wan
Anonymous No.106134166 [Report] >>106134210
>>106134063
Lower lora strength on the high noise model.
Anonymous No.106134170 [Report]
>>106134149
Yeah I am not hopeful at all for chroma lol.
Anonymous No.106134182 [Report] >>106135287
I don't see the hate there, it's amazing for artsy stuff and unlike midjourney it's not censored, it being able to do 1024 x 1024 will make it the best local model in that regard
Anonymous No.106134183 [Report]
>>106134149
With two 1024 resolution epochs left to go, Chroma is already easily the best model behind Wan, and it's faster both to use and to train.

Loras give great results both in realism and artstyles, it's also uncensored meaning you don't have to endlessly fight the model when you want to do NSFW.

In short, it will be THE new community model for general purpose.
Anonymous No.106134188 [Report] >>106134190
they always say it's great but they never post a gen
Anonymous No.106134190 [Report]
>>106134188
here are a few thousand, some of them are mine
https://civitai.com/models/1330309/chroma
Anonymous No.106134193 [Report]
>>106133532
yjk
Anonymous No.106134197 [Report]
>>106133514
Based kek
Anonymous No.106134200 [Report]
its better than midjourney already imo and if you want hardcore smut / copyrighted stuff unlike midjourney it wont stop you
Anonymous No.106134203 [Report] >>106134271
light2xv for wan2.2 when??????
Anonymous No.106134205 [Report] >>106134251
>>106133377 (OP)
>2.2 Guide: https://rentry.org/wan22ldgguide
is it me or are the links to the T2V workflows broken
Anonymous No.106134206 [Report]
five seconds is not enough reeeeeeeeee. chinks get on it
Anonymous No.106134207 [Report]
Anonymous No.106134210 [Report] >>106134220
>>106134081
i cant seem to find how to do that? do i lower the length on the gen? brand new to this and im using a prebuilt runpod that says it comes set to 60fps
>>106134166
ty ill try that
Anonymous No.106134215 [Report]
Anonymous No.106134220 [Report] >>106134228
>>106134210
>do i lower the length on the gen?
no
>im using a prebuilt runpod
wut
Anonymous No.106134221 [Report] >>106135047
Anonymous No.106134228 [Report] >>106134238
>>106134220
i have a shit gpu so im using a cloud gpu service, people can prebuild these pods with settings so they are easy to use, the one im using is called Wan_i2V_60fps
Anonymous No.106134229 [Report]
Anonymous No.106134237 [Report]
Anonymous No.106134238 [Report] >>106134249
>>106134228
that doesn't answer anything
Anonymous No.106134241 [Report]
Anonymous No.106134248 [Report]
Anonymous No.106134249 [Report] >>106134263 >>106134271
>>106134238
i am saying within my comfy workflow i do not see the option to modify my fps
Anonymous No.106134251 [Report]
>>106134205
yeah theyre broken, im mostly doing T2I with the T2V models tho, pretty decent
Anonymous No.106134263 [Report] >>106134289 >>106134308
>>106134249
fps is usually set in the final node which is typically video combine or save video, but how should anyone know what you're talking about if you give no information?
Anonymous No.106134271 [Report] >>106134289
>>106134203
The only thing we know is that the lightx2v team is working on it.

>>106134249
What is your VAE Decode (or WanVideo Decode) node connected to? Post a screenshot.
Anonymous No.106134289 [Report] >>106134323 >>106134338
>>106134263
im sorry ive never used comfy so i am lost, i am trying to edit the fps in the final node now, i didnt think that would help as the preview is still really frantic, but hoping it works!
>>106134271
is this the one?
Anonymous No.106134299 [Report] >>106134309 >>106134311 >>106134425
reminder there are people on youtube getting half a million views just by making basic shit.
Anonymous No.106134302 [Report] >>106134330
Anonymous No.106134308 [Report]
>>106134263
changing the final output fps seemed to help a lot, ty
Anonymous No.106134309 [Report] >>106134548
>>106134299
You can tell they used midjourney because the results actually look good, unlike the shit in this thread.
Anonymous No.106134311 [Report] >>106135788 >>106135799
>>106134299
Late 80's, early 90's anime artstyle is still the best artstyle to date. I can't wait for AI to be good enough that we can go back to it.
Anonymous No.106134323 [Report] >>106134331 >>106134337
>>106134289
I think the biggest problem people have with learning Comfy is that the first workflow(s) they are exposed to is some insane spaghetti with tons of third-party nodes, 90% of which are superfluous since the functionality already exists in base
Anonymous No.106134330 [Report] >>106134412
>>106134302
Nice, reminds me of those old Japanese scroll illustrations
Anonymous No.106134331 [Report]
>>106134323
This is a big factor. I don't know why they do it, but many people who share their workflows have a fetish for making them overly complex and full of obscure nodes that offer zero functionality to the final output. It's like adding lucky charms to their outfit to look more impressive or something.
Anonymous No.106134337 [Report] >>106134367 >>106134375 >>106134376 >>106134384 >>106134424
>>106134323
this is a real workflow that someone was proud of and posted for people to use
Anonymous No.106134338 [Report]
>>106134289
The framerate is set to 24, you want to change that to 16; without seeing the rest of the nodes that handle the interpolation and their values I can't say for certain what the end result will be like.
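Rough sanity check on why higher playback fps looks sped up, assuming the frames came out of Wan at its native ~16 fps with no interpolation in between:

frames = 81
for fps in (16, 24, 60):
    print(f"{fps} fps -> {frames / fps:.1f} s of playback")
# 16 fps -> 5.1 s (intended pace), 24 fps -> 3.4 s (~1.5x too fast), 60 fps -> 1.4 s (frantic)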
Anonymous No.106134367 [Report]
>>106134337
I can smell the mental illness
Anonymous No.106134375 [Report]
>>106134337
well what does it do tho
Anonymous No.106134376 [Report] >>106134378
>>106134337
>pic
That HAS to be a fucking joke.
Anonymous No.106134378 [Report] >>106134405 >>106134427
>>106134376
https://www.reddit.com/r/comfyui/comments/1mg46fi/spaghettification/
Anonymous No.106134384 [Report]
>>106134337
holy nodes
Anonymous No.106134395 [Report] >>106135509
>>106133743
>>106133917
Maybe it has been corrected since then, or maybe it's just impacting slower cards like 3060s.
Testing is easy: literally try any wan, fix the seed, and run it on 12.6, 12.8, 12.9 (is it out?).
You can easily do that by creating a new venv and downloading the pytorch build for each version.
Anonymous No.106134401 [Report]
>>106133780
lightx2v fucks up the rain
Anonymous No.106134402 [Report] >>106134408 >>106134411 >>106134419 >>106135036
so who won genjam 2?
Anonymous No.106134405 [Report]
>>106134378
I wonder how long it took to zoom out/in
Anonymous No.106134408 [Report]
>>106134402
feds won
Anonymous No.106134411 [Report]
>>106134402
me, I won
Anonymous No.106134412 [Report]
>>106134330
Ty
Anonymous No.106134419 [Report]
>>106134402
i have seen no indication thus far that it was a contest
Anonymous No.106134424 [Report]
>>106133503
>>106133704
Well, I just can't get v param working. I am out of trivial ideas.
I guess it is possible that this shit in the sticky https://github.com/derrian-distro/LoRA_Easy_Training_Scripts is bugged (it doesn't seem to be too actively maintained anymore) or the other anon might have misled me but whatever I am going to bed now.
I will figure this out another day...
>>106134337
I regularly waste hours making spergy experiments about shit no one cares about but I will never reach THIS level of autism.
Anonymous No.106134425 [Report]
>>106134299
that sakura looks really good
Anonymous No.106134427 [Report] >>106134458
>>106134378
>2700+ nodes
Anonymous No.106134428 [Report]
Trying to get lightx2v to work is pointless. It fucks up too much. Just wait till they update it.
Anonymous No.106134429 [Report]
jam'd and gen'd
Anonymous No.106134458 [Report] >>106134481
>>106134427
this guy is probably employed to do this
Anonymous No.106134481 [Report] >>106134500
>>106134458
That image has the same legal authority as an unemployment certificate from a previous employer.
Anonymous No.106134500 [Report] >>106134510
>>106134481
it should be a case for why comfy shouldn't be employed
Anonymous No.106134510 [Report] >>106134552
>>106134500
is comfy technically employed?
Anonymous No.106134548 [Report] >>106134684
>>106134309
It doesn't look that good, the mouth animations are pathetic, the characters are otherwise mostly static, and there's a lot of asymmetry and blobbiness in the fine details
Anonymous No.106134552 [Report] >>106134561
>>106134510
Depends on how he set up the company structure, but he owns the company, so it's 'employee' on paper only.
Anonymous No.106134561 [Report]
>>106134552
>he owns the company
technically, his ceo owns the company. he just has more equity than the drooling retards working under him
Anonymous No.106134684 [Report] >>106134703 >>106134705
>>106134548
doesn't matter. that guy got 500k views. if he can make videos like that maybe every 2-4 weeks, he'd be making $100k+ a year. ridiculous
Anonymous No.106134703 [Report] >>106134713 >>106134752
>>106134684
I made one of those harry potter Balenciaga videos. It got like a million views, I got monetized and then like three days later YouTube demonetized me permanently because my content was unoriginal and low effort. Which is true.
Anonymous No.106134705 [Report] >>106136146
>>106134684
>10k subs
>videos are all less than 2 minutes
They aren't making anything because the videos aren't monetized.
Anonymous No.106134713 [Report] >>106134741
>>106134703
Did you at least manage to get any money within those 3 days?
Anonymous No.106134741 [Report]
>>106134713
Yeah like 200 bucks.
Anonymous No.106134752 [Report] >>106134758
>>106134703
>unoriginal and low effort.
it's higher effort than reaction videos but whatever.
Anonymous No.106134758 [Report]
>>106134752
desu, I spent like all day making them because I was using Stable diffusion 2 at the time. It was by no means an easy process. It took like two days per video of non stop genning.
Anonymous No.106134775 [Report] >>106134813 >>106134852 >>106134878 >>106134885
Hear me out. Batch size of 1 when training details.
Thoughts?
Anonymous No.106134780 [Report] >>106134804 >>106134844 >>106135716 >>106135740 >>106135757
>There are people on /sdg/ and /ldg/ that didn't go all in on nvidia stocks when SD1.5 kicked off.

ngmi. Whoring out your digital waifu for coomer bucks is pathetic. Learn to make money with money. If you're spineless and have weak hands, the S&P 500 should protect against inflation at the minimum.
Anonymous No.106134789 [Report]
insufferable prick
Anonymous No.106134804 [Report]
>>106134780
rude
Anonymous No.106134813 [Report] >>106134820
>>106134775
Does it matter?
Anonymous No.106134820 [Report] >>106134828 >>106134860 >>106134931
>>106134813
I genuinely do not know. What does a large batch size look like to a model compared to a batch size of 1?
Anonymous No.106134828 [Report]
>>106134820
the same
Anonymous No.106134844 [Report] >>106134848
>>106134780
I like how the AI can't decide whether to make an axe or sword and constantly switches back and forth between them.
Anonymous No.106134848 [Report]
>>106134844
I don't like that it's a repost and it's probably not the author
Anonymous No.106134852 [Report]
>>106134775

I do details training by separating items from the character and including them in the dataset. My logic is that if I can generate that particular trinket by itself in high res, it should improve details. And that does indeed work upon generation. Dataset is king after all. Optimal and efficient? Who knows. Prove me wrong.
Anonymous No.106134860 [Report]
>>106134820
I imagine it'd highly depend on the thing you're training. You won't ever get consistent results. I just stick to batch size 1.
Anonymous No.106134878 [Report]
>>106134775
I did a bunch of tests back on Flux and SDXL, and yes, batch 1 was overall best quality, both in details and overall capture of concept.

However, it's also much slower, just going from batch 1 to batch 2 is ~25-30% faster depending on resolution / hardware / model, I typically land at batch 4 which is ~40-45% faster for me.

Also you need to increase the LR from what is good on batch 1 when you go to higher batches, else the results will suffer.
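Two common heuristics for that LR bump (rules of thumb to tune from, not the anon's exact numbers):

def scaled_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "linear") -> float:
    # linear scaling is the usual starting point, sqrt scaling is gentler for small runs
    ratio = new_batch / base_batch
    return base_lr * (ratio if rule == "linear" else ratio ** 0.5)

print(scaled_lr(1e-4, 1, 4))           # 4e-04 with linear scaling
print(scaled_lr(1e-4, 1, 4, "sqrt"))   # 2e-04 with sqrt scaling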
Anonymous No.106134880 [Report] >>106134893
What are your best videos you have seen so far /ldg/?
Anonymous No.106134885 [Report]
>>106134775
it won't really matter on most settings I've tried
Anonymous No.106134893 [Report] >>106134915 >>106136301
>>106134880
a meme or porn. pretty much all this tech is good for. throwaway content
Anonymous No.106134910 [Report] >>106134920 >>106134921
Fuck I hate comfy ui

What nodes do i use if I DON'T want to preprocess my depth/pose/canny control net? I already have my maps generated. in voldie's i could just choose preprocessor type none but that's not a thing
Anonymous No.106134915 [Report]
>>106134893
not much you can do with 5 seconds aside from memes/porn
Anonymous No.106134920 [Report] >>106135004
>>106134910
bypass the node
noob
Anonymous No.106134921 [Report] >>106135004 >>106135043
>>106134910
input image and load the preprocessed image retard
Anonymous No.106134931 [Report] >>106134954
>>106134820
The difference is in how it updates its learning: with batch 1, the model is at its 'optimal state' in terms of the images it has learned, since it learns all images sequentially and thus can 'grok' the next image based upon all the other images it has learned.

When you go above batch 1, you are learning several images at the same time, independently, so they get nothing from each other in terms of learning, and the more images you train simultaneously (higher batch) the more this hampers learning quality.

The reason you want higher batches is speed, not quality. I've seen some people argue that the gradients are more normalised when using higher batches, which should help learning, but every single test I've done and have seen others do shows that batch 1 gives the best quality. But again, unless you are training small amounts of images, it's too much performance to throw away by not using higher batches.
Anonymous No.106134954 [Report] >>106135231
>>106134931
unless you're getting paid to make loras, there is no point in trying to aim for speed while sacrificing quality. personally if I got paid to make loras, I'd use runpod or something and just run dozens of training in parallel at batch 1.

SDXL in particular only takes 1-2 hours on a 3090 anyway. I doubt you need to make 20+ loras a day.
Anonymous No.106134983 [Report]
Anonymous No.106135001 [Report] >>106135081
Anonymous No.106135003 [Report]
For some reason saving the output of the high noise model always produces garbage. Why? It looks fine in the sample preview. Has to be a bug or something
Anonymous No.106135004 [Report] >>106135019 >>106135043 >>106135046 >>106135114
>>106134921
>>106134920
I mean what node do I replace the comfyui control net node with if I want to run it without a preprocessor?
Anonymous No.106135019 [Report]
>>106135004
i dont know what you're doing so I could be off but there's a custom node called ComfyUI-Advanced-ControlNet you can install which has more options
Anonymous No.106135036 [Report]
>>106134402
you did! congrats
Anonymous No.106135043 [Report] >>106135122
>>106135004
>>106134921
Anonymous No.106135046 [Report] >>106135122
>>106135004
The image input should be your preprocessed image; bypass any processing on the image going into it.
Anonymous No.106135047 [Report]
>>106134221
nice
Anonymous No.106135081 [Report] >>106135142
>>106135001
She's crying because of her malformed twin behind her.
Anonymous No.106135114 [Report]
>>106135004
again, bypass the node. I set up a bool so all i have to do is click a button to disable it once my image has been processed.
Anonymous No.106135122 [Report] >>106135389
>>106135046
>>106135043
I figured it out. The node is not preprocessing the images and it's fine to put depth maps in directly.

1. I copied a flux workflow that was putting normal images directly into that node
2. I assumed the node was doing preprocessing somewhere because why would you do that if it wasn't
3. I tested anyway putting in a depth map that usually gets good results in SDXL
4. The good depth map got shit results and I assumed it was preprocessing the depth badly

turns out I had a bad controlnet model and the workflow I followed was also using it wrong. I replaced it with the flux union one and now i'm getting good results. Sorry for the trouble.
Anonymous No.106135142 [Report] >>106135197
>>106135081
heh. you know a character lora is bad when clones start popping up. OVERFITTED
Anonymous No.106135197 [Report]
>>106135142
I wish I still had the video of the highly overfitted bog LoRA where it was deepfried and everyone looked like a bog.
Anonymous No.106135231 [Report]
>>106134954
Sure, if the extra time doesn't bother you, why not go for the best quality
Anonymous No.106135272 [Report] >>106135435 >>106137290
>>106133860
Anonymous No.106135287 [Report] >>106135291
>>106134182
the hate is from 3060 vramlets who cant run it, as usual
Anonymous No.106135291 [Report]
>>106135287
(fast)
Anonymous No.106135389 [Report]
>>106135122
> falsely blaming ComfyUI again, ep.1333
Anonymous No.106135435 [Report] >>106136912
>>106135272
https://eu.news-press.com/story/news/crime/2025/04/17/lee-county-woman-gets-prison-for-having-sex-with-household-pets/83137545007/
Anonymous No.106135509 [Report] >>106135648 >>106136671
>>106134395
Same for my 3060, uninstalled pytorch+cuda128 and installed pytorch+cuda126, I tested multiple gens and there's no noticeable improvement. Might be already fixed? Or anon was just trolling?
Anonymous No.106135534 [Report] >>106135610
sometimes im wondering if my prompt isn't good enough or if I'm getting trolled by bad seed rolls
Anonymous No.106135547 [Report]
comfy should be dragged out on the street and shot
Anonymous No.106135591 [Report]
>>106134119
Turns out if I use the low noise model solo in kijai workflow it does this shit. I prompted for character knocking on viewer's screen with "changing scene" in NAG and it still did it.
Anonymous No.106135610 [Report]
>>106135534
Cursed GPU I'm afraid.
Anonymous No.106135648 [Report] >>106136642
>>106135509
The reddit thread about it dates from months ago, so my guess is whatever the issue, it's not there anymore.
Or maybe it's linked to driver version too?
https://www.reddit.com/r/LocalLLaMA/comments/1jlofc7/performance_regression_in_cuda_workloads_with/

You can try:
Driver Version: 560.35.05
CUDA Version: 12.6
Anonymous No.106135657 [Report]
>106135547
Very organic.
Anonymous No.106135716 [Report] >>106135750
>>106134780
i was 15 when sd1.5 kicked off
im only about to get a bank account
Anonymous No.106135740 [Report] >>106135750
>>106134780
hard to invest when you have no money
Anonymous No.106135750 [Report] >>106135759 >>106135879
>>106135716
Fuck off zoomer

>>106135740
Fuck off poor fag
Anonymous No.106135757 [Report]
>>106134780
>didn't go all in on nvidia stocks when SD1.5 kick off
I COULD BE A MILLIONNAIRE
well, it is what it is
Anonymous No.106135759 [Report] >>106135939 >>106135947
kek
>>106135750
fuck off r*dditor
Anonymous No.106135775 [Report] >>106135801 >>106135857 >>106135878 >>106135879 >>106135886
So are nvidia chips like salvaged from a giant alien wreck in Taiwan or something? Why can't any other company even come close to their product? And don't bullshit me with lies like AMD being held back by software alone. I know it's shit hardware too.
Anonymous No.106135788 [Report]
>>106134311
It really doesn't look any worse than animation out of that era either. The originals could be pretty janky
Anonymous No.106135799 [Report]
>>106134311
Kinda agree
Anonymous No.106135801 [Report]
>>106135775
Because it will take years and a lot of money to catch them. Even mainland chinese tried and failed.
Anonymous No.106135857 [Report]
>>106135775
Mainly because of CUDA, and they're also at the bleeding edge performance-wise.
Anonymous No.106135878 [Report]
>>106135775
Nvidia's hardware is better optimized for AI since they've been on board with it for ~a decade, but a large part of how far AMD lags behind is also not having the same software optimizations.
Anonymous No.106135879 [Report] >>106136504 >>106138079
>>106135775
1. CUDA
2. While AMD focused on gaming GPUs and CPUs, Nvidia went full all in on AI infrastructure.
3. First-Mover Advantage in AI’s gold rush.
4. Nvidia bet on AI acceleration before it was cool (see: 2012 AlexNet on GPUs).
5. TSMC’s 4nm/5nm nodes are bottlenecked, and Nvidia booked capacity years ahead. AMD has to fight for scraps (MI300X is TSMC 5nm/6nm), while Nvidia’s H100s are printing money.
6. Smaller players (Cerebras, Graphcore) can’t scale due to costs.
7. AMD is juggling CPUs (Ryzen/EPYC), GPUs (Radeon), and now FPGAs (Xilinx). Nvidia’s entire existence is “accelerated computing.” Focus matters.
8. That said, AMD’s MI300X is competitive in raw specs—but without CUDA, it’s stuck selling to hyperscalers (Microsoft, Meta) who can afford to port code. For everyone else? CUDA or die.
lets say you're a company trying to catch up on ai, you want the best of the best and dont want to be taking risks, especially when it comes to software. writing software by yourself also costs money, remember wages are like 100K$/year/person in land of the free
>>106135750
nice bait but ill take it, are you jealous that i got into the ai field so early and all im doing in my free time is gooning? cope more wagie, while you're grinding your ai skills to catch up i've been casually consooming all ai models since 2022 and the only thing i've been using them for is gooning
you will never be celibate again, while i'm keeping my virginity for a custom made open source robot, you're simping for coworkers or getting divorce raped
Anonymous No.106135886 [Report]
>>106135775
Unsurprisingly shit software that issues twice the number of instructions is slower
Anonymous No.106135939 [Report]
>>106135759
DUN DUUUN
Anonymous No.106135947 [Report]
>>106135759
cabbage ass
Anonymous No.106136030 [Report] >>106136036 >>106136055
can I add my dataset to chroma training?
Anonymous No.106136036 [Report] >>106136203
>>106136030
???
Anonymous No.106136053 [Report] >>106136066 >>106136068
Can I train a wan lora that is just T2I or does it need a video dataset?
Anonymous No.106136055 [Report]
>>106136030
Yes, if got at least 7 votes of the Chroma Council.
Anonymous No.106136066 [Report]
>>106136053
You can train wan t2v and i2v with pictures too.
Anonymous No.106136068 [Report]
>>106136053
Nope. You don't even strictly need video to train T2V. It will hurt the motion though.
Anonymous No.106136122 [Report]
What VLMs can take fetish content? Alternatively, how do I automate batch tagging of images for wan? I have SDXL datasets that are tagged with just booru tags and I need to convert it for WAN use.
Anonymous No.106136123 [Report]
Anonymous No.106136129 [Report] >>106136142
>T5-XXL vs T5-small with adapter
Better, the frog has the sign now
This run uses 11x the prompts and more layers, still saves ~8GB compared to T5-XXL
Anonymous No.106136142 [Report] >>106136160
>>106136129
what do you mean better?
what are you even talking about?
Anonymous No.106136146 [Report]
>>106134705
>10k subs
Irrelevant, I have 2k subs and I am monetized.
>2 minutes
The shorter the content, the less you make, yes.
Anonymous No.106136160 [Report] >>106136206
>>106136142
Compared to the last run
I'm training an adapter that turns T5-small embeds into T5-XXL embeds
Anonymous No.106136203 [Report]
>>106136036
!!!
Anonymous No.106136206 [Report] >>106136275
>>106136160
thats interesting, so are you distilling T5-XXL into T5-Small? what rig are you doing it on? are you planning to open source it?
you should think of a license before you open source it (im not talking about cuck license im talking about a license that will prevent big tech from using your trained model)
https://opensource.google/documentation/reference/using/agpl-policy
>WARNING: Code licensed under the GNU Affero General Public License (AGPL) MUST NOT be used at Google.
Anonymous No.106136273 [Report] >>106136281 >>106136486
Anonymous No.106136275 [Report] >>106136289
>>106136206
I guess it could count as distilling.
The trained model is just a bunch of linear layers and activations, T5-small is 512 dim, T5-XXL is 4096, the first layer is 512->4096 and the rest are 4096->4096.
The dataset is embeds from T5-small and T5-XXL, training is T5-small embed -> adapter -> target is T5-XXL embed
Dataset is precomputed, saved to webdataset, with some custom tensor serialization with compression because T5-XXL embeds are huge
I was using A40 on runpod but it's too slow, now I'm using gpu_1x_gh200 on Lambda, that's ARM64 + H100, 64 vCPUs, 432 GiB RAM, 4 TiB SSD for only $1.49 / hr
If it ends up working good enough then yeah I'll release it
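A minimal sketch of the adapter as described; the layer count and activation below are guesses for illustration, not the actual code.

import torch
import torch.nn as nn

class SmallToXXLAdapter(nn.Module):
    def __init__(self, n_layers: int = 4, d_small: int = 512, d_xxl: int = 4096):
        super().__init__()
        layers: list[nn.Module] = [nn.Linear(d_small, d_xxl), nn.GELU()]  # first layer: 512 -> 4096
        for _ in range(n_layers - 1):
            layers += [nn.Linear(d_xxl, d_xxl), nn.GELU()]                # remaining layers: 4096 -> 4096
        self.net = nn.Sequential(*layers)

    def forward(self, small_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, 512) -> (batch, seq_len, 4096); trained against precomputed
        # T5-XXL embeds with a regression loss (e.g. MSE) per the description above
        return self.net(small_embeds)

adapter = SmallToXXLAdapter()
print(adapter(torch.randn(1, 77, 512)).shape)  # torch.Size([1, 77, 4096])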
Anonymous No.106136281 [Report]
>>106136273
Nice cock
Anonymous No.106136289 [Report]
>>106136275
that's very kewl
Anonymous No.106136301 [Report]
>>106134893
That's 90% of what people go to the internet for, this technology is winning
Anonymous No.106136307 [Report] >>106136338
Respect copyright. Remember to gen with models that are properly licensed.
Anonymous No.106136338 [Report]
>>106136307
In other words no model, not a single model has licensed the images or videos they train on
Anonymous No.106136427 [Report]
Anonymous No.106136464 [Report] >>106136471
Anonymous No.106136471 [Report] >>106136486 >>106136648
>>106136464
very nice, it got close with "word"
Anonymous No.106136486 [Report] >>106136555
>>106136273
>>106136471
You two, get a hotel room.
Anonymous No.106136504 [Report]
>>106135879
incelibate*
Anonymous No.106136545 [Report] >>106136555 >>106136601
big qwen t2i eventually https://github.com/huggingface/diffusers/pull/12055
this better not suck
Anonymous No.106136555 [Report]
>>106136486
a diamond statue of a teen girl with small perky breasts and a tight pussy is bent over and getting fucked by a copper statue of a big veiny cock
>>106136545
it will suck because qwen LLMs are very cucked
Anonymous No.106136601 [Report] >>106136705
>>106136545
What could it realistically deliver over wan?
Anonymous No.106136621 [Report]
wan 2.2 vace when?????
Anonymous No.106136642 [Report] >>106136671
>>106135648
Can't seem to find that driver version equivalent for windows, maybe I can try with a windows driver near the same release date but I can't seem to track down when that specific linux driver was released. I'm using drivers released back in April right now.
Anonymous No.106136646 [Report] >>106136689 >>106136718
I'm trying to install diffusion pipe, but when I try to install requirements.txt I'm missing some modules that those requirements need. Is there a way to chain it so it automatically pulls everything it needs? Using miniconda since the diff pipe git says to use that
Anonymous No.106136648 [Report]
>>106136471
Yeah it's getting close. Interesting that the frog seems less plastic too.
Anonymous No.106136671 [Report] >>106136708
>>106135509
you need to install the old cuda 12.6 not just the pytorch packages...
check what version of cuda you have active by doing nvcc --version
>>106136642
driver isnt that big of a deal, cuda version is
Anonymous No.106136689 [Report] >>106136731 >>106136857
>>106136646
nvm, I didn't notice that deepspeed is fucked
Anonymous No.106136704 [Report] >>106136712 >>106136927
>there are still people ITT that dont know about venvs
Anonymous No.106136705 [Report]
>>106136601
Hopefully not a complete disgrace to the qwen team. Otherwise, why bother releasing it.
Anonymous No.106136708 [Report] >>106136719
>>106136671
Yeah I did both, I installed 12.6 and installed 126 pytorch packages.
Anonymous No.106136712 [Report] >>106136872
>>106136704
What's venv?
Anonymous No.106136718 [Report] >>106136731
>>106136646
List the missing modules and maybe we can help, like holy shit do you think people are psychic?
Anonymous No.106136719 [Report] >>106136824
>>106136708
interesting, what are you testing it with? i had a speedup on wan 2.2/2.1 kijai sageattention workflow
ill grab newest drivers and cuda 12.8 to test
Anonymous No.106136731 [Report] >>106136944
>>106136718
>>106136689
It's the deepspeed, but it turns out it just refuses to work at all on windows. Gonna need a WSL
Anonymous No.106136824 [Report]
>>106136719
My WF is just lightx2v and sageattention, I went full retard and forgot to screenshot my 12.8 results but on 2nd sampler pass they were around 80-90 seconds. On 12.6 they were still in that range, same WF, same seed, same initial image.
Anonymous No.106136840 [Report]
Will do more runs later
Anonymous No.106136857 [Report]
>>106136689
If you read the repo, you'd know it only works on linux or wsl2
Anonymous No.106136872 [Report] >>106136956
>>106136712
Something you don't need to worry about. Only retards without jobs care about them. Just pip install everything to your home environment.
Anonymous No.106136912 [Report] >>106137290
>>106135435
Anonymous No.106136927 [Report]
>>106136704
you mean docker?
Anonymous No.106136944 [Report]
>>106136731
WSL2 to be exact
Anonymous No.106136956 [Report] >>106136959 >>106137239
>>106136872
What's pip?
Anonymous No.106136959 [Report] >>106136990
>>106136956
peepee in poopoo
Anonymous No.106136974 [Report]
any nsfw loras for krea? or what nsfw flux loras work with krea?
Anonymous No.106136982 [Report]
Anonymous No.106136990 [Report]
>>106136959
I'm twitching rn...
Anonymous No.106137041 [Report]
Anonymous No.106137058 [Report] >>106137080 >>106137240
Does anyone have any tips for generating NSFW audio with mmaudio? Trying to get that glrrrrrrrk glrrrrrrk sound
Anonymous No.106137080 [Report] >>106137087 >>106137240
>>106137058
you mean thinksound?
Anonymous No.106137087 [Report]
>>106137080

Is that what everyone is using now for audio?
Anonymous No.106137111 [Report] >>106137118 >>106137131 >>106137160
Can I make a request for one of you to animate this image?
Anonymous No.106137118 [Report]
>>106137111
no, demand it instead
Anonymous No.106137131 [Report]
>>106137111
No, beg for it instead
Anonymous No.106137160 [Report] >>106137210
>>106137111
that's pretty ancient, surely someone can shop something better by now
Anonymous No.106137191 [Report] >>106137217 >>106137232
Wait, do I have to reinstall the diffusion pipe entirely within the WSL linux environment?
Anonymous No.106137210 [Report] >>106137574
>>106137160
pretty sure new Emma manipulation tech is superior
Anonymous No.106137217 [Report]
>>106137191
you mean install it in WSL period? It does not work outside of linux or wsl
Anonymous No.106137232 [Report] >>106137296
>>106137191
https://civitai.com/articles/12837/full-setup-guide-wan21-lora-training-on-wsl-with-diffusion-pipe
Anonymous No.106137234 [Report] >>106137476
===UPLIFTING NEWS===
570.133.07 with cuda 12.8 is magically no longer fucked up, maybe old pytorch cu128 had fucked up kernels (source: cudadev)
in fact it's faster now!
Anonymous No.106137239 [Report] >>106137265
>>106136956
Python environment package installer
Anonymous No.106137240 [Report]
>>106137058
mmaudio is kinda ok at it, hit and miss really

>>106137080
is there any gguf of it out there? i cant find any. the model is like 21gb kek, apparently here's the comfyui version https://github.com/Yuan-ManX/ComfyUI-ThinkSound
Anonymous No.106137264 [Report]
>ctrl c and v doesn't work in linux
I already want to kill myself
Anonymous No.106137265 [Report] >>106137299
>>106137239
What's Python?
Anonymous No.106137285 [Report] >>106137681
ITS FUCKING HAPPENING
Anonymous No.106137290 [Report] >>106137357
>>106135272
>>106136912
>>>/pol/
Anonymous No.106137296 [Report] >>106137376
>>106137232
Is the cuda 12.4 and older torch required or is it just because the guide is older? Or do I just install the same as I run in comfy?
Anonymous No.106137299 [Report]
>>106137265
*unzips pants*
This
Anonymous No.106137357 [Report] >>106137552
>>106137290
>rabbi posts lies
>gets deboonked
>"NOOOOOO YOU POLTARD"
kek
Anonymous No.106137376 [Report]
>>106137296
you need to install linux debian 12.11
Anonymous No.106137476 [Report]
>>106137234
yeah just switched back to 12.8 and updated drivers and this is actually faster.
Anonymous No.106137485 [Report]
Anonymous No.106137552 [Report] >>106137569 >>106137596
>>106137357
I'm not a kike nor an insecure faggot like you. Take a hike, you no-gen retard.
Anonymous No.106137569 [Report]
>>106137552
its ok you were proven wrong lil bro, you'll recover
Anonymous No.106137574 [Report]
>>106137210
you and that fucktard genning the chinese girl are cancer
Anonymous No.106137596 [Report]
>>106137552
2.2? so powerful
Anonymous No.106137655 [Report]
Fresh

>>106137650
>>106137650
>>106137650

Fresh
Anonymous No.106137681 [Report] >>106137735
>>106137285
Probably Autoregressive, right?
If past AR models are anything to go by, typical ldg VRAMlets won't be able to run this
Anonymous No.106137735 [Report] >>106137753
>>106137681
If it was autoregressive it'd be in Transformers not Diffusers. You're right though except nobody can run it because they haven't actually released the weights.
Anonymous No.106137753 [Report] >>106137784
>>106137735
I dropped the code changes in chatgpt and it said this:

>Although it's integrated into the diffusers library (often associated with diffusion models), Qwen-Image stands apart from classical diffusion pipelines like Stable Diffusion. Instead, it follows an encoder-decoder setup with transformer blocks, suggesting sequential generation — a hallmark of autoregression.
Anonymous No.106137784 [Report] >>106138250
>>106137753
Chat is getting confused. It's basically the same architecture as Flux and SD3 with bigger dimensions.
Anonymous No.106138079 [Report]
>>106135879
>Nvidia bet on AI acceleration before it was cool
Impressive to see how almost every time, Jensen's foresight was correct.
Anonymous No.106138250 [Report]
>>106137784
It's the same arch as flux, except entirely MMDiT. Which is absolutely retarded.
Anonymous No.106139114 [Report]
>>106133710
Where do i see more of this? ⸜(。˃ ᵕ ˂ )⸝