
Thread 106413155

Anonymous No.106413155 >>106413859
/ldg/ - Local Diffusion General
Computer Snake Die Edition

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106407231

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://github.com/Wan-Video
https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
Anonymous No.106413199 >>106413303 >>106413425 >>106413910
EasyCache is good with Heun. It doesn't skip the first 2-3 Heun steps, starts to skip Euler steps at some point. 2.5x speedup compared to Heun without EasyCache. Sometimes nearly no differences, sometimes pic related.
Anonymous No.106413202
>three weeks since rentryanons last update
I await his glorious return.
Anonymous No.106413252
Chroma status?
Anonymous No.106413287 >>106413307
>>106413165
>Huh, besides VRAM capacity, that's the first time I've seen in AI that would make you want to upgrade cards that badly from the Ampere generation.
With nunchaku, the difference would be even bigger in favor of blackwell architecture.
Anonymous No.106413291 >>106413307
>AMD is AyyMD
>3090 is showing its age
bros
Anonymous No.106413303 >>106413409 >>106413471
>>106413199
>Heun
Is it really that good? From what I understand it's about twice as good per step compared to most other samplers, at half their speed.
Anonymous No.106413307 >>106413322
>>106413287
>>106413165
>>106413291
again, these benchmarks are obviously fucked up. retards who run these tech websites have no clue how to set local gen up correctly.
Anonymous No.106413309
still no wan svdquant
Anonymous No.106413322
>>106413307
true desu i wonder why a local chad hasnt taken this easy layup
Anonymous No.106413363 >>106413405 >>106413697
automatic1111 is old and crap
comfy doesn't even qualify as UI
what to do?
Anonymous No.106413376 >>106413511 >>106413694
>Recent advances in video generation produce visually realistic content, yet the absence of synchronized audio severely compromises immersion. To address key challenges in Video-to-Audio (V2A) generation, including multimodal data scarcity, modality imbalance and limited audio quality in existing V2A methods, we propose HunyuanVideo-Foley, an end-to-end Text-Video-to-Audio (TV2A) framework that synthesizes high-fidelity audio precisely aligned with visual dynamics and semantic context. Our approach incorporates three core innovations: (1) a scalable data pipeline curating 100k-hour multimodal datasets via automated annotation; (2) a novel multimodal diffusion transformer resolving modal competition through dual-stream temporal fusion and cross-modal semantic injection; (3) representation alignment (REPA) using self-supervised audio features to guide latent diffusion training, efficiently improving generation stability and audio quality. Comprehensive evaluations demonstrate that HunyuanVideo-Foley achieves new state-of-the-art performance across audio fidelity, visual alignment and distribution matching.

https://szczesnys.github.io/hunyuanvideo-foley/

comfyui when
Anonymous No.106413405 >>106413608 >>106413703
>>106413363
make your own ui
Anonymous No.106413409
>>106413303
Heun does most of its work in the first few steps. I tried to optimize it further by doing a set % (around 25%) of Heun steps and then switching to Euler. It was a bit faster but sometimes diverged from pure Heun depending on the prompt and seed.
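The Heun-to-Euler switch that anon describes can be sketched as a toy sampler loop. This is not ComfyUI's actual sampler API, just the gist of the idea (`model` is assumed to predict the denoised latent, k-diffusion style):

```python
import torch

def heun_then_euler(model, x, sigmas, heun_fraction=0.25):
    """Toy sampler: Heun (2nd order, two model calls per step) for the
    first `heun_fraction` of steps, then cheaper Euler (one call) after."""
    n = len(sigmas) - 1
    switch = int(n * heun_fraction)
    for i in range(n):
        denoised = model(x, sigmas[i])
        d = (x - denoised) / sigmas[i]            # slope estimate
        dt = sigmas[i + 1] - sigmas[i]
        if i < switch and sigmas[i + 1] > 0:
            # Heun: trial Euler step, re-evaluate slope, average both
            x_trial = x + d * dt
            denoised2 = model(x_trial, sigmas[i + 1])
            d2 = (x_trial - denoised2) / sigmas[i + 1]
            x = x + (d + d2) / 2 * dt
        else:
            x = x + d * dt                        # plain Euler step
    return x
```

The Heun steps cost two model calls each, which is why front-loading them into the early (high-sigma) steps and coasting on Euler afterwards can be a net speedup.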
Anonymous No.106413420 >>106415246 >>106415298
surprised to see the 7900xtx oom here (along with 9700xt). I assume if you have designer programming socks you can get it to work but the fact amd has no generic solution is really sad.
Anonymous No.106413421 >>106413523
>>106413165
if the super has 24 or 32 gb ram it's gonna be great for local ai until the next model shows up that requires more vram
Anonymous No.106413425 >>106413467
>>106413199
Anonymous No.106413467
>>106413425
Heun does two steps in one 'step' (the whole point of Heun, really), but EasyCache doesn't realize that.
Anonymous No.106413471
>>106413303
nta but heun is pretty decent but its usually slow as fuck. I've been using a mix of deis, res_multistep, sa_solver, seeds_2/3, res_#s/m, all produce some interesting results
Anonymous No.106413511
>>106413376
what does it produce if you feed it JAV?
Anonymous No.106413523 >>106413582
>>106413421
Super refresh will 100% have 24GB max. It's Nvidia.
Anonymous No.106413528
Anonymous No.106413582
>>106413523
you're probably right but one can hope
Anonymous No.106413608 >>106413703
>>106413405
this. be the change you want to see
Anonymous No.106413694
>>106413376
this is awesome to see, not because of this particular release, but because this is now a strong confirmation that the meta for China has switched to focusing on video + audio in general

>>106413165
>A 5070 Ti is faster than a 3090 at Wan
hell yeah. I don't regret my decision to get one at all. Even at 32gb of ram it is plenty fast and plenty high quality for my goon and beauty purposes
Anonymous No.106413695
I guess Heun is a bad example because it's two samplers in one, but dpmpp_2m in Comfy averages around a 1.45x speed-up. Basically you need a fast-converging sampler. Some samplers hover at rather high change rates and you get 0/30 steps skipped.
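The skip criterion those posts describe can be sketched roughly like this — a sketch of the idea, not EasyCache's actual implementation: reuse the previous output when the latent has barely changed, which is exactly why slow-converging samplers get 0 steps skipped.

```python
import torch

def cached_step(model, x, sigma, state, thresh=0.05):
    """EasyCache-style heuristic (a sketch, not the real implementation):
    reuse the previous model output when the latent barely changed since
    the last real call, i.e. the sampler has mostly converged."""
    prev_x, prev_out = state.get("prev_x"), state.get("prev_out")
    if prev_x is not None and prev_out is not None:
        rel_change = (x - prev_x).norm() / (prev_x.norm() + 1e-8)
        if rel_change < thresh:
            state["skipped"] = state.get("skipped", 0) + 1
            return prev_out                  # skip the expensive call
    out = model(x, sigma)                    # real model evaluation
    state["prev_x"], state["prev_out"] = x.clone(), out
    return out
```

With a criterion like this, a sampler whose per-step change rate never drops below `thresh` will simply never trigger the cache.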
Anonymous No.106413697
>>106413363
have you tried anistudio?
Anonymous No.106413702
cozy bread
Anonymous No.106413703
>>106413405
>>106413608
Gaddafi tried to make his own ui too
Anonymous No.106413705
Anonymous No.106413734 >>106413786
>>106411747
Magnesium helps with this. Can recommend.
Anonymous No.106413740 >>106413924
Anonymous No.106413755 >>106413777
so whats the latest and greatest text to video so far? can i use flux loras with it?
Anonymous No.106413777
>>106413755
wan 2.2

>can i use flux loras with it?
kill yourself indian
Anonymous No.106413786
>>106413734
With what? With UI PTSD?
Anonymous No.106413811
very inorganic
Anonymous No.106413859 >>106413984
>>106413155 (OP)
>Free and Open Source
ok so where do i pick up my free 4090?
Anonymous No.106413896 >>106414597
>>106412300
I mean, aesthetic appeal is subjective, for my purposes it lets me gen, lets me goon, and lets me be happy. And that’s enough for me. Why does that seem to trigger comfy users so much? I don’t get it. I’m not even anti comfy, I have an instance for when I want to play around with it and everything. It feels like manual transmission bros getting mad that someone is satisfied with an automatic, like why
Anonymous No.106413910 >>106413919
>>106413199
Dumbledore becomes evil Dumbledore. What did easycache mean by this?
Anonymous No.106413919
>>106413910
EasyRape
Anonymous No.106413924
>>106413740
I'm sorry Mario, but the princess is in another castle.
Anonymous No.106413973
I take a look at these threads and I see the meme makers of tomorrow.
Anonymous No.106413984
>>106413859
>he doesn't know
Anonymous No.106413996
any tips on how to use the lightning lora for wan 2.2 without everything moving in slow motion? Someone wrote that they use the 2.1 lora at strength 3 plus the 2.2 lora at strength 1 for high noise, and the 2.1 lora at 1 for low noise, but that didn't help. Only using the 2.1 and having "fast motion, rapid movement" in the prompt didn't help either.
Anonymous No.106414033
I take a look at these threads and I see the CP makers of tomorrow.
Anonymous No.106414046
I take a look at these threads and I just see "Anonymous"
Anonymous No.106414070 >>106414167
I take a look at these threads and I see the school shooters of tomorrow.
Anonymous No.106414086
I take a look at these threads and I see these threads.
Anonymous No.106414114 >>106414126
I'm blind
Anonymous No.106414119
Anonymous No.106414126
>>106414114
Ok this got me lmao
Anonymous No.106414151
I saw these threads IRL today and it made me lol
Anonymous No.106414167
>>106414070
I take a look at myself and realize I'm way too old for that
Anonymous No.106414191
What is latent concat for?
Anonymous No.106414481 >>106414489 >>106414559 >>106414671
AIs are basically compression, as California courts have discovered. So while models like Flux and Qwen Image are bad for their copyright infringement, Chroma is infinitely worse, since this means it's a compressed archive of CSAM. And generating images with it is basically extracting jpgs from a giant zip file. It's most likely illegal to possess a Chroma model on your hard drive, FYI.
Anonymous No.106414489
>>106414481
well good thing it's on my ssd
Anonymous No.106414551 >>106414559 >>106414588 >>106414697 >>106414739 >>106414764 >>106414808
I'm a txt2img vramlet and you guys keep me bored with all your video gens. Every time I search what you're talking about it ends up being heavy crap that needs tons of vram. Why can't you post simple prompts instead of resource-intensive stuff? What happened to ldg? Two months ago we were talking about simple things, now it's only heavy gens.

You are boring.
Anonymous No.106414559 >>106414598
>>106414481
>>106414551
I get the thread is dead right now but chill bro, no need for artificial engagement
Anonymous No.106414561
Anonymous No.106414581
Anonymous No.106414588 >>106414623
>>106414551
What's stopping you from posting your 1girls?
Anonymous No.106414597 >>106414658
>>106413896
ComfyUI users aren't gooners or AI artists, they're programmers and coding hobbyists messing with AI.
Haven't you noticed the garbage they make with their high end GPUs?
"5 seconds of Miku lifting barbells", "a cop eating donuts in a car", "an elf enters a dungeon and an ogre with a Chroma shirt appears".
Total brainlet coding monkeys. Completely uninspiring content.
Anonymous No.106414598
>>106414559
Only the former is one of my conversation starters.
Anonymous No.106414623 >>106414722
>>106414588
Already been doing it anon, maybe you just didn't notice cause you ignore me like everyone else. /ldg/ wasn't like this months ago before WAN happened. It used to actually have community and be more united
Anonymous No.106414658 >>106414705 >>106415055
>>106414597
Yes, some of us are more interested in the technology than the right amount of cum on 1girl's face.
It's insane how low the bar is. It's a graph. Things flow in one direction. But I guess knowing that there's something going on under the hood is painful to some people.
Anonymous No.106414671
>>106414481
>AIs are basically compression, as California courts have discovered.
t. the voice in my head
Anonymous No.106414697
>>106414551
>txt2img vramlet complains
>no txt2img gen

these types of complaints are older than video models and therefore more boring.
Anonymous No.106414705 >>106414773
>>106414658
Also, connecting and minimizing the amount of nodes is a fun activity.
Anonymous No.106414722
>>106414623
>/ldg/ wasn't like this months ago before WAN happened
No shit nigga. It's almost like everyone jumped on it to try the new toy-
Anonymous No.106414735 >>106414741
Is anon upset that videogen is resource intensive? Whut?
Anonymous No.106414739
>>106414551
there are only 3 types of gen threads. its either shitty video, anime pedophiles or debo.
pick one and shut the fuck up
Anonymous No.106414741
>>106414735
vramlet tantrum lol
Anonymous No.106414764
>>106414551
if it makes you feel any better, I just got myself a 5090
Anonymous No.106414773 >>106414850
>>106414705
>connecting and minimizing the amount of nodes is a fun activity

Don't you see? I can't coexist with people like you. Why am I stuck in this thread with someone like you who likes to reorganize and connect nodes? We're not even on the same wavelength; the two of us view AI as completely different things.
Anonymous No.106414789
is musubi tuner the new thing to train qwen loras?
Anonymous No.106414808
>>106414551
Vramlet tears are delicious
Anonymous No.106414827 >>106415045
Any guides on training Wan 2.2 Loras locally?
Anonymous No.106414832 >>106414915
One thread one meltie im getting really tired of this shit already
Anonymous No.106414846
Anonymous No.106414850
>>106414773
Saar, this is /g/. There are other boards with ai generals that emphasize the "art" more than the technology. Try >>>/h/
Anonymous No.106414867
Seems a bit dangerous... But good save by KKKhroma, turning my Thorgal lora's potential fireplace/campfire into a traffic cone.
https://litter.catbox.moe/rffhrk.safetensors
I made the mistake of masking all of the speech bubbles and not captioning them AND captioning all of them like "comic book style illustration of blah blah", so some seeds have extremely persistent speech bubbles in them. Also I noticed ~all of Joycaption beta one's captions are incorrect.
So for the next run, I'm going to omit any medium/genre captioning, add some speech bubble images to teach it what's bad and manually fix all of the captions (infuriatingly slow, but at least Joycaption gives me a template).
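The "omit medium/genre captioning" pass could be automated as a small script. Everything here is hypothetical (the regex, the assumed one-caption-.txt-per-image convention); it just illustrates the cleanup that anon plans to do by hand:

```python
import re
from pathlib import Path

# Strip medium/genre boilerplate like "comic book style illustration of"
# from caption files so it doesn't get baked into every caption.
BOILERPLATE = re.compile(
    r"^(a\s+)?(comic book style\s+)?(illustration|drawing|painting)\s+of\s+",
    re.IGNORECASE,
)

def clean_caption(text: str) -> str:
    return BOILERPLATE.sub("", text.strip())

def clean_dir(folder: str) -> None:
    # assumes one .txt caption file next to each training image
    for f in Path(folder).glob("*.txt"):
        f.write_text(clean_caption(f.read_text()))
```

Manual review is still needed for factual caption errors (like the bad Joycaption output mentioned), but boilerplate stripping at least is mechanical.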
Anonymous No.106414915
>>106414832
Well fuck you and everyone else in this thread!
there now two melties
Anonymous No.106414929 >>106414962 >>106414966 >>106414967 >>106415000 >>106415021 >>106415065 >>106415068
With all respect, I kinda agree with the complaining Vramlet Anon. The gap between running humble SDXL vs WAN and Qwen is literally $2000 in GPU hardware. Having both in the same general doesn't work, it's too big an economic jump. Like having people playing PS1 Resident Evil in the same thread as people on PS5 RE. The tech and money gap is massive.

My suggestion? Not saying make a new general but Vramlets should go to /sdg/. They also just generate images and their thread focuses on diffusions themselves, not the surrounding tech.

We should propose Vramlets migrate to /sdg/ so everyone can gen without issues and feelings of inferiority.
Anonymous No.106414962
>>106414929
I dont get why poorfags who cant even afford a 2000 dollar GPU (thats actually not even that expensive) get into a hobby like local AI?
it makes no sense, as a poorfag you should probably get a job first before you dive into hobbys like AI.
like if you cant afford it you have no business being in any of these AI threads.
thats just how it is.
Anonymous No.106414966 >>106414977
>>106414929
How much vram do I need to be able to stay here? What's the cut-off point?
Anonymous No.106414967
>>106414929
You can run Q8 WAN and Q6 Qwen on a 4070S. But lately the bottleneck is actually system ram to handle all the offload so 64+ is optimal.
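Back-of-envelope math on why that setup needs system-RAM offload. GGUF-style QN quants cost roughly N bits per weight plus a little overhead; the parameter counts below are rough public figures (Wan 2.2 is ~14B per expert, Qwen-Image ~20B), not exact:

```python
def model_gb(params_b, bits_per_weight, overhead=1.05):
    """Rough size of a quantized model in GB; `overhead` covers
    quantization scales/metadata (real GGUF overhead varies)."""
    return params_b * 1e9 * bits_per_weight / 8 * overhead / 1e9

# Both estimates land well over a 4070S's 12 GB of VRAM,
# which is why the offload (and 64+ GB of system RAM) matters.
print(f"Wan 14B @ Q8: ~{model_gb(14, 8):.1f} GB")
print(f"Qwen 20B @ Q6: ~{model_gb(20, 6):.1f} GB")
```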
Anonymous No.106414974 >>106415043
Has the recommendations changed? I thought it was always recommended to get second hand 3090s. What the fuck are you guys doing???
Anonymous No.106414977
>>106414966
12gb
Anonymous No.106415000 >>106415017
>>106414929
If you don't care about speed, a 16gb AMD card is 500e and 4x 32GB ddr5 sticks is another 500 eurodollars.
Anonymous No.106415010
Please Kijai, implement Anisora V3...
> The new version supports arbitrary-frame inference, character 3D video generation, Video style transfer, Multimodal Guidance, Ultra-Low-Resolution Video Super-Resolution, delivering greater overall dynamics and more natural motion. The V3 model can generate 5 sec 360p video shot within 8 sec.
Anonymous No.106415017 >>106415141
>>106415000
>4x32
>Not 2x64
Anonymous No.106415021
>>106414929
Imagine gatekeeping a technology discussion based on GPU price. Next you'll be asking for income verification to post here.
Anonymous No.106415043 >>106415209
>>106414974
You're thinking of the LLM thread, aren't you.

This bread has been recommending 5090 and 4090 if you can afford them (because it doesn't actually assume you can afford the DGX setups).
Anonymous No.106415045 >>106415075
>>106414827
I'm just following the advice given in the model readme for wan 2.2 in diffusion-pipe. People are gatekeeping wan 2.2 training knowledge.
Don't ask an LLM for advice. I tried Grok 3, it just gave me a mishmash of bullshit no sane person would actually use.
Anonymous No.106415055 >>106415167
>>106414658
>But I guess knowing that there's something going on under the hood is painful to some people.
But that’s a disingenuous mischaracterization. There’s nothing painful about it. It’s not that I’m incapable of learning it, it’s that I have no reason to. I don’t need to know how or why the internals of a microwave work; it heats the food and that’s enough. This is AI gens with forge for me. What I’m saying is, why does it bother you guys? Again, if I’m not anti comfy and am happy sharing space with it, why are you against me?
Anonymous No.106415065 >>106415099
>>106414929
I agree wholeheartedly. The minimum VRAM to post in /ldg shall be 48GB.
That's right, a 5090 is still a toy.
Anonymous No.106415068
>>106414929
The hobby shouldn't be gatekept by wallet size alone. Plenty of good work gets done on older hardware.
Anonymous No.106415075
>>106415045
thanks, it's been a year since i last trained loras.
Anonymous No.106415079 >>106415123
Anonymous No.106415099
>>106415065
Based. We need GPU model recognition like the flag recognition of /int/ fags
Anonymous No.106415123 >>106415131
>>106415079
Cat box?
Anonymous No.106415124 >>106415154
another beautiful day, /ldg/
what are some of your favorite camera angle/style prompts for generating 1girls?
Anonymous No.106415131
>>106415123
nta, but that's an old gen
Anonymous No.106415141
>>106415017
based, i totally went with this setup because its technically superior in ways i fully know and not because my piece of shit mobo is unable to recognize 4dimms at once
Anonymous No.106415154
>>106415124
>angle
male pov
>style
semi-realistic cartoons
Anonymous No.106415167 >>106415179 >>106415182 >>106415216
>>106415055
The purpose of /ldg/ is not genning images, but tinkertrooning and making up your own little benchmarks and repeatedly running them again and again.
Anonymous No.106415179 >>106415214 >>106415222
>>106415167
Well it’s the best of the three ai generals we have; the adt one is gay and sdg is home to insufferable schizoid avatarfags to a much more extreme degree than this thread, so it’s the closest to "home" here. Can’t we all just get along lol
Anonymous No.106415182 >>106415216
>>106415167
Also: discovering the workflows to achieve true photo-realism.
Anonymous No.106415202
Strange seeing what appear to be anti-ldg posts here in the ldg thread.
Anonymous No.106415209 >>106415246
>>106415043
No, 3090s are the cheapest way to obtain 24GB while also having some levels of compute. 4090s are always over $1k and 5090s are, well, 5090s
Anonymous No.106415214
>>106415179
Please don't mention the anime one here. They're deserters.
Anonymous No.106415216 >>106415468
>>106415167
>>106415182
I feel personally attacked.
Anonymous No.106415222
>>106415179
>can't we all just get along lol
I'm afraid we have no choice. It's not like you can prevent anybody from posting anything they like.
The best you can do is try and hurt their feelings massively and hope they kill themself.
Anonymous No.106415245 >>106415365
pitiful attempts at sliding and restricting you from posting here
Anonymous No.106415246 >>106415335
>>106415209
The 4090 and 5090 are what diffusion and video models run/train better on, and that's much of the reason here >>106413420

If you have cheap offers for 3090s that you can trust in your used market, that's lucky/good for you, do what you want. Likewise if you get cheap 4090s or 5090s or -most unlikely- a DGX, that's great.
Anonymous No.106415298 >>106415312 >>106415492
>>106413420
What's the source for this? I don't know what those numbers mean but they seem like bullshit. AMD cards are only like 2-3x worse, not 100x.
Anonymous No.106415310 >>106415327 >>106415392
The problem here isn't Comfy vs whatever UI or diffusion discussions or surrounding tech debates.

The real issue here has always been one: Vramlets.

If all of you who don't want to use Comfy had more VRAM, you wouldn't be complaining about how hard Comfy is. Downloading intricate workflows and nodes just to make your video gens 0.1 seconds faster.

If I had to unmask the AntiComfy movement, underneath would be a much simpler and childish problem: "Not enough VRAM"
Anonymous No.106415312
>>106415298
the source is quite literally in the image, anon
https://chimolog.co/bto-gpu-wan22-specs/
Anonymous No.106415327 >>106415355
>>106415310
Continuing: before talking bad about Comfy, if you want me to reply to (You), you will have to screencap your VRAM specs.
Anonymous No.106415335 >>106415352
>>106415246
24 and 32 GB are fine for consuming and doing light training, but 32 GB is not enough to train Wan or use it at fp16.
It's not like LLMs, where layers can be spread across GPUs. For image and video, you have to go big or go home.
Anonymous No.106415352
>>106415335
I'm confident that there will be soft solutions to these problems as we go forward.
Anonymous No.106415355 >>106415372
>>106415327
OK show 'em.
(yes of course anyone can fake it, but what a sad, shameful thing to stoop to)
Anonymous No.106415365
>>106415245
You can't force water and oil to mix.
You can't make a huge community get along and all post normally. Like someone posting their 1girl after 50 posts of Miku videos doing dumb shit, 50 benchmark Chroma posts, then 50 clips of 5-second hentai with Ash and Misty.
Anonymous No.106415372
>>106415355
wrong photo!
(Deka Fumo or bust, same deal)
Anonymous No.106415386 >>106415506
I will continue to post my SDXL kino and you will continue to seethe.
Anonymous No.106415390 >>106415405 >>106415437 >>106415657
The only thing I really dislike about the generals is some faggot posting the same prompt idea/base image over and over again until he gets it the way he wants it.
Why post failed attempts?
Anonymous No.106415392
>>106415310
>If all of you who don't want to use Comfy had more VRAM, you wouldn't be complaining about how hard Comfy is.
True, but you can make basically anything work by tinkering with some constants in comfy code, so it's just an issue of Comfy managing memory badly.
Anonymous No.106415405
>>106415390
>Why post failed attempts?
this is like asking why the wind blows or why the sun shines
Anonymous No.106415437
>>106415390
Only Wanlets do that. Never seen anyone posting 10 still images of a knight girl getting surprised by a Chroma goblin.
Anonymous No.106415468 >>106415497
>>106415216
You're a national treasure my friend. Agents will come visit you at your home. Soon.
Anonymous No.106415492
>>106415298
it's a high resolution gen test which takes a lot of VRAM, which means there is something wrong with memory management on AMD that makes it spill over and tanks the performance even on the 24GB cards
Anonymous No.106415497
>>106415468
Please make them come after I've done the fairy/maid/ratgirl/oil/mixed plot for the pixel space Chroma model.
After that I can die in peace.
Anonymous No.106415506
>>106415386
I mean, post SD 1.5 if you want.
Only seething I did today was tossing 12 hours of Wan 2.2 LoRA training because I took some bad advice on settings and it wasn't learning, so I blew it out and started fresh.
Anonymous No.106415528 >>106415576
Ironically sd 1.5 is still the most versatile model we have.
Anonymous No.106415571
Why not just train SDXL with T5? Is some decade-old text encoder essential to it or something? You can just yeet it from memory altogether after processing the prompts, so even vramlets would be happy.
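For what it's worth, "yeet it from memory after processing the prompts" is exactly how pre-computed embeddings work in practice: encode everything once, then free the encoder before the diffusion model loads. A framework-agnostic sketch, where `encoder` and `tokenize` are stand-ins for e.g. a T5 encoder and its tokenizer:

```python
import gc
import torch

def precompute_and_free(encoder, tokenize, prompts):
    """Encode every prompt once up front, stash the embeddings on CPU,
    then drop the encoder so its (V)RAM can go to the diffusion model."""
    embeds = []
    with torch.no_grad():
        for p in prompts:
            embeds.append(encoder(tokenize(p)).cpu())
    del encoder                        # drop this reference to the weights
    gc.collect()                       # free them if nothing else holds a ref
    if torch.cuda.is_available():
        torch.cuda.empty_cache()       # return freed VRAM to the driver
    return embeds
```

The text encoder only needs to be resident during this pass; training then runs against the cached embeddings.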
Anonymous No.106415576 >>106415623
>>106415528
Yeah, because it didn't have any synthetic data in its training. Fucking emad ruined it for everyone
Anonymous No.106415587 >>106415616
anything better than wan 2.2 at nsfw?
Anonymous No.106415611
can an autistic kind anon update the ldg wan 2.2 guide?
Anonymous No.106415616 >>106415622
>>106415587
Like specifically for 5 sec videos? I guess some 2.1 loras might work better for like 1-2 weeks.
Anonymous No.106415622
>>106415616
I suppose in general, looking to do some coomer stuff with i2v
Anonymous No.106415623 >>106415693
>>106415576
which synthetic data, images or captions? cause raw internet captions are terrible
Anonymous No.106415653
any good general wan 2.2 prompts for teasing or showing off kind of movement?
Anonymous No.106415657
>>106415390
Why post at all?
Anonymous No.106415664 >>106415697
What's the best solution for facial consistency in Wan2.2 atm?
Anonymous No.106415666 >>106415695
vram should never be an issue since you can just borrow someone's exposed A6000 comfyui instance
Anonymous No.106415693 >>106415780
>>106415623
it was trained with an unfiltered LAION database
Anonymous No.106415695 >>106415712
>>106415666
How do you discover them?
Anonymous No.106415697
>>106415664
just keep genning until it comes out ok
Anonymous No.106415712
>>106415695
there's apparently a website which scans ips for open comfyui ports
Anonymous No.106415723 >>106415734 >>106415736 >>106415758
> Three reasons why your WAN S2V generations might suck and how to avoid it.
https://www.reddit.com/r/StableDiffusion/comments/1n2gary/three_reasons_why_your_wan_s2v_generations_might/

>After some preliminary tests i concluded three things:

>Ditch the native Comfyui workflow. Seriously, it's not worth it. I spent half a day yesterday tweaking the workflow to achieve moderately satisfactory results. Improvement over utter trash, but still. Just go for WanVideoWrapper. It works out of the box way better, at least until someone with a big brain fixes the native. I always used native and this is my first time using the wrapper, but it seems to be the obligatory way to go.

>Speed up loras. They mutilate the Wan 2.2 and they also mutilate S2V. If you need character standing still yapping its mouth, then no problem, go for it. But if you need quality, and God forbid, some prompt adherence for movement, you have to ditch them. Of course your mileage may vary, it's only a day since release and i didn't test them extensively.

>You need a good prompt. Girl singing and dancing in the living room is not a good prompt. Include the genre of the song, atmosphere, how the character feels singing, exact movements you want to see, emotions, where the character is looking, how it moves its head, all that. Of course it won't work with speed up loras.

>Provided example is 576x800x737f unipc/beta 23steps.

Workflow
https://limewire.com/d/F2cTJ#gUyhGRrCSA

Example without workflow embedded
https://files.catbox.moe/rrg4z4.mp4
Anonymous No.106415734 >>106415768
>>106415723
>Speed up loras. They mutilate the Wan 2.2
Confirmed retard
Anonymous No.106415736
>>106415723
so be specific as possible with prompts?
Anonymous No.106415758
>>106415723
so tldr:
>comfy makes garbage default workflows, pretend they don't exist
>distill loras shit up the quality and movement
>stop being a retard
we already knew all this since 2.1. we don't need a redditor faggot parroting what we already know
Anonymous No.106415768
>>106415734
Because he got the grammar wrong, or because he said distillation mutilates the model? The latter is true; that's basically what it's designed to do lol
Anonymous No.106415780
>>106415693
what was? 1.5? it was aesthetic filtered at least, not as bad as later versions that went stupid on the NSFW filter but still
regardless, a filtered dataset doesn't make it a synthetic dataset and raw internet captions are awful, DALL-E 3 did it right with the synthetic/raw caption mix