And There Will Be More Edition
Discussion of Free and Open Source Text-to-Image/Video Models
Prev: >>105745833
https://rentry.org/ldg-lazy-getting-started-guide
>UI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Models, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com
https://tensor.art
https://openmodeldb.info
>Cook
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>WanX (video)
Guide: https://rentry.org/wan21kjguide
https://github.com/Wan-Video/Wan2.1
>Chroma
Training: https://rentry.org/mvu52t46
>Illustrious
1girl and beyond: https://rentry.org/comfyui_guide_1girl
Tag explorer: https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage | https://rentry.org/ldgtemplate
>Neighbors
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/celeb+ai
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
Blessed thread of frenship
>>105748252I mean in general when it comes to NSFW loras for any model, there's always typically some that are clearly trained by a dude who only cares about one race of chick, and some trained by a dude who actually made effort to make it versatile in that regard. The most "biased" ones are usually "towards" Asian chicks
>>105748254still no neighbors list update
rude.
>>>/vp/napt has SOUL.>>105748241 (OP)baker chose some really horrible shit for the collage (again) haha
i DO hope its randomized
anyways,
SMELL YA LATER
the cartoon man on the left is standing apart from the cartoon man on the right. the man on the right is standing and looking to the right. the man on the left has a speech bubble saying "NO WAY, FAG".
these threads would've been much better if 4chan didn't strip metadata from png's
>>105748286
>i DO hope its randomized
they are never randomized, otherwise you'd get random shit like screenshots and semi-NSFW pics which could get the thread nuked
someone hand picks each one individually every single time.
>>105748295>filter by checkpoint or prompt kino
Don't give in to temptation, anons!
>>105748295yeah, why does it even do that
Is ComfyUI-nunchaku worth it? Is fp4 quantization detrimental to the quality of diffusion? I'm on 12GB.
>>105748280>>105748252>most do have roast beefnot really interested in debating this w\
gross LDG posters on fuggin 4chan bwahaha
>>105748259in a few more months custom vagina loras will let you make as much arbys as you want
but pls, TRY to remember, its fake & gay.
<3
>>105748298so the baker is a spiteful faggot then
mental illness
>caring about the faggollage
>>105748322i think that guy is also your "schizo / janny"
what do you think?
>>105748324>being a schizoid
>>105748327the data supports it kek
>>105748314no
save up for a real gpu.
Anyone know why I have melty faces with Wan 2.1? I'm using the FusionX model, RifleX thing, and a single lora, and faces just kinda melt into goop. Sometimes they're "fine", but other times they completely shit the bed.
Disabling the loras doesn't seem to affect it much.
I know that the FusionX model can have issues with faces, but the rentry had me under the impression that features could change (like the face going off model), not necessarily turning faces into slime.
>>105748314
its worth it bro
t. 3060 enjoyer
i posted speeds in last thread or before last thread
svdquant speeds up generation time:
RTX 3060 12GB @100W:
clip offloaded to cpu, flux in gpu fully:
100%|| 8/8 [00:24<00:00, 3.04s/it]
-total gen time:
after prompt change: 35 seconds
2nd gen: 25 seconds
clip device default, flux auto offload:
100%|| 8/8 [00:25<00:00, 3.15s/it]
-total gen time:
26s in both cases
@170w:
19s - 8 steps
29s - 20 steps
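A rough sketch of the arithmetic behind these numbers, for anyone comparing their own runs. The sampler's s/it times the step count gives the sampling time, and whatever is left of the total is overhead (text encoding, VAE decode, offloading). The function names are made up for illustration, and the two-run fit assumes time scales linearly with steps, which is only an approximation.

```python
def sampling_seconds(steps, sec_per_it):
    """Time spent inside the sampler loop: steps times seconds per iteration."""
    return steps * sec_per_it


def overhead_seconds(total, steps, sec_per_it):
    """Whatever part of the total wall time is not the sampler loop."""
    return total - sampling_seconds(steps, sec_per_it)


def fit_two_runs(steps_a, total_a, steps_b, total_b):
    """Assume total = fixed + steps * per_step and solve it from two runs."""
    per_step = (total_b - total_a) / (steps_b - steps_a)
    fixed = total_a - steps_a * per_step
    return per_step, fixed


# Figures taken from the post above (RTX 3060 12GB).
print(round(sampling_seconds(8, 3.04), 2))       # sampler time at 100W
print(round(overhead_seconds(35, 8, 3.04), 2))   # overhead after a prompt change
print(round(overhead_seconds(25, 8, 3.04), 2))   # overhead on the 2nd gen
per_step, fixed = fit_two_runs(8, 19, 20, 29)    # the two 170W runs
print(round(per_step, 2), round(fixed, 2))
```

The gap between the first and second overhead figure is roughly what re-encoding the prompt with clip on the cpu costs in that setup.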
>>105748352Can you post an example?
that kontext clothing remover is nuts
like I get that you can do better manually but jfc you can feed a folder of photos to this thing and an hour later everyone is nude
it even seems to understand concepts like leaning over, fat vs skinny, front vs back vs side, can do entire crowds of people at once, and knows not to give men tits (but still gives them a big mangina)
world ain't ready for this shit
>>105748352neg: changing face, warping face, ugly face, crooked teeth, ugly, asian
even kling will hardfuck a face 1\10 times anon
think of it like baseball, you cant "win" every gen but you try to "win" a lot
disable teacache if you can
remove the characters from the image. change the text from "hotel" to "LDG".
kontext pixel art lora (with perfect grid alignment) first attempt is cooking..
>>105748356
should I use int4 or fp4 models?
>>105748380
Yeah, gimme a moment. Everything is NSFW, so I'll grab some shit off of danbooru, and try to get an example.
I'm thinking it's the FusionX model though, as the normal Wan model keeps things stable. Only problem is base Wan keeps shit really stiff and slow motion (even with the workflow in the rentry).
>>105748400
I'll give those a try. As for TeaCache, I'm not using that. I'm using the LightX2V thing, so TeaCache shouldn't be in that.
>think of it like baseball, you cant "win" every gen but you try to "win" a lot
I know it's gacha, but it definitely seems more like something going wrong than faces just getting bad rolls.
>>105748304Chroma can't do this
>reveal her stomachokay man.. you win
>>105748425
int4 on pre-RTX 5000 cards
>>105748410replace the tiles in the picture with ice and snow.
>>105748412looks good
>>105748425
I'm on a 5070. I also noticed --fast degrades quality even in fp8 models for some reason. Anyway thanks, I'll give it a shot.
how many bakers do we have? excluding the migguman.
>>105748314
it is absolutely worth it. you can fine-tune the speed at the cost of some quality with the cache threshold slider. 0.12 makes it go brr, 0.06 seems like a good spot. in general, there is a loss in quality, that's just how it is. the installation can be a huegpain tho, gl. but go for it - I get a 25 step gen in like 8 seconds.
>>105748437
Not bad actually, at least for quick conceptualizing, you could be fooled into thinking this was an actual magazine screenshot of Earthbound
>>105748442yw anon, post speeds if u get it to work
i wonder how much worse my 3060 is compared to a 5070
>>105748417try vanilla wan2.1 and see if it fugs up
are you img2v or text2v
i have NEVER gotten good results with text2video
>>105748450shes pretty
>>105748437the image is in the style of a nintendo 8-bit videogame.
made the sprites less detailed, neat
>>105748422Can't do what?
>>105748442
use fp4 model, it's more accurate than int4
>>105748470
>try vanilla wan2.1 and see if it fugs up
See >>105748417
>I'm thinking it's the FusionX model though, as the normal Wan model keeps things stable.
I'm doing i2v, by the way. t2v doesn't interest me at all. I don't see the point in fucking around with prompting the perfect scene/character/etc for the video on top of all the action when I can do that better with whatever image model I want and start with that.
>>105748482Elizabethan engravings featuring basic 4chan themes
>>105748437
>looks good
thx. once i'm happy with results i will share. but might take a few days of experimental training because it's so slow.
okay, this is pretty interesting:
replace every character with Miku Hatsune.
neat that it can pick up on each sprite/character, and distinguish them from the buildings.
>>105748503
>t2v doesn't interest me at all. I don't see the point in fucking around with prompting
nta
a valid point
>>105748510replace every character with a pixel art version of Miku Hatsune.
>>105747995
>https://files.catbox.moe/t775eh.png
does this work for anyone else?
>>105748507I'm pretty sure that gen was made with Chroma
>>105748503
>t2v doesn't interest me at all.
agreed, i don't really get the point of t2v and never even downloaded or tried the models
replace the red car in the middle with a teal color car driven by Miku Hatsune.
the model is very good at picking stuff up. could have done the red truck but didnt. although that got a color swap.
>>105748380 >>105748400
Fuck it, now it's "working" with the shit I was trying to use from danbooru as an example.
Here's one of my fucked outputs.
It's NSFW so head's up before you click.
https://files.catbox.moe/ysdpof.webm
change the location to Akihabara, Tokyo in pixel art style. keep all the characters in the same location.
cool.
>>105748540what really cooks my noodle is SOME of the lora will work with both img2v and t2v simultaneously (despite being trained for specifically one or the other)
if i had a stronger pc build i would try to train as well...
>>105748591and it gets even better.
change the location to a convenience store in the desert, in pixel art style. outside the store is a sign that says "SNEEDS FEED AND SEED". keep all the characters in the same location.
>>105748580the warble in the face is what happens on my machine if i turn teacache up too high (usually 1.5\1.75x)
for simpler cartoon subjects i can crank to max and usually have no issues
the pubes were gross kek
>>105748599
>if i had a stronger pc build i would try to train as well...
rent an A40 on runpod! it's $0.40 per hour
i have two 3090s at home but do all my training that way, especially in the summer, since it doesn't heat up your house
>>105748609nice to see another pixel art enjoyer
>>105748609s a v e d
i plan on getting banned from \vr\ with this image kek
the jannies there are such fuggin dicks
there we go
change the location to a convenience store in the desert, in pixel art style. outside the store is a sign that says "SNEEDS FEED AND SEED". Below that sign is a sign that says "(formerly chucks)". keep all the characters in the same location.
>>105748422i had to censor her vajayjay but yeah it can lol
>>105748627i'll look into it
if i can get around 30-40 videos cut\cropped up properly
i can go public and release a team rocket rainbow hair wanvideo lora :)
>>105748641
regular Flux even kinda sorta gets closeish for that matter on the exact same prompt
>>105748623
I'm not using teacache though. I'm using the LightX2v lora/NAG self attention shit (at 0.6 str as mentioned in the Rentry when using the RifleX model).
>the pubes were gross kekFair enough, it's just an example of the fucked face. Though to be fair I kinda figure Power would have much more going on down there.
Change the location to the surface of the moon, in pixel art style. Keep all the characters in the same location. In the background is the Earth in pixel art style.
kontext is really good at detecting elements based on prompts and also transformations/styles, pretty fun
>>105748641>>105748660Original pic in question was made with Chroma though. Flux nor Kontext know the style, and if you're editing with Kontext it won't gen sex toys or panties flash.
The image is projected on a CRT TV screen in a man's bedroom. A man wearing a suit is holding a SNES controller connected to a game console, and looking at the screen.
>>105748692chroma so good ("sad but true" from metallica slowly cueing in. fuck I hate metallica). I mean glad we have it.
>>105748464
I think I've got it working. Seems to look ok, and almost twice as fast as the fp8 checkpoint, nice.
https://imgsli.com/MzkzNTYw
>>105748775
speed? can you post the full workflow so i can do a test, i wanna compare int4 vs fp4
>>105748241 (OP)more konoha chads!? spoon feed me
promotions...denied.
A man is folding his arms and looks upset. the background is white.
>>105748810my promotion...gone...
>>105748787
https://files.catbox.moe/bcv0uh.png
32 seconds, about 1.25it/s @150w
lora: https://civitai.com/models/721039/retro-anime-flux-style
the green cartoon frog is sitting in a bean bag chair and watching TV in his bedroom, wearing a red shirt and blue shorts. On the TV is a sunny beach.
>>105748308
>yeah, why does it even do that
Terrorists could communicate with image metadata.
>>105748991the green cartoon frog is sitting in a bean bag chair and watching TV in his bedroom, wearing a red shirt and blue shorts. On the TV is a sunny beach. keep the same expression.
there we go, now it kept the img source face.
>>105749003lawnmower toes
>>105748521That's more like it, pretty cool
>>105749023ok, now it's better.
>>105749042except his neck was bad
now it's decent.
>>105749032
1024px, training on rented gpu since 24G isn't enough for that
>>105749042 >>105749049 >>105749003 >>105748991
4 pepes and NONE of them are using crt but laggy flatpanels
how is he supposed to play vidya like that???????
What do you mean you deleted my 10 TB folder of nsfw ai gens?
>>105749075nice. openpose?
>>105749099Chroma spamming gens until it hit it right
>>105748810>upsetLooks like a movie villain.
>>105748352
The MPS Reward LoRA merged into FusionX has a known problem with changing facial appearance. You should probably be using Lightx2v instead.
>>105748422>>105748507kek, the first stage is denial...
what's the best way to remove mosaic from hgames
>>105749147if you post pony 1girls the schizoid will be mad at you too :c
>>105748412
Nice, but perfect pixel alignment? I mean, even if you train on high-resolution pixel graphics scaled without filtering, Flux will interpolate when generating, as far as I know?
I've never come across a pixel lora which didn't need nearest neighbor scaling on the results.
>>105749231
I'm not too fond of the very high contrast in these images, but they have a very non-ai look to them, well done.
>>105749163>kek, the first stage is denial...That's not true!
>still waiting for the based China man to release radial attention
We WILL escape 5 second hell
>>105749147
Well heil there, how you doing fräulein?
>>105749246/LDG/ ladies & gentlemen
change the screen to show whatever horrible abomination the rest of the users(shitters) come up with here and its 1:1 with actual reality
>>105749239ill try some muted ones next
>>105749188
it's cause people don't know how to train them. you have to have your pixels perfectly aligned and all same scaling (common is X4 or X8)
you can't just shove a bunch of randomly scaled pixel art into the training dataset, and you also have to disable bucketing or rescaling during training
yes there will be some minimal noise from the VAE but if you trained it well it means after rescale the result will be almost identical to the gen
>>105749246Chroma really doesn't want to output glowing brakes...
>>105749062
Where are you training? I have some soon-to-expire Colab credits that I'd happily burn on a few Kontext loras
>>105749322>>105749246Oops didn't mean to quote
>>105749312
picrel after 0.25x and 4x nearest, almost identical to the gen
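The "0.25x then 4x nearest" check can be sketched in a few lines. This is a minimal pure-Python version on a 2D list of pixel values, not what any image editor or training script actually runs. It assumes the image dimensions are divisible by the scale factor and samples each block's top-left pixel, where a real nearest-neighbor resampler usually picks the block centre. All function names are hypothetical.

```python
def downscale_nearest(img, factor):
    """Keep one sample per factor x factor block (the block's top-left pixel)."""
    return [row[::factor] for row in img[::factor]]


def upscale_nearest(img, factor):
    """Repeat every pixel factor times horizontally and vertically."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def roundtrip_mismatch(img, factor):
    """Fraction of pixels that change after a downscale/upscale round trip."""
    rebuilt = upscale_nearest(downscale_nearest(img, factor), factor)
    total = len(img) * len(img[0])
    diff = sum(a != b for r1, r2 in zip(img, rebuilt) for a, b in zip(r1, r2))
    return diff / total


sprite = [[0, 1], [2, 3]]             # tiny 2x2 "pixel art"
aligned = upscale_nearest(sprite, 4)  # 8x8 image on a perfect X4 grid
noisy = [row[:] for row in aligned]
noisy[1][1] = 9                       # one stray pixel, like VAE noise

print(roundtrip_mismatch(aligned, 4))
print(roundtrip_mismatch(noisy, 4))
```

A mismatch near zero means the gen is sitting on the grid; a large one means the pixels are off-grid or inconsistently scaled.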
>>105749312Good stuff, sounds like you know what you're doing!
>>105749239
left is
>pos: muted color, pale color, flat color
>neg: saturated, colorful, neon palette
im sure i can find some more tags to push it further
>>105749367
What model is this? The style reminds me of Shigenori Soejima of Persona fame
>>105749326
on runpod. i can afford it easily; the more difficult part is dataset preparation, because flux controlnets are so dogshit. this is the first test method: i did img2img + depth controlnet with some loras i trained in the past, but if i set the controlnet too low it changes the image too much, and too high it fries the image.
chicken and egg problem, i need good examples to train this task, which don't exist
>>105749390
https://huggingface.co/Laxhar/noobai-XL-1.0
Chroma's lack of knowledge of even the most basic anime-adjacent characters like vocaloids is driving me fucking crazy. I hope the chinks will try to have their way with it and train it properly. This is suppossed to be Kagamine Rin
>>105749413random example from my training dataset which i painstakingly seedhunted. this is really the worst part of it.
Are there any LLMs made specifically for image2prompt? Bonus points for being able to run within comfy.
>>105749185Funny enough, that one was made with an illustrious realism model that is so shit fucked that it's pretty much pony but with exceptionally worse prompt adherence.
wow truly 10/10 tool
>>105749584>generated caption:this is a image submitted by a frogposter devoid of any artistic merit and has been created using (now depreciated) Ai image generation technology, it should not be prompted for by anyone, ever again, and may God have mercy on your soul
>>105749584Why not joycaption?
>>105749539
2009 tumblr called, they said "reblogged"
>>105749624Oh that worked. Thanks.
has anyone used musubi-tuner to make a lora for chroma here? if so, do you mind sharing your configuration
Kek I tried one more i2p tool and it just fucking died. 5 minutes btw.
>>105748511Are you using a film lora?
>>105748511>britney spears face
we got anything that uncensor kontext yet?
>>105749511
you want an LLM with "vision" capabilities. you can go the ollama route or you can use joy caption, both have comfyui nodes. I had decent success with minicpm-v (5.5gig) but it's very basic, doesn't know a thing about artists. it's ok tho for upscaling when you can't be bothered to write a prompt. joycaption is a whole lot better esp. for niche/nsfw content, 15ish gb tho.
and high contrast in the negs but still not as faded as i want
>>105749702how 9secs? do you have image prompt?
>>105749756images from scratch? looks cool
>>105749702imagine the smell
>>105749809censored to hell
So what's the best shit for generating high fantasy portraits and landscapes? It used to be Fooocus. Is it still?
>>105749816
if you find an uncensored vision-capable llm that isnt called joy caption let me know lol
>>105749809
If I wanted to incorporate this into an existing workflow, is there a way to unload the LLM after it is done so it doesn't hog VRAM for the image model?
>>105749822any gemma 3 abliteration
the problem is that abliteration makes the models retarded
>>105749822Try the WD14 image tagger. Ollama sucks ass.
>>105749777
https://files.catbox.moe/bxjvug.png
to prompt for longer vids just increase the generated frames. the looping problem doesnt affect most actions.
>>105749809>artistdont upset him now he finally left ;3
>>105749756Lowering the contrast seems to have removed the 'fringing' in the outlines, I prefer this but it's all subjective.
Hey, got a question. I'm using Forge and diving into ControlNet.
The preprocessors are good to go, but I still need to hunt down the models.
I've got some NoobAI models, but I'm after the SDXL ones. I saw a CIVITAI link with 50 ControlNet models featuring the same ballet ballerina image, but downloading all 50 nameless models feels off.
Is there a cleaner source for these?
Another thing, do any depth v2, gold depth, or similar models work in Forge?
What's your take on Forge?
If I want to tackle more 'complicated' projects, is the UI solid?
>>105749584
don't waste time with those models, use gemini or chatgpt 4o or claude, they have better image vision than those 8b llama finetunes
>>105749826
comfyui ollama, no, because you load the model with ollama via cmd (but maybe there is a way..?), joy caption comfyui, yes. got an 'unload model after you are done' option.
>>105749850yeah I hate ollama too. right I have WD14 on my other comfy install. no idea what it does with non anime stuff tho
>>105749894
>https://huggingface.co/xinsir/controlnet-union-sdxl-1.0/tree/main
Download the promax union model so you don't have to download a bunch of controlnet models individually.
>>105749850
It works ok. It can't recognize styles. This pic is a direct feed of wd14 into an image generator. I recommend taking the wd14 output and pruning it, then adding your own style prompts.
https://files.catbox.moe/815btc.jpg
I tried asking ollama about freemasonry and it completely hallucinated the founder with a made up name. It also said Albert Pike had nothing to do with Freemasonry, then when pressed explained how involved he was.
for >>105749914
>>105749511
The best one is not local, it's Gemini (uncensored and free from the API)
You can easily hook it up in ComfyUI: just modify the code of any node that connects to an API so it connects to Gemini instead (and use the appropriate token)
>>105749511
https://github.com/pythongosssss/ComfyUI-WD14-Tagger
>>105750040
for booru tags?
>>105749919
Thank you! I saw this before but thought it was for ComfyUI, not for Forge.
I'll give it a shot. Do I need two Python dependencies? One for Forge and another for this? Thanks again!
>>105742711 >>105742655
>it works
Hell yeah time to make some abominations
>>105749969
ok I just made a key and signed up and shit and wow. is there a weekly/daily/monthly token limit?
>>105750125>>105749809funny to see how gemini outputs this overly-analytical wall of text, in comparison to how simple the prompt was:
>A black and white engraving print by English satirical engraver and cartoonist William Hogarth in the year 1600.>A 35 year old prostitute woman with large breasts is sitting on stairs in an alley looking at viewer naughtily wearing a dress with short skirt. She's lifting own skirt to show her white silk panties. Fleshlights, sex toys, and dildos litter the stairs around her.>The background is Gooner Lane in London, an alleyway notorious for prostitution and alcoholism. A ruined slum is visible in the distance. Fine text at top says "Gooner Lane".>>105748641what style prompt did you use?
I find I get best results by researching specific artists on wikipedia and then following this formula:
>media used
>school of art/artistic movement
>artist name
>era
>qualities, style details, etc
I've started learning more art history just to try and find good styles that chroma recognizes.
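The formula above is easy to turn into a small helper if you want to swap artists in and out. This is only a sketch: `build_style_prompt` is a made-up name, the example values are placeholders, and whether a given model actually responds to this ordering is the poster's experience, not something the code checks.

```python
def build_style_prompt(media, movement, artist, era, qualities):
    """Join the formula's fields in order, skipping any that are left empty."""
    parts = [media, movement, artist, era, *qualities]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Placeholder values for illustration only.
style = build_style_prompt(
    media="black and white engraving print",
    movement="",  # left blank on purpose: unknown fields are skipped
    artist="William Hogarth",
    era="18th century",
    qualities=["fine cross-hatching", "satirical"],
)
print(style)
```

The scene description would then go after the style block, the same way the prompt quoted earlier in the post puts the medium and artist first.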
how do you pronounce "gguf" ?
I call it double G oof
>>105750125
>is there a weekly/daily/monthly token limit
You can see the limits per day and rate when you hover over the models on AI studio. It is mostly free.
>>105750202Yes, well one could prompt Gemini to condense the info into a paragraph and it nails it too.
>>105748641lmao
>>105750125I wonder how that prompt style works when fed into flux. SD images come out basically the same with WD14 prompter, because it overloads it so much, it just werks.
>>105750126French woman? Looks like Eiffel tower in the background
>>105750252
Does flux use asterisks in its syntax? It should be fine probably.
>>105750202here, just a little vision/prompt enhance thing. I'm impressed. also, that gen is super cool. is that william hogarth again?
>>105750252
even flux has its limits lol but here, condensed. flux and chroma can work with that np. I mean you can feed t5xxl novels but that's pointless.
>>105750276IDK I just started messing with it because someone made a styx lora, and it's only available on flux.
>>105750289>use british english.
reminder you can use a quick image stitch to get two images to interact without two image sources in a workflow:
man on left (does action) with man on right, etc. anime girl, object, whatever, just identify each and it works.
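For anyone wondering what the stitch step amounts to: it is just putting the two sources side by side on one canvas so the prompt can say "left" and "right". Below is a minimal pure-Python sketch on 2D lists of pixel values. It is not the ComfyUI node's implementation; `stitch_horizontal` and the padding behaviour (shorter image padded at the bottom with a fill value) are assumptions made for illustration.

```python
def stitch_horizontal(left, right, fill=0):
    """Place two images side by side, padding the shorter one at the bottom."""
    height = max(len(left), len(right))

    def pad(img):
        width = len(img[0]) if img else 0
        rows = [list(r) for r in img]
        rows += [[fill] * width for _ in range(height - len(img))]
        return rows

    return [a + b for a, b in zip(pad(left), pad(right))]


left = [[1, 1], [1, 1]]   # 2x2 "image"
right = [[2, 2, 2]]       # 1 row, 3 wide
for row in stitch_horizontal(left, right):
    print(row)
```

The stitched result is then used as the single image input, and the prompt refers to each subject by position.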
Anyone have a workflow for chroma detail calibrated? It's kind of producing results much worse than when I tried v30.
Flux Kontext is perfect at removing black bar censorship in hentai. But it completely screws up against pixelation - too bad.
>>105750423Based. You can use a node like this one so you can do it right on your workflow.
>>105750423but what is the advantage? quicker processing?
>>105749865REEEEEEEEEEE!!!!!
>>105750445ah nice, also you can use queue selected nodes to get that to update without doing the whole workflow.
>>105750465two image source workflow is 2x speed I think, this is default speed, either is fine but I just wanted to see if it works, it does
kek, it works well desu
the man on the left is holding a tall body pillow with an image of the woman on the right.
just took my image concatenate output and tossed it in my img input. this is nice cause I dont even need photoshop to stitch it fast, this is faster.
used image stitch output to show you input vs output:
>>105750401>British "people"
the man on the left is holding a tall white body pillow with an image of the woman on the right on it.
same process
>>105750423post links btw for the flux image stitching comfyui workflow
>>105750117 >>105750465
Flexibility in what you can do with AI. It might be easier to gen 2 images then try stitching them together than to try one regional prompt.
mind you, you can get any objects to interact this way, it doesnt have to be waifus. you could put an image on a vase or painting for example. but if kontext doesnt know a character, you can use this as a workaround for no lora, if there isnt one. want (anime character), just use them as a source.
same process, but with cinderella (nikke)
it's the default kontext workflow plus >>105750445 for quick stitching
https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev#flux-1-kontext-dev-basic-workflow
the anime girl on the right is holding a portrait of the cartoon frog on the left. keep their expressions the same.
even with a cropped image of the girl it did well:
>>105750652and the default output:
>>105750530add rainbow hair
add R on clothes
become slutty
you are rocketnow
Is there a sampler/scheduler combo for the cfg1 Chroma that helps with the baked images or do I just have to deal with it?
the cartoon frog on the left is holding a tall white body pillow with an image of the anime girl on the right. keep their expressions the same.
>>105750570Can it mimic art styles? Like, "draw the character on the left in the art style of the right"
>>105750663>become sluttyhot
>>105750717I think it's primarily for interactions but not sure will have to test more
someone try if it can copy tattoos from one body to another
>>105750401Last british gen
>>105750764Inpainting is probably still better. Completely guessing.
the anime girl on the left is holding a magazine with an image of the anime girl on the right on the cover. The title of the magazine is "LDG".
>drag official comfyui vace v2v mp4 to the webui
>doesn't load the workflow
>already the latest comfyui version
help
>>105750775works on my machine. redownload file, restart comfyui, reboot pc.
https://docs.comfy.org/tutorials/video/wan/vace#1-workflow-download-2
>>105750678how many have you tried? was gonna do the deed and do some grids but we're in the middle of a heatwave. there is a rescale cfg node in the comfy core, maybe try that?
>>105750775works on my machine. redownload file, restart comfyui, reboot pc.
https://docs.comfy.org/tutorials/video/wan/vace#vace-video-to-video-workflow
>>105750802I did all that
>>105750774make one of Asuka reading a book saying "#1 Waifu /a/ward" with pic of Rei
>>105750799I'm just randomly trying shit. But it also gets mixed with the chroma artifacts themselves so no clue.
>cfg rescale
Should I even use that if the cfg is 1? I'll try that in the standard chroma tho.
on average how much time do anon spend on inpainting?
just curious
>>105750821its a bit tricky but simple enough to make asuka with a blank book without a stitch prompt:
>>105750912then a simple shoop does it:
>>105750906days, weeks of my life gone. sometimes more than an hour per gen. dozens of gens per set. it's fun! ..
>>105750880was just an idea, would be nice if it works. I only ran chroma cfg1 once last night and some gens were close to being baked, yeah. euler seemed the least problematic
>>105750906I don't inpaint lmao. I press generate and get 1500x2000 big booba images
and kontext makes it easy to make this stuff:
anime girl is holding a blank white painting with a black frame.
>>105750948or
anime girl is holding a blank white painting with a black frame. keep her expression the same.
one more with this img:
the anime girl is wearing a black business suit. keep her expression the same.
>>105750938I can't imagine that. My 4090 can do kontext gens in 29 seconds. Regular forge gens in less than 10. And the comfyui 5 second video gens in only like 3 mins
>>105750938>sometimes more than an hour per genNigga what are you doing?
>>105750962Fucker. Us Linux + AMD users are the most oppressed. I'd generate an image but I'm messing with i2v again.
>>105750938>>105750963He's got to be overutilizing vram. Going over 90-95% turns GPU based processing into hanging on CPU processing.
is there a big quality difference between vace 1.3b and 14b?
the anime girl on the left with blue hair is waving hello to the cartoon frog on the right. they are standing on a sunny beach. the cartoon frog is wearing a red shirt and blue shorts. keep their expressions the same.
left image is the stitch/concatenate source from adding 2 images.
Hmm, this frame seems better.
>>105750979excuse the potato quality
>>105750979
I can't wait for the day a kontext-like model/workflow that's not lobotomized and censored is released.
>>105750962just checked, one inpaint takes 4 secs, hyper lora-ed sdxl on a 3090. I just want things to be perfect and I don't always find the exit. I know, its silly, but w/e. usually tho, a few mins per gen including krita stuff. bla
>>105749147
>You should probably be using Lightx2v instead.
I am. The rentry says FusionX should be a "drop-in replacement" for the default Wan Model when using the Lightx2v workflow, and mentions that Lightx2v should be dropped to 0.6 strength.
I have it set up like so.
FusionX gguf > Lightx2v (@0.6str) > General loras > PatchModelOrder > TorchCompileModelWanVideo > WanVideoNAG > Apply RifleXRoPE WanVideo
Here's a screenshot of the main bit of the workflow. Maybe someone can glean something from it. Not pictured are the video combine nodes.
>>105750992the clothes remover lora is surprisingly effective and thats a day 1 lora, we can fix all the censorship bs, the model itself knows how to pose characters/people or do diff stuff
the cartoon man with a white face on the left is standing beside the cartoon frog on the right. they are standing on a sunny beach. the cartoon frog is wearing a red shirt and blue shorts. keep their expressions the same.
kek, if I didnt say white face it made it a generic guy
>>105750906i don't really inpaint i just slap the text and logos i want on there with krita then run that bitch back thru noob-inpaint to blend it
i spend more time erasing fucking extra fingers than i do anything else probably
nothing more infuriating than getting exactly what you want but a hand is a little messed up or some shit
>>105750979This is really amazing. Flux is pretty shitty at doing raw gens, but feed it stuff and it can work with it very well. Recognizing the style, positioning, and filling in the details. Great work!
>>105751009ai still struggles with hands in 2025?
>>105750906I haven't done manual inpainting in ages.
the cartoon man with a white face on the left is swimming in the ocean near the cartoon frog on the right who is on a fishing boat. the cartoon frog is wearing a red shirt and blue shorts. a red cooler with beers is at the front of the boat. keep their expressions the same.
you could theoretically generate 1 billion pepes a day with this model with random text.
>>105750996it's the fusionX model, nigga
use the regular Q6 wan2.1 instead. Then manually add whatever loras fusionX has in its merge. Remove one by one
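The "remove one by one" approach is basically a leave-one-out test. A minimal sketch of how the runs could be enumerated, assuming placeholder LoRA names and strengths (they are hypothetical, not the actual FusionX recipe):

```python
from typing import Dict, List, Tuple

# Hypothetical names and strengths; substitute whatever the merge page actually lists.
MERGE_LORAS: Dict[str, float] = {
    "lora_a": 1.0,
    "lora_b": 0.5,
    "lora_c": 0.5,
}


def leave_one_out(loras: Dict[str, float]) -> List[Tuple[str, Dict[str, float]]]:
    """Return one (dropped_name, remaining_stack) pair per LoRA in the stack."""
    runs = []
    for dropped in loras:
        # Keep every LoRA except the one being tested in this run.
        remaining = {name: s for name, s in loras.items() if name != dropped}
        runs.append((dropped, remaining))
    return runs


for dropped, stack in leave_one_out(MERGE_LORAS):
    print(f"without {dropped}: {stack}")
```

Gen each stack on the same seed and prompt; if the artifact disappears in exactly one run, the LoRA dropped in that run is the likely culprit.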
>>105750994I'm curious how much faster a 5090 is over a 4090. Anyone out there do a comparison? I always get demoralized when searching AI advice or tutorials because 100% of them are made by in comprehension me Indians. The stereotype is so true lmao.
the cartoon man with a white face on the left is wearing plate armor and holding a sword and shield, near the cartoon frog on the right who is holding a spear and a shield. the cartoon frog is wearing a red shirt and blue shorts. they are standing in a grass field in the medieval era. keep their expressions the same.
>>105751027You can use comfyui dynamic prompts.
https://github.com/adieyal/comfyui-dynamicprompts
You can set them to increment through each section of your dynamic prompts to make repeatable imagesets. You run into the same problem as early attempts at doing this with text to image though: you lose coherency across frames.
(warning porn)
https://files.catbox.moe/hocp44.jpg
>>105751034Wtf I just got promoted
>>105751031Yeah I know that.
>>105748417
>I'm thinking it's the FusionX model though, as the normal Wan model keeps things stable. Only problem is base Wan keeps shit really stiff and slow motion (even with the workflow in the rentry).
Regular Wan outputs slow motion shit even with RifleX.
>use the regular Q6 wan2.1 instead. Then manually add whatever loras fusionX has in its merge. Remove one by one
That's what I'm doing now.
the cartoon man with a white face on the left is wearing plate armor and is kneeling on the floor, near the cartoon frog on the right who is sitting on a throne and wearing a gold crown and red robe. they are standing in a throne room in a medieval castle. keep their expressions the same.
not even the flux pepe lora was this effective.
>>105751045and it's just using image stitch node as a source (generate the 3 nodes then drop it in your image source)
oc
>>105751063Little bit too banana shaped for me
>>105751032techpowerup claims about a 30% performance increase.
FP64 1200 GFLOPS vs 1600 GFLOPS
https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889
https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
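Quick sanity check on the "about 30%" figure, using only the rounded FP64 numbers quoted above. This is spec-sheet arithmetic, not a gen-time benchmark, and image gen normally runs in lower precision than FP64, so treat it as a rough ceiling at best:

```python
# Rounded figures as quoted in the post above (GFLOPS, FP64).
fp64_4090 = 1200.0
fp64_5090 = 1600.0

speedup = fp64_5090 / fp64_4090
increase_pct = (speedup - 1.0) * 100.0

print(f"{speedup:.2f}x, {increase_pct:.0f}% increase")
```

With those rounded inputs the ratio works out to roughly a third, which is in the same ballpark as the 30% claim.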
>>105751039I specifically used the combinatorial prompts node, control after generate: increment, autorefresh: yes, and structured it like this. It tells the node to go through them 1 at a time, adding each of the tag sets to your base prompt. The only problem is you need to close the tab or something, because cancelling leaves it at the last seed increment.
@{tags1,|
tags2,|
tags3,|
tags4,|}
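For anyone unsure what the increment behavior amounts to, here is a minimal sketch in plain Python. It is not the comfyui-dynamicprompts code, just an illustration of stepping through tag sets in a fixed order and appending each to a base prompt; the function name and inputs are made up for the example:

```python
from itertools import cycle, islice
from typing import List


def cyclical_prompts(base: str, tag_sets: List[str], count: int) -> List[str]:
    """Build `count` prompts by walking the tag sets in order, wrapping around at the end."""
    return [f"{base}, {tags}" for tags in islice(cycle(tag_sets), count)]


# Hypothetical base prompt and tag sets, one prompt per queued gen.
for prompt in cyclical_prompts("base prompt", ["tags1", "tags2", "tags3", "tags4"], 6):
    print(prompt)
```

Because the order is deterministic, rerunning from the first position gives the same sequence of prompts, which is what makes the imagesets repeatable.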
>>105751063OUHHHH SAG EROTIC
>>105751016it still makes mistakes at odd angles
an open palm or fist is usually fine
a "cute" hand pose with the fingers splayed is also good, but when the fingers start to occlude each other like picrel it tends to forget to draw the visible knuckles for the occluded fingers and starts blending them together instead. stuff like holding a cigarette or a pencil still seems to give it trouble.
i usually throw "4 fingers 1 thumb" in the prompt but it's not entirely foolproof.
one more, with a diff image + stitch
the anime girl on the left wearing a black swimsuit and black blindfold is standing beside the cartoon frog on the right. they are standing on a sunny beach. the cartoon frog is wearing a red shirt and blue shorts. keep their expressions the same.
cute!
keanu
Can someone with kontext try getting the jewelry/watch/grill from this image onto a new character, I've had really good results with kontext otherwise.
>>105751085added tall girl and short frog to change the proportions, works:
>>105751116
I'd try inpainting. That's too low quality. You're better off finding a hi-res grill/watch, editing it onto a character, then inpainting.
>>105751069Thats just numbers on a chart by brown people "writing" tech articles though. I mean real world timer on gen time in forge and comfy etc between 4090 and 5090
>>105751116yeah I figured, was just trying to see how far kontext could really go
replace the newspaper the green cartoon frog is reading with a white book. The book has the text "LDG lewd outputs" with a picture of a blonde anime girl below it, on the cover.
>>105751137Stop reading Cunny Limited in public that's not allowed
is there a way to do this kontext imageconcat to replace one character with another from a different image? for example: replace 2b here
>>105751108 with asuka from
>>105750957
>>105751142stitch image node, then say "replace the white hair anime girl on the left with the red hair anime girl on the right"
might work
>>105751063I love it how it sometimes showcases spatial awareness as good as Wan's despite it being an image-only model
>>105751156I wish it didn't make them Cross-eyed so often
remove the black hat of the white cartoon man on the left and replace it with a sombrero.
>>105751169remove the black hat of the white cartoon man on the left and replace it with a sombrero. give the white cartoon man a curly moustache. change the coffee cup to a beer bottle.
what a neat model.
>>105751173also, added "keep the expression the same."
retains the face.
>>105751173replace the green cartoon frog with a slim anime version of Miku Hatsune.
unlimited potential with this model for edits.
>>105749796scratch?
>>105749883unsure what you mean by fringing but i agree low contrast fits the style more
>>105751117It's a real performance metric though. Anything else and you need someone who owns both to test.
>>105751196KEK. Nice gens. It's pretty good at keeping the style.
change the text from "Exodia The Forbidden One" to "Miku Hatsune The Chosen One". Replace the image in the center with an image of Miku Hatsune who is smiling.
>>105751222also kontext is an amazing tool for duping text for fonts that seemingly dont exist online, or are impossible to figure out.
>>105751232change the text from "NIKKE" to "LDG". Change the text from "goddess of victory" to "image gen general".
>>105751244change the anime girl's hair color to blonde. remove her hat.
>>105751040
Alright, so it seems like it's something fucked with the workflow in the rentry... and/or my tweaks to it. The workflow included with the FusionX "lora" (literally just each merged lora loaded one by one) doesn't have the face issue.
>>105751256change the anime girl's hair color to red. remove her hat. she is facing the camera directly.
so many edit possibilities and things you can do.
>>105751269
>so many edit possibilities and things you can do
I still can't get it to make goatse wear pants. Is it just a prompt skill issue or does it have no idea what's going on?
>>105751269That's a completely different anime girl though.
>>105751275I didnt include "keep the same expression" which is important, like keeping all the pepe faces the same. there is a lot of flexibility.
>>105751276Nah, I mean the hairstyle and length is completely different, her outfit is only vaguely similar, and her eye color is different as well.
>>105751273I'd manually paint over his asshole, or photoshop him holding an orange. Great test lol.
>>105751287the model does much better with a full body reference, otherwise it has to guess what their figure is like or whatever.
ie: the anime girl is sitting in a beach chair reading a book. she is wearing the same swimsuit. keep her expression the same.
>>105751344KEK. Also it did very good. Shit is going to get scary when this software proliferates. Everything online is going to be fake.
>>105751351Normalfags still don't know about this stuff and Indians are still stuck using very limited free models because they're poor and brown. Its a golden age of autists with expensive gear doing whatever they want like how the whole internet was until the late 2000s. It won't last. First time some boomer spams a bunch of AI CP everywhere the free wheeling Is over.
Lands of Lore 1 lora, a little over halfway through training
the anime girl is standing on a sandy beach with an ocean behind her and palm trees nearby. she is wearing the same swimsuit and holding a tropical drink. she is smiling.
pretty good considering anis is so thick you couldnt see her bottoms in the source image so the model had to guess.
>>105751370better booba, other one had artifacts
>>105751359>First time some glowie spams a bunch of AI CP everywhere the free wheeling Is over.ftfy
>>105751265The FusionX Recipe Workflow uses a different weight for the MPS Reward LoRA because of complaints about it fucking up faces in i2v. The GGUF and LoRA versions of FusionX haven't been re-merged with the lower weight.
>>105751359well it's like roop/reactor. people could make deepfakes with it, but as long as people arent retarded and spam it everywhere, things will be fine.
>>105751359Yup. I'd prefer normies get off the internet, than it getting locked down, which will happen anyways.
>>105751385No she just has quad boobs because she is superior.
>>105751332I drew some clothes over him, changed the prompt a little, hope it works
stay hydrated ldg!
the anime girl is standing on a sandy beach with an ocean behind her and palm trees nearby. she has large breasts and is wearing the same swimsuit and drinking a bottle of water. she is smiling.
emphasizing large breasts seems to keep them gacha-tier.
the anime girl is standing on a sandy beach with an ocean behind her and palm trees nearby. she has large breasts and is wearing the same swimsuit and drinking a bottle of water. a blonde anime woman with small breasts nearby is looking down at the ground, dejected.
kek
>>105751429the blonde anime girl on the bottom right is holding a white sign that says "IT'S OVER" in scribbled font.
what a fun model. all off a swimsuit photo.
>>105751395
Good to know. That being said, using base Wan and loading the models in sequence with the weights used in the workflow still results in fucked up faces. Same issue with MPS turned off.
Maybe CausVid and AccVid are causing issues with Lightx2v and the NAG shit, even though Lightx2v is turned down to 0.6
>>105751359All it takes is one malicious retard doing it to poison the well, or an especially spiteful retard pissed at AI to learn how the shit works to spam some celeb's twitter with genned porn.
>>105751450the anime girl on the left is holding two blue milk containers with the text "MILK" on the container, with both hands.
>>105751395
>The FusionX Recipe Workflow
where? In the rentry?
>>105751480No, on the page for FusionX.
https://civitai.com/models/1690979
>>105751458AI takes the surveillance state to its natural conclusion. Embedded hardware IDs in your DSLR, PC, phone, and all files you use are auto-tagged with who made them, modified them, when, and how. A digital ID to access the "official" internet, while the "unofficial" has as many taylor swift furry gangbang compilations you want.
>>105751471remove the anime girl on the bottom right. the anime girl on the left turns away 180 degrees facing away from the camera.
actually impressive, wan could rotate stuff too
>>105751488so which lora actually caused face shift?
>>105748529Prompt info is right there if that's what you're asking.
>>105751521MPS Reward is generally regarded as the one that shifts faces. It's supposed to do other "useful" shit with motion, but the drawback is the face thing.
That's not MY face issue, per se, but for other anons it is.