Discussion of Free and Open Source Text-to-Image/Video Models
Prev: >>105748241

https://rentry.org/ldg-lazy-getting-started-guide

>UI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
reForge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Models, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com
https://tensor.art
https://openmodeldb.info
>Cook
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>WanX (video)
Guide: https://rentry.org/wan21kjguide
https://github.com/Wan-Video/Wan2.1
>Chroma
Training: https://rentry.org/mvu52t46
>Illustrious
1girl and beyond: https://rentry.org/comfyui_guide_1girl
Tag explorer: https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage | https://rentry.org/ldgtemplate
>Neighbors
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/celeb+ai
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
I'm lazy, what node do I use to automatically concat the input and output images?
suspicious collage of api nodes and the naughty arts
I haven't been able to get this the way I wanted, so I'm gonna post the sketch and hope someone here can get it right. Going to bed soon, good night you spergs
>>105751566
KJNodes > Image Concatenate
>>105751613
fuck yeah thanks anon
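For the curious, this is roughly what an image-concatenate node does under the hood. A minimal sketch assuming ComfyUI-style IMAGE tensors ([batch, height, width, channels], floats in 0..1), not the actual KJNodes code:
[code]
import torch
import torch.nn.functional as F

def concat_side_by_side(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Match b's height to a's (interpolate wants [B, C, H, W]), then join widths.
    if b.shape[1] != a.shape[1]:
        new_w = max(1, round(b.shape[2] * a.shape[1] / b.shape[1]))
        b = F.interpolate(b.permute(0, 3, 1, 2), size=(a.shape[1], new_w),
                          mode="bilinear").permute(0, 2, 3, 1)
    return torch.cat([a, b], dim=2)  # dim 2 is width in [B, H, W, C]
[/code]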
the anime girl is standing and holding a book that says "LDG posting for dummies" in scribbled font.
>>105751659
nice artifacts in the background sis
>>105751664
the output wasn't the same size as the input so it did some outpainting, yeah.
the man with a cigarette is reading a book with the text "why sequels suck", a picture of the joker from batman is below the text.
so close.
>>105751697
Have you tried loras yet?
>>105751706
only tested the clothes remover one, it works
>>105751712
stock image places BTFO
>imgur.com/a/flux-kontext-dev-vs-pro-vs-max-uBazRlo
ohnonono localsisters. We are cucked
>>105751734
>BFL betray :O
>>105751712
Now to automate this without quality loss. Imagine getting stock datasets without any watermarks
>>105751734
Yeah, they changed the naming convention. I'm positive pro should've been the dev version, and dev the schnell one. BFL's models are cucked AF now, and they only work as intended via API, which is sad.
>>105751712
so what about quality loss? does it leave the rest of the image in that example untouched? what is the max size for the input? this is obviously REALLY BAD (t. man from gettyimages and basically every big photographer working for a large agency lulz)
the man with a cigarette is reading a book with the text "why sequels suck" on the front cover.
was easier to generate a blank book, then prompt the text on the book.
the man is holding a blank white book and is upset.
put the text "Joker 2 script (with rape)" in black text in a scribbled font, on the white book.
>>105751712
you can do it with any LLM cleaner, but this is probably better since you can mass-remove watermarks instead of marking them one by one
the anime character in the image is in a vegas casino playing blackjack. he is wearing a brown bomber jacket, white polo shirt and blue jeans. keep the expression the same.
zawa...
so don't laugh at me.... Well, I guess you can, but I'm still learnin i2v shit.
Turns out, my fucking Wan face problems was the fucking resolution.
832x480 works just fine, 640x480 and things start getting mushy, 480x480 and things just shit the fucking bed. As far as I'm aware, you need to keep one of the values close to 480, so I'm not entirely sure how I'd get a 1x1 ratio video. 540x540? Like am I aiming for the highest pixel count within the 832x480 range that still has the ratio I want?
Also, how would I go about upscaling the videos with the workflow in the Rentry? From what I understand, you'd do that before you interpolate so you're interpolating the large images instead of upscaling the entire video.
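Back-of-envelope for the 1:1 question: keep the 832x480 pixel budget and snap the side down to a multiple of 16 (a common latent constraint; check the rentry for what Wan actually accepts). A sketch:
[code]
import math

def square_res(w=832, h=480, step=16):
    # Same pixel budget as w*h, snapped down to a multiple of `step`.
    side = int(math.sqrt(w * h))  # sqrt(399360) ~= 631
    return side - side % step

print(square_res())  # 624 -> try 624x624 (640x640 is only slightly over budget)
[/code]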
>>105751831
>Turns out, my fucking Wan face problems was the fucking resolution.
https://rentry.org/wan21kjguide#supported-resolutions
the anime character in the image is in a vegas casino playing blackjack on a green blackjack table. he is wearing a black dress shirt, and blue jeans. keep the expression the same.
it can even colorize images, neat
>>105751831
I do 640x480 for faster gens; with q8 and the multigpu node for extra vram if needed, it works fine. 832x480 is best but takes slightly longer. use the light2x lora, gens are EXTREMELY fast with it. the rentry in the OP should have it and the workflow.
>>105751841
I've been using those for the most part, so I'm aware of that. However, that's kind of limited for most compositions, and I figured there'd be a little leeway as long as things are kept mostly around 480p.
>>105751844
Been doing all that too.
with the anime image, also good result:
>>105751773
>>105751755
>so what about quality loss?
It's obviously not perfect, but for stuff like lora training it's mostly fine, as long as you mix other pictures in there so that the training won't pick up on these too much.
>does it leave the rest of the image in that example untouched?
The rest of the image stays mostly the same, with a small loss from going through the vae (I think). Ideally you want to mask just the modified parts into the original pic.
>what is the max size for the input?
Whatever flux kontext can generate.
>>105751779
What are those? Never heard of them.
>>105751859
I love how kontext can dupe a font even if the font itself doesn't exist online, you can copy it.
here is a fun test case:
change the text from "Indiana Jones" to "GAY QUEERS". Change the text "fate of atlantis" to "buttsex of AIDS". Remove the man with the hat on the left.
>>105751902
and a more common test case, Miku:
change the text from "Indiana Jones" to "Miku Miku Miku". Change the text "fate of atlantis" to "i'm thinking Miku". Remove the man with the hat on the left and replace them with Miku Hatsune. Change the lava in the picture to the color teal.
>>105751912
much better Miku, prompted it a little clearer.
change the text from "Indiana Jones" to "Miku Miku Miku". Change the text "fate of atlantis" to "i'm thinking Miku". Remove the man with the hat on the left and replace them with Miku Hatsune, with long twintails and her traditional outfit. Change the lava in the picture to the color teal.
>>105751922
okay. now the second set of text is proper. I'll gen more in a bit.
>>105751887
thanks. the other anon is referring to dedicated object removal tools like the old lama cleaner. quite handy and fast when you just want something gone, like a watermark or w/e
What's the total filesize of all the Kontext shit?
>>105751819
>>105751849
I've always despised this manga's art style
holy shit you can do a billion edits in one go, seemingly.
change the text from "Indiana Jones" to "Miku Miku Miku". Change the text "fate of atlantis" to "i'm thinking Miku". Remove the man with the hat on the left and replace them with Miku Hatsune, with long twintails in Miku Hatsune's outfit. The lava is replaced with green leek vegetables. Change the background to a music concert. A spotlight is on Miku.
>>105752036
That's impressive. Now if only AI could spell.
Can I use the Gemini API node in comfy also for reformatting my dementia esl prompts into an AI-compliant wall of text?
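If the comfy node is a hassle, the same idea works as a plain script. A minimal sketch using Google's google-generativeai package; the model name and instruction wording here are placeholder assumptions, not a recommendation:
[code]
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

draft = "1girl casino blackjack bomber jacket keep expression same"
resp = model.generate_content(
    "Rewrite this rough draft into one detailed, grammatical image prompt. "
    "Return only the prompt:\n" + draft
)
print(resp.text)
[/code]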
>>105752043
it usually one-shots it but can sometimes take a few gens. but it is able to copy the font/style, even gradients, which is impressive.
>>105752051
there we go. now it's a proper game.
this one is trickier but kontext actually can do it:
change the red text from "Zelda" to "Miku". Change the text "link's awakening" to "Miku's Awakening". Change the text "DX" to "39". Replace the egg in the center with Miku Hatsune.
I'm so impressed with this model. it's like wan, really good at what it does.
>>105752072
a->b reference
is there a way to do a style transfer between two images in Kontext?
Like I want to put the face on the right onto the drawing on the left.
The model is extremely cucked though. Nice, but could be better.
>>105752102
clothes remover decensors everything and that's a day 1 lora. if you want lewds you can get lewds. if you happen to get nips that are off, just do a fast inpaint with any SDXL model with booru training.
>>105752145
>clothes remover decensors everything and thats a day 1 lora.
and it got nuked on civitai so no one will know it ever existed, the ecosystem can't thrive when the biggest site prevents you from hosting your loras
>>105752102
There's a checkpoint merge on civitai which uncucks it about as well as any flux base can be
>>105752152
anything good online is there to be found if you look for it, like when reactor for face swaps got "banned" and rehosted (still up)
>>105752166
>just find the video on rumble bro
no one is gonna do that, we're all on youtube even though youtube is becoming more and more shit
>>105752172
in any case, all you need is a lora once or an extension once, it will never stop working unless you git pull and break something (then you can just revert or use a previous version). reactor will work forever on my system for a face swap, it won't just stop working cause it's 100% local.
>>105752079
Unironically use NAG and go for a big nag_scale value like nag_scale = 15, and it starts to listen to you when you do that
>>105752153
says it's clothes remover, which is on hf, and a different lora called reveal.6
any idea where to get the latter as a lora?
>>105752208
is there a default 2 img workflow? cause I don't know how to set all that up
>>105752224
there is
https://files.catbox.moe/g40vmx.json
>>105752221
idk but it works pretty well
https://files.catbox.moe/hdw76t.png
why aren't GGUF loaders in base comfyui? seems like a standard feature
make the character from first image sitting on a sofa while maintaining the same artstyle, (he has the exact same pose as the character in green square:2)
It doesn't work, my neggas
kontext can't do ref image 1, ref image 2 well
>>105752263
Why not include that he is leaning on his own hand
>>105752227
how does the top image affect outputs? I got the bottom image shaking hands with itself instead of the top image girl.
>>105752284
wait, image resize at the top was set to 0/0 width and height, is that to disable it?
>>105752291
disregard that, I'm wondering how the top interacts with the bottom image in this workflow
>>105752271
The objective is testing whether the model understands the pose of the reference image and then recreates it.
If I carefully described the pose, it would defeat the main purpose.
>>105752284
add "clone, twins" to the negative prompt
>>105752255
I agree, at this point GGUF is well established as a format, so I can only suppose that comfyanon is being lazy and/or salty over something with GGUF; he is really petty.
Overall it's shameful that the core nodes cover so little of basic usage that you need custom nodes just to do something as basic as combining random prompts from lists.
>>105752295
>im wondering how the top interacts with the bottom image in this workflow
that workflow also stitches images, but it does it in latent space; it's a different method and it also works
https://www.reddit.com/r/StableDiffusion/comments/1lnckh1/comment/n0ev4qe/
>>105752300
seems to be working, the other girl is showing up in the preview
>>105752318
show a screen of the workflow, hard to help you when I can't see what's going on
>>105752323
it's fine now, negative prompts worked
>>105752340
oh yeah, I misread what you said, my b, and yeah, cool that it seems to be working for you. have fun with that model, I think at this point we definitely need a rentry that lists all those little tricks so that we can use Kontext Dev at its full potential.
>>105752351
definitely, as it's a very neat tool like wan is.
the anime girl is wearing a white tanktop and a black miniskirt. keep the same proportions and expression.
>>105752186
Where do you find reactor nowadays? Torrent?
>>105752375
That doesn't look like a miniskirt sister. It looks like very short pants
>>105752384
https://github.com/Gourieff/ComfyUI-ReActor
the nsfw stuff can be commented out to fix the filter, otherwise github bans it.
>>105752392
this one turned out better, now it's a pleated skirt.
alright, pretty happy with the first version of this kontext pixel lora. it still struggles with some photorealistic images and scenarios but whatever, i've published it on civitai
need to work on my dataset for the next version
>>105752411
Now give her meaty pencil eraser nips
>>105752419
mods might be in a bad mood, just use any sdxl booru trained model and inpaint.
neat, worked on a very old cammy I made (this was before noobAI I think). although covering cammy up is bad, it's a test.
>>105752412
Cool man. Did you train it locally?
>>105752449
nah, 24G VRAM didn't seem enough, rented an A40 and trained overnight. still quite slow, like 20s/it
>>105752446
>>105752375
So kontext can't keep the input image's artstyle at all?
What happened to the Anon that tried to train his own NSFW kontext LoRA?
>>105752454
I guess it will be the same with Chroma.
>>105752457
it can, I just specified to keep the same expression/proportions. if you don't say that it will take more liberties (diff pose, etc)
>>105752516
civitai.com/models/1731051
>>105752512
for example, it can make infinite pepes with a pepe source image, it will only look like a real frog if you tell it to be realistic. cartoon frog keeps it as a regular pepe.
>>105752522
much appreciated
>>105752522
Strange, it can't be seen on the front page
>>105752522
Impressive result.
>>105752548
maybe cause i just added it. i also ticked "cannot be used for nsfw" but not sure what that checkbox does desu and if i should leave it on
>flux kontext "requires 20GB ram"
>run it on my 3080
>150 seconds
but why is it lowering the resolution of the output image
>>105752578
I ran 2400x1400 images fine on my 12gb in the original resolution, but 2560x1440 appears to be oom. Using nunchaku of course.
>why is it lowering the resolution of the output image
bypass/delete the FluxKontextImageScale node
>>105752522
is there a way to make those pixels bigger? I feel like the effect is a bit too subtle
spent 20 minutes trying to fix hand anatomy to no avail
>>105752604
no but i am planning to train an 8x version at some point (so double the pixel size of the one i linked).
this is really my first try and making the dataset is hard, i had to use the flux loras which i had already made to make a synthetic pair dataset
>>105752522
Seems like it struggles with high res images, but that might just be a Kontext issue in general
>>105752617
flux faces are so cursed
>>105752650
yeah, it's only trained up to 1024 on a small dataset of 20 image pairs
Realistically, how long until another kontext variant with an apache 2.0 license and nsfw compatibility is made (a Chinese model)? Do you think the model would benefit from training like Pony/chroma?
>>105752604
>>105752622
desu i realized you can just halve your output resolution. the lora just tries to make 4 pixels = 1 pixel.
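If you want bigger pixels rather than a smaller canvas, you can also snap the finished gen onto a hard grid in post. A sketch with Pillow, assuming the lora's implied grid really is 4x4 as described above:
[code]
from PIL import Image

def harden_pixels(path, cell=4, blowup=2):
    # Downscale by the assumed cell size with nearest-neighbor (kills AA fuzz),
    # then scale back up; blowup > 1 makes each pixel visibly larger.
    img = Image.open(path)
    small = img.resize((img.width // cell, img.height // cell), Image.NEAREST)
    return small.resize((small.width * cell * blowup,
                         small.height * cell * blowup), Image.NEAREST)

harden_pixels("gen.png").save("gen_pixel.png")
[/code]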
>>105752715
Damn, looks nice
>>105752709
you'll be waiting a while. took almost a year to get a flux finetune. pony took 6 months after SDXL released, and illustrious another 7 months after that. the only model comparable to flux we've gotten since flux is hidream, and nobody wants to touch it.
>>105752723
will publish a v1.1 later today, i think the one i put on civitai is a bit undercooked still
kontext is crazy good. i think AI is driving me insane. it's too much omnipotence to handle and it overwhelms me, i think im just gonna rope one day
>>105752739
we're still in the early stages
the future is models that communicate with the user through an llm module to discuss and fine-tune the image, moving away from the black box approach
Does Chroma not know wtf loli/JC/JK or shota is? The characters' ages/heights/tits are all over the place.
>>105752739
it is, but you got the hella weak version. Just go to the bfl website and use Kontext Max, which is their best version, and see the comparison for yourself. Change the settings and see what is really in store for you. Then imagine what this will look like nsfw with proper training and no censorship holding it back.
>>105752523
Seems ok for a general style like that frog one, but it shits the bed once you try a more artsy style
>>105752748
what i've always wanted is some kind of multi stage system, like it gives me choices for composition first, then lets me choose one, and i can further refine subjects, styles, and make little edits until the final product is complete
>>105752759
yo buddy
still gooning?
>>105752727
i guess it's because companies went for txt2vid models and downright abandoned image models altogether until the ghibli meme got them reconsidering. Just look at stability ai and BFL and how they shifted focus to a video model. BFL is still taking a year to release theirs after the announcement. It kinda pisses me off that the best image models are from the west, which censors them to hell, whereas the best video model is from the east (WAN2.1 and others).
what if txt2img becomes so potent and ubiquitous that people stop drawing new content and everything goes to shit in a decade
>>105752412
Nice work my man!
>>105752522
looks more like sprite art than pixel art
>>105752874
a sprite is any 2d element bro, there's no such thing as sprite art, or it can be commonly used to refer to pixel art
>>105752869
Is this clash of clans?
>>105752499
You can train Flux dev on 12gb with ram offloading (32gb is enough).
Chroma is based on Flux Schnell which is smaller than Flux dev, so it will have less hardware demands.
The problem is that the only trainer that supports Chroma at the moment is ai-toolkit, and it has practically zero vram optimizations meaning you basically need 24gb vram to use it.
Once OneTrainer and Kohya add Chroma support, you will easily be able to train loras at 12gb.
>>105752573
It means that it can't be seen if you have NSFW turned on. Civitai did this a while back to prevent celebrities from showing up when you had nsfw enabled; they were hoping this would appease the payment processor so that they could keep both porn and celebrities, but it didn't.
>>105752924
Not bad, but I miss the cake girls enjoying syrup...
https://xcancel.com/bdsqlsz/status/1939562837724315850#m
Imagine they’ve seen how much coomer use Wan2.1 had and they make it full NSFW
>buy 5090 for $2800
>sell my 4090 on ebay for $~2100 going rate right now on used examples
>5090 upgrade for under $1000
y/n?
>>105752765
>first image
>second image
the model won't understand something like that, try to go for "replace the art style of the image with the art style from the other character"
>>105752933
>Wan 2.2
please make it local as well
>>105752944
>Hunyuan went API
>Bytedance went API
>Hidream went API
the SaaS reaper knocks once more...
>video tdr failure
is it over for my faithful 6 GB laptop graphics card?
>>105752949
These were all both open model and API 'pro' version from the get go, Wan is the same.
I would be surprised if Wan 2.2 isn't released into the wild; that said, how much of an improvement can one expect from 2.1 to 2.2?
>>105752970
>These were all both open model and API 'pro' version from the get go, Wan is the same.
no, when alibaba released Wan 2.1, they released their most powerful model (they didn't have something better on their API)
>>105752968
It's time to put Old Yeller down, anon T_T
>>105752973
You sure? Either way, everyone else is an also-ran at this point, it was hilarious to see BFL quietly memory-hole their upcoming video model when Wan released.
Fuck western big tech, the chinks are my best friend now.
>>105752984
>Fuck western big tech, the chinks are my best friend now.
yep, this
>>105752970
>>105752984
https://www.runcomfy.com/playground/wan-ai/wan-2-2
This page seems to have a little information and demo clips available
>>105752994
>30 fps
oof, it's gonna be painful to render, but they are right, 16fps wasn't enough for fast movements
>>105752937
if it wouldn't hurt you and you think you'd make use of it, I'd keep both. how much more expensive would it be for you to get another 24GB of VRAM or a 4090 if you decided to later?
>>105752970
lol, the Chinese count on Americans to think like that. they know making a big release with a small version increment like it's no big deal can crash our stock market.
>>105752968
>tdr
I had 10 of those last week, then nothing this week
it's probably windows fucking shit up
how dumb of an idea is it to buy a KVM switch to easily switch my monitor between my integrated gpu and nvidia card in case i need that extra 1gb of VRAM?
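Worth measuring how much vram the desktop is actually eating before buying anything; a quick check with torch:
[code]
import torch

free, total = torch.cuda.mem_get_info()  # bytes, current device
print(f"free {free / 2**30:.2f} GiB / total {total / 2**30:.2f} GiB")
# run once with the monitor on the nvidia card and once on the igpu, then diff
[/code]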
>>105753039
>lol Chinese count on Americans to think like that.
Go kvetch somewhere else, rabbi
>>105753014
Why not 24 or 25 (film and PAL rates, respectively)? Would at least save some time and be fluid enough.
>>105752937
I'd keep the 4090 so you can still use your GPU while gen'ing, or use it for parallel gen'ing.
NAG on Flux Dev is pretty cool
https://github.com/ChenDarYen/ComfyUI-NAG
>>105753074
That extra 1gb of VRAM could be the threshold difference between higher resolution, higher batch size, etc, so it could be worth it.
It's sad that Nvidia is so cheap that we are doing workarounds to save 1gb...
>>105753074
>>105753152
just put that extra gb to the ram bro
https://github.com/neuratech-ai/ComfyUI-MultiGPU
>>105753164
ram is like 100x slower than vram
>>105753170
not if you only offload a few % of the total model to the ram
>>105753174
yeah ok, fair. i still think though, if i CAN use that extra vram by switching the monitor to the integrated gpu, then i should, since it's "free"
only annoying thing is switching back the cable once you're done generating and want to game or smth
>>105752994
>improved temporal consistency, seamless scene shifts, and support for longer, high-resolution clips.
yeah, that's never coming to local.
>>105753174
That graph is for llama.cpp and an LLM.
When using VACE, does the reference video resolution need to be the same size that WAN supports, like 480P, 720P, etc?
>>105753203
If they were going the API route, that'd be the first thing they'd mention.
>>105753205
>That graph is for llama.cpp
the gguf node on ComfyUI uses llama.cpp, are you this retarded debo?
>>105753207
No, but it might give better results, I can't say.
>>105752968
Install Ubuntu and try again.
>>105753074
A KVM is useful anyway. You could also use it for a virtual machine... or something else.
>>105753243
It doesn't; llama.cpp is not a dependency. The Nvidia driver can already offload to RAM. It's just snake oil.
>>105753289
>debo baits used to be believable
>>105753289
The Windows Nvidia driver automatically offloads to ram unless you tell it not to; the Linux driver never does.
And you should tell it not to, because it is a lot less efficient at doing so than the inference / training program you are using.
Those programs know exactly what they should offload and when, while the Windows driver just dumps whatever it feels like, slowing everything down a LOT.
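The difference in a nutshell: programs move whole idle blocks on a schedule, while the driver evicts arbitrary pages whenever it runs out of room. A toy sketch of the explicit version, not any specific trainer's code:
[code]
import torch

block = torch.nn.Linear(4096, 4096)  # stand-in for one transformer block
x = torch.randn(1, 4096)

block.to("cuda")         # pulled into vram just before its forward pass
y = block(x.to("cuda"))  # compute happens fully on the gpu
block.to("cpu")          # parked in system RAM until it's needed again
[/code]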
Did you guys experiment with Kontext Dev as a simple text-to-image model (it can render images without image inputs)? And if yes, do you think it's worse or better than Flux Dev?
>>105753394
>than the inference / training program you are using
Than a random person making a random node. Llama.cpp has actual code that does CPU inference. ComfyUI offloading doesn't work like that, so the graph is invalid. Make a new one that actually applies to that node and ComfyUI.
>>105753394
do not feed the debo troll, anon
>>105752869
holy based, thanks anon
Is Kontext completely local now? I've been too busy over the weekend to follow, i'm still busy :( but would appreciate a few spoonfeeding mouthfuls to get me going if anyone would oblige, thanks.
>>105752994
>>105753014
>This page seems to have a little information
No shit, it's another fake website giving bullshit information plus using wan2.1 demos, kek. How does this random site have access to the "demo clips" while they're magically nowhere else? Plebbitors are catching more fake info sites:
>https://www.reddit.com/r/StableDiffusion/comments/1lo2pmz/wan_22_coming_soon_modelscope_event_happening_atm/
>>105753493
>One more thing, wan 2.2 is comming soon
>Wan 2.2 open source soon!
What makes him believe that? maybe it'll only be API
>>105753526
No I only gen the same fennec anime girl forever and forever I don't even care to phrase my prompts at this point it's just a word salad no commas or anything else
>>105753526
>What makes him believe that?
https://xcancel.com/bdsqlsz/status/1939634590987289060#m
it's confirmed it'll be open source
>>105753427
I don't give a shit about the graph, I didn't post it.
I've helped friends with Windows get their generations/training not to be glacially slow. The Windows driver's vram offloading is pure shit for AI.
>>105753548
https://xcancel.com/bdsqlsz/status/1939566026703946043#m
there's also Kolors 2.0 that'll be released, but Idk if it'll be local like Kolors 1.0 or not
Wouldn't 30fps Wan take twice as long as 2.1, considering it's almost twice the number of frames?
And what would happen to all these loras trained at 16fps, and all this snake oil like teacache, lightx, causvid, SLG, vace control and so on?
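Back-of-envelope, assuming the usual 81-frame / 16 fps (~5 s) default carries over:
[code]
frames_16 = 16 * 5 + 1        # 81, the common Wan 2.1 clip length
frames_30 = 30 * 5 + 1        # 151 for the same 5 seconds at 30 fps
print(frames_30 / frames_16)  # ~1.86x the frames; attention cost grows
                              # faster than linearly in token count
[/code]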
Am I oom? (12gb)
The prompt was "Change the time of day to midday. The sun is shining brightly on the mountains. The sky is clear, no clouds or fog. Sun is clearly visible."
Top original, middle output in original res, bottom using imagescale node
>>105753548
>Alibaba didn't betray us
best AI company in the world
>>105753595
if they managed to make the model better with fewer parameters then maybe we'll get that speed back, but desu the self forcing lora is doing that job well already
>>105753595
>all these snake oil like SLG
bait used to be believable
>>105753565
those kolors images look like complete slop, but wan2.2 would be nice
Are Flux D loras compatible with Schnell?
>>105753565
>>105753626
kolors 2 is already available via API?
should be easy to confirm if it's capable of non-slop outputs.
>>105753548
>the chinks won again
that's fucking right, take some notes SAI and all you western dogs!
>>105753595
We'll know when they release it. I hope to god they remove the 5 second limitation though. As it's 2.2, I can't see that happening, but we'll see
>>105753603
Based Chinese stay winning
>>105752578
it absolutely does not, get the q8 gguf model.
https://huggingface.co/QuantStack/FLUX.1-Kontext-dev-GGUF/tree/main
>>105752739
AI makes much of the internet fun again, like the early days where most normies knew fuck all and a select few knew about browsers and ftps and so on.
how would you do faceswaps or swapping elements with kontext? ie: character wearing character 2's clothes
kontext fp8, fp8_scaled and Q8 GGUF are all about the same size.
what's the difference between them to the end user (local slopper)?
who's the artist that drew those infamous cheerleader bdsm comics, and what are the odds that flux knows it
with a simple image stitch node and 2 images, and some directions, you can make any historical figure interact!
the man on the left is shaking hands with the man on the right.
if you change the image size to the stitched size it helps with consistency.
>>105753831
it doesn't look like hitler though lol
>>105753834
not all outputs are the same, it can take a few gens to get the likeness right
>>105753834
actually, I forgot the magic words: keep their expressions the same. that helps a lot with keeping the faces the same style.
>>105753854
for example, this is with that added:
>>105753820
akira toriyama
Is there a cheat to looking for optimal lora training settings without having to wait ~1 hour between iterations, or do I just have to train with different settings overnight and compare the next day?
>>105753775
it's not consistent but sometimes it can work
>>105753806
Output quality-wise, fp8_scaled and q8 gguf should be somewhat better, since they use algorithms to scale the weights more effectively so as not to lose as much information on the model's smaller concepts when reducing precision.
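The gist of those scaled formats, as a toy sketch (not the exact Q8_0 or fp8_scaled layouts): small integers plus one scale per block, so outlier weights don't wreck the precision of the whole tensor.
[code]
import numpy as np

def quantize_q8ish(w, block=32):
    # Assumes w.size is a multiple of `block`; real formats handle remainders.
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid dividing by zero in all-zero blocks
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
[/code]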
Can I just add SD3.5 to easy-diffusion models and expect decent results?
>>105753843
This nigga has a white hand, what did Kontext mean by this?
still need to tinker and learn stuff, but stitching and prompts do work:
>>105753806
>what's the difference between them to the end user (local slopper)?
Q8 is the one with the best quality
but yeah, change width/height to the stitched size or you get some funny results. this is with the stitched size.
the man on the left is holding a large white body pillow with an image of the anime girl on the right on the cover. keep her expression the same.
and this is the a -> b, with the stitched image as source (just have 3 nodes and run it solo, image + image + image stitch to an output)
>>105753843
>>105753861
>>105753908
you can see it's not natural at all, I prefer to use this method instead
>basic bitch kontext setup + day 1 lora
>can bring nearly 25 year old internet pics to life and change almost everything about them while they stay looking like that was the real photo like some real fucking Harry Potter magic photo shit
Bros....I think I'm gonna die from dehydration....this is too much power
Looking forward to the increased duration and temporal consistency in Wan2.2. I'd be amazed if they managed to increase it to 30 seconds while retaining everything. That would be mind blowing.
>>105753955
yeah, I have that workflow saved, I'm just trying to figure out how the model treats stuff with one input; that way is definitely better for combining stuff.
I've managed to replicate tags and artstyle correctly, more or less.
There is one problem: the generated images are low resolution (as in, the details aren't good) and I don't know why it is happening. The dataset is good quality but I can't manage to replicate it (example: the details on a piece of armor aren't like the original, and they look bad).
The model I'm training on is illustrious v0.1. it's weird, because it can learn clothes and weird outfits, but the quality just isn't quite there. Adetailer or high-res fix doesn't always fix it.
I've read somewhere that sdxl models are like many models in one and that they maybe need extra settings for high resolutions? Like, they generate things at very low resolution by default, even if they learn the information.
Can anyone smarter than me help me please?
>>105753955
why do I have a feeling that using conditioning (combine) would be a great method to do style transfer? I need to test that
man, kontext is so good for home renovation stuff.
i just took a picture of my kitchen and was like "change cabinets to white", "replace the bin with a kitchen pantry" etc.
so nice to be able to picture how things will look
you can even provide 2nd image of some furniture you found online
>>105754000
couldn't you have done this with regular inpainting...?
>>105754007
>effort
if he had that he'd have renovated his kitchen already
>>105754007
i couldn't be bothered back then, the workflow for inpainting was too clunky and results weren't as good
now you just tell it what to do and it does it immediately without fucking around with a mask or w/e
we're in a new era of digital marketing and branding
>>105754007
Yes, but dude, this is Kontext, which means it must be better, you think all that reddit hype would lie?
>>105754029
Yes, this looks really professional, surely this will sell waifu pillows, very good sir!
>>105754035
controlnets and inpainting cannot do all the shit kontext does. even the stuff it swaps respects layers below it. if you inpaint with high denoise it fucks that up. and controlnets are great but can't generate poses from prompts on the fly. this is a great tool to use in conjunction WITH all that stuff.
>>105754035
honestly, just from trying a few times it is much better. sfw workflows like this are where it shines
we did it, we got forsen to play black flag.
the man on the left is watching a TV screen and holding a game controller. On the screen is the image on the right.
>>105754051change 1993, 1998 to 2006, 2007
>>105754063good idea
also I love how kontext can dupe fonts, if you cant identify a typeface or find the font, how would you shoop it? in a sense it's even better for editing cause you dont need the ttf font or whatever.
and there we go, it gave me 2066 instead of 2006 for a couple gens.
>>105754053My problem with this is that for SFW workflows I'd honestly rather use the superior API options.
>>105753831Got this with the latent concatenation method, it's also doing that not smoth at all vertical separation unfortunately
>>105754098>My problem with this is that for SFW workflows I'd honestly rather use the superior API options.this, if we're only allowed to do SFW stuff locally, then what's the point? the API options are better
>>105754101yeah the 2 image workflow is 100% the way to go for combining stuff, stitching is just a workaround that isnt as efficient, also it can be weird if the stitch is wide and the output is 1024x1024.
>use ai to make small helpful utility nodes for my workflow
>works a charm
so this is vibe coding
>got used to cfg1 + schnell lora
>needed to go back to v40
>those gen times
Like fucking pls integrate these things into the base chroma. It's like shooting heroin.
1600x1600 works on flux1dev on 12gb
what would be the limit for 24gb I wonder, around 3000x3000?
turns out I do like cartoons after all, there's just never been a good one
>>105754247
make the man on the left wear tattered rags
>>105754247
nice, which workflow/method?
>>105754260
>which workflow/method?
that one
https://files.catbox.moe/ftwmwn.json
can I change the lora filename to the trigger word or does it mess with the file?
>>105754323
filenames have no bearing on the contents of the file whatsoever
did multiple gens but the result is the same:
>>105754303
That's more like it
anon, your workflow + post about the schnell lora for speed should be in the rentry; it works well, plus negative prompts are super useful.
>>105754399
so for prompting, the top image input is always referenced as "first image"? is the workflow putting them together? just curious how they work together or how to reference them
>>105754414
also, 31 seconds with the schnell lora, even faster than the 1 image workflow (with no lora)
what happens if you leave the second image blank? just prompt in reference to the first image?
>>105754399
thanks
>>105754414
I don't think "first image" works at all. I think the workflow uses the image that's on the bottom of the workflow as the main image, and then you work with that: "add something from the OTHER character/image", that seems to work better
>>105754424
>what happens if you leave the second image blank?
why would you do that? lol
>>105754438
I mean, what if you want to change a single image without a second reference, or is this workflow primarily for two character interactions?
>>105753974
What resolution are you training at? And are you sure your training images are of at least that size or larger?
>>105754451
no, it works with one image, you just have to bypass the vae encode node at the top of the workflow (I added a note about that)
someone should make a pepe with this expression with kontext just because
>>105754465
I see, very useful workflow; this should be on the rentry page as it's much more useful than the default one, better for two image interactions as well as for speed with the lora.
>>105754471
>When you forgot your dentures at home
the girl is sitting on a beach chair at the beach.
even if she real:
>>105754500
people without teeth look weird, their mouth is sunken in
teeth shape your face a lot more than you expect, you can definitely tell when someone is just gums
>>105754520
also, 8 steps works fine. I tried 20 steps to compare and the gen time is still similar to 20 steps on the default workflow... but this has negative prompts + NAG.
>>105754611
You must be pretty clever. Is this why you are frequenting /ldg/?
>>105754616
Kontext is the perfect model for image shitpost memes, not gonna lie
>>105754611
I would think most people here have a good concept of anatomy, they're extremely critical of it
>>105754668
I love how kontext can replicate fonts and even details like gradients or textures or whatever
like the miku/zelda from before: the font is LITERALLY the same pixel font.
>>105754698
this is a better one.
>>105754461
I'm using these: max resolution at 1152x1344, lycoris, network and convolution at 32 (normal and alpha), lr 0.0003, TE 0.00015, 20 epochs, 5 repeats in the dataset, Huber exponential loss; if you need more parameters please let me know.
If I'm able to replicate the artstyle, tag concepts aren't correctly learned, and if I get the concepts, I get that low resolution thing.
>>105754639
Counting fingers is something everyone can do. Why don't you draw something yourself if you have such good knowledge of anatomy then?
>>105754733
try asking openAI to do this with a lewd lora
>uhh sorry, even though you pay, we can't generate that
open source ALWAYS wins.
>>105754746
do you prompt "change "up your arsenal" to "up your ass" without changing anything else"?
>>105754761
yeah I did something like that
>Replace "UP YOUR ARSENAL" to "UP YOUR ASS"
cute
yes, forsen is the test case for gens because why not, he loves anime.
>>105754746
>open source ALWAYS wins.
we're losing though, BFL is prohibiting NSFW loras on the internet, and I feel it won't be the last company to do that
>>105754808
who cares what they say, I can use the clothes remover lora right now and it works on anything.
just because they say you can't doesn't mean you can't do it. buying a PC doesn't mean you can't torrent. resourceful and smart people will always find a way to do stuff.
>>105754804
better
project diva miku is pretty effective:
please just make new content. it's been days of the same shit
>>105754845
your quip was funny and I apologise for ruining it with a titbit on facial structure
we good?
>>105754851
wait for the kontext shills to get bored and go back to feddit. this always happens when a new model drops.
>>105754851
We're good, buddy boy.
>>105754883
>this always happens when a new model drops.
Wan is still going strong though
>>105754895
because of the recent light2x lora that allows very fast gens for poorfags.
>>105754895
Miku spam of the same input image is over at least
Dunno if anyone's said this before but this one's https://github.com/pamparamm/sd-perturbed-attention NAG node actually works with Nunchaku SVD Kontext quants. The prompt was to replace the character with 2B, without vs with NAG
>>105754895
Wan is pretty much a revolution in local video gen, also it is somewhat uncensored out of the gate and easy to train nsfw for (easy, but not cheap though).
>>105754927
>The prompt was to replace the character with 2B, without vs with NAG
what was the prompt exactly? this is really impressive, it even kept the style, wtf
>>105754932
flux in itself is extremely cucked. i'll never care about any model they release.
>>105754932
kontext is the death knell of local. I like the model but hate what sinister proprietary intentions will be coming after
>>105754932
>Wan is pretty much a revolution in local video gen
true, and I can't wait for the second revolution
>>105754953
did they say it's open source or is the twitter tranny lying?
replace the girl in the second image with the girl in the first image. (nag workflow)
pretty good for a first try
>>105754946
>I like the model but hate what sinister proprietary intentions will be coming after
same, it created a dangerous precedent, and I feel they got away with this because their model is just fun to play with. we were way harsher on SD3's licence even though it's not half as bad as Flux's; we have beaten wife syndrome, as long as the husband is handsome he can get away with abuse
>>105754946
>kontext is the death knell of local
Just how retarded are you?
>>105754982
that guy speaks chinese and he was at the conference, and he confirmed it
>>105753548
>>105755063
>I don't read corpo overreach licences and I don't have a problem with it
sir, this is a /g/