Palate Cleanser Edition
Discussion of Free and Open Source Text-to-Image/Video Models
Prev: >>105737196
https://rentry.org/ldg-lazy-getting-started-guide
>UI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Models, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com
https://tensor.art
https://openmodeldb.info
>Cook
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>WanX (video)
Guide: https://rentry.org/wan21kjguide
https://github.com/Wan-Video/Wan2.1
>Chroma
Training: https://rentry.org/mvu52t46
>Illustrious
1girl and beyond: https://rentry.org/comfyui_guide_1girl
Tag explorer: https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage | https://rentry.org/ldgtemplate
>Neighbors
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/celeb+ai
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
Things are changing too quickly for me to reliably reaction image shitpost.
I've got to think about this.
Blessed thread of frenship
>>105739403there's no way this is a raw Flux gen, did you save it at like JPEG Quality 10 before uploading or something kek
the green cartoon frog is on a fishing boat holding a fishing rod on a sunny day. he is sitting on a chair. keep the expression the same. a cooler with beers is open on the boat.
>>105739426This isn't even the worst deepfrying we've had in the threads. That Bob Ross one was absolutely baked.
Is there a version of Chroma better than v29 yet?
>>105739293nta. i guess, the GGUF version has some problems with longer text, but this is still really cool
>>105739426raw kontext gen
the green cartoon frog is on a fishing boat holding a fishing rod on a sunny day. he is sitting on a chair. keep the expression the same. a cooler with beers is open on the boat. there is a large cartoon shark that is diving out of the water.
>>105739442
>the GGUF version has some problems with longer text,
use NAG, it fixes the text a lot
https://github.com/ChenDarYen/ComfyUI-NAG
Why does Kontext gradually degrade the image after repeated edits? It doesn't look like the kind of degradation you get from repeated VAE encoding/decoding, seems like something else
>>105739463If only the style was just a bit more like the initial pepe
>>105739463revised a bit:
the green cartoon frog is on a fishing boat holding a fishing rod on a sunny day. he is sitting on a chair. keep the expression the same. a cooler with beers is open on the boat. there is a large cartoon whale that is diving out of the water, attached to the fishing line.
>>105739470cause it's changing the entire fucking thing to some extent every time even if you don't notice it, basically. You'd need to be using it with like hard-masking of certain image areas to avoid that, I guess
prompt for multiple people worked well
the green cartoon frog is on a fishing boat holding a fishing rod on a sunny day. he is sitting on a chair. keep the expression the same. a cooler with beers is open on the boat. On the boat are two other cartoon frogs that look like him, one is wearing a white t-shirt, the other is wearing a red t-shirt.
now it's a fren boat/fishing trip
>>105739470
I think they haven't found a way to force the model to change only one region rather than the whole image, desu it's way better than what 4o is doing lol
https://www.youtube.com/watch?v=Ot_aYxptzJ4
>>105739485
looks too clean, add ", while maintaining the same style of the drawing" or "using this style,"
maybe ?
>>105739500
But 4o piss filter degradation would be easy to fix by doing an AdaIN on the image after each iteration I think (using the original image as the reference)
Whatever this is much harder to fix :<
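For anyone wondering what "doing an AdaIN with the original as reference" would look like in practice, here's a minimal sketch. It assumes the drift is a global color cast, so matching each channel's mean and standard deviation to the original is enough. The function names and the toy dict-of-lists image layout are made up for illustration; a real workflow would do the same thing on image tensors.

```python
from statistics import fmean, pstdev

def match_channel_stats(content, reference, eps=1e-5):
    """Shift and scale `content` so its mean/std match those of `reference`."""
    c_mean, c_std = fmean(content), pstdev(content)
    r_mean, r_std = fmean(reference), pstdev(reference)
    return [(v - c_mean) / (c_std + eps) * r_std + r_mean for v in content]

def adain_rgb(content, reference):
    """Apply the statistic match per channel (hypothetical image layout:
    a dict mapping channel name to a flat list of pixel values)."""
    return {ch: match_channel_stats(content[ch], reference[ch]) for ch in reference}

# Toy data: the "drifted" image has a warm cast (red lifted, blue squashed).
reference = {
    "r": [0.2, 0.4, 0.6, 0.8],
    "g": [0.2, 0.4, 0.6, 0.8],
    "b": [0.2, 0.4, 0.6, 0.8],
}
drifted = {
    "r": [0.3, 0.5, 0.7, 0.9],
    "g": [0.25, 0.45, 0.65, 0.85],
    "b": [0.1, 0.2, 0.3, 0.4],
}
corrected = adain_rgb(drifted, reference)
```

This only undoes per-channel shifts and contrast changes, which is why it would plausibly handle a tint but not the structural drift being complained about here.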
Since regional prompting doesn't really work with multiple subjects reliably, how do you guys make do when you want different parts of the picture to have different stuff?
Gen a base image and inpaint each region one by one?
>>105739497the green cartoon frog is on a large fishing boat. he is sitting on a chair holding a beer in a bottle. keep the expression the same. a cooler with beer bottles is open on the boat. On the boat are two other adult size cartoon frogs that look exactly like him, one is wearing a white t-shirt and drinking beer, the other is wearing a red t-shirt and drinking beer.
pretty decent desu
kek
"candid amateur photo of an average 4Chan Local Diffusion General user"
noob can do crazy stuff when you start looking at the different mediums that have booru tags.
>>105739563better cooler:
>>105739576I mean shit, look at this
>>105739576This pic is so baked the stock looks like it was made from the contemporary wooden plastic kek
>>105739576What do your upscale settings look like
>>105739576
why is the background realistic and why does it look so terrible? lol
bros... dont make the same mistake as me and fall in love with one of your gens
>>105739651I seriously hope it isn't the zendaya goof hair you just posted
>>105739651>falling in love with a random insignificant slut>2025Tsk tsk!
>>105739626because, I'm pushing the model to its limits:
>(photo background:1.3), photorealism, by pankichi anko, (by gomasho asuka:0.5), (3d background:0.7), paper cutout \(medium\),masterpiece, absurdres, amazing composition,
I had gens without this artifacting but I think the artifacts actually add to the aesthetic here.
>>105739612have a box
https://files.catbox.moe/vqwhkc.png
>>105739500
ChatGPT has a massive content filter. I bet it's not the model that's doing it at all, but a watermark they placed on it. In reality that model would far exceed what we have locally, but they're not letting us play with the raw model.
Anyways, the Flux inpainting and outpainting models do the same thing. The only node that can fix this in that case is ImageCompositeMasked
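The idea behind compositing with a mask is simple enough to sketch: paste the edited pixels back onto the untouched original only where the mask says so, so everything outside the mask stays bit-identical no matter how many edit rounds you do. This is a toy version on nested lists, not the actual ComfyUI node code, and the function name is invented for the example.

```python
def composite_masked(original, edited, mask):
    """Per-pixel blend: mask 1.0 takes the edited pixel, mask 0.0 keeps the original.
    All three inputs are same-sized 2D lists of floats (single channel for brevity)."""
    return [
        [o * (1.0 - m) + e * m for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]

original = [[0.10, 0.20], [0.30, 0.40]]
edited = [[0.12, 0.90], [0.33, 0.41]]  # the model nudged every pixel a little
mask = [[0.0, 1.0], [0.0, 0.0]]        # but only the top-right pixel was meant to change

result = composite_masked(original, edited, mask)
```

A soft (feathered) mask with values between 0 and 1 works with the same formula and hides the seam.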
>>105739651ive seen that face a million times
>>105739750wai holds its own here, alright I won't totally write off the illustrious models. but lately noob finetunes have been impressing me the most.
I don't understand why WAI is so popular if it's simply a merge.
>>105739651
Now all you have to do is torture some LLM into being your oneitis so I can laugh at you when some sloptuber makes a video on the subject
kontext is amazing.
>change the text in the center from "STARFIELD" to "SAARFIELD". Add an indian man in a spacesuit with his helmet off, with an indian flag on his arm.
>>105739651how bland. but it is always fascinating how others react to gens. I remember sharing a few tattoo girls with an old friend and he enthusiastically pointed out one particular gen, for me it was like a 5.4/10. "can you gen more of her?", etc
>>105739590anon wot are you doing. but yes noob is funky.
>magazine scan, graphite \(medium\), scan artifacts, scan, color halftone, artbook, doujinshi, production art, novel illustration, jpeg artifacts,
>>105739576
i need e621 in il2.0 or 3.5vp if it ever leaks
>>105739831okay, NOW it's proper saarfield.
>>105739850India will be the first country to shit on the moon
>>105739831less is more anon, just saarfield and an indian would get the point across
>>105739846ah I never asked how you make those but yeah.. figures.
>>105739360 (OP)
You got a guy linking your threads in an attempt to raid them. Just thought you should know.
>>105739759noob again. come on. this model is so expressive and has so much style. if local doesn't make any advancements beyond this, I'll still be set for life. what an incredible achievement for a random chinese autist.
>>105739826see
>>105739670
How many steps for the cfg1 chroma?
I feel like the cfg1 Chroma is more schizo than classic versions. I had one pristine gen, but since then it's been giving me mid pieces.
>>105739908doing 20 now, gguf from silverdude. quite a few gens look pretty baked at cfg1, so there's that.
>>105739899ah so it's just prompting. neat.
>>105739937She's clearly not wearing enough safety gear
>>105739966Your work is so fucking cool dude. Genuinely in awe. Would you be able to post a gallery with some of your previous stuff or am I doomed to go back through previous threads? Or just a catbox to see more of your process maybe?
change the text in the center from "STARFIELD" to "SHITFIELD". Change the location to the surface of the moon. On the moon is a sign that says "NO CONTENT HERE".
sign is wrong but the logo is perfect:
>>105739846sick style prompt, it works perfectly with different subject material
>>105739467workflow?? how did you get such good text from 35m?? here's a random 35m gen I just made
>>105740017neat. these are energy swords
>>105740048https://files.catbox.moe/o5gcn3.json
same prompt but i added these
>>105739846i ran the same seed as the previous one but it forgot the character. kinda cool this finetune recognizes her after just 7 epochs. its kind of unstable so i did a merge with two versions which helps a bit
>>105740105based luminachad.
here's more sd35m. wait a second, this model is good?
>>105740132Imagine those eyes
what does it mean when adaptive guidance sets cfg to 1? some gens it triggers at like 10%, does this mean it basically ignores my prompt?
we need models over 50gb and affordable GPUs to match that in vram, until then local models are just toys
what is the meta checkpoint? is it still illu?
>>105740170it will effectively set the cfg to 1 and ignore the negative once the threshold is reached. need to tune it
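To make that concrete, here's a rough sketch of what happens to the prediction. Classifier-free guidance mixes the positive (cond) and negative (uncond) predictions; at scale 1 the negative term cancels out and you're left with the positive prediction alone, so the prompt is still used, only the negative is dropped. The cosine-similarity trigger below is an assumption about how the adaptive threshold is checked, and all names are illustrative.

```python
import math

def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance mix of two noise predictions."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def adaptive_guidance_step(cond, uncond, scale, threshold):
    """Hypothetical adaptive step: once cond and uncond agree closely enough,
    fall back to scale 1 (positive prediction only, negative ignored)."""
    if cosine_similarity(cond, uncond) >= threshold:
        return list(cond), 1.0
    return cfg_combine(cond, uncond, scale), scale

cond = [1.0, 2.0, 3.0]
uncond = [0.5, 1.0, 2.0]

guided, used_scale = adaptive_guidance_step(cond, uncond, scale=4.0, threshold=0.999)
relaxed, relaxed_scale = adaptive_guidance_step(cond, uncond, scale=4.0, threshold=0.9)
```

With the strict threshold the step still applies scale 4; with the loose one it collapses to the positive prediction, which is the "triggers at 10%" case if the threshold is set too low.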
>>105740174yep nvidia has zero competition in this field so they are jewing it out
https://files.catbox.moe/rhfqty.png
>>105740081helps if I attach an image.
does kontext support partial denoise inpaint
like if I wanted to denoise 0.4 does that work
I got really noisy outputs
if this can be done can I get a workflow
>Kontext
>Pos: Straight on. Standing straight up. Both feet on the ground. Remove the spear. Arms down at her side.
>Neg: Standing on one leg. Spear. Looking to the side.
https://github.com/ChenDarYen/ComfyUI-NAG
Absolute magic. We will no longer suffer under the tyranny of 1 off artist images at this rate of advancements.
how come there isn't a torrent of all the celeb loras that were removed?
Or is there another site?
>grab chroma cfg1, it is now fast. add nag, it is now slow again. awesome.
>>105740279
try civitaiarchive and huggingface.
Does Kontext know Wojak style?
based lumina and chroma chads bringing back the kino sovl
Chroma is getting real close. v40 is noticeably better than all the previous versions.
Skateboards are the hands of sports gear.
>>105739709Gippity is the most prompt-filtered API-only image generator by a bigly margin
Reve is the least (it has none whatsoever, their prompt enhancer will not enhance certain stuff, but you can turn it off with a button for any particular gen, they only "hard filter" with output blurring that seems to target explicit NSFW and nothing else as far as I can tell)
>v29 is the best
>v40 is the best
which is it?
>>105740311can it do this in a way that doesn't have "upscaled SD 1.5" tier fidelity, though
What's your NAG settings for Chroma?
>>105740367wtf is NAG, sounds like a node for your GF mirite lads *rimshot noise*
>>105740375Chroma? I've been out for months at this point
>>105740399Furryrock guy decided to do a finetune / architecture tweak (to eliminate unneeded params) of Flux Schnell
which Flux model can I use to run Loras?
gguf? schnell? dev?
prompt: https://civitai.com/images/74972543
>>105740413any generally speaking
>>105740408I like it. Does your workflow use controlnet or lora to get such a good pose?
>>105740380omg dude
>>105740367
0. slow enough as is
>>105740426i'm not that anon lol
>>105740440im too retarded even for 4chan damn
>>105740375your workflow use controlnet or lora to get such a good pose?
>>105740447its alright kek
>>105740279who specifically? There's a ton on there currently
>>105740458some niche Irish celeb
>on there
on where?
civ?
>>105740447it's just genning. you go " FOOT FOCUS, feet, zomg foot, low quality grainy amateur photo of trashy plastic mystery meat something giving you the finger" and voila
I thought that's an entire raw chicken, but nope. Body horror.
have to be very specific when prompting this model
https://files.catbox.moe/oe3tzw.png
>finally figured how to decensor Kontext since the current LoRA is subpar
>requires two datasets
>one dataset is explicit images
>other dataset is a control, same images but with characters/people clothed
>captions are the instruction, so "remove their clothes" or "make them nude"
>need my nude lady dataset clothed, but how to do it...
>!
>what if I run Kontext on the explicit images, instructing it to add clothes
>works like a charm since the model's trained to redirect from NSFW
>1000 images are queued up in a batch run rn
>mfw I'm using BFL's own censorship against it
how do I use this lora on Tensor art?
https://civitai.com/models/1363473/the-walking-back-wan-i2v-14b
>>105740496based thinking fren
>>105740467very meta take on the longcat actually, nice.
>>105740496awesome
>>105740496
>one dataset is explicit images
>other dataset is a control, same images but with characters/people clothed
>captions are the instruction, so "remove their clothes" or "make them nude"
Wait, so you could basically teach it to do anything then, right? Decensor manga/anime, stuff like that? You could just get uncensored manga, censor it yourself with a pixelate filter, then teach it to decensor
>>105740461Is there a way to prompt a character to be "distant" in Illustrious/Noob?
I am trying controlnet + inpainting stuff which works better than just inpainting for illustrious. But it is drawing characters "closer" than I intend them to. Also I am still getting close up of different body parts half of the time but that's better than not getting anything usable at all.
Also, secondary question, is there also a way to minimize these in-painting distortions? It is a lottery, sometimes it is drawing the same background, shifted a few shades, sometimes it is drawing an entirely different background.
>>105740511
Didn't mean to tag anyone sorry, my bad.
>>105740510Yeah. That'd work. You could teach it style transfer too with the right dataset.
>>105740460if it was deleted there's civitaiarchive. Otherwise start stacking that training data or use faceswap
>>105740335
>Amateur photograph, this playful group photograph taken at day time features five young East Asian women sitting on a light brown, wood-patterned floor. In a unique and coordinated pose, they all extend their bare legs straight towards the camera, presenting the clean soles of their feet in the foreground. The women are dressed in various casual tops and dresses and look directly at the viewer with pleasant smiles. The background wall is decorated with framed art that thematically complements the pose, displaying ink imprints of footprints in black and other colors like green and pink, suggesting the photo was taken in a themed studio or for a special event.
>>105740342
>v29 is the best
Straight up autism
>>105740513There was some saas that did this, but it looks like kontext has completely superseded it
>>105740511I am not working with the illustrious models very often. you need to find the right tokens, like 'closeup' in the negative and then of course the right weights. using a positive token/positive tokens is always the best way to do it tho.
inpainting. what can I say. what are you using? comfy? forge? you need to find the right sampler and the right prompt obviously. DDIM is a safe bet. differential diffusion > inpaint model conditioning and inpaint crop and stitch is how I do it. you can also patch the model with fooocus for a more aggressive "make this go away" approach.
let me find a video.. https://www.youtube.com/watch?v=wEd1wPlCBaQ
a pic of my generic inpainting workflow, one of many
wish I could prompt a brain in my head or my depression away
give the girl a japanese female idol outfit. she has a black blindfold on like in the original image.
>>105740578can it do bikinis? May download if it can
>>105740585it can if you prompt it or use the lora
>>105740399Chroma v40
>>105740447Just pure txt2img, a simple prompt can achieve good results
https://files.catbox.moe/uejlhs.png
In the other case, here's how her feet were positioned
>The woman's pose is deliberately provocative; while her bare foot is propped up and prominently displayed towards the camera, she simultaneously raises her other hand to give the middle finger.
give the white hair girl a japanese female sailor anime outfit. she has a black blindfold on like in the original image.
give the white hair girl a black bikini.
using the clothes remover lora but prompted clothes, still works fine
>>105740399>>105740592I suggest looking at
https://desuarchive.org/aco/thread/8816539/#q8826066
(including my replies to that post and other posts on that thread)
and https://desuarchive.org/aco/thread/8816539/#q8831762
For full extent of Chroma's NSFW capabilities
>>105740576not gonna happen. t. battling depression for 30+ years.
the girl is working in an office at a computer. keep her black blindfold on.
>>105740576all we can do is fun things we like anon, if you do things you like like AI or games or whatever then that's a good use of time.
>>105740551I added full body to prompt and close-up to negative, I also drew a smaller mask. I usually use euler or some dpm but tried DDIM as you suggested. This gives me something close to what I want occasionally, though it can pretty much never resolve eye/face level detail now.
I also have no idea what differential diffusion or "patching the model with fooocus" are. I guess I gotta search.
>what are you using? comfy?
Yes.
Thanks for the workflow btw.
Why do some anons on 4chan get SO fucking salty over AI?
>>105740658same reason some people get upset at bitcoin
remove the girl's dress and give the girl a white tank top. her midriff is exposed. she is wearing white short shorts that say "2B" on one leg. She is holding a blue water bottle.
>>105740658>Stop liking what I don't like
>>105740642
normally the denoise is applied equally to the entire area. with differential diffusion (or soft inpainting in forge), the denoise is dependent on the brightness of the pixel, or actually latent unit (not really a pixel I guess). brighter = higher denoise. differential diffusion is in the comfyui core. check out that video, goes into great detail about various things. you can also prepare the image in krita/photoshop/gimp/paint (lol), helps a lot. stable diffusion can pick up on your crude airbrush and you help guide it. like a light cone shining on her, stuff like that. not sure what you want to go for but this isn't that bad I guess? need a 2nd pass obviously.
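A rough way to picture "brighter = higher denoise": at every sampling step the grayscale mask is thresholded, and the threshold drops as sampling goes on. Bright areas pass the threshold from the first step and get repainted the whole way, dark areas only get unlocked near the end (or never). The sketch below just counts how many steps each mask value is editable for. It's a simplification under that assumption, not the actual ComfyUI implementation, and the function names are made up.

```python
def editable_at_step(mask_value, step, total_steps):
    """True if a region with this mask brightness (0.0 to 1.0) may be changed at `step`.
    The threshold falls linearly from 1.0 at step 0 towards 0.0 at the end."""
    return mask_value * total_steps >= total_steps - step

def editable_steps(mask_value, total_steps):
    """How many of the sampling steps actually touch this region."""
    return sum(editable_at_step(mask_value, s, total_steps) for s in range(total_steps))

TOTAL_STEPS = 10
white = editable_steps(1.0, TOTAL_STEPS)  # fully repainted
grey = editable_steps(0.5, TOTAL_STEPS)   # only the second half of sampling
black = editable_steps(0.0, TOTAL_STEPS)  # never touched
```

So a soft gradient at the edge of the mask turns into a gradient of effective denoise strength, which is what blends the inpainted area into the rest.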
>>105740687vnice
>>105740702
I don't use /a/ but I believe it. I think it's because of how hard AI is replacing human animators.
>>105740711They just talk about "soul" and Indians a lot
I heard a /g/ anon saying he could beat me
this is with
>incredibly absurdres, absurdres, highres, masterpiece, best quality, newest, 1990s \(style\), year:2005, magazine scan, horror \(theme\), graphite \(medium\), scan artifacts, scan, artbook, doujinshi, production art, novel illustration, jpeg artifacts, dark, doodle, holographic hallucination, non-web source, original, commission, mixed-language commentary, md5 mismatch, archived source, bad link, colored pencil \(medium\), watercolor \(medium\), spray paint, jaggy lines, painting \(medium\), ink wash painting,
and my regular negatives
not using a 2 image workflow btw, this is just a test:
The girl is shaking hands with the cartoon frog on the right. the frog is wearing a blue shirt and red shorts. The girl is looking at the frog and smiling.
removed the background on both, now it works better:
The girl is shaking hands with the cartoon frog on the right. the frog is wearing a blue shirt and red shorts. The girl is looking at the frog and smiling.
well, 2b is kinda ignoring him, for now.
have the girl in the image sitting on a stool at a bar. the bar counter is filled with beers.
>>105740785this time, slightly diff
have the girl in the image sitting on a stool at a bar. the bar counter is filled with beers. keep her dress the same.
>>105740513Not sure if ass crack is bannable. I don't want a vacation yet.
>>105740793can it reproduce the style of the subject for the background? need to mess w/ this
>>105740803some pretty heavy shit went unnoticed in the past but eh
>>105740592true gangstas only need the t2i damn
Anyone know why faces are melting using Wan 2.1?
I'm using the FusionX model, which I know can "change" faces (I doubt "change" means melting them), lightx2v is at 0.6 str (like recommended), and I'm using the RifleX shit too.
I'm using "A" lora (general NSFW shit), but I wasn't having this issue before. Just seemed to randomly crop up. I noticed it when I gave the "Image Noise Augmentation" node a go, but after I noticed certain features getting smoothed out I bypassed it, so it shouldn't be doing anything.
I'm not really sure what the fuck is going on. It's been persisting after restarting Comfy.
I told her to take her hat off...
Change the glass with ice to a beer bottle. Change the cup of coffee to a glass of orange juice.
the girl in this image is working at a computer in an office, and typing on a keyboard. keep the same expression.
pretty good
>>105740907How many years until we get the ease of using natural language to tell skynet what to edit but with models that aren't censored dogshit?
>>105740910clothes remover lora is all you need to uncensor stuff. and there will be even better ones to come.
>>105740922Change the text to 'black people cost me thousands in property damage and chimp out inexplicably with no provocation'
the girl in this image is working at a maid cafe in Japan. She is wearing a cute maid uniform with a white apron. She is holding a plate with a vanilla cake slice on it. keep her hairstyle the same and blue and yellow hairclip on the left side of her hair.
>>105740936okay, this one is more cute:
>>105740721bruh what the FUCK are your negatives? this prompt gave me a jumpscare
the girl in this image is working at a maid cafe in Japan. She is wearing a cute maid uniform with a white apron. She is holding a sign saying "Welcome LDG!" in scribbled text. keep her hairstyle the same and blue and yellow hairclip on the left side of her hair.
ugh, my qveen
never thought I'd be a waifufag
>>105740637image gen has been surprisingly helpful for it
>Prompt executed in 01:13:01
>>105740962the image is manga style with halftone shading.
when i was younger i could have been so creative with this infinite tool. now i'm just a gooner that can't think to do anything but generate fucked up pornography.
>some mentally ill kpop insect found the thread
chroma knows some anime now
>>105740961you probably left out the other positives from
>>105740414
convert the image to a manga panel with halftone shading, in black and white.
one more! it also understands multiple directions based on an order:
convert the image to a 4 panel manga with halftone shading, in black and white. in the first panel, the girl is dressed as a maid. in the second panel, she is dressed as a panda bear. in the third panel, she is waving hello. in the fourth panel, she is holding a black guitar.
ok the NAG implementation actually works incredibly well for unslopping Flux art styles, it's just the default scale value is far too conservative
default is 5 and didn't do much so I cranked it to 30, seems like a good value. left is without NAG, right with nag and 30 scale. normally you'd need a lora to get this kind of result
>Art by Beatrix Potter
>>105741084
>left is without NAG, right with nag and 30 scale. normally you'd need a lora to get this kind of result
you can use this to get text on top of your images so that you won't have to put in any effort
https://github.com/BigStationW/Compare-pictures-and-videos
>Art by Beatrix Potter
I thought Flux didn't know any artist style
>>105741046more that it recognizes some anime styles. but very character oriented, and have yet to find an artist it knows.
>>105741096It knows a bunch of dead famous artists, Beatrix Potter, Van Gogh, Albert Bierstadt etc
plus a small handful of living ones like uhh Jeremy Mann off the top of my head, probably a few others
pretty bad knowledge overall but enough to be useful for some things
i heard there was /g/ay frog poster around these threads
>>105740375since Kontext dropped I don't give a fuck about Chroma anymore, which is weird because Chroma can do NSFW and Kontext doesn't, but like now I want a model like Kontext, that can do image inputs so you don't have to rely on lora characters anymore, this kind of unified approach is probably the future of imagegen, they nailed that shit
https://higgsfield.ai/soul
>call their model "soul"
>the images are really slopped
what do they mean by this?
purchase an advertisement
>>105740496
1000 IQ move. Godspeed anon
>>105740496
>>works like a charm since the model's trained to redirect from NSFW
the model definitely has some layers that activate and make it do nothing if it sees a NSFW prompt (or image), what if we can locate those protection layers and nuke them instead? maybe the model can do it by itself, we just need to remove the filter
>>105741190omnigen2 failing at similar task
>>105741252The other lora that was uploaded proves its as easy as teaching it nudity/unclothing as a concept, so I'm not sure there's any "censorship" layers beyond the model being totally unaware of nudity. Or if there are, headfucking it with a lora is all you need to do. Case in point, that simple lora that was probably trained on like 30 image pairs given the bad quality totally gets around the problem with a simple dataset and the right captions
>kontext is the best at what it does
>can be trained
What's the catch?
>>105741261
omnigen 2 really sucks so it's not surprising (they improved on omnigen 1 but like c'mon it wasn't hard kek), I think the chinks need to focus on making an architecture similar to kontext, you can't make something more elegant than that, a simple unified model that can do both t2i and r2i
>AttributeError: 'NoneType' object has no attribute 'device'
huh? what they mean?
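That message is plain Python for "something that should have been an object was None, and then code tried to read .device from it". In a node graph it usually means a loader silently produced nothing (wrong model type selected, missing file) and the failure only surfaced later. A tiny repro with made-up names, just to show the shape of the problem:

```python
class LoadedModel:
    def __init__(self, device):
        self.device = device

def load_model(kind):
    """Hypothetical loader: returns None instead of raising when `kind` is unsupported."""
    supported = {"flux": LoadedModel("cuda:0")}
    return supported.get(kind)

def describe_device(model):
    # Blows up here, far away from the real mistake in load_model's argument.
    return model.device

try:
    describe_device(load_model("sd3"))
except AttributeError as err:
    message = str(err)

working_device = describe_device(load_model("flux"))
```

So the thing to look for is whichever upstream setting made the loader come back empty, not the line the traceback ends on.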
can kontext also be used as a txt2img model like omnigen2 can?
>>105741273maybe because it wasn't set to flux (was set to sd3)
>>105741267yeah, I think it's either there's some layers that need to be nuked, or they went for a smarter method and simply trained the model to do nothing when going for NSFW prompts, as if the model believes this is how it should be
>>105739466what you even put in negative for clearer text
>>105741289
I didn't put any negative prompt on that one, NAG acts like CFG, it improves the prompt adherence and the structure of an image without doing anything special
>>105741252It has some nudity in the dataset, it's pretty much impossible to curate it out entirely of whatever massive image set they used to train it. Plus, without some basic nudity, models fail at basic anatomy. Look at SD3.
It's trained against words like "nudity" "remove clothes" though, for sure. It has no clue what those words mean, because I'd wager the dual dataset captioning was carefully modified to exclude it. Like the other anon said, fixing that problem is as easy as teaching it what those words mean by giving it context.
The inference filters they talk about though, that's for the API.
>>105741308
>The inference filters they talk about though, that's for the API.
you can bake a filter into the layers as well, local LLMs like gemma will refuse some of your requests even though there's no API filter in there
Does Kontext have difficulty with 768x768? It keeps zooming in the image on the left
>>105741329you probably did a bad job on cropping the image before adding it to the model, show me a screen of your workflow
>>105741315
>https://old.reddit.com/r/StableDiffusion/comments/1llpsk1/flux_kontext_dev_can_not_do_nfw/n02ompo/
They've been pretty clear on what they did to it, and like I said, the inference stuff is for the pro/api version, and it's based on the frontend (ie an external set of filters, like what happens when you access an LLM via a provider's web based frontend and not the API).
Given they use image pairs and then teach it what to do with those pairings, what they probably did via training was teach it that requests like "remove clothes" or "make them nude" should output the exact same image unmodified, because that's exactly what happens when you ask it to do those things without a LoRA.
>>105741337Using the template Comfydev posted the other day
>>105741342remove the fluxkontextimagescale, this shit is ass
>>105740946That looks great! Thanks for that man.
>>105741337
>show me a screen of your workflow
***catbox.moe
>>105741341the fact that they've got filters on the frontend means they know the training wasn't fool proof either
>>105741341it's here
>Subsequently, we undertook multiple rounds of targeted fine-tuning to provide additional mitigation against potential abuse. By inhibiting certain behaviors and concepts in the trained model,
they added a filter on the layers by doing that, that's why the model acts a certain way when you ask for NSFW
>>105741363
>the fact that they've got filters on the frontend means they know the training wasn't fool proof either
of course, no filter is perfect, that's why for them there's no such thing as enough filters
why do some comfy images end in the temp folder and some in the output folder?
What am I doing wrong?
>>105741227is this with a 2 img input workflow? or how are you getting the pepe?
>>105741346Fixed it, danke
>>105741391he probably stitched 2 images together to get that effect
>>105741402you're welcome o/
>>105741406so shoop pepe into an image then say have the green frog standing over the man on the right?
>>105741408yeah something like that
>>105741346what does it do?
It seems speed is faster if you use it
>>105741369
>they added a filter on the layers
No, they didn't, because then a shitty, low effort lora wouldn't work at all unless you found those layers. By "inhibiting certain behaviors and concepts in the trained model", they almost certainly mean that they taught the model that certain concepts need to return the same image.
It's real easy too, given how a model like kontext works with image pairs. Control dataset (original images) vs matching images with changes made, and the caption describes those changes, which teaches the model how to make image A look like image B.
So for NSFW concepts like nudity, it's as easy as:
>image A is a clothed woman standing
>anti-nsfw caption is : make her naked
>however, image B is NOT a naked woman. It's an exact, unchanged copy of image A - a clothed woman standing
>model is taught the concept that any request to "make them naked" should return the same image untouched
Rinse and repeat for every NSFW concept. The downside is that a LoRA can override those changes easily, which is what we've already seen happen, albeit sloppily for the first attempt
>>105741346why? I'm using it and having no trouble
>>105741342
>Q5
go for Q8 anon, and if you don't have enough memory, offload a bit of that model to the RAM, like this, you don't want to miss that quality
https://github.com/pollockjj/ComfyUI-MultiGPU
>>105741415It uses a shitty intermediary upscaling method and it's inaccurate too. There are plenty of better nodes to scale an image to around the res you want for kontext
>>105741388preview nodes save to temp, save nodes save to the output folder
>>105741421
12GB and I'm at 11.5GB usage. It's 6s/it for me now, I'll see how much slower it gets with that.
>>105740726put both characters in a scene together where they shake hand. The girl in the black outfit looks at green frog and smiles
>>105741413>By "inhibiting certain behaviors and concepts in the trained model", they almost certainly means that they taught the model that certain concepts need to return the same image.
yeah I agree with you, they didn't specifically target layers, but this kind of training ends up producing some special layers that activate to cuck the output. That's what happens with LLMs, and there people managed to locate those layers and remove them
https://huggingface.co/blog/mlabonne/abliteration
>>105741441I remember hearing that there was some attempt to ablate layers on flux, but it ruined output quality or basically did nothing, was that a thing?
>>105741450well, flux was never trained to return nothing when you asked for a NSFW prompt, it was censored because it simply didn't know the concepts, so you couldn't "uncuck Flux" since the model simply didn't know, it didn't refuse
>>105741450I think that was when people thought the 'censorship' for flux was in the text encoder or something
>webui still leaks memory
AniStudio save us
>>105741459This is false. The team actually messed up safety tuning the distillation of the model.
If you give Kontext a NSFW image and prompt, it's able to understand the concepts and gen from it. You have to use NAG with strong strength and use the lcm sampler for it to work better.
>>105741506>This is false. The team actually messed up safety tuning the distillation of the model.
I was talking about Flux dev, that one doesn't output "nothing" when asking for NSFW, it'll always try to make it (but it can't since it doesn't know the concepts)
>If you give Kontext a NSFW image and prompt, it's able to understand the concepts and gen from it. You have to use NAG with strong strength and use the lcm sampler for it to work better.
really? care to show an example?
>>105741506would love a catbox to try with
>>105741440welp, it's something
>>105741522it was never trained on multi image inputs or multi concepts, so I'm not surprised it's bad at it, maybe XVerse will be the alternative for that?
https://bytedance.github.io/XVerse/
Can't wait till deepseek has kontext and we can run it locally
>>105741486>hire mcmonkey
>refuse to incorporate swarmui elements into the main code
What did cumfers mean by this
>>105741506>If you give Kontext a NSFW image and prompt, it's able to understand the concepts and gen from it.
my theory is that they trained Kontext with a lot of NSFW so that it's great at anatomy, but they also finetuned the model to do refusals, if we manage to remove that refusal trigger we could unlock the full potential of that model
>Put him on a bed with white sheets
LOOOOOOOL
>>105741530>I'm not surprised it's bad at it
Considering it wasn't even built for that purpose and it managed to roughly follow your instructions, I'd consider it a humble but satisfying start. Promising for future models.
>>105741540I'm guessing ego got in the way.
Is it worth it to train the text encoder or do you only train the unet? What am I supposed to expect from training the text encoder?
>>105741551mission failed successfully
>>105741167There's potential, but for me it's currently not as fun as Chroma. There's just no easy way to make the model learn complex poses or do the NSFW stuff that Chroma does (even the unclothed lora is far behind).
It's a fun model, don't get me wrong. But until it is properly NSFW tuned it is still just a toy to me.
Also we still have to rely on style LoRAs because Kontext is slopped, I'd be more hyped for a Chinese model that is comparable to Kontext.
>>105741208Alright, that model looks like it's got coherence on point for photorealism (at least as good as Flux dev) but its text is bad and it's also not uncensored like Chroma. Still would be a fun model to play with, shame it's closed source.
>>105740496Couple of things I've found that might help fellow trainers. A kontext dataset for NSFW works best like this, with the example being how to make a "remove clothes" style LoRA :
>control dataset of clothed women>matching target dataset of the exact same images, only they're naked>for this example, you make the set via kontext itself, telling the model to give the women clothing>can adapt that to all kinds of NSFW concepts by "un-NSFW'ing" images for the control set>caption for the pairings in this case would be "Remove her clothing and make her nude", though you can also specify aspects of the body to teach it those concepts too, ie if she has big tits and an innie with red pubes, also caption "Make her breasts large and give her an innie style vagina with thick red pubic hair">a third dataset, single images of naked women with lots of close ups on tits, pussy and ass, with the captions natural language descriptions of the body featuresIt's similar to Wan training in how, with Wan, it's good to teach it new concepts with video, but it's better to also include higher res images alongside it to add more detail and really drive the concept home via the captions.
>>105741544I could be wrong, but maybe the team didn't safety tune the editing stuff on the model on purpose.
I think their motto is that they just don't want people genning NSFW from scratch or transforming it, but if it exists it's free game.
>>105741590kontext knows how to repose figures, all it is "refusing" is to show nipples or genitals. a lora can fix that though. the hard part is generating figures, in a certain style, or pose. at least the model isnt generating random noise if you say "girl bending over" or whatever.
>>105741594>but if it exists it's free game.
yeah, I have a feeling it can be unlocked if we're smart enough to fool the model and make it stop triggering the safety layers, I'm sure it won't take long, the power of coomers is infinite
>>105741590Where do you get datasets like that? Porn?
I love how Kontext can basically copy a font style. If you don't have the original typeface, how else would you shoop it?
>>105740696Watched the video and experimented a bit, haven't really figured a solution but some thoughts before calling it a day:
The Fooocus patch and the inpainting controlnet very much don't work together, gotta use either one or the other. (Can't really decide which is better, or in this case less bad)
Self-attention stuff may provide some improvements in some cases but slows down generation too much to be worthwhile imo.
Not sure why differential diffusion is added in the video during the car/bear inpainting stuff in the middle. Weren't those uniform masks? Though even under similar conditions, it still provides some minor changes to images without any noticeable performance hit, in my testing.
Can't get the hang of "grow mask by", sometimes the value changes the image, sometimes it doesn't. Couldn't really find a resource for the optimal value.
>>105741623original reference:
>>105741631so the letters from the original are duplicated, but the letters that aren't there in the title, still share the same font theme.
>>105741623>>105741631use a stitch node anon
https://github.com/kijai/ComfyUI-KJNodes
>>105741636also other neat details like other layers being unaffected (ie the camels). if I inpainted, odds are those would get fucked up at high denoise.
>>105741642yeah, I got one now: that was a previous gen. here is a good one.
change all the sand to grass. change the water in the pool, to ice.
https://www.reddit.com/r/StableDiffusion/comments/1ln9exc/flux_context_chibifies_all_characters_wtf/
>the ledditors are starting to complain about the manlet effect 2 days after us
we're always ahead of the curve
>>105741616Partially. I use stuff like MetArt, since their photo shoots are high res and the women usually aren't skanks (look up "metart marta e anarti" for example), plus I use nude artist reference images from places like Grafit Studio.
>>105741650also
>remove all the characters in the image
>the afros remain
>>105741623>>105741631Flux has been absolutely insane for text editing and graphic design in general. Just revolutionary stuff ever since it came out.
>>105739545It does work reliably enough for me, perhaps you're doing something wrong? Inpainting does give you more accuracy if you want extremely specific areas though.
>>105741659>marta e anartiholy booba
>>105741670yeah, it does two things we were never able to do successfully
>change the text of an image while keeping the style
>being a replacement of character loras
>>105741674Regional prompting works fine for me too for simple stuff like "one girl has black hair, the other is blonde".
But when the image has multiple different subjects, like 1 girl sitting on a chair near one edge and 2 people dancing near the other edge, it starts slopping and either ignores parts of the prompt or starts shitting out malformed blobs.
>>105741661is that after one gen? look at how it rapes the quality, jesus. wonder if it's adding a watermark or some shit
>>105741704>wonder if it's adding a watermark or some shit
no, I think it's just the weakness of the vae encode/decode process, that's lossy
>>105741704it's generally fine, I think the output image wasn't the same res as the first one, that's why it's a bit off, it's outpainting a bit at the bottom.
>>105741730green cartoon frog is how to reference pepe apparently
>>105741714the issue is that the width and height only work in steps of 16 pixels (because of the vae), so you can go for 1024 or 1008 pixels but nothing in between. I'm searching for a way to resize the image to a multiple of 16 so it won't get resized again afterwards and cause a mismatch
https://civitai.com/models/1725088/clothes-remover-kontext-dev?modelVersionId=1952266
it got nuked again lol
>>105741757i bet the author is doing it to drum up interest kek
>>105741750The Image Resize node from ComfyUI_essentials has a multiple_of option you can set to 16, if that helps.
>>105740496>what if I run Kontext on the explicit images, instructing it to add clothes
Kontext is not good at anatomy, so it won't understand how to properly clothe women in dynamic poses. Better to use controlnet and preferably an uncensored model to clothe them. You run into the issue of the data being slopped once you do this, though it can be mitigated if you use masks.
>>105741774nice, thanks anon
>>105741774>>105741786for those interested, here's how to do it
>you put a Scale Image to Total Pixels node at 1 megapixel (equivalent to 1024x1024 pixels) so that if you go for a giant image you won't get OOM lol
>you add a second node that has a "multiple of 16" so that the image is always on the resolutions the model wants
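For anyone who wants to sanity check the numbers outside ComfyUI, those two steps can be sketched in plain Python. Rough sketch only: the function name `kontext_size` is made up, and rounding to the nearest multiple of 16 is an assumption, the actual nodes may round differently.

```python
import math

def kontext_size(width, height, megapixels=1.0, multiple=16):
    """Scale (width, height) to about `megapixels`, then snap both sides to `multiple`."""
    # 1 megapixel is treated as 1024x1024 pixels, as in the post above.
    target_pixels = megapixels * 1024 * 1024
    # Uniform scale factor that keeps the aspect ratio.
    scale = math.sqrt(target_pixels / (width * height))
    # Snap each side to the nearest multiple (assumed rounding mode).
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h

print(kontext_size(4000, 3000))  # a large 4:3 photo
print(kontext_size(1000, 1000))  # a square image slightly under 1 MP
```

Snapping each side separately can shift the aspect ratio by a few pixels, which is usually less of a problem than the model resizing the image a second time.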
>>105741694This is what I can get. It does need some rerolling and the control over the characters' positions is very approximate. Inpainting would work better. But if you're getting significantly worse results then maybe you can tweak your approach. Obviously, the easiest way is to combine it with a ControlNet, but if you don't have a source image on hand you have to make a custom OpenPose hint image, which could take time.
>>105741782I realize that Kontext is a distilled model similar to Flux dev, has anyone tried removing distillation?
>>105741782I'm about half way through the 1k images I'm using, and for the most part, it's clothing them just fine no matter how they're posing (bending over, spread legs, etc). It does work best with standing poses though. It really doesn't like close ups of genitals without the rest of the body for context, but you save those for the third set I was talking about anyway.
Looking at the images done so far, 2/3 of them are pretty much perfect, pic related.
>>105741810>has anyone tried removing distillation?
to remove the distillation of a model you need a giant scale finetune, that's why the only successful undistilled version of Flux dev is Chroma
>>105741814Yeah, and that's only a thing because of schnell's licensing. Nobody is gonna touch flux dev or kontext for a full tuning.
>>105741811you making them all wear the same clothes or is each one different?
>>105741817and even if the licence was cool, BFL nukes NSFW loras, so they'll also nuke NSFW finetunes, no one is gonna bother wasting tens of thousands of dollars for such a risky project, we'll just have to wait for the chinks to copy the technology and make it uncucked + apache 2.0 licence
>>105741817what if someone does it anyway and releases it into the wild behind 7 proxies? what are they gonna do?
>>105741803Wtf. I've been outpainting and cropping, since when has it been this simple to resize an image?
>>105741826I got an LLM to make a huge (1000) wildcard list of clothing, with sub-wildcards for variation too. As it runs through the batch, it gives each one something different. I'll have to go back and redo a bunch of them though, because some outputs are still naked, topless or unusable for other reasons. In general, it's solid though.
>>105741811perfect? look at that skirt/pelvic curtain combo shit
>>105741421>>105741433Got it working, adds roughly 33% extra time but not the worst, will be great for compositions I've already optimized, danke
>>105741848Amazing work. Is that wildcard list available online?
>>105741845>since when has it been this simple to resize an image?always has been
>>105741862Why don't you make your own? Takes five minutes. Oh wait I forgot people itt are too incompetent for chatgpt even.
>>105741854Literally doesn't matter if they clash. The point is the concept, to teach it to remove clothing. Those are the control images, all that matters is that they have clothes on.
>>105741884You seem butthurt.
>>105741884I don't know enough about clothes to know to ask for all that stuff
>>105740576Best advice I can give you: take long walks. It sounds like 'touch grass' nonsense, but depression is often triggered by stress and nervousness about your life situation, and you have a hard time finding calmness in your head.
When you walk, your brain releases endorphins (and other mood regulators). If you can, walk outside: seeing things pass you by as you walk helps trigger that release more efficiently.
>>105741862Just ask any decent LLM to make nested wildcards for whatever you want, in this case women's clothing. Then tell it how many you want max.
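If it helps, here's a minimal sketch of how nested wildcards get expanded. It assumes the common `__name__` token convention and uses a tiny made-up table; a real list would come from the LLM output, and the `expand` helper is hypothetical, not the code of any particular extension.

```python
import random
import re

# Hypothetical wildcard tables, just big enough to show the nesting.
WILDCARDS = {
    "outfit": ["__top__ with __bottom__", "__dress__"],
    "top": ["a white t-shirt", "a grey hoodie", "a denim jacket"],
    "bottom": ["blue jeans", "black leggings", "a pleated skirt"],
    "dress": ["a floral sundress", "a knit sweater dress"],
}

# Matches tokens such as __top__ (lowercase letters and digits only).
TOKEN = re.compile(r"__([a-z0-9]+)__")

def expand(text, rng, depth=0):
    """Replace every __name__ token with a random entry, recursing into nested tokens."""
    if depth > 10:
        raise ValueError("wildcards nested too deeply (possible cycle)")

    def pick(match):
        choice = rng.choice(WILDCARDS[match.group(1)])
        return expand(choice, rng, depth + 1)

    return TOKEN.sub(pick, text)

rng = random.Random(0)  # fixed seed so a batch can be reproduced
for _ in range(3):
    print(expand("she is wearing __outfit__", rng))
```

Dresses sit in their own list and never reference `__top__` or `__bottom__`, which matches the "list dresses separately" idea.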
>>105741897ChatGPT, I don't know enough about clothes to know to ask for a wildcard list of clothes, help me out
>>105741906>Generate a list of 1,000 unique nested wildcards featuring women's casual clothing. Organize tops nested within various bottoms, and list dresses separately without nesting them with tops or bottoms. Ensure all items reflect a casual style.
>>105741806I do not get it, I get this.
>>105741987You can use the workflow I used as reference:
https://files.catbox.moe/2q8iq3.png
the only possible error I see in your screenshot is that your default should be to tag each region "3girls", and then only change it to "1girl" as needed if a region gets extra girls.
>>105741540have you looked at the two codebases? they are literally written in different languages
>>105740980get a job, buy a decent rig
>>105742122figure it out, fag
>>105742738>>105741553If you're short on VRAM and want to speed up the process, you might want to skip the text encoder. If you're training concepts that seem to be entirely foreign to the model, it might be worth a shot. Otherwise you might just as well end up lobotomizing it if you train the text encoder while feeding it stuff it's already familiar with. You might also want to freeze it if your dataset is small.