Discussion of Free and Open Source Text-to-Image/Video Models
Prev: >>105983779
https://rentry.org/ldg-lazy-getting-started-guide
>UI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassic
SD.Next: https://github.com/vladmandic/sdnext
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com
https://tensor.art
https://openmodeldb.info
https://openart.ai/workflows/home
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>WanX (video)
Guide: https://rentry.org/wan21kjguide
https://github.com/Wan-Video/Wan2.1
>Chroma
Training: https://rentry.org/mvu52t46
>Illustrious
1girl and beyond: https://rentry.org/comfyui_guide_1girl
Tag explorer: https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Samplers: https://stable-diffusion-art.com/samplers/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage | https://rentry.org/ldgtemplate
>Neighbours
https://rentry.org/ldg-lazy-getting-started-guide#rentry-from-other-boards
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
>>105987645 (OP)>OP collage almost entirely 1girl shit
>>105987645 (OP)blessed thread of frenzone ;3
>>105986829
>but it does matter for vision and video
tell me more. also how much did you pay for that thing?
Blessed thread of frenship
detail-calibrated collage of hot 1girls in your local area
>>105987672he also refuses to update the neighbors list ;D
>>105987680i like the frenzone! i liek frens!
How does Lora rank of lightx affect T2I, since there is no motion involved?
>>105987066pay up, or do i still owe from last time?
>>105985680VERY DISAPPOINTED THIS ISNT IN THE COLLAGE HOW DARE U
So do any of you want to do the Technology jam? Any enthusiasm? Should I set up a submission form? Then at the end of submissions we could have a gallery + collage for all the submissions.
If not, that's ok.
>Update comfyui with today's update
>Get this error on every workflow, even the base comfyui one
I can't run anything, not even load an image, and I have no idea where this error comes from
>>105987752go ahead try, just put enough time for it
>>105987672turn that frown upside down buddy
>>105987752Been gone for a week, what's the idea? Just some gens from Anons around the theme of technology?
I was thinking about something like that myself a few weeks ago, like a "theme week" or weeks.
How'd you handle submissions? Are we supposed to just pick a single image or multiple? Multiple might be a lot to sift through by the end of the submission period.
this general has turned into a circlejerk
>>105987762
>WARNING: Updating Comfy to v0.3.45 WILL break numerous custom nodes due to it using an updated version of numpy! To fix this:
>python_embeded\python.exe -m pip install numpy==2.0.0
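Before and after pinning, it can help to check which numpy the Comfy python actually sees. A minimal sketch, assuming you save it as check_numpy.py (a made-up filename) and run it with the same python_embeded\python.exe; the helper names are illustrative and not part of ComfyUI:

```python
import importlib.metadata


def parse_version(text):
    """Turn '2.0.0' into (2, 0, 0); trailing tags such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_pin(installed, pinned="2.0.0"):
    """True when the installed version equals the pinned one from the warning."""
    return parse_version(installed) == parse_version(pinned)


def installed_numpy_version():
    """Return the installed numpy version string, or None if numpy is missing."""
    try:
        return importlib.metadata.version("numpy")
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("numpy is not installed in this environment")
    elif matches_pin(found):
        print(f"numpy {found} matches the pin")
    else:
        print(f"numpy {found} differs from the pinned 2.0.0")
```

If the version already matches and the error persists, the cause is probably somewhere else.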
>>105987797Nice. Mind if I continue this exploration?
Can you share the prompt? I'll do some more comparisons.
>>105987752>>105987788i wont be participating sorry<3
>>105987752ok sure, whats the topic again? Technology?
>>105987809is 2.0 mandatory for this update? I've quite a lot of outdated nodes that I still use and they don't work with numpy 2.0 or more
>>105987834technology + arms_under_breast + abstract
lol
>>105987822>an attractive young woman in a red dress sitting in a wooden chair at an outdoors garden
>>105987841how does that even make sense?
but okay I will try.
>>105987752Why just have people post images in the thread? Right after the thread is baked, let people know what the current suggested theme is, and let people just post their gens if they were inspired.
>>105987834Do not listen to this retard
>>105987841Technology it is
>>105987848or wait, i guess it's meme instead of arms_under_breasts now? not sure of the autistic rules
>>105987870There aren't any. The three tag idea was a brainstormed idea but it can be saved for another time.
The theme is Technology, and only technology. If it's on-topic on /g/, that's the theme.
>>105987836>>105987809Doesn't look like it's a numpy problem, I did this and still have the same error
>>105987645 (OP)GenJam #1 submission form:
https://forms.gle/ZQMNMTaxGxAZZTAD8
Theme: Technology.
Gen whatever you want so long as it's related to technology.
Time for adventures in EQ-VAE, seems important if it increases training speed 6-7x.
>>105987923o i thought it was the top three not just the top one
>>105987942original image catbox? nice skin texture
>>105987823No worries we know you only gen 1 thing
Some loras don't play well with the lightx2v speedup lora.
It's... interesting... but not what I wanted.
https://files.catbox.moe/kcsvr9.mp4
>>105987967You're using the old lora. You're supposed to update it.
I created a ComfyUI node to pick words at random using NLTK and WordNet to craft the ultimate slop-generating machine.
The biggest challenge was preventing the model from generating humans, it REALLY wants to generate nude women exclusively (tried a bunch of popular models, all do). I'm using ZavyChromaXL as it was the one that worked best with this.
I'm sending the results from my home server to a GitHub repo and I've set it up to autodeploy on push.
I'll produce about 4,000 slop pictures a day until GitHub shuts down my repo (70-110 KB per picture)
https://slop.pictures/
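For scale, the quoted numbers (about 4,000 pictures a day at 70-110 KB each) work out to a few hundred MB of repo growth per day. A quick back-of-the-envelope sketch; the 5 GB budget is a made-up placeholder, not a real GitHub limit:

```python
PICTURES_PER_DAY = 4000          # from the post
KB_PER_PICTURE = (70, 110)       # low and high estimate from the post


def daily_megabytes(pictures, kb_per_picture):
    """Daily growth in decimal megabytes (1 MB = 1000 KB)."""
    return pictures * kb_per_picture / 1000


def days_until(budget_gb, pictures, kb_per_picture):
    """Days until a hypothetical size budget (decimal GB) is used up."""
    return budget_gb * 1_000_000 / (pictures * kb_per_picture)


if __name__ == "__main__":
    low, high = (daily_megabytes(PICTURES_PER_DAY, kb) for kb in KB_PER_PICTURE)
    print(f"{low:.0f}-{high:.0f} MB per day")
    # 5 GB is an assumed placeholder budget, purely for illustration.
    fast = days_until(5, PICTURES_PER_DAY, KB_PER_PICTURE[1])
    slow = days_until(5, PICTURES_PER_DAY, KB_PER_PICTURE[0])
    print(f"a 5 GB budget would last roughly {fast:.0f}-{slow:.0f} days")
```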
>>105987970I just set it up a few days ago. Where's this new lora at?
Also, it might be the loop function I'm also playing with.
>>105987989im far too embarrassed to show my wan lora folder kek
>>105987980I love how instead of using the API to modify the prompt to pick random words you wrote a bastard node.
>>105987989https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v
>>105988019I wanted to have some easy way to block some words and to tweak things like picking words with more word frequency (purely random always generates some word that isn't even a token lmao)
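The blocklist plus frequency-weighting idea can be sketched without NLTK. This is not the anon's node: the vocabulary dict, its counts, and every function name here are made up for illustration, and a real version would presumably pull words and frequencies from WordNet or a corpus instead.

```python
import random


def pick_words(word_freq, count, blocked=(), min_freq=1, rng=None):
    """Pick `count` words at random, weighted by frequency.

    word_freq: dict mapping word -> frequency count (hypothetical data).
    blocked:   words that must never be picked (e.g. anything human-related).
    min_freq:  drop rare words that are unlikely to be useful prompt tokens.
    """
    rng = rng or random.Random()
    blocked_set = {w.lower() for w in blocked}
    candidates = [
        (word, freq)
        for word, freq in word_freq.items()
        if word.lower() not in blocked_set and freq >= min_freq and freq > 0
    ]
    if not candidates:
        return []
    words = [word for word, _ in candidates]
    weights = [freq for _, freq in candidates]
    # random.choices samples with replacement, so repeats are possible.
    return rng.choices(words, weights=weights, k=count)


if __name__ == "__main__":
    # Toy vocabulary with invented counts.
    vocab = {"lighthouse": 120, "woman": 900, "turbine": 40, "zymurgy": 1, "canyon": 75}
    prompt_words = pick_words(vocab, 4, blocked=["woman"], min_freq=10, rng=random.Random(0))
    print(", ".join(prompt_words))
```

Raising min_freq is the knob that keeps obscure dictionary words out of the prompt.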
>>105987989you are using the patch model patcher order node right?
guys is this shit real? have amd actually developed proper software to use the npus for stable diffusion?
https://www.tomshardware.com/tech-industry/artificial-intelligence/amd-unveils-industry-first-stable-diffusion-3-0-medium-ai-model-generator-tailored-for-xdna-2-npus-designed-to-run-locally-on-ryzen-ai-laptops
>>105988035Yeah in the thing we call code you can do things like that.
>>105988042
>no generation speed benchmarks
>can't train
>>105983793>46 participants Ain't no way we're gunna get 46 entries
>>105987953https://files.catbox.moe/qopgf9.png
Hopefully that works, workflow might not be the best, still trying to figure it out.
>>105987969Fuck off you disgusting prick.
>>105988026Yeah, that's what I'm using. I'm pretty sure it's the "WAN Video Loop Args" plugin that's messing things up. I'm doing some more gens to verify.
>>105988068that is why i ask, i have no idea if its good at all
i tried searching for npu in the last few threads and found nothing
>>105987752I want to do it, but I won't be back home until the weekend. Make the deadline until Monday at least please and I'm in.
>>105988042given the power of npus this is likely slower than a 3050 for generation. I think there is already an npu node for 1.5/xl in comfy that does work.
>>105988110would recommend trying the workflow in the OP's rentry guide, but with the updated lightx2v lora
>>105988110
>Camera Control
How well does that work? Will it save me from endlessly rerolling due to Wan never following prompts for camera movements or lack thereof?
>>105988127It was the looping plugin. Without it, it's fine.
https://files.catbox.moe/ctbt68.mp4
>use flux
>it sucks at adhering to camera angle prompts
>try hidream
>surely it'll be better
>nope, same shit
sad
>>105988111
The simple answer is they're much slower than a dedicated GPU. You get ease of access of memory at the cost of compute. It probably has performance worse than a 3060.
>>105988042
>have amd actually
no.
>>105988150 >>105988126i use a 1070 lol
i was mostly asking if they are adding some usable software alongside the other shit
>>105988132Zoom and pan work well. Tilt up and down seem to work only on a very small range before the image fucks up. BTW zoom isn't really "zoom", it's moving the camera. You can see the background go out of focus (as expected) which is moving the camera closer, vs zooming in with a zoom lens.
>>105988181Unless they make it 100% compatible with Pytorch you're forever fucked and always a beggar.
>>105988183Thanks. What about "no camera movement"? That's my most common use case.
With Wan I always insert heavy conditioning to prevent camera movement but with some images it's just determined to move it no matter what.
Something that helps a bit: make sure you're prompting for motion through the entire frame. If your actors' bodies fill up the frame, then make sure you are engineering movement for both the upper body and legs. If you're only prompting for upper body movement, it's more likely the camera will move in to focus on that.
>>105988196There's a "static" option but I have no idea if it forces the camera to be still.
>Make sure both characters entire bodies are visible to the camera for the duration of the video. The camera should be frozen in place. The camera should never zoom in.
>moving camera, closeup, close-up, darkening, camera zoom, camera moving closer, camera movement, camera changes position, camera hones in, camera transitions closer, camera moves in, camera moves closer
>>105988228How do you people still not understand how these models work? They don't work with negations. "Don't draw the elephant" will always show an elephant. "Don't zoom" = ZOOM.
These are video clips trained with captions (e.g. tags). They do not use antagonistic or negation captions, everything is always positive, meaning if the caption has the word, it's supposed to be in the video.
>>105988261Wan literally include a negative prompt by default in chinese you retard
>>105988261Works fine for me, squirt.
>>105988281>Bro the negative prompt is like writing "Don't show me an elephant" in the positive prompt
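What the two sides are arguing about can be shown in a few lines. In the usual classifier-free guidance setup the negative prompt is not "don't X" text inside the positive prompt; it is a second conditioning, and the sampler extrapolates away from its prediction. A toy sketch with plain lists standing in for the model's noise predictions (the numbers are invented, and real samplers work on tensors):

```python
def guided_prediction(cond, negative, scale):
    """Classifier-free guidance: start at the negative-prompt prediction and
    move `scale` times the distance toward the positive-prompt prediction."""
    return [n + scale * (c - n) for c, n in zip(cond, negative)]


if __name__ == "__main__":
    cond = [1.0, 0.5]        # prediction given the positive prompt (toy values)
    negative = [0.25, 1.0]   # prediction given the negative prompt (toy values)

    print(guided_prediction(cond, negative, 4.0))  # pushed past cond, away from negative
    print(guided_prediction(cond, negative, 1.0))  # equals cond: negative has no effect
```

Note the second case: at a scale of 1.0 the negative term cancels out entirely. So if a workflow runs at CFG 1 (an assumption about your setup; distilled speed-up LoRAs are often run that way), a long list of negative camera terms would do nothing in this formula.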
Why the fuck is aniwan's OOM threshold so much lower than standard wan.
>>105988327Because you touch yourself at night.
>>105988327because it's not a gguf (there isn't one afaik) and you're not using the multigpu node
>>105988347
>multigpu node
Can you explain how this works? Can you actually combine the vram of two different GPUs? Can it be two GPUs on two different computers on the same network? I was under the impression you could offload the text encode to another PC's GPU
>>105988365custom nodes are cancer shitheaps
It's actually amazing how much is left to optimize. The EQ-VAE stuff seems extremely promising, I didn't realize how shitty even the new VAEs are.
Bro I got this gen jam in the fuckin bag
Are any of the wan clones in the wancomfy repo on HF worth bothering with or is it all just meems?
Why do you guys miss the SD1.5 days?
>>105988522works on my machine
"squeeze her breasts together" in the prompt seems to be making them grow instead
>>105987672>>105988097>complaining about femalesHere you go fags, this one's for you.
kontext is good at stalin'ing photos desu
with the most overposted normie meme of late:
>>105988567
is it some panasonic vintage view prompt?
>>105988499Back then not everything anime was the same shiny slop style we have now
>>105988499SD1.4~1.5 a magical time.
>People would prompt for different artists in the same prompt, often getting unpredictable and some awesome combinations
>Waifus were made out of renaissance oil paintings
>Making tranime was difficult early on, weebs shitters were forced to be creative
>Inpainting was commonplace. Some users would post high effort gens and landscapes.
>New cool stuff released every day, you learned something new every single day
>Everyone could run it in their toasters (it was a democratic technology), poorfags could even fine-tune Dreambooth (not even just Loras) on regular gaming gpus
>The NovelAI leak resparked the sense of wonder, first time you actually felt AI image gen could replace artfags
>Full rank porn fine-tunes were common
>Emmawatsonposting
>Many of the users in the threads were technical. People back then appeared to have a higher IQ than the current average ldg poster, it was kinda similar to current /lmg/ except for stable diffusion. I feel most of those guys have moved on to other things ever since
>Users were truly helpful
>Greater diversity of gens, not just 1girl big boobs (ironic as models were more limited then)
>>105988597if he meant females, he might have said females
you're the homo posting shirtless dudes
Nostalgia is the mind killer.
>Chroma 46 out of 50
>hands are still a mess most of the time
Grim
>>105988619"1960 technicolor film still of"
>>105988677EQ-VAE is the savior. He's fighting the VAE and it seems to very much underrepresent nsfw, cartoons, anime, etc, and the model has to deal with lots of noise. Basically it's like having a really bad, noisy dataset except at the latents level. In this example you have to see that the model has to reconcile the noise within the VAE itself.
>>105988634
>SD1.4~1.5 a magical time.
*Was a magical time.
With all that said, we are eating good currently with video gen, which was unthinkable about a year ago. I hope Chroma v50 ends up being good. The model is completely unslopped, and it seems to "know" a lot within its distribution in terms of styles. Prompting for different artists and styles without using autistic danbooru tags is something I miss from the old days, it was fun finding things out ourselves. The Lora potential is great, I imagine that merges will be common again. If all goes well, we will be eating really good.
>>105988637>being this insecure about his sexuality that he thinks genning a topless guy makes one a homoPipe down, no-gen.
>wan T2I
>"girl covers her nipples with her hands"
>leaves nipples exposed because nipples is a bad bad word
>"girl covers her crotch with her hand"
>leaves crotch exposed because crotch is a bad bad word and it won't listen
>https://arxiv.org/abs/2507.15857
new bullshit
>>105988855They don't want to admit it but diffusion is better.
>>105988855>having higher loss = betterretard
>>105988629Anon, 1.5 anime is abyssorangemix territory, literally the same waishit we have now
At least there are actually good anime models like noob nowadays
>https://github.com/kohya-ss/sd-scripts/pull/2157
kohya about to have chroma lora training
>>105988881oh what makes it better?
>>105988915Computationally less complex and as this paper shows, requires less training data. 4 repeats vs 100 repeats is pretty extreme and they're already running out of unique tokens for the mega models. I have a feeling a language model designed around the architecture used in video models could be quite good and many times faster and computationally efficient.
Have some of you fiddled around Invoke? How was it?
>>105988042https://www.amd.com/en/blogs/2025/worlds-first-bf16-sd3-medium-npu-model.html
>sd35m
>no comfyui support
>custom model
what the fuck is wrong with them? nobody uses this model, it's not in comfy, nobody will finetune this either. what a waste of effort.
as a 7900XTX owner I want AMD to improve support for actual optimizations like SageAttention and FlashAttention, not this shit.
>>105988949Better than comfy but also worse
Not worth it desu
Turns out undress/uncensor loras need to go after the lightx2v one. Looks so much better now.
https://files.catbox.moe/oxkda7.mp4
What do you guys think of Swarm? Worth using over Comfy? And if not, why?
How can I fulfill my FLIR fetish? Asking here instead of /r/ because they couldn't answer this. I have a fetish for FLIR images plus women or stuff I find hot, but I like the additional layer. How can I make pics like this? AI can do it but no commercial AI accepts material that's explicit or even lightly suggestive, defeating the point. Do I do this in photoshop? Do I use AI?
>>105989050Train a lora lmao poorfag cloud user
>>105989039It's literally a good UI on top of comfy, and you can use comfy from within the interface, what's there not to like? Good inpainting interface, good built in image history viewer, a decent number of features baked in, like regional prompting.
Remember to submit your gens to /ldg/ GenJam.
https://forms.gle/ZQMNMTaxGxAZZTAD8
Theme is technology.
>>105989109yup I'll definitely do that yes sir
>>105989073It always seemed like adding a temperamental layer on top of an already temperamental program. I prefer the controller-cg repo over it but that's just me.
I've had kontext installed for a while now but just started playing with it in order to remove water marks and prepare training data for a lora. Are there any kontext LORAs i should use to enhance the quality in grainy/blurry images, or are there some better prompts i can use that's not just "enhance and make it look high resolution" or whatever?
>>105988731I have barely any idea what this means but the picture shows a clear difference. can it still save chroma? wouldn't that be something that needs to be done in the training phase? t. brainlet
>>105989238it's a known problem of the model. every pass adds jpeg-like artifacts
>>105989073I think you can just prompt for it as long you know the colors you want
>>105987762I've had Comfy commit suicide on me like that a few times now, forcing me to start all over. Guess I'll wait for fix before updating this week.
>>105988731I'm gonna setup a training session before I go to bed this afternoon, I'm curious as to the difference it'll make, if any.
Saw this one posted on /vp/ diffusion.
>>105989249Well the paper alleges up to 7x training speed improvement. Essentially the model is trying to learn with extremely noisy data, smoothing out the latents means the model has less noise to sort through.
Another way to put it: right now the VAE even given crops, rotations, or scaling of existing images literally puts the latents into entirely different neighborhoods rather than properly conveying to the model that the images contain similar data.
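The "same image, different neighborhood" point is about equivariance: encoding a rotated image should give the rotated encoding. A toy check in pure Python, not the paper's training loss. Both "encoders" below are stand-ins invented for illustration (2x2 average pooling, and the same pooling with a position-dependent bias that breaks equivariance), and square images with an even side length are assumed.

```python
def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def pool_encoder(image):
    """Toy 'VAE encoder': 2x2 average pooling. Equivariant to 90-degree rotation."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def position_biased_encoder(image):
    """Toy 'noisy' encoder: pooling plus a bias that depends on latent position,
    so the same content lands on different values once it is rotated."""
    pooled = pool_encoder(image)
    width = len(pooled[0])
    return [
        [value + 0.5 * (y * width + x) for x, value in enumerate(row)]
        for y, row in enumerate(pooled)
    ]


def equivariance_error(encoder, image, transform):
    """Mean squared gap between encode(transform(x)) and transform(encode(x))."""
    a = encoder(transform(image))
    b = transform(encoder(image))
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    image = [[4 * y + x for x in range(4)] for y in range(4)]
    print(equivariance_error(pool_encoder, image, rot90))
    print(equivariance_error(position_biased_encoder, image, rot90))
```

The first error is zero and the second is not. Driving that second kind of gap toward zero during VAE training is, roughly, what an equivariance regularizer is for.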
>>105989269
>heated floors
>nonfunctional radiator heater for aesthetics
the man in the image is wearing a tuxedo and is sitting in a leather chair, counting money in his office. A sign that is hanging overhead says "king of lies". keep the man's expression the same.
kontext seems to work good with the schnell lora, normally you do 4-8 steps, but I tried it with the default 20, seems ok or text is more consistent?
>>105989283perhaps it's summer, radiator off. heat spot on the floor is because window is open and sun was shining. this is how hillary can still win
>>105989267It seems to me the results are clear, the VAE as it stands is like a staticky radio station. This training reduces the static and more specifically conveys to the model that the music is coming from the same song.
Final version of NetaLumina just dropped.
>>105989282sounds interesting if not somewhat demoralizing. 50 epochs of chroma and already in need of retraining? because of something so silly as noisy "images".
>>105989341
>it seems the model has no new knowledge of artists (i haven't checked characters yet)
>i highly doubt it truly trained for 3 weeks on a B200 with a 10 million dataset
Hm? What's the point then
>>105989303So the hypothesis is this could improve general semantic understanding during training which we best understand as the "woman laying in grass" malformed bodies or rather, if you gen someone upside down you get body horror. Because an upside down body is treated like a different thing than a person that is upright. This could apply to many other things including fingers which can be in many orientations and configurations. That means an okay sign hand could be in a completely different semantic neighborhood than a pointing hand even though they *should* be semantically similar.
>>105989369It's only a realignment, it might only take an epoch to fix the model's understanding, keep in mind if this truly is a 7x improvement, it would be like running training for another SEVEN epochs. That means a month of training in five days.
>>105988634As someone who was there since basically day 1, this is all cope. Everything is better today. The only thing that is not, are these threads, because 4chan itself went to shit with exposed transexual janitors abusing their power in the threads 24/7 along with the huge influx of retards daily that still ask what can they run on their laptop 3060s, along with the humiliation ritual captcha system, everyone who actually is technical or an artfag who wants to tryhard gen, just left the piss pool and went into specialized communities that actually do these things where these things are actually possible.
>>105989513Or maybe we post here because it's Anonymous and not everything I do I want to be on public record attached to my name.
>>105989513
>Everything is better today
the plastic skin slopstyle spam everywhere is proof this is not the case. normies need refinement tools in front of them, not a slop assembly line
>>105989414Just with my own small training after just a small amount of steps. You can see how much more semantically clearer the latents are.
>>105989369Chroma looks exactly like you'd expect a finetune of Flux Schnell to look imo, I've never observed anything weird that I'd attribute to the VAE personally
>sage attention 2
sage attention 2
>sage attention 2
sage attention 2
>>105989539This is just false. There's much more diversity now. Look at your old gens from years ago and stop the cope. Same shit as people who glorify the c.ai days in LLMs and then go read those chats. It was good for the time, but nowhere close to anything today.
>>105989592so much this
>>105989539plastic skin is a skill issue, the proof there is more of that content is because generative AI is more widespread than ever
>>105989551
You literally don't get it, the VAE at a latent level is full of noise and more specifically, literally believes that
>dog.jpg
and
>dog.jpg (rotated 90 degrees)
are different class images.
Just look at the basic data, the base VAE is full of noise which is no different than static on the radio while trying to listen to a song.
>>105989620the man in the image is sitting on a white beach chair on a beach resort in Spain. keep the man's expression the same. He is wearing blue swim trunks and a white tshirt.
gotta enjoy free time after playing skyrim
>>105989536that doesn't refute anything he said
>>105989576
>Look at your old gens from years ago
I did. they have much more sovl and are generally more interesting up until AoM. the synthetic datasets have poisoned the well and there is nothing we can do about it because while the Chinese are smart, they are creatively bankrupt
>>105989895
>Chinese are smart, they are creatively bankrupt
actual facts
>>105989895
>>105989906
https://x.com/nise_yoshimi/status/1947608917162545333
>>105989055How do I train one, I have the card but not the knowledge
>>105989269Nice but the problem is always the heat making sense...
>>105989961unless you train a ton on heatmaps it will always be wrong, the models are dumb
>>105989961You get 100 images, caption them, and then train a Lora.
in comfyui how do you change the output directory for images? mine is getting full and i'd rather keep gens from different base images in different folders
>>105989372
>it seems the model has no new knowledge of artists
Yeah, nor many new characters it seems
At this point I don't know what changed from the base model
>>105987797I'd appreciate it if someone could find out how to prompt Chroma consistently to:
1 - Do 80s style of commercial photography (eg saturated colors) without typing "80s" and ensuring the outputs do not contain texts
2 - Do "live tv footage" style of content without Chroma outputting fucking photos of TVs showing the footage
3 - Do "VHS footage still" content that actually look like the aesthetic of VHS recordings (Chroma used to be able to do that really well in older versions, but it suddenly lost this "ability")
>>105990015I do it like this, it's: directory\filename
just have to change it when I change models, I prefer doing it manually
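If typing the folder by hand gets old, the prefix can be derived from the base image name. A small sketch under assumptions: it relies on the Save Image node's filename_prefix accepting a subfolder/name form (which is what the post above describes), and the function name and naming scheme are made up.

```python
import re
from pathlib import PurePosixPath


def prefix_for(base_image, model_name="gen"):
    """Build a 'folder/name' style prefix from a base image path.

    Characters outside letters, digits, '_' and '-' are replaced with '_'
    so the folder name is safe on most filesystems.
    """
    # Normalise Windows separators so the stem is extracted the same way everywhere.
    stem = PurePosixPath(base_image.replace("\\", "/")).stem
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_") or "unnamed"
    return f"{safe}/{model_name}"


if __name__ == "__main__":
    print(prefix_for("inputs/my cat (1).png", "chroma"))
    print(prefix_for(r"C:\gens\base\portrait.jpg", "wan"))
```

Paste the result into the node's filename_prefix field. ComfyUI also has an --output-directory launch argument if the whole output folder needs to move to another drive.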
>>105988499it was just a simpler time.
When using lewd loras in wan, what's a good weight range for anime-style I2V videos? I want to capture the action of a lora but still preserve the original style of the input image.
I get that it probably differs between loras, but what's a good general weight range to at least start with?
>>105989569where. is. the. implementation.
>>105989569and speaking of video, animatediff still lives! https://github.com/Tonniia/EVS
>>105990173A not-shit Lora should work at 1.
>>105990238No. If you use a 3D lora on a 2D image input at 1 weight it makes the characters look more realistic. This isn't desirable.
>>105990256Then durr I wonder what happens if you decrease it.
>>105990266Why do you even bother trying giving advice, idiot? You are the most worthless, petty little retard in this thread.
>>105990038cant you get normal vhs defects just by prompting for them?
>>105990270ignore and report debo
>>105990270Are you really paralyzed by making any decision that requires 1 minute of testing? I don't know why zoomers are so terrified of doing anything on their own and need constant handholding and consensus.
>>105990293
>report someone for stating a fact
>bro the water is too hot! how do I make it less hot?
>turn it down
>OMG TROLLING
>>105989372That was the fake finetune from the grifter, the NetaYume one. This is the base finetune from the Neta people.
https://huggingface.co/neta-art/Neta-Lumina
So I guess the rumors were true, they ran out of money to finetune. Wonder if people are going to run with it or not if that is the case where the model is undertrained.
>>105990224well considering comfyui was 1st on the list originally, then they added Fusion (WHY!) and sparse video gen before it. Then implementing sage 1 and 2 before continuing with the rest of the list (not complaining as this is huge, could have been on the todo list too), who knows what they're going to do before they check off the list
>>105990320>someone asks a simple, perfectly reasonable question>mentally ill sperglord cannot help but get demonstrative and rudeYou're just a stupid, petty retard. You've been the village idiot of /ldg/ for a while now. Time to grow up anon.
>>105990357>how do I make the water less hot and don't just tell me to turn it down
>>105990369Do you always draw false equivalence when someone rightly exposes you as unintelligent?
>>105990346>support multi-gpu inferencewhy do they even bother listing that
>>105990369
Anon, the poster you originally replied to was asking for a good starting point for lora weights. Some people have weaker GPUs and it takes longer to generate videos, so it's understandable why they would be seeking a solid base to start their gens from. Please control your aspergers syndrome, it makes the thread worse for everyone.
>>105989372He posted the "real" info in the Neta cord. And I compared the model to some older snapshots of the original, it is mostly better.
>>105990336Testing now, somehow 1.0 is worse than Alpha_full_roundnnnn_ep7_s160000. Maybe I need to adjust prompts and sampling but why though.
Why is there always something just a little bit off about pony loras?
>>105990432>ponycouldn't be the model causing the weirdness, no way.
>>105990336
>That was the fake finetune from the grifter, the NetaYume one
The NetaYume model is a lot better than even this Neta-1.0 version. Set up a workflow that will compare them side-by-side with the same seeds, you will see that NetaYume is almost always more consistent, better anatomy, and better knowledge of niche concepts and tags.
>>105990432it's the tracking symbols that they embed in the pixels
>>105988395eq-vae is a training convergence speed thing, you most likely arent going to see a benefit and the reconstruction might even be worse
>>105990398
It doesn't change my answer, there is no "right" answer, especially when he's already using a Lora out of bounds (3d lora used on a 2d image).
1) it depends how generalized the lora is
2) it depends on how similar the input image is to the training data
3) it depends on how similar his prompt is to the training data
4) it depends on the seed
So the answer literally is "just turn it down". 1 too high? Try 0.9.
Your water is too hot? Turn it fucking down. Don't ask how many fucking degrees to the left you should turn the handle you absolute toddler.
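For anyone on a slow GPU who wants to make the "turn it down" testing systematic: fix the seed and prompt, then step the lora weight down in even increments and compare. A trivial sketch; the defaults (1.0 down to 0.5 in steps of 0.1) are just a starting assumption, not a recommended range for any particular lora.

```python
def weight_sweep(start=1.0, stop=0.5, step=0.1):
    """List of lora weights from `start` down to `stop`, inclusive.

    Values are rounded to two decimals to avoid float noise like 0.7000000000000001.
    """
    count = int(round((start - stop) / step))
    return [round(start - i * step, 2) for i in range(count + 1)]


if __name__ == "__main__":
    for weight in weight_sweep():
        # Queue one gen per weight with the same seed and prompt, then compare.
        print(f"lora weight: {weight}")
```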
>>105990396fuck knows but i'm sure there's a reason. wish it was lightx2v instead of fusion. gotta give it to them though, out of the "long video" gen projects, they seem to be the only ones updating their work. https://github.com/DualParal-Project/DualParal for example that claims 1 minute gens hasnt updated in 2 months
So what's even the gimmick of Lumina? The outputs are dogshit.
>>105990484
Reconstruction should be the same, you're just adding a new training signal using rotations and scaling which trains the model into understanding images can be rotated and scaled and still be the same image/class/concept. Right now noisy VAEs are like having an image dataset with irrelevant alt tags interspersed in a correct caption. And after thinking about it, it's likely why upside down people end up being body horror messes because the model treats them as a different class of images rather than a human concept that is upside down.
>>105990569It's a fifth the size of Flux.
>>105990583I understand the rest but what does vace mean fren?
>>105990498See, the problem with you is that you're mentally ill, and clearly a total loser in real life. Clearly you have some internal frustrations, and need to vent out at others to cope.
>>105990598You know if I'm wrong you can actually attack my assertion rather than do an ad hominem. This is a classic case of a blown the fuck out zoomer because you know I'm right, there is no "right" Lora setting. Either prove 1-4 wrong or just shut the fuck up. And this is why no one else even bothered replying to your "rational" and "sensible" question. Because you asked a retarded question no different from "how do I make the water less hot and don't tell me it's dependent on my water heater and pipes".
>>105990628No, at this point I'd rather just say it like it is: you're a boring loser.
Nobody cares enough to have a pointless argument with you. You're a rude sperg who lashes out at others, and a tumor on /ldg/. I look forward to the day you inevitably commit suicide.
>>105990628>This is a classic case of a blown the fuck out zoomer because you know I'm right
>>105990628take your meds aspy
I love it because still no one has given an answer. Either samefagging or buttbuddies, either of which is funny because no one will give an answer.
>>105990668>HAHA, NOBODY WILL ARGUE WITH ME AND ANSWER MY STUPID QUESTION! THIS MEANS I WON! I TOTALLY BLOWED THE FUCK OUT THAT GUY!
the ability for kontext to emulate fonts/copy styles is so neat, could use it to dupe a font or make your own even if you didnt have the .ttf.
>>105990668why are you digging yourself deeper and deeper into a hole? just swallow your pride and move on, sperg-kun.
>digging yourself deeper
>still no one has given an answer
Every minute I'm proven right :)
the topic is technology right, does this count?
>>105990715Change the text "Kingdom Come Deliverance" To "LDG General experience". Replace the man on the right with Miku Hatsune. Put a sign that says "LDG" on the large house at the back.
>>105990734but at the end of the day you're still a chubby, miserable loser with aspergers syndrome.
>>105990743can you add men with tinfoil hats fist fighting at the background
>>105987752I don't understand why this has to be such a formal thing instead of doing what /aicg/ does with anchors except with themes
>>105990747Feel free to ask ChatGPT with our full conversation about which one of us is more likely to be mentally ill.
>>105990743remove all the armored characters from the scene. remove the gold logo. remove the white text. replace the house in the background with a McDonalds restaurant.
it's neat how the model can pick up these elements and replace them desu
>>105990398>Some people have weaker GPUs and it takes longer to generate videosso buy a better GPU then.
how come that whole AI hobby is filled with so many retarded poorfags who cant even afford a 5 year old GPU?
>>105990734It's quite bizarre behavior that you're still hung up on the "argument" from dozens of minutes ago. Nobody really cares, ppl are just rightly labeling you a sperg, but clearly your autism just cannot move past any internet argument lol.
Prompting with Lumina is like browsing deviantart in 2006
>>105990766Because being like /aicg/ is not a good thing.
>>105990771Evidently lonely losers like you rely on ChatGPT for a lot of things :)
>>105990759"several men with tin foil hats are fist fighting in the background."
>>105990793You seem quite upset about something you don't care about.
>>105990796>like browsing deviantart in 2006Levels of kino soul previously believed to be unobtainable
>>105990785>just have more money!
>>105990793being smug on 4chan is all he has going for him in his life, please understand. if he had a worthwhile life outside of these threads he wouldn't willingly choose to have shit throwing contests here
>>105990806change the setting to a sunny beach.
fun in the sun
>>105990811You're free to project your emotional insecurities on me at your leisure. Unfortunately you've already embarrassed yourself quite a bit.
>>105990798being like ribbit isn't much better
>>105990571yeah my bad for making assumptions
N.I.B
>>105990881/dmp/ does albums.
/agdg/ does game jams.
We can do genjams. If that offends you, you're free to cry about it.
>>105990892Something else to add, and likely why we see such a speedup in convergence, is that we're moving transformations and rotations to the VAE rather than forcing the diffusion model to learn them from scratch. Right now the diffusion model has to learn how to align / re-align / transform images, which is a significant computational problem. Other industries avoid this problem with concepts like Axis Aligned Bounding Boxes rather than expecting downstream components (e.g. the diffusion model) to figure it out. So maybe it's easier to describe it as "Axis Aligned Latents".
https://www.reddit.com/r/StableDiffusion/comments/1m5wpmv/flux_kontext_psa_you_can_load_multiple_images/
interesting workflow, good for multi image stuff.
the anime girl is holding the pink hair anime character with one hand. keep their expressions the same.
>>105991063That post is the perfect example of why we need factory nodes.
>>105991063diff first image:
>>105991076I like comfy customization but it's easy to get confused with some setups, but at least once you have a workflow it can be saved/reused easily.
>>105988327Wild guess, because it uses more vram ?
>>105991110the man in the blue shirt is holding the pink hair anime character. keep their expressions the same.
>>105988634>People would prompt for different artists in the same prompt, often getting unpredictable and some awesome combinationsYes, I miss this, and also combining celebrities to create new interesting looks, sadly we will most likely never have this in a base model again, as contemporary artists are seemingly never trained by name due to avoiding angry attacks, and extremely few celebrity / public persons are there if any.
Loras fix this, however you will never have the same ease as when they were all in the base model, since you have to manually load the loras you need for each artstyle / celebrity etc.
>>105988677Are you running some lora ? Because everything in this 'pixel style' looks wonky as fuck, not just the hands.
If you are expecting Chroma to have Flux dev level hands, that's not going to be the case for the base model at least, Flux dev is massively overtrained on a set of specific hand gestures from synthetic data, so the hands look anatomically correct 90% of the time, they are also recycled from a very small pool of hands 100% of the time.
>>105991258one more, have a todd
>>105988995AMD is sadly so far behind it's not even funny
We need competition to NVidia, yet it's like they're not even trying.
Maybe it's not such a good idea when NVidia's CEO and the CEO of their direct competitor AMD are cousins, just sayin.
>>105990913>If that offends you, you're free to cry about it.spoken like a true ribbiter
>>105987821Alice in Cunnyland
>>105991377Don't worry we already know you're a moron.
>>105989203But it was buggy shi aaaaaaagghh!
>>105991369Apple and Intel are Nvidia's real competition, not AMD.
>>105988650Jewish hands wrote this post as they rewrite history in order to increase their power
'eat the new slop goy, that old stuff you thought was good, it's just nostalgia, kill it and give us shekels as we produce new shit content which is nothing but globohomo propaganda'
I'll take nostalgia, thank you very much
>>105991389lel
lel * 2 actually
Apple are so bad at AI that they are trying to pass it off as a 'fad'
Intel is a dead company, they're even worse at AI than AMD
How can you be so behind the times, anon ?
>>105991457>t. AMD fanboy
>>105991465No, I WISH AMD / Intel would compete against NVidia, that would mean we would finally get decent vram on consumer hardware, and likely much better performance as well.
AMD sucks at AI, Intel sucks worse at AI, Apple are so bad they're not even in the AI space at all in any real capacity.
Nvidia is coasting on the billions of investments into AI they made throughout this decade, that's great for them but it sucks for consumers since without any actual competition, NVidia will improve their consumer AI capacity at a glacial pace, and still have you pay high prices for these small iterations.
>>105991533intel's datacenter cards are actually quite good for inference and their network stack is far better than amd. All of amd's npus are pretty shit next to M4. I am not sure what world you live in anon.
so if I enter a character's booru tag for example "haruno_sakura" the anime models will automatically gen Sakura without the need of a LoRA?
>>105991661If the base model knows the character: yes.
>>105991670How do I find which base models are trained on booru tags without reading the description of each one by one?
>>105991661Depends on the representation within the dataset. Some obscure characters you have to help it by specifying hair and clothes.
>>105991681If you want just 1girl then get WAI illustrious.
>>105991637Only gives remotely decent inference on LLM workloads, they have NOTHING on the consumer level, NOTHING for video / image generation, too slow for training...
No, anon, Intel is WAY behind, AMD at least have powerful GPU technology which can compete against NVidia, they have just wasted SO MUCH TIME at this point.
6 more days until 1024 chroma starts training my niggaz
>>105991761can't wait for the cope when it's finally done
>>105991533Intel and AMD are competing fine in inference at certain pricepoints. But at the very top, yes, it's Nvidia or nothing. It's a meme that you can't use either for your image/video genning needs though. There is a reason the A770 is not below $250 USD on used markets. It's the cheapest GPU that can run AI with BF16 support with 16GB of VRAM, which means it's better than even Volta.
>>105991637The datacenter cards are "good" if you can get good prices for them. Gaudi puts up good enough results on MLPerf, but it's an ASIC design inherited from Habana and the software it uses is a dead end. Intel's Ponte Vecchio, which is their GPU line that actually has a future, hasn't done enough to be compelling. Furthermore, even though the cards are outdated, Intel has not written them down on its balance sheet to make good prices possible. Geohot wrote about this economic dilemma this year. He never worked on their GPUs because of what they wanted for their cards.
https://geohot.github.io//blog/jekyll/update/2025/03/24/tragic-intel.html
>>105988834based text encoder hacker
>>105991781It's easily the best model for realistic images already, no more Flux dev plastic skin, cleft chin, dead eyes, mutilated nipples, missing genitalia.
Anything the 1024 epochs adds will be icing on the cake, Chroma is THE new community base model for NSFW images, and is fantastic in combination with Wan i2v. Also there will be a flood of loras once we see final release.
>>105991761I have high hopes for chroma, it's already pretty good. How much of a difference should we expect from 2 epochs at 1024?
>>105991780so are you intentionally just shitting it up or what?
Now that netalumina flopped, what's the next cope for animebros?
Illu 3.5 vpred? Haha...
>>105991864Literally just give me WAI with updated dataset.
>>105991864Who is going to pour money into training more shit specifically for anime? Illustrious hasn't shown any signs of life or any indication they are doing anything beyond Illustrious, and their alpha Lumina finetune, even though it was done months earlier than Neta, showed even less promise than how Neta panned out.
Neta was on mostly the right track but the fact that they ran out of money before getting to the finish line is heartbreaking. I suspect we'll see a couple of Loras and etc. but that's it. It won't be the next SDXL and people aren't going to hop to Lumina en masse until upgrades to their GPUs are cheaper.
>>105991856Not really expecting that much, it's just two epochs, but 1024x1024 is 4 times the pixel count of 512x512, so there should at least be a bump in detail retention.
>>105991856the one thing chroma is bad at right now is multi subject images, hands/faces/anatomy is all fucked in those so i hope that's what will improve
apparently the last two epochs will take a whole month because of how much slower 1024 training is, so we'll see
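Rough arithmetic on why 1024 training is so much slower, as a sketch. It assumes a Flux-style layout (8x VAE downscale, then 2x2 latent patches per token); those defaults and the `token_count` helper are assumptions for illustration, not numbers taken from the Chroma training setup.

```python
def token_count(side, vae_factor=8, patch=2):
    """Sequence length for a square image under the assumed layout."""
    latent_side = side // vae_factor      # e.g. 1024 -> 128
    return (latent_side // patch) ** 2    # e.g. 128 -> 64 per side -> 4096 tokens


pixels_ratio = (1024 * 1024) / (512 * 512)          # 4x the pixels
tokens_ratio = token_count(1024) / token_count(512)  # 4x the tokens per image
attention_ratio = tokens_ratio ** 2                  # self-attention cost grows with tokens squared
```

So each image carries 4x the tokens, and the attention part alone costs about 16x per image, which fits with two epochs taking far longer than the 512 ones.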
>>105991861Well I have to, they're 10.
any wan ass grab loras? specifically where one character grabs another character's ass
>>105991936you do not have to post, it is not compulsory
please stay in your containment board, annoying.
>>105991985there's Ass Stretch/Grab on civit
>>105991995Why would I care what you think though?
>>105991934They might throw more GPU at it for these last two epochs though.
I'm really looking forward to seeing the loras people will make with Chroma, no more uncanny plastic look that plagues all the Flux dev loras no matter how hard the creators try to avoid them.
>>105992005nta but fuck off. why are you posting censored porn, retard
>>105992003>there's Ass Stretch/Grab on civitThat's 1girl though. It probably won't work with one character doing it to another.
>>105992012You sound upset, my friend.
>>105992012he is a schizo & crossposting in vp also trying to ruin things like a mongoloid
>>105992033>also trying to ruin thingsRuin what?
How do I convert Wan t2i finetunes on tensor art to gguf?
specifically this one:
https://tensor.art/models/864231482397327022
I tried to use llama.cpp's convert_hf_to_gguf.py (python convert_hf_to_gguf.py) but that doesn't work.
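One possible reason, stated as a guess: as far as I know that script targets Hugging Face LLM repos (a model directory with a config.json), not a bare diffusion checkpoint. A quick way to see what is actually inside a .safetensors file, using only the standard library, is to read its header: the format starts with an 8-byte little-endian length followed by a JSON table of tensor names, dtypes and shapes. The helper names and the toy tensor name below are hypothetical.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} from a .safetensors file header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # 8-byte little-endian length
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)                      # optional free-form metadata
    return {name: (info["dtype"], info["shape"]) for name, info in header.items()}


def write_toy_safetensors(path):
    """Write a tiny fake file (one 2x2 F16 tensor of zeros) to try the reader on."""
    header = {
        "__metadata__": {"format": "pt"},
        "blocks.0.attn.q.weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]},
    }
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        f.write(b"\x00" * 8)                              # 4 values * 2 bytes each


toy_path = os.path.join(tempfile.mkdtemp(), "toy.safetensors")
write_toy_safetensors(toy_path)
toy_header = read_safetensors_header(toy_path)
```

If the tensor names look like diffusion transformer blocks rather than an LLM, a converter meant for diffusion models is needed instead.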
>>105992025ok, don't use it then and just sit there twiddling your thumbs
>>105992040you stick out like a sore thumb
if i want to make a lora, should i train using the ancient Novel AI leaked checkpoint or should i use something newer like NoobAI? NovelAI is the granddaddy of all the anime checkpoints, but it's old. the rentry guide said using NovelAI is your best bet for a robust lora that'll work with any other model.
>>105992058Anon, do you ever wonder what life might've been like if you didn't have autism?
>>105992062Still not sure what I'm ruining, my friend. You seem quite angry though.
I made a wan 2.1 14b fun camera lightx2v workflow for t2v which has teacache and other speed enhancements:
https://files.catbox.moe/3k4vy0.png
I'd appreciate any feedback for improving it.
>>105992067yes and it seems hella lame and gay
does anyone know how to fix a penis that's been tied in a knot? i was just messing around and didn't actually think it would work, but then i got a little bit of a chub, and it's super uncomfortable
>>105992010the thing i'm excited about is fpgaminer (joycaption and bigasp guy) said he's considering doing a finetune of chroma once it's done training
>>105990899>>105990742>>105990743alright i was right last time
30 bucks triple or nothin
goin for the trifecta box
all in collage
$ $ $
I'd be more excited about Chroma if the hands and feet weren't fucked up.
>>105992119I think Chroma will be the 'go to' for most finetuning going forward given that it's trained the plasticity and censorship out of Flux Schnell, and also has a fully permissive license.
Perfect to use as base model for further finetunes.
>>10599217545 and 46 show significant improvement.
>>105992176the training shit is kinda slow tho.
I hope the next finetunes arent gonna take fucking months.
testing the new Lumina. still has very bad artist knowledge/activation, unless I'm doing something wrong. trying to get shinkawa youji, this is obviously not very close. still kinda cool.
>>105992223Fine tuning is always going to be slow
If you want fast, train loras
First attempt at using my wan fun camera control t2v workflow with teacache and lightx2v lora plus a custom Kuroki Tomoko lora. It's badly animated, but the camera control works. Her eye color is wrong, obviously.