Search results for "bd27344ea08895af8c1163ecbcfed00f" in md5 (4)

/g/ - /lmg/ - Local Models General
Anonymous No.106481882
►Recent Highlights from the Previous Thread: >>106475313

--Paper: Binary Quantization For LLMs Through Dynamic Grouping:
>106478831 >106479219 >106479248 >106479257 >106479312
--VibeVoice model disappearance and efforts to preserve access:
>106478635 >106478655 >106478664 >106480157 >106480528 >106478715 >106478764 >106479071 >106479162
--GPU thermal management and 3D-printed custom cooling solutions:
>106480670 >106480698 >106480706 >106480719 >106480751 >106480797 >106480827 >106480837 >106480844 >106480875 >106481348 >106481365 >106480858 >106480897 >106481059
--Testing extreme quantization (Q2_K_S) on 8B finetune for mobile NSFW RP experimentation:
>106478303 >106478464 >106478467 >106478491 >106478497 >106478519 >106478476
--Optimizing system prompts for immersive (E)RP scenarios:
>106477981 >106478000 >106478547 >106478214 >106478396
--Assessment of Apertus model's dataset quality and novelty:
>106480979 >106481002 >106481005 >106481016
--Extracting LoRA adapters from fine-tuned models using tensor differences and tools like MergeKit (sketch after this list):
>106480089 >106480116 >106480118 >106480122
--Testing llama.cpp's GBNF conversion for complex OpenAPI schemas with Qwen3-Coder-30B (sketch after this list):
>106478075 >106478122 >106478554 >106478574
--Recent llama.cpp optimizations for MoE and FlashAttention:
>106476190 >106476267 >106476280 >106476290
--Proposals for next-gen AI ERP systems with character tracking and time management features:
>106476001 >106476147 >106476263 >106477114 >106477147 >106477247 >106477344 >106477773 >106477810 >106478561 >106478636 >106477955 >106477268 >106477417
--Intel Arc Pro B60 advantages vs RX 6800, and Arc Pro B50 compared to RTX 3060:
>106475539 >106475563 >106475606 >106475639 >106475661 >106475729 >106476927 >106476939 >106476998 >106476979 >106477012 >106477117 >106481021 >106481030 >106481067 >106481241
--Miku (free space):
>106475807
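
Re the LoRA-extraction highlight: a minimal sketch of the tensor-difference idea, assuming both checkpoints share identical layer shapes. mergekit ships a ready-made extraction tool (mergekit-extract-lora); the snippet below only illustrates the underlying delta-plus-truncated-SVD step, and all names are illustrative.
[code]
# Minimal sketch: extract a rank-r LoRA from the weight delta between a base and a
# fine-tuned checkpoint via truncated SVD. Function and variable names are illustrative.
import torch

def extract_lora(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 32):
    """Factor the delta into low-rank A/B so that B @ A approximates w_tuned - w_base."""
    delta = (w_tuned - w_base).float()                 # [out_features, in_features]
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]     # keep the top-r components
    b = u * s.sqrt()                                   # [out_features, rank]
    a = s.sqrt().unsqueeze(1) * vh                     # [rank, in_features]
    return a, b

# Usage idea: loop over matching nn.Linear weights in the two state dicts and save
# the a/b pairs in a PEFT-style adapter layout.
[/code]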
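
Re the GBNF/OpenAPI highlight: a minimal sketch of schema-constrained generation against a local llama-server. The URL and the toy schema are assumptions, and the "json_schema" request field is assumed from recent llama-server builds; older builds expect a pre-converted GBNF string in the "grammar" field instead.
[code]
# Minimal sketch: ask a local llama-server (e.g. serving Qwen3-Coder-30B) for output
# constrained by a JSON schema. URL and schema are illustrative; the "json_schema"
# field is assumed from recent llama-server builds (older ones take a GBNF "grammar").
import requests

schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "method": {"type": "string", "enum": ["GET", "POST", "PUT", "DELETE"]},
    },
    "required": ["path", "method"],
}

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Describe the endpoint for creating a user, as JSON:\n",
        "n_predict": 128,
        "json_schema": schema,   # server converts the schema to a GBNF grammar internally
    },
    timeout=120,
)
print(resp.json()["content"])
[/code]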

►Recent Highlight Posts from the Previous Thread: >>106475316

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
/g/ - /lmg/ - Local Models General
Anonymous No.106328695
►Recent Highlights from the Previous Thread: >>106323459

--Paper: Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions:
>106324268 >106324309 >106324351 >106324423 >106324488 >106324523 >106327026 >106325013 >106324996 >106325086
--Debate over whether quantization errors compound during inference in language models (toy simulation after this list):
>106326398 >106326462 >106326471 >106326477 >106326533 >106326556 >106326567 >106326784 >106326879 >106326900 >106326937 >106326645
--LLMs as a dense interpolation of the internet, and the rise of monetized data access:
>106323481 >106323558 >106323588 >106323598 >106323605 >106323634 >106323667 >106323696 >106323713
--ByteDance releases Seed-OSS-36B with synthetic data and token-aware reasoning:
>106325470 >106325496 >106325497 >106325756
--Comparing newer large models against 70B Llama for local inference:
>106326968 >106327006 >106327017 >106327008 >106327101 >106327167
--Local LLM-powered fortune-telling plugin for Minecraft with expansion ideas:
>106325108 >106325125 >106325132 >106325143 >106325151 >106325161 >106325466
--DeepSeek V3.1 likely replaces R1-05 and V3-0324 without official documentation:
>106323532 >106323555 >106323574 >106323582 >106323608 >106323789
--Debate over practical utility of Gemini's 1M context window despite performance issues:
>106324537 >106324545 >106324656 >106324721 >106324976 >106325185
--VRAM price surge and speculation on market saturation from used enterprise GPUs:
>106324819 >106324893
--DeepSeek Reasoner fails scenario understanding without forced planning prompts:
>106324954 >106325062
--SanDisk HBF 4TB VRAM breakthrough vs CUDA ecosystem dominance:
>106325613 >106325621 >106325631 >106325645 >106325917
--Logs:
>106324547
--Miku (free space):
>106325613 >106327191 >106327263
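
Re the quantization-error debate: a toy numpy simulation, not a claim about real GGUF quants, showing how per-layer weight rounding noise propagates through a deep stack. Depth, width, and step size are arbitrary assumptions.
[code]
# Toy simulation: does per-layer weight rounding noise compound with depth?
# Random layers, tanh nonlinearity, uniform quantization step -- purely illustrative,
# no claim about real transformer blocks or k-quant formats.
import numpy as np

rng = np.random.default_rng(0)
depth, dim, step = 32, 1024, 1e-2          # step ~ quantization bin width

x_ref = rng.standard_normal(dim)
x_q = x_ref.copy()
for layer in range(depth):
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # keeps activations O(1)
    w_q = np.round(w / step) * step                      # fake weight quantization
    x_ref = np.tanh(w @ x_ref)
    x_q = np.tanh(w_q @ x_q)
    rel = np.linalg.norm(x_q - x_ref) / np.linalg.norm(x_ref)
    print(f"layer {layer:2d}: relative activation error {rel:.4f}")
[/code]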

►Recent Highlight Posts from the Previous Thread: >>106323466

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
/g/ - /lmg/ - Local Models General
Anonymous No.106171832
►Recent Highlights from the Previous Thread: >>106167048

--Papers:
>106170128 >106170167
--Prioritizing NSFW LoRA preservation amid infrastructure and redundancy concerns:
>106167927 >106167949 >106168075 >106168122 >106168043 >106168067 >106168169 >106168208 >106168211 >106168238 >106168277 >106168305 >106168351 >106168377 >106168392 >106168399 >106168425 >106168448 >106168619 >106168442
--High-speed CPU-only LLM inference with GLM-4.5 on consumer hardware:
>106168800 >106168825 >106168868 >106168847 >106168903 >106168905 >106168940 >106168974 >106168991
--Missing CUDA DLLs prevent GPU offloading in newer llama.cpp Windows builds:
>106168428 >106168441 >106168450 >106168577 >106168616 >106168670 >106168691 >106168704 >106168715
--Difficulty reducing model thinking time due to token-level formatting constraints:
>106170269 >106170300 >106170348 >106170361 >106170404
--CPU outperforming GPU for GLM-Air inference on low-VRAM systems:
>106168713 >106168787 >106168814 >106169109
--GPT-OSS underperforms on LiveBench despite reasoning and math strengths:
>106167476 >106167550
--Anon purchases 384GB of HBM2 VRAM for $600:
>106168337 >106168343 >106168345 >106168366 >106168377 >106168392 >106168399 >106168425 >106168448 >106168619 >106168462 >106168469 >106168506 >106168571 >106168488 >106168505 >106168517 >106168528 >106168606
--High RAM investment for local GLM inference raises performance and practicality concerns (rough sizing sketch after this list):
>106169135 >106169148 >106169161 >106169197 >106169223 >106169230 >106169278
--Anon finds Dual P100 64GB board for $200:
>106169635 >106170934 >106170984 >106169662
--Satirical timeline of LLM evolution with exaggerated eras:
>106167190 >106167237 >106168679 >106167530
--NEO Semiconductor's X-HBM promises 16x bandwidth and 10x density if viable:
>106169723
--Miku and Dipsy (free space):
>106167506 >106167362
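
Re the GLM RAM-investment highlight: a back-of-envelope sizing sketch. Parameter counts and effective bits-per-weight are rough assumptions, and KV cache, context, and runtime overhead come on top of the weight figure.
[code]
# Back-of-envelope RAM needed to hold quantized weights fully in system memory.
# Parameter counts and effective bits-per-weight are rough assumptions, not measurements;
# KV cache and runtime overhead are extra.
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for name, params_b in [("GLM-4.5 (~355B total)", 355), ("GLM-4.5-Air (~106B total)", 106)]:
    for quant, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
        print(f"{name:22s} {quant:7s} ~{weights_gib(params_b, bpw):5.0f} GiB weights")
[/code]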

►Recent Highlight Posts from the Previous Thread: >>106167057 >>106168982

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
/g/ - /lmg/ - Local Models General
Anonymous No.105564855
►Recent Highlights from the Previous Thread: >>105557036

--Paper:
>105562846
--Recursive self-improvement (RSI) and AGI predictions based on reasoning models, distillation, and compute scaling:
>105560236 >105560254 >105560271 >105560267 >105560315 >105560802 >105560709 >105560950
--LLM chess performance limited by poor board representation interfaces:
>105562838
--Exploring and optimizing the Min Keep sampler for improved model generation diversity (sketch after this list):
>105558373 >105558541 >105560191 >105560244 >105560287 >105559958 >105558569 >105558623 >105558640
--GPT-SoVITS model comparisons and fine-tuning considerations for voice cloning:
>105560331 >105560673 >105560699 >105560898 >105561509
--Meta releases V-JEPA 2 world model for physical reasoning:
>105560834 >105560861 >105560892 >105561069
--Activation kernel optimizations unlikely to yield major end-to-end performance gains:
>105557821 >105558273
--Benchmark showdown: DeepSeek-R1 outperforms Qwen3 and Mistral variants across key metrics:
>105559319 >105559351 >105559385 >105559464
--Critique of LLM overreach into non-language tasks and overhyped AGI expectations:
>105561038 >105561257 >105561456 >105561473 >105561252 >105561534 >105561535 >105561563 >105561606 >105561724 >105561821 >105562084 >105562220 >105562366 >105562596 >105562033
--Concerns over cross-user context leaks in SaaS LLMs and comparison to local model safety:
>105560758 >105562450
--Template formatting issues for Magistral-Small models and backend token handling:
>105558237 >105558311 >105558326 >105558341
--Livestream link for Jensen Huang's Nvidia GTC Paris 2025 keynote:
>105557936 >105558070 >105558578
--Ultra Ethernet Consortium's UEC 1.0 spec aims to improve RDMA-like performance for AI and HPC:
>105561525 >105561601
--Misc:
>105563620 >105564403
--Miku (free space):
>105560082 >105560297 >105562450 >105563054
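
Re the Min Keep highlight: a sketch of the min_keep idea as exposed by llama.cpp's samplers, where a truncation filter (top-p here) is never allowed to shrink the candidate set below min_keep tokens. This illustrates the concept only, not the actual implementation.
[code]
# Sketch of the min_keep idea from llama.cpp's samplers: a truncation filter (top-p here)
# is never allowed to shrink the candidate set below `min_keep` tokens.
# Illustrative only, not the actual implementation.
import numpy as np

def top_p_with_min_keep(probs: np.ndarray, top_p: float = 0.9, min_keep: int = 2) -> np.ndarray:
    order = np.argsort(probs)[::-1]                  # token ids sorted by probability, descending
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1    # smallest prefix with mass >= top_p
    cutoff = max(cutoff, min_keep)                   # the min_keep floor
    kept = order[:cutoff]
    out = np.zeros_like(probs)
    out[kept] = probs[kept] / probs[kept].sum()      # renormalize the survivors
    return out

# With top_p=0.8 alone only the 0.85 token would survive; min_keep=2 keeps a second one.
print(top_p_with_min_keep(np.array([0.85, 0.10, 0.03, 0.02]), top_p=0.8, min_keep=2))
[/code]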

►Recent Highlight Posts from the Previous Thread: >>105557047

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script