>>106334403
GPT-OSS was just post-trained in 4-bit; the original weights (not published) were in full precision: https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf (see picrel)

Full quantization-aware training is still a bit of black magic at the moment. 16-bit training would probably make things simpler for demonstrating whether the hypothesis is true.
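For reference, the usual QAT trick is "fake" quantization in the forward pass with a straight-through estimator in the backward pass, while the optimizer keeps full-precision master weights. Rough sketch below, assuming plain per-tensor symmetric int4 rather than whatever MX scheme OpenAI actually used; illustration only:

import torch

class FakeQuant4Bit(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # symmetric 4-bit grid: integer levels in [-8, 7], scaled per tensor
        scale = w.abs().max().clamp(min=1e-8) / 7
        return (w / scale).round().clamp(-8, 7) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # straight-through estimator: treat the rounding as identity for gradients
        return grad_out

class QATLinear(torch.nn.Linear):
    def forward(self, x):
        # master weights stay in full precision; only the forward pass sees the quantized copy
        w_q = FakeQuant4Bit.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)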

Even if it were truly possible to competitively pretrain an LLM at BS1, you'd probably still need at least a few tens of billions of training tokens, which would take quite a while even on one high-end workstation GPU (napkin math below). It would be an interesting experiment, though.
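Napkin math, with throughput and token budget pulled out of thin air purely to get a feel for the scale:

# all numbers assumed, illustration only
tokens = 20e9        # "a few tens of billions" of training tokens
tok_per_s = 2_000    # guessed sustained single-GPU throughput at BS1
days = tokens / tok_per_s / 86_400
print(round(days))   # ~116 days of nonstop training under these assumptions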