Here is a sample from my 350M model. It's shockingly coherent for its size, especially given that it's only seen 983M tokens so far. I think I could make it scale if I had more GPUs.