Great to see this here. We used the TinyStories dataset to train small models (as small as 20M params) and test out knowledge addition, and published a paper based on it [1]. We could get coherent outputs at sizes as low as 20M-25M params (not as good as large LLMs, but still decent).
[1] Blog + paper: https://medium.com/@ankit_94177/expanding-knowledge-in-large... (the paper is titled "Cross-Domain Content Generation with Domain-Specific Small Language Models")