Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Of course I have a few where I web scrape and build a dataset for myself with prefix tokens. I can break that down more on a specific stream about it.


well not so much as the raw data acquisition (scraping and stuff), but really data prep for finetuning. I'm hearing that each model needs it in a different format - chat finetuning data is different from instruct, etc etc




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: