
That sounds correct except for that total parameter count. 110B per expert at 16 experts puts you just shy of 1.8T. Are you suggesting there are ca. 30B shared params between experts?
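To make the arithmetic explicit, here is a rough sketch of the check. The total quoted upthread isn't reproduced in this comment, so the total is left as an input; the 1790 used in the usage note is only back-solved from the "ca. 30B" figure and is an assumption, not a reported number. The function names are made up for illustration, and all values are in billions of parameters.

```python
def expert_params(per_expert_b: int, n_experts: int) -> int:
    """Parameters held in the experts alone, in billions."""
    return per_expert_b * n_experts


def implied_shared(total_b: int, per_expert_b: int, n_experts: int) -> int:
    """Parameters left over after subtracting the experts from a claimed total.

    Assumes total = experts + shared, i.e. everything that is not an expert
    (attention, embeddings, etc.) is counted as shared.
    """
    return total_b - expert_params(per_expert_b, n_experts)


# 16 experts at 110B each, the figures from the comment.
experts_only = expert_params(110, 16)  # 1760, i.e. 1.76T

# Hypothetical total of 1790B, chosen only because it reproduces a ~30B gap.
shared = implied_shared(1790, 110, 16)  # 30
```

With the figures from the comment, the experts alone come to 1760B (1.76T), so whatever total is claimed, the shared portion is simply that total minus 1760B.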

