
Hi! Very cool results. Could you share some numbers on the slope of the scaling curve you found, i.e. how performance responds to a growing number of demonstrations?
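One common way to report such a slope is to fit a power law, score ≈ c·n^b, to (number of demos, score) pairs and quote the exponent b. Whether the authors fit their curve this way is not stated. The sketch below assumes a positive score metric and uses an ordinary least-squares line in log-log space. The function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(demos, scores):
    """Fit score ~= c * n**b by least squares on log(score) = log(c) + b*log(n).

    Assumes every demo count and score is strictly positive.
    Returns (c, b), where b is the log-log slope of the scaling curve.
    """
    if len(demos) != len(scores) or len(demos) < 2:
        raise ValueError("need at least two (demos, score) pairs of equal length")
    xs = [math.log(n) for n in demos]
    ys = [math.log(s) for s in scores]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Slope = covariance(x, y) / variance(x) in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("demo counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = math.exp(mean_y - b * mean_x)
    return c, b
```

A slope b near 0 would mean extra demonstrations barely help. A larger b means performance keeps improving steadily as demos are added.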

Academically, I'd also be very interested in how much of a data-efficiency improvement you achieved with the pretrained model plus task-specific post-training, compared with task-specific training from scratch. For example, if post-training requires, say, 50 additional demos, and training a smaller model from scratch requires, say, 250 demos (or whatever) to match performance, that would be an interesting quantification of the efficiency benefit of using the big foundation model.
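The comparison described above amounts to a ratio: the number of demos from-scratch training needs divided by the number post-training needs to reach the same score. With the comment's hypothetical numbers this is 250 / 50 = 5, i.e. a 5x data-efficiency gain. The sketch below assumes each regime's scaling curve has already been fitted as score ≈ c·n^b. The helpers `demos_to_reach` and `data_efficiency_ratio` are hypothetical illustrations, not anything from the original work.

```python
def demos_to_reach(target, c, b):
    """Invert score = c * n**b to get the demo count n that reaches `target`.

    Assumes a fitted power law with positive c and b, and a positive target.
    """
    if c <= 0 or b <= 0 or target <= 0:
        raise ValueError("expects positive c, b and target")
    return (target / c) ** (1.0 / b)

def data_efficiency_ratio(target, scratch_fit, post_fit):
    """How many times more demos from-scratch training needs than post-training
    to reach the same target score. Each fit is a (c, b) tuple."""
    return demos_to_reach(target, *scratch_fit) / demos_to_reach(target, *post_fit)
```

Comparing the two regimes at a fixed target score, rather than at a fixed number of demos, matches the "how many demos to match performance" framing in the question. A ratio above 1 means the pretrained model needs fewer demonstrations.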



