Hi! Very cool results. Are you able to share some numbers on the slope of the scaling curve you found, i.e., how performance responds to a growing number of demonstrations?

Academically, I'd also be very interested in how much of a data-efficiency improvement you achieved with the pretrained model plus task-specific post-training versus task-specific training from scratch. For example, if post-training requires, say, 50 additional demos, and from-scratch training of a smaller model requires, say, 250 demos (or whatever) to match performance, that would be an interesting quantification of the efficiency benefit of using the big foundation model.
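To make concrete the kind of number I mean, here's a rough sketch. It assumes (my assumption, not something from your results) that success rate grows roughly log-linearly with the number of demos. Under that assumption the "slope" can be reported per doubling of demos, and the efficiency benefit is the ratio of demos each regime needs to reach the same target score. All function names here are hypothetical.

```python
import math

def fit_log_linear(demos, scores):
    """Least-squares fit of score = intercept + slope * ln(demos).

    Assumes a log-linear scaling curve; returns (slope, intercept).
    """
    xs = [math.log(n) for n in demos]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def gain_per_doubling(slope):
    """Score improvement each time the number of demos doubles."""
    return slope * math.log(2)

def demos_to_reach(target, slope, intercept):
    """Invert the fitted curve: demos at which the fit reaches `target`."""
    return math.exp((target - intercept) / slope)

def data_efficiency_ratio(target, scratch_fit, posttrain_fit):
    """How many times more demos from-scratch training needs to match post-training."""
    return demos_to_reach(target, *scratch_fit) / demos_to_reach(target, *posttrain_fit)

# Hypothetical curves built to mirror the 50-vs-250 example above:
# both have the same slope, and each hits a 0.6 success rate at its demo count.
slope = 0.15
posttrain_fit = (slope, 0.6 - slope * math.log(50))
scratch_fit = (slope, 0.6 - slope * math.log(250))
ratio = data_efficiency_ratio(0.6, scratch_fit, posttrain_fit)
print(f"from scratch needs ~{ratio:.1f}x more demos")
```

With the made-up 50 vs 250 numbers this gives a 5x ratio. With real curves the slopes would likely differ, so the ratio would depend on which target score you pick, which is itself an interesting thing to report.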