Hacker News

Tried it with a few PyLLMs benchmarks [1] and it did not do very well.

Q. Use m to substitute p, a to substitute e, n to substitute a, g to substitute c, o to substitute h, how to spell peach under this rule?

A.

m: p

a: e

n: a

g: c

o: h

how: peach
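
For comparison, the expected answer can be worked out mechanically. This is a minimal sketch, assuming "use m to substitute p" means every p becomes m (and likewise for the other pairs); the helper name `substitute` is made up for illustration and is not part of PyLLMs.

```python
# Mapping read off the prompt: "use m to substitute p" is taken as p -> m, etc.
# This reading of the rule is an assumption.
RULE = {"p": "m", "e": "a", "a": "n", "c": "g", "h": "o"}


def substitute(word: str, rule: dict) -> str:
    """Replace each letter via the rule in a single pass; unmapped letters stay as-is."""
    return "".join(rule.get(ch, ch) for ch in word)


print(substitute("peach", RULE))  # mango
```

Each letter is looked up once, so the e -> a and a -> n rules do not chain into each other.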

Q. write three sentences each ending with the word apple

A.

1. The apple tree was covered in apples.

2. She sliced the apple with a knife.

3. The apple juice was too sweet.
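
A quick way to score this kind of answer is to check the last word of each sentence. This is a rough sketch, not the benchmark's own grader; the function name `ends_with_word` and the choice to ignore trailing punctuation and case are assumptions.

```python
import string


def ends_with_word(sentence: str, word: str) -> bool:
    """True if the sentence's final word equals `word`, ignoring punctuation and case."""
    words = sentence.split()
    if not words:
        return False
    last = words[-1].strip(string.punctuation).lower()
    return last == word.lower()


answers = [
    "The apple tree was covered in apples.",
    "She sliced the apple with a knife.",
    "The apple juice was too sweet.",
]
print([ends_with_word(s, "apple") for s in answers])  # [False, False, False]
```

None of the three sentences passes: the first ends in "apples", the others in "knife" and "sweet".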

Wonder if the synthetic benchmarks are very biased and not that useful.

[1] https://github.com/kagisearch/pyllms/blob/6c11cd51b099d33e61...



“when a measure becomes a target, it ceases to be a good measure”

The model itself is optimising weights for its targets, and the model builders are optimising the model for the benchmarks.



