

harrisonjackson 10 months ago | on: People are just as bad as my LLMs

Agreed on the second part. Correcting for bias this way might average out the scores but not in a way that correctly evaluates the HN comments. The LLM isn't performing the desired task. It sounds possible to cancel out the comments where reversing the labels swaps the outcome because of bias. That will leave the more "extreme" HN comments that it consistently scored regardless of the label. But that may not solve for the intended task still.
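
A rough sketch of the "cancel out the flipped ones" idea, in case it helps make it concrete. It assumes each comment was judged twice, once with the labels as given and once with them reversed, and that both verdicts have already been mapped back to the same meaning. The `Judgement` type, its field names, and the sample data are all hypothetical, not anything from the original experiment.

```python
from typing import NamedTuple


class Judgement(NamedTuple):
    """One comment judged under both label orderings (hypothetical structure)."""
    comment_id: str
    verdict_original: str  # verdict with the labels as given
    verdict_swapped: str   # verdict with labels reversed, mapped back to the same meaning


def split_by_label_consistency(judgements):
    """Separate judgements that survive a label swap from those that flip."""
    consistent, flipped = [], []
    for j in judgements:
        if j.verdict_original == j.verdict_swapped:
            consistent.append(j)  # same outcome regardless of label
        else:
            flipped.append(j)     # outcome depended on the label, so discard it
    return consistent, flipped


# Made-up sample data, for illustration only.
sample = [
    Judgement("c1", "good", "good"),
    Judgement("c2", "good", "bad"),
    Judgement("c3", "bad", "bad"),
]

consistent, flipped = split_by_label_consistency(sample)
print([j.comment_id for j in consistent])  # ['c1', 'c3']
print([j.comment_id for j in flipped])     # ['c2']
```

Note that the `consistent` list only tells you which verdicts were stable under the swap. It says nothing about whether those verdicts measure the intended task.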

rahimnathwani 10 months ago

  The LLM isn't performing the desired task.

It's 'not performing the task', in the same way that the humans ranking voice attractiveness are 'not performing the task'.

I wouldn't treat the output as complete garbage, just because it's somewhat biased by an irrelevant signal.
