Agreed on the second part. Correcting for bias this way might average out the scores, but not in a way that correctly evaluates the HN comments.
The LLM isn't performing the desired task.
It sounds possible to cancel out the comments where reversing the labels flips the outcome because of bias. That would leave the more "extreme" HN comments, the ones it scored consistently regardless of the label. But that still may not accomplish the intended task.
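A rough sketch of that swap-and-cancel filter, assuming a pairwise judge that is shown two comments labeled "A" and "B" and answers with one label. The function names and the toy judge are hypothetical stand-ins, not anyone's actual setup. Note what it does and doesn't do: it only drops verdicts that flip with label order. It says nothing about whether the surviving verdicts measure what we actually want.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical judge: given (comment labeled A, comment labeled B), returns "A" or "B".
Judge = Callable[[str, str], str]


def consistent_verdict(judge: Judge, x: str, y: str) -> Optional[str]:
    """Ask the judge twice with the labels swapped.

    Returns "x" or "y" if the same comment wins both times, or None if the
    winner flips with the label order (the verdict cancels out as bias).
    """
    first = judge(x, y)   # x is labeled A, y is labeled B
    second = judge(y, x)  # labels reversed: y is A, x is B
    winner_first = "x" if first == "A" else "y"
    winner_second = "y" if second == "A" else "x"
    return winner_first if winner_first == winner_second else None


def filter_pairs(judge: Judge, pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Keep only the pairs whose verdict survives the label swap."""
    kept = []
    for x, y in pairs:
        verdict = consistent_verdict(judge, x, y)
        if verdict is not None:
            kept.append((x, y, verdict))
    return kept


def toy_judge(a: str, b: str) -> str:
    # Hypothetical stand-in for an LLM: prefers a clearly longer comment,
    # otherwise always answers "A" (pure label/position bias).
    if len(a) > len(b) + 20:
        return "A"
    if len(b) > len(a) + 20:
        return "B"
    return "A"


pairs = [
    ("short", "a much longer and more detailed comment here"),  # "extreme" pair
    ("foo", "bar"),                                             # close call
]

if __name__ == "__main__":
    for x, y, winner in filter_pairs(toy_judge, pairs):
        print(repr(x), "vs", repr(y), "->", winner)
```

With the toy judge, the close "foo"/"bar" pair cancels out and only the lopsided pair survives. But it survives because the toy judge rewards length, which illustrates the point: consistency under label swaps isn't the same as evaluating the comments correctly.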