I failed to replicate the attack later in the evening in a "new" conversation. It does appear to me that the model is learning between conversations, even without human input or RLHF.
Politicians and public figures could use something like this for interviews, or when taking questions at the podium, to tweak their speeches in real time and sound more legit.
An antidote to this would be an OSS alternative for the rest of us to identify dishonesty/inconsistencies and fact-check in real time.
I spotted this gem of a comment while scrolling and had to scroll back up to be sure I hadn't misread it. It's both hilarious and pertinent! Well said, "Big Toe."
Impressive! I appreciate not having to fast-forward to the useful info in the video. It's concise, to the point, and effectively shows what you made and how it works. Looking forward to trying it out.