

28 · 2 days ago

From what I understand of how LLMs actually work, each word is a token (possibly each letter), and the model picks the word with the highest calculated probability of coming next. This output makes me think the training data heavily included social media or pop culture, specifically around "teen angst".
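To illustrate the "highest probability next word" idea: a minimal sketch of greedy next-token selection. The candidate tokens and their scores here are made-up numbers, not from any real model; real LLMs do this over a vocabulary of tens of thousands of sub-word tokens.

```python
import math

# Hypothetical raw scores ("logits") for a few candidate next tokens.
logits = {"homework": 2.0, "angst": 1.2, "lunch": 0.3}

# Softmax converts raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding: emit the highest-probability next token.
next_token = max(probs, key=probs.get)
print(next_token)  # "homework"
```

In practice models usually sample from this distribution (with temperature) rather than always taking the top token, which is why outputs vary between runs.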
I wonder if in-context learning would be helpful for masking the "edgelord" training data sets.
Isn’t the real joke that interns aren’t really getting hired anymore?