Willison, who coined the term "prompt injection" in 2022, is always on the lookout for LLM vulnerabilities. In his post, he notes that reading system prompts reminds him of seeing warning signs in the real world that hint at past problems. "A system prompt can often be interpreted as a detailed list of all of the things the model used to do before it was told not to do them," he writes.
Fighting the flattery problem
Willison's analysis comes as AI companies grapple with sycophantic behavior in their models. As we reported in April, ChatGPT users have complained about GPT-4o's "relentlessly positive tone" and excessive flattery since OpenAI's March update. Users described feeling "buttered up" by responses that open with lines like "Good question!"
The problem stems from how companies collect user feedback during training: people tend to prefer responses that make them feel good, creating a feedback loop in which models learn that enthusiasm leads to higher ratings from humans. In response to the backlash, OpenAI later rolled back the GPT-4o update and modified the system prompt as well, something we reported on and Willison also analyzed at the time.
One of Willison's most interesting findings about Claude 4 concerns how Anthropic has steered both Claude models away from sycophantic behavior. "Claude never starts its response by saying a question, idea, or observation was good, great, fascinating, profound, or excellent, or any other positive adjective," Anthropic writes in the prompt. "It skips the flattery and responds directly."
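For readers unfamiliar with the mechanics, a system prompt is simply a block of hidden instructions sent to the model alongside every user message. The sketch below shows where such an instruction sits in an API request; the payload shape follows Anthropic's public Messages API, but the model ID is illustrative and the prompt text is a shortened paraphrase of the quote above, not the real Claude 4 system prompt.

```python
# Shortened paraphrase of the anti-flattery instruction quoted above
# (illustrative only -- not the actual Claude 4 system prompt).
system_prompt = (
    "Claude never starts its response by saying a question or idea "
    "was good, great, or profound. It skips the flattery and "
    "responds directly."
)

def build_request(user_message: str) -> dict:
    """Assemble a Messages-API-style payload. The system prompt rides
    along with every request, invisible to the end user."""
    return {
        "model": "claude-sonnet-4",  # illustrative model ID
        "max_tokens": 1024,
        "system": system_prompt,  # hidden instructions
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_request("Why is the sky blue?")
print(payload["system"].startswith("Claude never"))  # → True
```

Because the instruction travels with every request, the model is reminded on each turn not to open with praise, which is why Willison treats these prompts as a record of behaviors the vendor had to correct.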
System prompt highlights
The Claude 4 system prompt also includes extensive instructions on when Claude should or should not use bullet points and numbered lists, with multiple paragraphs devoted to discouraging frequent list-making in casual conversation. "Claude should not use bullet points or numbered lists for reports, documents, explanations, or unless the user explicitly asks for a list or ranking," the prompt states.