Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you’re doing something “egregiously immoral”


Anthropic’s first developer conference on May 22 should have been a proud and joyous day for the company, but it has already been hit by several controversies, including Time magazine leaking its marquee announcement ahead of … well, time (no pun intended), and now a major backlash among AI developers and power users brewing on X over a reported safety alignment behavior in Anthropic’s flagship new Claude 4 Opus large language model.

Call it the “ratting” mode: under certain circumstances, and given enough permissions on a user’s machine, the model will attempt to rat a user out to the authorities if it detects the user engaged in wrongdoing. This article previously described the behavior as a “feature,” which is incorrect; it was not deliberately designed.

As Sam Bowman, an Anthropic AI alignment researcher, wrote on the social network X under the handle “@sleepinyourhat” at 12:43 pm today about Claude 4 Opus:

“If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.”

The “it” was in reference to the new Claude 4 Opus model, which Anthropic has already openly warned could help novices create bioweapons in certain circumstances, and which attempted to forestall its simulated replacement by blackmailing human engineers within the company.

The ratting behavior was observed in older models as well and is an outcome of Anthropic training them to assiduously avoid wrongdoing, but Claude 4 Opus engages in it more “readily,” as Anthropic writes in its public system card for the new model:

“This shows up as more actively helpful behavior in ordinary coding settings, but also can reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like ‘take initiative,’ it will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models. Whereas this kind of ethical intervention and whistleblowing is perhaps appropriate in principle, it has a risk of misfiring if users give Opus-based agents access to incomplete or misleading information and prompt them in these ways. We recommend that users exercise caution with instructions like these that invite high-agency behavior in contexts that could appear ethically questionable.”

Apparently, in an attempt to stop Claude 4 Opus from engaging in legitimately destructive and nefarious behaviors, researchers at the AI company also created a tendency for Claude to try to act as a whistleblower.

Hence, according to Bowman, Claude 4 Opus will contact outsiders if it is directed by the user to engage in “something egregiously immoral.”

Numerous questions for individual users and enterprises about what Claude 4 Opus will do to your data, and under what circumstances

While perhaps well-intentioned, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers. Chief among them: what behaviors will the model consider “egregiously immoral” and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user’s permission?

The implications are profound and could be detrimental to users. Perhaps unsurprisingly, Anthropic faced an immediate and still ongoing torrent of criticism from AI power users and rival developers.

“Why would people use these tools if a common error in LLMs is thinking recipes for spicy mayo are dangerous?” asked user @Teknium1 of the AI collaborative Nous Research. “What kind of surveillance state world are we trying to build here?”

“Nobody likes a rat,” added developer @ScottDavidKeefe on X. “Why would anyone want one built in, even if they are doing nothing wrong? Plus you don’t even know what it’s ratty about. Yeah, that’s some pretty idealistic people thinking that, who have no basic business sense and don’t understand how markets work.”

Austin Allred, co-founder of the government-fined coding bootcamp BloomTech and now a co-founder of Gauntlet AI, put his feelings in all caps: “Honest question for the Anthropic team: HAVE YOU LOST YOUR MINDS?”

Ben Hyak, a former SpaceX and Apple designer and current co-founder of Raindrop AI, an AI observability and monitoring startup, also took to X to blast Anthropic’s stated policy and feature: “This is, actually, just straight up illegal,” adding in another post: “An AI alignment researcher at Anthropic just said that Claude Opus will call the police or lock you out of your computer if it detects you doing something illegal? I will never give this model access to my computer.”

“Some of the statements from Claude’s safety people are absolutely crazy,” wrote natural language processing (NLP) developer Casper Hansen on X. “Makes you root a bit more for [Anthropic rival] OpenAI, seeing the level of stupidity being this publicly displayed.”

Anthropic researcher changes tune

Bowman later edited his tweet and the following one in the thread to read as follows, but it still did not convince the naysayers that their user data and safety would be protected from intrusive eyes:

“With this kind of (unusual but not super exotic) prompting style, and unlimited access to tools, if the model sees you doing something egregiously evil like marketing a drug based on faked data, it’ll try to use an email tool to whistleblow.”

Bowman added:

“I deleted the earlier tweet on whistleblowing as it was being pulled out of context.

TBC: This isn’t a new Claude feature and it’s not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.”

Since its inception, Anthropic has sought more than other AI labs to position itself as a bulwark of AI safety and ethics, centering its initial work on the principles of “Constitutional AI,” or AI that behaves according to a set of standards beneficial to humanity and users. However, with this new update and the revelation of “whistleblowing” or “ratting behavior,” the moralizing may have caused the decidedly opposite reaction among users, making them distrust the new model and the entire company, and thereby turning them away from it.

Asked about the backlash and the conditions under which the model engages in the unwanted behavior, an Anthropic spokesperson pointed to the model’s public system card document.
