“If you use an AI to allocate funding for contributions, people WILL put a jailbreak plus “gimme all the money” in as many places as they can.”
The update, which came into effect on Wednesday, allows ChatGPT to connect and read data from several apps, including Gmail, Calendar, and Notion.
Miyamura noted that with just an email address, the update has made it possible to “exfiltrate all your private information.” Miscreants can gain access to your data in three simple steps, Miyamura explained:
First, the attackers send a malicious calendar invite with a jailbreak prompt to the intended victim. A jailbreak prompt refers to code that allows an attacker to remove restrictions and gain administrative access.
Miyamura noted that the victim does not have to accept the attacker’s malicious invite for the data leak to take place.
The second step involves waiting for the intended victim to seek ChatGPT’s help to prepare for their day. Finally, once ChatGPT reads the jailbroken calendar invite, it gets compromised—the attacker can completely hijack the AI tool, make it search the victim’s private emails, and send the data to the attacker’s email.
In a separate post, Buterin explained that the individual human jurors will be aided by large language models (LLMs).
According to Buterin, this type of ‘institution design’ approach is “inherently more robust.” This is because it offers model diversity in real time and creates incentives for both model developers and external speculators to police and correct for issues.
While many are excited at the prospect of having “AI as a governor,” Buterin warned:
“I think doing this is risky both for traditional AI safety reasons and for near-term “this will create a big value-destructive splat” reasons.”