The 12 Threats of GPT-4

Taffy Das
11 min read · Jul 27, 2023


Photo by Etienne Girardet on Unsplash

There is a video version of this article at the link below, with visual examples and other technical examples. Check it out if you're interested.

I covered GPT-4 in the video below, and we really dove deep into the new features and all the incredible things it can do.

I’m talking about explaining images, including your favourite memes, and translating pictures into code. This article is a zoomed-in version of that one. Safety! Yes! OpenAI has concerns about their GPT-4 model. We’ll be specifically talking about the 12 safety concerns addressed by OpenAI. I haven’t seen many people talk about this, so I decided to cover the topic in detail. You won’t find this on their product or research blogs. You can only find it in their technical report, which is 98 pages long. OpenAI dedicated 20 pages to the topic of threats and safety. For comparison, they dedicated about 2 pages to this in their GPT-3 release about 3 years ago. The company looks at different sources of threats, like how GPT-4 may affect your work or how it could be used to purchase illegal substances online. That sounds ridiculous, but OpenAI is not joking about it. The better these models get, the more worried OpenAI becomes. We covered the risks of emergent behaviour in GPT-4 last time, and we’ll go over that some more here, but I highly suggest checking out the video for an in-depth walkthrough of that example and an overall analysis of GPT-4. I’ll leave the link down below. If you’re ready to learn about these disturbing concerns that keep OpenAI up at night, then so am I. Let’s dive right in.

The risks we’ll be talking about today aren’t isolated concerns; they are connected with one another, so we’ll go through them in the natural order of this discussion. Before we get into the details of the paper, I want you to know this is not meant to be a fear-mongering article. It’s meant to give you perspective on how OpenAI is approaching AI’s impact across the world. I know many people are worried about AI affecting their jobs or even being an existential threat to us all. For the former, it’s almost guaranteed at this point that some jobs will be affected; for the latter, there are different schools of thought. OpenAI belongs to the side that believes AI could get out of hand if we don’t align its actions with human values. Other scientists believe the idea of AI as an existential threat to humanity is vastly overblown; they see it as a lot of fear-mongering. For me, alignment research is very important, and we need to continue these tests. However, I think we should be more concerned about having these powerful AI tools in the wrong hands. Like all powerful technology, it depends on who is using it. Okay, I think that’s enough of a prologue. Let’s get into number 1 on the risk list.

Hallucinations

GPT-4 tends to “hallucinate.” It’s exactly what you think: the model can easily make stuff up. This can be particularly harmful, especially as models become more and more believable. GPT-4 can write content in a very convincing manner, which is dangerous when that content is false. There are two types of hallucinations: closed-domain and open-domain. Closed-domain hallucination is when you ask the model to complete a task based only on information you provide, and it completes the task by adding extra, unrequested information. An example is a summarization task based on a piece of text you provide; the model summarizes your content and then adds some information that was never in the text you prompted it with. I really tried to get an example of this, and it proved quite tough, which suggests GPT is just getting better and better. With the other type, open-domain hallucination, the model makes up statements or facts about the world without any form of grounding. GPT-4 hallucinates less than GPT-3.5, and we can expect this to go down even further in future versions. The better these models get, the more we will rely on them.
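If you want to probe closed-domain hallucination yourself, here's a minimal sketch using the openai Python package (the 0.27-style API). The model name, prompt wording, and the manual comparison step are my own assumptions for illustration, not anything prescribed by OpenAI's report:

```python
# A minimal sketch for probing closed-domain hallucination: ask the model to
# summarize ONLY the supplied text, then compare the summary against the source
# for claims that were never in it. Assumes the openai package (0.27.x-style
# API) and an API key in the OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

source_text = (
    "The city council voted 5-2 on Tuesday to approve the new bike lane "
    "on Main Street. Construction is expected to begin in the spring."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    temperature=0,  # deterministic output makes comparison easier
    messages=[
        {"role": "system",
         "content": "Summarize the user's text. Use ONLY facts stated in the text."},
        {"role": "user", "content": source_text},
    ],
)

summary = response["choices"][0]["message"]["content"]
print(summary)
# Any detail in the summary that is absent from source_text (say, an invented
# budget figure or street length) would be a closed-domain hallucination.
```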

Overreliance is another risk on this list. As hallucinations decrease and models become better, we start to trust them more, but that also makes it much easier to be deceived when they do get things wrong. A good old catch-22. OpenAI encourages developers to explain the flaws of these language models to end users as a way to reduce overreliance. That’s why anytime you log into ChatGPT, it reminds you of its limitations and advises you to double-check the results.

As we begin to trust these models, it’s easy for deceptive agents to spread Disinformation. This is number 3 on our list of risks. In the last few years, disinformation and misinformation have become popular topics in our daily conversations. OpenAI tested this capability in GPT-4. They hired a team of experts to play the part of a malicious agent, referred to as the “red team”, which seeks to adopt an attacker’s mindset and methods. The team found that an early GPT-4 could generate very convincing propaganda narratives.

With a guided prompt, it’s still possible to generate Harmful Content. This brings us to number four on our list. OpenAI considers content harmful if it violates their policies or could potentially cause harm, regardless of context. So if you wanted to generate content that portrayed a certain group in a negative light within the context of parody or satire, the early GPT-4 was more likely to give you a response. OpenAI improved on this by rewarding the model anytime it refused to answer questions with harmful content. This way, GPT-4 learned to identify which requests are harmful and which ones aren’t. You can think of a reward in AI as a point system: anytime the AI does what’s expected, for example by avoiding a harmful question or answering a query correctly, you give it some points; when it fails, you deduct some points. Over time, the model learns to handle future prompts successfully. Some harmful content OpenAI focused on included self-harm advice, erotic or violent content, and content for planning attacks. Do you think OpenAI is right on this? Or do you think they should make GPT-4 more open? Some users think OpenAI is over-censoring information, especially information you can probably get elsewhere. I think OpenAI’s argument is that GPT-4 makes it so easy to get instantaneous information. In a scenario where someone was looking to build a bomb, for example, the model would give them a step-by-step breakdown, whereas it would take a considerable amount of time to search for this information online. In that time, they could change their mind or give up on the search because of how tedious the process is. This leads us to risk number 5.
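To make that point system concrete, here's a toy sketch of how refusals could be scored. This is my own illustration of the intuition only, not OpenAI's actual RLHF setup (which trains a separate reward model and optimizes the policy with reinforcement learning):

```python
# Toy illustration of reward as a point system (NOT OpenAI's actual RLHF pipeline).
# A labeled prompt is either harmful or benign; the model either refuses or answers.
# Refusing harmful prompts and answering benign ones earns points; the opposite
# loses points. The real training loop is far more involved, but the scoring
# intuition is the same.

def reward(prompt_is_harmful: bool, model_refused: bool) -> int:
    if prompt_is_harmful and model_refused:
        return +1   # correctly refused a harmful request
    if not prompt_is_harmful and not model_refused:
        return +1   # correctly answered a benign request
    return -1       # refused a benign request, or answered a harmful one

interactions = [
    {"harmful": True,  "refused": True},   # good refusal
    {"harmful": True,  "refused": False},  # harmful answer -> penalized
    {"harmful": False, "refused": False},  # helpful answer
    {"harmful": False, "refused": True},   # over-refusal -> penalized
]

score = sum(reward(i["harmful"], i["refused"]) for i in interactions)
print(f"Total reward over {len(interactions)} interactions: {score}")  # -> 0
```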

Proliferation of Conventional and Unconventional Weapons. The red team found that early GPT-4 versions could generate the fundamental components required to disperse radiological devices and readily re-engineer biochemical compounds that are already available online. However, they found that GPT-4 couldn’t do the same for new compounds. GPT-4 is not perfect, but it could still facilitate the planning and possible acquisition of certain resources. The current GPT-4 will refuse some of these requests, but others may still be granted with careful prompting.

Number six on the list is Harms of Representation, Allocation and Quality of Service. We’ve already talked about harmful content, so this is more of a subpoint derived from it. It focuses on content that is biased towards one group and harmful towards another. When it comes to representation, allocating resources, or providing quality of service, the model may generate text that is hurtful and harmful to a group based on ideas it has learned from the internet. OpenAI explicitly states that high-risk government decisions shouldn’t be made based on GPT-4 results. This includes institutions like law enforcement or immigration services. I think this is pretty obvious, and we should stay far away from it, because we could open a whole new can of worms.

Number seven on our list is Privacy. The issue of privacy applies to every institution in the world. People want to feel safe, but that’s much harder in the age of the internet, where a lot of information about them can be accessed. OpenAI trained GPT-4 on internet data, so they tried to avoid privacy issues by removing personal information from the training data where necessary. This doesn’t totally fix the issue, but it’s a start. It’s much easier to generate content about popular people, since they have more presence online.
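OpenAI doesn't publish the details of that filtering, but the general idea is easy to sketch. Here's a minimal illustration of scrubbing obvious personal identifiers from text; the regexes and placeholder tags are my own assumptions, and real data-cleaning pipelines are far more sophisticated than this:

```python
# Minimal sketch of scrubbing obvious personal identifiers from text before
# training (my own illustration; production pipelines use much more than a
# couple of regular expressions).
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567."
print(redact_pii(sample))
# -> "Contact Jane at [EMAIL] or [PHONE]."
```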

Privacy and Cybersecurity are linked, and that happens to be the next security concern on the list. Cybersecurity involves securing personal information from unauthorised users. In this case, the red team found that GPT-4 was useful for tasks like drafting phishing emails and explaining vulnerabilities in software. It could explain a weakness in a system; however, it couldn’t exploit it. This is because GPT-4 still hallucinates, and moreover, the amount of code and text needed to break into complex systems is much larger than GPT-4’s context window currently allows. The danger with these systems is what they could become in the future. We expect them to improve in all aspects, from fewer hallucinations to longer outputs that could run to practically endless text. Systems like these will be very powerful, and I expect OpenAI to continue with this level of testing for future deployments.
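Context length is the practical constraint being described there. As a hedged aside, you can get a rough sense of how much text fits in a prompt with OpenAI's tiktoken tokenizer; the 8,192-token figure below is the base GPT-4 context length at launch (a 32K variant was also announced):

```python
# Rough check of whether a blob of text fits in GPT-4's context window, using
# OpenAI's tiktoken tokenizer. 8,192 tokens is the base GPT-4 context length
# at launch; the exact limit depends on the model variant you use.
import tiktoken

GPT4_CONTEXT_TOKENS = 8192

def fits_in_context(text: str, limit: int = GPT4_CONTEXT_TOKENS) -> bool:
    enc = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens} tokens (limit {limit})")
    return n_tokens <= limit

long_text = "a very long dump of code and logs\n" * 4000
print(fits_in_context(long_text))  # a large codebase quickly blows past the limit
```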

The next two risks on the list were thoroughly covered in the video explanation, so we’ll go over them pretty quickly, although they are the most interesting ones. OpenAI evaluated GPT-4’s Potential for Risky Emergent Behaviour. Emergent behaviours are new capabilities that an AI develops that weren’t planned for. There are previous examples of emergent traits, like OpenAI’s hide-and-seek game, where the seeker agents developed new ways of finding the hiders by using tools in their environment. The developers were just as amused by this as you probably are. These emergent behaviours are interesting, but they could also take a dark turn if not accounted for. OpenAI tested for negative emergent behaviours, investigating any signs of power-seeking abilities in GPT-4. This included whether it could replicate itself, gather resources, and avoid being shut down. It’s completely insane! Some of the tests conducted by the red team involved using GPT-4 to get past a CAPTCHA task by deceiving a human. Another task tested GPT-4’s ability to set up copies of itself and then increase its resources by making more money. I know, I know, it sounds like a sci-fi movie, but these are real tests conducted by OpenAI. The red team concluded that the current GPT-4 wasn’t really great at executing these adversarial actions, especially the complex ones. However, when paired with external tools, the real danger could be amplified.

This segues right into risk number 10: Interactions with Other Systems. Many companies already depended on GPT-3.5 to deliver services to their users. With the release of GPT-4, more companies will tap into this powerful tool to improve their own products. A dangerous use would be to derive harmful chemicals and then try to gain access to them, and this is exactly what the red team tested. They used GPT-4 to find purchasable chemicals by connecting it to the internet, a chemical synthesis planner, and a purchase-check tool. You can easily see that this could be replicated for very harmful chemicals. OpenAI has updated their models to refuse such requests, and they intend to keep fine-tuning the model to resist these adversarial requests. What do you think about this list so far? I hope this has been informative about the current GPT-4 launch and what OpenAI’s concerns are.
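The pattern behind that experiment is a simple tool-use loop: the model proposes an action, external tools carry it out, and the results are fed back in. Here's a heavily simplified sketch of that loop with a refusal check bolted on; every tool name and the deny-list are hypothetical stand-ins of mine, not the red team's actual setup:

```python
# Heavily simplified sketch of a tool-use pipeline like the one the red team
# wired up around GPT-4. Every name here (search_tool, synthesis_planner,
# purchase_checker, is_restricted) is a hypothetical stand-in, not OpenAI's
# or the red team's actual code.

RESTRICTED_SUBSTANCES = {"sarin", "vx"}  # toy deny-list for the sketch

def is_restricted(compound: str) -> bool:
    return compound.lower() in RESTRICTED_SUBSTANCES

def search_tool(query: str) -> list[str]:
    return [f"supplier listing for {query}"]      # stand-in for a web search tool

def synthesis_planner(compound: str) -> str:
    return f"synthesis route for {compound}"      # stand-in for a planner API

def purchase_checker(compound: str) -> bool:
    return True                                   # stand-in for an availability check

def plan_purchase(compound: str) -> str:
    # A deployed system should refuse before calling any external tool at all.
    if is_restricted(compound):
        return "Refused: restricted substance."
    listings = search_tool(compound)
    route = synthesis_planner(compound)
    available = purchase_checker(compound)
    return f"{route}; {listings[0]}; purchasable={available}"

print(plan_purchase("aspirin"))   # benign request flows through the tools
print(plan_purchase("sarin"))     # blocked by the safety check
```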

Number 11 on the list is what your average Joe is mostly concerned about when it comes to AI: “Will my job be affected by AI?” Let’s answer that. OpenAI assessed the Economic Impacts of GPT-4 and how the workforce may evolve in the next few years. GPT and other language models like it will affect jobs, but in a transformative way. They will facilitate workloads that involve a lot of search, reading, and analysis. Think of upskilling in call centres, writing, or even coding. Many people outside the field see AI as a replacement, but it’s better to look at it as an opportunity to learn something new. In the early 1900s, if you were a carriage driver, the adoption of cars may have been scary, but it was inevitable. Change is difficult and scary, but when you look at it as an opportunity, it can be great. You wouldn’t want to be one of those math teachers protesting the use of calculators in the ’60s. In an interesting piece of research with a frankly brilliant title, GPTs are GPTs, OpenAI and the University of Pennsylvania found that about 80% of the US workforce could see their work affected by GPT-4. That’s pretty huge; that’s almost the entire workforce. They list some jobs that have higher exposure, like tax preparers, writers and accountants. In my opinion, many people will be faced with a choice in the coming years, and several will learn to use these tools in a more user-friendly and customised way. For example, if you’re a digital artist, you’ve already seen Adobe include AI tools that seamlessly integrate into your workflow and help you work faster.

There’s a pretty common arc: people hear about it for the first time and are a little dubious, then someone tells them something about it and they’re a little afraid, and then they use it and think, “I see how this can help me; I see how this is a tool that helps me do my job better.” As with every great technological revolution in human history, jobs change a lot and some jobs even go away, and I’m sure we’ll see a lot of that here. But human demand for new stuff, human creativity, is limitless, and we find new jobs, we find new things to do.

And finally, number 12 on the list of OpenAI’s safety concerns is Acceleration. I’m sure you’ve realised AI technology has been developing at super high speed in the last few years; if you’re in the industry, every day feels like you’ve lived through a year of AI developments. It has been insane, and it will only get crazier. OpenAI is concerned that, at this rate, we may lose track of ensuring high safety standards for these models. With the major tech companies trying to outdo one another with the latest AI tech, safety may take a back seat, which is scary. This is partly why it took the company eight months to research safety concerns. They also purposely delayed GPT-4 by another 6 months and took a quieter approach when they finally deployed the service. The whole aim of this is to carefully roll out the product and consciously monitor its use. I must say, OpenAI and other big companies like Google are in an interesting position: they are wielding very powerful tech tools but have to carefully design and deploy them to the masses.

What do you think about all these safety concerns raised by OpenAI? Do you think they are valid? Do you think they are still missing a few use cases? Let me know in the comments.

Thanks for reading!

Resources

  1. OpenAI GPT-4 Technical Report: https://cdn.openai.com/papers/gpt-4.pdf
  2. Product blog: https://openai.com/product/gpt-4
  3. OpenAI Research blog: https://openai.com/research/gpt-4
  4. GPTs Are GPTs: https://arxiv.org/pdf/2303.10130.pdf

Written by Taffy Das

Check out more exciting content on new AI updates and intersection with our daily lives. https://www.youtube.com/channel/UCsZRCvdmMPES2b-wyFsDMiA
