The world has a new AI toy, and it’s called GPT-3. The latest iteration of OpenAI’s text generating model has left many starstruck by its abilities – although its hype may be too much.
GPT-3 is a machine learning system that has been fed 45TB of text data, an unprecedented amount. All that training allows it to generate all sorts of written content: stories, code, legal jargon, all based on just a few input words or sentences. And the beta test has already produced some jaw-dropping results. But after that promising start, GPT-3 is facing more scrutiny.
The model faced criticism last week when Facebook’s head of AI Jerome Pesenti called out bias coming out of a program created with GPT-3. The program in question was a tweet generator; anyone could type in a word and the AI would come up with a relevant, 280-characters-or-less sentence.
The outputs vary, from the weird to the genuinely wise. When I typed in Zuckerberg, for example, GPT’s first suggestion was: “Wild speculation why Zuck doesn’t wear a tie. He plans to one day roll up a tied tie, tightly seal it with superglue and swallow it. Then surgically remove it from his stomach and act like it was bound to happen to all techies.” While the second hit closer to home: “Stay far away from Zuckerberg, the most dangerous thing right now is tech companies entering finance.”
Pesenti tested the words Jews, black, women and holocaust, and came up with some grim results. They’re horrible, but not surprising. GPT-3 has 175 billion parameters and was trained on text from across the internet (including Google Books, Wikipedia, and coding tutorials); that text contains bias. That AI systems copy human prejudices – including, but not limited to, racism and sexism – from the data they learn from has been well documented. The real question is, what can OpenAI do about it before the system is made commercially available in the future?
The creator of the tweet generator, Sushant Kumar, says it didn’t take OpenAI long to react. As soon as his program launched, OpenAI called him to discuss how it was being monitored, and when these problematic tweets started to emerge (even though they were few in number) he had a meeting with Greg Brockman, the company’s co-founder and CTO. Less than a day after Pesenti had flagged the problem, OpenAI launched a toxicity content filter API, which rates all content created by GPT-3 on a toxicity scale from one to five; anything above a two is flagged for moderation.
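The flagging rule described here is simple enough to sketch in a few lines. What follows is only an illustration of the threshold logic as reported, not OpenAI's filter: the names `moderate`, `ModerationResult` and `FLAG_THRESHOLD` are hypothetical, and the score is assumed to arrive from some separate rating step that the article does not describe.

```python
from dataclasses import dataclass

# Values taken from the description in the article; the real filter's
# interface is not public here, so everything below is an assumption.
MIN_SCORE = 1
MAX_SCORE = 5
FLAG_THRESHOLD = 2  # scores strictly above this are flagged


@dataclass
class ModerationResult:
    text: str
    score: int
    flagged: bool


def moderate(text: str, score: int) -> ModerationResult:
    """Apply the reported rule to a text that has already been scored."""
    if not MIN_SCORE <= score <= MAX_SCORE:
        raise ValueError(f"score must be between {MIN_SCORE} and {MAX_SCORE}")
    return ModerationResult(text=text, score=score, flagged=score > FLAG_THRESHOLD)


if __name__ == "__main__":
    for sample_score in range(MIN_SCORE, MAX_SCORE + 1):
        result = moderate("example tweet", sample_score)
        print(result.score, "flagged" if result.flagged else "allowed")
```

Under this reading, a score of two passes and a score of three is held for moderation; whether the real filter treats the boundary the same way is not stated.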
OpenAI has never pretended its system is perfect. When it first revealed the system’s predecessor, GPT-2, in February 2019, it was not made publicly available because of fears of dangerous applications. GPT-2 was only released in full once OpenAI had seen “no strong evidence of misuse”.
This cautious approach has continued. As the hype around GPT-3 started to build, CEO Sam Altman even called it “too much”, tweeting: “It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.”
And when biases started to emerge, Altman didn’t get defensive. “We share your concern about bias and safety in language models,” he tweeted to Pesenti, “and it’s a big part of why we’re starting off with a beta and have [a] safety review before apps can go live.”
The toxicity filter isn’t a catch-all solution, but it shows the kind of work that needs to be done before it can be released to the public – and that OpenAI is willing to do it. “That’s the reason it’s in beta right now,” says Kumar, echoing Altman’s sentiment. “With something as dramatically groundbreaking as this, you need to see everything it can do. Right now, we don’t know what it’s capable of.”
OpenAI did not respond to a request for comment for this article. However, in a Twitter thread the group said all types of generative AI models “can display both overt and diffuse harmful outputs, such as racist, sexist, or otherwise pernicious language”. People who have access to the GPT-3 beta have been handed usage guidelines that state if they build applications that could be dangerous they may have their access removed. “We do not support use cases which may cause physical or mental harm, including but not limited to harassment, intentional deception, radicalisation, astroturfing, or spam,” OpenAI says.
While all the risks of GPT-3 aren’t clear yet, its power is easily demonstrated. The technology is straightforward. “A lot of people who don’t have coding knowledge find it more easy to use because it’s so intuitive,” says Qasim Munye, a medical student at King’s College London and one of the first to get their hands on the beta test. “You just give it a prompt and it carries on writing for you.”
The few hundred testers don’t have access to the full model, just the API, which comes in the form of a text box. You type in a prompt, indicating what you’d like it to do, and it does it. You might have to rewrite the prompt a couple of times to get the output you’re looking for, but it’s literally that easy. Testers have already been showing its powers: one has generated poetry, another created instant web design code; someone even prompted it to act like a therapist.
“As soon as I got handed the technology I wanted to play around with it,” says Munye. “Because wow, the potential is crazy.” First he made it give answers to complex medical questions, but now he’s working on a short story writing app that uses GPT-3 to help writers get past writer’s block. If you’re writing a story and lose inspiration, the GPT-infused Shortly app will continue it for you – logically, coherently, and in your writing style.
This is where GPT-3 has extraordinary skill. From a single sentence, or even a few words, it can generate five full, well-written paragraphs. “I’ve been shocked when I’ve seen it,” says Munye, “it’s hard to distinguish from a human in terms of creativity.”
Despite the ease of use, there could be serious consequences: flooding the internet with fake news, for example. This was a key concern with GPT-2 as well, but this newest iteration would make mass producing content even easier. In another recent Twitter thread, Pesenti continued his critique of GPT-3’s flaws, suggesting that OpenAI should have discouraged risky services like Kumar’s from the get go. But without early experimentation, many issues could sneak by unnoticed. Bias and fake news are problems we can easily predict, but what about the stuff we can’t?
“There’s doubtless a lot of biases we haven’t even noticed yet,” says Anders Sandberg, a senior researcher at Oxford University’s Future of Humanity Institute. “It wouldn’t surprise me if we started to use systems like this as tools to detect the weird biases we have.”
Sandberg thinks OpenAI made the right choice in allowing people to freely play around with this API. “It unleashes a lot of creativity and also sets them up for finding interesting problems relatively early on,” he says. A more closed system, “that you have to sign a non-disclosure agreement to even use”, wouldn’t result in as much innovation, because you wouldn’t see the most risky uses. “That is why pre-exploration and testing is so useful, especially when people try totally crazy things,” says Sandberg. “It can be quite revealing.”
As soon as problems pop up, they can be tackled. And, as OpenAI is only giving people access via an API, anything problematic can be shut down. “They’re acting as a middleman, so if people do start using it maliciously on a mass scale, they would have the ability to detect that and shut it down,” says beta tester Harley Turan, “which is a lot safer than the approach they took with GPT-2.” As well as enforcing its own terms of service, OpenAI says it is working to “develop tools to label and intervene on manifestations of harmful bias,” plus conducting its own research and working with academics to determine potential misuse.
Leaving OpenAI in charge may not be a long-term solution, however. “Anytime a tech company becomes a content moderator it ends badly, that’s the general rule,” says Turan, “because you are consolidating moral authority into a company.” It’s not a question of whether the people who run OpenAI are good, moral people, it just gets a little tricky when these decisions are made by a commercial entity (OpenAI shifted from a non-profit to “capped-profit” company last year).
Altman has tweeted OpenAI believes they “need to be very thoughtful about the potential negative impact companies like ours can have on the world.” And, in a public statement, the company was staunch in this position: “This is an industry-wide issue, making it easy for individual organizations to abdicate or defer responsibility. OpenAI will not.” The company’s charter states OpenAI’s “primary fiduciary duty is to humanity” and that the company will not compromise on safety to win the AI development race.
There are many alternate regulatory options, with various pros and cons. There may even be a way for the GPT-3 system to help mitigate its own dark side. “The paradoxical thing is that these text systems actually are pretty good at calculating the probability that something was written by them,” says Sandberg. So rather than helping to stimulate troll factories, GPT-3 could keep its own fake news in check.
“General purpose technologies are the ones that really transform the world,” says Sandberg, and he believes GPT-3 has the potential to do just that, if we can figure out how to use it responsibly. “A new way of processing information is going to be significant to a lot of very, very different applications, which means we can’t predict the consequences very well,” he says, “which is deeply disturbing when you have very powerful technologies arriving very quickly.”
It’s going to take time, and risks are unavoidable, but censorship is not the way to tackle them. A better option is for these questions and issues to be on everyone’s mind as they’re working with GPT-3. “Ideally you want people to understand the impact they’re going to have,” says Sandberg. “A lot of engineering gets bogged down with getting the stuff to work rather than raising your eyes to the horizon and thinking ‘where does this actually fit into our culture?’ I think that awareness, if you can make it widespread, is actually what could make things much safer and much more useful.”