Within a week of ChatGPT’s November 30, 2022, launch, the AI-powered conversation tool was the talk of the (media) town, fascinating early users with its conversational abilities and even creativity. Soon, the enthusiasts exclaimed, we won’t need people to write marketing copy, ads, essays, reports, or pretty much anything other than the most specialized scientific reports. And AI will be able to handle all our customer service calls, appointment-making, and other routine conversations.
Not so fast! My own experiments with the underlying technology suggest we have a ways to go before we get there.
Still, what is different about ChatGPT versus previous AI wunderkinds is that it isn’t just the tech and business media who are paying attention: Regular folks are too.
A teacher friend asked me just a week after ChatGPT’s debut how teachers will be able to detect students having AI write their term papers for them. Policing cut-and-paste efforts from Wikipedia and the web is tough enough, but an AI tool that writes “original” papers would make student essays and reports meaningless as a measure of their learning.
(Switching to oral presentations with a Q&A component would fix that issue, since students would have to demonstrate live and unaided their actual understanding. Of course, schools don’t currently give teachers the time for that lengthy exam process.)
What is ChatGPT — and GPT-3?
ChatGPT is the latest effort from OpenAI (a research company backed by Microsoft, LinkedIn cofounder Reid Hoffman, and VC firm Khosla Ventures) to create natural-language systems that can not only access information but actually aggregate, synthesize, and write it as a human would. It is built on OpenAI’s Generative Pretrained Transformer 3 (GPT-3) model, which was trained on a vast corpus of text so it can “understand” relationships between concepts and their expressions, as well as the meanings of those concepts, in natural-language text. OpenAI has said that the GPT-3 model has 175 billion parameters: just think about that!
GPT-3 is not new, but OpenAI is increasingly opening it to outside users, in part so GPT-3 can improve by “observing” how the technology is used and, just as important, how it is corrected by humans. GPT-3 is also not the only natural-language AI game in town, even if it gets a lot of the attention. As James Kobielus has written for our sister site InfoWorld, Microsoft has its DeepSpeed and Google its Switch Transformer, both of which can process 1 trillion or more parameters (making GPT-3 look primitive by comparison).
As we’ve seen with several AI systems, GPT-3 has some critical weaknesses that get lost in the excitement over what the first wave of GPT-based services can do. They are the same kinds of weaknesses prevalent in human writing, but with fewer filters and less self-censorship: racism, sexism, and other offensive prejudices, as well as lies, hidden motives, and other “fake news.” That is, it can and does generate “toxic content.” The team at OpenAI understands this risk full well: In 2019, it initially withheld the full version of the predecessor GPT-2 model out of concern over malicious usage.
Still, it’s amazing to read what GPT-3 can generate. At one level, the text feels very human and would easily pass the Turing test, meaning a person couldn’t tell whether it was machine- or human-written. But you don’t have to dig too deep to see that its impressive ability to write natural English sentences doesn’t mean it actually knows what it’s talking about.
Hands-on with GPT-3: Don’t dig too deep
Earlier this year, I spent time with Copysmith’s Copysmith.AI tool, one of several content generators that use GPT-3. My goal was to see if the tool could supplement the human writers at Computerworld’s parent company Foundry by helping write social posts, generating possible story angles for trainee reporters, and perhaps even summarizing basic press releases while de-hyping them, much as existing content generators already produce basic, formulaic stories on earthquake locations and intensities, stock results, and sports scores.
Although Copysmith’s executives told me the tool’s content is meant to be suggestive, a starting point for less-skilled writers to explore topics and wording, Copysmith’s marketing is clearly aimed at two uses: giving website producers enough authoritative-sounding text to get indexed by Google Search and improve their odds of showing up in search results, and churning out as many variations as possible of social promotion text for the vast arena of social networks. That kind of text is considered essential in the worlds of e-commerce and influencers, which have few skilled writers.
OpenAI restricts third parties such as Copysmith to working with just snippets of text, which of course reduces the load on OpenAI’s GPT-3 engine but also limits the effort required of that engine. (The AI-based content generators typically are limited to initial concepts written in 1,000 characters or less, which is roughly 150 to 200 words, or one or two paragraphs.)
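To make that workflow concrete, here’s a minimal sketch of the kind of snippet-rewriting call a front end such as Copysmith.AI might make to OpenAI’s GPT-3 completions API as it existed at the time of writing. The prompt wording, model choice, and character cap are my own illustrative assumptions, not Copysmith’s actual implementation.

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder; supply your own key

MAX_SNIPPET_CHARS = 1000  # the roughly one-to-two-paragraph cap noted above

def rewrite_snippet(snippet: str) -> str:
    """Ask GPT-3 to recast a short snippet as social-promo copy."""
    snippet = snippet[:MAX_SNIPPET_CHARS]  # enforce the snippet-size limit
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3-family completions model
        prompt=(
            "Rewrite the following paragraph as a short, engaging "
            "social media promo:\n\n" + snippet
        ),
        max_tokens=120,
        temperature=0.7,  # some variety in wording, but not too wild
    )
    return response.choices[0].text.strip()

print(rewrite_snippet(
    "Within a week of ChatGPT's launch, the AI-powered conversation tool "
    "was the talk of the town, fascinating early users with its "
    "conversational abilities and even creativity."
))
```

Even in a toy example like this, the generator sees only the snippet and the prompt; it has no notion of the article’s intent, audience, or sourcing, which foreshadows the problems described below.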
But even that simpler target exposed why GPT-3 isn’t yet a threat to professional writers but could be used in some basic cases. As is often the case in fantastical technologies, the future is both further away and closer than it seems — it just depends on which specific aspect you’re looking at.
Where GPT-3 did well in my tests of Copysmith.AI was in rewriting small chunks of text, such as taking the title and first paragraph of a story to generate multiple snippets for use in social promos or marketing slides. If the source text was clear and avoided linguistic switchbacks (such as several “buts” in a row), Copysmith.AI usually generated usable text. Sometimes its summaries were too dense, cramming multiple attributes into a paragraph that was hard to parse; other times they were oversimplified, stripping out important nuances or subcomponents.
The more specialized the terms and concepts in the original text, the less Copysmith.AI tried to be creative in its presentation. That restraint is likely because it didn’t have enough alternative related text to draw on for rewording, but the end result was that the system was less likely to change the meaning.
But “less likely” doesn’t mean “unable.” In a few instances, it did misunderstand the meaning of terms and thus created inaccurate text. One example: “senior-level support may require extra cost” became “senior executives require higher salaries” — which may be true but was not what the text meant or was even about.
Misfires like this point to where GPT-3 did poorly in creating content based on a query or concept, versus just trying to rewrite or summarize it. It does not understand intent (goal), flow, or provenance. As a result, you get Potemkin villages, which look pretty viewed from a passing train but don’t withstand scrutiny when you get to their doors.
As an example of not understanding intent, Copysmith.AI promoted the use of Chromebooks when asked to generate a story proposal on buying Windows PCs, giving lots of reasons to choose Chromebooks instead of PCs but ignoring the source text’s focus on PCs. When I ran that query again, I got a wholly different proposal, this time proposing a section on specific (and unimportant) technologies followed by a section on alternatives to the PC. (It seems Copysmith.AI does not want readers to buy Windows PCs!) In a third run of the same query, it decided to focus on the dilemma of small business supply chains, which had no connection to the original query’s topic at all.
It did the same context hijacking in my other tests as well. Without an understanding of what I was trying to accomplish (a buyer’s guide to Windows PCs, which I thought was clear because I used that phrase in my query), GPT-3 (via Copysmith.AI) simply looked for concepts that correlated, or at least related in some way, to PCs and proposed them.
Natural writing flow — storytelling, with a thesis and a supporting journey — was also lacking. When I used a Copysmith.AI tool to generate content based on its outline suggestions, each segment largely made sense. But strung together they became fairly random. There was no story flow, no thread being followed. If you’re writing a paragraph or two for an e-commerce site on, say, the benefits of eggs or how to care for cast iron, this issue won’t come up. But for my teacher friend worried about AI writing her students’ papers for them, I suspect the lack of real story will come up — so teachers will be able to detect AI-generated student papers, though this requires more effort than detecting cut and paste from websites. Lack of citations will be one sign to investigate further.
Provenance is sourcing: who wrote the source material that the generated text is based on (so you can assess credibility, expertise, and potential bias), where they live and work (to know whom they are affiliated with and in what region they operate, again to gauge potential bias and mindset), and when they wrote it (to know if it might be out of date). OpenAI doesn’t expose that provenance to third parties such as Copysmith, so the resulting text can’t be trusted beyond well-known facts. Enough of the text in my tests contained clues of questionable sourcing in one or more of these aspects that I could see the generated text was a mishmash that wouldn’t stand real scrutiny.
For example, survey data was all unattributed, but where I could find the originals via web searches, I saw quickly they could be years apart or about different (even if somewhat related) topics and survey populations. Picking and choosing your facts to create the narrative you want is an old trick of despots, “fake news” purveyors, and other manipulators. It’s not what AI should be doing.
At the least, the GPT-generated text should link to its sources so you can make sure the amalgam’s components are meaningful, trustworthy, and appropriately related, not just decently written. OpenAI has so far chosen not to reveal what its underlying corpus contains, the material from which it generates the content it provides in tools like ChatGPT and Copysmith.AI.
Bottom line: If you use GPT-based content generators, you’ll need professional writers and editors to at least validate the results, and more likely to do the heavy lifting while the AI tools serve as additional inputs.
AI is the future, but that future is still unfolding
I don’t mean to pick on Copysmith.AI — it’s just a front end to GPT-3, as ChatGPT and many other natural-language content tools are. And I don’t mean to pick on GPT-3 — although a strong proof of concept, it’s still very much in beta and will be evolving for years. And I don’t even mean to pick on AI — despite decades of overhype, the reality is that AI continues to evolve and is finding useful roles in more and more systems and processes.
In many cases, such as ChatGPT, AI is still a parlor trick that will enthrall us until the next trick comes along. In some cases, it’s a useful technology that can augment both human and machine activities through incredibly fast analysis of huge volumes of data to propose a known reaction. You can see the promise of that in the GPT-fueled Copysmith.AI even as you experience the Potemkin village reality of today.
At a basic level, AI is pattern matching and correlation done at incredible speeds, allowing for reactions faster than people can manage in some cases, such as detecting cyberattacks, and for improvements to many enterprise activities. The underlying algorithms and the training models that form the engines of AI try to impose some sense on the information and derived patterns, as well as on the consequent reactions.
AI is not simply about knowledge or information, though the more information it can successfully correlate and assess, the better AI can function. AI is also not intelligent like humans, cats, dogs, octopi, and so many other creatures in our world. Wisdom, intuition, perceptiveness, judgment, leaps of imagination, and higher purpose are lacking in AI, and it will take a lot more than a trillion parameters to gain such attributes of sentience.
Enjoy ChatGPT and its ilk. Learn all about them for use in your enterprise technology endeavors. But don’t think for a moment that the human mind has been supplanted.