A Novel Method to Distinguish Human from ChatGPT-Generated Academic Writing

In a groundbreaking study published in Cell Reports Physical Science, researchers from the University of Kansas have developed a method to distinguish human-written academic text from text produced by artificial intelligence (AI), specifically OpenAI’s ChatGPT. The study arrives as AI-generated written content becomes increasingly prevalent, creating a need for reliable ways to tell human and AI-generated text apart.

ChatGPT, released to the public in late November 2022, currently has an estimated user base of 244 million. Its capabilities range from providing detailed responses to a wide array of prompts to generating human-like research reports. While the technology has been lauded for its promise, concerns have been raised about its misuse, particularly in academic and professional writing.

The Kansas team’s method focuses specifically on academic writing and is based on supervised classification, a type of machine learning in which a model is trained on a labeled dataset. In this case, the labels indicate whether a piece of text was written by a human or generated by ChatGPT.
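To make the idea concrete, the following minimal sketch (not the paper’s actual pipeline) trains a scikit-learn classifier on a toy labeled corpus; the texts, the labels, and the bag-of-words features are all invented placeholders.

```python
# Minimal supervised-classification sketch (illustrative only; not the
# study's pipeline). The texts and labels below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "However, the results were ambiguous; further assays are needed.",
    "The results clearly demonstrate the effectiveness of the method.",
    "Although promising, the data (n = 12) remain preliminary.",
    "In conclusion, the approach offers significant advantages.",
]
labels = ["human", "chatgpt", "human", "chatgpt"]  # hypothetical labels

# Bag-of-words features feed a simple linear classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["However, these findings (see Fig. 2) remain tentative."]))
```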

The team identified 20 distinct features that could help distinguish between human and AI-generated text. These features fell into four categories: paragraph complexity, sentence-level diversity in length, differential usage of punctuation marks, and differences in “popular words.”

One of the key findings was that human scientists tend to write longer paragraphs than ChatGPT does. This consistent difference between the two types of text was used as a feature in the model.

Another interesting finding was the frequent use of equivocal language by human scientists. Words like “but,” “however,” and “although” were used more often by humans than by ChatGPT. This tendency towards equivocation was another feature used in the model.

The team also found differences in the use of punctuation marks and the frequency of certain words. For instance, human scientists more frequently used question marks, dashes, parentheses, semicolons, and colons, while ChatGPT used more single quotes. Human writers also used more proper nouns and acronyms, both of which are captured by the frequency of capital letters, as well as more numbers.
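To illustrate how features like these might be computed, here is a hedged Python sketch; the study defines its own 20 features in its methods, so the particular computations below (word counts, sentence-length spread, punctuation tallies, hedge-word and capital-letter counts) are stand-ins for the four categories rather than the paper’s exact definitions.

```python
import re
import statistics

# Illustrative feature extractor; the study's exact 20 features differ.
HEDGE_WORDS = {"but", "however", "although"}   # equivocal words noted above
PUNCT_MARKS = ["?", "-", "(", ";", ":", "'"]   # marks discussed above

def extract_features(paragraph: str) -> dict:
    words = paragraph.split()
    # Crude sentence split on terminal punctuation; adequate for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    sentence_lengths = [len(s.split()) for s in sentences]
    features = {
        # Paragraph complexity: humans tend to write longer paragraphs.
        "n_words": len(words),
        "n_sentences": len(sentences),
        # Sentence-level diversity in length.
        "sentence_length_spread": (
            statistics.pstdev(sentence_lengths) if sentence_lengths else 0.0
        ),
        # "Popular words": equivocal markers humans favor.
        "n_hedges": sum(
            w.lower().strip(".,;:()\"'") in HEDGE_WORDS for w in words
        ),
        # Capitals proxy for proper nouns/acronyms; digits capture numbers.
        "n_capitals": sum(c.isupper() for c in paragraph),
        "n_digits": sum(c.isdigit() for c in paragraph),
    }
    # Differential punctuation usage, one count per mark.
    for mark in PUNCT_MARKS:
        features[f"count_{mark}"] = paragraph.count(mark)
    return features

print(extract_features(
    "However, the assay (n = 12) was inconclusive; see Fig. 3 for details."
))
```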

By incorporating these and other features into their model, the researchers achieved over 99% accuracy in identifying whether a piece of text was written by a human or generated by ChatGPT, indicating that the method is highly effective at separating human from AI-generated academic writing.
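A corresponding evaluation step might look like the sketch below, which reuses the illustrative extract_features function above; the corpus and labels are invented placeholders, a gradient-boosted classifier stands in as one off-the-shelf choice, and nothing about this toy setup reproduces the study’s reported figure.

```python
# Illustrative evaluation sketch; the corpus and labels are invented
# placeholders, and extract_features is the toy function sketched above.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

corpus = [
    "However, the assay (n = 9) was inconclusive; replication is needed.",
    "Although the effect was small, it persisted across both cohorts.",
    "But the control samples - unexpectedly - showed the same trend.",
    "The method provides a powerful and efficient solution.",
    "These results demonstrate the significant potential of the approach.",
    "In summary, the technique offers clear benefits for researchers.",
]
labels = ["human", "human", "human", "chatgpt", "chatgpt", "chatgpt"]

features = [extract_features(p) for p in corpus]
model = make_pipeline(DictVectorizer(), GradientBoostingClassifier())

# Two-fold cross-validation; a real evaluation would use far more data.
scores = cross_val_score(model, features, labels, cv=2, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f}")
```

Cross-validated accuracy of this kind, measured separately on paragraphs and on whole documents, is the natural way to report performance for a classifier like this, which matches how the study frames its results.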

The researchers also compared their method with an existing state-of-the-art tool, the GPT-2 Output Detector, which is based on the RoBERTa model. They found that while the Output Detector was effective at distinguishing between human and AI-generated online content, its performance dropped significantly when applied to academic writing. In contrast, the new method developed by the Kansas team was more effective at both the paragraph and document levels.
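For readers who want to try the baseline themselves, the detector can be queried in a few lines with Hugging Face’s transformers library; this sketch assumes the detector’s publicly hosted checkpoint (historically published as roberta-base-openai-detector), and the example paragraph is invented.

```python
# Hedged sketch of querying the RoBERTa-based GPT-2 Output Detector via
# Hugging Face transformers; the model id below is the publicly hosted
# checkpoint, which may move or be renamed over time.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

paragraph = "The results demonstrate the significant potential of this approach."
print(detector(paragraph))  # e.g. [{'label': 'Fake', 'score': 0.98...}]
```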

The study represents a significant step forward in the ongoing effort to distinguish between human and AI-generated text, particularly in the realm of academic writing. However, the researchers note that further studies are needed to assess the generalizability of their approach to a broader range of human scientific writing and to adapt to the rapidly evolving capabilities of AI language models.
