Chatbots have become increasingly popular and may soon adopt watermarks to answer the ethical and legal questions their use raises in relation to intellectual property.
Below are the problems arising from the use of chatbots in academia, just as the first tools for detecting whether a text or other content is the product of a chatbot make their debut. In the future, even cryptography and blockchain could take the field against counterfeiting.
The growing proliferation of artificial intelligence (AI) tools specifically designed to generate human-like text has sparked a real chatbot war in Silicon Valley. They are popular for their ability to automate conversations and improve process efficiency. The most widely used AI chatbot is ChatGPT, created by the start-up OpenAI. It already boasted over 100 million active users in January, just two months after its launch.
It is enough to type "use of chatbots" into Google to realize how many companies sponsor them and offer them as a way to improve corporate productivity.
Chatbots and similar projects from tech industry giants are plentiful. In January, Google published a paper describing a model that can generate new music from a textual description of a song. Meanwhile, it is working on an answer to ChatGPT called Apprentice Bard.
Baidu, the Chinese search giant, intends to incorporate a chatbot into its search engine in March. Replika is a chatbot that presents itself with the slogan "the companion who cares about you": it was, rightly, censured by the Italian Data Protection Authority (Garante della Privacy). But even this case demonstrates how much attention chatbots attract.
Since ChatGPT's debut in November, students have started cheating by using it to write their essays. The news site CNET also used ChatGPT to write articles, only to be forced to issue corrections after allegations of plagiarism.
Many students have used chatbots as a study aid, especially for learning difficult concepts or solving complex tasks. This, however, raises the ethical question of their use.
If students use chatbots to tackle their homework, one wonders whether the result is still their own work, and whether students who use chatbots gain an unfair advantage over classmates who do not.
Teachers, in particular, are trying to adapt to the availability of software that can produce a moderately acceptable essay on any subject in an instant. Perhaps we will return to pen-and-paper assessments, or exam supervision will increase. Some even propose banning the use of artificial intelligence altogether.
This issue is particularly relevant in academia, where students are assessed on the basis of their knowledge and skills. The use of chatbots can represent a threat to academic integrity.
The use of chatbots could be considered a form of plagiarism: students submit automatically generated answers to tasks that require their personal knowledge.
Another problem concerns intellectual property. Chatbots can generate answers or solutions to problems, but who owns the intellectual property in those responses? Students using chatbots could be accused of violating the intellectual property rights of the chatbot's owners.
Meanwhile, chatbot owners are themselves facing intellectual property infringement allegations from those whose texts, documents, photos, and other works are used precisely to "feed" the chatbot algorithms.
In mid-January, the image agency Getty Images announced a lawsuit against Stability AI: “This week Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI, claiming that the latter infringed intellectual property rights, including copyright in content owned by Getty Images (or those it represents).
Getty Images believes that Stability AI has illegally copied and processed millions of copyrighted images and related metadata owned by Getty Images, without a license, to the benefit of Stability AI's commercial interests and to the detriment of content creators.
But artificial intelligence has the potential to spur creative endeavors. Accordingly, Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems, in a manner that respects personal and intellectual property rights. Stability AI did not seek any such license from Getty Images and instead chose, in our view, to ignore viable licensing options and long-standing legal protections in pursuit of its standalone commercial interests.” Here too, the enormous potential of artificial intelligence is underlined, along with the necessary respect for the limits established by law.
To avoid these problems, some academic institutions have prohibited the use of chatbots by students. However, this is not always practical or effective, as chatbots have become increasingly sophisticated and hard to detect.
With the advent of artificial intelligence, both text detectors and text generators are becoming ever more refined. This could significantly affect the effectiveness of the various methods and tools proposed to recognize AI-generated text.
Teachers and researchers can also use chatbots to automate their research and teaching processes.
The paradox is that global companies specializing in artificial intelligence cannot reliably distinguish the products of their own machines from the work of humans.
The reason is very simple. The main goal of AI companies is to train AI "natural language processing" (NLP) systems to produce results as close as possible to human writing. The public demand for an easy means of detecting such AIs thus contradicts the companies' own efforts in the opposite direction.
Using watermarks can be an effective way to manage the use of chatbots. Put simply, watermarks are digital markers embedded in an image or document to identify the owner or author of the work. In this case, watermarks can be used to identify the answers or solutions generated by chatbots.
These “watermarks” are invisible to the human eye, but they allow computers to detect that a text probably comes from an artificial intelligence system. When incorporated into large language models, they could help prevent some of the problems already caused.
Watermarking is indeed a security technique that protects intellectual property, especially digital documents, from unauthorized use and counterfeiting. It involves embedding an image, text, or other marker (the "watermark" proper) within the document, making it unique and easily traceable.
In some studies these watermarks have already been used to identify, with near-absolute certainty, text generated by artificial intelligence. Researchers at the University of Maryland, for example, managed to identify text created by Meta's open-source language model, OPT-6.7B, using a detection algorithm they built. Even so, one of the University of Maryland researchers involved in the watermarking work, John Kirchenbauer, said that "right now it's the Wild West," perfectly capturing the current situation.
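To make the mechanism concrete, here is a minimal Python sketch of green-list watermark detection, loosely inspired by the approach the Maryland researchers describe; the hash-based vocabulary split, the word-level granularity, and the scoring are illustrative assumptions, not their actual implementation:

```python
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the 'green' half of the
    vocabulary, seeded by the previous token (illustrative hash split)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # roughly 50% of tokens land in the green list

def watermark_z_score(tokens: list[str]) -> float:
    """z-score of the observed green-token count against the 50% expected
    by chance; a large value suggests a generator that was nudged to
    prefer green tokens, i.e. a watermarked text."""
    n = len(tokens) - 1  # number of (previous, current) token pairs
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    expected, stddev = 0.5 * n, (0.25 * n) ** 0.5
    return (green - expected) / stddev

# Human text should hover near z = 0; watermarked output pushes z far higher.
print(watermark_z_score("the quick brown fox jumps over the lazy dog".split()))
```

The appeal of the scheme is that detection needs only the secret hashing rule, not access to the model that produced the text.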
Then there are classifiers: the tools by which programmers "teach" a computer to work with data already labeled by humans, that is, to classify (in our case) the use of certain words rather than others, or certain combinations of words, as the output of a chatbot.
OpenAI itself introduced in January a "classifier for indicating AI-written texts," admitting, however, that it succeeds on no more than 26% of the texts analysed.
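In essence, such a classifier learns from examples labeled "human" or "AI." A minimal sketch with scikit-learn, where the training sentences and the choice of model are invented placeholders rather than anything OpenAI has disclosed:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled corpus: 1 = AI-generated, 0 = human-written (invented examples;
# real systems train on vastly larger corpora with richer features).
texts = [
    "As an AI language model, I can provide a balanced overview.",
    "In conclusion, there are several important factors to consider.",
    "honestly no clue, the printer just died again lol",
    "Ran out of coffee halfway through grading, send help.",
]
labels = [1, 1, 0, 0]

# Word-frequency features plus logistic regression: the simplest version of
# "teaching" a computer which word combinations look machine-made.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Probability that a new sentence was machine-written, per this toy model.
print(clf.predict_proba(["Overall, it is important to note the following."])[0][1])
```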
Another classifier that seems more effective is the one created by Edward Tian, a Princeton student, who released the first version of GPTZero in January.
This application identifies artificial-intelligence authorship based on two factors: the degree of complexity of a text (its "perplexity") and the variability of the sentences used (its "burstiness").
To show how the program works, Tian posted two videos on Twitter comparing the analysis of a New Yorker article and a letter written by ChatGPT.
In both cases, the app was able to correctly identify their human and artificial origin.
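A rough sketch of those two signals, using a trivial unigram frequency model as a stand-in for the large language model a real detector such as GPTZero relies on; the scoring scheme is an illustrative assumption:

```python
import math
from collections import Counter

def unigram_perplexity(sentence: str, counts: Counter, total: int) -> float:
    """Perplexity of one sentence under a trivial unigram model with
    add-one smoothing (a stand-in for a real language model)."""
    words = sentence.lower().split()
    log_prob = sum(math.log((counts[w] + 1) / (total + len(counts))) for w in words)
    return math.exp(-log_prob / max(len(words), 1))

def complexity_and_burstiness(text: str) -> tuple[float, float]:
    """Mean sentence perplexity ('complexity') and its standard deviation
    across sentences ('burstiness'). Uniformly low, even scores are the
    pattern detectors associate with machine-generated prose."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    scores = [unigram_perplexity(s, counts, total) for s in sentences]
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return mean, variance ** 0.5

print(complexity_and_burstiness(
    "The results were positive. The results were very positive. "
    "A kestrel wheeled, improbably, over the exam hall."
))
```

The intuition: human writers mix predictable and surprising sentences, while language models tend to produce uniformly probable text, so both numbers stay low.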
The current "trick" to defeat classifiers is to replace some words with synonyms. Websites offering tools that paraphrase AI-generated text for this purpose are already popping up around the world. Using these
sites even Tian's classifier did not exceed the percentages of the other service.
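The trick itself is trivial to automate, which is why such sites multiplied so quickly. A toy sketch, with an invented synonym table standing in for the full rewriting models real paraphrasing services use:

```python
# Toy synonym substitution of the kind paraphrasing sites automate at scale.
# The table below is an invented placeholder, not any real service's data.
SYNONYMS = {"important": "crucial", "use": "employ", "show": "demonstrate"}

def paraphrase(text: str) -> str:
    """Swap selected words for synonyms to shift the statistical
    fingerprint that detectors rely on."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())

print(paraphrase("studies show the important use of chatbots"))
# -> "studies demonstrate the crucial employ of chatbots"
```

Note how naive substitution can also degrade the grammar, which is one reason paraphrased text is often easy for a human reader to spot even when a classifier is fooled.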
AI-generated-text detectors will become more and more sophisticated. The anti-plagiarism service TurnItIn recently announced the arrival of an AI writing detector with a claimed accuracy of 97%.
However, text generators are also getting better. It is the classic battle in which no winner emerges: two contenders continually overtake each other, with no finish line in sight and therefore, for the moment, no winner.
As often happens in the digital sector, we will reach a point where custom and practice lead the legislator to regulate this sector in a harmonious way as well.
When the problems concern ethics as well as law, it is difficult to get everyone to agree. In the academic field, it will be impossible to make everyone happy.
But as far as the intellectual property tied to these issues is concerned, practice will lead to constantly evolving case law, able to draw on both the law and the technology at hand.
It will be even harder for companies to protect their intellectual property, as counterfeiters could use advanced text generators to create documents that appear authentic but are in fact forgeries.
A change in the approach to intellectual property protection will be needed, with the adoption of more advanced security techniques such as cryptography or blockchain.
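In its simplest form, such protection could mean a cryptographic fingerprint of a document, later anchored on a blockchain as proof of existence. A minimal sketch, where the record format is an illustrative assumption and the anchoring step is only indicated in a comment:

```python
import hashlib
import json
import time

def fingerprint(document: bytes) -> dict:
    """SHA-256 digest of a document plus a timestamp: a minimal
    proof-of-existence record. Publishing the digest on a blockchain
    (not shown here) would make the timestamp independently verifiable."""
    return {
        "sha256": hashlib.sha256(document).hexdigest(),
        "timestamp": int(time.time()),
    }

record = fingerprint(b"Original contract text ...")
print(json.dumps(record))
# Any later alteration of the document changes the digest, so the rights
# holder can prove which version existed first.
```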
Article originally published on AgendaDigitale and written together with the lawyer Niccolò Lasorsa Borgomaneri of the Marsaglia law firm.