Follow the logic here:
1- Generative AI like GPT is trained on a corpus of texts that includes works published on the Internet (e.g. Wikipedia).
2- GPT is then able to generate new text modeled upon what it has gathered from those works. It learns meanings and imitates styles from the corpus.
3- This new text is then published back to the Internet, in the form of blog posts, news articles, books in the Amazon marketplace, etc.
We’re living in a unique time when this process is possible. But it seems to follow logically that we’re going to run into a problem soon:
When the new texts created in step #3 are added into the corpus of texts used in step #1, and future AI is trained in a loop on the output from past AI, a relatively small subset of writing styles is bound to dominate the data set. Rather than reflecting the full range of human verbal expression, the AI could develop a boxed-in voice. And as the proportion of texts that human beings read shifts toward AI-generated works, we will be exposed to less originality.
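The loop above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not a model of how GPT is actually trained: each text is reduced to a single "style" label, the "model" learns nothing but how often each style appears, and every new corpus consists entirely of the previous model's output with no fresh human writing mixed in. The function name `simulate_style_drift` and all of its parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_style_drift(num_styles=50, corpus_size=200, generations=100, seed=0):
    """Return the number of distinct styles in the corpus at each generation.

    Toy assumptions: a text is just a style label, and each generation's
    corpus is made up only of output sampled from the previous "model".
    """
    rng = random.Random(seed)

    # Generation 0: a "human" corpus with styles drawn uniformly at random.
    corpus = [rng.randrange(num_styles) for _ in range(corpus_size)]
    distinct = [len(set(corpus))]

    for _ in range(generations):
        # Step 1, "training": the model learns only the style frequencies.
        counts = Counter(corpus)
        styles = list(counts)
        weights = [counts[s] for s in styles]

        # Steps 2 and 3, "generating" and "publishing": the next corpus is
        # sampled from those learned frequencies and replaces the old one.
        corpus = rng.choices(styles, weights=weights, k=corpus_size)
        distinct.append(len(set(corpus)))

    return distinct


if __name__ == "__main__":
    history = simulate_style_drift()
    print("distinct styles at start:", history[0])
    print("distinct styles at end:  ", history[-1])
```

In this sketch a style that drops out of the corpus can never return, because the next model never sees it, so the count of distinct styles can only stay flat or fall. Real training pipelines keep mixing in human-written text, which would slow the effect. The toy version shows the direction of the pressure, not its size.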
Generative artificial intelligence is by nature a copycat. There will be an explosion of literary content, but it will be stylistically homogeneous, reflecting the AI’s programming and the most prevalent styles in the training data.
Human intelligence is also a copycat. The homogeneity will creep outward into human-authored works too.
Humans will still be creative and there will still be sparks of brilliance — a clever turn of phrase, a joyous fresh style, a newly coined term that more aptly describes a feeling than it has ever been described before.
But it will be harder and harder for innovations to spread. The sparks of creativity will never light a flame. They will never reach the critical mass of imitation required to evolve the artform. They will be smothered by the enormous volume of words generated by the AI. Flickers of light in the darkness.
I’m sure this problem is solvable. But we really should have figured out the solution before unleashing the genie. Generative AI has been birthed into the world without any regulatory oversight. Perhaps all AI-authored works should carry special tags in their metadata. But it’s too late. The AI Gold Rush has begun already. Very soon, products and services built by or upon AI will be absolutely everywhere.
I showed ChatGPT this blog post and asked for its thoughts. It replied:
The argument presented in the blog post raises valid concerns about the potential homogenization of writing styles in AI-generated content. However, there are a few points to consider.
Firstly, while it’s true that GPT and similar models are trained on a corpus of texts that includes works published on the Internet, the size and diversity of this corpus is constantly growing. As the internet continues to expand and new content is created, AI models have access to an ever-increasing pool of material to learn from. This means that the likelihood of the training data becoming too dominated by a small subset of styles is somewhat mitigated.
Secondly, it’s important to note that AI-generated content is not created in a vacuum. The models are designed to learn from existing human language patterns and styles, but they are also capable of generating novel and creative content that goes beyond what has been seen before. While the output may initially reflect the most prevalent styles in the training data, over time it may develop its own distinct voice and evolve in new directions.
Finally, it’s worth noting that the role of AI-generated content in the wider literary landscape is likely to be complex and multifaceted. While it’s possible that the sheer volume of AI-generated content could drown out more innovative human-authored works, it’s also possible that the AI-generated content could act as a source of inspiration and fuel further creativity. Additionally, as AI-generated content becomes more prevalent, there may be an increased appreciation for the unique qualities of human-authored works, leading to a renewed interest in originality and creativity.
Overall, while the concerns raised in the blog post are worth considering, it’s important to view the issue in a nuanced and complex way. AI-generated content is still a relatively new and evolving field, and it remains to be seen how it will impact the wider literary landscape over time.