GPT3 & Parametric Academic Writing
This is the first post in a series on parametric academic writing, which asks when and how machines can replace scientists, or at least some of their work processes.
As scientists we spend hours, sometimes days, just formatting and debugging our reference lists, which are still inserted at the end of a static text, as if the Internet had never been invented. Links are often included only as optional meta-information for a specific reference, and not used in the text directly, even though scientists exclusively write, and increasingly read, those texts with electronic tools.
From what I could research, the W3C has not yet issued a widely adopted standard for marking and referencing specific text passages in a web document that someone else created, which would allow linking directly to a self-selected passage. Having such a standard, however, would help the academic community shift to fully hypertext-based reference systems.
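As a rough illustration of what passage-level linking could look like, the sketch below builds a link using the text fragment syntax (`#:~:text=`) that some browsers understand. This is a browser-side proposal and not something this post relies on, support varies between browsers, and the function name `text_fragment_url` and the example URL are hypothetical.

```python
from urllib.parse import quote


def text_fragment_url(page_url: str, passage: str) -> str:
    """Return page_url with a text fragment directive pointing at passage."""
    # Percent-encode the passage. quote() already encodes '&' and ',', which
    # the text fragment syntax uses as separators, but it leaves '-' alone,
    # so that character is encoded explicitly.
    encoded = quote(passage, safe="").replace("-", "%2D")
    # Drop any existing fragment so the text directive is the only one.
    base = page_url.split("#", 1)[0]
    return f"{base}#:~:text={encoded}"


if __name__ == "__main__":
    # Hypothetical page and passage, for illustration only.
    print(text_fragment_url("https://example.org/paper.html", "creating new knowledge"))
```

A link built this way still depends on the cited page keeping the exact wording, which is one reason a proper standard for stable passage references would be valuable.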
The scientific community has been criticised for its disregard of this topic over the last decades.
Furthermore, many scientists, for lack of time, outsource the final revision of their reference lists to others, who then manually check each entry and make sure the citation formatting is correct. And this is just the formatting process.
While the art of correct citation was very important in the analog world, this process can be fully replaced by machines if we put our minds to it. After all, science is about creating new knowledge; referencing the existing common body of research is only a means to an end.
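To make the point about automating citation formatting concrete, here is a minimal sketch that renders a reference list from structured metadata. The `Reference` class, its field names and the output style are assumptions made for illustration; the style is a simplified author-year format, not a complete implementation of APA or any other official style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    # Hypothetical minimal metadata; real reference managers store far more.
    authors: List[str]  # each author already given as "Family, G."
    year: int
    title: str
    venue: str
    url: str = ""


def join_authors(authors: List[str]) -> str:
    """Join author names with commas and an ampersand before the last one."""
    if not authors:
        raise ValueError("a reference needs at least one author")
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} & {authors[1]}"
    return ", ".join(authors[:-1]) + f", & {authors[-1]}"


def format_reference(ref: Reference) -> str:
    """Render one reference in a simplified author-year style."""
    parts = [f"{join_authors(ref.authors)} ({ref.year}).", f"{ref.title}.", f"{ref.venue}."]
    if ref.url:
        parts.append(ref.url)
    return " ".join(parts)


def format_reference_list(refs: List[Reference]) -> List[str]:
    """Sort by first author, then year, and format every entry."""
    ordered = sorted(refs, key=lambda r: (r.authors[0].lower(), r.year))
    return [format_reference(r) for r in ordered]


if __name__ == "__main__":
    example = Reference(
        authors=["Extance, A."],
        year=2018,
        title="How AI technology can tame the scientific literature",
        venue="Nature",
    )
    print(format_reference(example))
```

Once references live in structured form like this, switching citation styles becomes a change to one function and not a manual pass over every entry.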
The Internet has provided us with a great many new tools that could, in theory, ease the process of referencing other people’s work. However, we still mostly use these electronic tools, such as plagiarism checkers, for error prevention and for finding the mistakes in other people’s work. More sophisticated methods that would free scientists from having to worry about referencing at all have hardly been developed, even though big data and blockchain-based tools could be used for that purpose.
Thinking one step ahead, especially in light of the recent release of GPT3, an AI language model from OpenAI that can write code and prose and is accessed through an API, one has to wonder:
- Can AI-based text generators be trained to write scientific papers?
- How will this influence the plagiarism debate?
- What will be the role of a ‘human scientist’ in a future of AI scientists?
So far, little research has been conducted on the question of when and how machine learning algorithms could replace at least parts of the scientific process. At ICIS 2019, a panel on the potential of research automation discussed this topic.
Thilina Rajapakse has experimented with GPT3’s predecessor, OpenAI’s GPT2, to see how transformer models can be used for domain-specific language generation, such as academic texts.
Inspired by this work, a few of us have set up a GPT2 instance to explore to what extent transformer models can already generate academic texts. We are currently defining the parameters with which to test the boundaries of existing text generators, as well as what counts as an acceptable text base. We are also waiting to get access to the recently released GPT3 API.
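One way to picture "defining the parameters" is as a grid of generation settings that each get tried against the same prompts. The sketch below is only an illustration of that idea: the `GenerationConfig` class and the choice of `temperature` and `max_length` as parameters are assumptions, since the concrete parameter set is still being defined, and no text generator is actually called.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Iterator


@dataclass(frozen=True)
class GenerationConfig:
    # Hypothetical experiment settings; the real parameter set is undecided.
    prompt: str
    temperature: float  # sampling randomness, a common text-generation knob
    max_length: int     # upper bound on generated length, in tokens


def parameter_grid(
    prompts: Iterable[str],
    temperatures: Iterable[float],
    max_lengths: Iterable[int],
) -> Iterator[GenerationConfig]:
    """Yield one config for every combination of the given values."""
    for prompt, temperature, max_length in product(prompts, temperatures, max_lengths):
        yield GenerationConfig(prompt, temperature, max_length)


if __name__ == "__main__":
    grid = parameter_grid(
        prompts=["In this paper we", "Previous research has shown"],
        temperatures=[0.7, 1.0],
        max_lengths=[200],
    )
    for config in grid:
        # A real experiment would pass each config to a text generator here.
        print(config)
```

Each configuration would then be paired with the generated text and some judgement of its quality, so that the boundaries of a generator can be compared across settings.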
The main question will be how we can use parametric design methods for academic writing. Academia in general can learn much from the architectural community, which has increasingly been using parametric design as an R&D method. The future of “parametric academic writing” is a topic that will need much more discussion, as well as research and development, at the intersection of Computer Science, Epistemology, Metascience, Socio-Technical Systems and Transhumanism.
- Hao, Karen (August 14, 2020) A college kid’s fake, AI-generated blog fooled tens of thousands. This is how he made it. MIT Technology Review: https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/
- Rajapakse, Thilina (2020) Learning to Write: Language Generation With GPT-2. A guide on language generation and fine-tuning language generation Transformer models with Simple Transformers. It’s easier than you think! The Startup.
- Extance, A. (2018) How AI technology can tame the scientific literature. Nature.
- Are academics already on the way to being replaced by AI?, Times Higher Education.
- Lesh, N., Marks, J., Rich, C., Sidner, C.L. (2004) Man-Computer Symbiosis Revisited: Achieving Natural Communication and Collaboration with Computers. IEICE Transactions on Information and Systems.
- Singh, Sarwant (2017) Transhumanism And The Future Of Humanity: 7 Ways The World Will Change By 2030, Forbes
- Bostrom, N. (2003) Introduction to transhumanism. Presented at the Intensive Seminar on Transhumanism, Yale University, 26 June 2003. Available at: https://www.slideshare.net/danila/introduction-to-transhumanism?from_action=save