College of Information Sciences and Technology, Pennsylvania State University
Tuesday, May 25, 2021, 3:00 EDT (1:00 MDT)
A video of this talk is available to NIST staff in the Math channel on NISTube, which is accessible from the NIST internal home page.
Abstract: As Artificial Intelligence (AI) technologies become ubiquitous, the human race will have to contend with both the benefits and the disadvantages of these advancements. In Natural Language Processing (NLP) in particular, Natural Language Generation (NLG) has improved dramatically at producing well-written text. The field has existed for some time. However, the recent advent of GPT-1, which is built on the Transformer neural network architecture, led to more sophisticated text generators such as GPT-2, Grover, Turing-NLG, GPT-3, and, most recently, Google's Switch Transformer. These are just a few of the current state-of-the-art AI text generators. What is even more alarming is the pace of progress: OpenAI released GPT-1 only in 2018, and since then the field of NLG has produced several language models more than 10 times its size.
As these AI text generators improve, it will become increasingly difficult to distinguish human-written text from AI-generated text, which poses a security issue. It is therefore important to build models that can tell the two apart. We model this problem as a Turing Test. The Turing Test assesses the intelligence of a machine: if a machine exhibits intelligence that a human judge attributes to a human, the machine passes the test. In the classic setting, a human administers the test. In our work, we do not want the machine (here, an AI text generator) to pass the Turing Test. To this end, we automate the process so that a machine administers the test, making it a Reverse Turing Test. We also study a variant of the Turing Test problem called Authorship Attribution. This is a multi-class classification problem, where we ask: given a text T and k candidate AI text generators, can we identify the generator (among the k alternatives) that produced T?
Finally, in this talk, we will discuss our approaches to solving both the Turing Test and the Authorship Attribution problems, as well as future work on these problems.
Bio: I am a third-year Ph.D. student at The Pennsylvania State University, where I work with Dr. Dongwon Lee in the PIKE lab. My research interests are Artificial Intelligence, Machine Learning, and Data Mining in the application domain of Cybersecurity. Specifically, I am interested in NLP (Natural Language Processing), NLG (Natural Language Generation), and Adversarial Robustness (in both Computer Vision and NLP). I am currently an NSF SFS (Scholarship for Service) Scholar and an Alfred P. Sloan Scholar. Lastly, I graduated from the University of Maryland, Baltimore County in May 2018 with a B.S. in Mathematics and a minor in Statistics. My senior thesis was titled "Numerical Simulation of Vibrations of Mechanical Structures."
Host: Anthony Kearsley
Note: This talk will be recorded to provide access to NIST staff and associates who could not be present at the time of the seminar. The recording will be made available in the Math channel on NISTube, which is accessible only on the NIST internal network. This recording could be released to the public through a Freedom of Information Act (FOIA) request. Do not discuss or visually present any sensitive (CUI/PII/BII) material. Ensure that no inappropriate material and no minors appear in the background of any recording. (To facilitate this, we request that attendees keep their cameras off except when asking questions.)