Potentials of AI in the Analysis and Evaluation of Essay-type tasks
Workshop leaders: Andrea Palmini and Tunc Yilmaz
Date: 15.02.2024
Structure of the event: presentation and discussion
Summary
The workshop gave an introduction to Transformer-based models and Large Language Models (LLMs) and their application to text similarity assessment in the context of the IMPACT Project.
In the introduction, the structure and architecture of LLMs were explained, followed by the key features of Transformer-based models (the attention mechanism, parallel processing, and the encoder-decoder architecture). Next, Automatic Short Answer Grading (ASAG) and other examples of tasks that LLMs can perform were addressed.
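The attention mechanism mentioned above can be illustrated with a minimal sketch (illustrative only, not part of the workshop material): scaled dot-product attention, in which every query token attends to every key token in parallel.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # each query scored against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V, weights                           # weighted mix of the values

# Toy example: 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of w is a probability distribution over the input tokens.
```

Because the output for every token is computed in one matrix product, all positions are processed in parallel, which is exactly the contrast with sequential recurrent models discussed in the workshop.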
Focusing on the use of LLMs for text comparison, the bag-of-words and word-embedding methods were explained, including their limitations.
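The main limitation of the bag-of-words approach can be demonstrated with a small self-contained sketch (illustrative only, not the project's code): because the representation ignores word order, two sentences with opposite meanings can receive maximal similarity, while paraphrases that share few surface words score low.

```python
from collections import Counter
from math import sqrt

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

# Word order is ignored: these opposite sentences are rated identical.
print(bow_cosine("the dog bites the man", "the man bites the dog"))  # 1.0
# Near-paraphrases with different vocabulary are rated dissimilar.
print(bow_cosine("a large canine", "a big dog"))                     # ~0.33
```

Word embeddings address the second problem by mapping "large"/"big" and "canine"/"dog" to nearby vectors, which is why the workshop moved on to embedding- and transformer-based similarity.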
Based on examples from IMPACT Project Work Package 5, which aims to assist professors in evaluating and grading exams, modified versions of real-life answers were graded using transformer similarity outputs.
Lastly, the new features of the GPT series were explained, including their potentials and concerns for text similarity assessment.
Discussion
After some clarifying questions, the discussion centred on the implications of restricting the knowledge base on which an LLM operates to particular material, and on how LLMs perform when writing tasks involve creative writing and the production of novel information or knowledge.
Take-aways
The main potentials of LLMs for text similarity assessment are:
- Prompt Engineering: writing clear instructions and splitting complex tasks into subtasks, so that a goal can be reached with simple rules and constraints
- Fine-Tuning: adapting a model enables application across multiple use cases and new domains (e.g., improving similarity assessment for legal texts)
- Retrieval-Augmented Generation (RAG): course material and other documents can be used to restrict the knowledge base the LLM draws on while performing tasks
- User Experience and User Interface: dialogue-style interaction and other capabilities and features give users flexibility and control.
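The retrieval step behind Retrieval-Augmented Generation can be sketched minimally as follows (illustrative only; the passages are hypothetical and the token-overlap scoring is a stand-in for a real vector search over course material):

```python
# Hypothetical course material; in practice this would be lecture notes or readings.
PASSAGES = [
    "The attention mechanism lets a transformer weigh all input tokens at once.",
    "Bag-of-words models ignore word order and treat text as term counts.",
    "Fine-tuning adapts a pretrained model to a new domain such as legal text.",
]

def retrieve(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Rank passages by token overlap with the question (stand-in for embedding search)."""
    q = set(question.lower().split())
    ranked = sorted(passages, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(retrieve(question, PASSAGES))
    return f"Answer using ONLY the following context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How does the attention mechanism work in a transformer")
```

The retrieved context is prepended to the question, which is how RAG limits the knowledge base the LLM draws on, as discussed in the workshop.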
However, there are general concerns regarding LLMs concerning bias and ethics, privacy and security, computational power and costs, and hallucinations (among others: overfitting, nonsensical output, and sentence contradiction). In particular, the interpretability of LLMs for text similarity assessment remains challenging: how exactly are the outputs generated?
Summing up, potentials and concerns have to be analysed simultaneously to decide where and how LLMs should be used.
Instructors
Tunc Yilmaz (Freie Universität Berlin, FUB-IT / Center für Digitale Systeme (CeDiS), Bereich E-Learning & E-Examinations). He graduated from the M.Sc. Cognitive Systems program at the University of Potsdam. Since 2022, he has been taking part in the BMBF joint project IMPACT, developing AI-based dialogue systems for the study orientation and entry phase and implementing AI-based text analysis tools in the context of summative examinations.
Andrea Palmini (Freie Universität Berlin, FUB-IT / Center für Digitale Systeme (CeDiS), Bereich E-Learning & E-Examinations, and system modelling at the Institute of Veterinary Epidemiology and Biostatistics). He graduated from the M.Sc. Statistics program at Humboldt University of Berlin. Since graduating, he has been involved in projects, studies, and lectures on NLP and the use of Large Language Models. Since 2022, he has been working in the BMBF joint project IMPACT, implementing AI-based language processing tools in the framework of summative examinations.
Potentials of AI in the Analysis and Evaluation of Essay-type tasks © 2024 by Tunc Yilmaz and Andrea Palmini (FU Berlin) is licensed under CC BY-NC-ND 4.0.