Building a Dataset for Key Information Extraction from Judgement in Criminal Case Focusing on the Use of GPT-3.5 Prompt

Park, Yerin and Won, Gwangjae and Park, Roseop

Extracting meaningful information from the statements of facts in judicial decisions plays a crucial role in understanding cases, and training artificial intelligence models for this extraction task requires high-quality datasets. However, constructing training datasets in specialized domains such as law demands substantial resources, including manpower and time. In this study, we use OpenAI's GPT-3.5 model to construct a training dataset efficiently. Specifically, we propose a method called "GPT-3.5&Annotator-in-the-loop": GPT-3.5 first performs the initial annotation using Simple prompts, and human annotators then review the output, identifying the prompts that yield better results in order to build a higher-quality dataset. The results show that, among the three types of prompts, Few-shot prompts, which provide multiple examples, perform best, as confirmed by the confusion matrix and the ROUGE-L metric. Furthermore, the human-AI annotation approach used in this study reduces the resources required for dataset construction by approximately 90%. The proposed methodology and results demonstrate the potential for efficient dataset construction in the legal domain.
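The abstract compares prompt outputs against human annotations using ROUGE-L, which scores the longest common subsequence (LCS) shared by a candidate and a reference. The sketch below shows the standard LCS-based ROUGE-L precision, recall, and F-measure. It is a minimal illustration, not the authors' evaluation code: the function names `lcs_length` and `rouge_l` are hypothetical, and it assumes simple whitespace tokenization, whereas Korean judgment text would likely require a morpheme-level tokenizer.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L scores of a candidate against a reference.

    Assumption: whitespace tokenization; a real legal-domain pipeline
    would substitute an appropriate (e.g. morpheme) tokenizer.
    """
    ref = reference.split()
    cand = candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f": 0.0}
    precision = lcs / len(cand)   # share of candidate tokens in the LCS
    recall = lcs / len(ref)       # share of reference tokens in the LCS
    f = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return {"precision": precision, "recall": recall, "f": f}


if __name__ == "__main__":
    # Hypothetical example: a human-reviewed annotation vs. a GPT-3.5 extraction.
    scores = rouge_l("the defendant stole a car on monday",
                     "the defendant stole a red car")
    print(scores)
```

In this example the LCS is "the defendant stole a car" (5 tokens), so precision is 5/6, recall is 5/7, and the F1 score is 10/13 (about 0.77).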