Transformer-based Legal Argument Structure Extraction Model for Crime Investigation Analysis

Gu, Yeri

The revised Korean Criminal Procedure Act, implemented in 2020, gives the police an autonomous position with responsibility for the primary investigation, which makes the police investigator's case review process more important than ever. The amended legislation also strengthens the direct examination of evidence in court, so cases must increasingly be proven logically in court on the basis of objective evidence. With these changes, verifying the investigation process through argumentation is expected to become a core competency required of the police. However, existing case analysis tools focus on collecting and analyzing evidence rather than on logical verification; case analysis with logical completeness therefore requires an argument analysis system that can derive legal claims from evidence. The purpose of this study is to devise an argument mining model that lets investigators examine a case's argument structure quickly and objectively by (1) automatically extracting the argument components and (2) classifying the relationships between pairs of extracted components. We also aim to improve the model's performance by using Transformer-based architectures, which have recently been widely used in natural language processing. Argument mining is an NLP method that identifies arguments in text and has been applied in various domains, including education, policy, social media, and law. In this study, 256 first-instance criminal judgments were used to analyze argument components and relations based on the Toulmin+ argument model, an expanded and reconceived version of the original Toulmin model. The first task of this study performs multi-class classification of a total of seven argument components using a Korean BERT model.
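As a rough illustration of what such a seven-way classifier computes on top of a BERT encoder, the sketch below applies a linear layer to a pooled [CLS] vector to get one logit per component, turns the logits into probabilities with a softmax, and scores a gold label with cross-entropy (the loss minimised during fine-tuning). It is a pure-Python toy, not the study's model: the component names, the 4-dimensional embedding (BERT-base actually produces 768 dimensions), and the weights are hypothetical placeholders.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical placeholder names: the study uses seven Toulmin+ components,
# but their labels are not given here.
LABELS = [f"component_{i}" for i in range(7)]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One logit per class: z_k = w_k . x + b_k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    """Convert logits to probabilities; subtract the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """Loss for one example whose true class index is `gold`."""
    return -math.log(probs[gold])


def classify(cls_embedding: Sequence[float], weights, bias) -> Tuple[str, List[float]]:
    """Predict the most probable component label for one sentence embedding."""
    probs = softmax(linear(cls_embedding, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


if __name__ == "__main__":
    # Toy weights: only class 2 responds to the third embedding dimension.
    W = [[0.0] * 4 for _ in range(7)]
    W[2] = [0.0, 0.0, 2.0, 0.0]
    b = [0.0] * 7
    label, probs = classify([0.2, -0.1, 0.5, 0.3], W, b)
    print(label, round(probs[2], 3), round(cross_entropy(probs, 2), 3))
```

In an actual fine-tuning setup the weights of both this head and the encoder would be updated by backpropagating the cross-entropy loss over the labelled sentences.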
The results confirmed that the pre-trained model can be fine-tuned on the legal corpus, achieving performance equivalent to that of the Support Vector Machine (SVM), a supervised classification method that performed well in previous studies. The second task uses the BertForMultipleChoice model and the KLUE BERT-base NLI model to extract the most closely related phrase in the document and classify the relationship between the two. The model's strong performance is significant given the difficulty of extracting argument relations noted in previous studies. Finally, this study proposes a system that extracts argument structures through the two preceding tasks and visualizes them as graphs. The results showed that specific types of argument structure exist in court decisions and that they can be expressed with the model developed in this study. With further technical improvements, this work is expected to support various AI-based investigation systems, such as similar-case retrieval, by training models on embeddings of the extracted argument graphs.
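The second task and the graph step can be sketched as a small pipeline, assuming the following simplifications. A multiple-choice model scores each (anchor, candidate) pair, normalises the scores over the candidates with a softmax, and picks the highest; here a hypothetical token-overlap function stands in for BERT's per-choice score. The mapping from three-way NLI labels (entailment, neutral, contradiction) to argument relations (support, none, attack) is likewise an assumption for illustration, as are the node names in the example, and the source is treated as the NLI premise and the target as the hypothesis.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def overlap_score(anchor: str, candidate: str) -> float:
    """Hypothetical stand-in for BERT's per-choice score: count of shared tokens."""
    return float(len(set(anchor.split()) & set(candidate.split())))


def pick_related(anchor: str, candidates: Sequence[str],
                 score: Callable[[str, str], float] = overlap_score) -> Tuple[int, List[float]]:
    """Multiple-choice framing: score every pair, softmax over choices, return argmax."""
    probs = softmax([score(anchor, c) for c in candidates])
    best = max(range(len(candidates)), key=probs.__getitem__)
    return best, probs


# Assumed mapping from NLI labels to argument relations (None = no edge).
NLI_TO_RELATION = {"entailment": "support", "contradiction": "attack", "neutral": None}

Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: Sequence[Tuple[str, str, str]]) -> Graph:
    """Turn (source, target, nli_label) triples into labelled adjacency lists."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, tgt, nli in triples:
        relation = NLI_TO_RELATION[nli]
        if relation is not None:  # neutral pairs produce no edge
            graph[src].append((tgt, relation))
    return dict(graph)


def roots(graph: Graph) -> Set[str]:
    """Nodes that receive edges but have none outgoing, e.g. final claims."""
    targets = {t for edges in graph.values() for t, _ in edges}
    return targets - set(graph)


if __name__ == "__main__":
    anchor = "the defendant was seen at the scene"
    candidates = ["the weather was cold",
                  "a witness saw the defendant at the scene",
                  "the court adjourned"]
    print(pick_related(anchor, candidates)[0])
    g = build_graph([("E1", "C1", "entailment"), ("E2", "C1", "contradiction"),
                     ("E3", "C1", "neutral"), ("C1", "C0", "entailment")])
    print(g, roots(g))
```

Keeping the graph as plain adjacency lists makes it easy to hand off to a visualisation or graph-embedding library later; the root nodes correspond to the top-level conclusions a reviewer would inspect first.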