Alternative Hypothesis Retrieval Model for Crime Investigation Analysis Using Argument Mining

Park, Sungmi

The Korean National Police became authorized to conduct independent investigations following the 2020 revision of the Korean Criminal Procedure Act. As a result, unprecedented importance was placed on the review process of cases investigated by police. However, existing case analysis support tools focus on collecting and analyzing evidence rather than on logical verification. This gap in the review and analysis of cases calls for a support system for argument analysis. The purpose of this study is to (1) automatically extract and classify elements of arguments found in related case documents, (2) group these elements, and (3) retrieve potential alternative hypotheses from a repository of these elements. Argument (or argumentation) mining is a technology that identifies arguments and evidence and analyzes the structure of arguments. To our knowledge, no suitable corpus for argumentation mining is available in Korean. We therefore collected 73 Korean first-instance criminal cases and analyzed them using a modified Toulmin model. To classify the elements of arguments, we selected features based on previous research in argument mining, particularly in the legal domain. However, instead of the usual two or three types of arguments (premise, claim, and main claim), we attempted to classify the sentences into six types based on the modified Toulmin model (datum, warrant, backing, claim, rebuttal, and rebuttal support). We used the K-means and fuzzy c-means clustering algorithms to group the argumentative sentences: K-means is a popular clustering method for documents, while fuzzy c-means was proposed in a previous study on clustering legal arguments. Our alternative hypothesis retrieval model assumes that a new document has already been analyzed using the techniques described above.
Instead of simply finding the sentence most similar to an argument, we use a set of rules to determine the potential alternative hypotheses and use sentence similarity to find a related argument group in the argument repository. We then use similarity measurements between the argument nodes and their relationships (edges) to retrieve the most relevant alternative hypotheses. Using a new argument from a court decision not included in the initial dataset, we found that our model successfully identified relevant alternative hypotheses. In the future, we hope to develop our model further, broadening the scope and improving the accuracy of potential hypothesis generation, so that it can ultimately serve as a stepping stone toward an artificial-intelligence-driven investigation system.
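
The retrieval step described above can be illustrated with a minimal sketch. It is not the implementation used in this study. The repository contents, the group names, the bag-of-words cosine similarity, and the single rule (treat rebuttal-type elements of the best-matching group as candidate alternative hypotheses) are all assumptions made for illustration, and the node and edge similarity described above is simplified here to sentence-level similarity only.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Turn a sentence into a term-frequency vector (toy tokenizer: whitespace split)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical argument repository: each group holds sentences labeled with
# one of the six modified-Toulmin types named in the abstract.
REPOSITORY = [
    {"group": "theft-intent", "elements": [
        ("claim", "the defendant intended to steal the wallet"),
        ("datum", "the defendant took the wallet from the table"),
        ("rebuttal", "the defendant mistook the wallet for his own"),
    ]},
    {"group": "assault-self-defense", "elements": [
        ("claim", "the defendant assaulted the victim"),
        ("datum", "the defendant struck the victim in the face"),
        ("rebuttal", "the defendant acted in self-defense after being attacked"),
    ]},
]


def group_similarity(query, group):
    """Score a group by its single most similar element (an assumed heuristic)."""
    q = bag_of_words(query)
    return max(cosine(q, bag_of_words(text)) for _, text in group["elements"])


def retrieve_alternatives(query, repository=REPOSITORY,
                          types=("rebuttal", "rebuttal support")):
    """Find the most related argument group, then return its rebuttal-type
    elements, ranked by similarity to the query, as candidate alternatives."""
    best = max(repository, key=lambda g: group_similarity(query, g))
    q = bag_of_words(query)
    candidates = [text for kind, text in best["elements"] if kind in types]
    ranked = sorted(candidates,
                    key=lambda t: cosine(q, bag_of_words(t)), reverse=True)
    return best["group"], ranked


if __name__ == "__main__":
    group, alternatives = retrieve_alternatives("the defendant struck the victim")
    print(group)
    print(alternatives)
```

In this toy setting, the query about the defendant striking the victim matches the hypothetical assault group, and the self-defense rebuttal is returned as the candidate alternative hypothesis. A realistic system would replace the whitespace tokenizer and raw term counts with Korean morphological analysis and a stronger sentence representation.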