Legal Informatics and Forensic Science: Paper Review - SAILER: A Structure-Aware Pre-Trained Model for Legal Case Retrieval (H Li, 2023)

This paper addresses legal case retrieval, a critical task in intelligent legal systems.

While pre-training has succeeded in ad-hoc retrieval, effective pre-training strategies for legal case retrieval remain an open question. Legal case documents have intricate logical structures, but existing language models struggle to capture long-distance dependencies and key legal elements. To tackle these issues, this paper proposes SAILER, a Structure-Aware pre-trained Language Model for Legal Case Retrieval. SAILER optimizes retrieval by utilizing structural information, attending to key legal elements, and employing an asymmetric encoder-decoder architecture for pre-training. The model shows strong discriminative ability even without legal annotation data, distinguishing cases accurately. Experiments show significant improvements over state-of-the-art methods in legal case retrieval.

Challenges Addressed

Long Document Structures: Legal case documents are lengthy and have inherent logical structures, which makes it hard for existing models to capture long-distance dependencies.
Relevance in Legal Domain: Relevance in legal case retrieval hinges on key legal elements, which can differentiate cases significantly. Existing models struggle to understand these elements.

Approach

Structure the data (Structure-Aware)
SAILER consists of a Fact Encoder, a Reasoning Decoder, and a Decision Decoder, each targeting a different component of a legal case document.
The Fact Encoder uses a BERT-like model to represent the Fact section, while the Reasoning and Decision Decoders focus on key legal elements and judgment prediction.

Method

Structures the data into the following five sections:
- PROCEDURE
- FACT
- REASONING
- DECISION
- TAIL
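
As a rough illustration of this sectioning, the sketch below holds one case document in a small Python container. The class name `CaseDocument`, its field names, and the `pretraining_inputs` helper are hypothetical and only mirror the five sections listed above. Routing only Fact, Reasoning, and Decision to the three components follows the description in this review, and leaving Procedure and Tail unused is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class CaseDocument:
    """Hypothetical container for one legal case split into five sections."""
    procedure: str = ""
    fact: str = ""
    reasoning: str = ""
    decision: str = ""
    tail: str = ""

    def pretraining_inputs(self) -> dict:
        """Map each pre-training component to the section it reads.

        Procedure and Tail are not routed anywhere (assumption of this sketch).
        """
        return {
            "fact_encoder": self.fact,
            "reasoning_decoder": self.reasoning,
            "decision_decoder": self.decision,
        }
```

How the raw text is actually segmented into these sections depends on each country's document format and is not covered by this sketch.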
Fact Encoder: Masks tokens in the Fact section and uses the final hidden state as its representation.
- Uses the final hidden state as the representation of the entire sentence (hf)
Reasoning Decoder: Reconstructs the aggressively masked Reasoning text.
- Aggressively masks tokens (30~60%)
- Must rely heavily on the Fact representation (hf) to recover the masked information
- Enhances attention to key legal elements
Decision Decoder: Handles legal judgment prediction, masking according to country-specific decision formats.
- Masks information according to the country’s decision format
- If there is no fixed format, masks the words with high TF-IDF values
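
The two masking strategies described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `random_mask` and `tfidf_mask`, the `[MASK]` placeholder string, uniform random position sampling, and the smoothed IDF formula are all choices made for this sketch, and real pre-training would operate on subword token IDs rather than whitespace words.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # placeholder string; an assumption of this sketch


def random_mask(tokens, ratio, rng):
    """Replace a fraction `ratio` of tokens with MASK, chosen uniformly at random."""
    n_mask = round(len(tokens) * ratio)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def tfidf_mask(tokens, corpus, top_k):
    """Mask every occurrence of the top_k distinct words with the highest TF-IDF.

    `corpus` is a list of tokenized documents used to estimate document
    frequencies. The smoothed IDF used here is one common variant.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    term_freq = Counter(tokens)

    def score(word):
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
        return term_freq[word] / len(tokens) * idf

    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(term_freq, key=lambda w: (-score(w), w))
    chosen = set(ranked[:top_k])
    return [MASK if tok in chosen else tok for tok in tokens]
```

For example, `random_mask(reasoning_tokens, 0.45, random.Random(0))` would hide roughly 45% of a Reasoning section, a ratio inside the 30~60% range mentioned above, while `tfidf_mask` stands in for the fallback used when a country has no fixed decision format.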

Experiments & Result Analysis

Compares with existing baseline models such as Traditional Retrieval Models, Generic Pre-trained Models, and Retrieval-oriented Pre-trained Models.

SAILER dataset

Chinese: tens of millions of case documents from China Judgment Online
English: case documents from U.S. federal and state courts

SAILER Results & Analysis

Encoder: A high masking ratio prevents the generation of high-quality sentence embeddings.
Decoder: Excessive masking makes the masked text too difficult to recover.

SAILER places more emphasis on legal terms than SEED (a retrieval-oriented model).

Contributions

SAILER is the first model to leverage structural information in legal case pre-training.
It proposes pre-training objectives that capture long-distance dependencies and logical knowledge.
Extensive experiments demonstrate SAILER’s effectiveness on Chinese and English legal benchmarks.

In conclusion, “SAILER: A Structure-Aware Pre-Trained Model for Legal Case Retrieval” addresses challenges in legal case retrieval by introducing a novel framework. Through its innovative approach, SAILER significantly advances the field by emphasizing structural information and key legal elements, demonstrating its potential to enhance legal case retrieval in intelligent legal systems.

박지원 (Jee Won, Park)
