PaECTER: Patent-level Representation Learning using Citation-informed Transformers

📅 2024-02-29
🏛️ arXiv.org
📈 Citations: 7 (1 influential)
🤖 AI Summary
This study addresses the limited semantic expressiveness of existing patent language models, which hinders performance on similarity-based tasks such as citation prediction, classification, and knowledge-flow tracing. The authors fine-tune the domain-adapted BERT for Patents model using examiner-added citations as semantic supervision, yielding PaECTER, a document-level encoder that produces numerical representations of patent texts. On a patent citation prediction test set in which each focal patent is compared against 25 irrelevant patents, PaECTER ranks at least one truly similar patent at 1.32 on average and outperforms existing patent-specific language models on two rank evaluation metrics. The resulting embeddings support downstream tasks such as classification, knowledge-flow tracing, and semantic similarity search, the last of which is particularly relevant for prior art search.
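To make the reported rank metric concrete, here is a small, self-contained sketch (not the authors' code) of how the rank of the first relevant patent among one cited and 25 irrelevant candidates could be computed from embeddings; the embeddings below are random placeholders:

```python
import numpy as np

def first_relevant_rank(query_emb, candidate_embs, relevant_idx):
    """1-based rank of the first relevant candidate when all candidates
    are sorted by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)  # candidate indices, descending similarity
    ranks = [int(np.where(order == i)[0][0]) + 1 for i in relevant_idx]
    return min(ranks)

# Toy setup: 1 cited (relevant) patent among 25 irrelevant ones.
rng = np.random.default_rng(0)
query = rng.normal(size=768)
candidates = rng.normal(size=(26, 768))
candidates[0] = query + 0.1 * rng.normal(size=768)  # make candidate 0 similar
print(first_relevant_rank(query, candidates, relevant_idx=[0]))  # likely 1
```

Averaging this rank over all test queries gives the kind of mean-rank figure (1.32) reported above.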

📝 Abstract
PaECTER is a publicly available, open-source document-level encoder specific to patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better on similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the next-best patent-specific pre-trained language model (BERT for Patents) on our patent citation prediction test dataset under two different rank evaluation metrics. PaECTER predicts at least one most similar patent at a rank of 1.32 on average when compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners. PaECTER is available on Hugging Face.
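Since the model is distributed via Hugging Face, a minimal usage sketch follows. It assumes the sentence-transformers library and the model ID mpi-inno-comp/paecter; the ID and the example patent texts are illustrative, not quoted from the paper:

```python
# Minimal usage sketch (assumes: pip install sentence-transformers,
# and that the model is published as mpi-inno-comp/paecter).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mpi-inno-comp/paecter")

patents = [
    "A lithium-ion battery electrode comprising a silicon-carbon composite ...",
    "An anode material for rechargeable batteries based on silicon nanoparticles ...",
    "A method for brewing coffee using a pressurized water chamber ...",
]
# One dense vector per patent text; normalized so dot product = cosine similarity.
embeddings = model.encode(patents, normalize_embeddings=True)

# Pairwise cosine similarities: the two battery patents should score highest.
print(util.cos_sim(embeddings, embeddings))
```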
Problem

Research questions and friction points this paper is trying to address.

Generating patent document representations using citation-informed transformer models
Improving patent similarity prediction over existing domain-specific models
Enhancing prior art search through semantic similarity of patent texts (a search sketch follows this list)
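As a concrete illustration of the prior-art use case above, here is a hypothetical semantic search sketch; the corpus texts, the query, and the model ID are placeholders rather than material from the paper:

```python
# Hypothetical prior-art search: rank candidate patents against a query
# patent by embedding similarity. All texts below are invented examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mpi-inno-comp/paecter")  # assumed model ID

corpus = [
    "A foldable display hinge with interlocking gear segments ...",
    "A battery thermal management system using phase-change material ...",
    "A hinge assembly for a flexible screen device ...",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)

query = "A folding smartphone hinge that distributes bending stress ..."
query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

# Top-k most similar candidates, i.e., a first-pass prior art shortlist.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```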
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes BERT for Patents with examiner-added citation data (a training sketch follows this list)
Generates numerical representations for patent documents
Outperforms existing models in patent similarity tasks
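The listing does not include the authors' training code; the sketch below shows one common way such citation-informed fine-tuning is done, SPECTER-style triplet training where the anchor is a focal patent, the positive is a patent cited by its examiner, and the negative is an uncited patent. The base model ID, texts, and hyperparameters are assumptions for illustration, using the sentence-transformers v2-style fit API:

```python
# Illustrative triplet fine-tuning sketch (not the authors' training code).
# Anchor: focal patent; positive: examiner-cited patent; negative: a
# randomly drawn, uncited patent. All texts are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("anferico/bert-for-patents")  # assumed base model ID

train_examples = [
    InputExample(texts=[
        "Anchor patent abstract ...",          # focal patent
        "Examiner-cited patent abstract ...",  # positive
        "Random uncited patent abstract ...",  # negative
    ]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)
train_loss = losses.TripletLoss(model=model, triplet_margin=1.0)  # margin is illustrative

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```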
Mainak Ghosh
Max Planck Institute for Innovation and Competition, Munich, Germany
Sebastian Erhardt
Max Planck Institute for Innovation and Competition, Munich, Germany
Michael E. Rose
Max Planck Institute for Innovation and Competition, Munich, Germany
Erik Buunk
Max Planck Institute for Innovation and Competition, Munich, Germany
Dietmar Harhoff
Max Planck Institute for Innovation and Competition, Munich, Germany
innovation · entrepreneurship · productivity · intellectual property