Discussion Paper

Combining AI and Established Methods for Historical Document Analysis


Abstract: This paper examines methodological approaches for extracting structured data from large-scale historical document archives, comparing “hyperspecialized” and “adaptive modular” strategies. Using 56 years of Philadelphia property deeds as a case study, we show the benefits of the adaptive modular approach, which leverages optical character recognition (OCR), full-text search, and frontier large language models (LLMs) to identify deeds containing specific restrictive use language, achieving 98% precision and 90–98% recall. Our adaptive modular methodology enables analysis of historically important economic phenomena, including restrictive property covenants, their precise geographic locations, and the localized neighborhood effects of these restrictions. This approach should be easily adaptable to other research involving deeds and similar documents.
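
As a rough illustration of the modular idea described in the abstract, the following minimal Python sketch chains a full-text search stage over OCR text with a pluggable classification stage, then scores the result with precision and recall. It is not the authors' implementation. The search pattern, the sample deed texts, the function names, and the keyword-based stand-in for the LLM stage are all hypothetical and chosen only to keep the example self-contained.

```python
import re
from typing import Callable

# Hypothetical search pattern; the paper's actual search terms are not stated here.
CANDIDATE_PATTERN = re.compile(r"\bshall\s+not\s+be\s+used\b", re.IGNORECASE)


def prefilter(deeds: dict[str, str]) -> list[str]:
    """Full-text search stage: return IDs of deeds whose OCR text matches the pattern."""
    return [deed_id for deed_id, text in deeds.items() if CANDIDATE_PATTERN.search(text)]


def classify(
    deeds: dict[str, str],
    candidates: list[str],
    is_restrictive: Callable[[str], bool],
) -> set[str]:
    """Classification stage: keep candidates that the supplied classifier accepts.

    In a real pipeline `is_restrictive` would wrap an LLM call; here it is any callable.
    """
    return {deed_id for deed_id in candidates if is_restrictive(deeds[deed_id])}


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented sample texts standing in for OCR output.
deeds = {
    "d1": "The premises shall not be used for any slaughterhouse.",
    "d2": "Grantor conveys the lot to grantee in fee simple.",
    "d3": "Said lot SHALL NOT BE USED as a tavern.",
    "d4": "The lot shall not be used until the survey is recorded.",
    "d5": "No tannery may be operated on the premises.",
}
# Invented hand labels: deeds that truly contain a use restriction of interest.
labeled_restrictive = {"d1", "d3", "d5"}


def keyword_stub(text: str) -> bool:
    """Stand-in for the LLM stage (an assumption, not a real model call)."""
    lowered = text.lower()
    return "slaughterhouse" in lowered or "tavern" in lowered


candidates = prefilter(deeds)
flagged = classify(deeds, candidates, keyword_stub)
precision, recall = precision_recall(flagged, labeled_restrictive)
print(f"candidates={candidates} flagged={sorted(flagged)}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because each stage is a separate function with a simple interface, the search pattern or the classifier can be swapped without touching the rest of the pipeline. In this toy data the search stage misses deed d5, which shows how a narrow search pattern can lower recall even when every flagged deed is correct.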

JEL Classification: C81; N32; R31; R38

https://doi.org/10.21799/frbp.dp.2025.02

Authors

Bibliographic Information

Provider: Federal Reserve Bank of Philadelphia

Part of Series: Consumer Finance Institute discussion papers

Publication Date: 2025-10-25

Number: 25-02