Discussion Paper
Combining AI and Established Methods for Historical Document Analysis
Abstract: This paper examines methodological approaches for extracting structured data from large-scale historical document archives, comparing “hyperspecialized” and “adaptive modular” strategies. Using 56 years of Philadelphia property deeds as a case study, we show the benefits of the adaptive modular approach, which leverages optical character recognition (OCR), full-text search, and frontier large language models (LLMs) to identify deeds containing specific restrictive use language, achieving 98% precision and 90–98% recall. Our adaptive modular methodology enables analysis of historically important economic phenomena, including restrictive property covenants, their precise geographic locations, and the localized neighborhood effects of these restrictions. This approach should be easily adaptable to other research involving deeds and similar documents.
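As a rough illustration of the kind of pipeline the abstract describes, the sketch below chains a full-text search pre-filter with a downstream classifier and then scores the flagged set against hand-labeled deeds. It is not the paper's implementation: the search pattern, the function names, and the `classify` callable (a stand-in for an LLM review step) are all assumptions made for this example, and the input is assumed to be text already produced by OCR.

```python
import re
from typing import Callable

# Hypothetical search pattern; the paper's actual search terms are not given here.
CANDIDATE_PATTERN = re.compile(
    r"\b(?:shall not be (?:used|occupied)|restrict(?:ed|ion)s?)\b",
    re.IGNORECASE,
)


def keyword_prefilter(ocr_text: str) -> bool:
    """Cheap full-text search step: keep only deeds that match the pattern."""
    return CANDIDATE_PATTERN.search(ocr_text) is not None


def flag_deeds(deeds: dict[str, str], classify: Callable[[str], bool]) -> set[str]:
    """Return IDs of deeds that pass the pre-filter and the classifier.

    `classify` stands in for an LLM call that reads the candidate text and
    decides whether it contains the restrictive language of interest.
    """
    return {
        deed_id
        for deed_id, text in deeds.items()
        if keyword_prefilter(text) and classify(text)
    }


def precision_recall(flagged: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision = TP / flagged; recall = TP / truly restrictive deeds."""
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

In this layout, a deed that the search step misses never reaches the classifier, so the pre-filter caps recall, while the classifier mainly governs precision. Keeping the stages separate means each one can be replaced or re-tuned on its own.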
JEL Classification: C81; N32; R31; R38
https://doi.org/10.21799/frbp.dp.2025.02
Access Documents
File(s): File format is application/pdf https://www.philadelphiafed.org/-/media/FRBP/Assets/Consumer-Finance/Discussion-Papers/dp25-02.pdf
Bibliographic Information
Provider: Federal Reserve Bank of Philadelphia
Part of Series: Consumer Finance Institute discussion papers
Publication Date: 2025-10-25
Number: 25-02