Working Paper
Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?
Abstract: Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing that centers on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. The pipeline is estimated to be 100 times less expensive than outsourcing options, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold-standard data with an R² of 98.6%. Analyses of growth and persistence in vehicle adoption yield statistically indistinguishable results whether they use the LLM-digitized data or the gold-standard data. LLM-based digitization unlocks complex historical tables, enabling new economic analyses and broader researcher participation.
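The abstract reports agreement between LLM output and gold-standard data as an R². As a rough illustration only, the sketch below computes R² as 1 − SS_res/SS_tot with the gold-standard values as the reference, plus a simple cell-level mismatch rate. Both definitions are assumptions; the paper may instead report the R² from regressing LLM values on gold values, and its "critical parsing errors" are likely defined differently from a numeric mismatch. The function names and the county counts are hypothetical.

```python
from statistics import fmean


def r_squared(gold, pred):
    """Assumed definition: 1 - SS_res / SS_tot, using gold values as the reference."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_gold = fmean(gold)
    ss_res = sum((g - p) ** 2 for g, p in zip(gold, pred))
    ss_tot = sum((g - mean_gold) ** 2 for g in gold)
    if ss_tot == 0:
        raise ValueError("gold values have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def mismatch_rate(gold, pred, tol=0):
    """Hypothetical error metric: share of cells where |pred - gold| exceeds tol."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two equal-length, non-empty sequences")
    mismatches = sum(abs(g - p) > tol for g, p in zip(gold, pred))
    return mismatches / len(gold)


# Hypothetical county vehicle-registration counts (illustrative, not from the paper)
gold = [1200, 340, 5600, 890, 2100]
llm = [1200, 340, 5650, 890, 2100]

print(round(r_squared(gold, llm), 4))
print(mismatch_rate(gold, llm))
```

In this sketch a single misread cell barely moves R², because the variance across counties is large. That is one reason an aggregate fit statistic is best paired with a cell-level error rate when validating digitized tables.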
JEL Classification: C80; N72; N32; R40
https://doi.org/10.21799/frbp.wp.2025.28
File (PDF): https://www.philadelphiafed.org/-/media/FRBP/Assets/working-papers/2025/wp25-28.pdf
Bibliographic Information
Provider: Federal Reserve Bank of Philadelphia
Part of Series: Working Papers
Publication Date: 2025-09-30
Number: 25-28