Working Paper

Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?


Abstract: Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing centered on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. This pipeline is estimated to be 100 times less expensive than outsourcing options, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold standard data with an R² of 98.6%. Analyses of growth and persistence in vehicle adoption are statistically indistinguishable whether using LLM or gold standard data. LLM-based digitization unlocks complex historical tables, enabling new economic analyses and broader researcher participation.

JEL Classification: C80; N72; N32; R40

https://doi.org/10.21799/frbp.wp.2025.28

Bibliographic Information

Provider: Federal Reserve Bank of Philadelphia

Part of Series: Working Papers

Publication Date: 2025-09-30

Number: 25-28