Working Paper

ChatMacro: Evaluating Inflation Forecasts of Generative AI


Abstract: Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.

JEL Classification: C45; E31; E37

https://doi.org/10.24148/wp2026-04

Access Documents

File(s): https://www.frbsf.org/wp-content/uploads/wp2026-04.pdf
Format: application/pdf

Bibliographic Information

Provider: Federal Reserve Bank of San Francisco

Part of Series: Working Paper Series

Publication Date: 2026-02-05

Number: 2026-04

Note: PDF date: January 27, 2026.