Working Paper
ChatMacro: Evaluating Inflation Forecasts of Generative AI
Abstract: Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.
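The abstract's comparison of forecast accuracy against "existing benchmarks" is commonly summarized as a relative root mean squared error (RMSE) ratio. The following is a minimal illustrative sketch under stated assumptions, not the paper's evaluation code. It assumes RMSE as the loss function and a no-change (random walk) forecast as the benchmark. The function names and the toy series are hypothetical. They are not taken from the paper and are not results. A ratio below 1 means the candidate forecasts were more accurate than the benchmark over the evaluation window.

```python
import math
from typing import List, Sequence, Tuple


def rmse(forecasts: Sequence[float], actuals: Sequence[float]) -> float:
    """Root mean squared forecast error over aligned forecast/actual pairs."""
    if len(forecasts) != len(actuals) or len(actuals) == 0:
        raise ValueError("forecasts and actuals must be non-empty and of equal length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def random_walk_benchmark(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical no-change benchmark: forecast each period with the prior period's value.

    Returns (forecasts, actuals), aligned so that forecasts[i] predicts actuals[i].
    """
    return list(series[:-1]), list(series[1:])


def relative_rmse(candidate: Sequence[float], benchmark: Sequence[float],
                  actuals: Sequence[float]) -> float:
    """RMSE of the candidate divided by RMSE of the benchmark (below 1 = candidate better)."""
    return rmse(candidate, actuals) / rmse(benchmark, actuals)


# Toy inflation series (percent), purely illustrative.
series = [3.0, 3.2, 2.9, 3.1]
bench_forecasts, actuals = random_walk_benchmark(series)
candidate_forecasts = [3.1, 3.0, 3.0]  # hypothetical model forecasts for the last three periods
ratio = relative_rmse(candidate_forecasts, bench_forecasts, actuals)
print(f"relative RMSE: {ratio:.3f}")
```

In a true out-of-sample exercise, each candidate forecast must be recorded before the outcome it predicts is published. In a pseudo out-of-sample exercise, the forecasts are generated afterward by simulating an earlier information set. The ratio computation is the same in both cases. Only the provenance of the forecasts differs.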
JEL Classification: C45; E31; E37
https://doi.org/10.24148/wp2026-04
Access Documents
File(s):
File format is application/pdf
https://www.frbsf.org/wp-content/uploads/wp2026-04.pdf
Description: PDF
File(s):
File format is text/html
https://www.frbsf.org/research-and-insights/publications/working-papers/2026/02/chatmacro-evaluating-inflation-forecasts-generative-of-ai/
Description: FRBSF web page
Bibliographic Information
Provider: Federal Reserve Bank of San Francisco
Part of Series: Working Paper Series
Publication Date: 2026-02-05
Number: 2026-04
Note: PDF date: January 27, 2026.