Search Results
Working Paper
Artificial Intelligence and Inflation Forecasts
We explore the ability of Large Language Models (LLMs) to produce in-sample conditional inflation forecasts during the 2019-2023 period. We use a leading LLM (Google AI's PaLM) to produce distributions of conditional forecasts at different horizons and compare these forecasts to those of a leading source, the Survey of Professional Forecasters (SPF). We find that LLM forecasts generate lower mean-squared errors overall in most years, and at almost all horizons. LLM forecasts exhibit slower reversion to the 2% inflation anchor.
Journal Article
Artificial Intelligence and Inflation Forecasts
We explore the ability of large language models (LLMs) to produce in-sample conditional inflation forecasts during the 2019–23 period. We use a leading LLM (Google AI’s PaLM) to produce distributions of conditional forecasts at different horizons and compare these forecasts to those of a leading source, the Survey of Professional Forecasters (SPF). We find that LLM forecasts generate lower mean-squared errors overall in most years and at almost all horizons. LLM forecasts exhibit slower reversion to the 2 percent inflation anchor.
Working Paper
Explaining Machine Learning by Bootstrapping Partial Marginal Effects and Shapley Values
Machine learning and artificial intelligence are often described as “black boxes.” Traditional linear regression is interpreted through its marginal relationships as captured by regression coefficients. We show that the same marginal relationship can be described rigorously for any machine learning model by calculating the slope of the partial dependence functions, which we call the partial marginal effect (PME). We prove that the PME of OLS is analytically equivalent to the OLS regression coefficient. Bootstrapping provides standard errors and confidence intervals around the point ...
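The claim that the PME of OLS matches the OLS coefficient can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names (`partial_dependence`, `partial_marginal_effect`), the simulated data, and the choice to measure the slope by fitting a straight line through the partial dependence curve on an evenly spaced grid are all assumptions made for illustration.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Average model prediction when feature j is fixed at each grid value."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v          # overwrite feature j for every observation
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

def partial_marginal_effect(predict, X, j, n_grid=20):
    """Slope of the partial dependence curve (assumed here: a least-squares line)."""
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
    pd_values = partial_dependence(predict, X, j, grid)
    return np.polyfit(grid, pd_values, 1)[0]

# Simulated data (hypothetical): linear signal plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Fit OLS with an intercept using least squares.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(Z):
    return beta[0] + Z @ beta[1:]

pme = np.array([partial_marginal_effect(predict, X, j) for j in range(3)])
print(np.allclose(pme, beta[1:]))  # PME and OLS slopes agree up to rounding
```

Because `partial_marginal_effect` takes any `predict` callable, the same function applies to a nonlinear model. Resampling the rows of `X` and `y`, refitting, and recomputing the PME would give bootstrap standard errors of the kind the abstract mentions.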
Working Paper
The Anatomy of Out-of-Sample Forecasting Accuracy
We develop metrics based on Shapley values for interpreting time-series forecasting models, including “black-box” models from machine learning. Our metrics are model agnostic, so that they are applicable to any model (linear or nonlinear, parametric or nonparametric). Two of the metrics, iShapley-VI and oShapley-VI, measure the importance of individual predictors in fitted models for explaining the in-sample and out-of-sample predicted target values, respectively. The third metric is the performance-based Shapley value (PBSV), our main methodological contribution. PBSV measures the ...
Working Paper
Machine Learning, the Treasury Yield Curve and Recession Forecasting
We use machine learning methods to examine the power of Treasury term spreads and other financial market and macroeconomic variables to forecast US recessions, vis-à-vis probit regression. In particular we propose a novel strategy for conducting cross-validation on classifiers trained with macro/financial panel data of low frequency and compare the results to those obtained from standard k-folds cross-validation. Consistent with the existing literature we find that, in the time series setting, forecast accuracy estimates derived from k-folds are biased optimistically, and cross-validation ...
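The optimistic bias of k-folds in a time series setting comes from training on observations that postdate the test observations. The sketch below contrasts shuffled k-fold splits with a generic expanding-window split. It is an illustration under stated assumptions and not the novel strategy the paper proposes, which the abstract does not describe; the function names and the toy sample size are hypothetical.

```python
import numpy as np

def kfold_splits(n, k, seed=0):
    """Standard shuffled k-fold: test folds are random subsets of the sample."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        test = np.sort(folds[i])
        train = np.sort(np.concatenate([folds[m] for m in range(k) if m != i]))
        yield train, test

def expanding_window_splits(n, k):
    """Time-ordered splits: each test block follows all of its training data."""
    bounds = np.linspace(0, n, k + 2, dtype=int)[1:]  # k + 1 cut points
    for i in range(k):
        train = np.arange(0, bounds[i])
        test = np.arange(bounds[i], bounds[i + 1])
        yield train, test

n, k = 12, 3  # toy sample: 12 time-ordered observations, 3 splits

# True when a split trains on observations dated after the start of its test set.
kfold_leaks = [train.max() > test.min() for train, test in kfold_splits(n, k)]
window_leaks = [train.max() > test.min() for train, test in expanding_window_splits(n, k)]

print(any(kfold_leaks), any(window_leaks))
```

With shuffled folds, at least one split always trains on data from after its test period, while the expanding-window scheme never does.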
Working Paper
Evaluating Local Language Models: An Application to Bank Earnings Calls
This study evaluates the performance of local large language models (LLMs) in interpreting financial texts, compared with closed-source, cloud-based models. We first introduce new benchmarking tasks for assessing LLM performance in analyzing financial and economic texts and explore the refinements needed to improve their performance. Our benchmarking results suggest local LLMs are a viable tool for general natural language processing analysis of these texts. We then leverage local LLMs to analyze the tone and substance of bank earnings calls in the post-pandemic era, including calls conducted ...
Working Paper
Integrating Prediction and Attribution to Classify News
Recent modeling developments have created tradeoffs between attribution-based models, models that rely on causal relationships, and “pure prediction models” such as neural networks. While forecasters have historically favored one technology or the other based on comfort or loyalty to a particular paradigm, in domains with many observations and predictors such as textual analysis, the tradeoffs between attribution and prediction have become too large to ignore. We document these tradeoffs in the context of relabeling 27 million Thomson Reuters news articles published between 1996 ...