Integrating Prediction and Attribution to Classify News
Abstract: Recent modeling developments have created tradeoffs between attribution-based models (models that rely on causal relationships) and “pure prediction models” such as neural networks. While forecasters have historically favored one technology or the other based on comfort or loyalty to a particular paradigm, in domains with many observations and predictors, such as textual analysis, the tradeoffs between attribution and prediction have become too large to ignore. We document these tradeoffs in the context of relabeling 27 million Thomson Reuters news articles published between 1996 and 2021 as debt-related or non-debt-related. Articles in our dataset were labeled by journalists at the time of publication, but these labels may be inconsistent because labeling standards and the relation between text and label have changed over time. We propose a method for identifying and correcting inconsistent labeling that combines attribution and pure prediction methods and is applicable to any domain with human-labeled data. Implementing our proposed labeling solution returns a debt-related news dataset with 54% more observations than if the original journalist labels had been used and 31% more observations than if our solution had been implemented using attribution-based methods only.
File(s): https://www.federalreserve.gov/econres/feds/files/2022042pap.pdf (application/pdf)
Part of Series: Finance and Economics Discussion Series
Publication Date: 2022-07-01