Search Results
Working Paper
Corporate Disclosure: Facts or Opinions?
A large body of literature documents the link between textual communication (e.g., news articles, earnings calls) and firm fundamentals, either through pre-defined “sentiment” dictionaries or through machine learning approaches. Surprisingly, little is known about why textual communication matters. In this paper, we take a step in that direction by developing a new methodology to automatically classify statements into objective (“facts”) and subjective (“opinions”) and apply it to transcripts of earnings calls. The large-scale estimation suggests several novel results: (1) Facts ...
Working Paper
PEAD.txt: Post-Earnings-Announcement Drift Using Text
We construct a new numerical measure of earnings announcement surprises, standardized unexpected earnings call text (SUE.txt), that does not explicitly incorporate the reported earnings value. SUE.txt generates a text-based post-earnings announcement drift (PEAD.txt) larger than the classic PEAD and can be used to create a profitable trading strategy. Leveraging the prediction model underlying SUE.txt, we propose new tools to study the news content of text: paragraph-level SUE.txt and a paragraph classification scheme based on the business curriculum. With these tools, we document many ...
Working Paper
One Threshold Doesn’t Fit All: Tailoring Machine Learning Predictions of Consumer Default for Lower-Income Areas
Modeling advances create credit scores that predict default better overall, but raise concerns about their effect on protected groups. Focusing on low- and moderate-income (LMI) areas, we use an approach from the Fairness in Machine Learning literature — fairness constraints via group-specific prediction thresholds — and show that gaps in true positive rates (% of non-defaulters identified by the model as such) can be significantly reduced if separate thresholds can be chosen for non-LMI and LMI tracts. However, the reduction isn’t free as more defaulters are classified as good risks, ...
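The idea of group-specific thresholds can be illustrated with a minimal sketch. It is not the paper's model or data: the scores and labels are invented toy values, the function names `rates` and `threshold_for_tpr` are hypothetical, and matching the LMI true positive rate to the non-LMI rate is an assumed target chosen only for illustration. Following the abstract's definition, the positive class is "did not default".

```python
def rates(scores, labels, threshold):
    """Return (TPR, FPR) when 'good risk' means score >= threshold.

    labels: 1 = did not default (positive class), 0 = defaulted.
    TPR = share of non-defaulters classified as good risks.
    FPR = share of defaulters classified as good risks.
    """
    approved = [s >= threshold for s in scores]
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for a, y in zip(approved, labels) if a and y == 1)
    fp = sum(1 for a, y in zip(approved, labels) if a and y == 0)
    return tp / pos, fp / neg


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score cut-off whose TPR reaches target_tpr."""
    for t in sorted(set(scores), reverse=True):
        tpr, _ = rates(scores, labels, t)
        if tpr >= target_tpr:
            return t
    return min(scores)


# Toy data (assumed, not from the paper): predicted repayment scores.
non_lmi_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.3]
non_lmi_labels = [1, 1, 1, 1, 0, 0]
lmi_scores = [0.8, 0.65, 0.55, 0.5, 0.45, 0.2]
lmi_labels = [1, 1, 1, 0, 1, 0]

common = 0.6
non_lmi_tpr, non_lmi_fpr = rates(non_lmi_scores, non_lmi_labels, common)
lmi_tpr_common, lmi_fpr_common = rates(lmi_scores, lmi_labels, common)

# Separate LMI threshold chosen to match the non-LMI TPR.
lmi_threshold = threshold_for_tpr(lmi_scores, lmi_labels, non_lmi_tpr)
lmi_tpr_own, lmi_fpr_own = rates(lmi_scores, lmi_labels, lmi_threshold)

print("common threshold:", common, "LMI TPR/FPR:", lmi_tpr_common, lmi_fpr_common)
print("LMI threshold:", lmi_threshold, "LMI TPR/FPR:", lmi_tpr_own, lmi_fpr_own)
```

In this toy setup, lowering the LMI threshold closes the true positive rate gap but also raises the share of defaulters classified as good risks, which is the kind of trade-off the abstract describes.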
Working Paper
Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?
Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing centered on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. This pipeline is estimated to be 100 times less expensive than outsourcing options, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold standard data with an R² of 98.6%. Analyses of growth and persistence in vehicle adoption are statistically indistinguishable whether using LLM or ...