On binscatter

Abstract: Binscatter is very popular in applied microeconomics. It provides a flexible, yet parsimonious way of visualizing and summarizing "big data" in regression settings, and it is often used for informal testing of substantive hypotheses such as linearity or monotonicity of the regression function. This paper presents a foundational, thorough analysis of binscatter: We give an array of theoretical and practical results that aid both in understanding current practices (that is, their validity or lack thereof) and in offering theory-based guidance for future applications. Our main results include principled number of bins selection, confidence intervals and bands, hypothesis tests for parametric and shape restrictions of the regression function, and several other new methods, applicable to canonical binscatter as well as higher-order polynomial, covariate-adjusted, and smoothness-restricted extensions thereof. In particular, we highlight important methodological problems related to covariate adjustment methods used in current practice. We also discuss extensions to clustered data. Our results are illustrated with simulated and real data throughout. Companion general-purpose software packages for Stata and R are provided. Finally, from a technical perspective, new theoretical results for partitioning-based series estimation are obtained that may be of independent interest.
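
Illustration: for readers unfamiliar with the term, the canonical binscatter mentioned in the abstract can be sketched as follows: split the regressor into quantile-spaced bins and plot the mean of the outcome within each bin against the mean of the regressor. The Python snippet below is a minimal sketch under stated assumptions, not the paper's companion software. The function name `binscatter_means` is hypothetical, the number of bins is fixed by hand rather than chosen by the data-driven selection the paper develops, and there is no covariate adjustment, confidence band, or hypothesis test.

```python
import numpy as np


def binscatter_means(x, y, n_bins=10):
    """Within-bin means of x and y over quantile-spaced bins of x.

    Hypothetical helper for illustration only; n_bins is chosen by hand.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Interior quantile cut points, so bins hold roughly equal counts.
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Bin index in 0..n_bins-1 for every observation.
    idx = np.searchsorted(cuts, x, side="right")
    counts = np.bincount(idx, minlength=n_bins)
    x_sums = np.bincount(idx, weights=x, minlength=n_bins)
    y_sums = np.bincount(idx, weights=y, minlength=n_bins)
    # Heavy ties in x can leave a bin empty; drop such bins.
    keep = counts > 0
    return x_sums[keep] / counts[keep], y_sums[keep] / counts[keep]


# Usage on simulated data with a linear regression function.
rng = np.random.default_rng(0)
x_sim = rng.uniform(0.0, 1.0, size=5000)
y_sim = 1.0 + 2.0 * x_sim + rng.normal(scale=0.5, size=5000)
x_bar, y_bar = binscatter_means(x_sim, y_sim, n_bins=20)
```

Plotting `y_bar` against `x_bar` gives the familiar binned scatter plot; judging linearity or monotonicity by eye from such a plot is the informal practice the paper sets out to put on firmer footing.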

Keywords: binning selection; regressogram; uniform inference; partitioning estimators; piecewise polynomials; nonparametric regression; robust bias correction; binned scatter plot; splines

JEL Classification: C14; C18; C21

Bibliographic Information

Provider: Federal Reserve Bank of New York

Part of Series: Staff Reports

Publication Date: 2019-02-01

Number: 881

Pages: 44 pages