A widely used method for estimating counterfactuals and causal treatment effects from observational data is nearest-neighbor matching. This typically involves pairing each treated unit with its nearest-in-covariates control unit and then estimating an average treatment effect from the set of matched pairs. Although straightforward to implement, this estimator is known to suffer from a bias that increases with the dimensionality of the covariate space, which is undesirable in applications that involve high-dimensional data. To address this problem, we propose a novel estimator that first projects the data onto a number of random linear subspaces and then estimates the median treatment effect by nearest-neighbor matching in each subspace. We empirically compute the mean squared error of the proposed estimator using semi-synthetic data, and we demonstrate the method on real-world digital marketing campaign data. The results show marked improvement over baseline methods.
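As a rough illustration of the procedure described above, the following Python sketch projects the covariates onto several random linear subspaces, runs one-nearest-neighbor matching in each, and takes the median of the per-subspace estimates. It is a minimal sketch under stated assumptions, not the estimator as specified in the paper: the Gaussian projection matrices, the subspace dimension `k`, the number of projections `n_projections`, matching with replacement, the choice of the effect on the treated as the target, and the function names `nn_matching_att` and `random_subspace_att` are all illustrative choices that the abstract does not fix.

```python
import numpy as np


def nn_matching_att(X, y, t):
    """Effect on the treated via 1-nearest-neighbor matching with replacement."""
    X_treated, X_control = X[t == 1], X[t == 0]
    y_treated, y_control = y[t == 1], y[t == 0]
    # Squared Euclidean distance between every treated and every control unit.
    diffs = X_treated[:, None, :] - X_control[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Index of the closest control unit for each treated unit.
    nearest = dist2.argmin(axis=1)
    return float(np.mean(y_treated - y_control[nearest]))


def random_subspace_att(X, y, t, k=2, n_projections=50, seed=0):
    """Median of matching estimates computed in random k-dimensional subspaces."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    estimates = []
    for _ in range(n_projections):
        # Assumption: i.i.d. Gaussian entries define the random linear map.
        P = rng.standard_normal((d, k))
        estimates.append(nn_matching_att(X @ P, y, t))
    return float(np.median(estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 400, 30
    X = rng.standard_normal((n, d))
    t = (rng.random(n) < 0.5).astype(int)
    # Synthetic outcome with a true effect of 2.0 (illustrative data only).
    y = X[:, 0] + 2.0 * t + 0.1 * rng.standard_normal(n)
    print(random_subspace_att(X, y, t))
```

Taking the median rather than the mean across subspaces is one way to limit the influence of individual projections that happen to produce poor matches; other aggregation rules are possible.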