Center for Statistical Science, Peking University

Statistics and Data Science Seminar Series

"Double Boosting for High Dimensional IV Regression Models" (coauthored with Hao Xu) and "Asymmetric AdaBoost for High Dimensional Maximum Score Regression" (coauthored with Jianghao Chu and Aman Ullah)

Speaker: Tae-Hwy Lee, University of California, Riverside

Date: 2017-09-14

Venue: Room 217, Guanghua Building 2

Abstract for "Double Boosting for High Dimensional IV Regression Models": Endogeneity in a regression model for the automobile demand equation leads to inconsistent estimation of the price elasticity parameter. The standard solutions are two stage least squares (2SLS) and the generalized method of moments (GMM). These methods face challenges when the instruments are high dimensional and when some of them are irrelevant and/or invalid. Selecting relevant and valid instruments is critical for consistent estimation. In this paper, we introduce a new method that selects relevant and valid instruments simultaneously using a boosting algorithm, which we call Double Boosting (DB). We show that DB consistently selects relevant and valid instruments. In particular, we consider the case in which the endogenous variables X (price) are unknown nonlinear functions of observable instruments W (the product characteristics), which can be approximated by sieve functions such as polynomials. The sieve approximation captures the nonlinearity between the endogenous variables X and the instruments W, but it produces high dimensional instruments Z = f(W). Monte Carlo simulation demonstrates the DB procedure and compares its performance with that of other methods such as penalized GMM (Cheng and Liao 2015) and standard Boosting (Ng and Bai 2008). In the application to estimating the BLP-type automobile demand function (Berry, Levinsohn and Pakes 1995), with price being endogenous and the instruments being high dimensional functions of product characteristics, we find that the DB estimators indicate that automobile demands are more consistent with profit maximization than the other estimators indicate.

Abstract for "Asymmetric AdaBoost for High Dimensional Maximum Score Regression": Adaptive Boosting, or AdaBoost, introduced by Freund and Schapire (1996), has proved effective for solving high-dimensional binary classification or binary prediction problems. Friedman, Hastie, and Tibshirani (2000) show that AdaBoost builds an additive logistic regression model by minimizing the 'exponential loss'. We show that the exponential loss in AdaBoost is equivalent (up to scale) to the symmetric maximum score (Manski 1975, 1985) and also to the symmetric least squares loss for binary prediction. Therefore, the standard AdaBoost using the exponential loss is a symmetric algorithm and solves the binary median regression. In this paper, we introduce Asymmetric AdaBoost, which produces an additive logistic regression model by minimizing a new 'asymmetric exponential loss'. Asymmetric AdaBoost can handle the asymmetric maximum score problem (Granger and Pesaran 2000, Lee and Yang 2006, Lahiri and Yang 2012, and Elliott and Lieli 2013) and therefore solves the binary quantile regression. We also show that our asymmetric exponential loss is equivalent (up to scale) to the asymmetric least squares loss (Newey and Powell 1987) for binary classification/prediction. We extend the result of Bartlett and Traskin (2007) and show that the Asymmetric AdaBoost algorithm is consistent in the sense that the risk of the classifier it produces approaches the Bayes risk. Monte Carlo experiments show that Asymmetric AdaBoost performs well relative to lasso-regularized high-dimensional logistic regression in various situations, especially when p >> n and in the tails. We apply Asymmetric AdaBoost to predict business cycle turning points and the directions of stock price changes.
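
Editorial illustration (not taken from the papers): the statement that standard AdaBoost is a symmetric algorithm can be checked numerically with the well-known result of Friedman, Hastie, and Tibshirani (2000) that the expected exponential loss E[exp(-yF)], for y in {-1, +1} with P(y = +1) = p, is minimized at F = 0.5 * log(p / (1 - p)). The sign of that minimizer flips exactly at p = 0.5, which is the median-type classification rule. The sketch below covers only this standard symmetric loss; the authors' asymmetric exponential loss is not defined in the abstract and is not reproduced here. The function names and the search grid are assumptions chosen for the example.

```python
import numpy as np


def expected_exponential_loss(F, p):
    """Expected exponential loss E[exp(-y F)] for y in {-1, +1}, P(y = +1) = p."""
    return p * np.exp(-F) + (1.0 - p) * np.exp(F)


def minimizer_on_grid(p, lo=-5.0, hi=5.0, num=100001):
    """Brute-force minimizer of the expected loss over an evenly spaced grid.

    The grid bounds and resolution are illustrative assumptions.
    """
    grid = np.linspace(lo, hi, num)
    losses = expected_exponential_loss(grid, p)
    return float(grid[np.argmin(losses)])


def half_log_odds(p):
    """Closed-form population minimizer: one half of the log-odds."""
    return 0.5 * np.log(p / (1.0 - p))


if __name__ == "__main__":
    for p in (0.2, 0.4, 0.6, 0.8):
        F_star = minimizer_on_grid(p)
        # The predicted class sign(F_star) is +1 exactly when p > 0.5.
        print(p, F_star, half_log_odds(p), int(np.sign(F_star)))
```

The grid search and the closed form agree up to the grid resolution, and the predicted class changes at p = 0.5 regardless of how the two kinds of misclassification would be weighted, which is the symmetry that the second abstract sets out to relax.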

About the Speaker:

Tae-Hwy Lee is a professor of economics at the University of California, Riverside. He received a Ph.D. in economics in 1990 from the University of California, San Diego, under the supervision of Sir Clive W.J. Granger, a Nobel Laureate in Economic Science. He received his undergraduate degree in economics in 1985 from Seoul National University. His research areas include nonstationary time series, nonlinear time series models, aggregation issues, specification testing, forecasting, inference in predictive regression, causality, volatility models, quantile models, factor models, nonparametric methods, shrinkage methods, model selection, model averaging, causal inference, machine learning methods, high dimensional models, and panel data models. Professor Lee has received several awards, including the NSF/ASA/BLS Senior Research Fellowship, the Econometric Theory Tjalling C. Koopmans Prize, and the Korea Sanhak Foundation Award. He has taught or visited at Louisiana State University, California Institute of Technology, Dongguk University Seoul, University of Cambridge, Xiamen University, Shanghai University of Finance and Economics, Dongbei University of Finance and Economics, Central University of Finance and Economics, City University of Hong Kong, Bilgi University Istanbul, University of Southern California, University of California Irvine, University of California San Diego, Federal Reserve Bank of Saint Louis, Korea Development Institute, Korea Institute of Finance, Bank of Korea, and U.S. Bureau of Labor Statistics.