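The third session of the inaugural Peking University Statistics Alumni Academic Forum will be held on the morning of August 13, 2022 (Beijing time), on the theme "Biostatistics and Big Biomedical Data". This session features talks by three alumni: Professor Hongzhe Li of the University of Pennsylvania, Professor Hongyu Zhao of Yale University, and Professor Haiyan Huang of the University of California, Berkeley. A roundtable discussion will follow. In line with epidemic prevention and control requirements, this session will again be held online via Zoom.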
Zoom meeting ID: 835 1872 6637 (passcode: 0813)
Hongzhe Li, Department of Biostatistics, University of Pennsylvania
Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair of Research Integration, Director of the Center of Statistics in Big Data and former Chair of the Graduate Program in Biostatistics at Penn. Dr. Li has been elected a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS) and a Fellow of AAAS. Dr. Li served on the Board of Scientific Counselors of the National Cancer Institute of NIH and regularly serves on various NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA. Dr. Li’s research focuses on developing statistical and computational methods for the analysis of large-scale genetic, genomic and metagenomic data, and on the theory of high-dimensional statistics. He has published papers in Science, Nature, Nature Genetics, Nature Methods, Nature Microbiology, Science Translational Medicine, Cell Host & Microbe, JASA, JRSS, Biometrika, Annals of Statistics and Annals of Applied Statistics, among others.
Title: Microbiome Data Science - Phylogenetic Tree, Bacterial Growth Rate and Biosynthetic Gene Cluster
The gut microbiome plays an important role in the maintenance of human health. High-throughput shotgun metagenomic sequencing of a large set of samples provides an important tool to interrogate the gut microbiome. Besides providing footprints of taxonomic community composition and genes, these data can be further explored to study bacterial growth rates and metabolic potential via the generation of small molecules and secondary metabolites. Everything from microbiome diagnosis to microbiome-based therapy will rely on vast amounts of data analysis. In this talk, I will present several computational and statistical methods for the analysis of data measured on a phylogenetic tree, methods for estimating bacterial growth rates for metagenome-assembled genomes (MAGs), and methods for predicting all biosynthetic gene clusters (BGCs) in bacterial genomes. The key statistical and computational tools used include Wasserstein distance estimation, optimal permutation recovery based on low-rank matrix projection, and an LSTM deep learning method to improve the prediction of BGCs. I will demonstrate the application of these methods using several ongoing microbiome studies of inflammatory bowel disease at the University of Pennsylvania.
Hongyu Zhao, Department of Biostatistics, Yale University
Hongyu Zhao received his BS in Probability and Statistics from Peking University in 1990 and his PhD in Statistics from UC Berkeley in 1995. He is currently the Ira V. Hiscock Professor of Biostatistics, Professor of Statistics and Data Science, and Professor of Genetics at Yale University. Hongyu’s research interests are the development and application of statistical and computational methods in molecular biology, genetics, drug development, and precision medicine. He has published extensively, with methodology papers in leading statistics, bioinformatics, computational biology, and genetics journals, and his collaborative work has appeared in leading scientific journals. Since joining Yale in 1996, Hongyu has trained over 100 doctoral and post-doctoral students. He was a Co-Editor of Statistics in Biosciences (2011-2017) and Co-Editor of the Journal of the American Statistical Association – Theory and Methods (2018-2020). Hongyu has received a number of honors, including the Mortimer Spiegelman Award for a top statistician in health statistics from the American Public Health Association, and the Pao-Lu Hsu Prize from the International Chinese Statistical Association. He is an elected Fellow of the Institute of Mathematical Statistics, the American Statistical Association, and the American Association for the Advancement of Science.
Title: Predicting Disease Risk from Genomics Data
Accurate disease risk prediction based on genetic and other factors can lead to more effective disease screening, prevention, and treatment strategies. Despite the identification of hundreds of thousands of disease-associated genetic variants for thousands of traits through genome-wide association studies in the past two decades, the performance of genetic risk prediction remains moderate or poor for most diseases. This is largely due to the challenges in both identifying all the functionally relevant variants and accurately estimating their effect sizes. Moreover, as most genetic studies have been conducted in individuals of European ancestry, it is even more challenging to develop accurate prediction models in other populations. Furthermore, many studies only provide summary statistics instead of individual-level genotype and phenotype data. In this presentation, we will discuss a number of statistical methods that have been developed to address these issues through jointly estimating effect sizes (both across genetic markers and across populations), modeling marker dependency, incorporating functional annotations, and leveraging genetic correlations among different diseases and populations. We will demonstrate the utility of these methods through their applications to a number of complex diseases and traits in large population cohorts, e.g., the UK Biobank. This is joint work with Geyu Zhou, Wei Jiang, Yixuan Ye, and others.
Haiyan Huang, Department of Statistics, UC Berkeley
Haiyan Huang received her BS in Mathematics from Peking University in 1997 and her PhD in Applied Mathematics from the University of Southern California in 2001. She was a postdoctoral researcher at Harvard University from 2001 to 2003. Currently, she is a Professor and the Chair of the Department of Statistics at UC Berkeley. She was the Director of the Center for Computational Biology at UC Berkeley from 2019 to 2022. As an applied statistician, her research is at the interface between statistics and data-rich scientific disciplines such as biology. Over the past few decades, rapidly evolving biological technologies have generated enormous amounts of high-dimensional, complex, noisy data, presenting increasingly pressing challenges to statistical and computational science. Her group has been devoted to addressing various modeling and analysis challenges arising from these data.
Title: Measuring Bivariate Dependence Using Count Statistics with Applications to Biomedical Data