Journal Webinar: A scalable bootstrap for massive data

RSS Webinar Series

Wednesday 26 April 2017, 05:00pm

Location Online

The paper 'A scalable bootstrap for massive data' (RSS Series B, Volume 76, Issue 4, 2014) will be presented by Michael I. Jordan, University of California, Berkeley.

Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his Master's in Mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.

Co-authors are: Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar

Chair: Richard Samworth, Cambridge University

Abstract: The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets—which are increasingly prevalent—the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number of subsampled data points), and they often require knowledge of the estimator's convergence rate, in contrast with the bootstrap. As an alternative, we introduce the ‘bag of little bootstraps’ (BLB), which is a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.
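For readers new to the procedure, the idea in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes the estimator is the sample mean and the quality measure is its standard error, and the function names and the default values of `gamma`, `num_subsets` and `num_resamples` are hypothetical choices made for this example. Each small subset is drawn without replacement (as in subsampling), then resampled up to the full size n by drawing multinomial counts over its points (as in the bootstrap), so the estimator only ever touches b distinct values.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, weights):
    """Estimator evaluated on a weighted sample (here: the mean)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def blb_standard_error(data, gamma=0.6, num_subsets=5, num_resamples=50, seed=0):
    """Sketch of a bag-of-little-bootstraps standard error for the mean.

    gamma, num_subsets and num_resamples are illustrative defaults,
    not values recommended by the paper.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # subset size, much smaller than n

    per_subset_errors = []
    for _ in range(num_subsets):
        # Subsampling step: b points drawn without replacement.
        subset = rng.sample(data, b)

        estimates = []
        for _ in range(num_resamples):
            # Bootstrap step: a size-n resample of the subset, stored
            # compactly as counts over its b distinct points.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(weighted_mean(subset, weights))

        # Quality assessment for this subset: spread of the estimates.
        per_subset_errors.append(statistics.stdev(estimates))

    # Combine the per-subset assessments by averaging.
    return statistics.fmean(per_subset_errors)


# Example: 2,000 standard normal draws, so the true standard error
# of the mean is about 1 / sqrt(2000), roughly 0.022.
example_rng = random.Random(1)
example_data = [example_rng.gauss(0.0, 1.0) for _ in range(2000)]
example_se = blb_standard_error(example_data)
print(f"BLB standard error estimate: {example_se:.4f}")
```

Because each subset is processed independently, the outer loop is the natural unit to hand to separate workers, which is the property the abstract refers to when it describes the BLB as well suited to parallel and distributed architectures.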

Journal club papers are carefully selected from recent issues of the Royal Statistical Society's journals by the editorial board for their importance, relevance and/or use of cutting-edge methodology. This paper was published in the Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 76, Issue 4 and is currently available online to subscribers of the journal. It will be made open access a few weeks before the webinar.

Download slides (PowerPoint)


NB: the webinar starts at 5pm (BST), 9am (PDT) on Wednesday 26 April.

Organiser Name Judith Shorten


Organising Group(s) Royal Statistical Society






Copyright 2019 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096
