    EMSW21 VIGRE Report

    Department of Statistics

    University of California, Berkeley

    I. How well has the integration of research and education been achieved at all levels?

    A. Integration of research at all levels

    In the past two years, we have successfully followed the Braid Model of vertical integration to create a healthy research environment for postdocs, graduate students, and undergraduates. The Braid Model was described at the VIGRE workshop in April 2003 as follows:

    This approach sees each level following a time-line, and finds points where the time-lines knit together. At one point a postdoc might be supervising an undergraduate research project; at another, postdocs, advanced graduate students, and faculty might be collaborating in running a topics seminar; at another, faculty might be giving a series of introductory talks for beginning graduate students; at another, advanced graduate students might be preparing beginning graduate students for qualifying exams. Full vertical integration of all four groups might never occur at the same moment, but over time the strands knit together producing a vertically integrated career path for each group.

    Research in the department takes different forms. Some faculty adopt the group-meeting model common in biology and engineering, while others keep the one-on-one supervising style. Whatever form an individual faculty member chooses, in addition to interactions among subsets of faculty, postdocs, graduate students, and undergraduates, collaborators from other disciplines are often involved, forming another line of vertical research integration beyond the statistics department. Moreover, VIGRE funding has made joint advising easier: because a VIGRE student has guaranteed support, faculty can spend their energy on research rather than on finding funding for a student, or declining to take a student because of limited grant money. In other words, the VIGRE grant has contributed greatly to the emergence of different research group styles, and it gives VIGRE students the intellectual flexibility to take on more exciting topics that would otherwise not be possible.

    A.1 Research Groups in Computational and Genomic Biology

    In recent years the department has attracted outstanding young faculty in computational and genomic biology (Huang, Nielsen, Song, Purdom) to join our senior faculty in this area (Speed and Bickel). All of these faculty are active members of the Designated Emphasis program in Computational and Genomic Biology. Each of them holds regular group meetings, some jointly with other faculty. Within each group, teams of postdocs, graduate students, and/or undergraduates work under faculty guidance to tackle a tremendous range of problems. As an example, we include a description of the Bickel-Huang group; more group descriptions can be found in Appendix G.

    Bickel & Huang Research Group

    Bickel and Huang's research interests lie in developing novel statistical approaches to problems in genomics. The group meets weekly, for two to three hours, to discuss results, appropriate methodology, and relevant papers. In addition, sub-group meetings are often held as individual projects require. Every group member leads or co-leads at least one project, with support from all other group members, under the general guidance of Peter Bickel and Haiyan Huang.

    This group currently has three postdocs: Ben Brown (protein-DNA binding model; mRNA-seq read mapping; software for genome structure correction (GSC) statistics), Qunhua Li (significance thresholds for high-throughput biological data using self-consistency between replicates), and Ci-Ren Jiang (identifying gene isoforms using mRNA-seq data and relating in vitro to in vivo assays; hierarchical multi-label classification for disease diagnosis); one undergraduate research assistant, Nathan Boley (protein-DNA binding model; mRNA-seq read mapping; software for genome structure correction (GSC) statistics); three PhD students: Jessica Li, Biostatistics (identifying gene isoforms using mRNA-seq data and relating in vitro to in vivo assays), Kyungpil Kim, Biostatistics (pathway extension and pathway gene identification), and Daisy Huang, Statistics (robust variance estimation for gene expression data); and two visiting students, Jing Zhou (transcriptome detection using tiling array data) and Qinghui Gao (database queries for functionally related genes).

    A.2 Machine Learning Groups

    Machine learning is at the frontier of statistical research. It has become a strong focus of the department, with research groups consisting of faculty in statistics alone as well as faculty jointly appointed with EECS (Bartlett, Bickel, Jordan, Wainwright, and Yu). These faculty all participate in the Designated Emphasis on Computation, Communication and Statistics. All of them hold regular group meetings, and they have also been co-advising students. Judging from the interests of prospective graduate students, the machine learning group is attracting some of the very best applicants to the department, and these students tend to accept our offer of admission after meeting faculty during their visits. To increase communication and bond the group together, a machine learning tea was established about a year ago: every Friday, machine learning students from EECS and statistics gather for tea, and one student presents research. We include a description of the Yu Group here as an example; Appendix G contains more descriptions of machine learning groups.

    Yu Research Group

    Bin Yu's group currently consists of one postdoc and seven students, three of whom are supervised jointly by Yu and other faculty. Two students graduated in 2009 (Rocha is now an assistant professor at Indiana University and Vu is an NSF postdoc at CMU), and former postdoc Ravikumar is now an assistant professor in CS at UT Austin. Yu meets with her postdocs and students in a 1.5-hour group meeting every week. Four to six students from outside the group also attend regularly, either to check out the group or simply to take part; for example, first-year student Miles Lopes, second-year students James Long and Sharmodeep Bhattacharyya, and fourth-year student Ying Xu attend regularly. The group meeting is used to learn about new topics, such as independent component analysis, with students taking turns giving presentations. The meeting time is also used to report recent research results to one another, to ask for feedback, and to practice conference talks and oral exams. Yu also meets with students individually, either weekly or biweekly. Some of the students have joint advisors, so regular meetings are arranged at which both advisors can be present. Students often work in teams with postdocs, and graduate students often work with undergraduates.

    Yu is also a PI on a CDI grant, which has its own regular group meeting of postdocs and graduate students from statistics and EECS; two EECS undergraduates worked on this project last year. Students in Yu's group are encouraged to attend at least one professional meeting per year, and 5 of the 7 students went to a meeting last year, several of them presenting their research. Half of the group has gone to China in the summers to work with Yu and her collaborators there. Six vertically integrated research teams have formed in the Yu Group in collaboration with other groups on campus; three of them are described below:

    Postdoc Pradeep Ravikumar and graduate student Garvesh Raskutti, under the joint supervision of Martin Wainwright and Bin Yu: high-dimensional sparse Gaussian graphical model estimation.

    Postdoc Jinzhu Jia, graduate student Luke Miratrix, and graduate student Brian Gawalt (EECS), under the joint supervision of Bin Yu and Laurent El Ghaoui (EECS): the StatNews Project, supported by an NSF-CDI grant, which aims to understand the "word image" conveyed by a corpus of news articles, such as those in the New York Times. This is an example of interdisciplinary vertical integration.

    VIGRE graduate student Harry Kim and undergraduate Ryan Garner, under the joint supervision of Cari Kaufman and Bin Yu: analysis of the global spread of avian influenza (H5N1).

    A.3 Other Research Groups

    Many of our faculty (Aldous, Brillinger, Cheng, Chatterjee, El Karoui, Evans, Kaufman, Mossel, Nolan, Pitman, Rice, Stark) meet with their students individually, but individual projects are often carried out by teams of postdocs and graduate students, or of graduate students and undergraduates, under faculty supervision. Appendix G contains descriptions of these faculty members' research.

    B. Integration of education at all levels

    In the research groups described above, learning happens at all levels through discussions, presentations, reading the literature, and interactions between faculty, postdocs, graduate students, and undergraduates. Integration of education also happens across vertical levels with collaborators' groups. Postdocs and graduate students are an important part of formal classroom education: they teach courses and work as teaching assistants in undergraduate courses, leading discussion sessions and grading projects. Courses taught by postdocs are assigned carefully to match the postdocs' research interests. VIGRE postdoc Ross is now teaching an M.A.-level introductory probability/statistics course and is mentored by experienced faculty member Prof. Ani Adhikari. Last year, VIGRE postdoc Emilia Huerta-Sanchez consulted experienced faculty member Prof. Purves when she had teaching questions. Based on feedback from Huerta-Sanchez and Ross, we plan to prepare an information sheet on the logistics of teaching for first-time instructors.

    Departmental seminars have been an important venue for faculty, postdocs, graduate students, and undergraduates to gather and exchange ideas with speakers and among themselves. At the beginning of every academic year, selected faculty give short talks on their research interests, aimed particularly at first- and second-year graduate students. In Fall 2008, the presentations were:

    David Aldous described some of his work on spatial networks.

    David Brillinger presented methodology and results from a study of the relationships between input and output nerve firings.

    Lisa Goldberg discussed the role of statistics in quantitative finance.

    Nick Jewell's talk introduced the students to a selection of recent applied problems in the Division of Biostatistics.

    Michael Jordan described some of his recent research on hierarchical nonparametric Bayesian methods.

    Rasmus Nielsen presented some of his work in statistical genetics.

    Ken Wachter showed how empirical biodemographic research has revealed common features in the age-specific mortality rates of diverse species, including humans, flies, and worms.

    Bin Yu gave an overview of three main research directions in her group: long-term interdisciplinary research on remote sensing (cloud detection and aerosol retrieval, with JPL); vision neuroscience (with the Gallant group on campus); and sparse modeling (theory and algorithms) for high-dimensional data.

    These "Faculty Research Lectures" are an effective way of introducing new students to the details of work being done in the department, in language they can understand. This helps them choose appropriate research areas and supervisors, and has been instrumental in students starting research very early, sometimes as early as their first year in the Ph.D. program. Based on feedback from past students, the lectures are being held slightly later this year, toward the end of the first semester rather than at the very beginning. This gives students a chance to settle into their work in the department and then focus calmly on research areas.

    The department currently has five weekly seminars covering different areas of active research: the Neyman seminar (Wed. 4-5, statistics), the probability seminar (Wed. 3-4), the genomics seminar (Thurs. 4-5), the student seminar (Fri. 3-4), and the VIGRE undergraduate seminar (Thurs. 11-12). Postdocs and graduate students are invited to dinner with the speakers, and the department covers the graduate students' expenses.

    The VIGRE undergraduate seminar is especially worth noting, since it is a direct consequence of the VIGRE funding and can be taken by undergraduates for credit. In this series, students are exposed to real-world applications of statistics and receive information on graduate schools and job-seeking skills. A team of postdocs and graduate students organizes the series every semester; this semester, postdoc Emilia Huerta-Sanchez is working with graduate student Megan Goldman to lead it. A list of talk titles is included in Section II of this report. The seminar has also been a valuable recruitment tool for attracting students into statistics graduate programs, because these programs are fed not only by statistics undergraduates but, in large numbers, by mathematics majors and now by EECS, economics, and other majors. Co-organization also happens among faculty. In Fall 2008, new faculty member Kaufman organized the Neyman seminar jointly with Bin Yu. The probability seminar this year is being organized by the team of Prof. Klass, VIGRE postdoc Ross, and VIGRE graduate student Chris Haulk. The organizing role gives junior faculty, postdocs, and students an opportunity to meet senior and junior researchers of their choice, to exchange ideas, and to form networks for possible collaborations and future positions.

    Many faculty participate in the VIGRE Undergraduate Research Apprenticeship program. During the academic years 2007-08 and 2008-09, about 20 undergraduates held VIGRE Research Assistantships. Detailed descriptions of how undergraduates are included in faculty research programs appear in Section II of this report. As one example of how this research program broadens the education of its undergraduate participants, VIGRE undergraduate Josh Levin joined Stark's Uncertainty Quantification group and worked on the design of inertial confinement fusion experiments last year, in collaboration with Dr. Stephen Libby of LLNL. Levin participated in the regular group meetings with graduate student Johann Gagnon-Bartsch, postdoc Janne Huttunen, Philip Stark, and Libby. He was particularly inspired by the project and had separate meetings with Libby to talk about careers in mathematical science and physics and about internship opportunities at the National Laboratories. Recent graduate VIGRE projects are listed in Section III. Below is a list of recent undergraduate VIGRE projects; see Appendix F for project descriptions.

    Getting Started with R. Devon Schurick, John Jimenez (Ani Adhikari, John Rice, and Phil Spector)

    Uncertainty Quantification with applications in Inertial Confinement Fusion. Josh Levin (Stark)

    StatNews Project. Paul Pearce, Hisham Zarka (Bin Yu and Laurent El Ghaoui)

    Statistical analysis on global spread of avian influenza (H5N1). Ryan Garner (Cari Kaufman, Bin Yu, and graduate student Harry Kim)

    Statistical modeling of DNA selection protocols. Olga Prilepova (Rasmus Nielsen)

    ENCODE and the Berkeley Drosophila Transcription Network Project. Devon Schurick (Peter Bickel and postdoc Ben Brown)

    Simulating Models of Speciation. Alec Kennedy, Diana Su (graduate student Peter Ralph)

    Counting Civilian Casualties in Times of War and Conflict. Rebecca Andreassen (Nicholas P. Jewell)

    A glimpse into the impact of the undergraduate VIGRE research experience is offered by the following comments from Hisham Zarka, now a business analyst at McKinsey:

    "I had the opportunity to work on the StatNews project in developing a system for the statistical analysis of news data which was funded by VIGRE. The project was my first experience on a research project, and the exposure that I obtained in using statistical techniques in a comprehensive and scalable manner was unparalleled. Despite the courses I had taken on the subjects of computer science and statistics, I felt that the courses did not expose me to the real nature of computational statistics as practiced in research and in industry. In particular, the programming and tools involved in developing and deploying techniques at scale, rather than in the artificial setting of the classroom, were quite different and understanding them offered a substantial number of unique learning opportunities. I furthermore found talking about concrete and practical experiences from the VIGRE project to be much more compelling during interviews for jobs than discussions about coursework and class projects of more limited scope."

    A new undergraduate topics course series, Stat 157, has been offered regularly; recent topics include Probability in the Real World (Aldous), Statistics and Finance (El Karoui), Statistical Bioinformatics (Huang), and Bayesian Inference (Kaufman). See Section II for details.

    II. How is your EMSW21 program broadening education at all levels?

    Over the duration of the EMSW21 program, the department has continued to broaden education at all levels. At the undergraduate level, the department regularly offers its advanced undergraduate topic seminars (Stat 157), which introduce students to modern research and real-world problems, and many of our advanced undergraduate courses have been revised to include more experience with real data problems and modern methodology. The VIGRE undergraduate seminar series, which introduces students to the wide array of career possibilities for statisticians, has also become a stable part of our course offerings and attracts students from many disciplines, as well as students who have yet to declare a major.

    At the graduate level, we have broadened education through the active participation of faculty in three Designated Emphasis (DE) programs (see Appendix D for descriptions of all these DEs) and through a wide array of graduate topics courses (Stat 260). We have also broadened the education of our graduate students through projects in the summer between the first and second years of the program and through self-organized reading groups.

    The VIGRE undergraduate and graduate students and postdocs all participate in a variety of outreach activities, including the university's annual open house, field placements in local elementary, middle, and high school mathematics classrooms, and summer research programs for undergraduates.

    All of these VIGRE activities are described in more detail below.

    VIGRE Undergraduate Seminar Series: Undergraduate students are exposed to the world of mathematics and statistics outside traditional academia through the VIGRE Undergraduate Seminar. The purpose of this seminar series is to give students a view of career possibilities in statistics, including areas of application in academia, government, and the private sector. Speakers include faculty, postdoctoral fellows, and statisticians working in government and the private sector. In Fall 2007 the seminar was organized by VIGRE postdoc Nicholas Crawford and graduate student David Purdy. In Fall 2008 it was organized by VIGRE postdocs Nicholas Crawford and Emilia Huerta-Sanchez. In Fall 2009 the seminar is being organized by postdoc Emilia Huerta-Sanchez and graduate student Megan Goldman. To illustrate the content, some of the topics in Fall 2007, 2008, and 2009 were:

    Considering graduate school. Colette Patt, Director, Science Student Diversity Programs, UC Berkeley

    Data, Data, Everywhere: Learning from the Web to Improve the Web Search. Carrie Grimes, Google

    A Day in the Life of a Consulting Actuary. Mike Tessler, Health & Welfare Consultant, and Vickie Sun, Retirement Associate, Towers Perrin

    The Power of Shame and the Rationality of Trust. Steven Tadelis, Haas School of Business, UC Berkeley

    What do we know about the average length of human life throughout the world, and how do we know it? John Wilmoth, Associate Professor, Department of Demography, UC Berkeley

    Graduate Student Panel. Graduate students in Statistics and Biostatistics will share their insights and perspectives

    A Mathematician's Life at RAND. Lauren Caston, RAND Corporation

    We Measure America: the Census Bureau in the 21st Century. Linda Clark, Information Services Specialist, US Census Bureau

    The Role of Statisticians in Quantitative Finance Before and After the Crash. Lisa Goldberg, Department of Statistics and MSCI Barra

    Phylogenetics: Using mathematics and Statistics to deduce evolutionary history. Erick Matsen, Postdoc, Department of Statistics, UC Berkeley

    Data Analysis at Facebook. Alex Smith, Facebook

    Network models and optimal experimental design. Fergal Casey, University College Dublin

    Opportunities in Financial Engineering. Linda Kreitzman, Executive Director, Haas School of Business, UC Berkeley

    What is the difference between bioinformatics, computational biology and mathematical biology? Lior Pachter, Associate Professor, Math and CS Departments, UC Berkeley

    Influenza Vaccine: Does it Work? Art Reingold, UCB Epidemiology

    Undergraduate Special Topics in Statistics: The undergraduate special topics seminar, Stat 157 (introduced in the curriculum review under the first VIGRE award), offers another avenue for undergraduate students to be exposed to non-traditional statistics. This seminar-style course covers contemporary topics, chosen by the instructor, that are not part of a traditional undergraduate curriculum. It features topics of current research interest and more student research and presentation than is typical for an upper-division class, and thus has some REU-like character. We have developed a tradition of having junior faculty and postdocs lead these courses. In this way, the seminar offers these new faculty and postdocs an opportunity to improve their teaching skills by bringing their research into a non-traditional undergraduate classroom. Brief descriptions of several Stat 157 classes follow:

    Computational Biology and Statistics (Huang): This course provides an introduction to statistical and computational methods for the analysis of biological data. Statistical topics, introduced in a biological context, include sequencing theories, multiple hypothesis testing, clustering analysis, hidden Markov models, matrix decomposition, etc. The course also introduces computational techniques and involves critical reading of articles on analyses in the biological and medical sciences. The class is designed to promote group learning and requires active participation in class discussions, as well as requiring teams of students to present homework solutions and projects. Besides gaining a good grasp of the course material, I hope the course also helps improve the students' teamwork spirit, as well as their independent analytical, reasoning, and communication skills.

    Probability in the Real World (Aldous): This course features student research and presentation. Some illustrative topics are: Waiting in long lines and editing long documents; Prediction markets; Mathematics of card shuffling; Anchoring, probability matching, conservatism and base rate discounting, conjunction fallacy; The long term and the Kelly criterion; one safe and one risky asset; Coincidences in Wikipedia; the Netflix Prize; Coding as compression or encryption; asymptotic equipartition property; Social Networks and the Diffusion of Economic Behavior; Epidemic and product adoption models over social networks; A comparative analysis of influenza vaccination programs; Global Catastrophic Risks; Streaky hitting in baseball.

    Statistics and Finance (El Karoui): The aims were (a) to introduce students to various useful statistical tools through financial data and motivation; (b) to complete their toolbox and give them pointers before they go out and work; and (c) to give them some experience with real data. The course tried to integrate data and models/theory and to lead students to think critically about statistical methods and models in light of data, with an emphasis on the interaction between data and models. Topics included: ARMA; GARCH; the Black-Scholes “world” (short intro to stochastic calculus, binomial trees, notion of implied volatility); Markowitz theory; CAPM and multifactor pricing models; PCA and factor analysis; a brief intro to risk management (VaR, expected shortfall, pitfalls of heavy tails); copulas; and nonparametric function estimation methods (high-level intro and applications to nonparametric option pricing à la Lo et al.). Student research topics included: Pairs trading; Multivariate time series and dimension reduction; The Black-Litterman model; Option pricing with GARCH models; Markowitz and beyond; Algorithmic trading strategies; American options; Smoothing in time series; Finite difference methods; Behavioral finance and the S&P 500; Weather derivatives; Interest rate questions; Monte Carlo methods in finance.

    Bayesian Inference (Kaufman): This is a seminar course on statistical inference from a Bayesian viewpoint, with an emphasis on computation. The Bayesian approach to statistics historically predates the "classical" or frequentist statistical methods you may have seen in other classes, but it did not gain widespread popularity until the introduction of new algorithms for sampling-based numerical integration, which have made it possible to fit more complicated Bayesian models. My main goals for this course are to help you gain a solid understanding of the basic Bayesian approach to inference, based on expressing all uncertainty in terms of conditional probability distributions; to introduce you to common practice in fitting and interpreting Bayesian models in applied problems, including hierarchical model specification and computational techniques; and to give you practice in reading statistics articles from the literature and presenting statistical ideas to others. For the final project, the class will divide into teams and work on a randomized response survey.

    THE BOOK (Evans): This will be a seminar-style course in which we will work through the book Proofs from THE BOOK by Aigner and Ziegler. I can give no better description of this wonderful book than the review by E.J. Barbeau from Mathematical Reviews:

    “Paul Erdös maintained that God kept a Book with only the most elegant mathematical arguments. This volume, conceived in consultation with Erdös and published in his memory, suggests some of the Book's contents. Thirty sections treat results drawn from number theory, geometry (mainly combinatorial), analysis, combinatorics and graph theory; these can be followed by one versed in undergraduate mathematics including discrete topics. The proofs date mainly from the entire span of the twentieth century; many are due to Erdös himself. The authors have done a fine job of arranging diverse material into a thematic progression. Many readers will find unfamiliar results along with some old favorites: a decade-old proof of Fermat's two-square theorem, Hilbert's equidecomposable polyhedra problem, Sylvester's problem on lines determined by pairs of points, applications of Euler's formula V - E + F = 2, maximizing the number of touching pairs of d-simplices, Hall's marriage theorem, a Dinitz coloring problem for an n x n chessboard, the art gallery theorem, probabilistic combinatorics, and much more.”

    I will then divide the class into groups and assign them readings to present on specified dates. Be prepared to put in a lot of time understanding for yourself what is in the chapters you are assigned, possibly bolstering your mathematical background by doing some quite extensive background reading, and putting together a high-quality presentation for class.

    Revisions of Upper Division Courses: Under our first VIGRE award, the Department introduced a new core course to the undergraduate major program, Stat 133, Concepts in Computing with Data. This course has grown in popularity to the point where it is now offered every semester, with enrollments of about 75 students. In addition, students who have taken Stat 133 now have the computational skills that enable them to participate in a broad range of research projects, and the success of Stat 133 has enabled the department to revamp some of its traditional courses, such as Stat 151B, Linear Modeling: Theory and Applications, to include more modern, real-life statistical methodology and applications. Below are two recent course descriptions for Stat 151B.

    Modern Applied Statistics and Machine Learning (McAuliffe, Spring 2010). This upper-division course will be offered in Spring 2010 by our new adjunct professor Jon McAuliffe. The course will cover contemporary methods of statistical prediction as extensions of classical methods, with an emphasis on computing with data. In particular, it provides an overview of modern applied statistical methods, including supervised learning, linear regression and classification, splines and basis expansions, the bootstrap, model selection, classification and regression trees, boosting and stage-wise additive models, and support vector machines. The course is intended to be a good stepping stone to a job in statistics.

    Linear and Generalized Linear Models with Data (Yu, Fall 2007). This course is about finding relationships between variables, continuous and discrete, for the sake of prediction and possibly interpretation. There are many examples where these relationships are sought, for example: How are stock prices and other economic variables related to the stock price tomorrow? How do we tell white clouds apart from snow/ice in the polar regions (classification of pixels into cloudy or not)? How does the song a young songbird listens to relate to the neuron firing spikes in its brain? Answers to these questions will be explored through data analysis projects. In general, students will learn how to look at data, how to form a model, how to carry out inferences under a model, and how to check model validity, among many other things. The topics covered include: prediction error, uncertainty measures, the bootstrap, cross-validation, exponential distributions, generalized linear models, link functions, normal linear regression, logistic regression (for classification), Poisson regression, log-linear models, and many more.

    Students who took the class later reported that they used their data projects from this class in their job interviews and that this data experience was crucial for getting the job. One student in this class, Dan Nguyen, is now a PhD student in Quantitative Marketing at the University of Chicago, after working for some time. When asked about the course, he wrote:

    "In my first job, I built statistical models that estimated demand in various consumer markets. Because of the modeling experience that I had gained from doing the projects in 151b, I was able to give opinions and participate in discussions about model selection and data issues with higher management that others in my cohort could not. Now, I am currently a first-year PhD in Quantitative Marketing, and I am still benefiting from the class because it has given me a solid foundation for evaluating and conducting empirical research. After graduating from Berkeley, I have probably used the material from 151b more than from any other class that I had taken as an undergraduate."

    The First Summer for Graduate Students: The summer between the first and second years in the graduate program has greatly broadened the training of our graduate students. Before VIGRE, the students spent the first summer preparing for exams that were based on first-year coursework and were held in August. Now, they are expected to participate in research, enroll in a reading course or summer course to help prepare them for research, or otherwise engage in a statistics or probability endeavor. Below are examples of what our current second-year students did last summer:

    Yuval Benjamini worked with Bin Yu and the Gallant Neuroscience Lab on regularized prediction of fMRI responses from natural images. He also worked with Terry Speed on copy number regularization in next-generation sequencing, and participated in a reading group on random matrix theory.

    Sharmodeep Bhattacharyya attended the Machine Learning Summer School at the University of Chicago. In addition, he participated in a weekly reading group on random matrix theory, and started research on theoretical foundations of clustering under Peter Bickel. Sharmodeep was also a full-time teaching assistant for a summer course, STAT 21.

    Tessa Childers-Day took a 3-unit reading course with Lisa Goldberg in financial statistics. In this course, Tessa familiarized herself with the terminology of the field, learned about the theory behind asset management (specifically, mean-variance optimization of portfolios), and put these theoretical concepts into practice by selecting a small group of stocks, gathering real-world data about them, and attempting to optimize the return of the portfolio. Tessa also took an informal reading course with Jim Pitman on measure theory to prepare for STAT 205A this semester.

    Brianna Hirst spent the summer in an informal reading course with Nick Jewell with the goal of finding a research topic. She read and discussed papers and books from a wide variety of topics in biostatistics. Additionally, Brianna spent a week working with Deb Nolan as a teaching assistant for the undergraduate Explorations in Statistics Research workshop.

    Winston Lin studied measure theory and real analysis with Jim Pitman to prepare for Stat 205A, and worked with Jas Sekhon on propensity-score weighted regression estimates of treatment effects. Most of the work with Sekhon was reading papers, but he also replicated and extended some Monte Carlo simulations from a paper by David Freedman and Richard Berk.

    Luke Miratrix was an intern with Genentech, a biotechnology company located in South San Francisco. Miratrix's project had to do with personalized medicine, a branch of drug research that attempts to identify subpopulations that would benefit, or benefit most, from a drug. The project was to design statistical tests that could simultaneously assess whether a given drug was effective and identify a cut-off point for the expression of a particular protein in tumor biopsies that would determine a subpopulation with at least a pre-specified level of benefit from the drug.

    Reading Groups: In our efforts to broaden education, faculty, postdocs, and graduate students have started to regularly organize reading groups. These reading groups are a viable means to expand and supplement our regular course offerings. As described above, reading groups are part of the group meetings for many research groups (see, for example, the descriptions of the Bartlett, Jordan, Wainwright, and Yu groups in Appendix G).

    There are also other reading groups that are coordinated among multiple faculty advisors and/or bring together students from different research groups. For example, this semester Evans is running a reading group on mathematical finance. It is attended by Joshua Abramson (Statistics), Chris Haulk (Statistics), Alexandru Hening (Mathematics), Douglas Rizzolo (Mathematics), and Eric Wayman (Mathematics). Hening and Wayman are Evans' PhD students, Abramson is a first-year student for whom Evans is the assigned first-year mentor, Haulk is Pitman's PhD student, and Rizzolo is also working with Pitman. This reading group is working through a large volume of collected articles, "The Mathematics of Arbitrage", by Freddy Delbaen and Walter Schachermayer, as well as covering some of the background stochastic analysis for discontinuous processes from sources such as "Stochastic Integration and Differential Equations" by Philip Protter.

    In Spring 2009, Pitman held a reading group with 6 students, in which they went through the later chapters of Kallenberg's Foundations of Modern Probability as an introduction to stochastic calculus. It was very successful, as is evident from the students' comments:

    "Clear exposition style, great intuition and very friendly! One of the best instructors I've ever had"

    "Prof. Pitman was very good at motivating the material and explaining how to think about solutions"

    "Very knowledgeable, and excellent at explaining the intuition of the subject and making the more difficult concepts of the textbook intelligible."

    In a previous semester, Pitman organized a similarly sized group, including Chris Haulk, Tanya Gordeeva, and Aaron Chen, which worked through much of Karatzas and Shreve's Brownian Motion and Stochastic Calculus.

    In the past summer, postdoc Jinzhu Jia, with help from Yu in choosing reading materials, organized a reading group of 12 people on concentration inequalities. The group began with Boucheron, Lugosi, and Bousquet’s paper “Concentration Inequalities” and Ledoux’s lecture notes, which begin with Gaussian random variables and give a very useful result for Lipschitz functions. The group then moved on to Chatterjee's papers on Stein's method and Vershynin’s lecture notes “Non-Asymptotic Theory of Random Matrices”. Six of Yu's students attended the reading group regularly, along with 5 other students (Bean, Bhattacharyya, Lee, Long, and Xu). When asked to comment on the experience, two participants wrote:

    "Because we shared the responsibilities of presenting, everyone stayed up to speed on the readings. This produced enlightening discussions. Summer is a difficult time to organize a reading group because people are often away from campus for extended periods. However, I think this reading group overcame that obstacle. It was a good experience that I think everyone involved has benefited from."

    "I like this reading group very much. From the reading group, I learned a lot about how to bound a random variable, especially about how to bound the eigenvalues of a random matrix. I used some of the results to bound the eigenvalues of a random matrix in a recent paper."

    Designated Emphases (DEs): The Designated Emphasis (DE) programs act as "graduate minors" for PhD students, broadening their education from a single discipline to multiple disciplines within a structure carefully designed by the participating faculty. Multi- or interdisciplinary training is necessary for work at the current frontiers of science. By training our students and postdocs through the DEs, we better prepare them for future postdoctoral, faculty, research, and other positions.

    The new Designated Emphasis in Computational Science and Engineering (DE-CSE) was initiated two years ago, and eleven Statistics faculty are among the 108 faculty from 20 departments who participate in this exciting program. A great many fields of science, engineering, finance, and social science are embracing modeling, simulation, and data analysis as necessary tools to advance their fields. Sometimes this is driven by Moore’s Law, which provides the computational power to run simulations that were not possible before; it is also driven by the availability of large data sets. The DE-CSE gives PhD students enrolled in these 20 programs a much-needed opportunity to become fluent in modeling, simulation, and data analysis tools. The participating departments are extraordinarily diverse and include Computer Science, Mathematics, Statistics, Chemistry, Mechanical Engineering, Astronomy, Neuroscience, and Political Science, among many others. Among other things, through the DE-CSE our participating faculty and their students have access to the UCSD Triton Resource, a new high-performance, data-centric computing resource housed at the San Diego Supercomputer Center.

    Special Topics in Statistics: Broadening at the graduate level has resulted in our routinely offering new special topics courses. We include descriptions of four recent offerings as examples of the variety of topics.

    Bayesian Modeling and Inference (Jordan): This course fills a gap in our graduate curriculum and was welcomed by graduate students from Statistics, EECS, and other departments. The course covered priors (conjugate, noninformative, reference), hierarchical models, spatial models, longitudinal models, dynamic models, survival models, testing, model choice, inference (importance sampling, MCMC, sequential Monte Carlo), nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random processes), decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes), and experimental design.

    Algebraic Statistics (Sturmfels): This course served as an invitation to algebraic statistics. The list of topics included: independence and hypothesis testing, the many bases of a lattice, hierarchical models, likelihood inference for discrete models, likelihood inference for Gaussian models, MLE for implicit models, discrete CI models, Gaussian CI models, the intersection axiom, graphical models, the Hammersley-Clifford theorem, likelihood ratio tests, mixture models, secant varieties, factor analysis, information criteria, marginal likelihood, phylogenetics (the general Markov model), generalized principal component analysis, and phylogenetics (group-based models).

    Computational and Mathematical Population Genetics (Song): The course is divided into two parts: coalescent theory and DNA forensic analysis. Topics included recombination, selection, demography, importance sampling, sampling formulas, DNA match probability, relatedness, and weight of evidence.

    Applications of Group Representation Theory to Probability and Statistics (Evans): The topics included random matrix theory, random walks on groups, diffusions on Lie groups, and spectral representations of stationary random fields on groups and homogeneous spaces.

    Public Outreach: Cal Day is an annual university open house, where prospective students, alumni, and community members visit the campus to meet the faculty and students and find out about the research and other activities going on. Our undergraduate researchers present their results in a poster session on Cal Day. Undergraduates are also recruited to assist in running booths that graduate students and postdocs have set up. These booths present games and research topics of general interest; the spring 2009 topics included the following.

    Statistical Methods in Biology; Emilia Huerta-Sanchez (Postdoc)
    Learn about using statistics to find a close correspondence between genetic and geographic distances among Europeans.

    Randomness; Chris Haulk (Graduate student)
    What is randomness? Can you create it? And can you recognize it when you see it? This poster will attempt to provoke thought on these questions.

    The Bell Curve; Karl Rohe (Graduate student)
    Measure the length of your pinky, toss a bean bag at a target, and discover what these activities have in common.

    Brain Teasers and Probability; Nate Coehlo (Graduate student)
    Monty Hall, the Secretary Problem, the Loaded Revolver: find out how probability can help answer some perplexing problems.

    Random Walks Through the Integers; Moorea Brega, Daisy Huang (Graduate students)
    Choose your favorite number, and it will show up within a limited number of steps on a random list.

    Prediction; David Purdy (Graduate student)
    Predictive models are everywhere, from gambling to medicine to marketing. Learn more about how you're already using them.

    Statistics Undergraduate Research Poster Session; Rebecca Andreassen, Ryan Garner, Alec Kennedy, Josh Levin, Olga Prilepova, Devon Shurick, and Diana Su (Undergraduate students)
    Undergraduates discuss their experiences and in-progress research through the Vertical Integration of Research and Education (VIGRE) program.

    Photos from Cal Day:

    In addition to Cal Day, Nolan and graduate students performed a coin-flipping activity on RadioLab for a story on stochasticity: "This hour, RadioLab examines Stochasticity, which is just a wonderfully slippery and smarty-pants word for randomness. How big a role does randomness play in our lives? Do we live in a world of magic and meaning or … is it all just chance and happenstance? To tackle this question, we look at the role chance and randomness play in sports, lottery tickets, and even the cells in our own body." (http://www.wnyc.org/shows/radiolab/episodes/2009/09/11)

    Also in the realm of public outreach, Stark worked with PhD students Mike Higgins, Luke Miratrix, and Sean Ruddy to develop methods for auditing elections that assess the evidence that the reported outcome is correct. As part of this work, the graduate students collaborated with elections officials to audit four elections in 2008 and will be auditing three more this November. They are also working with elections officials and legislators in several states, and with election integrity advocates throughout the country, on legislation and regulations to improve election auditing. See “Checking It Twice,” Julie J. Rehmeyer, Science News, 19 January 2008 (http://www.sciencenews.org/articles/20080119/mathtrek.asp), for an example of the sorts of news coverage Stark and his students receive.

    K-12 activities: Each fall semester, Nolan runs a seminar for undergraduates who are interested in becoming math teachers or who want to serve the local community by volunteering in local public schools. In the seminar, Nolan and VIGRE graduate students introduce undergraduates to the theory and practice necessary to design and deliver high-quality inquiry-based lessons. Freshmen and sophomores are placed in elementary school classrooms, and more advanced students are placed in middle and high school classrooms. Students spend one hour a week assisting in a local school, where they design and carry out two math activities in addition to helping out in the classroom. The lessons for elementary classrooms are chosen from nationally acclaimed modules in Great Explorations in Math and Science (GEMS). The lessons developed for high school statistics classes are made available on the VIGRE Web site. These activities are carefully designed: the student writes a 5E lesson plan for the activity that includes detailed instructions on how to carry out the activity, and the plan is pilot tested in the seminar, where the student receives feedback from the instructor, peers, and graduate student assistants. Once the student revises the lesson plan based on the feedback, he/she carries out the activities in the classrooms and receives feedback from the instructor and an external observer. Finally, the students write a summary that reflects on the classroom activity, including what worked well, what didn't work, and how they would modify the lesson in the future. In addition, through the field placements, the undergraduates serve as college-going role models in their classrooms, gain experience in communicating mathematics to a broad audience, and are afforded the opportunity to explore teaching in mathematics as a career.

    Summer Programs: Deborah Nolan organized (with Mark Hansen, UCLA, and Duncan Temple Lang, UC Davis) a one-week NSF-sponsored undergraduate workshop, Explorations in Statistics Research, June 13-20, 2009. This seven-day program brought 24 undergraduate students from around the country to Berkeley to expose them to modern statistics research, with the goal of encouraging them to attend graduate school in statistics. Plenary speakers included Jasjeet Sekhon, UC Berkeley, who had students examine the 2000 presidential election results from Florida; Claudia Tebaldi, Climate Central, who brought data from 21 climate models for the students to compare predictions of global climate change; and Chris Volinsky, AT&T Labs, who brought data on television viewing behavior for students to explore how to design a recommender system. The workshop was supported in part with EMSW21 funds. Two VIGRE graduate students, Megan Goldman and Brianna Hirst, assisted for the week as part of their VIGRE outreach activities. In addition, the data, code, and background materials for these topics are being prepared for broader dissemination.

    Career Resources: VIGRE-supported graduate students constructed a suite of Web pages that include a variety of career resources, job postings, and other useful information. Included here is a screenshot of one page on the site.

    III. How has your EMSW21 program improved the instruction skills and communication skills of students and post-docs?

    Many of the activities described in Sections I and II include components aimed at helping students and postdocs improve their instruction and communication skills. We briefly describe these opportunities, starting with the undergraduates, then the graduate students, and concluding with the postdocs.

    Cal Day Posters: Our undergraduate researchers gain practice in communication by presenting their results in a poster session on Cal Day, an annual university open house that attracts about 35,000 visitors. We have found this venue to be quite successful in giving the undergraduates experience in presenting their results to the general public, and in eliciting public interest in statistics. Photographs of students with their posters appear in Section II of this report.

    Research Reports: In addition to preparing posters, undergraduate students are also given the opportunity to practice their written communication skills by preparing reports on their research projects. Recent undergraduate research reports are posted on the VIGRE Web site at http://www.stat.berkeley.edu/~vigre/, and sample reports will be available at the site visit. During the academic years 2007-08 and 2008-09, about 20 undergraduates held VIGRE Research Assistantships. (See Appendix F for descriptions of the projects in which students participated in the past two years.)

    Undergraduate Course Presentations & Reports: The undergraduate special topics seminar Stat 157 (see Section II of this report for course descriptions) typically includes a project component in which students work individually or in groups on a project that is presented orally in class, written up as a report, or both. For example, this semester Huang expects her students to work in groups on homework and projects and to present their findings to the class. As another example, Kaufman's seminar gives her students practice in reading statistics articles from the literature and presenting statistical ideas to others, and the students work on an all-class final project. Sample student work will be made available at the site visit. Furthermore, many of our advanced statistics courses have been modified to include similar project work; see, for example, the course descriptions for Stat 151B in Section II. As another example, the survey sampling class routinely includes a whole-class project in which students design, carry out, and analyze the results of a survey.

    Graduate Student Seminar: VIGRE graduate students organized the purchase and use of video equipment to help them strengthen their communication skills. This equipment is used in the Graduate Student Seminar, which is organized by the Statistics Graduate Students Association.

    Graduate Reading/Research Groups: Many of the research groups' meetings include graduate student presentations of reading materials and problems. For example, Jordan organizes a regular reading group with the goal of strengthening students' backgrounds and helping them define possible areas for research. Each week, reading materials are distributed before the group meeting, all participants are asked to read the material in advance, and a student or postdoc presents the material and leads the discussion. The basic idea is to provide an atmosphere in which attendees can teach each other, and in which ideas that might be difficult to understand in self-study become easier to understand in the context of a group discussion. One of the effects of broadening is that several regular reading groups have formed in the department as offshoots of research groups, and most first-year students engage in reading groups in the summer before starting the second year of the program. These research groups are described in more detail in Appendix G, and the reading groups are described in Section II.

    Cal Day: The graduate students organize and run the department's Cal Day activities, building and staffing booths that feature many fun statistics and probability puzzles. The graduate students stand ready to explain the puzzles in layman's terms and to answer other questions from the public. See Section II for a list of the activities from the 2009 Cal Day and photos of graduate students engaging the public with their activities.

    Presentation of research: In addition to the opportunities in the graduate student seminar and reading groups, VIGRE graduate students are also given the opportunity to present their research at conferences; the VIGRE grant enables graduate students to attend conferences and present their results. Below are accounts from graduate students about the conferences they attended and the benefits they received from VIGRE travel support.

    Nate Coehlo, August 2007
    This summer I went to the Harvard-Smithsonian Center for Astrophysics for the annual meeting of the Taiwan-America Occultation Survey (TAOS). The meeting was attended by people from several countries, involved with all aspects of the project. In the talks I learned about the instrumentation that collects the data, the photometry that turns the telescope images into the individual light curves, the database structure for the light curves, and the many statistical issues that arise when addressing the scientific questions of the project. I also set up accounts so I can access the data from Berkeley. The meeting gave great context for the data, and a clear picture of where a Statistician can contribute. I thank VIGRE for helping fund this trip.

    Chris Haulk, July 2007
    During the summer I attended the Park City Mathematics Institute Summer School in Probability and Statistical Mechanics. Over the course of three weeks there were six mini-courses on determinantal processes, renormalization group techniques, random matrices, 2D percolation, stochastic Loewner evolutions, and random tilings. Lectures were delivered by famous scholars from around the globe, including S. Varadhan and Fields medalists Andrei Okounkov and Wendelin Werner. Problem-solving sessions were led by the graduate students and colleagues of these researchers. Guest lecturers spoke on diffusion-limited aggregation, novel models of percolation, spin glasses, and a host of other topics at the interface of probability and statistical mechanics. I learned a good deal about some fascinating topics of current interest and met many aspiring probabilists from around the globe. In addition, I enjoyed a few hikes in the magnificent wilderness around Park City, Utah. This trip would not have been possible without VIGRE support.

    Peter Ralph, January 2009
    In January I traveled to Philadelphia, PA and Ithaca, NY to meet and collaborate with people there and to give two talks. The primary reason was to collaborate with Todd Parsons and his colleagues in Josh Plotkin's lab at the University of Pennsylvania. Todd is an expert on deriving stochastic processes as scaling limits of discrete biological systems. I learned a lot about choosing different scaling limits to obtain different limiting processes, and got to participate in the ongoing analysis of a particular problem. It was also stimulating to converse with the different members of the Plotkin lab, who are working on problems ranging from population genetics to linguistics. While there, I gave the Applied Math and Comp Sci Colloquium, as well as giving two smaller expository talks to the Plotkin research group. In Ithaca, I gave the Probability Seminar, and got several useful suggestions from people there on some current projects. The trip was a very positive experience, both in terms of connecting with others working on similar problems, as well as being exposed to new ideas.

    Vincent Vu, April 2009
    I attended the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), held in Taipei, Taiwan. This year there was a special session on signal processing for neural spike trains, one of the application areas of my research. At the invitation of the organizers, my co-authors, Prof. Bin Yu and Prof. Rob Kass (CMU), and I contributed a paper to the special session. The NSF VIGRE travel support allowed me to travel to Taipei to present our paper, "Some Statistical Issues in Estimating Information in Neural Spike Trains." This was an excellent opportunity because I was able to share my research and meet other researchers. One interaction with a scientist from the RIKEN Brain Science Institute may lead to a future collaboration.

    Postdoctoral Teaching Experience: Postdocs receive ample opportunities to improve their instructional skills by having primary responsibility for one course each semester. We take great care in designing a teaching plan with our postdocs that enables them to teach both courses aligned with their research interests and courses that give them broad classroom experience. For example, Nick Crawford, a probabilist, has taught the upper division undergraduate course Stat 134 Concepts in Probability and an advanced graduate topics course in his field of research, Stat 206B. To expand his teaching repertoire, he also taught a basic lower-division introduction to statistics, Stat 20. His other teaching-related responsibilities included co-organizing the graduate probability seminar with David Aldous and the undergraduate VIGRE seminar with a graduate student.

    Seminars: Another opportunity for the postdocs to improve their communication skills is giving seminars in the department and at conferences. The postdocs have a small travel budget that enables them to attend conferences to present their research. In addition, we expect them to give seminars in an appropriate departmental seminar series.

    Nick Crawford gave two seminars in the department: Oct 2007, Thermodynamics for mean field quantum spin glasses; and Oct 2008, Random Currents and the Transverse Field Ising Model.

    Emilia Huerta-Sanchez gave a talk in Oct 2009, An excess of rare deleterious mutationsrevealed by deep resequencing of 200 exomes.

    IV. What has been the effect of the mentoring programs that have been developed?

    Mentoring of undergraduates by faculty as well as by graduate students takes place mainly through participation in activities like the VIGRE Undergraduate Research Apprenticeship. Through regular meetings with the groups conducting research, undergraduates come to understand the steps involved in the research process and form personal connections with faculty and graduate students. Collaboration with faculty, and the friendly and collegial relationships that develop, show the faculty in a different light than in classes. The project-based work done in courses like our Computing with Data class and our Special Topics class also involves regular one-on-one contact between undergraduates and faculty or graduate students. The effect of these interactions is to greatly increase the self-confidence of the undergraduates and to demonstrate to them that they have a place in the world of statistical research. Almost invariably, the faculty research supervisors or project supervisors become advisors about graduate programs and careers. They are able to recommend the undergraduates for jobs and for graduate school, and their recommendations are much more informative than the typical letters written by instructors in classes. When the undergraduates move to the next stage, whether in the workforce or in graduate school, they are prepared to collaborate with senior scientists in a way that would not have been possible through coursework alone.

    The VIGRE undergraduate seminar series offers a unique opportunity for our undergraduates to form connections with professionals in the field. These connections have led to job offers for some students. Most importantly, students form a direct connection with Colette Patt of the Physical Sciences Dean's Office. She is responsible for increasing enrollments in graduate programs in the physical sciences, and is a regular participant in the VIGRE seminar series. After her seminar presentation, she schedules individual meetings with students who are interested in going on to graduate school, and makes herself available to all the others to answer their questions about graduate programs and to help reduce their doubts and uncertainties.

    As part of its VIGRE activities, the department has developed two mentoring programs for incoming graduate students. Mentorship is provided both by senior graduate students and by faculty.

    The first part of the mentoring program consists of student mentoring via the Summer Bootstrap Camp. The camp takes place over four days and gives new students a thorough introduction to the department (facilities as well as research and coursework) and to living in Berkeley.

    VIGRE graduate students Derek Bean, Yuval Benjamini, Michael Higgins, and Wayne Lee ran the 2009 Bootcamp for entering graduate students, under the supervision of Ani Adhikari. The camp took place August 22-25. Based on evaluations of Bootcamp 2008, it was decided to prepare incoming students for the academic activities of the camp by sending them early information about prerequisite material for the three first-year Ph.D. courses. Derek Bean edited and updated the pages on prerequisite material, which had been created by VIGRE students in past years. The link to these pages (http://www.stat.berkeley.edu/twiki/pub/SGSA/ReadingForIncomingStudents/Welcome.html) was sent to incoming students in July to give them time to think about the courses for which they were best prepared. The Wiki of the Statistics Graduate Student Association was also used to disseminate information about housing in Berkeley, and incoming students had one-on-one contact with current students for help and advice about finding housing. As in 2008, social activities of the Bootcamp included a day-long tour of San Francisco and Berkeley, and transportation and logistical help for new students moving into housing on and off campus. Academic activities consisted of sessions on the three first-year courses, with Yuval Benjamini, Michael Higgins, and Wayne Lee providing students with course introductions created in consultation with the faculty teaching those courses. Philip Spector provided an introduction to the department's computing environment, with emphasis on R and LaTeX. Students met with Peter Bartlett, Head Graduate Advisor, for one-on-one advising sessions. Camp activities also included a social event at the home of Philip Spector, at which new students met the extended department community.

    The main effect of mentoring via the Bootstrap Camp is that new students are able to make better-informed choices about living and working in Berkeley than students could in the days before the Camp. This is clearly reflected in three areas. First, admitted students are told about the Bootstrap Camp and know that they will have help finding housing. Unlike competing institutions, Berkeley offers little by way of furnished graduate housing, so the knowledge that they will have support during the search for housing is critical to new students. It reduces a major source of anxiety about accepting admission to Berkeley and therefore increases our acceptance rate. In addition, student evaluations of the Camp have rated the practical move-in help offered during Bootstrap Camp as very valuable. Second, the detailed discussions about course offerings for new students, with advice and curricular details provided from both the student and faculty perspectives, help new students accurately assess their preparation and make appropriate choices. This helps students start out in the program with confidence, and the effect is visible in the morale of the student body as well as in the extremely low attrition rate in the program. Finally, the direct connections formed between new students and current students during the activities of the Bootstrap Camp have been invaluable in getting new students involved early in research and independent reading outside their regular coursework. Senior students' descriptions of their own paths in research, coupled with introductions to their supervisors during social activities of the Bootstrap Camp, help to break the ice between new students and faculty, provide role models for new students, and emphasize the point that they are at Berkeley to do research. The last three years have seen a marked increase in the number of first-year students regularly attending meetings of the department's research groups, or approaching faculty for supervision in reading courses.

    The second part of the mentoring program for graduate students involves interactions with faculty, in particular with the head graduate advisor and with assigned faculty mentors.

    The head graduate advisor meets each student at least once per semester to discuss their progress and plans. These discussions range over a variety of topics relevant to the individual student's current situation, including progress on coursework and plans for future courses, progress in settling on a research area and finding an advisor, plans for the remaining graduate program milestones (such as the qualifying exam and the dissertation), experiences in teaching, progress in research and other aspects of participation in the research community (such as reviewing responsibilities and presentations at conferences), career ambitions, and what to expect in navigating the job market.

    In addition, first-year graduate students are assigned a faculty mentor, who meets with the student more frequently. These assignments reflect the research interests of the faculty mentor and the student, so that the meetings offer a natural and convenient opportunity to discuss research topics at a very early stage. This seems to have played a role in the increased involvement of first-year students in research group meetings. Faculty mentors also provide advice on planning courses, on group meetings and other departmental activities that might be of interest, on potential faculty research advisors, and on other aspects of life in the department. When first-year students face difficulties in the program, their faculty mentor provides help and advice. The students have provided positive feedback on this process: beyond the helpfulness of the advice, they have expressed appreciation for the opportunity to have frequent and in-depth interaction with faculty beyond those teaching their current courses. Some have suggested that this opportunity be extended by assigning more than one faculty mentor. We plan to experiment with this in the spring semester, when we will give the current first-year students the option of having a second faculty mentor assigned.

    In this phase, students also receive a written performance evaluation each semester from the head graduate advisor. These letters are based on feedback from the course instructors, from faculty mentors, and from other faculty. Sample letters sent to students at the end of the first semester and the first year are included in Appendix C. Since the elimination of the preliminary exam, one-on-one mentoring by faculty in the early stages of the Ph.D. program has become a key component in identifying any problems that new students may be having and dealing with them swiftly and effectively. This is a much more humane and effective way of assessing and assisting students than waiting a year to see whether the student passes or fails a prelim.

    A direct consequence of one-on-one faculty mentoring is the broadening of graduate student education well beyond the courses offered by our department. Faculty mentors inform students about courses of interest in other fields and about related work by faculty in other departments. They provide introductions to faculty on campus, at the Lawrence Berkeley National Laboratory, and at the Mathematical Sciences Research Institute. This leads to a rich and broad research experience for students, far beyond what is available to them within the statistics department alone.

    The department's VIGRE program includes an organized mentoring process for postdocs. A research mentor is found for each postdoc even before the postdoc offer is made. This mentor is found by the Chair and is typically an expert in the postdoc's main research area. The research mentor and the postdoc begin communicating well before the postdoc arrives in Berkeley. Subsequently, the research relationship develops through regular meetings and collaboration. In addition to a research mentor, each postdoc has a teaching mentor who is available to answer questions about classroom teaching, developing course materials such as handouts, homework and projects, exam preparation and grading, and handling student problems. Teaching mentors are assigned based on the type of course the postdoc teaches. They have shared their own course materials and best practices with postdocs new to teaching at Berkeley, and remain a resource for help and advice for the duration of the postdocs' stay in Berkeley.

    Postdocs are also supported by mentors in career development. We encourage and assist postdocs in preparing grant applications and job talks. We are delighted that, as a result of these efforts, Nicholas Crawford applied for and received a Fulbright award, and Emilia Huerta-Sanchez applied for and received an NSF postdoc.

V. How has your EMSW21 program promoted recruitment into the mathematical sciences?

    As detailed in our proposal, we have devoted considerable energy to our undergraduate program, and the resulting growth is proof of the success of our efforts. The following table shows the number of Bachelor's degrees by year. Non-integer numbers occur because of the way the university accounts for double and triple majors, and many of our majors are double majors. Indeed, our undergraduate major advisors actively encourage statistics students to major in another field in addition to statistics, so that they have a solid background in a field of application or in mathematics before entering the workforce or graduate school. Typical majors undertaken concurrently with statistics are Economics, Applied Mathematics, and Mathematics. The number of undergraduate degrees awarded is larger than the number of majors counted by the University: in 08-09 we counted 46 degrees awarded, 13 of them to women. We view this low proportion as an anomaly, as typically 50% of our undergraduate majors are women. For example, in 2007-08, 26 of 54 majors were women. The number of degrees awarded compares very favorably with the average of about 15 degrees per year that we used to award in the 1990s, before we received our first VIGRE grant. We list figures for other departments in the Division of Mathematical and Physical Sciences for comparison.

    Department             00-01  01-02  02-03  03-04  04-05  05-06  06-07  07-08  08-09
    Astronomy               10.5   11.3   10.5   13.5   17.8   18.7   13.2   16.7   16.3
    Earth Planetary Sci      9.5   20.5   10.0   15.8   13.5   10.5   15.5   17.5   29.0
    Mathematics             89.5  138.3  169.2  177.8  186.5  160.0  127.0  139.7  164.7
    Other Math Phys Sci      4.0    7.0    6.5    3.0    4.0    4.0    0.0    4.0    3.0
    Physics                 40.5   60.8   67.0   53.7   66.2   57.2   49.7   50.3   56.3
    Statistics              24.8   22.0   23.7   30.8   26.5   28.2   36.7   44.3   31.3

    As described in Section II of this report, the department's VIGRE Undergraduate Seminar provides a venue for students to learn about careers in statistics and about graduate school. Detailed enrollment figures for the VIGRE seminar are provided in Appendix B. Here we would like to point out a very positive effect of the seminar in attracting students into statistics. The number of students enrolled in the seminar has gone up considerably, from 18 in Fall 2007 to 31 in Fall 2009. More notable, however, is the proportion of seminar students with undeclared majors, which has risen from 28% (5/18) to 55% (17/31) over that period. These figures are a clear indication that the seminar is of great interest to undergraduates who are trying to decide which field to pursue. Moreover, the seminar has a positive effect on the students' choice of major: of the 17 undeclared students currently in the VIGRE seminar, 6 have already expressed their wish to become statistics majors, and more are expected to follow.

    As detailed in Section II of this report, the VIGRE program has led to the creation or revision of three upper-division undergraduate courses. These are Statistics 133 (Concepts of Computing with Data), Statistics 151B (Linear Modeling: Theory and Applications; now focusing on statistical learning theory), and Statistics 157 (Undergraduate Seminar in Special Topics in Probability and Statistics). Enrollments in these classes have skyrocketed over the past few years. Detailed enrollment figures are provided in Appendix B, but a brief summary is revealing. Stat 133 started out in 2004 as a once-a-year class of approximately 50 students; enrollment has since grown so much that it is now offered every semester, with approximately 75 students each time. Stat 151B had 15 students in Spring '03, more than doubled to 36 students in 2006, and is expected to be full at 40 students in Spring 2010. Stat 157 has also gained tremendously in popularity: it had 12 students when it was first offered in 2003, and currently has a class size of 32. Because the majority of enrolled students are not statistics majors, these numbers demonstrate the resounding success of these classes in drawing students into statistics.

    Unlike in most other fields, the typical applicant to graduate programs in our field tends not to have an undergraduate major in the field. This is in part because many universities do not offer an undergraduate statistics major at all. Therefore, our efforts to recruit students into our graduate programs focus as much on undergraduate majors in related fields (for example mathematics, computer science, and economics) as on undergraduate statistics majors. Our pool of potential applicants to statistics graduate programs is far larger than the number of undergraduate majors in statistics. It is worth noting that of the 10 first-year students in our Ph.D. program this year, only two entered with a statistics degree: one was a triple major in Statistics, Math, and Applied Math, and the other was an undergraduate Math major who did a one-year Master's program in Statistics. All the others were Mathematics, Computer Science, Physics, or Economics majors. VIGRE has been a central element in attracting students with such a breadth of backgrounds. Statistics is a field in which ideas from many fields come together in exciting ways. As part of its VIGRE activities, our department augmented its Ph.D. program with Designated Emphasis programs in mathematics, computational biology, and statistical learning. These programs have had the direct effect of bringing to our program strong students whose primary undergraduate background was not in statistics. Enthusiasm and ability for cross-disciplinary graduate work in our department have grown dramatically since the award of the VIGRE grant.

    The Explorations in Statistics Research Workshop, described in detail in Section II of this report, has a similar aim. Organized by Deborah Nolan (with Mark Hansen, UCLA, and Duncan Temple Lang, UC Davis), it is a one-week undergraduate program designed to expose students from a variety of backgrounds to the excitement of modern applied statistics research, with the hope of recruiting them into graduate study in statistics. See Section II for a description of the topics covered in the 2009 workshop. The program was first offered at UCLA in 2005 and 2006, and has been continued for the summers of 2009-2012 with NSF funding. In 2009 the program was at Berkeley; in 2010-2012 it will be at NCAR, Columbia, and UCLA. Graduate students supported by VIGRE will assist at all of them. The student response to the program has been overwhelmingly positive:

    "I wanted to thank you once more for the amazing opportunity to participate in the workshop - I had a great time and gained a lot of information that will aid in my future decisions."

    "I want to thank you all again very much for inviting me to last week's workshop. The experience was absolutely wonderful, and I learned a great deal. I had many conversations with other students in the program about how much we learned, and how enjoyable everything was. I'm sure I could speak on behalf of the whole group to say we felt lucky to be there!"

    The summer program has been supported by VIGRE and EMSW21 through graduate student assistance with the data analysis projects. The workshop brings together about 40 participants each year from around the country, including 24 undergraduate students, 6 graduate students, 3 researcher-presenters, 2-3 organizers, and 2-3 additional faculty. The program has the potential to increase the number and diversity of U.S. students who pursue graduate degrees in statistics. In the three years the program was offered, 38 of the 72 undergraduate participants were women and 13 were students from historically underrepresented groups in science. Students from this program have gone on to statistics Ph.D. programs at the University of Washington, Iowa State, Carnegie Mellon, Stanford, and Berkeley. Indeed, Berkeley Ph.D. student Meghan Goldman is a graduate of the summer program and was a graduate assistant for the program in Summer 2009. The program has had a transformative effect on some of its participants. As a Ph.D. student at Iowa State told Deborah Nolan, it was that single week in the summer that convinced him to change his major from mathematics to statistics and to go on to graduate work in statistics.

    The VIGRE award has increased the representation of domestic students and female students in our PhD program, as illustrated by the following comparisons: the percentage of domestic PhD students has increased from 11/36.75 (30%) in 2002-2003 (pre-VIGRE award) to 37.5/57.25 (61%) in 2008-2009, and the percentage of female students has increased from 7/36.75 (19%) to 15/57.25 (30%). The percentage of domestic students increased consistently from 2003 to 2009 (with a jump of 10% last year), while the percentage of female students was around 20% except in 2003-2004 (25%) and the two most recent years (2007-2008: 29%; 2008-2009: 29%).

    The department's graduate program has a strong track record of success with students who arrive with undergraduate degrees from highly ranked research universities. In an effort to broaden our community of students, we are actively seeking strong students from schools that have in the past been unrepresented among our graduate students. This effort is paying off. For example, Derek Bean, our first Ph.D. student from the University of Maine, is doing exceptionally well. We are also encouraging strong applicants from liberal arts colleges. During the annual Joint Statistical Meetings in Salt Lake City in August 2008, Deborah Nolan arranged a meeting at which John Rice (then Chair of the department) and Admissions Director Ani Adhikari met with her and Prof. Julie Legler of St. Olaf College. As a result of this meeting, St. Olaf undergraduate Brianna Hirst applied to and was accepted into our Ph.D. program, and is now flourishing as a graduate student under the supervision of Nick Jewell. Hirst is our first student from St. Olaf, and it is our hope that she will be the first of many students from relatively small liberal arts colleges. Based on the advice of Prof. Legler, our graduate admissions committee contacted undergraduate programs individually to seek out potential applicants. That is one of the reasons why the number of applicants to our Ph.D. program was almost 50% higher in 2008 than it had been in 2007. Similarly, it is through personal connections at other liberal arts colleges that undergraduates have been encouraged to apply to our graduate program. For example, Terry Speed, Philip Stark, and John Rice have given talks at Reed College and Pomona College. Now that we have Ph.D. students from liberal arts colleges (Reed, St. Olaf, Bowdoin), one of our new VIGRE outreach efforts this year will be to help graduate students make connections with undergraduates at their alma maters.

    The department's VIGRE program has been instrumental in the recruitment of strong minority students as candidates for the Ph.D. degree. Until a few years ago, the department had a relatively weak record of attracting minority students as applicants, and had no minority students in the Ph.D. program. In the past two years, however, the number of such applicants has increased, and in each of those years we have admitted a strong student of minority background: Tessa Childers of UC Davis in the class that entered in 2008, and Miles Lopes of UCLA in the current incoming class. For their first two years, both students were awarded the Chancellor's Fellowship of UC Berkeley, won in competition with Ph.D. candidates from other departments on campus. However, the honor of being offered the VIGRE fellowship for their next two years in the program was the key to their acceptance of our offer of admission. Both students were heavily recruited by other schools, and it was the combination of the two fellowships that led them to choose Berkeley. The prospect of summer research and research-related travel that is part of the VIGRE program was particularly attractive. In an effort to provide as many resources as possible, the department also applied for funds from the NSF-funded AGEP program known as "Berkeley Edge". This program provided travel money for Lopes to attend a Machine Learning workshop at the University of Chicago in Summer 2009, and also supported him in independent study with Prof. Martin Wainwright later that summer.

    The department is an active participant in the Berkeley Edge program, regularly hosting prospective graduate students from minority backgrounds to give them a sense of what is required to succeed in the Ph.D. program. Instructors in undergraduate courses identify strong students to recommend to Berkeley Edge. This semester, for example, Ani Adhikari has recommended two African American undergraduates, Alex Jones and Philip Persley. Though he is an exceptionally strong Applied Mathematics and Computer Science major, Persley is keen to join the Ph.D. program in Statistics with a Designated Emphasis in statistical learning theory. With a combination of advice from Colette Patt of the Berkeley Physical Sciences Diversity Office and detailed guidance about course selection from faculty in Statistics, we believe that Persley is likely to succeed in his application. The identification and encouragement of individual students who have the appropriate background, motivation, and ability are the most important factors in our success at attracting and nurturing students of minority backgrounds.

    At the postdoc level, Emilia Huerta-Sanchez was recruited after a visit to the department. She noted that the department's extensive outreach activities were a determining factor in her decision to come to Berkeley.

VI. How has the interaction of several levels of students and faculty been enhanced?

    As demonstrated throughout this report, VIGRE activities have had an enormous effect on the interaction among undergraduates, graduate students, postdocs, and faculty in the department as well as elsewhere on campus. In this section we briefly review the main forums for interaction and provide one further example that has not been mentioned before.

    The new research group model in the department (see Appendix G for group descriptions) has provided a natural space for vertically integrated research and for interaction among faculty, postdocs, graduate students, and undergraduate students. Within this model, people meet regularly and work in teams, and most of the time each team is vertical integration in its most fundamental, functional form. In this way, the research group model greatly enhances the interactions among several levels of students and faculty.

    As pointed out earlier (Section II), the VIGRE undergraduate seminar provides another venue in which postdocs, graduate students, and undergraduates interact and exchange ideas. The reading groups (Section II) provide a further venue for faculty and postdocs to interact with graduate students.

    Women faculty and graduate students have been organizing lunches supported by VIGRE. The lunches are usually held twice a semester and are extremely well attended by women faculty, postdocs, and graduate students from statistics and biostatistics. Lunch discussions cover topics such as how to balance career and family, and participants exchange tips on how to succeed as a woman scientist. Each semester, one of the lunches is formally organized around a particular topic. Last spring, a panel of four women faculty from engineering, sociology, and anthropology joined the group and led a discussion, based on their personal experiences, on balancing family and career. Questions were solicited from students beforehand, and Yu served as the moderator. This semester, a panel of current postdocs and of former postdocs who are now assistant professors in the department will discuss the postdoc experience, including the process of applying for a postdoc position and the pros and cons of holding a postdoc.

VII. What is your EMSW21 program doing to affect the time to degree?

    To examine the effect of EMSW21 on time to degree, we compare the time to degree for PhDs awarded in the two-year period before our first VIGRE award (Fall 2001-Spring 2003) with the time to degree for PhDs awarded in the most recent two-year period (Fall 2007-Spring 2009), when all PhD candidates had entered the program after the start of our first VIGRE award. We call the first cohort the "Before" group and the second the "After" group. (We did not include data on students who took a job and finished the degree while working.)

    The Before group consists of 13 students with a mean time to degree of 5.54 years (median = 5.5 years) and a standard deviation of 1.16 years (interquartile range = 1.5 years), while the After group consists of 18 students with a mean time to degree of 5.2 years (median = 5 years) and a standard deviation of 1.00 year (interquartile range = 1.375 years). Both the average time to degree (in terms of mean and median) and its variability are lower in the After group, indicating a shift toward shorter time to degree. Among the 13 PhDs awarded in the Before group, only 2 went to domestic students (15%), but among the 18 degrees awarded after the VIGRE award, 5 went to domestic students (28%), reflecting the fact that the proportion of domestic students in our PhD program has almost doubled under VIGRE.

    It is also worth noting that 1 of the 2 domestic Before students who received their degree in statistics (as opposed to probability) analyzed real data in his thesis, while 4 out of 4 of the After students' theses in statistics contained substantial data analyses. Since more work, energy, and time are typically involved in completing an interdisciplinary applied thesis than a theoretical thesis in statistics, the change under VIGRE may be larger than the observed average difference of -0.34 years to degree. However, the number of students is too small for these comparisons to be conclusive.
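    The arithmetic behind this comparison can be rechecked from the summary figures alone. The short Python sketch below recomputes the After-minus-Before differences and the domestic shares from the numbers reported in this section. As an illustration only, it also computes Welch's t statistic from the reported means and standard deviations; this test is not part of the analysis described above, and applying it assumes the two cohorts can be treated as independent samples.

```python
import math

# Summary figures as reported in this section (no student-level data is used).
BEFORE = {"n": 13, "mean": 5.54, "median": 5.5, "sd": 1.16, "domestic": 2}
AFTER = {"n": 18, "mean": 5.2, "median": 5.0, "sd": 1.00, "domestic": 5}


def welch_t(a: dict, b: dict) -> float:
    """Welch's t statistic for mean(b) - mean(a), from summary statistics only.

    Assumption: the two cohorts are independent samples. This is an
    illustrative check, not a test reported in the original analysis.
    """
    se = math.sqrt(a["sd"] ** 2 / a["n"] + b["sd"] ** 2 / b["n"])
    return (b["mean"] - a["mean"]) / se


def compare(before: dict, after: dict) -> dict:
    """After-minus-Before differences (years) and domestic shares (percent)."""
    return {
        "mean_diff": round(after["mean"] - before["mean"], 2),
        "median_diff": round(after["median"] - before["median"], 2),
        "sd_diff": round(after["sd"] - before["sd"], 2),
        "domestic_pct_before": round(100 * before["domestic"] / before["n"]),
        "domestic_pct_after": round(100 * after["domestic"] / after["n"]),
        "welch_t": round(welch_t(before, after), 2),
    }


if __name__ == "__main__":
    for name, value in compare(BEFORE, AFTER).items():
        print(f"{name}: {value}")
```

    The recomputed mean difference (-0.34 years) and domestic shares (15% and 28%) match the figures in the text. The t statistic comes out near -0.85, well short of conventional significance thresholds, which is consistent with the caution that the cohorts are too small for the comparison to be conclusive.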

VIII. Has there been effective dissemination to the mathematical sciences community of the results of this activity?

    The faculty have disseminated new course materials and made curriculum changes available to other institutions through publication in journals and on the Web. In addition, we have organized workshops to bring together faculty from around the country who are interested in making similar curriculum changes in terms of integrating computation more fully into the statistics curriculum. We have also publicized our mentoring activities and curriculum changes through panels, talks, and lunches at professional society meetings. These dissemination efforts are briefly described below.

    Bootstrap camp materials: VIGRE graduate students have created, and subsequently edited and updated, the Web pages on prerequisite material for our graduate courses. Incoming students are sent the link to these pages in July to give them time to think about the courses for which they are best prepared: http://www.stat.berkeley.edu/twiki/pub/SGSA/ReadingForIncomingStudents/Welcome.html

    Course Materials: The VIGRE grant has supported projects to develop innovative educational materials for advanced undergraduate course