The Department of Applied Mathematics weekly seminar is given by scholars and researchers working in applied mathematics, broadly interpreted.
Title: Double dipping: problems and solutions, with application to single-cell RNA-sequencing data
Abstract: In contemporary applications, it is common to collect very large data sets with the vaguely defined goal of hypothesis generation. Once a data set is used to generate a hypothesis, we might wish to test that hypothesis on the same data. However, this type of "double dipping" violates a cardinal rule of statistical hypothesis testing: namely, that we must decide what hypothesis to test before looking at the data. When this rule is violated, standard statistical hypothesis tests (such as t-tests and z-tests) fail to control the selective Type 1 error, that is, the probability of rejecting the null hypothesis given that the null hypothesis holds and that we decided to test it.
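
To make the failure concrete, the following is a minimal simulation sketch. It is an illustration only, not the speaker's method. It assumes that every observation comes from a single standard normal population, so no true group difference exists, and it uses a split at the sample median as a crude stand-in for a clustering step. The function names welch_p_value and rejection_rate are hypothetical, and the test is a normal approximation to Welch's two-sample test.

```python
import math
import random
import statistics


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(double_dip, n_sims=500, n=100, alpha=0.05, seed=0):
    """Fraction of simulated null data sets in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Assumption: one homogeneous population, so the null hypothesis holds.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if double_dip:
            # Groups are chosen after looking at the data: sorting first
            # makes the two halves "below median" and "above median".
            x.sort()
        # Otherwise the groups are fixed in advance: first half vs. second half.
        a, b = x[: n // 2], x[n // 2:]
        if welch_p_value(a, b) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("groups fixed in advance:", rejection_rate(double_dip=False))
    print("groups chosen from data:", rejection_rate(double_dip=True))
```

Under these assumptions, the rejection rate for groups fixed in advance should stay near the nominal level alpha, while the rate for groups chosen from the data should be close to 1, even though no real difference exists.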