Zhiyi Zhang, Department of Mathematics and Statistics, University of North Carolina at Charlotte

Title: Statistical Implications of Turing’s Formula

Abstract:

This talk is organized into three parts.

1. Turing’s formula is introduced. Given an iid sample from a countable alphabet under a probability distribution, Turing’s formula (introduced by Good (1953), hence also known as the Good-Turing formula) is a mind-bending non-parametric estimator of the total probability associated with letters of the alphabet that are NOT represented in the sample. Many of its statistical properties remained poorly understood for nearly sixty years. Some of the recently established results, including various asymptotic normal laws, are described.

2. Turing’s perspective is described. Turing’s formula brought about a new perspective on (or a new characterization of) probability distributions on general countable alphabets. This perspective in turn provides a new way to do statistics on alphabets, where many of the usual statistical concepts associated with random variables on the real line, for example moments, tails, coefficients of correlation, and characteristic functions, do not exist (a major challenge of modern data science). The new perspective, in the form of the entropic basis, is introduced.

3. Several applications are presented, including estimation of information entropy and diversity indices.
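
As a concrete illustration of the estimator discussed in part 1, the following is a minimal sketch in Python. It assumes the commonly stated form of Turing’s formula, N1/n, where n is the sample size and N1 is the number of letters observed exactly once; the function name turing_formula and the toy sample are hypothetical and are not taken from the talk.

```python
from collections import Counter


def turing_formula(sample):
    """Estimate the total probability of letters NOT seen in the sample.

    Assumes the commonly stated form N1 / n, where n is the sample size
    and N1 is the number of distinct letters observed exactly once.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # N1: number of letters that appear exactly once (singletons).
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


# Toy sample (hypothetical): the letters of "abracadabra".
# Counts are a:5, b:2, r:2, c:1, d:1, so N1 = 2 and n = 11.
sample = list("abracadabra")
print(turing_formula(sample))
```

The estimate depends only on the singleton count and the sample size, so it needs no knowledge of the alphabet's size or of the letters that were never observed.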