The ability to accurately discover all hidden relations between items that share similarities is of paramount importance to a wide range of disciplines. Clustering algorithms are employed throughout social sciences, biology, computer science, economics, and physics. Having the ability to cluster accurately is valuable, because once clusters have been identified, any subsequent analyses and optimization procedures benefit from a powerful reduction in dimensionality.
The focus of this project is on mathematically studying new clustering problems with time-dependencies, which have recently become omnipresent in state-of-the-art data-driven applications: data is often time-sensitive; their value fleeting; and data-driven algorithms must execute faster than the underlying system generating the data changes.
Generally speaking, the canonical Stochastic Block Model (SBM), which does not have a time-dimension let alone model correlations through time, has become the dominant benchmark to investigate the performance of cluster detection algorithms. The mathematical results that have been obtained for SBMs and its variants, particularly on the detectability of communities and exact recovery thereof, are fundamental, intriguing, and deep. The challenging derivations of these results explicitly build on the independencies and symmetries in the SBM, and can thus not be directly translated to clustering problems with time-dependencies.
Our first objective will be to do a rigorous analysis of time-dependent clustering in MCs. MCs are the fundamental stochastic model that generates a random sequence of events in which a weak-form of time dependency occurs that is challenging yet will still be tractable for our clustering problem. Concretely, we will:
Objective 1: Create and rigorously analyze new time-dependent clustering algorithms that can optimally cluster in MCs with clusters that are thinned, noisy, and overlap.
To guarantee that the new algorithms achieve their respective fundamental detectability limits, we will prove versions of Wigner’s Semicircle Law, as well as Marcenko-Pastur’s Law, that are valid for random centered matrices with dependent entries driven by MCs. Our key insight is that we can now combine new state-of-the-art concentration results for Markov chains with the properties of cluster structures, which will enable us to bound moments of the eigenvalue distribution measure asymptotically and sufficiently sharp. Specifically, we will:
Objective 2: Develop new mathematical theory on the concentration of spectra of random matrices with Markovian dependent entries.
You are an ideal candidate if you:
You will embed in the Stochastic Operations Research (SOR) research group, part of the Statistics, Probability, and Operations Research (SPOR) cluster and thus the M&CS department. The SOR research group is concerned with complex systems operating under randomness and uncertainty, and aims to develop mathematical models and techniques for the analysis and optimization of such systems. Methodologically, SOR’s research program falls at the intersection of Applied Probability and Operations Research, and SOR in particular engages in cutting-edge research in the area of queueing theory and analysis of random walks and higher-dimensional Markov processes. A key aim is to develop analytic, probabilistic, algorithmic and asymptotic methods, with emphasis on asymptotic laws and scaling limits for large-scale critical systems. While fundamental and methodological in nature, the research is deeply inspired by applications in computer-communications, logistics and service operations, but also biological systems, particle interactions and social networks. SOR comprises of approximately 10 faculty, and 20 PhDs.
We offer you:
If you are interested in this PhD student position, use the ‘apply now’ button. Screening of candidates begins on November 1st, 2020 and continues until the position is filled. Applications received by November 1st, 2020 will receive full consideration.
To be considered, you must upload the following documents (in pdf):
Please be aware that you can upload only 5 documents up to 2 MB each.
Questions regarding the academic content of the position can be directed to:
For information concerning employment conditions you can contact:
More information on employment conditions can also be found here: