Thesis defense Tatiana Makhalova

Dear all, 

I have the pleasure of inviting you to the defense of my Ph.D. thesis, entitled:
"Contributions to pattern set mining: from complex datasets to significant and useful pattern sets". 
The defense will be held online (in English) on Wednesday, June 23, at 2 p.m.

The defense will be held on Teams for the thesis committee and will also be publicly available on YouTube at https://youtu.be/o0WMwOITClQ
If something goes wrong, updates will be published on this page: https://docs.google.com/document/d/1BZfuWAI-QBQn1HLLODPWXiL52nkWvgD_9h9LS-0Zxm0/edit?usp=sharing

The thesis committee will be composed of:

Reviewers: 
Arnaud Soulet, MCf HDR, Université de Tours, Tours 
Jilles Vreeken, Pr. CISPA Helmholtz Center for Information Security, Saarbrücken

Examiners: 
François Charoy, Pr. Université de Lorraine, Nancy 
Antoine Cornuéjols, Pr. AgroParisTech, Paris 
Elisa Fromont, Pr. Université de Rennes, Rennes 
Esther Galbrun, CR Inria, University of Eastern Finland, Kuopio 
Christel Vrain, Pr. Université d'Orléans, Orléans

Supervisors: 
Sergei O. Kuznetsov, Pr. NRU HSE, Moscow
Amedeo Napoli, DR CNRS LORIA, Nancy 

Abstract: 
We discuss different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may entirely cover a given dataset and be interpreted as useful and significant knowledge units. We focus on the following issues: (i) the formal definition of pattern interestingness, (ii) the mitigation of the pattern explosion problem, (iii) measures for evaluating the performance of pattern mining, and (iv) the discrepancy between the interestingness and the quality of the discovered pattern sets.
The first part of the talk is devoted to the so-called closure structure and the GDPM algorithm for computing it. The closure structure allows for estimating both data and pattern complexity. Moreover, we discuss how it helps an analyst understand the intrinsic data configuration before selecting an interestingness measure for pattern mining.
In the second part, we discuss the difference between the interestingness and the quality of pattern sets. We present the KeepItSimple algorithm, which adopts the best practices of supervised learning in pattern mining and relates the interestingness and the quality of pattern sets. We show that KeepItSimple allows for efficient mining of a set of interesting, good-quality patterns without any pattern explosion.
The third part of the talk is devoted to numerical pattern mining. We present an MDL-based algorithm called Mint for mining pattern sets in numerical data. Mint relies on a strong theoretical foundation and at the same time has a practical objective: returning a small set of non-redundant and informative numerical patterns. Mint behaves very well in practice and usually outperforms its competitors.

Keywords: Pattern Set Mining; Pattern interestingness; MDL; Minimum Description Length principle; Closed patterns; Equivalence classes; Data complexity; Closure structure; Pattern explosion; Pattern evaluation; Formal Concept Analysis; Interval Pattern Structures; Binary data; Numerical data 

Best regards, 
Tatiana Makhalova

Received on Saturday, 19 June 2021 14:32:38 UTC