Thesis defense Tatiana Makhalova from Amedeo Napoli on 2021-06-19 (public-lod@w3.org from June 2021)

From: Amedeo Napoli <amedeo.napoli@loria.fr>
Date: Sat, 19 Jun 2021 15:01:13 +0200 (CEST)
To: Tatiana Makhalova <tatiana.makhalova@inria.fr>, Sergei Kuznetsov <skuznetsov@yandex.ru>, Amedeo Napoli <amedeo.napoli@loria.fr>
Message-ID: <64756685.4803438.1624107673227.JavaMail.zimbra@loria.fr>
Archived-At: <http://sympa.inria.fr/sympa/arcsearch_id/mailing-list-cla-2011/2021-06/64756685.4803438.1624107673227.JavaMail.zimbra%40loria.fr>

Dear all,

I have the pleasure of inviting you to the defense of my Ph.D. thesis entitled:
"Contributions to pattern set mining: from complex datasets to significant and useful pattern sets".
The defense will be held on Wednesday, June 23, at 2 p.m. online (in English).

The defense will be held on Teams for the thesis committee and will also be publicly available on YouTube at https://youtu.be/o0WMwOITClQ
If something goes wrong, updates will be published on this page: https://docs.google.com/document/d/1BZfuWAI-QBQn1HLLODPWXiL52nkWvgD_9h9LS-0Zxm0/edit?usp=sharing

The thesis committee will be composed of

Reviewers:
Arnaud Soulet, MCf HDR, Université de Tours, Tours
Jilles Vreeken, Pr. The CISPA Helmholtz Center for Information Security, Saarbrücken

Examiners:
François Charoy, Pr. Université de Lorraine, Nancy
Antoine Cornuéjols, Pr. AgroParisTech, Paris
Elisa Fromont, Pr. Université de Rennes, Rennes
Esther Galbrun, CR Inria, University of Eastern Finland, Kuopio
Christel Vrain, Pr. Université d'Orléans, Orléans

Supervisors:
Sergei O. Kuznetsov, Pr. NRU HSE, Moscow
Amedeo Napoli, DR CNRS LORIA, Nancy

Abstract:
We discuss different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may entirely cover a given dataset and be interpreted as useful and significant knowledge units. We focus on such issues as (i) the formal definition of pattern interestingness, (ii) the mitigation of the pattern explosion problem, (iii) measures for evaluating the performance of pattern mining, and (iv) the discrepancy between interestingness and quality of the discovered pattern sets.
The first part of the talk is devoted to a so-called closure structure and the GDPM algorithm for computing it. The closure structure allows for estimating both the data and pattern complexity. Moreover, we discuss how the closure structure allows an analyst to understand the intrinsic data configuration before selecting an interestingness measure for pattern mining.
In the second part, we discuss the difference between interestingness and quality of pattern sets. We present the KeepItSimple algorithm that adopts the best practices of supervised learning in pattern mining and relates interestingness and the quality of pattern sets. We show that KeepItSimple allows for efficient mining of a set of interesting and good-quality patterns without any pattern explosion.
The third part of the talk is devoted to numerical pattern mining. We present an MDL-based algorithm called Mint for mining pattern sets in numerical data. The Mint algorithm relies on a strong theoretical foundation and at the same time has a practical objective in returning a small set of numerical, non-redundant, and informative patterns. Mint has very good behavior in practice and usually outperforms its competitors.

Keywords: Pattern Set Mining; Pattern interestingness; MDL; Minimum Description Length principle; Closed patterns; Equivalence classes; Data complexity; Closure structure; Pattern explosion; Pattern evaluation; Formal Concept Analysis; Interval Pattern Structures; Binary data; Numerical data

Best regards,
Tatiana Makhalova

Received on Saturday, 19 June 2021 14:32:38 UTC