PhD position (3 years) on Semantic Web and Data Mining @(IRISA, University Rennes 1, France)

* Call for applications to a PhD position *

The SemLIS team at IRISA (one of the largest French labs in Computer 
Science), located in Rennes, is looking for candidates for a PhD in the 
domains of the Semantic Web and Data Mining:
"Semantic Lifting of Complex Data by Hypergraph Compression - Stylistics 
in the usage of computer languages"

Details about the PhD subject can be found below or at 
http://www.irisa.fr/fr/offre-theses.
Expected start date: September 2018.

For more information or to apply for the position, please contact:
- Peggy Cellier (peggy.cellier@irisa.fr)
- Sébastien Ferré (sebastien.ferre@irisa.fr).

---------------------------------------------

*Application*
An important challenge in Big Data and more specifically in the Semantic 
Web is semantic lifting, i.e. finding ways to turn data into knowledge. 
A large part of those data are documents expressed in languages ranging 
from computer languages (e.g., source code, SQL queries) to natural 
languages (e.g., web pages, books). By analogy with stylistics, which aims 
at analysing documents in natural language, it is interesting to analyse 
documents in computer languages to discover design patterns and good 
practices from experts’ usage in order to help transfer their knowledge 
to beginners. However, because of the size of the documents and their 
complex low-level representations (e.g., instructions, operators), it is a 
challenge for a human to gain knowledge by merely reading the raw 
documents. Patterns discovered in documents could be used to perform a 
semantic lifting by re-encoding the low-level data into a compressed 
higher-level knowledge representation.

*Objectives*
The aim of this PhD is to propose an approach to semantic lifting based 
on hypergraphs, i.e. graphs with n-ary edges. We propose to use the same 
structure for both low-level data (e.g., control-flow graph) and 
higher-level knowledge (e.g., design patterns, idioms, boilerplate 
code). The advantage of hypergraphs is that they are expressive and 
already used to represent knowledge in the Semantic Web. In addition, 
using the same structure for data and knowledge makes it possible to 
iterate steps of semantic lifting, and therefore to tune the abstraction 
level of the compressed knowledge.
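
As an illustration only, the idea of storing low-level data and 
higher-level knowledge in one hypergraph could be sketched as below. 
The class names (Hyperedge, Hypergraph) and the relation names 
("assign_add", "accumulation") are hypothetical and chosen for the 
example; they are not taken from Graph-FCA or from any existing tool.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Hyperedge(NamedTuple):
    """An n-ary edge: a relation name plus an ordered tuple of nodes."""
    relation: str
    nodes: Tuple[str, ...]


class Hypergraph:
    """A hypergraph stored as a set of n-ary edges (hypothetical sketch)."""

    def __init__(self):
        self.edges = set()
        # Index from each node to the edges it takes part in.
        self.incidence = defaultdict(set)

    def add(self, relation, *nodes):
        edge = Hyperedge(relation, tuple(nodes))
        self.edges.add(edge)
        for node in nodes:
            self.incidence[node].add(edge)
        return edge

    def edges_of(self, node):
        return self.incidence[node]


g = Hypergraph()
# Low-level data: ternary edges for "x = a + b" and "y = x + c".
g.add("assign_add", "x", "a", "b")
g.add("assign_add", "y", "x", "c")
# Higher-level knowledge expressed in the same structure (arity 4).
g.add("accumulation", "y", "a", "b", "c")

for edge in sorted(g.edges_of("y")):
    print(edge.relation, edge.nodes)
```

Because both levels are plain hyperedges, the output of one lifting 
step can in principle be fed to the next one without changing the 
representation.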

*Technical context and challenge*
Formal Concept Analysis (FCA) [1] is a theoretical framework that is 
used for unsupervised knowledge extraction, and has been applied to many 
domains (e.g., social sciences, software refactoring). An extension of 
FCA, Graph-FCA [2,3], has recently been defined in the SemLIS team, 
where the input is a hypergraph, i.e. a set of n-ary relationships 
instead of one binary relationship, thus allowing more expressive 
representations. Graph mining algorithms [4,5] and Relational Concept 
Analysis (RCA) [6] can also be used to generate graph patterns, provided 
that they are generalized to hypergraphs. However, the number of concepts 
can get very large, as is often the case in data mining. A key issue of this 
PhD is to select a small subset of the graph concepts in order to 
re-encode the data at a higher level [7]. The selection criteria will 
follow the MDL principle (Minimum Description Length) [8,9], which has 
already been applied to FCA-like data but not yet to graphs. The idea is 
to use graph concepts to re-encode the representation of the original 
graph in a compressed way by factorizing out repeated patterns. The 
"best" concepts are those that compress the most.
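
To give a rough feel for MDL-based selection, here is a minimal sketch 
on itemset-like data rather than on graphs. It is not the method of the 
PhD, nor the Krimp algorithm of [8]: the cost model (one symbol per 
item or per pattern use) and the function names (cover, 
description_length, select_patterns) are simplifying assumptions made 
for the example.

```python
def cover(transaction, patterns):
    """Encode one transaction: one symbol per pattern used, one per leftover item."""
    remaining = set(transaction)
    cost = 0
    # Try larger patterns first (assumed heuristic).
    for pattern in sorted(patterns, key=len, reverse=True):
        if pattern <= remaining:
            remaining -= pattern
            cost += 1
    return cost + len(remaining)


def description_length(data, patterns):
    """Crude two-part code: L(model) + L(data | model), counted in symbols."""
    model_cost = sum(len(p) + 1 for p in patterns)
    data_cost = sum(cover(t, patterns) for t in data)
    return model_cost + data_cost


def select_patterns(data, candidates):
    """Greedily add the candidate that shortens the description the most."""
    selected = []
    best = description_length(data, selected)
    improved = True
    while improved:
        improved = False
        for candidate in candidates:
            if candidate in selected:
                continue
            length = description_length(data, selected + [candidate])
            if length < best:
                best, choice, improved = length, candidate, True
        if improved:
            selected.append(choice)
    return selected, best


# Toy data: each transaction is a set of low-level elements (made up).
data = [
    {"open", "read", "close"},
    {"open", "read", "close", "log"},
    {"open", "write", "close"},
    {"open", "read", "close"},
]
candidates = [
    frozenset({"open", "close"}),
    frozenset({"open", "read", "close"}),
    frozenset({"log", "write"}),
]

selected, best = select_patterns(data, candidates)
print(description_length(data, []), "->", best, [sorted(p) for p in selected])
```

On this toy input the pattern {open, read, close} is kept because it 
shortens the description from 13 to 11 symbols, while the other two 
candidates would cost more to declare than they save.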

*Candidate profile*
The expected research work requires a taste for theory, algorithmics, 
implementation, and experiments.

*References*
[1] Ganter, B., Wille, R.: Formal Concept Analysis – Mathematical 
Foundations. Springer, Heidelberg (1999)
[2] Ferré, S.: A proposal for extending Formal Concept Analysis to 
knowledge graphs. In: Int. Conf. Formal Concept Analysis (ICFCA). LNAI 
9113, pp. 271-286. Springer (2015)
[3] Ferré, S., Cellier, P.: Graph-FCA in Practice. In: Int. Conf. 
Conceptual Structures (ICCS). LNAI 9717, Springer (2016)
[4] Huan, J., Wang, W., Prins, J.: Efficient mining of frequent 
subgraphs in the presence of isomorphism. Int. Conf. Data Mining. IEEE 
(2003)
[5] Yan, X., Han, J.: CloseGraph: mining closed frequent graph 
patterns. Int. Conf. Knowledge discovery and data mining (SIGKDD). ACM 
(2003)
[6] Dolques, X., Huchard, M., Nebut, C., Reitz, P.: Learning 
transformation rules from transformation examples: An approach based on 
relational concept analysis. Enterprise Distributed Object Computing 
Conf. Work. (EDOCW). IEEE (2010)
[7] Klimushkin, M., Obiedkov, S., Roth, C.: Approaches to the selection 
of relevant concepts in the case of noisy data. Int. Conf. Formal 
Concept Analysis (ICFCA). Springer (2010)
[8] Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that 
compress. Data Mining and Knowledge Discovery. Springer (2011)
[9] Cook, D.J., Holder, L.B.: Substructure discovery using minimum 
description length and background knowledge. Journal of Artificial 
Intelligence Research 1, 231-255 (1994)

Received on Tuesday, 27 March 2018 09:53:49 UTC