- From: Sebastien Ferre <Sebastien.Ferre@irisa.fr>
- Date: Tue, 27 Mar 2018 11:53:23 +0200
- To: semantic-web@w3.org
- Message-ID: <4d6d2bf8-055b-424a-c19d-ba5c534eb513@irisa.fr>
* Call for applications to a PhD position * The SemLIS team at IRISA (one of the largest French lab in Computer Science), located in Rennes, is looking for candidates for a PhD in the domains of the Semantic Web and Data Mining: "Semantic Lifting of Complex Data by Hypergraph Compression - Stylistics in the usage of computer languages" Details about the PhD subject can be found below or at http://www.irisa.fr/fr/offre-theses. Expected start date: September 2018. For more information or to apply for the position, please contact : - Peggy Cellier (peggy.cellier@irisa.fr) - Sébastien Ferré (sebastien.ferre@irisa.fr). --------------------------------------------- *Application* An important challenge in Big Data and more specifically in the Semantic Web is semantic lifting, i.e. finding ways to turn data into knowledge. A large part of those data are documents expressed in languages ranging from computer languages (e.g., source code, SQL queries) to natural languages (e..g, web pages, books). Analogously to stylistics that aims at analysing documents in natural language, it is interesting to analyse documents in computer language to discover design patterns and good practices from experts’ usage in order to help transfer their knowledge to beginners. However, since the size of the documents and their complex low-level representations (e.g., instructions, operators), it is a challenge for a human to gain knowledge by merely reading the raw documents. Patterns discovered in documents could be used to perform a semantic lifting by re-encoding the low-level data into a compressed higher-level knowledge representation. *Objectives* The aim of this PhD is to propose an approach of semantic lifting based on hypergraphs, i.e. graphs with n-ary edges. We propose to use the same structure for both low-level data (e.g., control-flow graph) and higher-level knowledge (e.g., design patterns, idiomatics, boilerplate code). The advantage of hypergraphs is that they are expressive and already used to represent knowledge in the Semantic Web. In addition, using the same structure for data and knowledge allows to iterate steps of semantic lifting, and therefore to tune the abstraction level of the compressed knowledge. *Technical context and challenge* Formal Concept Analysis (FCA) [1] is a theoretical framework that is used for unsupervised knowledge extraction, and has been applied to many domains (e.g., social sciences, software refactoring). An extension of FCA, Graph-FCA [2,3], has recently been defined in the SemLIS team, where the input is a hypergraph, i.e. a set of n-ary relationships instead of one binary relationship, thus allowing more expressive representations. Graph mining algorithms [4,5] and Relational Concept Analysis (RCA) [6] can also be used to generate graph patterns that they are generalized to hypergraphs. However, the number of concepts can get very large, as it is often the case in data mining. A key issue of this PhD is to select a small subset of the graph concepts in order to re-encode the data at a higher level [7]. The selection criteria will follow the MDL principle (Minimum Description Length) [8,9], which has already been applied to FCA-like data but not yet to graphs. The idea is to use graph concepts to re-encode the representation of the original graph in a compressed way by factorizing out repeated patterns. The "best" concepts are those that compress the most. *Candidate profile* The expected research work requires a taste for theory, algorithmics, implementation, and experiments. *References* [1] Ganter, B., Wille, R.: Formal Concept Analysis – Mathematical Foundations. Springer, Heidelberg (1999) [2] Ferré, S.: A proposal for extending Formal Concept Analysis to knowledge graphs. In: Int. Conf. Formal Concept Analysis (ICFCA). LNAI 9113, pp. 271-286. Springer (2015) [3] Ferré, S., Cellier P.: Graph-FCA in Practice - Int. Conf. Conceptual Structures (ICCS). LNAI 9717, Springer (2016) [4] Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphism. Int. Conf. Data Mining. IEEE (2003) [5] Yan, X. and Han, J.: CloseGraph: mining closed frequent graph patterns. Int. Conf. Knowledge discovery and data mining (SIGKDD). ACM (2003) [6] Dolques, X. and Huchard, M. and Nebut, C. and Reitz, P.: Learning transformation rules from transformation examples: An approach based on relational concept analysis. Enterprise Distributed Object Computing Conf. Work. (EDOCW). IEEE (2010) [7] M. Klimushkin, S. Obiedkov, and C. Roth: Approaches to the selection of relevant concepts in the case of noisy data. International Conference on Formal Concept Analysis. Springer Berlin Heidelberg, 2010. [8] J. Vreeken, M. Van Leeuwen, A. Siebes: Krimp: mining itemsets that compress - Data Mining and Knowledge. Springer, 2011. [9] D.J. Cook, and L.B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research 1 (1994): 231-255.
Received on Tuesday, 27 March 2018 09:53:49 UTC