- From: MultilingualWeb-LT Working Group Issue Tracker <sysbot+tracker@w3.org>
- Date: Thu, 21 Jun 2012 08:16:27 +0000
- To: public-multilingualweb-lt@w3.org
mlw-lt-track-ISSUE-29 (Maxime): Please comment on the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF

http://www.w3.org/International/multilingualweb/lt/track/issues/29

Raised by: Maxime Lefrançois
On product:

Hi all,

Some mapping to RDF will work, and using NIF may make round-tripping possible. There are still many issues that we need to address. One of them is this: to ease (enable) the propagation and overriding of ITS properties such as its:translate, we need to capture part of the DOM tree structure in the RDF. Verbosity and instability problems arise when we go too far in this direction.

Below we give a short description of the solutions we may have for mapping ITS to RDF. To illustrate our arguments, we use the following HTML document snippet, adapted from http://richard.cyganiak.de/2007/10/lod/#how-to-join :

  <html>
    <head> </head>
    <body>
      <h2 title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>
    </body>
  </html>

Generally, we distinguish between solutions on three aspects:

1. ITS info may be in HTML, in RDF, or in both.
   a. ITS info is removed from the HTML and is present only in RDF
      - pros: clear separation between ITS and the document, no redundancy
      - cons: this implies progressive modifications to the document (NIF problems)
   b. RDF coexists with ITS attributes in the HTML
      - pros: the document is kept unchanged
      - cons: redundancy

2. Three possible ways to attach a set of triples to a document (a combination of all three is possible):
   a. inline RDFa (RDFa is usually used to make text that is already human-readable machine-readable too);
      - cons: We face many issues with inline RDFa annotation. For instance, when we want to add inline RDFa progressively during the process, we would need to modify every NIF URI computed so far.
   b. in the head: using a script element and a media type of text/n3, text/turtle or application/rdf+xml;
   c. in the head: using a link element to refer to another document.
   - pros of 2.b. and 2.c.: NIF works well
     - The offset-based recipe only needs a tiny extension so that offset 0 stands just before the '<' of "<body... "
     - The hash-based recipe only needs to be tweaked a little.

3. Annotating a part of the HTML string, or (an Element node | an Attribute node) in the DOM, or (a Text node | an Attribute node) in the DOM
   The research issue NIF has to face is whether we annotate a part of the HTML string or of the HTML DOM.
   a. Part of the HTML string:
      - for the h2 in the HTML document snippet above, the annotated string would be: "<h2 title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>"
      - pros: well defined, based on the HTML string
      - cons: disconnected from the DOM, not natural for XML / HTML documents
      - other pros and cons ?
   b&c. The value of an attribute of a single DOM node: the value of an attribute (the textContent attribute ?) of the Node type as defined in the DOM Level 3 Core recommendation http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
      - pros: well defined, based on the DOM
      - other pros and cons ?
      - ex Text node: for the first text node of the h2 element node, the annotated string would be: "How can I get my "
      - ex Text node: for the second text node of the h2 element node, the annotated string would be: " into the diagram? "
      - ex Attribute node: for the @title attribute node, the annotated string would be: "how to join"
      - ex Element node: for the h2 element node, the annotated string would be: "How can I get my dataset into the diagram? "
      - cons: for an element, all the inner markup disappears
        -> no more ITS info about the inner span
        -> for ITS 2.0, that would lead us to annotate only Attribute nodes and Text nodes.
        -> this is a conceptual shift from HTML+ITS2.0, where element nodes may be annotated and text nodes cannot.
      - other pros and cons ?
   - What about a set of DOM elements (like the XPath selectors of the global rules select)
      - a new class in the string such as str:StringSet ?
      - other pros and cons ?

We have discussed 3.a., 3.b. and 3.c. at length; any comment or idea for the specific use case of ITS 2.0 is welcome.

Whichever case we choose (string annotation or DOM node annotation), we are discussing a new NIF recipe inspired by XPath 1.0 to create a URI for that fragment.

What combination of 1, 2, and 3 do you think is the best compromise for ITS 2.0 ?
Examples and short descriptions of combinations may be given on demand.

Kind regards,
Maxime Lefrançois,
RDF TaskForce (John McCrae, Tadej Stajner, Sebastian Hellmann)
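
To make the difference between options 3.a and 3.b&c concrete, here is a minimal Python sketch run on the snippet from this message. It is only an illustration, not part of NIF or ITS 2.0: the names TextNodeCollector and collect_text_nodes are hypothetical, and the inheritance rule it applies (a text node takes the translate value of its nearest ancestor that sets one, with "yes" assumed as the default) is a simplification. Attribute nodes such as @title are not handled.

```python
from html.parser import HTMLParser

# The snippet from the message, written with straight quotes.
SNIPPET = (
    '<html><head></head><body>'
    '<h2 title="how to join" translate="yes" >How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
    '</body></html>'
)

# Elements without an end tag; they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TextNodeCollector(HTMLParser):
    """Collects (text, translate) pairs for non-blank text nodes."""

    def __init__(self):
        super().__init__()
        self.stack = ["yes"]  # assumption: translate defaults to "yes"
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        value = dict(attrs).get("translate")
        # Override when the element sets translate, otherwise inherit.
        self.stack.append(value if value in ("yes", "no") else self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.nodes.append((data, self.stack[-1]))


def collect_text_nodes(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


# Option 3.b&c: one annotated string per text node, markup excluded.
nodes = collect_text_nodes(SNIPPET)
# What textContent of the h2 element would give: inner markup is lost.
text_content = "".join(text for text, _ in nodes)

# Option 3.a: a slice of the serialized HTML string, markup included.
start = SNIPPET.index("<h2")
end = SNIPPET.index("</h2>") + len("</h2>")
h2_string = SNIPPET[start:end]

for text, translate in nodes:
    print(repr(text), translate)
print(repr(text_content))
print((start, end), h2_string)
```

The per-text-node view keeps the translate="no" override on "dataset", whereas the concatenated textContent of the h2 no longer carries it, which is the "cons" noted under 3.b&c. The string slice keeps the markup, but its offsets change as soon as the serialized document is modified.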
Received on Thursday, 21 June 2012 08:16:35 UTC