mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF

http://www.w3.org/International/multilingualweb/lt/track/issues/29

Raised by: Maxime Lefrançois
On product: 

Hi all,

Some mapping to RDF will work, and using NIF may make round-tripping possible.
There are still many issues that we need to address.

One of them is:
  To ease (or even enable) the propagation and overriding of ITS properties such as its:translate, we need to capture part of the DOM tree in the RDF. Verbosity and instability problems arise when we go too far in this direction.

We give below a short description of the solutions we may have to map ITS to RDF. To illustrate our arguments, we use the following HTML document snippet, adapted from http://richard.cyganiak.de/2007/10/lod/#how-to-join :
<html>
 <head>
 </head>
 <body>
  <h2  title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>
 </body>
</html>


Generally, we distinguish between solutions on three aspects:

1. ITS info may be in HTML or in RDF or both.
 a. ITS info is removed from the HTML to be only present in RDF
     - pros: clear separation between ITS and the document, no redundancy
     - cons: this implies progressive modifications to the document (NIF problems)
 b. RDF coexists with ITS attributes in the HTML
     - pros: the document is kept unchanged
     - cons: redundancy

2. Three possible ways to attach a set of triples to a document (possibly a combination of all three):
 a. inline RDFa (RDFa is usually used to make text that is already human-readable machine-readable too);
     - cons: We face many issues with inline RDFa annotation. For instance, if we add inline RDFa progressively during processing, every NIF URI computed so far has to be recomputed.
 b. in the head: using a script element with a media type of text/n3, text/turtle or application/rdf+xml;
 c. in the head: using a link element to refer to another document.
     - pros of 2.b. and 2.c.: NIF works well
         - The offset-based recipe only needs a tiny extension so that offset 0 stands just before the '<' of "<body... "
         - The hash-based recipe only needs to be tweaked a little.
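
To make the offset argument of 2.a. versus 2.b. concrete, here is a minimal Python sketch. It assumes a whitespace-normalized copy of the snippet above; the helper body_offsets, the script payload and the ex:heading property are hypothetical names used for illustration only, not part of NIF or ITS.

```python
from typing import Tuple

# Whitespace-normalized copy of the snippet used in this message.
SNIPPET = (
    "<html>\n"
    " <head>\n"
    " </head>\n"
    " <body>\n"
    '  <h2 title="how to join" translate="yes">How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>\n'
    " </body>\n"
    "</html>\n"
)


def body_offsets(html: str, fragment: str) -> Tuple[int, int]:
    """Return (begin, end) of `fragment`, with offset 0 just before '<body'."""
    origin = html.index("<body")
    begin = html.index(fragment, origin) - origin
    return begin, begin + len(fragment)


# Hypothetical payloads, for illustration only.
HEAD_SCRIPT = ' <script type="text/turtle"># triples would go here</script>\n'
INLINE_RDFA = 'property="ex:heading" '

# 2.b.: triples go into the head. 2.a.: an RDFa attribute is added inline.
with_head_script = SNIPPET.replace(" </head>", HEAD_SCRIPT + " </head>", 1)
with_inline_rdfa = SNIPPET.replace("<h2 ", "<h2 " + INLINE_RDFA, 1)

original = body_offsets(SNIPPET, "dataset")
after_head_script = body_offsets(with_head_script, "dataset")
after_inline_rdfa = body_offsets(with_inline_rdfa, "dataset")

# Head insertions leave body-relative offsets untouched ...
print(original == after_head_script)
# ... while inline RDFa shifts them by the length of the added markup.
print(after_inline_rdfa[0] - original[0] == len(INLINE_RDFA))
```

With these assumptions, both comparisons should hold: adding to the head does not disturb offsets counted from "<body", whereas any inline attribute shifts every offset that follows it.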

3. Annotating (a) a part of the HTML String, (b) an Element node or an Attribute node in the DOM, or (c) a Text node or an Attribute node in the DOM
The research issue NIF has to face is whether we annotate a part of the HTML String or of the HTML DOM
 a. Part of the HTML String:
     - for the h2 in the HTML document snippet above, the annotated string would be: "<h2  title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>"
     - pros: well defined, based on the HTML String
     - cons: disconnected from the DOM, not natural for XML / HTML documents
     - other pros and cons ?
 b&c. The value of an attribute of a single DOM node: the value of an attribute (the textContent attribute ?) of the Node type as defined in the DOM Level 3 Core recommendation http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
     - pros: well defined, based on the DOM
     - other pros and cons ?
     - ex Text node: for the first text node of the h2 element node, the annotated string would be: "How can I get my "
     - ex Text node: for the second text node of the h2 element node, the annotated string would be: " into the diagram? "
     - ex Attribute node: for the @title attribute node, the annotated string would be: "how to join"
     - ex Element node: for the h2 element node, the annotated string would be: "How can I get my dataset into the diagram? "
         - cons: for an element, all the inner markup disappears -> no more ITS info about the inner span -> for ITS 2.0 that would lead us to annotate only Attribute nodes and Text nodes. -> this is a conceptual shift from HTML+ITS2.0, where element nodes may be annotated and text nodes cannot.
         - other pros and cons ?
         - What about a set of DOM elements (like those selected by the XPath selectors of the global rules) ?
         - a new class in the str: vocabulary, such as str:StringSet ?
         - other pros and cons ?
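
The four examples of 3.b&c. can be reproduced with a small Python sketch. It assumes the h2 element is parsed as well-formed XML with the standard xml.dom.minidom module; since minidom has no textContent attribute, the helper text_content below is a hypothetical approximation of DOM Level 3 textContent, limited to the node types discussed here.

```python
from xml.dom import minidom, Node

# The h2 element of the snippet, whitespace-normalized so it parses as XML.
H2 = (
    '<h2 title="how to join" translate="yes">How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
)


def text_content(node):
    """Approximate DOM Level 3 textContent for Text, Attr and Element nodes."""
    if node.nodeType in (Node.TEXT_NODE, Node.ATTRIBUTE_NODE):
        return node.nodeValue
    # For an element: concatenate the text content of its children,
    # skipping comments and processing instructions.
    skipped = (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)
    return "".join(
        text_content(child)
        for child in node.childNodes
        if child.nodeType not in skipped
    )


h2 = minidom.parseString(H2).documentElement
text_nodes = [n for n in h2.childNodes if n.nodeType == Node.TEXT_NODE]

first_text = text_content(text_nodes[0])
second_text = text_content(text_nodes[1])
title_value = text_content(h2.getAttributeNode("title"))
element_text = text_content(h2)

print(repr(first_text))
print(repr(second_text))
print(repr(title_value))
print(repr(element_text))
```

The last value illustrates the drawback noted above: at the element level the inner span, and with it the translate="no" information, is no longer visible in the annotated string.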
 
We discussed 3.a., 3.b. and 3.c. at length; any comment or idea for the specific use case of ITS 2.0 is welcome.
Whichever case we choose (String annotation or DOM node annotation), we are discussing a new NIF recipe inspired by XPath 1.0 to create a URI for that fragment.
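
As a rough idea of what such an XPath-inspired recipe could look like for DOM nodes, here is a Python sketch. Everything in it is an assumption made for illustration: the "#xpath(...)" fragment syntax, the base URI http://example.com/doc.html and the helpers xpath_of and fragment_uri are hypothetical, not an agreed NIF recipe.

```python
from xml.dom import minidom, Node

# Whitespace-normalized copy of the snippet, parsed here as well-formed XML.
SNIPPET = (
    "<html>\n <head>\n </head>\n <body>\n"
    '  <h2 title="how to join" translate="yes">How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>\n'
    " </body>\n</html>\n"
)

BASE_URI = "http://example.com/doc.html"  # hypothetical document URI


def xpath_of(node):
    """Build a positional XPath 1.0 expression selecting `node`."""
    if node.nodeType == Node.DOCUMENT_NODE:
        return ""
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return xpath_of(node.ownerElement) + "/@" + node.name
    parent = node.parentNode
    if node.nodeType == Node.TEXT_NODE:
        step = "text()"
        siblings = [n for n in parent.childNodes
                    if n.nodeType == Node.TEXT_NODE]
    else:
        step = node.tagName
        siblings = [n for n in parent.childNodes
                    if n.nodeType == Node.ELEMENT_NODE
                    and n.tagName == node.tagName]
    # XPath positions are 1-based.
    position = next(i for i, n in enumerate(siblings, 1) if n is node)
    return "{}/{}[{}]".format(xpath_of(parent), step, position)


def fragment_uri(node):
    """Hypothetical recipe: wrap the XPath in an 'xpath(...)' fragment."""
    return "{}#xpath({})".format(BASE_URI, xpath_of(node))


document = minidom.parseString(SNIPPET)
h2 = document.getElementsByTagName("h2")[0]
first_text = h2.childNodes[0]
title = h2.getAttributeNode("title")

print(fragment_uri(h2))
print(fragment_uri(first_text))
print(fragment_uri(title))
```

Positional steps keep the URI unambiguous, but they also show the instability problem mentioned at the top: inserting a sibling element before the h2 would change the URIs of everything after it.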

What combination of 1, 2, and 3 do you think is the best compromise for ITS 2.0 ?
Examples and short descriptions of combinations can be given on request.

Kind regards, 
Maxime Lefrançois, 
RDF TaskForce (John McCrae, Tadej Stajner, Sebastian Hellmann)

Received on Thursday, 21 June 2012 08:16:35 UTC