mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF from MultilingualWeb-LT Working Group Issue Tracker on 2012-06-21 (public-multilingualweb-lt@w3.org from June 2012)

From: MultilingualWeb-LT Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Thu, 21 Jun 2012 08:16:27 +0000
To: public-multilingualweb-lt@w3.org
Message-Id: <E1ShcZ1-00034p-Gj@tibor.w3.org>

mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF

http://www.w3.org/International/multilingualweb/lt/track/issues/29

Raised by: Maxime Lefrançois
On product:

Hi all,

Some mapping to RDF will work, and using NIF may make round-tripping possible.
There are still many issues that we need to address.

One of them is:
To ease (or even enable) the propagation and overriding of ITS properties such as its:translate, we need to capture part of the DOM tree structure in the RDF. Verbosity and instability problems arise when we go too far in this direction.

We give below a short description of the possible solutions for mapping ITS to RDF. To illustrate our arguments, we use the following HTML document snippet, adapted from http://richard.cyganiak.de/2007/10/lod/#how-to-join :
<html>
<head>
</head>
<body>
<h2 title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>
</body>
</html>
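
To make the propagation and overriding of its:translate concrete, here is a minimal Python sketch that walks the snippet above and reports the effective translate value of each piece of text. It is only an illustration: the class and function names are hypothetical, the default value "yes" for undeclared content is an assumption, and void elements such as <br> are not handled.

```python
from html.parser import HTMLParser

# The snippet from this message, with straight quotes.
SNIPPET = (
    '<html><head></head><body>'
    '<h2 title="how to join" translate="yes" >How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
    '</body></html>'
)


class TranslateResolver(HTMLParser):
    """Collects (text, effective translate value) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = ["yes"]  # assumed default when nothing is declared
        self.results = []

    def handle_starttag(self, tag, attrs):
        declared = dict(attrs).get("translate")
        # A local declaration overrides; otherwise the parent's value propagates.
        if declared in ("yes", "no"):
            self.stack.append(declared)
        else:
            self.stack.append(self.stack[-1])

    def handle_endtag(self, tag):
        # Leaving an element restores the value of its parent.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.results.append((data, self.stack[-1]))


def resolve_translate(html):
    parser = TranslateResolver()
    parser.feed(html)
    parser.close()
    return parser.results


for text, value in resolve_translate(SNIPPET):
    print(repr(text), "translate =", value)
```

An RDF representation that drops the nesting of h2 and span would have to record these resolved values explicitly, since they could no longer be recomputed.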

Generally, we distinguish between solutions on three aspects:

1. ITS info may be in HTML or in RDF or both.
a. ITS info is removed from the HTML to be only present in RDF
- pros: clear separation between ITS and the document, no redundancy
- cons: this implies progressive modifications to the document (NIF problems)
b. RDF coexists with the ITS attributes in the HTML
- pros: the document is kept unchanged
- cons: redundancy

2. Three possible ways to attach a set of triples to a document (a combination of all three is possible):
a. inline RDFa (RDFa is usually used to make text that's already human-readable machine-readable too.) ;
- cons: We face lots of issues with inline RDFa annotation. For instance, when we want to add inline RDFa progressively during the process, we would need to modify every NIF URI computed so far.
b. in the head: using a script element and a media type of text/n3, text/turtle or application/rdf+xml ;
c. in the head: using a link element to refer to another document.
- pros of 2.b. and 2.c.: NIF works well
- The offset-based recipe only needs a tiny extension so that offset 0 stands just before the '<' of "<body... "
- The hash-based recipe only needs to be tweaked a little.
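
The offset convention mentioned for 2.b. and 2.c. can be sketched as follows in Python. This is a rough illustration, not the NIF recipe itself: the function names and the "#offset_begin_end" fragment shape are assumptions chosen for the example. The point is that offsets are counted from the '<' of "<body", so triples added to the head do not shift them.

```python
# The snippet from this message, with straight quotes.
SNIPPET = (
    '<html><head></head><body>'
    '<h2 title="how to join" translate="yes" >How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
    '</body></html>'
)


def body_relative_offsets(html, substring):
    """Return (begin, end) of substring, counted from the '<' of '<body'."""
    body_start = html.index("<body")
    begin = html.index(substring, body_start) - body_start
    return begin, begin + len(substring)


def offset_fragment(begin, end):
    """Build a fragment identifier (hypothetical format)."""
    return "#offset_{}_{}".format(begin, end)


begin, end = body_relative_offsets(SNIPPET, "dataset")
print(offset_fragment(begin, end))

# Adding content to the head leaves body-relative offsets unchanged.
with_script = SNIPPET.replace(
    "<head>", '<head><script type="text/turtle"></script>'
)
print(body_relative_offsets(with_script, "dataset") == (begin, end))
```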

3. Annotating a part of the HTML string, or (an Element node | an Attribute node) in the DOM, or (a Text node | an Attribute node) in the DOM
The research issue NIF has to face is whether we annotate a part of the HTML string or of the HTML DOM.
a. Part of the HTML String:
- for the h2 in the HTML Document Snippet above, the annotated string would be: "<h2 title="how to join" translate="yes" >How can I get my <span translate="no">dataset</span> into the diagram? </h2>"
- pros: well defined, based on the HTML String,
- cons: disconnected from the DOM, not natural for XML / HTML documents
- other pros and cons ?
b&c. The value of an attribute of a single DOM node: the value of an attribute (the textContent attribute?) of the Node type as defined in the DOM Level 3 Core recommendation http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
- pros: well defined, based on the DOM
- other pros and cons ?
- ex Text node: for the first text node of the h2 element node, the annotated string would be: "How can I get my "
- ex Text node: for the second text node of the h2 element node, the annotated string would be: " into the diagram? "
- ex Attribute node: for the @title attribute node, the annotated string would be: "how to join"
- ex Element node: for the h2 element node, the annotated string would be: "How can I get my dataset into the diagram? "
- cons: for an element, all the inner markup disappears -> no more ITS info about the inner span -> for ITS 2.0 that would lead us to annotate only Attribute nodes and Text nodes. -> there is a conceptualization shift with respect to HTML+ITS 2.0, where element nodes may be annotated and text nodes can't.
- other pros and cons ?
- What about a set of DOM elements (like those selected by the XPath selectors of the global rules)?
- a new class in the string such as str:StringSet ?
- other pros and cons ?
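
The annotated strings listed under b&c can be reproduced with a short Python sketch using the standard library's minidom, which works here because the snippet happens to be well-formed XML. The helper text_content is a hypothetical approximation of the DOM textContent attribute (minidom does not provide it), so this illustrates the idea rather than a reference behaviour.

```python
from xml.dom import minidom

# The snippet from this message, with straight quotes.
SNIPPET = (
    '<html><head></head><body>'
    '<h2 title="how to join" translate="yes" >How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
    '</body></html>'
)


def text_content(node):
    """Approximate DOM textContent: concatenate descendant text nodes."""
    if node.nodeType in (node.TEXT_NODE, node.ATTRIBUTE_NODE):
        return node.nodeValue
    return "".join(text_content(child) for child in node.childNodes)


doc = minidom.parseString(SNIPPET)
h2 = doc.getElementsByTagName("h2")[0]
text_nodes = [n for n in h2.childNodes if n.nodeType == n.TEXT_NODE]

first_text = text_content(text_nodes[0])    # first Text node of h2
second_text = text_content(text_nodes[1])   # second Text node of h2
title_value = text_content(h2.getAttributeNode("title"))  # Attribute node
element_text = text_content(h2)             # Element node, inner markup lost

for label, value in [
    ("text 1", first_text),
    ("text 2", second_text),
    ("@title", title_value),
    ("h2", element_text),
]:
    print(label, repr(value))
```

The last value shows the drawback noted above: once the h2 element is reduced to its text, nothing indicates that "dataset" was wrapped in a span with translate="no".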

We discussed 3.a., 3.b. and 3.c. at length; any comment or idea for the specific use case of ITS 2.0 is welcome.
Whichever case we choose (String annotation or DOM node annotation), we are discussing a new NIF recipe inspired by XPath 1.0 to create a URI for that fragment.
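
As an illustration of what an XPath-inspired recipe could look like for DOM node annotation, here is a hypothetical Python sketch. The recipe is still under discussion, so everything below is an assumption made for the example: the function names, the "#xpath(...)" fragment shape, and the example document URI. A real recipe would also have to decide how to escape characters such as '[' and ']' in the fragment.

```python
from xml.dom import minidom

# The snippet from this message, with straight quotes.
SNIPPET = (
    '<html><head></head><body>'
    '<h2 title="how to join" translate="yes" >How can I get my '
    '<span translate="no">dataset</span> into the diagram? </h2>'
    '</body></html>'
)


def xpath_step(node):
    """Return one location step, e.g. 'h2[1]' or 'text()[2]'."""
    siblings = node.parentNode.childNodes
    if node.nodeType == node.TEXT_NODE:
        test = "text()"
        same = [s for s in siblings if s.nodeType == node.TEXT_NODE]
    else:
        test = node.tagName
        same = [
            s for s in siblings
            if s.nodeType == node.ELEMENT_NODE and s.tagName == node.tagName
        ]
    position = next(i for i, s in enumerate(same, 1) if s is node)
    return "{}[{}]".format(test, position)


def xpath_for(node):
    """Build an absolute XPath 1.0 style path for an element, text or attribute node."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        return xpath_for(node.ownerElement) + "/@" + node.name
    steps = []
    while node.parentNode is not None:  # stop at the Document node
        steps.append(xpath_step(node))
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


def fragment_uri(document_uri, node):
    """Hypothetical URI shape; not an agreed NIF recipe."""
    return "{}#xpath({})".format(document_uri, xpath_for(node))


doc = minidom.parseString(SNIPPET)
h2 = doc.getElementsByTagName("h2")[0]
print(fragment_uri("http://example.org/doc.html", h2))
print(xpath_for(h2.childNodes[2]))
print(xpath_for(h2.getAttributeNode("title")))
```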

What combination of 1, 2, and 3 do you think is the best compromise for ITS 2.0 ?
Examples and short descriptions of combinations may be given on demand.

Kind regards,
Maxime Lefrançois,
RDF TaskForce (John McCrae, Tadej Stajner, Sebastian Hellmann)

Received on Thursday, 21 June 2012 08:16:35 UTC