Re: mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF from Sebastian Hellmann on 2012-06-21 (public-multilingualweb-lt@w3.org from June 2012)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 21 Jun 2012 22:07:21 +0200
To: Jirka Kosek <jirka@kosek.cz>
CC: MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>, MultilingualWeb-LT Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-ID: <4FE37EF9.9020801@informatik.uni-leipzig.de>

Hi Jirka,

On 06/21/2012 03:40 PM, Jirka Kosek wrote:
> On 21.6.2012 10:16, MultilingualWeb-LT Working Group Issue Tracker wrote:
>
>> Generally, we distinguish between solutions along three aspects:
>>
>> 1. ITS info may be in HTML or in RDF or both.
>>   b. RDF coexists with ITS attributes in the HTML
>>       - pros: the document is kept unchanged
>>       - cons: redundancy
> I think that keeping the document unchanged is a non-goal. Having the
> same information expressed in two different ways means that it is very
> likely to diverge, and then you will have to decide what takes
> precedence.
>
> I think that mapping from HTML+ITS to RDFa should strip all original
> its-* attributes and global rules referenced by<link rel="itsrules">
>
>> 2. three possible ways to attach a set of triples to a document (may be a combination of all three):
>>   a. inline RDFa (RDFa is usually used to make text that's already human-readable machine-readable too.) ;
>>       - cons: We face lots of issues with inline RDFa annotation. For instance, if we want to add inline RDFa progressively during the process, we'd need to modify every NIF URI computed so far.
> Maybe you can use something other than NIF then. For example, the XPath
> location of the element that has its-* attached to it?
Well, your chainsaw picture still seems quite appropriate. For RDFa to
work *at all*, you will need to create URIs. XPath is a candidate for
that, and it is straightforward to append after a '#', which is very
similar to the way NIF uses URIs. What features would you like that
XPath location to have? Felix mentioned that evaluating XPath can be
quite expensive. Could you give us a figure for what would be feasible?
Something like "#xpath_html/1/body/1/h2/1" would allow using SAX
instead of DOM, but such an XPath would also not be 100% proof against
changes, right? The only 100% stable way to use RDFa (and also RDF) is
to create URIs based on element ids. Still, it remains impossible to
mark up everything inline, and you would need additional triples either
in the script or in parallel in the DOM.
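A minimal sketch of the two addressing options above (element id when available, positional path otherwise), assuming the document is well-formed enough to parse with Python's xml.etree.ElementTree; real-world HTML would need an HTML parser. The `xpath_` prefix and the tag/position layout are copied from the example string "#xpath_html/1/body/1/h2/1"; `fragment_for`, the id-first rule and the `/@attr` suffix for attributes are hypothetical, not part of NIF or ITS.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root: ET.Element) -> dict:
    """Map every element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}


def xpath_steps(elem: ET.Element, parents: dict) -> list:
    """(tag, 1-based position among same-tag siblings), from the root down."""
    steps = []
    while elem is not None:
        parent = parents.get(elem)
        if parent is None:
            position = 1  # the root element has no siblings
        else:
            same_tag = [c for c in parent if c.tag == elem.tag]
            position = same_tag.index(elem) + 1
        steps.append((elem.tag, position))
        elem = parent
    return list(reversed(steps))


def fragment_for(elem: ET.Element, parents: dict, attribute=None) -> str:
    """Hypothetical scheme: prefer the element id (stable), else a positional path."""
    elem_id = elem.get("id")
    if elem_id is not None:
        frag = elem_id
    else:
        frag = "xpath_" + "/".join(f"{tag}/{pos}" for tag, pos in xpath_steps(elem, parents))
    if attribute is not None:
        frag += "/@" + attribute  # ITS can also annotate attributes
    return "#" + frag


doc = ET.fromstring(
    "<html><head/><body><h2>How can I get my "
    '<span id="id_XX">dataset</span> into the diagram?</h2>'
    "<h2>Second</h2></body></html>"
)
parents = build_parent_map(doc)
h2s = doc.findall("./body/h2")
print(fragment_for(h2s[0], parents))               # #xpath_html/1/body/1/h2/1
print(fragment_for(h2s[1], parents))               # #xpath_html/1/body/1/h2/2
print(fragment_for(doc.find(".//span"), parents))  # #id_XX
```

The positional path shifts as soon as a sibling is inserted before the element, while the id-based fragment survives such edits, which matches the point that only id-based URIs are fully stable.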

<html><head>
<!-- option 1 -->
<script type="text/turtle">
@prefix my: <http://domain.org/doc.html#> .
my:id_XX its:translate "false" .
</script></head>
<body><h2>How can I get my <span id="id_XX" about="my:id_XX" property="str:anchorOf">dataset</span> into the diagram?</h2>
<!-- option 2 -->
<span about="my:id_XX" property="translate" content="true"></span>
</body></html>


My personal opinion on this is that RDFa inline is possible, but might 
not be worth the trouble. Having RDF would be feasible, however.

>>   b. in the head: using a script element and a media type of text/n3, text/turtle or application/rdf+xml ;
>>   c. in the head: using a link element to refer to another document.
>>       - pros of 2.b. and 2.c.: NIF works well
>>           - The offset-based recipe only needs a tiny extension so that offset 0 stands just before the '<' of "<body... "
>>           - The hash-based recipe only needs to be tweaked a little.
> The more I think about NIF, which references locations by their character
> offset (please correct me if I'm wrong), the more I think it's a solution
> that will not work. Even a simple edit in the underlying HTML document
> will break all existing annotations.
The use case for this is not stability during editing. Rather, it
defines a transition from the RDF/NIF world to ITS. So, if you have
tools such as NERD, Open Calais or DBpedia Spotlight, you can define a
transition from their NIF-RDF output to ITS inline markup. I think the
conversion to ITS should be trivial.
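A minimal sketch of the offset-recipe extension quoted above (offset 0 just before the '<' of "<body") and of why edits break such annotations. It assumes the document is handled as a plain Python string, that offsets count characters, and that the body tag is written in lower case; `body_offsets` and `offset_uri` are hypothetical helpers, and the `offset_<begin>_<end>` name follows the ld:offset_717_729 example later in this mail.

```python
def body_origin(html: str) -> int:
    """Index of the '<' opening <body, used as offset 0 (assumes lower-case tag)."""
    origin = html.find("<body")
    if origin < 0:
        raise ValueError("no <body element found")
    return origin


def body_offsets(html: str, text: str) -> tuple:
    """(begin, end) of the first occurrence of `text` after <body, relative to it."""
    origin = body_origin(html)
    start = html.find(text, origin)
    if start < 0:
        raise ValueError(f"{text!r} not found after <body")
    return start - origin, start - origin + len(text)


def offset_uri(base: str, begin: int, end: int) -> str:
    """NIF-style offset URI: base#offset_<begin>_<end>."""
    return f"{base}#offset_{begin}_{end}"


base = "http://domain.org/doc.html"
before = "<html><head></head><body><h2>How can I get my dataset into the diagram?</h2></body></html>"
after = before.replace("<h2>", '<h2 class="q">')  # a small edit inside the body

print(offset_uri(base, *body_offsets(before, "dataset")))  # http://domain.org/doc.html#offset_27_34
print(offset_uri(base, *body_offsets(after, "dataset")))   # http://domain.org/doc.html#offset_37_44
```

Because the origin sits at <body, adding triples in the head (a script element, as in option 2.b) presumably leaves the offsets untouched, whereas any edit inside the body before the annotated text shifts them, which is exactly the breakage Jirka describes.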

>> We discussed 3.a., 3.b. and 3.c. a lot, ... any comment/idea for the specific use case of ITS 2.0 is welcome
>> Whatever case we choose (String annotation or DOM node annotation), we are discussing a new NIF recipe inspired by XPath 1.0 to create a URI for that fragment.
> ITS 1.0 is used to attach data categories to elements and attributes. So
> anything that is able to address DOM nodes of type Element and
> Attribute should work. Supporting DOM Text nodes is not necessary, as
> there is no corresponding XML representation.
OK, so all ITS annotations refer either to an Element or to an Attribute.
This should be feasible with a computationally efficient subset of XPath.
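The NIF offset solution is built upon RFC 5147 [1] in such a way that:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
ld:offset_717_729  owl:sameAs <http://www.w3.org/DesignIssues/LinkedData.html#char=717,729> .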
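A small sketch of the correspondence between the two fragment styles: RFC 5147 defines `char=` with a start and an end position, so the NIF name offset_717_729 maps to char=717,729. The helper names are hypothetical, and note that RFC 5147 is defined for text/plain, so applying it to an HTML source string is itself an assumption. The generated line also needs an `owl:` prefix declaration to be valid Turtle.

```python
import re

# NIF-style offset fragment: offset_<begin>_<end>
OFFSET_RE = re.compile(r"^offset_(\d+)_(\d+)$")


def nif_to_rfc5147(fragment: str) -> str:
    """'offset_717_729' -> 'char=717,729' (start and end positions, RFC 5147)."""
    match = OFFSET_RE.match(fragment)
    if match is None:
        raise ValueError(f"not a NIF offset fragment: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    return f"char={begin},{end}"


def same_as_triple(base: str, fragment: str) -> str:
    """One Turtle statement linking the NIF URI to its RFC 5147 counterpart."""
    return f"<{base}#{fragment}> owl:sameAs <{base}#{nif_to_rfc5147(fragment)}> ."


print(same_as_triple("http://www.w3.org/DesignIssues/LinkedData.html", "offset_717_729"))
```

Full IRIs are used for the `char=` side because `=` and `,` cannot appear unescaped in a Turtle prefixed name.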

If you were to include a URI scheme (using ids and/or a subset of XPath)
for RDF and RDFa in MLW-LT and ITS, we could extend NIF by building
upon it.
I would assume this would be a legit way to do it. IIRC, there were
some use cases for RDF (especially for provenance and for harvesting NER
tool output).
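On the consuming side, such a URI scheme also needs a resolver. Below is a minimal sketch for a hypothetical fragment layout: either a bare element id, or `xpath_` followed by tag/position pairs (as in "#xpath_html/1/body/1/h2/1"), optionally ending in `/@attribute` since ITS annotates attributes too. It assumes a well-formed tree parsed with xml.etree.ElementTree; `resolve` is an illustrative name, not an existing NIF or ITS API.

```python
import xml.etree.ElementTree as ET


def resolve(root: ET.Element, fragment: str):
    """Resolve '#id' or '#xpath_tag/n/tag/n...', with an optional '/@attr' suffix.

    Returns the Element, or the attribute value (str or None) when '/@attr' is given.
    """
    frag = fragment.lstrip("#")
    attribute = None
    if "/@" in frag:
        frag, attribute = frag.split("/@", 1)

    if frag.startswith("xpath_"):
        parts = frag[len("xpath_"):].split("/")
        if len(parts) % 2 != 0:
            raise ValueError("expected tag/position pairs")
        pairs = [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]
        root_tag, root_pos = pairs[0]
        if root_tag != root.tag or root_pos != 1:
            raise LookupError("root step does not match")
        node = root
        # Each step only scans the children of the current node.
        for tag, pos in pairs[1:]:
            same_tag = [c for c in node if c.tag == tag]
            if not 1 <= pos <= len(same_tag):
                raise LookupError(f"no {tag}[{pos}]")
            node = same_tag[pos - 1]
    else:
        # Any other fragment is treated as an element id (linear scan here).
        node = next((e for e in root.iter() if e.get("id") == frag), None)
        if node is None:
            raise LookupError(f"no element with id {frag!r}")

    return node if attribute is None else node.get(attribute)


doc = ET.fromstring(
    '<html><body><h2 title="q1">First</h2>'
    '<h2><span id="id_XX" translate="no">dataset</span></h2></body></html>'
)
print(resolve(doc, "#xpath_html/1/body/1/h2/2").find("span").text)  # dataset
print(resolve(doc, "#xpath_html/1/body/1/h2/1/@title"))            # q1
print(resolve(doc, "#id_XX/@translate"))                           # no
```

The positional branch stays cheap because it never evaluates general XPath, only child steps with a position; the id branch scans the whole tree here, and a real tool would probably build an id index once per document.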

All the best,
Sebastian

[1] http://tools.ietf.org/html/rfc5147



>
>     Jirka
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Thursday, 21 June 2012 20:07:51 UTC