Re: mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF from Felix Sasaki on 2012-06-21 (public-multilingualweb-lt@w3.org from June 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 21 Jun 2012 22:22:45 +0200
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Cc: Jirka Kosek <jirka@kosek.cz>, MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
Message-ID: <CAL58czqF4tEndOuxzFsXLEYtxfNY+6EHkZE98J7hpNgK9zcEyg@mail.gmail.com>
2012/6/21 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Jirka,
>
>
> On 06/21/2012 03:40 PM, Jirka Kosek wrote:
>
>> On 21.6.2012 10:16, MultilingualWeb-LT Working Group Issue Tracker wrote:
>>
>>  Generally, we distinguish between solutions on three aspects:
>>>
>>> 1. ITS info may be in HTML or in RDF or both.
>>>  b. RDF coexist with ITS attributes in the HTML
>>>      - pros: the document is kept unchanged
>>>      - cons: redundancy
>>>
>> I think that keeping the document unchanged is a non-goal. Having the same
>> information expressed in two different ways means that it is very
>> likely that such information will diverge, and you will then have to decide
>> what takes precedence.
>>
>> I think that mapping from HTML+ITS to RDFa should strip all original
>> its-* attributes and global rules referenced by <link rel="itsrules">.
>>
>>  2. three possible ways to attach a set of triples to a document (may be
>>> a combination of all three):
>>>  a. inline RDFa (RDFa is usually used to make text that's already
>>> human-readable machine-readable too.) ;
>>>      - cons: We face lots of issues with RDFa inline annotation. For
>>> instance, when we want to add RDFa inline progressively during the process:
>>> we'd need to modify every NIF URIs computed so far.
>>>
>> Maybe you can use something different than NIF then. For example, the XPath
>> location of the element that has its-* attached to it?
>>
> Well, your chainsaw picture still seems quite appropriate. For RDFa to
> work *at all* , you will need to create URIs. XPath is a candidate for that
> and it is straightforward to append after a '#', which is very similar to
> the way NIF uses URIs.  What features would you like that XPath location to
> have? Felix mentioned that evaluating XPath can be quite expensive. Could
> you give us a figure for what would be feasible?  Something like
> "#xpath_html/1/body/1/h2/1" would allow using SAX instead of DOM, but
> XPath would also not be 100% proof against changes, right?


Just FYI, without arguing for anything, for the ITS 1.0 test suite
http://www.w3.org/International/its/tests/
we created something like this
/{}myMetaDoc/{}body[1]/{}insert[1]/{myChineseMakupLanguage}书籍[1]
to identify each element and attribute node - taken from a "path" attribute
at
http://www.w3.org/International/its/tests/test1/Translate1-result.xml

The format expands namespaces (if there is none, there are empty curly
brackets).
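
As a rough illustration only, here is a minimal Python sketch of how such
paths could be generated. It is not the code used for the test suite; the
function names are made up for this example, and it rests on the assumptions
that the root step carries no index and that each index counts preceding
siblings with the same expanded name. Attribute nodes are left out.

```python
import xml.etree.ElementTree as ET


def expanded_name(tag):
    """Return the tag in {namespace}local form, using {} when there is no namespace."""
    # ElementTree already stores namespaced tags as "{uri}local".
    return tag if tag.startswith("{") else "{}" + tag


def element_paths(root):
    """Collect one path per element, in document order (hypothetical helper)."""
    paths = []

    def walk(elem, path):
        paths.append(path)
        counts = {}  # per-parent counter for each expanded child name
        for child in elem:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            name = expanded_name(child.tag)
            counts[name] = counts.get(name, 0) + 1
            walk(child, f"{path}/{name}[{counts[name]}]")

    # Assumption: the root step has no positional index, as in the path above.
    walk(root, "/" + expanded_name(root.tag))
    return paths


doc = (
    "<myMetaDoc><body><insert>"
    '<b:书籍 xmlns:b="myChineseMakupLanguage"/>'
    "</insert></body></myMetaDoc>"
)
for p in element_paths(ET.fromstring(doc)):
    print(p)
```

Since only a per-parent counter is needed, the same bookkeeping would also
work in a streaming (SAX-style) parser.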

Felix


> The only 100% stable way to use RDFa (and also RDF) is to create URIs
> based on element ids. Still it remains impossible to mark up everything
> inline and you would need additional triples either in the script or
> parallel in DOM.
>
> <html><head>
> <!-- option 1-->
> <script type="text/turtle">
> @prefix my: <http://domain.org/doc.html#> .
> my:id_XX its:translate "false" .
> </script></head>
> <body>  <h2>How can I get my<span id="id_XX" about="my:id_XX"
> property="str:anchorOf">dataset</span>  into the diagram?</h2>
> <!-- option 2 -->
> <span about="my:id_XX"   property="translate" content="true" />
>
>
> My personal opinion on this is that RDFa inline is possible, but might not
> be worth the trouble. Having RDF would be feasible, however.
>
>
>   b. in the head: using a script element and a media type of text/n3,
>>> text/turtle or application/rdf+xml ;
>>>  c. in the head: using a link element to refer to another document.
>>>      - pros of 2.b. and 2.c.: NIF works well
>>>          - The offset-based recipe only needs a tiny extension so
>>> that offset 0 stands just before the '<' of "<body... "
>>>          - The hash-based recipe only needs to be tweaked a little.
>>>
>> The more I think about NIF, which references locations based on character
>> offset (please correct me if I'm wrong), the more I think it's a solution
>> that will not work. Even a simple edit in the underlying HTML document will
>> break all existing annotations.
>>
> The use case for this is not to be stable during editing. It rather
> defines a transition from the RDF/NIF World to ITS. So, if you have tools
> such as NERD, Open Calais, DBpedia Spotlight, you can define a transition
> from their NIF-RDF output to ITS inline markup. I think the conversion to
> ITS should be trivial.
>
>
>  We discussed a lot about this 3.a., 3.b. and 3.c., ... any comment/idea
>>> for the specific use case of ITS 2.0 is welcome
>>> Whatever case we choose (String annotation or DOM node annotation), we
>>> are discussing a new NIF recipe inspired by XPath 1.0 to create a URI
>>> for that fragment.
>>>
>> ITS 1.0 is used to attach data categories to elements and attributes. So
>> anything that is able to address DOM nodes of type Element and
>> Attribute should work. Supporting just DOM Text nodes is not necessary as
>> there is no corresponding XML representation.
>>
> Ok, so all ITS annotations refer to either an Element or an Attribute.
> This should be feasible with a computationally efficient subset of XPath.
> The NIF offset solution is built upon RFC 5147 [1] in a way that
> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> ld:offset_717_729  owl:sameAs ld:char=717,12 .
>
> If you were to include a URI scheme (using id and/or subset of xpath) for
> RDF and RDFa in  MLW-LT and ITS, we could extend NIF by building upon it.
> I would assume, this would be a legit way to do it.  IIRC there were some
> use cases for RDF (especially for provenance and for harvesting NER tool
> output ).
>
> All the best,
> Sebastian
>
> [1] http://tools.ietf.org/html/rfc5147
>
>
>
>
>>                                Jirka
>>
>>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 21 June 2012 20:23:12 UTC