Re: Resolution proposal for ISSUE-2 from Tadej Stajner on 2012-03-22 (public-multilingualweb-lt@w3.org from March 2012)

From: Tadej Stajner <tadej.stajner@ijs.si>
Date: Thu, 22 Mar 2012 15:47:11 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <4F6B3B6F.7060900@ijs.si>
On 3/22/2012 2:11 PM, Felix Sasaki wrote:
>
>
> On 22 March 2012 13:52, Jirka Kosek <jirka@kosek.cz> wrote:
>
>     On 22.3.2012 13:09, Felix Sasaki wrote:
>
>     > Solution 1) will be user friendly, and we will define a RELAX NG
>     > schema for HTML5+ITS (or + XYZ). The same approach has been taken
>     > for ARIA in the accessibility space, and ARIA is now even part of
>     > the HTML5 core language.
>     >
>     > Comments are very welcome. I hope we can agree on this during next
>     > week's call and find a volunteer for maintaining the schema and
>     > another one for the mappings.
>
>     I volunteer for creating and maintaining schema.
>
>
> Great, thanks a lot.
>
>
>     > Regarding the "URIs for element nodes in HTML5" discussion: Ivan
>     > said that our group should consider whether this is really an
>     > issue.
>
>     I would have expected a more decisive reply from the SW Activity lead :-)
>
>
> Well, to be fair, he was more precise:
>
> "RDFa does not include any definition, as far as the extracted RDF is 
> concerned, on pointing 'back' to the original source structure. This 
> should be done explicitly. I am not sure whether this is a major 
> issue, this is something for the group to consider..."
>
> But the essence is the same: is it important for us?

Some things to add (and to shed some light on ACTION-32):

I think it's important to define a way to do it, but not to make its
serialization obligatory, because it has no utility until someone
actually uses it in pure RDF. As long as the HTML document is available
and the RDFa is inlined, references to the HTML structure in the RDF add
no information and can be trivially reconstructed. RDFa consumption
tools can likely handle that kind of content as-is.
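To illustrate why the back-references are redundant while the HTML is at hand: an inlined RDFa resource value such as `#id_myElement_bar` is relative, and a consumer can resolve it against the document URL using ordinary URI reference resolution. The sketch below assumes a hypothetical document at `http://example.org/page.html`; the helper names are illustrative only.

```python
from urllib.parse import urldefrag, urljoin


def resolve_reference(document_url: str, resource: str) -> str:
    """Resolve an inlined RDFa resource value (e.g. "#id_myElement_bar")
    against the URL of the HTML document that contains it."""
    return urljoin(document_url, resource)


def element_id_for(document_url: str, uri: str):
    """Map an absolute fragment URI back to the element id in this document,
    or return None if the URI points somewhere else."""
    base, fragment = urldefrag(uri)
    if base == urldefrag(document_url).url and fragment:
        return fragment
    return None


doc = "http://example.org/page.html"  # hypothetical document URL
absolute = resolve_reference(doc, "#id_myElement_bar")
print(absolute)
print(element_id_for(doc, absolute))
```

Because both directions are mechanical, emitting these references explicitly only pays off once the HTML is dropped.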

The tricky case is when someone wants to get pure RDF out of this
(dropping the HTML in the process); we should have a specification
they could follow to produce these references. The use case I can think
of is feeding ITS-marked-up input into an NLP pipeline running on
something like NIF, which needs URIs for annotated fragments of text.
Luckily the conversion itself is pretty mechanical, so I see two
strategies for minting URIs that dereference directly to the fragment:
* have the RDF node point back to the HTML element's id, if there is one
(<meta property="its:annotates" resource="#id_myElement_bar" />)
* have the RDF node mint a URI for the fragment using one of the NIF
recipes (<meta property="its:annotates"
resource="#hash_1_3_12341234123412341_bar" />)
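The two minting strategies can be sketched as follows. The id-based variant simply reuses the element id. The hash-based variant is only loosely modeled on NIF's context-hash recipe (context length, string length, an MD5 digest of the string with its surrounding context, and a readable prefix, matching the shape of the `#hash_1_3_..._bar` example); the exact string NIF hashes and its encoding rules are assumptions here and should be taken from the NIF specification. Function names and the example URL are hypothetical.

```python
import hashlib
from urllib.parse import quote


def id_uri(document_url: str, element_id: str) -> str:
    # Strategy 1: point back to the HTML element's id.
    return f"{document_url}#{element_id}"


def hash_uri(document_url: str, text: str, begin: int, end: int,
             context_length: int = 10, prefix_length: int = 20) -> str:
    # Strategy 2 (hypothetical, NIF-like): hash the annotated string
    # together with a window of surrounding text, so the URI stays stable
    # even if unrelated parts of the document change.
    anchor = text[begin:end]
    left = text[max(0, begin - context_length):begin]
    right = text[end:end + context_length]
    digest = hashlib.md5(f"{left}({anchor}){right}".encode("utf-8")).hexdigest()
    # A short, URL-encoded prefix of the string keeps the URI human-readable.
    readable = quote(anchor[:prefix_length], safe="")
    return f"{document_url}#hash_{context_length}_{len(anchor)}_{digest}_{readable}"


doc = "http://example.org/doc.html"  # hypothetical document URL
print(id_uri(doc, "id_myElement_bar"))
print(hash_uri(doc, "foo bar baz", 4, 7, context_length=1))
```

The id-based URI is only available when the author supplied an id; the hash-based one can always be computed from the text, at the cost of being opaque.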

A question for people consuming RDF/RDFa: would defining this sort of
"URI generation recipe" at the RDFa consumption stage break too many
assumptions? I'd like to avoid having producers generate redundant data.

... and back to answering "how much RDF do we need?"
My reason for considering RDFa was to encode the additional information
we might have about the concepts behind the text. Right now the most
important uses are:
- the URI of the concept (the "means" relation);
- the type URI of the concept (see ISSUE-3) (the "this fragment
represents a concept of the type" relation);
- the labels of the concept in other languages.
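As a rough sketch of what those three uses amount to as RDF, the snippet below emits N-Triples for one annotated fragment. The "means" and "concept type" predicate IRIs are placeholders under a hypothetical `example.org` namespace (no such ITS vocabulary is implied); `rdfs:label` with language tags is the standard RDF Schema way to attach labels in several languages. All URIs in the example data are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical placeholder predicates for the "means" and type relations.
MEANS = "http://example.org/its-hypothetical#means"
CONCEPT_TYPE = "http://example.org/its-hypothetical#conceptType"
# Standard RDF Schema label predicate.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text: str, lang: str) -> str:
    # Escape backslashes and quotes for an N-Triples string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


@dataclass
class ConceptAnnotation:
    fragment_uri: str
    concept_uri: str
    type_uri: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)  # language tag -> label

    def to_ntriples(self) -> List[str]:
        # 1) the fragment "means" the concept
        lines = [f"<{self.fragment_uri}> <{MEANS}> <{self.concept_uri}> ."]
        # 2) the fragment represents a concept of the given type
        if self.type_uri is not None:
            lines.append(f"<{self.fragment_uri}> <{CONCEPT_TYPE}> <{self.type_uri}> .")
        # 3) labels of the concept in other languages (sorted for stable output)
        for lang in sorted(self.labels):
            lines.append(f"<{self.concept_uri}> <{RDFS_LABEL}> {literal(self.labels[lang], lang)} .")
        return lines


annotation = ConceptAnnotation(
    fragment_uri="http://example.org/doc.html#id_myElement_bar",
    concept_uri="http://example.org/concepts/Ljubljana",
    type_uri="http://example.org/types/City",
    labels={"en": "Ljubljana", "de": "Laibach"},
)
print("\n".join(annotation.to_ntriples()))
```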

Since we can model those via the proposed data categories, we don't need
explicit RDF support to represent them. It is, however, very important
that these predicates can point to URIs in the RDF space (as is
currently the case with its:termInfoRef, for instance), and that we at
least have a process in place for transforming "HTML5+ITS" into
HTML5/RDFa, HTML5/Microdata, or HTML5/RDFa Lite. The examples you
submitted look good for that purpose; adding an HTML URI generator
should cover that part.
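A minimal sketch of such a mechanical transformation with a built-in URI generator, using only Python's standard `html.parser`. It assumes ITS markup is carried in HTML5 as an `its-term-info-ref` attribute and that the target predicate is `termInfoRef` in the `http://www.w3.org/2005/11/its/rdf#` namespace; both names are assumptions for illustration. Elements with an id get an id-based URI; the fallback `its_gen_N` naming for elements without one is hypothetical, not dereferenceable unless written back into the HTML, and stable only while the document structure is unchanged (a NIF-style hash URI would be the sturdier choice).

```python
from html.parser import HTMLParser

# Assumed namespace for the RDF form of ITS properties.
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


class ItsToTriples(HTMLParser):
    """Collect (subject, predicate, object) triples from elements carrying
    an (assumed) its-term-info-ref attribute."""

    def __init__(self, document_url: str):
        super().__init__()
        self.document_url = document_url
        self.generated = 0  # counter for the hypothetical fallback ids
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        ref = attributes.get("its-term-info-ref")
        if ref is None:
            return
        element_id = attributes.get("id")
        if element_id is None:
            # Hypothetical URI generator for elements without an id.
            self.generated += 1
            element_id = f"its_gen_{self.generated}"
        subject = f"{self.document_url}#{element_id}"
        self.triples.append((subject, ITSRDF + "termInfoRef", ref))


parser = ItsToTriples("http://example.org/doc.html")  # hypothetical URL
parser.feed('<p>The <span id="t1" its-term-info-ref="http://example.org/terms/bar">bar</span>'
            ' and <span its-term-info-ref="http://example.org/terms/baz">baz</span>.</p>')
parser.close()
for s, p, o in parser.triples:
    print(f"<{s}> <{p}> <{o}> .")
```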

-- Tadej


>
>     Anyway, we probably shouldn't spend much time on the mappings, as I
>     can't imagine anyone using RDFa/microdata instead of simple attributes.
>
>
> I hope that the mapping can be fairly mechanical and will not need
> much time. Even if it is not created by hand, I can imagine tools like
> Enrycher easily generating it. Having a mapping of Enrycher output as
> input to schema.org-based SEO is then a nice scenario, IMO, but it
> depends on RDFa/microdata.
>
> Felix
>
>
>                                    Jirka
>
>     --
>     ------------------------------------------------------------------
>      Jirka Kosek      e-mail: jirka@kosek.cz
>     http://xmlguru.cz
>     ------------------------------------------------------------------
>           Professional XML consulting and training services
>      DocBook customization, custom XSLT/XSL-FO document processing
>     ------------------------------------------------------------------
>      OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>     ------------------------------------------------------------------
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Thursday, 22 March 2012 14:47:43 UTC