Re: Resolution proposal for ISSUE-2 from Phil Ritchie on 2012-03-22 (public-multilingualweb-lt@w3.org from March 2012)

From: Phil Ritchie <philr@vistatec.ie>
Date: Thu, 22 Mar 2012 21:12:42 +0000
To: "Felix Sasaki" <fsasaki@w3.org>
Cc: "Tadej Stajner" <tadej.stajner@ijs.si>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <7CC5E685-FCE4-4959-8A84-0B0F27704F4C@vistatec.ie>
I'm afraid I need to do some serious reading over the weekend on RDFa and Microdata before I'll feel qualified to contribute properly to the discussion.

The important considerations for me relate to parsability, but all of the proposals seem to provide a well-structured, unambiguous, simply tokenised format.

Phil



On 22 Mar 2012, at 17:18, "Felix Sasaki" <fsasaki@w3.org> wrote:

> Thank you, Tadej. Trying to summarize what you say: we need
> 
> 1) HTML5 + ITS (or XYZ) schema 
> 2) Algorithm for transforming "HTML5+ITS" into HTML5/RDFa, /Microdata, or /RDFa Lite. Could we say we just cover RDFa Lite?
> 3) Algorithm (what you wrote below) to generate URIs in RDFa
> 
> Your question about "A question for people consuming RDF/RDFa" still needs an answer, but otherwise I think we are done with this. Any thoughts by others, esp. implementors in the group? 
> 
> Felix
> 
> On 22 March 2012 at 15:47, Tadej Stajner <tadej.stajner@ijs.si> wrote:
> On 3/22/2012 2:11 PM, Felix Sasaki wrote:
>> 
>> 
>> 
>> On 22 March 2012 at 13:52, Jirka Kosek <jirka@kosek.cz> wrote:
>> On 22.3.2012 13:09, Felix Sasaki wrote:
>> 
>> > Solution 1) will be user-friendly, and we will define a RELAX NG schema for
>> > HTML5+ITS (or + XYZ). The same approach has been taken for ARIA in the
>> > accessibility space, and ARIA is now even part of the HTML5 core language.
>> >
>> > Comments are very welcome. I hope we can agree on this during next week's call
>> > and find a volunteer for maintaining the schema and another one for the
>> > mappings.
>> 
>> I volunteer for creating and maintaining schema.
>> 
>> Great, thanks a lot. 
>> 
>> > Regarding the "URIs for element nodes in HTML5" discussion: Ivan said that
>> > our group should consider whether this is really an issue.
>> 
>> I would have expected a more definite reply from the SW activity lead :-)
>> 
>> Well, to be fair, he was more precise:
>> 
>> "RDFa does not include any definition, as far as the extracted RDF is concerned, on pointing 'back' to the original source structure. This should be done explicitly. I am not sure whether this is a major issue, this is something for the group to consider..."
>> 
>> But the essence is the same: is it important for us?
>>  
> 
> Some things to add (and to shed some light on ACTION-32):
> 
> I think it's important to define a way to do it, but not to make serializing it obligatory, because it has zero utility until someone actually uses it in pure RDF. As long as the HTML document is available and the RDFa is inlined, the references to the HTML structure in RDF don't add any information and can be trivially reconstructed. RDFa consumption tools can likely handle that kind of content as-is.
> 
> The tricky case is when someone at some point wants to get pure RDF from this (dropping the HTML in the process); we should have some specification that they can follow to produce these references. The use case I can think of is feeding ITS-marked-up input into an NLP pipeline running on something like NIF, which needs URIs for annotated fragments of text. Luckily the conversion itself is pretty mechanical, so I see some strategies for minting URIs that can be dereferenced directly to the fragment:
> * have the RDF node point back to the HTML element's id, if there is any (<meta property="its:annotates" resource="#id_myElement_bar" />)
> * have the RDF node mint a URI for the fragment using one of the NIF recipes (<meta property="its:annotates" resource="#hash_1_3_12341234123412341_bar" />)
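
[Illustration of the two strategies listed above. This is only a minimal sketch and not part of any proposal in this thread: the function names, the fallback order, and the exact layout of the hash-based fragment are assumptions. The layout loosely imitates the "#hash_..." example above and should not be read as the actual NIF recipe.]

```python
import hashlib
from html import escape
from typing import Optional
from urllib.parse import quote


def mint_fragment_uri(text: str,
                      element_id: Optional[str] = None,
                      doc_uri: str = "",
                      context: str = "") -> str:
    """Return a URI for an annotated fragment of an HTML document.

    Strategy 1: if the element carries an id, point back to it.
    Strategy 2: otherwise mint a hash-based fragment from the text
    (hypothetical layout: hash_<context length>_<text length>_<md5>_<prefix>).
    """
    if element_id:
        return f"{doc_uri}#{element_id}"
    # The digest covers the surrounding context plus the fragment text,
    # so identical strings in different contexts get different URIs.
    digest = hashlib.md5((context + text).encode("utf-8")).hexdigest()
    # A short, URL-safe prefix of the text keeps the URI human-readable.
    prefix = quote(text[:20], safe="")
    return f"{doc_uri}#hash_{len(context)}_{len(text)}_{digest}_{prefix}"


def annotates_meta(fragment_uri: str) -> str:
    """Serialize the back-reference as the <meta> element shown above."""
    return ('<meta property="its:annotates" resource="'
            + escape(fragment_uri, quote=True) + '" />')


if __name__ == "__main__":
    print(annotates_meta(mint_fragment_uri("bar", element_id="myElement")))
    print(annotates_meta(mint_fragment_uri("bar")))
```

[Because both strategies are deterministic, a consumer could regenerate the same URIs from the HTML alone, so a producer would not have to serialize them.]
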
> 
> A question for people consuming RDF/RDFa - is defining this sort of "URI generation recipe" at the RDFa consumption stage breaking too many assumptions? I'd like to avoid having producers generate redundant data.
> 
> .. and back to answering "how much RDF do we need"?
> My reason for considering RDFa was to encode the additional information we might have about the concepts that are behind the text. Right now the most important uses are:
> - the URI of the concept (the "means" relation);
> - the type URI of the concept (see ISSUE-3) (the "this fragment represents a concept of the type" relation);
> - the labels of the concept in other languages;
> 
> Since we can model those via the proposed data categories, we don't need explicit RDF support to represent this. It is, however, very important that these predicates can point to URIs in the RDF space (as is currently the case with its:termInfoRef, for instance), and that we at least have a process in place for transforming "HTML5+ITS" into HTML5/RDFa, /Microdata, or /RDFa Lite. Right now the examples you submitted look good for that purpose; adding an HTML URI generator should cover that part.
> 
> -- Tadej
> 
> 
> 
>> 
>> Anyway we probably shouldn't spend much time on mappings as I can't
>> imagine anyone using RDFa/microdata in favor of simple attributes.
>> 
>> I hope that the mapping can be fairly mechanical and will not need much time. Even if it is not created by hand, I can imagine tools like Enrycher that can easily generate it. Having a mapping of Enrycher output as an input to schema.org-based SEO is then a nice scenario, IMO, but it depends on RDFa/microdata.
>> 
>> Felix
>>  
>> 
>>                                Jirka
>> 
>> --
>> ------------------------------------------------------------------
>>  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
>> ------------------------------------------------------------------
>>       Professional XML consulting and training services
>>  DocBook customization, custom XSLT/XSL-FO document processing
>> ------------------------------------------------------------------
>>  OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>> ------------------------------------------------------------------
>> 
>> 
>> 
>> 
>> -- 
>> Felix Sasaki
>> DFKI / W3C Fellow
>> 
> 
> 
> 
> 
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
> 

Received on Thursday, 22 March 2012 21:13:14 UTC