Re: Resolution proposal for ISSUE-2 from Phil Ritchie on 2012-03-23 (public-multilingualweb-lt@w3.org from March 2012)

From: Phil Ritchie <philr@vistatec.ie>
Date: Fri, 23 Mar 2012 08:56:10 +0000
To: Felix Sasaki <fsasaki@w3.org>
Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, Tadej Stajner <tadej.stajner@ijs.si>
Message-ID: <OFFDC8A5E3.3D4D84C1-ON802579CA.0030A8A9-802579CA.003116C2@vistatec.ie>
Felix

After a Chianti-fuelled Internet surfing session last night I am now 
classifying myself as an RDFa/Microdata novice (one level up from 
newbie).  All now makes sense.  Therefore we can extend the data 
categories in your table (in an "informative" way) as our 
reference implementations require.

Phil.





From:   Felix Sasaki <fsasaki@w3.org>
To:     Phil Ritchie <philr@vistatec.ie>, 
Cc:     Tadej Stajner <tadej.stajner@ijs.si>, 
"public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Date:   23/03/2012 08:48
Subject:        Re: Resolution proposal for ISSUE-2



Hi Phil,

thanks a lot for your mail. Actually I don't think that you need to dive 
deeply into RDFa and Microdata. We just need to make clear in a conformance 
statement that:

1) An implementation of our standard needs to be able to parse its-* (or 
whatever prefix we have) attributes in HTML, e.g. the HTML "translate" 
attribute, its-locNote, its-term etc.
2) An implementation working in the XLIFF (or general XML) space needs to 
be able to parse the XML counterparts of the its-* attributes, e.g. 
its:translate, its:locNote, its:term etc.  
3) An implementation MAY implement the (yet to be detailed) "convert HTML5 
to RDFa or Microdata" algorithm, including the URI generation facility 
Tadej mentioned.
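
As a rough illustration of points 1) and 2), here is a minimal Python sketch that uses only the standard library. The function names `collect_html_its` and `collect_xml_its` are made up for this example, and the `its-` prefix is the placeholder used in this thread, not a settled name. The XML half assumes the ITS 1.0 namespace URI.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace URI


class ItsHtmlCollector(HTMLParser):
    """Collect the 'translate' attribute and any its-* attribute per start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {attribute: value})

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so its-locNote
        # is reported as its-locnote.
        its = {k: v for k, v in attrs if k == "translate" or k.startswith("its-")}
        if its:
            self.found.append((tag, its))


def collect_html_its(markup):
    """Hypothetical helper: return the ITS-related attributes found in HTML."""
    parser = ItsHtmlCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


def collect_xml_its(xml_text):
    """Hypothetical helper: return attributes in the ITS namespace found in XML."""
    prefix = "{" + ITS_NS + "}"
    found = []
    for el in ET.fromstring(xml_text).iter():
        its = {k[len(prefix):]: v for k, v in el.attrib.items() if k.startswith(prefix)}
        if its:
            found.append((el.tag, its))
    return found
```

Both helpers return a list of (element name, attributes) pairs, so an HTML and an XML implementation could be checked against the same expectations for a given data category.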

You can boil this down to a table with four columns, see attachment. An 
implementation MUST state: "I implement data category XYZ, in HTML5, or 
XML. If HTML5, then I provide the RDFa / Microdata conversion".  

HTH,

Felix

Am 22. März 2012 22:12 schrieb Phil Ritchie <philr@vistatec.ie>:
I'm afraid I need to do some serious reading over the weekend on RDFa and 
Microdata before I'll feel qualified to contribute properly to the 
discussion.

The important considerations for me would relate to parsability, but all of 
the proposals would seem to provide a well-structured, unambiguous, easily 
tokenised format.

Phil



On 22 Mar 2012, at 17:18, "Felix Sasaki" <fsasaki@w3.org> wrote:

Thank you, Tadej. Trying to summarize what you say: we need

1) HTML5 + ITS (or XYZ) schema 
2) Algorithm for transforming "HTML5+ITS" into HTML5/RDFa, /Microdata, or 
/RDFa Lite. Could we say we just cover RDFa Lite?
3) Algorithm (what you wrote below) to generate URIs in RDFa

Your question about "A question for people consuming RDF/RDFa" still needs 
an answer, but otherwise I think we are done with this. Any thoughts by 
others, esp. implementors in the group? 

Felix

Am 22. März 2012 15:47 schrieb Tadej Stajner <tadej.stajner@ijs.si>:
On 3/22/2012 2:11 PM, Felix Sasaki wrote: 


Am 22. März 2012 13:52 schrieb Jirka Kosek <jirka@kosek.cz>:
On 22.3.2012 13:09, Felix Sasaki wrote:

> Solution 1) will be user friendly, and we will define a RELAX NG schema
> for HTML5+ITS (or + XYZ). The same approach has been taken for ARIA in the
> accessibility space, and ARIA is now even part of the HTML5 core language.
>
> Comments are very welcome. I hope we can agree on this during next week's
> call and find a volunteer for maintaining the schema and another one for
> the mappings.

I volunteer for creating and maintaining the schema.

Great, thanks a lot. 

> Regarding the "URIs for element nodes in HTML5" discussion: Ivan said that
> our group should consider whether this is really an issue.

I would have expected a more definite position from the SW activity lead :-)

Well, to be fair, he was more precise:

"RDFa does not include any definition, as far as the extracted RDF is 
concerned, on pointing 'back' to the original source structure. This 
should be done explicitly. I am not sure whether this is a major issue, 
this is something for the group to consider..."

But the essence is the same: is it important for us?
 

Some things to add (and to shed some light on ACTION-32):

I think it's important to define a way to do it, but not to make 
serializing it obligatory, because it has zero utility until someone 
actually uses it in pure RDF. The thing is, as long as the HTML document is 
available and the RDFa is inlined, the references to the HTML structure in 
RDF don't add any information and can be trivially 
reconstructed. RDFa consumption tools can likely handle that kind of 
content as-is.

The tricky case is when someone at some point wants to get pure RDF from 
this (dropping the HTML in the process); for that we should have some 
specification that they could follow to produce these references. The use 
case I can think of is feeding ITS-marked-up input into an NLP pipeline 
running on something like NIF, which needs URIs for annotated fragments of 
text. Luckily the conversion itself is pretty mechanical, so I see some 
strategies for minting URIs that can be dereferenced directly to the 
fragment:
* have the RDF node point back to the HTML element's id, if there is any 
(<meta property="its:annotates" resource="#id_myElement_bar" />)
* have the RDF node mint a URI for the fragment using one of the NIF 
recipes (<meta property="its:annotates" 
resource="#hash_1_3_12341234123412341_bar" />)
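
The two strategies above can be sketched in Python as follows. This is only an illustration: `mint_fragment_uri` and `annotates_meta` are hypothetical names, and the hash-based layout (character offsets plus an MD5 digest of the fragment text) is an assumption made for the example, not the actual NIF recipe, which would have to be taken from the NIF documentation.

```python
import hashlib
import html


def mint_fragment_uri(element_id, text, start, end):
    """Return a fragment identifier for the span text[start:end].

    Strategy 1: reuse the HTML element's id when there is one.
    Strategy 2: otherwise derive an identifier from offsets and a digest.
    The "#hash_<start>_<end>_<digest>" layout is made up for this sketch.
    """
    if element_id:
        return "#" + element_id
    digest = hashlib.md5(text[start:end].encode("utf-8")).hexdigest()
    return f"#hash_{start}_{end}_{digest}"


def annotates_meta(resource):
    """Serialize the reference as the <meta> element shown in the examples."""
    return f'<meta property="its:annotates" resource="{html.escape(resource, quote=True)}" />'


# Usage: an element with an id keeps it; one without gets a minted fragment.
print(annotates_meta(mint_fragment_uri("id_myElement_bar", "foo bar", 4, 7)))
print(annotates_meta(mint_fragment_uri(None, "foo bar", 4, 7)))
```

Because both branches depend only on the document content, a consumer could regenerate the same identifiers without the producer having to serialize them, which matches the point about not generating redundant data.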

A question for people consuming RDF/RDFa - is defining this sort of "URI 
generation recipe" at the RDFa consumption stage breaking too many 
assumptions? I'd like to avoid having producers generate redundant data.

.. and back to answering "how much RDF do we need"?
My reason for considering RDFa was to encode the additional information we 
might have about the concepts that are behind the text. Right now the most 
important uses are:
- the URI of the concept (the "means" relation);
- the type URI of the concept (see ISSUE-3) (the "this fragment represents 
a concept of the type" relation);
- the labels of the concept in other languages;
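
For illustration, the three pieces of information listed above could be held in a record like the following Python sketch. The class and field names are hypothetical and are not taken from any proposed data category; the URIs in the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConceptAnnotation:
    """Hypothetical record for what is known about the concept behind a fragment."""

    fragment: str                    # the annotated text
    concept_uri: str                 # the "means" relation
    type_uri: Optional[str] = None   # the "concept of the type" relation
    labels: Dict[str, str] = field(default_factory=dict)  # language tag -> label


# Usage with placeholder URIs:
note = ConceptAnnotation("Dublin", "http://example.org/concept/Dublin",
                         type_uri="http://example.org/type/City")
note.labels["ga"] = "Baile Átha Cliath"
```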

Since we can model those via the proposed data categories, we don't need 
explicit RDF support to represent this - it is however very important that 
these predicates can point to URIs in the RDF space (as is currently the 
case with its:termInfoRef, for instance), and that we at least have a 
process in place for transforming "HTML5+ITS" into HTML5/RDFa, 
/Microdata, or /RDFa Lite. Right now the examples you submitted look good 
for that purpose; adding an HTML URI generator should cover that part.

-- Tadej




Anyway we probably shouldn't spend much time on mappings as I can't
imagine anyone using RDFa/microdata in favor of simple attributes.

I hope that the mapping can be fairly mechanical and will not need much 
time. Even if it is not created by hand, I can imagine tools like Enrycher 
that can easily generate it. Then having a mapping of Enrycher output as 
input to schema.org-based SEO is a nice scenario, IMO, but it depends 
on RDFa/microdata.

Felix
 

                               Jirka

--
------------------------------------------------------------------
 Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
      Professional XML consulting and training services
 DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------




-- 
Felix Sasaki 
DFKI / W3C Fellow







************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender immediately by e-mail.

www.vistatec.com
************************************************************



[attachment "mlw-lt_table.pdf" deleted by Phil Ritchie/VISTATEC] 

Received on Friday, 23 March 2012 08:56:45 UTC