Re: Resolution proposal for ISSUE-2

Hi Phil,

Thanks a lot for your mail. Actually, I don't think you need to dive
deeply into RDFa and Microdata. We just need to make clear in a conformance
statement that:

1) An implementation of our standard needs to be able to parse its-* (or
whatever prefix we have) attributes in HTML, e.g. the HTML "translate"
attribute, its-locNote, its-term etc.
2) An implementation working in the XLIFF (or general XML) space needs to
be able to parse the XML counterparts of the its-* attributes, e.g.
its:translate, its:locNote, its:term etc.
3) An implementation MAY implement the (yet to be specified) "convert HTML5
to RDFa or Microdata" algorithm, including the URI generation facility
Tadej mentioned.
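
To make points 1) and 2) concrete, here is a minimal sketch in Python of what
"being able to parse" could look like. It is an illustration under
assumptions, not a proposed algorithm: the function and class names are made
up for this mail, the its-* prefix is still open (see above), and the XML side
assumes the ITS 1.0 namespace. Note that Python's html.parser lowercases
attribute names, so its-locNote is reported as its-locnote. The element id is
kept on the HTML side because Tadej's first URI strategy points back to it.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace (assumed for the XML side)


class ItsHtmlCollector(HTMLParser):
    """Collects HTML 'translate' and its-* attributes per start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # entries: (tag, element id or None, {attribute: value})

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser has already lowercased the names
        its = {k: v for k, v in attrs.items()
               if k == "translate" or k.startswith("its-")}
        if its:
            self.found.append((tag, attrs.get("id"), its))


def its_attributes_html(markup):
    """Point 1): return the its-* (and translate) attributes found in HTML."""
    collector = ItsHtmlCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


def its_attributes_xml(markup):
    """Point 2): return the its:* attributes found in an XML document."""
    prefix = "{" + ITS_NS + "}"
    found = []
    for el in ET.fromstring(markup).iter():
        its = {k[len(prefix):]: v for k, v in el.attrib.items()
               if k.startswith(prefix)}
        if its:
            found.append((el.tag, its))
    return found
```

An element that carries an id could then be referenced as "#" plus that id
when RDF is generated; elements without an id would need one of the other
URI recipes discussed in the thread.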

You can boil this down to a table with four columns, see attachment. An
implementation MUST state: "I implement data category XYZ, in HTML5, or
XML. If HTML5, then I provide the RDFa / Microdata conversion".
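
A conformance statement of this shape could also be captured as data. The
following Python sketch is only one possible reading of the sentence above,
not the content of the attached table: the class name, field names and the
two consistency checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformanceClaim:
    """One row of a hypothetical conformance table (all names are illustrative)."""
    data_category: str
    html5: bool = False
    xml: bool = False
    rdfa_microdata_conversion: bool = False

    def __post_init__(self):
        # A claim has to cover at least one of the two formats.
        if not (self.html5 or self.xml):
            raise ValueError("claim must cover HTML5, XML, or both")
        # The conversion is described for HTML5 input only.
        if self.rdfa_microdata_conversion and not self.html5:
            raise ValueError("the conversion is only defined for HTML5 input")

    def statement(self):
        formats = " and ".join(
            name for name, on in (("HTML5", self.html5), ("XML", self.xml)) if on
        )
        text = f"I implement data category {self.data_category} in {formats}."
        if self.html5:
            verb = "provide" if self.rdfa_microdata_conversion else "do not provide"
            text += f" I {verb} the RDFa / Microdata conversion."
        return text
```

For example, ConformanceClaim("Translate", html5=True,
rdfa_microdata_conversion=True).statement() yields a sentence of the form
quoted above.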

HTH,

Felix

On 22 March 2012 at 22:12, Phil Ritchie <philr@vistatec.ie> wrote:

> I'm afraid I need to do some serious reading over the weekend on RDFa and
> Microdata before I'll feel qualified to contribute properly to the
> discussion.
>
> The important considerations for me would relate to parsability, but all of
> the proposals would seem to provide a well-structured, unambiguous, simply
> tokenised format.
>
> Phil
>
>
>
> On 22 Mar 2012, at 17:18, "Felix Sasaki" <fsasaki@w3.org> wrote:
>
> Thank you, Tadej. Trying to summarize what you say: we need
>
> 1) HTML5 + ITS (or XYZ) schema
> 2) Algorithm for transforming "HTML5+ITS" into HTML5/RDFa, /Microdata, or
> /RDFa Lite. Could we say we just cover RDFa Lite?
> 3) Algorithm (what you wrote below) to generate URIs in RDFa
>
> Your question about "A question for people consuming RDF/RDFa" still needs
> an answer, but otherwise I think we are done with this. Any thoughts by
> others, esp. implementors in the group?
>
> Felix
>
> On 22 March 2012 at 15:47, Tadej Stajner <tadej.stajner@ijs.si> wrote:
>
>>  On 3/22/2012 2:11 PM, Felix Sasaki wrote:
>>
>>
>>
>> On 22 March 2012 at 13:52, Jirka Kosek <jirka@kosek.cz> wrote:
>>
>>> On 22.3.2012 13:09, Felix Sasaki wrote:
>>>
>>> > Solution 1) will be user friendly, and we will define a RELAX NG schema
>>> > HTML5+ITS (or + XYZ). The same approach has been taken for Aria in the
>>> > accessibility space, and Aria is now even part of the HTML5 core language.
>>> >
>>> > Comments are very welcome. I hope we can agree on this during next week's
>>> > call and find a volunteer for maintaining the schema and another one for
>>> > the mappings.
>>>
>>>  I volunteer for creating and maintaining the schema.
>>>
>>
>>  Great, thanks a lot.
>>
>>>
>>> > Regarding the "URIs for element nodes in HTML5" discussion: Ivan said that
>>> > our group should consider whether this is really an issue.
>>>
>>>  I would have expected a more definite position from the SW activity lead :-)
>>>
>>
>>  Well, to be fair, he was more precise:
>>
>>  "RDFa does not include any definition, as far as the extracted RDF is
>> concerned, on pointing 'back' to the original source structure. This should
>> be done explicitly. I am not sure whether this is a major issue, this is
>> something for the group to consider..."
>>
>>  But the essence is the same: is it important for us?
>>
>>
>>
>> Some things to add (and to shed some light on ACTION-32):
>>
>> I think it's important to define a way to do it, but not to make
>> serializing it obligatory, because it has zero utility until someone
>> actually uses it in pure RDF. The thing is, as long as the HTML document is
>> available and the RDFa is inlined, the references to the HTML structure in
>> RDF don't add any additional information and can be trivially
>> reconstructed. RDFa consumption tools can likely handle that kind of
>> content as-is.
>>
>> The tricky case is when someone at some point wants to get pure RDF from
>> this (dropping the HTML in the process); for that we should have some
>> specification that they could follow to achieve these references. The use case I can
>> think of is feeding ITS-marked-up input into a NLP pipeline running on
>> something like NIF, which needs URIs for annotated fragments of text.
>> Luckily the conversion itself is pretty mechanical, so I see some
>> strategies for minting URIs that can be dereferenceable directly to the
>> fragment:
>> * have the RDF node point back to the HTML element's id, if there is any
>> (<meta property="its:annotates" resource="#id_myElement_bar" />)
>> * have the RDF node mint a URI for the fragment using one of the NIF
>> recipes (<meta property="its:annotates"
>> resource="#hash_1_3_12341234123412341_bar" />)
>>
>> A question for people consuming RDF/RDFa - is defining this sort of "URI
>> generation recipe" at the RDFa consumption stage breaking too many
>> assumptions? I'd like to avoid having producers generate redundant data.
>>
>> .. and back to answering "how much RDF do we need"?
>> My reason for considering RDFa was to encode the additional information
>> we might have about the concepts that are behind the text. Right now the
>> most important uses are:
>> - the URI of the concept (the "means" relation);
>> - the type URI of the concept (see ISSUE-3) (the "this fragment
>> represents a concept of the type" relation);
>> - the labels of the concept in other languages;
>>
>> Since we can model those via the proposed data categories, we don't need
>> explicit RDF support to represent this - it is however very important that
>> these predicates can point to URIs in the RDF space (as is currently the
>> case with its:termInfoRef, for instance), and that we at least have a
>> process in place for transforming "HTML5+ITS" into HTML5/RDFa, /Microdata,
>> or /RDFa Lite. Right now the examples you submitted look good for that
>> purpose; adding an HTML URI generator should cover that part.
>>
>> -- Tadej
>>
>>
>>
>>
>>> Anyway we probably shouldn't spend much time on mappings as I can't
>>> imagine anyone using RDFa/microdata in favor of simple attributes.
>>>
>>
>>  I hope that the mapping can be fairly mechanical and will not need much
>> time. Even if it is not created by hand, I can imagine tools like Enrycher
>> that can easily generate it. Having then a mapping of Enrycher output as an
>> input to schema.org based SEO is a nice scenario, IMO, but it depends on
>> RDFa/microdata.
>>
>>  Felix
>>
>>
>>>
>>>                                Jirka
>>>
>>> --
>>> ------------------------------------------------------------------
>>>  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
>>> ------------------------------------------------------------------
>>>       Professional XML consulting and training services
>>>  DocBook customization, custom XSLT/XSL-FO document processing
>>> ------------------------------------------------------------------
>>>  OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
>>> ------------------------------------------------------------------
>>>
>>>
>>
>>
>>  --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>
>>
>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>



-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Friday, 23 March 2012 08:49:14 UTC