Re: Suggestion for Microdata to RDF conversion from Benjamin Nowack on 2010-01-21 (public-html@w3.org from January 2010)

From: Benjamin Nowack <bnowack@semsol.com>
Date: Thu, 21 Jan 2010 22:36:24 +0100
To: "Philip Jägenstedt" <philipj@opera.com>, public-html@w3.org
Message-ID: <PM-GA.20100121223624.1CD81.2.1D@semsol.com>

Hi Philip,

The steps would be like this:
* The parser finds itemtype=http://xmlns.com/foaf/0.1/Person
* In common RDF practice, this (slash-)URI represents an RDF
  Class called "Person" from an RDF vocabulary identified by 
  "http://xmlns.com/foaf/0.1/" (the other case is hash-URIs,
  where the term name is separated from the vocabulary URI
  by a #, as in "http://rdfs.org/sioc/ns#User").
* The parser finds itemprop=name
* According to the Microdata spec (read with RDF glasses on ;),
  this property should be defined within the context of the 
  itemtype ("Person" from the "http://xmlns.com/foaf/0.1/" 
  vocabulary). If the term was from a different vocab, the 
  author would have had to use a full URI instead of the 
  plain "name".
* The plain "name" property can thus be expanded to 
  "http://xmlns.com/foaf/0.1/name" because we saw it's 
  a vocab following the slash-pattern.
* Same with "img", but with the difference that @src is
  a URL property, so the value of src is converted to 
  a non-literal resource identifier.
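
As a rough illustration (not the parser's actual code), the steps above
might be sketched in Python as follows. The function names
vocab_namespace, expand_property and make_object are hypothetical, and
treating "has a URL scheme" as the test for a full URI is an assumption
of this sketch.

```python
from urllib.parse import urlsplit


def vocab_namespace(itemtype):
    """Derive the vocabulary namespace from an itemtype (class) URI."""
    if "#" in itemtype:
        # hash-URI: keep everything up to and including the '#'
        return itemtype[: itemtype.index("#") + 1]
    # slash-URI: keep everything up to and including the last '/'
    # (raises ValueError if the URI contains no '/' at all)
    return itemtype[: itemtype.rindex("/") + 1]


def expand_property(itemtype, itemprop):
    """Expand a plain itemprop name in the context of the itemtype."""
    if urlsplit(itemprop).scheme:
        # assumption: a value with a scheme is already a full URI
        return itemprop
    return vocab_namespace(itemtype) + itemprop


def make_object(value, is_url_property):
    """URL properties (e.g. img@src) become resources, others literals."""
    return ("uri", value) if is_url_property else ("literal", value)


foaf_person = "http://xmlns.com/foaf/0.1/Person"
name_triple = ("_:bnode", expand_property(foaf_person, "name"),
               make_object("Alec Tronnick", False))
img_triple = ("_:bnode", expand_property(foaf_person, "img"),
              make_object("mypic.jpg", True))
```

No HTTP request is involved here: the namespace comes purely from the
shape of the itemtype URI.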

My parser doesn't do any HTTP operations. It *could* do it and 
warn about undefined predicate URIs (the FOAF URIs lead to RDF 
Schema data when RDF accept headers are sent), but I wouldn't 
do or propose anything like that during the conversion step.
(And there is the Open World Assumption: a compliant RDF client
can't tell whether a property is undefined, even if it checks the
data served at the namespace.)

Having to GET information during the extraction process would 
surely be unattractive to spec implementers, and RDF vocabs
don't necessarily offer this sort of "follow-your-nose" 
mechanism (it's just "good practice"). I'm not aware of any 
RDF parser that GETs schema info during parsing. Some data 
consumers (like timbl's tabulator) de-ref vocabulary URIs, 
but they do it at the application level, i.e. *after* the 
parsing step.

One conceptual issue I see is that this author-oriented 
convention makes it possible to say one thing in two ways 
(either via the full prop URI or the plain one), which might
be considered bad language design.

There are also some theoretical corner cases where the
RDF Class URI doesn't fall in the slash-xor-hash category,
but I haven't come across any. And if there were, then the
author would just use full itemprop URIs.
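
To make the corner case concrete, here is a hedged Python sketch of a
check that reports when a class URI does not fit the slash-xor-hash
pattern. The function name split_class_uri and the exact rules for what
counts as "not fitting" (no term after the separator, no path separator
at all, or a query string) are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def split_class_uri(itemtype):
    """Return (namespace, term), or None if the URI fits neither pattern."""
    parts = urlsplit(itemtype)
    if parts.fragment:
        # hash pattern: namespace ends with '#', term is the fragment
        namespace, _, term = itemtype.partition("#")
        return namespace + "#", term
    if "#" in itemtype or parts.query:
        # empty fragment or a query string: treated as a corner case here
        return None
    if "/" in parts.path and not parts.path.endswith("/"):
        # slash pattern: namespace ends with the last '/', term follows it
        namespace, _, term = itemtype.rpartition("/")
        return namespace + "/", term
    return None


fits_slash = split_class_uri("http://xmlns.com/foaf/0.1/Person")
fits_hash = split_class_uri("http://rdfs.org/sioc/ns#User")
no_fit = split_class_uri("urn:example:Person")  # hypothetical URN class
```

When the result is None, plain property names cannot be expanded and
only full itemprop URIs carry over into the RDF.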

IOW: If the itemtype specifies a vocabulary (either by a 
URI or a pre-defined name, such as "vcard"), then authors
can use plain names for the properties of this particular
vocabulary within the typed itemscope, e.g. "fn". As an
author, I would find this rule intuitive and consistent.

Cheers,
Benji


On 21.01.2010 21:00:54, Philip Jägenstedt wrote:
>I misunderstood what you were trying to do. Can you outline the steps  
>needed to get the triples in your example? If you start out with  
>http://xmlns.com/foaf/0.1/Person, name and img, would you HTTP GET  
>http://xmlns.com/foaf/0.1/Person, hope to find an RDF Schema and find  
>within it something relating to "name" and "img"? As far as I can see  
>http://xmlns.com/foaf/0.1/Person redirects to an HTML page that uses RDFa  
>to encode the RDF schema.
>
>Philip
>
>On Thu, 21 Jan 2010 10:17:35 +0100, Benjamin Nowack <bnowack@semsol.com> 
>wrote:
>
>>
>> Hi Philip, thanks for the reply, but you missed my point ;)
>>
>> I know the current conversion algorithm, but it leads to rather
>> ugly and unintuitive RDF. I was making a *suggestion* for how
>> Microdata could/should be converted to common RDF using very
>> simple markup that would give RDFa a run for its money, within
>> the boundaries of the Microdata syntax, and in line with RDF
>> concepts.
>>
>> I can still implement it via some easy post-processing and
>> call it Microdata/RDF or some such, but I guess it makes
>> sense if the *RDF* parts of Microdata are based on experience
>> and feedback from the RDF community and support relevant best
>> practices. From an RDF POV, the prefixing/escaping of plain
>> properties isn't required (if the motivation is to avoid
>> the creation of non-resolvable URIs). RDF encourages resolvable
>> term names, but tools can handle any sort of term just fine.
>>
>> I *personally* wouldn't even specify anything RDF-ish in the
>> Microdata spec directly at all. Just define the HTML-relevant
>> syntax and DOM API and let the RDF community figure the
>> mapping out on their own. This worked for microformats, too.
>>
>> Benji
>>
>> --
>> Benjamin Nowack
>> http://bnode.org/
>> http://semsol.com/
>>
>> On 21.01.2010 00:41:15, Philip Jägenstedt wrote:
>>> On Tue, 19 Jan 2010 18:17:19 +0100, Benjamin Nowack <bnowack@semsol.com>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm working on a Microdata parser as part of an RDF toolkit. One thing
>>>> I've implemented but that isn't directly stated in the draft is the
>>>> generation of property URIs in the context of the itemtype. The spec
>>>> mentions that plain properties are to be used "within the context of
>>>> the types for which they are intended". And RDF vocabularies follow
>>>> the {namespace}/{type|prop} or {namespace}#{type|prop} pattern. So,
>>>> I made my parser extract the following RDF triples:
>>>>
>>>>   _:bnode a <http://xmlns.com/foaf/0.1/Person> .
>>>>   _:bnode <http://xmlns.com/foaf/0.1/name> "Alec Tronnick" .
>>>>   _:bnode <http://xmlns.com/foaf/0.1/img> <mypic.jpg> .
>>>>
>>>> from the Microdata snippet:
>>>>
>>>>   <div itemscope="" itemtype="http://xmlns.com/foaf/0.1/Person">
>>>>     My name is <span itemprop="name">Alec Tronnick</span>
>>>>     <img itemprop="img" src="mypic.jpg" alt="" />
>>>>   </div>
>>>>
>>>> as "name" and "img" are supposed to be applicable to the "Person"
>>>> type from the FOAF vocabulary. This sort of rule leads to very
>>>> compact markup in most single-vocab use cases (even more compact
>>>> and readable than RDFa's CURIEs) and simple authoring.
>>>>
>>>> A sophisticated parser *could* GET and check the RDF vocabulary
>>>> for valid use of properties, but RDF does not have instance-level
>>>> validation, so the transparent expansion of plain property names
>>>> does not conflict with the RDF spec(trum). An author can of
>>>> course still use full URIs to mix in terms from other vocabs.
>>>
>>> If you follow the conversion algorithm at
>>> <http://dev.w3.org/html5/md/#rdf>, you'll find that your markup yields
>>> something like these triples:
>>>
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> _:n0 rdf:type <http://xmlns.com/foaf/0.1/Person> ;
>>> <http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson%23%3Aname>
>>> "Alec Tronnick" ;
>>> <http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson%23%3Aimg>
>>> <http://example.com/mypic.jpg> .
>>> <http://example.com/foo.html>
>>> <http://www.w3.org/1999/xhtml/microdata#item> _:n0 .
>>>
>>> Note especially that mypic.jpg is resolved; here I assumed the markup
>>> was from http://example.com/foo.html.
>>>
>>> To produce the triples you wanted, use this markup:
>>>
>>> <div itemscope="" itemtype="http://xmlns.com/foaf/0.1/Person">
>>>   My name is <span itemprop="http://xmlns.com/foaf/0.1/name">Alec Tronnick</span>
>>>   <img itemprop="http://xmlns.com/foaf/0.1/img" src="mypic.jpg" alt="" />
>>> </div>
>>>
>>> As you can see, microdata has no prefix notation. To save yourself some
>>> typing, use this:
>>>
>>> <div itemscope itemtype="http://microformats.org/profile/hcard">
>>>   My name is <span itemprop="fn">Alec Tronnick</span>
>>>   <img itemprop="photo" src="mypic.jpg" alt="">
>>> </div>
>>>
>>> If you actually wanted to use FOAF or care very much about the exact
>>> triples, then the OWL needed to map the above to vCard RDF shouldn't be
>>> very tricky, and I assume the relationship between FOAF and that is
>>> already pretty clear.
>>>
>>> --
>>> Philip Jägenstedt
>>> Core Developer
>>> Opera Software
>>>
>>
>>
>
>
>-- 
>Philip Jägenstedt
>Core Developer
>Opera Software
>
Received on Thursday, 21 January 2010 21:36:56 UTC