
Re: Microdata to RDF: First Editor's Draft (ACTION-6)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Sun, 16 Oct 2011 00:56:56 -0400
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
CC: "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <6CEFCF79-B253-4826-94FD-D6CDC6EB27A7@kellogg-assoc.com>
On Oct 15, 2011, at 12:52 PM, "Martin Hepp" <martin.hepp@ebusiness-unibw.org> wrote:

> Hi Greg, all:
> 
> With respect to
> 
>   https://dvcs.w3.org/hg/htmldata/raw-file/24af1cde0da1/microdata-rdf/index.html
> 
> I strongly suggest defining a "Compatibility parsing method" that produces RDF data as close as possible to the RDF data that the same vocabulary / data patterns would yield from RDFa.

Interesting idea about having a modal processor, but I'd like to see if we can avoid it, unless it could be defined in the markup itself.

> For instance, it should
> 
> 1. produce property URIs by attaching the local property name to the base URI of the vocabulary, and not to the URI of the itemtype or document,

This would require a means of specifying a vocabulary. In the absence of a way to do this using defined attributes, we chose to infer the vocabulary from the type. Admittedly this is not ideal and, as Ivan noted when this was suggested for RDFa, it conflates two separate concepts. FWIW, I would support having Microdata honor the @vocab attribute and inference rules from RDFa, but this would require action from the HTML WG.

Did you have some idea for establishing the vocabulary? Otherwise, inferring the vocabulary from the type URI seems like the best option to me.
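
To make the trade-off concrete, here is a minimal sketch of inferring the vocabulary from the type URI. It is only an illustration, not the algorithm in the draft: the function names are hypothetical, and it assumes the vocabulary base is everything up to and including the last '#' (or, failing that, the last '/') of the itemtype.

```python
from urllib.parse import urlparse


def vocabulary_from_type(itemtype: str) -> str:
    """Guess a vocabulary base by cutting the type URI after its last '#' or '/'.

    Assumption for this sketch: the itemtype is an absolute URI containing
    at least one '/'.
    """
    if "#" in itemtype:
        return itemtype[: itemtype.rindex("#") + 1]
    return itemtype[: itemtype.rindex("/") + 1]


def property_uri(itemtype: str, name: str) -> str:
    """Expand an itemprop name to a full property URI."""
    if urlparse(name).scheme:
        # The itemprop is already an absolute URI, so use it unchanged.
        return name
    return vocabulary_from_type(itemtype) + name


offering = "http://purl.org/goodrelations/v1#Offering"
print(property_uri(offering, "name"))
print(property_uri(offering, "http://schema.org/image"))
```

With this rule, itemprop="name" inside a gr:Offering expands to the same property URI that gr:name gives in RDFa, which is the outcome asked for in point 1. It only works when types and properties share a namespace.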

> 2. try to create proper typed RDF literals if the vocabulary defines a single xsd datatype for a datatype property,

Requiring a processor to read a vocabulary to discover rdfs:range assertions was considered and rejected by the RDFa WG because of the burden it places on a processor (it is difficult to do in JavaScript because of same-origin restrictions).

This could be alleviated if we whitelisted a limited set of vocabularies, but that might not scale well and would require continuous action to keep current.

As an alternative, I've considered a separate datatype inference pass that could yield a graph with datatyped literals from one with plain literals only. Would that be useful?
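
A rough sketch of what such a separate pass could look like, independent of any parser. Everything here is an assumption for illustration: triples are plain Python tuples, the Literal class is hypothetical, and the property-to-datatype table is written by hand rather than derived from the vocabulary's rdfs:range assertions.

```python
from dataclasses import dataclass, replace
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
GR = "http://purl.org/goodrelations/v1#"


@dataclass(frozen=True)
class Literal:
    """A literal value; datatype is None for a plain literal."""
    lexical: str
    datatype: Optional[str] = None


# Hypothetical hand-maintained table (an assumption of this sketch).
# A real pass would build it from the vocabulary definition.
RANGES = {
    GR + "hasCurrencyValue": XSD + "float",
    GR + "validThrough": XSD + "dateTime",
}


def infer_datatypes(triples, ranges=RANGES):
    """Return new triples in which plain literals gain a datatype when known."""
    result = []
    for subject, predicate, obj in triples:
        if (isinstance(obj, Literal) and obj.datatype is None
                and predicate in ranges):
            obj = replace(obj, datatype=ranges[predicate])
        result.append((subject, predicate, obj))
    return result


plain = [("_:price", GR + "hasCurrencyValue", Literal("99.99")),
         ("_:price", GR + "hasCurrency", Literal("USD"))]
for triple in infer_datatypes(plain):
    print(triple)
```

Because the pass runs on an already-parsed graph, the parser itself never has to fetch a vocabulary, and the same pass could be applied to output from either syntax.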

If the community feels that the situation is different, and that a processor MUST perform datatype entailment on referenced vocabularies at parse time, then we could specify this, but I think we first need to hear from those who have considered and rejected this approach before.

We should definitely create an issue to track this. I'll take care of that.

> 3. suppress the generation of RDF collections and other meta-data patterns that make the data break for SPARQL queries that would work for the same pattern in RDFa.

I'm quite sympathetic to this view. Using collections was an attempt to ensure that the semantic interpretation of Microdata was consistent between RDF and JSON conversions, but I also question its value for RDF.

This also requires an issue.
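
To illustrate the difference, here is a sketch of a post-pass that replaces each rdf:List with direct triples, which is the shape a SPARQL query written against the RDFa version would expect. It is an illustration only: triples are plain tuples of strings, the function name is hypothetical, and the lists are assumed to be well formed (each node has exactly one rdf:first and one rdf:rest, ending in rdf:nil).

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def flatten_collections(triples):
    """Replace (s, p, list-head) with one (s, p, member) triple per list member."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    result = []
    for s, p, o in triples:
        if p in (FIRST, REST):
            continue  # drop the list scaffolding itself
        if o == NIL:
            continue  # an empty list contributes no values
        if o in first:
            # o is the head of a list: walk it and emit each member directly.
            node = o
            while node != NIL:
                result.append((s, p, first[node]))
                node = rest[node]
        else:
            result.append((s, p, o))
    return result


with_list = [
    ("#offer", "http://purl.org/goodrelations/v1#name", "_:l1"),
    ("_:l1", FIRST, "Hepp Personal SCSI Controller Card"),
    ("_:l1", REST, NIL),
]
print(flatten_collections(with_list))
```

The flattened output loses the ordering that the collection preserved, which is the price of matching the RDFa pattern.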

> For instance, the attached two examples should result in roughly the same triples.
> 
> One idea for implementing this is to define an owl:AnnotationProperty for owl:Ontology that sets the Microdata parsing mode.

Interesting idea, this would make processing for a given vocabulary unambiguous, but it would also require that the vocabulary be processed when parsing.

> Best
> 
> Martin

Thanks very much for your constructive feedback.

Gregg

P.S. I also note that your RDFa example assumes some datatype inference. Doing this through post-processing would satisfy both the Microdata and RDFa use cases.

> a) Microdata
> <div itemscope itemtype="http://purl.org/goodrelations/v1#Offering" itemid="#offer">
>  <div itemprop="name">Hepp Personal SCSI Controller Card</div>
>  <div itemprop="description">The Hepp Personal SCSI is a 16-bit 
> add-on card that allows attaching up to seven SCSI devices to your computer.</div>
>  <link itemprop="hasBusinessFunction" 
>     href="http://purl.org/goodrelations/v1#Sell" />
>  <div itemscope itemprop="hasPriceSpecification" 
>       itemtype="http://purl.org/goodrelations/v1#UnitPriceSpecification">Price: 
>    <meta itemprop="hasCurrency" content="USD">$
>    <span itemprop="hasCurrencyValue">99.99</span>
>    <time itemprop="validThrough" datetime="2012-11-30T23:59:59Z"></time> 
>  </div>
>  Condition: <div itemprop="condition">used</div>
>  EAN/UPC: <span itemprop="hasEAN_UCC-13">1234567890123</span>
>  MPN: <span itemprop="hasMPN">PSCSI</span>
>  Article No. <span itemprop="hasStockKeepingUnit">123-456</span>
>  Availability: <span itemscope itemprop="hasInventoryLevel" 
>       itemtype="http://purl.org/goodrelations/v1#QuantitativeValue">
>    <meta itemprop="hasMinValueFloat" content="1.0">In-stock
>  </span>
> 
>  <img itemprop="http://schema.org/image" src="http://example.com/images/pscsi.jpg" 
>       alt="text" />
>  <link itemprop="http://xmlns.com/foaf/0.1/page" href="http://example.com/products/pscsi" />
> </div>
> 
> 
> b) RDFa
> 
> <div typeof="gr:Offering" about="#offer">
>  <div property="gr:name">Hepp Personal SCSI Controller Card</div>
>  <div property="gr:description">The Hepp Personal SCSI is a 16-bit add-on card that allows 
> attaching up to seven SCSI devices to your computer.</div>
>  <div rel="gr:hasBusinessFunction" 
>     resource="http://purl.org/goodrelations/v1#Sell"></div>
>  <div rel="gr:hasPriceSpecification">
>    <div typeof="gr:UnitPriceSpecification">Price: 
>     <span property="gr:hasCurrency" content="USD">$</span>
>     <span property="gr:hasCurrencyValue">99.99</span>
>     <div property="gr:validThrough" datatype="xsd:dateTime" 
>          content="2012-11-30T23:59:59Z"></div> 
>    </div>
>  </div>
>  Condition: <div property="gr:condition">used</div>
>  EAN/UPC: <span property="gr:hasEAN_UCC-13" datatype="xsd:string">1234567890123</span>
>  MPN: <span property="gr:hasMPN" datatype="xsd:string">PSCSI</span>
>  Article No. <span property="gr:hasStockKeepingUnit" datatype="xsd:string">123-456</span>
>  Availability: <div rel="gr:hasInventoryLevel"> 
>       <div typeof="gr:QuantitativeValue">
>         <div property="gr:hasMinValueFloat" content="1.0" datatype="xsd:float">In-stock</div>
>       </div>
>  </div>
>  <div rel="schema:image">
>    <img src="http://example.com/images/pscsi.jpg" alt="text" />
>  </div>
>  <div rel="foaf:page" resource="http://example.com/products/pscsi"></div>
> </div>
Received on Sunday, 16 October 2011 04:57:40 GMT