Re: Mapping Microdata to RDF

Hi Gregg,

On 10 Oct 2011, at 06:42, Gregg Kellogg wrote:
> On Oct 8, 2011, at 2:19 PM, Jeni Tennison wrote:
>> As a general point, in the same way as for the document base URI as discussed above, I think we might do better to base the microdata -> RDF mapping on the microdata/HTML5 DOM API rather than on the HTML5 syntax. For example, refer to the item element's .properties rather than breaking apart the itemprop attribute. If the API doesn't provide enough information to create reasonable RDF, then we will need to raise bugs on it.
> 
> Interesting that Hixie didn't do this himself. I'll tackle that on another pass.

I guess that since he would have been editing both the core spec and the mapping at the same time he didn't feel he needed to. We're doing something more layered here.

>> A few more specific things:
>> 
>>  * the time element can't have a duration value [6]
> 
> Interesting, xsd:duration is used in schema.org examples, and there's some discussion in the WHATWG wiki [7], but it was never pushed forward. I wonder if we should file a bug; it's necessary for Recipe use cases, and there's no other way to get a typed literal into the spec.

I think that this relates to Bug 13240: Consider replacing <time> with <data> [8]. We need to keep an eye on this bug as it impacts on the mapping to RDF.

One thing that might be useful to help the discussion around that bug would be to list the datatypes that are actually being used in the wild with RDFa currently. There are also a bunch of other datatypes being used within the schema.org examples, as you say. I wonder if anyone has access to a corpus of content that could be used to research that?

>>  * I'm not sure we should be ignoring properties that are neither absolute URIs nor on a typed item; perhaps we should be constructing URIs for them that look like {document base URI}#{property}?
> 
> If there is an itemtype, a property value should either be an absolute URI, or something that is appended to the type namespace. The issues about lexical form of that are left to the base Microdata spec.

I suspect that we may be talking at cross purposes. What I'm saying is that if someone has in their page something like:

  <span itemscope><span itemprop="name">Gregg Kellogg</span></span>

(not nested within some other element with an itemtype on it) then it will produce the microdata:

    {
      "properties": {
        "name": [
          "Gregg Kellogg"
        ]
      }
    }

So rather than not producing any RDF at all (which is what I think step 6.2.1 is saying), I think it should produce:

  [] <#name> "Gregg Kellogg" .

where the base URI (which the #name will be concatenated to) is based on the document base URI.
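
To make the suggestion concrete, here is a rough Python sketch of the {document base URI}#{property} construction for a property on an untyped item. The function name untyped_property_uri and the example.org base URI are made up for illustration; neither comes from the draft, and dropping any existing fragment from the base URI is my assumption about what "concatenated" should mean.

```python
from urllib.parse import urldefrag, urlparse


def untyped_property_uri(name, base_uri):
    """Build a predicate URI for a property on an item with no itemtype.

    Hypothetical helper: illustrates {document base URI}#{property} only.
    """
    # A property name that is already an absolute URI is used unchanged.
    if urlparse(name).scheme:
        return name
    # Assumption: any fragment on the document base URI is dropped
    # before the property name is appended as the new fragment.
    base, _fragment = urldefrag(base_uri)
    return base + "#" + name


if __name__ == "__main__":
    print(untyped_property_uri("name", "http://example.org/gregg.html"))
```

With a base URI of http://example.org/gregg.html, the "name" property above would get the predicate http://example.org/gregg.html#name, which is what <#name> resolves to in the Turtle.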

>>  * it's not clear how the algorithm deals with properties whose values are URIs: do these become literal values or identify resources? (I think it should be the latter)
> 
> I'll make it clear that unless specifically identified as a typed literal, all values that are defined as absolute URIs are treated as URI references (if that's still the appropriate nomenclature). This should be clear from the property value section.

OK, thanks.

>>  * in step 3 of generating an RDF Collection, I think the object should be the blank node associated with the next element in the array rather than the next element in the array itself
> 
> Consider the following markup:
> 
> <div itemscope>
>   <span itemprop="http://purl.org/dc/terms/title">foo</span>
>   <span itemprop="http://purl.org/dc/terms/title">bar</span>
> </div>
> 
> I believe that this should produce the following:
> 
> [ dc:title ("foo" "bar") ] .

I agree.

> If I understand you correctly, it would produce the following:
> 
> [ dc:title "foo", ("bar") ] .
> 
> This would have no way to order "foo" relative to ("bar"), and so I believe would not be correct (IMHO).

I think I'm confused over the wording in step 3. We start with an array containing "foo" and "bar". What the text says is:

  1. Create a new array containing a blank node for every value in list

assume this is an array containing _:bn1 and _:bn2.

  2. For each pair of blank node and value from list the following triple is generated:

      subject: blank node
      predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#first
      object: value

so I now have:

  _:bn1 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "foo" .
  _:bn2 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "bar" .

  3. For each blank node in the array the following triple is generated:

      subject: blank node
      predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
      object: next element in the array or, if that does not exist,
              http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

The meaning of "the array" in this step is ambiguous. When I read it, I assumed it was the initial property value array, which contains "foo" and "bar". That would mean generating:

  _:bn1 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest "bar" .
  _:bn2 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest 
        http://www.w3.org/1999/02/22-rdf-syntax-ns#nil .

which is wrong (and what I was objecting to). What I think you mean is the array of blank nodes, which would mean generating:

  _:bn1 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest _:bn2 .
  _:bn2 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest 
        http://www.w3.org/1999/02/22-rdf-syntax-ns#nil .

Perhaps using the terms 'value array' and 'blank node array' consistently through the steps would avoid this confusion?
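
For what it's worth, here is how I read the three steps once the two arrays are kept apart, as a small Python sketch. The names collection_triples, value_array and bnode_array are mine, the blank node labels are just generated strings, and triples are plain tuples rather than objects from any RDF library, so treat it as an illustration of the reading rather than as the algorithm's text.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(value_array):
    """Return rdf:first / rdf:rest triples for a list of property values."""
    # Step 1: one blank node per value (the 'blank node array').
    bnode_array = ["_:bn%d" % (i + 1) for i in range(len(value_array))]

    triples = []
    # Step 2: each blank node points at its value with rdf:first.
    for bnode, value in zip(bnode_array, value_array):
        triples.append((bnode, RDF + "first", value))

    # Step 3: each blank node points at the NEXT BLANK NODE with rdf:rest,
    # or at rdf:nil when it is the last one in the blank node array.
    for i, bnode in enumerate(bnode_array):
        if i + 1 < len(bnode_array):
            rest = bnode_array[i + 1]
        else:
            rest = RDF + "nil"
        triples.append((bnode, RDF + "rest", rest))
    return triples


if __name__ == "__main__":
    for triple in collection_triples(["foo", "bar"]):
        print(" ".join(triple), ".")
```

Called with ["foo", "bar"], this yields the two rdf:first triples followed by the two rdf:rest triples I give above, with _:bn1 pointing at _:bn2 rather than at "bar".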

>> Having some examples would be really useful. Perhaps you can add links to them from the wiki page?
> 
> I'll add some examples to the ReSpec document, but we should also have a space for them on the wiki. We should probably turn [1] into a reference to the ReSpec document, discussion and examples.

Yes, sounds great :)

Thanks,

Jeni

>>> [1] http://www.w3.org/wiki/Mapping_Microdata_to_RDF
>>> [2] http://www.w3.org/TR/2011/WD-microdata-20110525
>>> [3] https://github.com/gkellogg/rdf-microdata
>> [4] http://dev.w3.org/html5/spec/Overview.html#document-base-url
>> [5] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14233
>> [6] http://dev.w3.org/html5/spec/text-level-semantics.html#the-time-element
> [7] http://wiki.whatwg.org/wiki/Time_element#duration
[8] http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240

-- 
Jeni Tennison
http://www.jenitennison.com

Received on Monday, 10 October 2011 09:24:14 UTC