
Re: Microdata to RDF conversion

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 29 Jan 2010 09:29:54 +0000 (UTC)
To: Philip Jägenstedt <philipj@opera.com>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.64.1001280113100.22027@ps20323.dreamhostps.com>

On Wed, 20 Jan 2010, Philip Jägenstedt wrote:
> On Tue, 19 Jan 2010 09:22:25 +0100, Ian Hickson <ian@hixie.ch> wrote:
> > On Sun, 17 Jan 2010, Philip Jägenstedt wrote:
> > > This algorithm uses the http://purl.org/dc/terms/ namespace, while 
> > > the mapping at <http://dev.w3.org/html5/md/#conversion-to-rdf> uses 
> > > the http://purl.org/dc/elements/1.1/ namespace. 
> > > http://purl.org/dc/terms/ seems to be the canonical namespace at 
> > > this time, so I suggest just using that.
> > 
> > Wait, what? I'm confused. What exactly are you saying should change?
> 
> The works vocabulary maps itemprop="title" to 
> http://purl.org/dc/elements/1.1/title while the algorithm for converting 
> a document to RDF maps <title>foo</title> to 
> http://purl.org/dc/terms/title. Unless there's some specific reason for 
> this, use http://purl.org/dc/terms/title in both cases, as 
> /elements/1.1/ is apparently a legacy namespace (see 
> http://dublincore.org/documents/dcmi-terms/#H3)

Fixed.


> > > There's an issue with how vocabularies that use subitems are 
> > > currently handled. In short, triples are only generated if the item 
> > > either has a type which is an absolute URL or if the item property 
> > > is an absolute URL. This prevents site-private data from being 
> > > exported as RDF, which is a good thing. However, for vocabularies 
> > > which have an item type for the top-level item but not for subitems 
> > > (which seems quite unnecessary) this means that no triples are 
> > > generated for the subitems, even though the subitem can reasonably be 
> > > considered to be using the same vocabulary as the typed top-level 
> > > item. To illustrate the point, here's the output of the RDF 
> > > extraction (as Turtle) from the Jack Bauer example if the current 
> > > spec is honored: [...]
> > > 
> > > As you see, the structured subitems org, adr, etc just point to 
> > > blank nodes with no further triples for those nodes. My fix is to 
> > > pass on the type of the parent item when generating triples for 
> > > subitems as a default, which is overridden if the subitem defines 
> > > its own type (as e.g. agent does in the above). I think this is 
> > > sensible and it certainly produces a more complete RDF graph: [...]
> > 
> > That works if you know the vocabulary and thus know that the nested 
> > subitem is from that vocabulary, but it seems highly suspect in the 
> > case where you don't know that. Also, consider:
> > 
> >   <div itemscope itemtype="http://example.com/person">
> >    <p itemprop="school" itemscope>
> >     I go to school in the <span itemprop="class">middle</span> classroom.
> >    </p>
> >    <p itemprop="demographics" itemscope>
> >     I am <span itemprop="class">middle</span>-classed.
> >    </p>
> >   </div>
> > 
> > (A bit contrived, but you get the idea.) It would be wrong to use the 
> > same predicate for both itemprop="class" cases.
> 
> Since no itemtype is used for itemprop="school", 
> http://example.com/person must define this as part of its vocabulary, 
> unless the above is an example of invalid markup. Since it's all one 
> vocabulary, using the same prefix for the RDF predicates seems quite 
> logical.

I really don't think they're the same predicate, but I agree that we need 
to expose these triples somehow. Consider:

   <div itemscope itemtype="http://example.com/a" itemref="x"></div>
   <div itemscope itemtype="http://example.com/b" itemref="x"></div>
   <div id="x"> <p itemprop="q" itemscope> <span itemprop="r">s</span> </p> </div>

Right now this generates four blank nodes, which is a bug; it should 
generate three. But if we generate three, then what predicate do we use 
for the itemprop?

I ended up going with kind of a compromise solution. The above generates 
_four_ property triples (plus the two rdf:type triples), but _three_ nodes:

   @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
   @prefix eg: <http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fexample.com%2F> .

   _:n0 rdf:type <http://example.com/a> ;
        <eg:a%23%3Aq> _:n2 .

   _:n1 rdf:type <http://example.com/b> ;
        <eg:b%23%3Aq> _:n2 .

   _:n2 <eg:a%23%3Aq%20r> "s" ;
        <eg:b%23%3Aq%20r> "s" .

Basically, instead of using "type:name", I used "type:parent-name name", 
where the space character is another character that, like ":", cannot 
appear in "name" and thus is usable here without making anything 
ambiguous. Hopefully this solves the problem relatively neatly, if not in 
the most performance-optimal way.
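
As a rough sketch of that naming scheme, the Python below builds the 
predicate for a property from the type of the typed ancestor item and the 
chain of property names leading down to it. The function name 
predicate_for, the constant name, and the choice to percent-encode 
everything outside the unreserved characters are assumptions inferred from 
the example output above; this is an illustration, not spec text.

```python
from urllib.parse import quote

# Base IRI taken from the "eg" prefix in the Turtle example above.
MICRODATA_BASE = "http://www.w3.org/1999/xhtml/microdata#"


def predicate_for(item_type, path):
    """Return a predicate IRI for the property reached via `path`.

    `item_type` is the itemtype of the nearest typed ancestor item.
    `path` lists property names from that item down to the property,
    e.g. ["q"] for a direct property or ["q", "r"] for a property of
    the untyped subitem q. (Hypothetical helper, for illustration only.)
    """
    if not path:
        raise ValueError("path needs at least one property name")
    for name in path:
        # ":" and " " serve as separators, so a name must not contain them.
        if ":" in name or " " in name:
            raise ValueError("invalid property name: %r" % name)
    # "type:parent-name name": the type, then "#:", then the names
    # joined by spaces, all percent-encoded into one fragment.
    local = item_type + "#:" + " ".join(path)
    return MICRODATA_BASE + quote(local, safe="")


# The two typed items from the itemref example share the subitem q,
# so the property r gets one predicate per referencing type.
for item_type in ("http://example.com/a", "http://example.com/b"):
    print(predicate_for(item_type, ["q"]))
    print(predicate_for(item_type, ["q", "r"]))
```

Under the same assumptions, the two itemprop="class" cases from the 
earlier person example come out as different predicates, since their 
paths are ["school", "class"] and ["demographics", "class"].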

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 29 January 2010 09:30:40 GMT
