Re: Updated Microdata to RDF draft

Hi,

Gregg, that looks really great; thank you so much for your hard work on this (and sorry for not being around this week: a combination of the half-term holiday and a horrible head cold has kept me quiet).

I need to look through in more detail, but there are four technical things that I think we should bottom out.


= Other Registry Information =

Currently the "registry" contains two properties that control processing into RDF:

  * propertyURI, which determines how property URIs are generated
  * multipleValues, which determines how multiple values are handled

From the other discussions, it's clear that to get the most valuable RDF, it would also be good if the processor could:

  * label values with datatypes
  * (possibly) determine whether to interpret URI values as references to other resources or as literals

Should this information be part of the registry, or something that processors might do in post-processing?
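
To make the question concrete, here is a rough sketch of what a registry entry might look like if it carried the extra information. Only propertyURI and multipleValues come from the current draft; the `datatypes` and `uri_as_resource` fields, the "unordered" value, and the schema.org property names are assumptions made up for illustration, not anything we've agreed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class RegistryEntry:
    # The two properties the registry holds today.
    property_uri: str      # e.g. "context"
    multiple_values: str   # e.g. "unordered" (illustrative value, an assumption)
    # Hypothetical extensions discussed above; names are invented for this sketch.
    datatypes: Dict[str, str] = field(default_factory=dict)
    uri_as_resource: Dict[str, bool] = field(default_factory=dict)

    def datatype_for(self, prop: str) -> Optional[str]:
        """Return the datatype URI to label values of `prop` with, if any."""
        return self.datatypes.get(prop)

    def treat_as_resource(self, prop: str, default: bool = True) -> bool:
        """Decide whether URI values of `prop` are references or literals."""
        return self.uri_as_resource.get(prop, default)


# Registry keyed by vocabulary URI prefix (the entry contents are examples only).
registry = {
    "http://schema.org/": RegistryEntry(
        property_uri="context",
        multiple_values="unordered",
        datatypes={"birthDate": XSD + "date"},
        uri_as_resource={"url": True},
    )
}
```

A processor without this information would simply skip both lookups, which is roughly what the post-processing alternative amounts to.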


= Context Property URIs =

IIRC, there were four issues with the mapping that is now the 'context' option:

  1. it didn't match the mapping you would naturally use for RDF vocabularies
  2. the resulting URIs couldn't easily be serialised in RDF/XML
  3. the resulting URIs couldn't be resolved to anything useful
  4. the resulting URIs were ugly

The first issue is addressed by providing other mappings that tools can be configured to use for known vocabularies.

The other three issues are still pertinent, though. Under that mapping, the type http://microformats.org/profile/hcard plus property 'fn' maps to:

  http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:fn

and http://microformats.org/profile/hcard plus property chain 'foo', 'fn' maps to:

  http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20fn

The URIs are going to end up being ugly anyway. To avoid clashing with property URIs that other people (i.e. the vocabulary owners) might define, they have to be within a domain owned by the people defining the mapping (i.e. w3.org), which means they have to be constructed by somehow merging the itemtype with that domain.

Use within RDF/XML requires that the local part of the name be usable as an XML element name, which means no colons or spaces as separators within local names. This particularly affects compound names such as the second example above, if you want to avoid multiple namespace declarations for nested properties that don't have their own type, such as:

  xmlns:hcard="http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:"
  xmlns:hcard.foo="http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20"
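
The local-name constraint can be checked mechanically. This is a minimal sketch, assuming an ASCII-only simplification of the XML NCName production (the real production also allows many non-ASCII characters); the function name is made up for illustration.

```python
import re

# ASCII-only simplification of NCName: a letter or underscore, followed by
# letters, digits, dots, hyphens or underscores. Colons, spaces and '%' are out.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def usable_as_local_name(name: str) -> bool:
    """Return True if `name` could serve as the local part of an element name."""
    return NCNAME_ASCII.fullmatch(name) is not None


for candidate in ["fn", ":fn", "foo fn", "foo%20fn", "foo.fn"]:
    print(candidate, usable_as_local_name(candidate))
```

Under this check only "fn" and "foo.fn" pass, which is why the colon- and space-separated forms force the separator into the namespace URI rather than the local name.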

I wonder if we could find a mapping that addresses these issues. What I'd suggest is:

  * using a shorter base URI: perhaps we can ask Ivan whether there is somewhere more appropriate to put a web page or script
  * using a query string rather than a fragment identifier: this gives more flexibility in how the server responds to the URI
  * using dots as separators in property names: these aren't allowed in the short-name properties, and don't have to be escaped in URIs

This would give us something like:

  http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&prop=fn
  http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&prop=foo.fn

Note that the type URI and property names are (individually) URI encoded. This would enable a namespace declaration such as:

  xmlns:hcard="http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&amp;prop="

to be used with <hcard:fn> and <hcard:foo.fn> elements.
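
As a sanity check, the suggested mapping can be sketched in a few lines. This assumes the http://www.w3.org/ns/md base used in the examples (which isn't agreed), that a '#' is appended to types that lack one (inferred from the examples), and that ':' and '/' are left unescaped in the type; the function names are made up for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

BASE = "http://www.w3.org/ns/md"  # placeholder base URI taken from the examples


def property_uri(item_type, names):
    """Build a property URI from an itemtype and a chain of property names."""
    # Assumption: add a trailing '#' when the type has no fragment separator.
    type_part = item_type if "#" in item_type else item_type + "#"
    # Encode the type, keeping ':' and '/' readable as in the examples.
    encoded_type = quote(type_part, safe=":/")
    # Encode each name individually, then join the chain with dots.
    encoded_prop = ".".join(quote(name, safe="") for name in names)
    return f"{BASE}?type={encoded_type}&prop={encoded_prop}"


def xmlns_declaration(prefix, item_type):
    """Build the namespace declaration; the '&' is escaped for use in XML."""
    namespace = property_uri(item_type, [])
    return f"xmlns:{prefix}={quoteattr(namespace)}"


HCARD = "http://microformats.org/profile/hcard"
print(property_uri(HCARD, ["fn"]))
print(property_uri(HCARD, ["foo", "fn"]))
print(xmlns_declaration("hcard", HCARD))
```

One design point this makes visible: because the whole property chain ends up after "prop=", a single namespace declaration per type covers nested properties as well.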

Would this be an improvement? Are there any snags?


= Types =

We've separately been talking about the possibility of using something other than rdf:type, such as http://schema.org/type, to provide multiple types for microdata items. If that were to happen, the microdata RDF mapping should give that property special treatment, and the spec should have at least a Note to that effect.


= Language =

Hixie has said [1] that microdata vocabularies need to provide language information explicitly rather than using the HTML language. If that's how the bug is resolved, the mapping should not create language-tagged literals unless it's done through a common type such as http://schema.org/LanguageString or something [2]. We should have at least a Note in the spec that flags this as an issue.

Cheers,

Jeni

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470#c1
[2] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0251.html
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Friday, 28 October 2011 20:09:22 UTC