Re: Updated Microdata to RDF draft from Gregg Kellogg on 2011-10-28 (public-html-data-tf@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Fri, 28 Oct 2011 17:39:46 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: Gregg Kellogg <gregg@kellogg-assoc.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>, Richard Cyganiak <richard@cyganiak.de>
Message-ID: <8FECFB9C-D00A-45B6-889F-64C7597B7F0E@greggkellogg.net>
On Oct 28, 2011, at 1:08 PM, Jeni Tennison wrote:

> Hi,
> 
> Gregg, That looks really great, thank you so much for your hard work on this (and sorry for not being around this week -- combination of half term holiday and a horrible head cold have kept me quiet).
> 
> I need to look through in more detail, but there are four technical things that I think we should bottom out.
> 
> 
> = Other Registry Information =
> 
> Currently the "registry" contains two properties that control processing into RDF:
> 
>  * propertyURI that determines how property URIs are generated
>  * multipleValues that determines how multiple values are handled
> 
> From the other discussions, it's clear that to get the most valuable RDF, it would also be good if the processor could:
> 
>  * label values with datatypes

This could be supported by adding more structure to the registry, for example adding a list of properties included in the registry and a datatype to apply to plain literals. However, this does add further complication. The situation is really no different than with RDFa, except that RDFa provides a means to set it explicitly with @datatype.

>  * (possibly) determine whether to interpret URI values as references to other resources or as literals

This can be done now with appropriate elements using @href, @src, or @data. If there's a need to coerce values that would otherwise be seen as literal to a URI ref, I missed it.

> Should this information be part of the registry, or something that processors might do in post-processing?

Both of these can be done in post-processing. For the most part, literal values can be coerced to the appropriate datatype in a consuming application. The exception is rdf:XMLLiteral, which is outside of the Microdata data model anyway.
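
As a rough illustration of what such post-processing could look like in a consuming application, here is a minimal Python sketch. The property-to-datatype table, the datatypes chosen for those properties, and both function names are assumptions made up for this example; nothing here comes from the draft or the registry.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table: which datatype a consumer wants for which property.
# A real application would supply its own mapping.
DATATYPES = {
    "http://schema.org/birthDate": XSD + "date",
    "http://schema.org/numberOfPages": XSD + "integer",
}

def coerce_literal(prop_uri, lexical, datatypes=DATATYPES):
    """Return (lexical, datatype); datatype is None when no coercion applies."""
    return (lexical, datatypes.get(prop_uri))

def to_ntriples_literal(lexical, datatype=None):
    """Serialise a literal in N-Triples syntax, typed if a datatype is given."""
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'

print(to_ntriples_literal(*coerce_literal("http://schema.org/birthDate", "1970-01-01")))
print(to_ntriples_literal(*coerce_literal("http://schema.org/name", "Jeni")))
```

Properties missing from the table stay plain literals, which matches the processor's default output.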

> = Context Property URIs =
> 
> IIRC, there were four issues with the mapping that is now the 'context' option:
> 
>  1. it didn't match the mapping you would naturally use for RDF vocabularies
>  2. the resulting URIs couldn't easily be serialised in RDF/XML
>  3. the resulting URIs couldn't be resolved to anything useful
>  4. the resulting URIs were ugly
> 
> The first issue is addressed by providing other mappings that tools can be configured to use for known vocabularies.
> 
> The other three issues are still pertinent, though. Under that mapping, the type http://microformats.org/profile/hcard plus property 'fn' maps to:
> 
>  http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:fn
> 
> and http://microformats.org/profile/hcard plus property chain 'foo', 'fn' maps to:
> 
>  http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20fn
> 
> The URIs are going to end up being ugly anyway. To avoid clashing with property URIs that other people (ie the vocabulary owners) might define, they have to be within a domain owned by the people defining the mapping (ie w3.org), so that means they have to be constructed by somehow merging the itemtype with that domain.
> 
> Use within RDF/XML requires that the local part of the name is something that can be used as an XML element name, which means no colons or spaces as separators within local names. This particularly impacts on the compound names such as the second example above, if you want to avoid having to have multiple namespace declarations for nested properties that don't have their own type, such as:
> 
>  xmlns:hcard="http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:"
>  xmlns:hcard.foo="http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20"
> 
> I wonder if we could find a mapping that addresses these issues. What I'd suggest is:
> 
>  * using a shorter base URI: we can ask Ivan perhaps if there would be something more appropriate at which to put a web page or script
>  * using a query string rather than a fragment identifier: this gives more flexibility in how the server responds to the URI
>  * using dots as separators in property names: these aren't allowed in the short-name properties, and don't have to be escaped in URIs
> 
> This would give us something like:
> 
>  http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&prop=fn
>  http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&prop=foo.fn
> 
> Note that the type URI and property names are (individually) URI encoded. This would enable a namespace declaration such as:
> 
>  xmlns:hcard="http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard%23&prop="
> 
> to be used with <hcard:fn> and <hcard:foo.fn> elements.
> 
> Would this be an improvement? Are there any snags?

My opinion: I question the value of generating unique property URIs by default from non-URI property names. However, if that is how the Microdata spec is to be interpreted, we should come up with something better. I simply put the previous version in as it had already been spec'd out and no reasonable alternative was given.

I think that this would be a good improvement, and would be fairly easy to document. However, by using URI query parameters, I think we can get rid of the %23 part as well, which would create URIs such as the following:

> http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=fn
> http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=foo.fn

(Mind you, this assumes that we don't define a more rational mapping of hcard to the _type_ URI scheme).
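
The query-string mapping discussed above could be sketched in Python roughly as follows. The base URI is the one proposed in this thread, not an established namespace, and the function name and the `append_hash` flag are hypothetical; the flag only exists to show both the variant with %23 and the variant without it.

```python
from urllib.parse import quote

# Base URI as proposed in this thread; an assumption, not a published namespace.
MD_BASE = "http://www.w3.org/ns/md"

def context_property_uri(item_type, prop_chain, append_hash=False, base=MD_BASE):
    """Build a property URI from an item type and a chain of property names.

    The type URI and each property name are URI-encoded individually;
    property names in a chain are joined with dots.
    """
    if append_hash:
        item_type += "#"  # encoded as %23 below
    type_part = quote(item_type, safe=":/")
    prop_part = ".".join(quote(name, safe="") for name in prop_chain)
    return f"{base}?type={type_part}&prop={prop_part}"

hcard = "http://microformats.org/profile/hcard"
print(context_property_uri(hcard, ["fn"]))
print(context_property_uri(hcard, ["foo", "fn"]))
print(context_property_uri(hcard, ["foo", "fn"], append_hash=True))
```

Because dots need no escaping and the property name comes last, everything up to `prop=` can serve as a single namespace URI in RDF/XML.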

> = Types =
> 
> We've separately been talking about the possibility of using something other than rdf:type, such as http://schema.org/type to provide multiple types for microdata items. If that were to happen, it would be good for the microdata RDF mapping to give that property special treatment. It would be good to have at least a Note to that effect.

Yes, I've been waiting for this to resolve. I also didn't address Hixie's recent change to allow multiple URIs in @itemtype, with the restriction that they are all somehow related; this leaves a corner case where different values match different registry entries and the processor has to choose among them. If we were to support this, I'd suggest maintaining the lexical order and matching against the first entry only for the purposes of identifying a vocabulary and generation rules. If we adopt this, we should caution about the danger of mixing vocabulary spaces if URIs from conflicting vocabularies (or no vocabulary) are used.
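
One possible reading of the first-entry rule, sketched in Python. The registry layout (URI prefix mapped to a rules dictionary), its entries, the option values, and the function name are all hypothetical and only serve to make the example runnable.

```python
# Hypothetical registry: URI prefix -> processing rules. Entries are illustrative.
REGISTRY = {
    "http://schema.org/": {"propertyURI": "type", "multipleValues": "unordered"},
    "http://microformats.org/profile/hcard": {"propertyURI": "context", "multipleValues": "list"},
}

def rules_for_itemtype(itemtype, registry=REGISTRY):
    """Pick processing rules using only the first URI listed in @itemtype.

    Later URIs are ignored for rule selection, even if they would match
    a different registry entry.
    """
    types = itemtype.split()  # @itemtype is a space-separated list, order kept
    if not types:
        return None
    first = types[0]
    for prefix, rules in registry.items():
        if first.startswith(prefix):
            return rules
    return None  # first type matches no registry entry

mixed = "http://schema.org/Person http://microformats.org/profile/hcard"
print(rules_for_itemtype(mixed))
```

Under this reading, an item whose first type is unknown gets no registry rules at all, even when a later type would match, which is one form of the vocabulary-mixing hazard mentioned above.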

IMO, giving special treatment to an @itemprop URI that's not controlled by W3C is an issue; I don't think we should be building in processing rules that are defined in terms of outside vocabularies. If we used a value other than rdf:type, presumably we'd actually want to generate an rdf:type triple, so why not simply use rdf:type?

> = Language =
> 
> Hixie has said [1] that microdata vocabularies need to provide language information explicitly rather than using the HTML language. If that's how the bug is resolved, we need to not create language-tagged literals through the mapping unless it's done through a common type such as http://schema.org/LanguageString or something [2]. We should have at least a Note in the spec that flags this as an issue.

I'll add a note, but I don't believe it is necessary to avoid tagging literals with language, as I've said elsewhere [3]. It's really a vocabulary issue when using that vocabulary with Microdata if the vocabulary makes no provision for distinguishing different languages. The important thing is that the information is preserved in the JSON representation, which doesn't have the language representation.

The Microdata language handling issue [1] is still open, and could be resolved. It is certainly possible to create a JSON representation that includes language information, as JSON-LD does.

> Cheers,
> 
> Jeni
> 
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470#c1
> [2] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0251.html
[3] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0255.html
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 

I'll create some new issues to track each of these individually and reference from within the spec. I'll provide another spec update after I receive more feedback.

Gregg
Received on Friday, 28 October 2011 21:40:33 UTC