Re: htmldata-ISSUE-1 (Microdata Vocabulary): Vocabulary specific parsing for Microdata from Gregg Kellogg on 2011-10-21 (public-html-data-tf@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Fri, 21 Oct 2011 16:48:23 -0400
To: Gregg Kellogg <gregg@kellogg-assoc.com>
CC: Jeni Tennison <jeni@jenitennison.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <6F7CD981-1A16-4B8D-A3CA-A7D9DE1D7633@greggkellogg.net>

I updated the wiki [1] with some options for URI generation schemes.

Gregg

[1] http://www.w3.org/wiki/Mapping_Microdata_to_RDF

On Oct 21, 2011, at 12:21 PM, Gregg Kellogg wrote:

On Oct 21, 2011, at 7:02 AM, Jeni Tennison wrote:

Gregg, all,

[snip]

We should also remind ourselves that our goal within the TF is not to produce a finished specification but to provide something that others could take forward to Recommendation. We should make sure that we capture options, rationale and recommendations, but we do not have to make any final decisions.

We have a range of different options about how processors determine what mapping to use:

1. all processors use the same (default) mapping for all vocabularies
2. all processors use a default mapping for unknown vocabularies and a customised mapping for known vocabularies where the known vocabulary mappings are:
   a. a pre-defined set of popular vocabularies
   b. drawn from a registry
   c. determined by resolving the vocabulary's schema
3. different processors have different sets of mappings and must specify how they are set
   a. all processors have the same default mapping for unknown vocabularies
   b. processors must also specify what default mapping they use

I propose that we document these possibilities as an editorial note within the document and have a straw poll about which methods we recommend and which we rule out. I will start a separate thread to do that.

At this point, I think the way to do this might be to define these parameters and place them in the initial evaluation context. Then we can describe how different choices might affect these parameters for any given item. A future group can then make choices by eliminating or fixing some parameters or processing steps.

Whichever option is chosen, we need to have a complete list of the things about a vocabulary that a processor needs to know (ie an environment) in order to generate "natural RDF" from microdata; this includes the ranges of properties, how property URIs are derived from the type + defined property name, and error-handling behaviour. It would help if the microdata->RDF spec were written in terms of using environment properties.

Agreed, I'll take this direction in the next spec update.
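
To make the idea of an "environment" concrete, here is a minimal Python sketch of what a processor might hold per vocabulary, together with the "customised mapping for known vocabularies, default otherwise" lookup from option 2. Everything in it is an assumption made for illustration: the class name, the field names, the scheme labels ("type-hash", "vocabulary") and the registry contents are hypothetical and are not taken from the spec or the wiki.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class VocabularyEnvironment:
    """Hypothetical bundle of the things a processor needs to know about a vocabulary."""
    property_uri_scheme: str = "type-hash"           # how property URIs are derived
    multi_value_handling: str = "separate-triples"   # or e.g. "rdf-list"
    on_error: str = "ignore"                         # error-handling behaviour
    property_ranges: Dict[str, str] = field(default_factory=dict)  # name -> "literal" or "uri"


# Default environment used for any vocabulary the processor does not know.
DEFAULT_ENVIRONMENT = VocabularyEnvironment()

# Hypothetical set of known vocabularies, keyed by URI prefix (option 2a or 2b).
KNOWN_ENVIRONMENTS = {
    "http://schema.org/": VocabularyEnvironment(
        property_uri_scheme="vocabulary",
        property_ranges={"name": "literal", "url": "uri"},
    ),
}


def environment_for(item_type: str) -> VocabularyEnvironment:
    """Return the environment of the longest matching known prefix, else the default."""
    matches = [prefix for prefix in KNOWN_ENVIRONMENTS if item_type.startswith(prefix)]
    if not matches:
        return DEFAULT_ENVIRONMENT
    return KNOWN_ENVIRONMENTS[max(matches, key=len)]
```

With this shape, the choice between options 1, 2 and 3 only changes how the table of known environments is populated (empty, pre-defined, fetched from a registry, or processor-specific); the processing steps themselves would read the same fields either way.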

There are some items on that list where we need to specify the options in detail. For example:

* three(?) possible methods of generating a URI from a type + defined property name:
  * the "natural RDF" mapping currently defined in Gregg's spec
  * the type#property mapping
  * a variant on Hixie's mapping

A processor parameter affecting the initial evaluation context, possibly modified along the way by options for configured vocabularies or domain/range information from the vocabulary.
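
As a rough illustration of how a URI scheme could act as a processor parameter, the Python sketch below shows two ways of deriving a property URI from an item type and a property name. The first is the type#property mapping named above. The second is only one plausible reading of a vocabulary-relative scheme (drop the type's last segment and append the name) and is an assumption, not the definition in the spec; the function names are hypothetical. The variant on Hixie's mapping is not sketched.

```python
from urllib.parse import quote


def type_hash_property_uri(item_type: str, name: str) -> str:
    """type#property mapping: use the property name as a fragment on the type URI.

    Assumes item_type does not already carry a fragment.
    """
    return f"{item_type}#{quote(name, safe='')}"


def vocabulary_property_uri(item_type: str, name: str) -> str:
    """Assumed vocabulary-relative scheme: replace the last segment of the type URI.

    Everything up to and including the last '/' or '#' is kept, and the
    percent-encoded property name is appended.
    """
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[:cut + 1] + quote(name, safe="")


# Hypothetical table letting the evaluation context pick a scheme by label.
URI_SCHEMES = {
    "type-hash": type_hash_property_uri,
    "vocabulary": vocabulary_property_uri,
}
```

For the type http://schema.org/Person and the property name "name", the first function yields http://schema.org/Person#name and the second yields http://schema.org/name.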

* four(?) possible ways of handling multi-valued properties:
  * separate triples
  * rdf:List values
  * rdf:Seq values
  * Ordered Lists

The first two are useful; I don't really see value in rdf:Seq or Ordered Lists at this point, but it might be interesting to document them anyway.
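
To show the difference between the first two options, here is a small Python sketch that turns the values of one multi-valued property into triples, once as separate triples and once as an rdf:List. It is a simplification under stated assumptions: terms are plain strings, blank nodes are labelled _:b0, _:b1 and so on, and the function names are hypothetical.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def separate_triples(subject, predicate, values):
    """One triple per value; document order is not preserved in the graph."""
    return [(subject, predicate, value) for value in values]


def rdf_list_triples(subject, predicate, values, bnode_ids=None):
    """A single triple pointing at an rdf:List that preserves document order."""
    ids = count() if bnode_ids is None else bnode_ids
    if not values:
        # An empty collection is represented by rdf:nil.
        return [(subject, predicate, RDF + "nil")]
    nodes = [f"_:b{next(ids)}" for _ in values]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, value) in enumerate(zip(nodes, values)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", value))
        triples.append((node, RDF + "rest", rest))
    return triples
```

Two values give two triples in the first form and five in the second (one link from the item, plus rdf:first and rdf:rest for each list node), which is the cost of keeping the order.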

If we go for 1, 2 or 3a for how processors determine how to map items of a particular type, we also have to define what the default mapping should be. We can't do that until we have an absolute list of the things that a mapping needs to define and what the options are for each of those, so we'll defer this until we have that list.

If we go for 2 or 3 then we probably will want two other resources:

* as Gregg suggested, the mappings for a set of popular vocabularies (probably those whose prefixes are built-in to RDFa), probably in a separate document, wiki or registry

Machine-readable, so that they can be used to define the evaluation context, much as RDFa does for its initial evaluation context.

* as Martin suggested, a small vocabulary that enables vocabulary owners to describe the mapping for their vocabulary within an RDFS schema or OWL ontology

This would be much easier to specify than an OWL description, although it could probably be derived from one, as could the other direction. Something like the following:

<http://schema.org/> a :Vocabulary;
  :propertyURIscheme :slashHash;
  :property schema:Thing;
  :class schema:Thing .
schema:Thing a rdfs:Class .
schema:name a rdf:Property;
  rdfs:range rdf:PlainLiteral;
  rdfs:domain schema:Thing;
  :maxCardinality 1 .

<http://microformats.org/profile/hcard> a :Vocabulary;
  :propertyURIscheme :contextual .

Whether these need to be done within this TF, I'm not sure. They are both probably useful anyway as examples of (a) what vocabulary mappings need to look like and (b) how to express them in a machine-readable way.

Gregg,

Does that give you a way forward? Could you write up either in the spec or on the wiki:

* the list of the things that need to be known to map microdata to RDF
* the options for the type+property name and list value handling

I'm on it.

Gregg

so that we have a clear record of them to refer to? (I'm sure if anyone wants to help Gregg he'd appreciate it.)

I will create a separate thread on the straw poll.

Thank you,

Jeni
--
Jeni Tennison
http://www.jenitennison.com/

Received on Friday, 21 October 2011 20:49:08 UTC