Re: @itemid and URL properties in schema.org

Hi Peter,

On Nov 6, 2011, at 6:13 AM, Peter Mika wrote:

> Hi Gregg,
> 
> It is stated in the textual description that 'url', 'image' and in fact 
> many other properties (e.g. discussionURL of CreativeWork) must have a 
> value of type 'URL'. Microdata doesn't have a schema language, so there 
> is no way to formalize this constraint. In OWL, the only way to express 
> this constraint is to use a datatype-property: using an object property 
> would allow any resource (including bnodes) as value.

If a goal is to transform the data expressed in microdata into RDF, and you would like to treat the *url properties as non-literal resources, then you really need to stick with an owl:ObjectProperty model. As you know, literals, even typed literals such as xsd:anyURI, can't be used in the subject position.
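
To illustrate the subject-position constraint, here is a minimal sketch using a toy term model in Python. The class names (IRI, BNode, Literal, Triple) are hypothetical and not taken from any RDF library, and the example.org identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Union

XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING


@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BNode]
    predicate: IRI
    obj: Union[IRI, BNode, Literal]

    def __post_init__(self):
        # RDF allows literals in the object position only.
        if isinstance(self.subject, Literal):
            raise TypeError("a literal cannot be the subject of a triple")


item = IRI("http://example.org/item")
page = IRI("http://example.org/page")
url = IRI("http://schema.org/url")
name = IRI("http://schema.org/name")

# Object-property style: the page is a resource, so more can be said about it.
Triple(item, url, page)
Triple(page, name, Literal("Example page"))

# Datatype-property style: the value is a dead end, even when typed xsd:anyURI.
page_as_literal = Literal("http://example.org/page", XSD_ANYURI)
Triple(item, url, page_as_literal)
try:
    Triple(page_as_literal, name, Literal("Example page"))
except TypeError as err:
    print(err)
```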

In the context of microdata, the attributes taking a URL will naturally create a resource in RDF. In fact, HTML5/microdata don't restrict you from using an IRI, but will transform the IRI into a URL. When the value comes from a textual attribute, such as @content, or from the element's text content, microdata imposes no content model. I'm not exactly sure what they'll do with a URN.
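
The split between URL-valued attributes and textual values can be sketched roughly as follows. This is a simplified reading of the microdata property-value rules, not the mapping draft itself: it leaves out time/@datetime and nested items, and the function name property_value and its return shape are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Elements whose property value comes from a URL-valued attribute.
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}


def property_value(tag, attrs, text, base):
    """Return ("resource", iri) or ("literal", string) for an itemprop element."""
    tag = tag.lower()
    if tag in URL_ATTRIBUTE:
        raw = attrs.get(URL_ATTRIBUTE[tag])
        if raw is None:
            # No URL attribute present: fall back to an empty string value.
            return ("literal", "")
        # Resolve against the document base and treat the result as a resource.
        return ("resource", urljoin(base, raw))
    if tag == "meta":
        # Textual attribute: no content model is imposed on the value.
        return ("literal", attrs.get("content", ""))
    # Everything else: the element's text content, again unconstrained.
    return ("literal", text)


base = "http://example.org/dir/page.html"
print(property_value("img", {"src": "photo.jpg"}, "", base))
print(property_value("meta", {"content": "photo.jpg"}, "", base))
```

The same string, photo.jpg, ends up as a resolved resource in the first call and as a plain literal in the second, which is why a property such as 'url' can produce either kind of value depending on the markup used.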

It seems to me that any constraint schema.org would like to place on *url property values, if you want to treat them as RDF, is something that needs to be done at a meta-level beyond processing. For example, you could create a SPARQL query to cast URI references to strings and perform lexical analysis to determine if they're valid in your content model.
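
In SPARQL the cast would be done with str(). The lexical analysis step can be sketched in Python, assuming the triples have already been pulled out as plain strings. The allowed scheme list is an assumption about what schema.org might accept, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only these locator schemes count as a valid "URL" value.
ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_url(value):
    """Lexical check of a URI reference string against the assumed content model."""
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    # Require an allowed scheme and a non-empty authority component.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def invalid_url_values(triples, url_properties):
    """Yield the (s, p, o) string triples whose object fails the lexical check."""
    for s, p, o in triples:
        if p in url_properties and not is_acceptable_url(o):
            yield (s, p, o)


triples = [
    ("http://example.org/item", "http://schema.org/url", "http://example.org/page"),
    ("http://example.org/item", "http://schema.org/image", "urn:isbn:0451450523"),
]
url_properties = {"http://schema.org/url", "http://schema.org/image"}
for bad in invalid_url_values(triples, url_properties):
    print("not a URL:", bad)
```

A check like this runs after extraction, over the resulting graph, which matches the idea of validating at a level above the microdata processing itself.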

Assuming that you really don't want to go to the extent of validating input, relying on publishing rules in the schema.org documentation seems like the right way to go.

Does this make sense?

Gregg

> Do I see this wrong?
> 
> Thanks,
> Peter
> 
> p.s. It's a different matter that the particular syntax (HTML5 + 
> microdata) limits the values of particular attributes to URLs.
> 
> 
> On 11/5/11 8:03 PM, Gregg Kellogg wrote:
>> On Nov 5, 2011, at 11:29 AM, Peter Mika wrote:
>> 
>>> Hi Jeni,
>>> 
>>>> This is part of the point of my posting about this :) We have a problem here, in that the mapping of properties typed URL to literal values in the schema.org OWL ontology clashes with the assumption in the microdata-to-RDF mapping that we've been developing [1], which assumes that any @href, @src etc provides the identifier of a resource, giving the property an object value rather than a literal value.
>>> Thanks, I haven't seen this mapping yet. I'm happy to change the
>>> generated OWL file to bring it in line with this mapping.
>>> 
>>> One technical point: this new mapping seems to consistently talk about
>>> URI references, while the microdata spec talks about URLs. Further, if I
>>> say in the OWL ontology that 'url', 'image' etc. are object-properties,
>>> then the value can be any URI. However, the intent is to restrict the
>>> publisher to providing URLs. Am I nitpicking?
>> The spec does describe URI references, and these might not be valid for schema.org, so this should be stated in your vocabulary (although why HTML5 would not allow international identifiers/locators is another discussion). Also, even though the discussion is about URI references rather than URLs, the HTML5 content model does restrict @href, @src and @data to be URLs. The microdata spec restricts @itemid to be a URL, but not @itemtype or @itemprop (as they are not resolved WRT the document base). As some vocabularies may, in fact, want to use IRIs, adding registry info that allows properties to take on these ranges is a way to allow for greater expression of internationalized identifiers using microdata.
>> 
>> Note the recently raised ISSUE-4 [1], which describes a means of both supporting URIs (which could be IRIs) and using the schema 'url' property as the subject.
>> 
>> Gregg
>> 
>> [1] http://www.w3.org/2011/htmldata/track/issues/4
>> 
>>> Thanks,
>>> Peter
>>> 
> 

Received on Sunday, 6 November 2011 20:42:44 UTC