
Re: @itemid and URL properties in schema.org

From: Jason Douglas <jasondouglas@google.com>
Date: Fri, 4 Nov 2011 12:31:27 -0700
Message-ID: <CAEiKvUCVQ+=Zjv857MqyexoxWhdjD5wPCRZdBcxGUvj23KoVfg@mail.gmail.com>
To: Jeni Tennison <jeni@jenitennison.com>
Cc: public-vocabs@w3.org, Guha <guha@google.com>, Dan Brickley <danbri@danbri.org>, HTML Data Task Force WG <public-html-data-tf@w3.org>
On Fri, Nov 4, 2011 at 11:48 AM, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi schema.orgers
>
> I'd like to get some clarification around the use of URLs within
> schema.org, particularly to help the HTML Data TF frame guidelines around
> microdata and RDFa usage of the vocabulary. I'm afraid there are rather a
> lot of questions here...
>
> First, am I correct that schema.org does not use the @itemid attribute? I
> can't see it used in any examples, but I haven't found a clear statement
> about its use either. The microdata spec states that the vocabulary
> determines both whether an item can have an id and what the meaning of that
> id is (eg whether items with the same id can be considered the same item by
> consumers). To avoid doubt, it would be helpful for schema.org to
> explicitly state that @itemid is not allowed if that's the case.
>
> Second, it looks as though the 'url' property acts as a kind of identifier
> for an item. It would be helpful for other vocabulary authors to understand
> why schema.org uses the 'url' property rather than @itemid, as the
> reasoning behind that design would probably apply elsewhere as well. If two
> items have the same value for their 'url' property (within a page or across
> pages), should they be considered to be the same item by consumers?
>

I agree that this is a very important and currently under-specified issue.

There were two considerations that led to @itemid not being included in the
original schema.org definition:

   - As few concepts as possible.  As Guha commented on a previous thread,
   we've generally found that the more syntactical options given to
   webmasters, the higher the error rate.  Not including itemid in the docs
   meant we only had to talk about itemscope, itemtype, itemprop (and itemref).
   - itemid as an attribute is a bit awkward in real-world markup as
   there's no way to re-use the URL value of an existing anchor, which is a
   very common use case.  You end up having to repeat the URL.

So using the url itemprop as an alternative to itemid was primarily viewed
as an option to improve usability.  That said, @itemid is not unsupported,
just undocumented.  If you specify an itemid it should still get parsed.

It turns out, however, that because we also accept (itemid) URLs as values
for any property that expects an object, this doesn't happen nearly as
often as the initial concern assumed, I think.  The primary place where it
might happen is when you have a "stub" of an item -- for example in a
search results page or a category browse page -- that links to the
full page for the item.  There, using Thing/url results in less incremental
markup than using @itemid (you don't have to introduce any additional
wrapping tags).
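
To make "URLs accepted where an object is expected" concrete, here is a
rough consumer-side sketch in Python.  It is only an illustration: the
helper name as_item, the dict shape (modelled on the microdata JSON form
with "type", "id" and "properties") and the choice to set both the id and
the 'url' property of the stub are my assumptions, not a description of
what any search engine actually does.

```python
def as_item(value, expected_type):
    """Normalise a property value to an item dict (hypothetical helper).

    A bare URL string becomes a stub item of the expected type whose
    identifier and 'url' property are both that URL (an assumption).
    A value that is already an item dict is returned unchanged.
    """
    if isinstance(value, str):
        return {
            "type": [expected_type],
            "id": value,
            "properties": {"url": [value]},
        }
    return value
```

With that reading, a stub on a search results page and the full item on its
own page end up in the same shape, so later processing doesn't need to care
which form the webmaster used.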

To complicate things further, I think the distinction between the canonical
URL for an item on an individual website and "sameAs" URLs that establish
identity equivalence across websites isn't really discussed.  If people are
trying to use schema.org markup in more LOD-ish use cases, this is a pretty
major shortcoming.

So to throw a strawman out there, maybe we could:

   1. State on schema.org that Thing/url is equivalent to itemid and either
   is accepted.
   2. Add Thing/sameAs for stating item equivalences (via URLs) across data
   sources/sites.
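
As a sketch of how a consumer might apply those two rules (and nothing more
than a sketch), the Python below picks an identifier for each item and then
groups identifiers linked by sameAs.  The function names, the item dict
shape (microdata JSON style "id" and "properties") and the tie-break that
@itemid wins over 'url' when both are present are all assumptions of mine;
the strawman itself doesn't say which should take precedence.

```python
from urllib.parse import urljoin


def item_identifier(item, base_url):
    """Return the absolute identifier of a parsed item, or None.

    Assumption: @itemid is used if present, otherwise the first 'url'
    property value stands in for it.
    """
    raw = item.get("id")
    if raw is None:
        urls = item.get("properties", {}).get("url", [])
        raw = urls[0] if urls else None
    return urljoin(base_url, raw) if raw is not None else None


def group_equivalent(items, base_url):
    """Group identifiers that are linked by sameAs, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for item in items:
        ident = item_identifier(item, base_url)
        if ident is None:
            continue  # anonymous item: nothing to merge on
        find(ident)  # register the identifier even without sameAs links
        for other in item.get("properties", {}).get("sameAs", []):
            union(ident, urljoin(base_url, other))

    groups = {}
    for ident in parent:
        groups.setdefault(find(ident), set()).add(ident)
    return list(groups.values())
```

Two items with the same identifier collapse into one entry automatically,
and sameAs only adds cross-site links on top of that.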


-jason


>
> Third, there are a number of properties in schema.org that look like they
> can take a URL. There are discrepancies between the schema.org ontology
> [1] and the schema.org microdata description [2] (which I believe is
> consistent with what appears on the pages themselves) and the examples.
>
>                      OWL          OWL desc       MD            example
> url                  Literal      -              URL           @href/@content
> contentURL           Literal      -              URL           @src/@content
> embedURL             Literal      -              URL           -
> audio                AudioObject  object or URL  AudioObject   @href
> video                VideoObject  object or URL  VideoObject   VideoObject
> image                URL          -              URL           @href/@src
> acceptsReservations  Literal      Yes/No or URL  Text/URL      -
> menu                 Literal      menu or URL    Text/URL      -
> breadcrumb           Literal      set of links   Text          HTML
> maps                 Literal      URL            URL           -
> significantLinks     URL          -              URL           -
> discussionUrl        -            -              URL           -
> publishingPrinciples -            -              URL           -
> thumbnailUrl         -            -              URL           -
> replyToUrl           -            -              URL           -
>
> The OWL ontology is clearly out of date with the current spec, so perhaps
> it's just best to ignore what it says.
>
> The last four ('discussionUrl', 'publishingPrinciples', 'thumbnailUrl'
> and 'replyToUrl') have obviously been added recently; they follow a
> different naming scheme (*Url rather than *URL). These, along with 'maps'
> and 'significantLinks', are all described as links to another document;
> there aren't examples of any of them on the site.
>
> 'breadcrumb' is described as "A set of links that can help a user
> understand and navigate a website hierarchy." The sole example of it is:
>
>  <div itemprop="breadcrumb">
>    <a href="category/books.html">Books</a> >
>    <a href="category/books-literature.html">Literature & Fiction</a> >
>    <a href="category/books-classics">Classics</a>
>  </div>
>
> Microdata processing dictates that the value of the 'breadcrumb' property
> in this case is (normalising whitespace for brevity) "Books Literature &
> Fiction Classics". The HTML content of this property isn't preserved by
> microdata processing, so it isn't actually a set of links but rather a
> textual description of the context of the page.
>
> Perhaps it would be better for this property to be called 'breadcrumbs'
> (pluralised to indicate the expectation that there will be more than one
> value) which could be given a type of URL, in which case the example would
> be rewritten as:
>
>  <div>
>    <a itemprop="breadcrumbs" href="category/books.html">Books</a> >
>    <a itemprop="breadcrumbs"
>       href="category/books-literature.html">Literature & Fiction</a> >
>    <a itemprop="breadcrumbs" href="category/books-classics">Classics</a>
>  </div>
>
> resulting in (assuming a base URI of http://books.example.org) the
> 'breadcrumbs' property having the values:
>
>  ["http://books.example.org/category/books.html",
>   "http://books.example.org/category/books-literature.html",
>   "http://books.example.org/category/books-classics"]
>
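
The difference between the two breadcrumb markups can be reproduced with a
small Python sketch.  This is a deliberately simplified reading of microdata
property values, not a conformant parser: it ignores itemscope nesting and
itemref, assumes well-nested markup, and collapses whitespace in text values
only for readability.  The names PropertyCollector and
extract_property_values are made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# URL property elements and the attribute that holds their URL (simplified).
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}
# Elements with no end tag, so they never change the nesting depth.
VOID_ELEMENTS = {"area", "embed", "img", "link", "meta", "source", "track"}


class PropertyCollector(HTMLParser):
    """Collect flat property values; itemscope nesting is ignored."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.values = {}    # property name -> list of values
        self.pending = []   # open text-valued properties: (names, depth, parts)
        self.depth = 0

    def _add(self, names, value):
        for name in names:
            self.values.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_ELEMENTS:
            self.depth += 1
        names = (attrs.get("itemprop") or "").split()
        if not names:
            return
        if tag in URL_ATTRIBUTE:
            # URL property element: resolve the attribute against the base.
            raw = attrs.get(URL_ATTRIBUTE[tag])
            self._add(names, urljoin(self.base_url, raw) if raw else "")
        elif tag == "meta":
            # content is taken verbatim; it is not resolved as a URL.
            self._add(names, attrs.get("content") or "")
        else:
            # Any other element: the value is its text content.
            self.pending.append((names, self.depth, []))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.pending and self.pending[-1][1] == self.depth:
            names, _, parts = self.pending.pop()
            self._add(names, " ".join("".join(parts).split()))
        self.depth -= 1

    def handle_data(self, data):
        # Text belongs to every text-valued property that is still open.
        for _, _, parts in self.pending:
            parts.append(data)


def extract_property_values(markup, base_url):
    collector = PropertyCollector(base_url)
    collector.feed(markup)
    collector.close()
    return collector.values
```

Run against the original example with a base of http://books.example.org,
'breadcrumb' comes out as a single string built from the link text (plus
whatever separator characters sit between the links), with the href values
lost.  Run against the rewritten example, 'breadcrumbs' comes out as three
absolute URLs.
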
> Alternatively, it might be that you do want to retain the HTML content of
> this property, in which case it would be good to make a comment on the bug
> [3] about supporting structured HTML content for microdata values, citing
> this use case. Let me know if you want me to do that on your behalf.
>
> There are discrepancies in the 'audio', 'video' and 'image' properties.
> All three have within schema.org a related object type (AudioObject,
> VideoObject, ImageObject), but only 'audio' and 'video' are defined to take
> object values, while 'image' takes a URL. But then, in the examples, the
> 'audio' property takes a URL rather than an AudioObject. This makes me
> think that schema.org might allow a property that takes an object to be
> given a URL instead, in which case perhaps it's treated the same as
> providing an object whose 'url' property is that URL? So for example:
>
>  <a href="foo-fighters-rope-play.html" itemprop="audio">Play</a>
>
> is equivalent to:
>
>  <span itemprop="audio" itemscope itemtype="http://schema.org/AudioObject">
>    <a href="foo-fighters-rope-play.html" itemprop="url">Play</a>
>  </span>
>
> Is that the case?
>
> Finally, there are a few examples where the @content attribute is being
> used instead of the @href or @src attribute to provide a URL, for example
> the 'url' property in:
>
>  <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
>    <span itemprop="name">Rope</span>
>    <meta itemprop="url" content="foo-fighters-rope.html">
>    ...
>  </div>
>
> This isn't conformant with the microdata spec, which states:
>
>  The URL property elements are the a, area, audio, embed, iframe, img,
>  link, object, source, track, and video elements.
>
>  If a property's value, as defined by the property's definition, is an
>  absolute URL, the property must be specified using a URL property
>  element.
>
> I imagine that these are bugs in the documentation rather than anything
> more, in which case it would be good to correct them if possible. On the
> other hand, if schema.org URL properties are meant to be resolved even
> when they don't appear in a URL property element, then it would be good to
> have that documented.
>
> Thanks, and sorry for the rather long email!
>
> Cheers,
>
> Jeni
>
> P.S. These questions arose from mails by Jayson Lorenzen [4] about RDF
> generated from schema.org microdata
>
> [1] http://schema.org/docs/schemaorg.owl
> [2] http://schema.org/docs/full_md.html
> [3] http://dev.w3.org/html5/md/Overview.html#url-property-elements
> [4] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0197.html
> --
> Jeni Tennison
> http://www.jenitennison.com
>
>
>
Received on Friday, 4 November 2011 19:37:54 GMT
