
Re: Microdata itemid and src / href

From: Jayson Lorenzen <Jayson.Lorenzen@businesswire.com>
Date: Sat, 22 Oct 2011 14:04:28 -0700
Message-Id: <4EA2CD6B0200007E000A1E15@sfgwia1.businesswire.com>
To: <jeni@jenitennison.com>
Cc: <public-html-data-tf@w3.org>
Sorry to go on and on about this, but it just occurred to me (while driving, which is dangerous) that this situation is an interesting example of a vocabulary-specific parser behaving differently from a generic parser (one that does not know about the vocabulary). Here is what I mean. Using a generic parser (like Mr. Kellogg's), the following two snippets

  <div itemscope="itemscope"  itemtype="http://schema.org/Organization" itemid="http://example.com">
    <meta itemprop="name" content="Example"/>
  </div>

  <a itemprop="myCompany" href="http://example.com"></a>

 and 

  <div itemprop="myCompany" itemscope="itemscope"  itemtype="http://schema.org/Organization"
       itemid="http://example.com">
      <meta itemprop="name" content="Example"/>
   </div>

produce exactly the same RDF, but completely different results in the Google Rich Snippets Testing Tool.
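A minimal Python sketch of why a generic extractor cannot tell the two snippets apart: it only turns items into triples, so a URL value and a nested item carrying the same itemid both end up as the same IRI object. Assumptions: the item model is hand-built rather than parsed from HTML, the enclosing item (http://example.com/page, typed WebPage) is hypothetical because the snippets do not show it, and every property name is simply expanded under http://schema.org/, which simplifies the real microdata-to-RDF rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

SCHEMA = "http://schema.org/"  # assumption: every property name expands under schema.org
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Url:
    """A value taken from a URL property element (href/src)."""
    value: str


@dataclass(frozen=True)
class Text:
    """A value taken from a meta content attribute or element text."""
    value: str


@dataclass
class Item:
    itemid: str | None
    itemtype: str
    props: list = field(default_factory=list)  # (name, Url | Text | Item) pairs


def to_triples(top_level: list[Item]) -> set[tuple[str, str, str]]:
    """Generic, vocabulary-agnostic extraction: items in, triples out."""
    triples: set[tuple[str, str, str]] = set()
    subjects: dict[int, str] = {}
    bnodes = count()

    def emit(item: Item) -> str:
        if id(item) in subjects:
            return subjects[id(item)]
        subj = f"<{item.itemid}>" if item.itemid else f"_:b{next(bnodes)}"
        subjects[id(item)] = subj
        triples.add((subj, f"<{RDF_TYPE}>", f"<{item.itemtype}>"))
        for name, value in item.props:
            if isinstance(value, Item):
                obj = emit(value)  # nested item: link to its subject
            elif isinstance(value, Url):
                obj = f"<{value.value}>"  # URL property element: IRI
            else:
                obj = f'"{value.value}"'  # anything else: plain literal
            triples.add((subj, f"<{SCHEMA}{name}>", obj))
        return subj

    for item in top_level:
        emit(item)
    return triples


PAGE = "http://example.com/page"  # hypothetical id of the enclosing item
ORG = "http://schema.org/Organization"
WEBPAGE = "http://schema.org/WebPage"


def snippet_with_href() -> list[Item]:
    org = Item("http://example.com", ORG, [("name", Text("Example"))])
    page = Item(PAGE, WEBPAGE, [("myCompany", Url("http://example.com"))])
    return [org, page]  # both items are top-level


def snippet_with_nested_item() -> list[Item]:
    org = Item("http://example.com", ORG, [("name", Text("Example"))])
    page = Item(PAGE, WEBPAGE, [("myCompany", org)])
    return [page]  # the organization is nested under myCompany


assert to_triples(snippet_with_href()) == to_triples(snippet_with_nested_item())
```

Whatever the Rich Snippets tool does differently presumably comes from how it presents nested items versus URL values, which this sketch does not try to model.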

See my two test cases, "Version One" and "Version Two", below and their results in both parsers for more details.


*************** Version One ********************************

  <body itemscope="itemscope" itemtype="http://schema.org/NewsArticle" itemid="http://www.example.com/news/20110415123/">
      <div itemprop="articleBody">
        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
        in a year Thursday.... </div>

    <div itemscope="itemscope"  itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
      <meta itemprop="name" content="Business Wire"/>
    </div>
    
    <div>
    <a itemprop="copyrightHolder" href="http://www.businesswire.com">&#160;</a>
    </div>
  </body>

-----------------------RDF Distiller --------------------

@prefix schema: <http://schema.org/> .

<http://businesswire.com> a schema:Organization;
   schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
   schema:articleBody """
        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
        in a year Thursday.... """;
   schema:copyrightHolder <http://www.businesswire.com> .

----------------------- Rich Snip Test tool ------------------

Item 
Id:http://www.example.com/news/20110415123/
Type: http://schema.org/newsarticle
articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level in a year Thursday.... 
copyrightholder = http://www.businesswire.com/ 

Item 
Id:http://businesswire.com
Type: http://schema.org/organization
name = Business Wire 

********************************* end Version One **************************

**********************************Version Two ******************************

  <body itemscope="itemscope" itemtype="http://schema.org/NewsArticle" itemid="http://www.example.com/news/20110415123/">
      <div itemprop="articleBody">
        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
        in a year Thursday.... </div>

    <div itemprop="copyrightHolder" itemscope="itemscope"
      itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
      <meta itemprop="name" content="Business Wire"/>
    </div>

  </body>


-----------------------RDF Distiller --------------------

@prefix schema: <http://schema.org/> .

<http://businesswire.com> a schema:Organization;
   schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
   schema:articleBody """
        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
        in a year Thursday.... """;
   schema:copyrightHolder <http://businesswire.com> .


----------------------- Rich Snip Test tool ------------------

Item 
Id:http://www.example.com/news/20110415123/
Type: http://schema.org/newsarticle
articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level in a year Thursday.... 
copyrightholder = Item( 1 ) 

Item 1 
Id:http://businesswire.com
Type: http://schema.org/organization
name = Business Wire 


********************************* end Version Two **************************
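One way to picture the difference is a consumer that, unlike the generic extractor, keeps nested items separate from URL values and only treats a URL as pointing at another item on the page when it matches that item's itemid exactly. The Python sketch below is hypothetical; it is not a description of how the Rich Snippets tool actually works. It also makes visible a detail of Version One as posted: the href is http://www.businesswire.com while the Organization's itemid is http://businesswire.com, so the two IRIs differ, just as the RDF Distiller output for Version One has copyrightHolder pointing at a different node from the Organization.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    itemid: str
    itemtype: str
    props: dict = field(default_factory=dict)  # name -> str (URL or text) or Item


def describe(item: Item, prop: str, page_items: list[Item]) -> str:
    """Hypothetical vocabulary-aware view of one property value."""
    value = item.props[prop]
    if isinstance(value, Item):
        return f"nested item {value.itemid}"
    # Exact string match only: no "www." or trailing-slash normalization.
    by_id = {other.itemid: other for other in page_items}
    if value in by_id:
        return f"link to page item {value}"
    return f"plain URL {value}"


ARTICLE_ID = "http://www.example.com/news/20110415123/"
ARTICLE = "http://schema.org/NewsArticle"
ORG = "http://schema.org/Organization"
org = Item("http://businesswire.com", ORG, {"name": "Business Wire"})

# Version One as posted: separate Organization item, href with "www."
v1 = Item(ARTICLE_ID, ARTICLE, {"copyrightHolder": "http://www.businesswire.com"})
# Version Two: the Organization is nested under copyrightHolder
v2 = Item(ARTICLE_ID, ARTICLE, {"copyrightHolder": org})
# Version One with the href changed to match the itemid exactly
v1_matching = Item(ARTICLE_ID, ARTICLE, {"copyrightHolder": "http://businesswire.com"})

print(describe(v1, "copyrightHolder", [v1, org]))
# plain URL http://www.businesswire.com
print(describe(v2, "copyrightHolder", [v2]))
# nested item http://businesswire.com
print(describe(v1_matching, "copyrightHolder", [v1_matching, org]))
# link to page item http://businesswire.com
```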






Jayson Lorenzen
Senior Software Engineer
____________________________ 
B  U  S  I  N  E  S  S       W  I  R  E 
A Berkshire Hathaway Company
 
+1.415.986.4422, ext. 766 
+1.415.956.2609 (fax) 
www.BusinessWire.com
 
Business Wire/San Francisco 
44 Montgomery St. 39th Floor
San Francisco, CA 94104

>>> Jeni Tennison <jeni@jenitennison.com> 10/21/11 12:30 PM >>>
Jayson,

Thanks for writing about this, and particularly for the examples. I'm going to snip them out for brevity...

On 21 Oct 2011, at 18:48, Jayson Lorenzen wrote:
> When using the *src* or *href*, RDF distillers create a
> relation, and the resulting RDF (at least in Turtle) can look like an
> endless loop waiting to happen, or just an odd relation. Here is an
> endless loop example:

In RDF, it is absolutely fine to have a statement like:

  <http://businesswire.com> schema:url <http://businesswire.com> .

It's like an object having a property whose value is a pointer to that same object. The only endless loop would come if an application that traversed the data didn't account for potential loops, which is more to do with the application than the data.
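As an illustration of the application accounting for loops, here is a minimal Python sketch of a breadth-first walk over triples that keeps a visited set, so the self-referencing schema:url statement is followed once and the walk still terminates. The triples are plain string tuples, and the schema:name triple is added only for illustration.

```python
from collections import defaultdict, deque


def reachable(triples, start):
    """Return the nodes reachable from start, in breadth-first order.

    The visited set is what keeps a self-reference such as
    <http://businesswire.com> schema:url <http://businesswire.com>
    from turning into an endless loop.
    """
    edges = defaultdict(list)
    for subj, _pred, obj in triples:
        edges[subj].append(obj)
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in edges[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order


triples = [
    ("http://businesswire.com", "schema:url", "http://businesswire.com"),
    ("http://businesswire.com", "schema:name", '"Business Wire"'),
]
print(reachable(triples, "http://businesswire.com"))
# ['http://businesswire.com', '"Business Wire"']
```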

> Changing from using <a> , to a hidden <meta> tag 
[snip]
> produces a URL property that is just a text string. 
[snip]
> For properties that require a URL (like the contentURL from
> Schema.org), which is correct?

The microdata spec [1] says:

  "If a property's value, as defined by the property's definition, 
   is an absolute URL, the property must be specified using a URL 
   property element."
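A small Python sketch of how a checker might enforce this rule. The element-to-attribute table is based on the microdata spec's URL property elements (see [1] for the authoritative list); the set of URL-valued property names (url, contentURL, thumbnailUrl, image) is an assumption about the schema.org vocabulary made for this example, not something the microdata spec defines.

```python
from __future__ import annotations

# URL property elements and the attribute each takes its value from
# (based on the microdata spec; see [1] for the authoritative list).
URL_ATTR = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}

# Assumption for this example: schema.org properties whose value is a URL.
URL_VALUED = {"url", "contentURL", "thumbnailUrl", "image"}


def check(prop: str, tag: str) -> str | None:
    """Return a warning if a URL-valued property sits on the wrong element."""
    tag = tag.lower()
    if prop in URL_VALUED and tag not in URL_ATTR:
        return (f'itemprop="{prop}" on <{tag}>: use a URL property element '
                "such as <a href> or <link href>")
    return None


print(check("url", "meta"))   # warning: meta is not a URL property element
print(check("url", "a"))      # None
print(check("name", "meta"))  # None: name is not URL-valued
```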

The reason for this constraint is that values within the href or src attribute will be resolved into absolute URLs, whereas those that are put in a content attribute or just embedded within the content of the page will not. So if you had (assuming we're on the businesswire.com site):

  <a itemprop="url" href="/">
    <img itemprop="image"
         src="/images/Powered-by-Business-Wire.gif" />
  </a>

then the microdata would include:

  url:   http://www.businesswire.com/
  image: http://www.businesswire.com/images/Powered-by-Business-Wire.gif

whereas if you use a meta element as in:

  <a href="/">
    <meta itemprop="url" content="/" />
    <img itemprop="image"
         src="/images/Powered-by-Business-Wire.gif" />
  </a>

then the properties would be:

  url:   /
  image: http://www.businesswire.com/images/Powered-by-Business-Wire.gif
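The difference between the two sets of values comes down to whether the attribute is resolved against the page's base URL. A minimal Python sketch using urllib.parse.urljoin for that resolution; the page address http://www.businesswire.com/news/home/ is hypothetical, and only a, link, img and meta are handled. Note that resolving "/" yields the site root with a trailing slash.

```python
from __future__ import annotations

from urllib.parse import urljoin

# Hypothetical address of the page the markup appears on.
BASE = "http://www.businesswire.com/news/home/"

# A few URL property elements and the attribute holding their URL.
URL_ATTR = {"a": "href", "link": "href", "img": "src"}


def property_value(tag: str, attrs: dict[str, str], base: str = BASE) -> str:
    """Microdata value of an itemprop element (a small subset of the rules)."""
    if tag in URL_ATTR:
        # URL property element: resolve the attribute against the base URL.
        return urljoin(base, attrs[URL_ATTR[tag]])
    if tag == "meta":
        # meta: the content attribute is taken as-is, never resolved.
        return attrs["content"]
    raise ValueError(f"<{tag}> is not handled in this sketch")


print(property_value("a", {"href": "/"}))
# http://www.businesswire.com/
print(property_value("img", {"src": "/images/Powered-by-Business-Wire.gif"}))
# http://www.businesswire.com/images/Powered-by-Business-Wire.gif
print(property_value("meta", {"content": "/"}))
# /
```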

The RDF generation is another step on top of that. If the value comes from a URL property element, it becomes a reference to a resource, which is how we usually treat URLs in RDF; otherwise it becomes a plain literal string.

So, assuming that schema.org means the url property to hold an absolute URL, the right thing to do is to use the href attribute, not a meta element.
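The RDF step described above can be sketched the same way: a value from a URL property element becomes an IRI reference, anything else a plain literal. This is a simplified Python sketch producing N-Triples-style object terms; it ignores language tags, datatypes and the other details of the full microdata-to-RDF mapping.

```python
def rdf_object(value: str, from_url_element: bool) -> str:
    """Turn a microdata property value into an N-Triples-style object term."""
    if from_url_element:
        return f"<{value}>"  # reference to a resource
    # Plain literal: escape backslashes first, then quotes and newlines.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


# href="/" already resolved against the page: an IRI
print(rdf_object("http://www.businesswire.com/", True))  # <http://www.businesswire.com/>
# meta content="/": just a string
print(rdf_object("/", False))  # "/"
```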

> Other examples of this are with Schema.org ImageObjects, where the
> *itemid* is the same as the contentURL (interestingly URL is
> upper case for this property but camel case for thumbnailUrl :)

I think that these are things to take up on the public-vocabs@w3.org mailing list. I don't know what the relationship between the microdata itemid and the schema.org url or contentURL properties is supposed to be.

> I imagine I, or other implementors of Microdata who are new to RDF, are
> just using the *itemid* and/or *href*/*src* wrongly for these cases,
> but if a guide is produced to help them/me/us, and it had an explanation
> of how to do this correctly, it would be a big help.


Does the above help? Would you like to add your example to the wiki, perhaps at [2]?

It would also be great to hear about how you're using microdata, and particularly why you're looking at the RDF that's extracted from it.

Thanks,

Jeni

[1] http://dev.w3.org/html5/md/Overview.html#url-property-elements
[2] http://www.w3.org/wiki/Mapping_Microdata_to_RDF
-- 
Jeni Tennison
http://www.jenitennison.com




Received on Saturday, 22 October 2011 21:09:00 GMT
