
Re: Microdata itemid and src / href

From: Jayson Lorenzen <Jayson.Lorenzen@businesswire.com>
Date: Mon, 24 Oct 2011 08:34:03 -0700
Message-Id: <4EA522FB0200007E000A1EFB@sfgwia1.businesswire.com>
To: "Philip Jägenstedt" <philipj@opera.com>, <public-html-data-tf@w3.org>
A couple more apologies. First, sorry to go on and on about this, with
long examples and test output and ranting again. Second, I think what I
meant to say before was not about vocabularies but maybe just
"different parsers can see the same thing very differently", and maybe
the recommendations from this group should steer newbies in the
direction that causes the least confusion. Now, on to more rant :)

In reply to Philip: there was indeed a copy-paste mistake (an extra
closing div) in the HTML I used as examples before, as well as the
odd way the <a> was added, as you pointed out. Testing with one more
parser (any23), I found that the outcome was indeed wrong. HOWEVER,
with a little change to the V1 HTML it works in both parsers; I mean
both produce the very same RDF from really different HTML. The Rich
Snippets Test Tool, however, follows the HTML and produces different
output. See the two sets of examples below. They are labeled V1 and V2,
and each set includes the HTML, the output from two RDF distillers, and
the output from the Rich Snippets Test Tool.



************************** V1 **************************

    <body>
        <div itemscope="itemscope"
            itemtype="http://schema.org/Organization"
            itemid="http://businesswire.com">
            <meta itemprop="name" content="Business Wire"/>
        </div>

        <div itemscope="itemscope"
            itemtype="http://schema.org/NewsArticle"
            itemid="http://www.example.com/news/20110415123/">
            <div itemprop="articleBody"> NEW YORK, NY--(Test News)--Stocks were mixed and bond
                yields were at their lowest level in a year Thursday.... </div>
            <a itemprop="copyrightHolder"
                href="http://www.businesswire.com">&#160;</a>
        </div>
    </body>


********************* any23 ******************

<http://businesswire.com> a <http://schema.org/Organization> ;
	<http://schema.org/Organization/name> "Business Wire" .

<http://www.example.com/news/20110415123/> a
<http://schema.org/NewsArticle> ;
	<http://schema.org/NewsArticle/copyrightHolder>
<http://www.businesswire.com> ;
	<http://schema.org/NewsArticle/articleBody> """ NEW YORK,
NY--(Test News)--Stocks were mixed and bond
                yields were at their lowest level in a year
Thursday.... """ .

********************* greggKellogg *********

<http://businesswire.com> a schema:Organization;
   schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
   schema:articleBody """ NEW YORK, NY--(Test News)--Stocks were mixed
and bond
                yields were at their lowest level in a year
Thursday.... """;
   schema:copyrightHolder <http://www.businesswire.com> .

********************* rich snip tool *********
Item 
Id:http://businesswire.com
Type: http://schema.org/organization
name = Business Wire 

Item 
Id:http://www.example.com/news/20110415123/
Type: http://schema.org/newsarticle
articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond
yields were at their lowest level in a year Thursday.... 
copyrightholder = http://www.businesswire.com/ 


************************** V2 **************************

  <body itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
        itemid="http://www.example.com/news/20110415123/">
    <div itemprop="articleBody">
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
      in a year Thursday....
    </div>

    <div itemprop="copyrightHolder" itemscope="itemscope"
         itemtype="http://schema.org/Organization"
         itemid="http://businesswire.com">
      <meta itemprop="name" content="Business Wire"/>
    </div>

  </body>

********************* any23 ******************

<http://www.example.com/news/20110415123/> a
<http://schema.org/NewsArticle> .

<http://businesswire.com> a <http://schema.org/Organization> ;
	<http://schema.org/Organization/name> "Business Wire" .

<http://www.example.com/news/20110415123/>
<http://schema.org/NewsArticle/copyrightHolder>
<http://businesswire.com> ;
	<http://schema.org/NewsArticle/articleBody> "\n      NEW YORK,
NY--(Test News)--Stocks were mixed and bond yields were at their lowest
level\n      in a year Thursday....\n    " .


********************* greggKellogg *********

<http://businesswire.com> a schema:Organization;
   schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
   schema:articleBody """
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were
at their lowest level
      in a year Thursday....
    """;
   schema:copyrightHolder <http://businesswire.com> .

********************* rich snip tool *********

Item 
Id:http://www.example.com/news/20110415123/
Type: http://schema.org/newsarticle
articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond
yields were at their lowest level in a year Thursday.... 
copyrightholder = Item( 1 ) 

Item 1 
Id:http://businesswire.com
Type: http://schema.org/organization
name = Business Wire 
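
The structural difference between V1 and V2 largely comes down to which
elements count as top-level items, that is, elements with itemscope but
no itemprop. Here is a minimal Python sketch of that check, using only
the standard library's html.parser. The class and function names are
made up for illustration, the markup is abbreviated from the examples
above, and this is not a claim about how any of the three tools works.

```python
from html.parser import HTMLParser


class TopLevelItems(HTMLParser):
    """Collect the itemid of every element that has itemscope but no itemprop."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An item that is itself a property of another item is not top-level.
        if "itemscope" in attrs and "itemprop" not in attrs:
            self.found.append(attrs.get("itemid"))


def top_level_items(html):
    parser = TopLevelItems()
    parser.feed(html)
    parser.close()
    return parser.found


# Abbreviated V1: two sibling items, linked only through the href value.
V1 = """
<div itemscope="itemscope" itemtype="http://schema.org/Organization"
     itemid="http://businesswire.com">
  <meta itemprop="name" content="Business Wire"/>
</div>
<div itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
     itemid="http://www.example.com/news/20110415123/">
  <a itemprop="copyrightHolder" href="http://www.businesswire.com">&#160;</a>
</div>
"""

# Abbreviated V2: the Organization item is nested as the property value.
V2 = """
<body itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
      itemid="http://www.example.com/news/20110415123/">
  <div itemprop="copyrightHolder" itemscope="itemscope"
       itemtype="http://schema.org/Organization"
       itemid="http://businesswire.com">
    <meta itemprop="name" content="Business Wire"/>
  </div>
</body>
"""

print(top_level_items(V1))
print(top_level_items(V2))
```

V1 yields two top-level items and V2 only one, which matches the shape
of the Rich Snippets Test Tool output above (two separate items versus
one item containing "Item( 1 )").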

Jayson Lorenzen
Senior Software Engineer
____________________________ 
B  U  S  I  N  E  S  S       W  I  R  E 
A Berkshire Hathaway Company
 
+1.415.986.4422, ext. 766 
+1.415.956.2609 (fax) 
www.BusinessWire.com
 
Business Wire/San Francisco 
44 Montgomery St. 39th Floor
San Francisco, CA 94104



>>> 
From: 	Philip Jägenstedt<philipj@opera.com>
To:	<public-html-data-tf@w3.org>
Date: 	10/24/2011 2:42 AM
Subject: 	Re: Microdata itemid and src / href

On Mon, 24 Oct 2011 11:26:47 +0200, Philip Jägenstedt
<philipj@opera.com> wrote:

> On Sat, 22 Oct 2011 23:04:28 +0200, Jayson Lorenzen  
> <Jayson.Lorenzen@businesswire.com> wrote:
>
>> Sorry to go on and on about this but I just thought (while driving,
>> dangerous) that this situation is an interesting example of a
>> Vocabulary specific parser behaving differently than a generic parser
>> (that does not know about the Vocabulary). Here is what I mean. Using a
>> generic parser (like Mr. Kellogg's)
>
> Both of the following examples are invalid microdata and they don't
> represent the same things. Details inline.
>
>>   <div itemscope="itemscope" itemtype="http://schema.org/Organization"
>>        itemid="http://example.com">
>>     <meta itemprop="name" content="Example"/>
>>   </div>
>>
>>    <a itemprop="myCompany" href="http://example.com">
>
> Validator.nu [1] will complain about the <a> element that "The itemprop
> attribute was specified, but the element is not a property of any item."
> The problem is that the <a> element is not a child of the <div>, so it's
> just ignored. Live Microdata [2] gives this JSON output:
>
> {
>    "items":[
>      {
>        "type":"http://schema.org/Organization",
>        "id":"http://example.com/",
>        "properties":{
>          "name":[
>            "Example"
>          ]
>        }
>      }
>    ]
> }
>
> (Note that there is no myCompany property.)
>
>>   <div itemprop="myCompany" itemscope="itemscope"   
>> itemtype="http://schema.org/Organization"
>>        itemid="http://example.com">
>>       <meta itemprop="name" content="Example"/>
>>    </div>
>
> Validator.nu will complain that "The itemprop attribute was specified,
> but the element is not a property of any item." The issue here is that
> there is no top-level item, since the outer item has an itemprop
> attribute. Consequently, Live Microdata gives no output, simply noting
> that "No top-level items found."
>
>> Produce the exact same RDF in a generic parser, but completely  
>> different results in the Google Rich Snippets Test tool.
>
> It sounds like there are bugs in the microdata parser used. Gregg, can
> you take a look at this?
>
> [1] http://validator.nu/
> [2] http://foolip.org/microdatajs/live/
> [3] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#top-level-microdata-items

Oops, the examples were expanded further down in the original mail:

The JSON representation of those are:

Version One:

{
   "items":[
     {
       "type":"http://schema.org/NewsArticle",
       "id":"http://www.example.com/news/20110415123/",
       "properties":{
         "articleBody":[
           "\n        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level\n        in a year Thursday.... "
         ],
         "copyrightHolder":[
           "http://www.businesswire.com/"
         ]
       }
     },
     {
       "type":"http://schema.org/Organization",
       "id":"http://businesswire.com/",
       "properties":{
         "name":[
           "Business Wire"
         ]
       }
     }
   ]
}

Version Two:

{
   "items":[
     {
       "type":"http://schema.org/NewsArticle",
       "id":"http://www.example.com/news/20110415123/",
       "properties":{
         "articleBody":[
           "\n        NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level\n        in a year Thursday.... "
         ],
         "copyrightHolder":[
           {
             "type":"http://schema.org/Organization",
             "id":"http://businesswire.com/",
             "properties":{
               "name":[
                 "Business Wire"
               ]
             }
           }
         ]
       }
     }
   ]
}

Given this, it's pretty plain to see why the RDF representation is the
same: the use of itemid. I would suggest using Version Two.
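
To make the itemid point concrete, here is a minimal Python sketch that
flattens the microdata JSON into (subject, predicate, object) tuples.
The flatten helper, the "type" predicate name and the blank-node labels
are assumptions made for illustration, not taken from any spec or
parser, and articleBody is left out to keep the data short. Because a
nested item with an itemid is named by that URL, nesting it or linking
to it ends up pointing at the same subject.

```python
from itertools import count


def flatten(items):
    """Flatten microdata JSON items into a set of (subject, predicate, object)."""
    triples = set()
    blank = count()

    def visit(item):
        # An item with an itemid is named by that URL; otherwise it gets a
        # throwaway blank-node label (an illustrative choice).
        subject = item.get("id") or "_:b%d" % next(blank)
        if "type" in item:
            triples.add((subject, "type", item["type"]))
        for name, values in item.get("properties", {}).items():
            for value in values:
                if isinstance(value, dict):
                    # Nested item: the object is whatever names that item.
                    triples.add((subject, name, visit(value)))
                else:
                    triples.add((subject, name, value))
        return subject

    for item in items:
        visit(item)
    return triples


ARTICLE = "http://www.example.com/news/20110415123/"
ORG = {"type": "http://schema.org/Organization",
       "id": "http://businesswire.com/",
       "properties": {"name": ["Business Wire"]}}

# Version One: two top-level items, linked only by a URL value.
v1 = [{"type": "http://schema.org/NewsArticle", "id": ARTICLE,
       "properties": {"copyrightHolder": ["http://www.businesswire.com/"]}},
      ORG]

# Version Two: the Organization item nested as the property value.
v2 = [{"type": "http://schema.org/NewsArticle", "id": ARTICLE,
       "properties": {"copyrightHolder": [ORG]}}]

# Print only the tuples that the two versions do not share.
for triple in sorted(flatten(v1) ^ flatten(v2)):
    print(triple)
```

With the data as posted, the only tuples that differ are the two
copyrightHolder values, because Version One links to
http://www.businesswire.com/ while the item's itemid is
http://businesswire.com/. If the href matched the itemid, both versions
would flatten to the same set.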

I'll also take the opportunity to complain (again) that the meaning of
itemid is still not defined for schema.org, so strictly speaking using it
is invalid.

-- 
Philip Jägenstedt
Core Developer
Opera Software




Received on Monday, 24 October 2011 15:35:48 GMT
