- From: Jayson Lorenzen <Jayson.Lorenzen@businesswire.com>
- Date: Mon, 24 Oct 2011 08:34:03 -0700
- To: "Philip Jägenstedt" <philipj@opera.com>, <public-html-data-tf@w3.org>
Couple more apologies. First, sorry to go on and on about this, with long
examples and test output and ranting again. Second, I think what I meant to
say before was not about Vocabulary but maybe .... just "Different parsers can
see the same thing very differently" and maybe the recommendations from this
group should steer newbies in the direction that causes the least confusion.

Now, onto more rant :)

In reply to Philip, actually there was a copy paste mistake (extra closing
div) in the HTML I had used as examples before, as well as the odd way the <a>
was added, as you pointed out. I found that, indeed, testing with one more
parser (any23) the outcome was wrong. HOWEVER, a little change in the V1 HTML
and it works in both parsers, I mean produces the very same RDF, but with
really different HTML. The Rich Snip Test Tool, however, follows the HTML and
produces different output.

See the two sets of examples below. These are labeled V1 and V2, and each set
includes the HTML, extracts from two RDF distillers and the Rich Snip Test
tool.

************************** V1 **************************

<body>
  <div itemscope="itemscope" itemtype="http://schema.org/Organization"
       itemid="http://businesswire.com">
    <meta itemprop="name" content="Business Wire"/>
  </div>
  <div itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
       itemid="http://www.example.com/news/20110415123/">
    <div itemprop="articleBody">
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
      in a year Thursday....
    </div>
    <a itemprop="copyrightHolder" href="http://www.businesswire.com"> </a>
  </div>
</body>

********************* any23 ******************

<http://businesswire.com> a <http://schema.org/Organization> ;
    <http://schema.org/Organization/name> "Business Wire" .

<http://www.example.com/news/20110415123/> a <http://schema.org/NewsArticle> ;
    <http://schema.org/NewsArticle/copyrightHolder> <http://www.businesswire.com> ;
    <http://schema.org/NewsArticle/articleBody> """
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
      in a year Thursday....
    """ .

********************* greggKellogg *********

<http://businesswire.com> a schema:Organization;
    schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
    schema:articleBody """
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
      in a year Thursday....
    """;
    schema:copyrightHolder <http://www.businesswire.com> .

********************* rich snip tool *********

Item
  Id:http://businesswire.com
  Type: http://schema.org/organization
  name = Business Wire

Item
  Id:http://www.example.com/news/20110415123/
  Type: http://schema.org/newsarticle
  articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level in a year Thursday....
  copyrightholder = http://www.businesswire.com/

************************** V2 **************************

<body itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
      itemid="http://www.example.com/news/20110415123/">
  <div itemprop="articleBody">
    NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
    in a year Thursday....
  </div>
  <div itemprop="copyrightHolder" itemscope="itemscope"
       itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
    <meta itemprop="name" content="Business Wire"/>
  </div>
</body>

********************* any23 ******************

<http://www.example.com/news/20110415123/> a <http://schema.org/NewsArticle> .

<http://businesswire.com> a <http://schema.org/Organization> ;
    <http://schema.org/Organization/name> "Business Wire" .

<http://www.example.com/news/20110415123/> <http://schema.org/NewsArticle/copyrightHolder> <http://businesswire.com> ;
    <http://schema.org/NewsArticle/articleBody> "\n NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level\n in a year Thursday....\n " .

********************* greggKellogg *********

<http://businesswire.com> a schema:Organization;
    schema:name "Business Wire" .

<http://www.example.com/news/20110415123/> a schema:NewsArticle;
    schema:articleBody """
      NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level
      in a year Thursday....
    """;
    schema:copyrightHolder <http://businesswire.com> .

********************* rich snip tool *********

Item
  Id:http://www.example.com/news/20110415123/
  Type: http://schema.org/newsarticle
  articlebody = NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level in a year Thursday....
  copyrightholder = Item( 1 )

Item 1
  Id:http://businesswire.com
  Type: http://schema.org/organization
  name = Business Wire

Jayson Lorenzen
Senior Software Engineer
____________________________
B U S I N E S S W I R E
A Berkshire Hathaway Company
+1.415.986.4422, ext. 766
+1.415.956.2609 (fax)
www.BusinessWire.com
Business Wire/San Francisco
44 Montgomery St. 39th Floor
San Francisco, CA 94104

>>> From: Philip Jägenstedt<philipj@opera.com>
To: <public-html-data-tf@w3.org>
Date: 10/24/2011 2:42 AM
Subject: Re: Microdata itemid and src / href

On Mon, 24 Oct 2011 11:26:47 +0200, Philip Jägenstedt <philipj@opera.com>
wrote:

> On Sat, 22 Oct 2011 23:04:28 +0200, Jayson Lorenzen
> <Jayson.Lorenzen@businesswire.com> wrote:
>
>> Sorry to go on and on about this but I just thought (while driving,
>> dangerous) that this situation is an interesting example of a
>> Vocabulary specific parser behaving differently than a generic parser
>> (that does not know about the Vocabulary). Here is what I mean. Using a
>> generic parser (like Mr. Kellogg's)
>
> Both of the following examples are invalid microdata and they don't
> represent the same things. Details inline.
>
>> <div itemscope="itemscope" itemtype="http://schema.org/Organization"
>> itemid="http://example.com">
>> <meta itemprop="name" content="Example"/>
>> </div>
>>
>> <a itemprop="myCompany" href="http://example.com">
>
> Validator.nu [1] will complain about the <a> element that "The itemprop
> attribute was specified, but the element is not a property of any item."
> The problem is that the <a> element is not a child of the <div>, so it's
> just ignored. Live Microdata [2] gives this JSON output:
>
> {
>   "items":[
>     {
>       "type":"http://schema.org/Organization",
>       "id":"http://example.com/",
>       "properties":{
>         "name":[
>           "Example"
>         ]
>       }
>     }
>   ]
> }
>
> (Note that there is no myCompany property.)
>
>> <div itemprop="myCompany" itemscope="itemscope"
>> itemtype="http://schema.org/Organization"
>> itemid="http://example.com">
>> <meta itemprop="name" content="Example"/>
>> </div>
>
> Validator.nu will complain that "The itemprop attribute was specified,
> but the element is not a property of any item." The issue here is that
> there is no top-level item, since the outer item has an itemprop
> attribute. Consequently, Live Microdata gives no output, simply noting
> that "No top-level items found."
>
>> Produce the exact same RDF in a generic parser, but completely
>> different results in the Google Rich Snippets Test tool.
>
> It sounds like there are bugs in the microdata parser used. Gregg, can
> you take a look at this?
>
> [1] http://validator.nu/
> [2] http://foolip.org/microdatajs/live/
> [3] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#top-level-microdata-items

Oops, the examples were expanded further down in the original mail. The JSON
representations of those are:

Version One:

{
  "items":[
    {
      "type":"http://schema.org/NewsArticle",
      "id":"http://www.example.com/news/20110415123/",
      "properties":{
        "articleBody":[
          "\n NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level\n in a year Thursday.... "
        ],
        "copyrightHolder":[
          "http://www.businesswire.com/"
        ]
      }
    },
    {
      "type":"http://schema.org/Organization",
      "id":"http://businesswire.com/",
      "properties":{
        "name":[
          "Business Wire"
        ]
      }
    }
  ]
}

Version Two:

{
  "items":[
    {
      "type":"http://schema.org/NewsArticle",
      "id":"http://www.example.com/news/20110415123/",
      "properties":{
        "articleBody":[
          "\n NEW YORK, NY--(Test News)--Stocks were mixed and bond yields were at their lowest level\n in a year Thursday.... "
        ],
        "copyrightHolder":[
          {
            "type":"http://schema.org/Organization",
            "id":"http://businesswire.com/",
            "properties":{
              "name":[
                "Business Wire"
              ]
            }
          }
        ]
      }
    }
  ]
}

Given this, it's pretty plain to see why the RDF representation is the same:
the use of itemid. I would suggest using Version Two.

I'll also take the opportunity to complain (again) that the meaning of itemid
is still not defined for schema.org, so strictly speaking using it is invalid.

--
Philip Jägenstedt
Core Developer
Opera Software
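
A minimal Python sketch of the point above about itemid: it flattens microdata
JSON, in the shape Live Microdata emits, into (subject, predicate, object)
tuples and compares the two markup styles. The helper names `item_triples` and
`document_triples`, the `rdf:type` label, and the trimmed sample data are
assumptions made for illustration and are not taken from any parser discussed
in this thread. The sketch also assumes that every item carries an itemid and
that the V1 href is written exactly like the organization's itemid (in the
mail's V1 the href carries a "www." prefix, so its object IRI differs from V2
in that one respect). Objects are kept as plain strings, with no IRI/literal
distinction.

```python
RDF_TYPE = "rdf:type"  # illustrative label only, not a full IRI


def item_triples(item, triples=None):
    """Flatten one microdata JSON item into (subject, predicate, object) tuples."""
    if triples is None:
        triples = set()
    subject = item["id"]  # assumption: every item has an itemid
    if "type" in item:
        triples.add((subject, RDF_TYPE, item["type"]))
    for name, values in item.get("properties", {}).items():
        for value in values:
            if isinstance(value, dict):
                # Nested item: link to it by id, then emit its own triples.
                triples.add((subject, name, value["id"]))
                item_triples(value, triples)
            else:
                triples.add((subject, name, value))
    return triples


def document_triples(doc):
    """Collect the triples of every top-level item in a microdata JSON document."""
    triples = set()
    for item in doc["items"]:
        item_triples(item, triples)
    return triples


ARTICLE_ID = "http://www.example.com/news/20110415123/"
ORG = {
    "type": "http://schema.org/Organization",
    "id": "http://businesswire.com/",
    "properties": {"name": ["Business Wire"]},
}

# V1 style: two top-level items, linked only by a URL that equals the itemid.
version_one = {"items": [
    {"type": "http://schema.org/NewsArticle", "id": ARTICLE_ID,
     "properties": {"copyrightHolder": ["http://businesswire.com/"]}},
    ORG,
]}

# V2 style: the organization is nested as the value of copyrightHolder.
version_two = {"items": [
    {"type": "http://schema.org/NewsArticle", "id": ARTICLE_ID,
     "properties": {"copyrightHolder": [ORG]}},
]}

print(document_triples(version_one) == document_triples(version_two))  # prints True
```

Because both the nested item and the plain URL reduce to the same identifier,
the two sets compare equal, while a tool that follows the nesting of the HTML
itself still sees two different structures.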
Received on Monday, 24 October 2011 15:35:48 UTC