Re: Flattening Microdata

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Thu, 9 Aug 2012 09:54:12 +0200
Cc: Evan Sandhaus <sandhes@nytimes.com>
Message-Id: <CA9698B8-3EE2-4EED-A760-2E6F8EB1B274@ebusiness-unibw.org>
To: Stéphane Corlosquet <scorlosquet@gmail.com>, Public Vocabs <public-vocabs@w3.org>

Hi all,

Why not simply put a block of data markup, containing just meta and link elements (and maybe additional divs), inside a div element without visible content somewhere in the *body* of the page? This is similar to the "RDFa Snippet Style" advocated in [1].

Example:

<html>
<head>
	<title>untitled</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/Organization">
  <meta itemprop="name" content="ACME Bagel Bakery Ltd">
  <div itemscope itemprop="address" itemtype="http://schema.org/PostalAddress">
    <meta itemprop="streetAddress" content="Bagel Street 1234">
    <meta itemprop="postalCode" content="12345">
    <meta itemprop="addressLocality" content="Munich, Germany">
  </div>
  <link itemprop="url" href="http://www.acme-bagels.com/">
</div>

<!-- ... regular content comes here ... -->
</body>
</html>
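
To check that the nesting in such a body-level snippet keeps the address properties out of the Organization item, a small reader can be sketched in Python. This is only an illustration under simplifying assumptions: the class name SnippetParser is hypothetical, it relies on the standard library's html.parser alone, and it understands only nested itemscope elements plus meta and link properties (no itemref, no text content).

```python
from html.parser import HTMLParser

VOID = {"meta", "link"}  # elements that never get a closing tag

class SnippetParser(HTMLParser):
    """Simplified microdata reader (hypothetical helper, not a full parser)."""

    def __init__(self):
        super().__init__()
        self.top_items = []  # items that are not a property of another item
        self.stack = []      # one entry per open element: an item dict or None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Nearest enclosing item, if any.
        current = next((i for i in reversed(self.stack) if i is not None), None)
        opened = None
        if "itemscope" in a:
            opened = {"type": a.get("itemtype"), "properties": {}}
            value = opened
        elif tag == "link":
            value = a.get("href")
        else:
            value = a.get("content")
        if "itemprop" in a and current is not None:
            current["properties"].setdefault(a["itemprop"], []).append(value)
        elif opened is not None:
            self.top_items.append(opened)
        if tag not in VOID:
            self.stack.append(opened)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

SNIPPET = """
<html><head><title>untitled</title></head>
<body>
<div itemscope itemtype="http://schema.org/Organization">
  <meta itemprop="name" content="ACME Bagel Bakery Ltd">
  <div itemscope itemprop="address" itemtype="http://schema.org/PostalAddress">
    <meta itemprop="streetAddress" content="Bagel Street 1234">
    <meta itemprop="postalCode" content="12345">
    <meta itemprop="addressLocality" content="Munich, Germany">
  </div>
  <link itemprop="url" href="http://www.acme-bagels.com/">
</div>
</body></html>
"""

parser = SnippetParser()
parser.feed(SNIPPET)
items = parser.top_items
organization = items[0]
address = organization["properties"]["address"][0]
print(sorted(organization["properties"]))
print(sorted(address["properties"]))
```

Because the address div is a real parent of its meta elements, streetAddress, postalCode and addressLocality end up only on the PostalAddress item in this sketch.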

For RDF, there is a mature tool for generating such snippets from, e.g., Turtle syntax [2]. We have started work on a similar tool for Microdata [3], but it is still far from mature; in particular, it does not yet use meta and link properly.
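
To illustrate what generating such a snippet could look like, here is a minimal Python sketch. It is not the code of the tools in [2] or [3]. The function name to_microdata, the "@type" key, and the rule that strings starting with http:// or https:// become link elements are all assumptions made for this example.

```python
from html import escape

def to_microdata(item, prop=None, indent=0):
    """Render a nested dict as a microdata block without visible content.

    Assumed input convention (hypothetical): "@type" holds the item type URL,
    dict values become nested items, URL-like strings become <link>,
    and all other strings become <meta>.
    """
    pad = "  " * indent
    prop_attr = f' itemprop="{escape(prop)}"' if prop else ""
    lines = [f'{pad}<div{prop_attr} itemscope itemtype="{escape(item["@type"])}">']
    for name, value in item.items():
        if name == "@type":
            continue
        if isinstance(value, dict):
            lines.append(to_microdata(value, name, indent + 1))
        elif value.startswith(("http://", "https://")):
            lines.append(f'{pad}  <link itemprop="{escape(name)}" href="{escape(value)}">')
        else:
            lines.append(f'{pad}  <meta itemprop="{escape(name)}" content="{escape(value)}">')
    lines.append(f"{pad}</div>")
    return "\n".join(lines)

acme = {
    "@type": "http://schema.org/Organization",
    "name": "ACME Bagel Bakery Ltd",
    "address": {
        "@type": "http://schema.org/PostalAddress",
        "streetAddress": "Bagel Street 1234",
        "postalCode": "12345",
        "addressLocality": "Munich, Germany",
    },
    "url": "http://www.acme-bagels.com/",
}

snippet = to_microdata(acme)
print(snippet)
```

The URL heuristic is deliberately crude; a real tool would decide between meta and link from the vocabulary's expected value types rather than from the string itself.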

The only obstacle is that search engines claim (!) they do not like invisible markup due to the risk of spam. In practice, however, most of them accept this modeling style if the site itself is considered trustworthy.
Also, I think the argument of "more spam in invisible content" is weak, since people can easily spam with visible content, as described in [4].


Best wishes

Martin Hepp

[1] Hepp, Martin; García, Roberto; Radinger, Andreas: RDF2RDFa: Turning RDF into Snippets for Copy-and-Paste, Technical Report TR-2009-01, 2009. PDF at http://www.heppnetz.de/files/RDF2RDFa-TR.pdf
[2] http://www.ebusiness-unibw.org/tools/rdf2rdfa/
[3] http://www.ebusiness-unibw.org/tools/rdf2microdata/
[4] http://www.cpcstrategy.com/blog/2012/04/google-rich-snippets-schema-org/

On Aug 8, 2012, at 6:10 PM, Stéphane Corlosquet wrote:

> Evan,
> 
> On Wed, Aug 8, 2012 at 11:43 AM, Sandhaus, Evan <sandhes@nytimes.com> wrote:
> Hello all!
> 
> I'm interested in 'flattening' schema.org object markup into the <head> element using <meta> elements.  In theory one should be able to use the "itemref" and "id" attributes to 'flatten' an object hierarchy into a set of metatags - but in practice this leads to unexpected results.  
> 
> For example:
> 
> Suppose we have a NewsArticle with the headline 'A Test Headline' that has a creator that is a Person that has the name 'Evan S Sandhaus' and the url 'http://sandha.us'.  Here is an example of how to flatten that out in the <head> using id and itemref:
> 
> <html itemid='the_article_id' itemscope itemtype='http://schema.org/NewsArticle'>
> 	<head>
> 		<!-- Article properties in global scope -->
> 		<meta itemprop='headline' content='A Test Headline'/>
> 		
> 		<!-- Author Properties Flattened with itemref and ids -->
> 		<meta itemprop='creator' itemscope itemtype='http://schema.org/Person' itemid='the_creator_id' itemref='author_name author_url'/>
> 		<meta id='author_name' itemprop='name' content='Evan S Sandhaus'/>		
> 		<meta id='author_url' itemprop='url' content='http://sandha.us'/>		
> 	</head>
> 	<body>
> 	</body>
> </html>
> 
> So that's the theory.
> 
> In practice, however, both the Rich Snippets Tool and the Python microdata library I'm using locally (http://pypi.python.org/pypi/microdata) insist on adding the creator-specific properties to the scope of both the creator and the NewsArticle.
> 
> Yes, and I don't see why they would not, since the meta elements for name and url are within the scope of the NewsArticle data item. Normally they would be nested inside the Person data item and so would not affect NewsArticle, but since you can't nest elements in head, this is not possible, to the best of my knowledge.
> 
> Steph.
>  
> 
> More concretely - my local tools give me this: 
> [{
>     "id": "the_article_id",
>     "properties": {
>         "creator": [{
>             "id": "the_creator_id",
>             "properties": {
>                 "name": ["Evan S Sandhaus"],
>                 "url": ["http://sandha.us"]
>             },
>             "type": "http://schema.org/Person"
>         }],
>         "headline": ["A Test Headline"],
>         "name": ["Evan S Sandhaus"],
>         "url": ["http://sandha.us"]
>     },
>     "type": "http://schema.org/NewsItem"
> }]
> 
> And the Rich Snippets tool gives me this:
> Item 
> Type: http://schema.org/newsarticle
> headline = A Test Headline 
> creator = Item( 1 ) 
> name = Evan S Sandhaus 
> url = http://sandha.us 
> Item 1 
> Type: http://schema.org/person
> name = Evan S Sandhaus 
> url = http://sandha.us 
> 
> So the question is: is this expected behavior?  If so, is there anything I could do besides this to "flatten" the markup into the <head> element?
> 
> Thanks!
> 
> ~Evan
> --
> Evan Sandhaus
> Lead Architect, Semantic Platforms
> The New York Times Company
> @kansandhaus
> 
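
The duplication reported in the quoted thread can be reproduced with a simplified Python model of how a microdata consumer collects an item's properties: it walks the item's descendants plus any elements named in itemref, and stops descending at nested items. The El class and the properties function are hypothetical names for this sketch, and cycle detection and document ordering are left out.

```python
from dataclasses import dataclass, field

@dataclass
class El:
    """Minimal stand-in for a DOM element (hypothetical, for illustration)."""
    attrs: dict
    children: list = field(default_factory=list)

def properties(item, by_id):
    """Collect the properties of `item` from its descendants and itemref targets."""
    props = {}
    refs = item.attrs.get("itemref", "").split()
    pending = list(item.children) + [by_id[ref] for ref in refs]
    while pending:
        el = pending.pop(0)
        if "itemprop" in el.attrs:
            if "itemscope" in el.attrs:
                value = properties(el, by_id)  # nested item
            else:
                value = el.attrs.get("content")
            props.setdefault(el.attrs["itemprop"], []).append(value)
        if "itemscope" not in el.attrs:
            pending.extend(el.children)  # do not descend into nested items
    return props

# The flattened <head> markup from the quoted message, as a tiny tree.
name = El({"id": "author_name", "itemprop": "name", "content": "Evan S Sandhaus"})
url = El({"id": "author_url", "itemprop": "url", "content": "http://sandha.us"})
creator = El({"itemprop": "creator", "itemscope": "",
              "itemtype": "http://schema.org/Person",
              "itemref": "author_name author_url"})
headline = El({"itemprop": "headline", "content": "A Test Headline"})
head = El({}, [headline, creator, name, url])
html = El({"itemscope": "", "itemtype": "http://schema.org/NewsArticle"}, [head, El({})])

by_id = {"author_name": name, "author_url": url}
article = properties(html, by_id)
print(sorted(article))
```

In this model the name and url meta elements are reached twice: once through the creator's itemref, and once as ordinary descendants of the html element, so they also become properties of the article.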

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Received on Thursday, 9 August 2012 07:54:43 GMT