Re: working of schema.org/WebPage

From: Jarno van Driel <jarno@quantumspork.nl>
Date: Thu, 17 Apr 2014 20:44:26 +0200
Message-ID: <CAFQgrbZWEtwJhNuaY04kHK5wKZ7QDZfMC958aqkOfiWJKCB0rA@mail.gmail.com>
To: Jason Douglas <jasondouglas@google.com>
Cc: Public Vocabs <public-vocabs@w3.org>
"Let's fix it!"
You said it, hehe.

The simplest way I can come up with is by having a 'mainType' (or
mainContent, mainEntity, mainItem, etc) property with an expected type of
Thing, with which one can express:
WebPage > mainType > Product
CollectionPage > mainType > ItemList > itemListElement > ImageObject
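
A minimal sketch of the first case in microdata, assuming a hypothetical
property named 'mainType' (it is not part of schema.org; the name is only
one of the candidates floated above):

```html
<body itemscope itemtype="http://schema.org/WebPage">
  <span itemprop="name">Page title</span>

  <!-- 'mainType' is a hypothetical property, used only to illustrate the proposal -->
  <div itemprop="mainType" itemscope itemtype="http://schema.org/Product">
    <span itemprop="name">Product name</span>
    <!-- Product properties -->
  </div>
</body>
```

With something like this, the Product would be chained to the WebPage, so
only one top-level entity is left. The CollectionPage case would presumably
nest the same way, with the ItemList as the value of the property.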

The question then becomes: what to do with a type like WebPageElement (and its
subClasses)?
Do we connect entities it contains with @about, @mentions, or a new
property like @hasType or @hasPart?

Or how do we connect (subClasses of) WebPageElement to WebPage?


On Thu, Apr 17, 2014 at 8:25 PM, Jason Douglas <jasondouglas@google.com> wrote:

> Yup, that's messed up.  Let's fix it!
>
>
> On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel <jarno@quantumspork.nl>
> wrote:
>
>> "...if a relation is declared without an explicit subject, then the
>> subject will be assumed to be the current WebPage."
>> Got it.
>>
>> "It is legal for there to be multiple top-level entities." + "Current
>> clients make up their own heuristics for this..."
>> Brainfreeze!
>> How am I, as a developer, to deal with this? Does this mean I have to
>> somehow figure out which heuristics every parser/search engine uses, to be
>> able to have control or do I need to try to chain everything together such
>> that only one top-level entity is left?
>>
>> And how would I do this for a category page on, for example, an eCommerce
>> site, which shows a range of Product entities on a CollectionPage that
>> together form the main content, and where the CollectionPage, for lack of a
>> better term, only functions as a 'wrapper' for the list of products?
>>
>> "we probably do need a mechanism for indicating the "primary entity" of
>> a webpage when there is one..."
>> One of the reasons why I asked my questions is that I encounter quite a
>> lot of markup on websites where people use @mainContentOfPage on entities
>> like Product. Now @mainContentOfPage has the expected type WebPageElement,
>> but many aren't aware of this. And since there is no property to indicate
>> which entity is the primary one, I can completely understand why they
>> try to resolve it like this. And frankly, I'm confused as well.
>>
>>
>>
>> On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas <jasondouglas@google.com> wrote:
>>
>>> It is legal for there to be multiple top-level entities.  That
>>> description of WebPage is not meant to imply anything about the
>>> relationship of those top-level objects... all that is saying is that if a
>>> relation is declared without an explicit subject, then the subject will be
>>> assumed to be the current WebPage.
>>>
>>> That said, we probably do need a mechanism for indicating the "primary
>>> entity" of a webpage when there is one.  Current clients make up their own
>>> heuristics for this, but I think it would be better to have an explicit way
>>> of stating that.
>>>
>>> -jason
>>>
>>>
>>> On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel <
>>> jarno@quantumspork.nl> wrote:
>>>
>>>> I'm trying to understand semantic mechanisms better but am a bit
>>>> confused about schema.org/WebPage and I'd like to know how it works.
>>>>
>>>> Now it could well be that I misunderstand certain terminology, so
>>>> please bear with me and be so kind as to correct me when needed.
>>>>
>>>> 1] The description of http://schema.org/WebPage says:
>>>> "Every web page is implicitly assumed to be declared to be of type
>>>> WebPage, so the various properties about that webpage, such as breadcrumb
>>>> may be used. We recommend explicit declaration if these properties are
>>>> specified, but if they are found outside of an itemscope, they will be
>>>> assumed to be about the page."
>>>>
>>>> code example:
>>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>>   <!-- Content -->
>>>> </body>
>>>>
>>>> Now if the WebPage is the only entity, is it then considered to be the
>>>> 'Subject', the 'Object' or both?
>>>>
>>>> 2] If the WebPage contains an entity, let's say a Product, without
>>>> specifying a property on the Product and I check this with Google's SDTT, I
>>>> see 2 'root' entities, since there is no property to chain the two
>>>> together. Yet I get the impression the Product gets treated as the
>>>> 'Object', since it's the Product that gets used for Rich snippet
>>>> extraction, and that therefore the WebPage is the 'Subject' :
>>>>
>>>> code example:
>>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>>   <span itemprop="name">Page title</span>
>>>>
>>>>   <div itemscope itemtype="http://schema.org/Product">
>>>>     <span itemprop="name">Product name</span>
>>>>     <!-- Product properties -->
>>>>   </div>
>>>> </body>
>>>>
>>>> Now since "Every web page is implicitly assumed to be declared to be of
>>>> type WebPage" I was wondering if there also is a property that is
>>>> 'implicitly assumed to be declared' (something like @contains) on the first
>>>> entity that comes after it, like Product in this case, which indicates that
>>>> the Product is the 'Object'?
>>>>
>>>> And if not, then how does a parser 'know' which of the entities is the
>>>> 'Subject' and which is the 'Object', shouldn't there be a predicate for
>>>> this?
>>>>
>>>> 3] When a WebPage contains a bunch of 'root' entities, how does a
>>>> parser make sense of this, does the DOM have anything to do with this?
>>>>
>>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>>   <span itemprop="name">Page title</span>
>>>>
>>>>   <div itemscope itemtype="http://schema.org/Product">
>>>>     <span itemprop="name">Product 1 name</span>
>>>>     <!-- Product properties -->
>>>>   </div>
>>>>
>>>>   <div itemscope itemtype="http://schema.org/Product">
>>>>     <span itemprop="name">Product 2 name</span>
>>>>     <!-- Product properties -->
>>>>   </div>
>>>>
>>>>   <div itemscope itemtype="http://schema.org/LocalBusiness">
>>>>     <span itemprop="name">Business name</span>
>>>>     <!-- LocalBusiness properties -->
>>>>   </div>
>>>> </body>
>>>>
>>>> Now the above could be full of misunderstandings because I still lack
>>>> theoretical knowledge, but that's exactly the thing I'm hoping to
>>>> change. Who can enlighten me?
>>>>
Received on Thursday, 17 April 2014 18:44:54 UTC