Re: working of schema.org/WebPage

From: Jarno van Driel <jarno@quantumspork.nl>
Date: Sun, 20 Apr 2014 03:05:16 +0200
Message-ID: <CAFQgrbZQ+TLU2qOT6OSy2mtwLGw7Lo0xtVpwq5Ss29kDpy2Row@mail.gmail.com>
To: Jocelyn Fournier <jocelyn.fournier@gmail.com>
Cc: Jason Douglas <jasondouglas@google.com>, Public Vocabs <public-vocabs@w3.org>
"all that is saying is that if a relation is declared without an explicit
subject, then the subject will be assumed to be the current WebPage"

So if the WebPage is only there to function as a sort of 'catch all', then
why does it even have subClasses?

I actually use WebPage and its subClasses in an attempt to do the right
thing, but is there a point in doing this? Does it actually matter if I
declare the page type?
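
For concreteness, a minimal sketch of the kind of markup in question, assuming a category page typed as CollectionPage (a WebPage subclass). The page title and product name are placeholders, not taken from a real site:

```html
<!-- Hypothetical category page: CollectionPage is a subclass of WebPage -->
<body itemscope itemtype="http://schema.org/CollectionPage">
  <span itemprop="name">Category title</span>

  <!-- No itemprop links this Product to the page, so it stays a separate top-level entity -->
  <div itemscope itemtype="http://schema.org/Product">
    <span itemprop="name">Product name</span>
  </div>
</body>
```

Whether a consumer treats CollectionPage any differently from a plain WebPage here is exactly the open question.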


On Thu, Apr 17, 2014 at 9:03 PM, Jocelyn Fournier <
jocelyn.fournier@gmail.com> wrote:

> On 17/04/2014 20:25, Jason Douglas wrote:
>
>  Yup, that's messed up.  Let's fix it!
>>
>
> Hi,
>
> Note that examples regarding mainContentOfPage on schema.org are also
> misleading.
> E.g. : http://schema.org/Table
> =>  <meta itemprop="mainContentOfPage" content="true"/>
>
> I would have expected
> <div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
>
>
> But I fully agree it would be much more useful to have mainContentOfPage
> as a 'Thing' rather than 'WebPageElement'
>
>   Jocelyn
>
>
>> On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel
>> <jarno@quantumspork.nl> wrote:
>>
>>     "...if a relation is declared without an explicit subject, then the
>>     subject will be assumed to be the current WebPage."
>>     Got it.
>>
>>     "It is legal for there to be multiple top-level entities." +
>>     "Current clients make up their own heuristics for this..."
>>     Brainfreeze!
>>     How am I, as a developer, to deal with this? Does this mean I have
>>     to somehow figure out which heuristics every parser/search engine
>>     uses, to be able to have control or do I need to try to chain
>>     everything together such that only one top-level entity is left?
>>
>>     And how would I do this for a category page on, for example, an
>>     eCommerce site, which shows a range of Product entities on a
>>     CollectionPage? Together these form the main content, and the
>>     CollectionPage, for lack of a better term, only functions as a
>>     'wrapper' for the list of products.
>>
>>     "we probably do need a mechanism for indicating the "primary entity"
>>     of a webpage when there is one..."
>>     One of the reasons why I asked my questions is because I encounter
>>     quite a lot of markup on websites where people use @mainContentOfPage
>>     on entities like Product. Now @mainContentOfPage has the expected
>>     type WebPageElement, but many aren't aware of this. And since there
>>     is no property to indicate which entity is the primary one, I
>>     can completely understand why they try to resolve it like this.
>>     And frankly, I'm confused as well.
>>
>>
>>
>>     On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas
>>     <jasondouglas@google.com> wrote:
>>
>>         It is legal for there to be multiple top-level entities.  That
>>         description of WebPage is not meant to imply anything about the
>>         relationship of those top-level objects... all that is saying is
>>         that if a relation is declared without an explicit subject, then
>>         the subject will be assumed to be the current WebPage.
>>
>>         That said, we probably do need a mechanism for indicating the
>>         "primary entity" of a webpage when there is one.  Current
>>         clients make up their own heuristics for this, but I think it
>>         would be better to have an explicit way of stating that.
>>
>>         -jason
>>
>>
>>         On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel
>>         <jarno@quantumspork.nl> wrote:
>>
>>             I'm trying to understand semantic mechanisms better but am a
>>             bit confused about schema.org/WebPage
>>             <http://schema.org/WebPage> and I'd like to know how it
>>             works.
>>
>>
>>             Now it could well be that I misunderstand certain
>>             terminology, so please bear with me and be so kind as to
>>             correct me where needed.
>>
>>             1] The description of http://schema.org/WebPage says:
>>             "Every web page is implicitly assumed to be declared to be
>>             of type WebPage, so the various properties about that
>>             webpage, such as breadcrumb may be used. We recommend
>>             explicit declaration if these properties are specified, but
>>             if they are found outside of an itemscope, they will be
>>             assumed to be about the page."
>>
>>             code example:
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <!-- Content -->
>>             </body>
>>
>>             Now if the WebPage is the only entity, is it then considered
>>             to be the 'Subject', the 'Object' or both?
>>
>>             2] If the WebPage contains an entity, let's say a Product,
>>             without specifying a property on the Product and I check
>>             this with Google's SDTT, I see 2 'root' entities, since
>>             there is no property to chain the two together. Yet I get
>>             the impression the Product gets treated as the 'Object',
>>             since it's the Product that gets used for Rich snippet
>>             extraction, and that therefore the WebPage is the 'Subject' :
>>
>>             code example:
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <span itemprop="name">Page title</span>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product name</span>
>>                  <!-- Product properties -->
>>                </div>
>>             </body>
>>
>>             Now since "Every web page is implicitly assumed to be
>>             declared to be of type WebPage" I was wondering if there
>>             also is a property that is 'implicitly assumed to be
>>             declared' (something like @contains) on the first entity
>>             that comes after it, like Product in this case, which
>>             indicates that the Product is the 'Object'?
>>
>>             And if not, then how does a parser 'know' which of the
>>             entities is the 'Subject' and which is the 'Object'?
>>             Shouldn't there be a predicate for this?
>>
>>             3] When a WebPage contains a bunch of 'root' entities, how
>>             does a parser make sense of this? Does the DOM have anything
>>             to do with this?
>>
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <span itemprop="name">Page title</span>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product 1 name</span>
>>                  <!-- Product properties -->
>>                </div>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product 2 name</span>
>>                  <!-- Product properties -->
>>                </div>
>>
>>                <div itemscope itemtype="http://schema.org/LocalBusiness">
>>                  <span itemprop="name">Business name</span>
>>                  <!-- LocalBusiness properties -->
>>                </div>
>>             </body>
>>
>>             Now the above could be full of misunderstandings because I
>>             still lack theoretical knowledge, but that's exactly the
>>             thing I'm hoping to change. Who can enlighten me?
>>
Received on Sunday, 20 April 2014 01:05:44 UTC
