Re: working of schema.org/WebPage

From: Jarno van Driel <jarno@quantumspork.nl>
Date: Sun, 20 Apr 2014 19:04:54 +0200
Message-ID: <CAFQgrba0K-TpKBc3gJHjXEm6V1t=ij57cHoWC7_S8QDryHL5xA@mail.gmail.com>
To: Thad Guidry <thadguidry@gmail.com>
Cc: "Wallis,Richard" <Richard.Wallis@oclc.org>, Jocelyn Fournier <jocelyn.fournier@gmail.com>, Jason Douglas <jasondouglas@google.com>, "<public-vocabs@w3.org>" <public-vocabs@w3.org>
What I find confusing about WebPage (and its subclasses) and its relation
to WebPageElement (and its subclasses) is: where does it fit into the
whole (literally)?

What I understand about semantic techniques thus far is that they're a way
to explain what a page is about, what entities it contains, what their
individual meanings are and how they relate to each other. These are all
quite abstract concepts, since they're supposed to describe 'meaning'.

Yet when I'm marking up a WebPage, including its WebPageElement(s), trying
to chain it all together, it seems to me like I'm describing the DOM tree.
Now one can argue that when I'm doing this I actually am defining the
relations between entities, but there's nothing abstract about it. The only
thing it describes is the order of the DOM: a WebPage, its
WebPageElement(s) and the Things it/they contain.

Now some folks argue I'm overdoing it (which could very well be true) when
I mark up everything on a page, but my response to that always is: "The
entity types exist, so they're probably meant to be used". But at the same
time it's the only argument I have for marking up everything. I don't have
any arguments that illustrate why I should describe the DOM as well. Like
Jason said: "It is legal for there to be multiple top-level entities". So
why should I even bother with marking up the WebPage or its
WebPageElement(s)?

And to complicate things even further, not everything needed to chain
everything together, while expressing the right context, is available:

1] Being able to define the main entity (or a collection of entities) of a
WebPage. @mainContentOfPage only accepts a text value??? (I agree with
Jason's: "Let's fix it!")

2] No clear properties are available to connect WebPageElements to a
WebPage. So far I have been using @mentions to connect Things together, e.g.
WebPage > mentions > WPSideBar, because there's nothing else for this. A
better property (e.g. @child, @contains, @has, @part, @DOMElement) seems to
be missing, since I'm describing the DOM and not a relation like: Guha >
colleagueOf > Dan.

3] No clear properties are available to connect Entities to
WebPageElements. So far I have been using @about for that, e.g. WPFooter >
about > LocalBusiness, where my gut feeling says there should be a property
like those mentioned in [2].
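
To make [2] and [3] concrete, here is a minimal sketch of the workaround
described in those two points. The choice of HTML elements and the
placeholder text are assumptions for illustration only; the types and
properties (@mentions, @about, WPSideBar, WPFooter, LocalBusiness) are the
ones named above.

```html
<body itemscope itemtype="http://schema.org/WebPage">
  <span itemprop="name">Page title</span>

  <!-- [2] WebPage > mentions > WPSideBar -->
  <aside itemprop="mentions" itemscope itemtype="http://schema.org/WPSideBar">
    <!-- Sidebar content -->
  </aside>

  <!-- Assumption: the footer is attached with @mentions in the same way -->
  <footer itemprop="mentions" itemscope itemtype="http://schema.org/WPFooter">
    <!-- [3] WPFooter > about > LocalBusiness -->
    <div itemprop="about" itemscope itemtype="http://schema.org/LocalBusiness">
      <span itemprop="name">Business name</span>
    </div>
  </footer>
</body>
```

Chained like this, WebPage is the only top-level item left, but the
properties say 'mentions' and 'about' where the markup really only expresses
containment in the DOM.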

Now since no information can be found on schema.org itself, I have searched
the darkest corners of the internet for any reference to the ideas behind
entities like WebPage and WebPageElement, but I can't find anything of
meaning. And since these issues go way beyond the WebPage entity itself, I
wouldn't even know where to start if I were to write a proposal to fix this.

And thus my main question is: Now what?

"Thanks for asking these kinds of questions, Jarno.  Keep 'em coming!"
No problem, I've got a whole bag full of 'em. And probably more than you lot
will like.   :)



On Sun, Apr 20, 2014 at 5:10 PM, Thad Guidry <thadguidry@gmail.com> wrote:

> Lest we forget...
>
> A webpage IS AN HTML document.
> It can contain multiple sections about multiple subjects.
>
> Reminder FYI.
> Thanks for asking these kinds of questions, Jarno.  Keep 'em coming!
>
>
>
> On Sun, Apr 20, 2014 at 4:13 AM, Wallis,Richard <Richard.Wallis@oclc.org> wrote:
>
>>  I would suggest that it should read “… then the subject will be assumed
>> to be the current *webpage*”.
>>
>>  The subject of that webpage, the thing being described, can be of any
>> Type: Person, Place, Organization, WebPage or one of its more focused
>> subtypes.
>>
>>  By defining the type you are doing the right thing; otherwise some
>> process or person parsing the data would just have to assume that it
>> is ‘just a WebPage’.
>>
>>  ~Richard
>>
>>  On 20 Apr 2014, at 02:05, Jarno van Driel <jarno@quantumspork.nl> wrote:
>>
>>  "all that is saying is that if a relation is declared without an
>> explicit subject, then the subject will be assumed to be the current
>> WebPage"
>>
>>  So if the WebPage is only there to function as a sort of 'catch all',
>> then why does it even have subclasses?
>>
>>  I actually use WebPage and its subclasses in an attempt to do the
>> right thing, but is there a point in doing this? Does it actually matter
>> if I declare the page type?
>>
>>
>> On Thu, Apr 17, 2014 at 9:03 PM, Jocelyn Fournier <
>> jocelyn.fournier@gmail.com> wrote:
>>
>>> Le 17/04/2014 20:25, Jason Douglas a écrit :
>>>
>>>  Yup, that's messed up.  Let's fix it!
>>>>
>>>
>>>  Hi,
>>>
>>> Note that examples regarding mainContentOfPage on schema.org are also
>>> misleading.
>>> E.g. : http://schema.org/Table
>>> =>  <meta itemprop="mainContentOfPage" content="true"/>
>>>
>>> I would have expected
>>> <div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
>>>
>>>
>>> But I fully agree it would be much more useful to have
>>> mainContentOfPage expect a 'Thing' rather than a 'WebPageElement'
>>>
>>>   Jocelyn
>>>
>>>
>>>> On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel
>>>>  <jarno@quantumspork.nl <mailto:jarno@quantumspork.nl>> wrote:
>>>>
>>>>     "...if a relation is declared without an explicit subject, then the
>>>>     subject will be assumed to be the current WebPage."
>>>>     Got it.
>>>>
>>>>     "It is legal for there to be multiple top-level entities." +
>>>>     "Current clients make up their own heuristics for this..."
>>>>     Brainfreeze!
>>>>     How am I, as a developer, to deal with this? Does this mean I have
>>>>     to somehow figure out which heuristics every parser/search engine
>>>>     uses, to be able to have control or do I need to try to chain
>>>>     everything together such that only one top-level entity is left?
>>>>
>>>>     And how would I do this for a category page on, for example, an
>>>>     eCommerce site, which shows a range of Product entities on a
>>>>     CollectionPage that together form the main content, and where the
>>>>     CollectionPage, for lack of a better term, only functions as a
>>>>     'wrapper' for the list of products?
>>>>
>>>>     "we probably do need a mechanism for indicating the "primary entity"
>>>>     of a webpage when there is one..."
>>>>     One of the reasons why I asked my questions is that I encounter
>>>>     quite a lot of markup on websites where people use @mainContentOfPage
>>>>     on entities like Product. Now @mainContentOfPage has the expected
>>>>     type WebPageElement, but many aren't aware of this. And since there
>>>>     is no property to indicate which entity is the primary one I
>>>>     actually can completely understand they try to resolve it like this.
>>>>     And frankly, I'm confused as well.
>>>>
>>>>
>>>>
>>>>     On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas
>>>>      <jasondouglas@google.com <mailto:jasondouglas@google.com>> wrote:
>>>>
>>>>         It is legal for there to be multiple top-level entities.  That
>>>>         description of WebPage is not meant to imply anything about the
>>>>         relationship of those top-level objects... all that is saying is
>>>>         that if a relation is declared without an explicit subject, then
>>>>         the subject will be assumed to be the current WebPage.
>>>>
>>>>         That said, we probably do need a mechanism for indicating the
>>>>         "primary entity" of a webpage when there is one.  Current
>>>>         clients make up their own heuristics for this, but I think it
>>>>         would be better to have an explicit way of stating that.
>>>>
>>>>         -jason
>>>>
>>>>
>>>>         On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel
>>>>          <jarno@quantumspork.nl <mailto:jarno@quantumspork.nl>> wrote:
>>>>
>>>>             I'm trying to understand semantic mechanisms better but am a
>>>>             bit confused about schema.org/WebPage
>>>>              <http://schema.org/WebPage> and I'd like to know how it
>>>>             works.
>>>>
>>>>
>>>>             Now it could well be that I misunderstand certain
>>>>             terminology, so please bear with me and be so kind as to
>>>>             correct me when needed.
>>>>
>>>>             1] The description of http://schema.org/WebPage says:
>>>>             "Every web page is implicitly assumed to be declared to be
>>>>             of type WebPage, so the various properties about that
>>>>             webpage, such as breadcrumb may be used. We recommend
>>>>             explicit declaration if these properties are specified, but
>>>>             if they are found outside of an itemscope, they will be
>>>>             assumed to be about the page."
>>>>
>>>>             code example:
>>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>>                <!-- Content -->
>>>>             </body>
>>>>
>>>>             Now if the WebPage is the only entity, is it then considered
>>>>             to be the 'Subject', the 'Object' or both?
>>>>
>>>>             2] If the WebPage contains an entity, let's say a Product,
>>>>             without specifying a property on the Product and I check
>>>>             this with Google's SDTT, I see 2 'root' entities, since
>>>>             there is no property to chain the two together. Yet I get
>>>>             the impression the Product gets treated as the 'Object',
>>>>             since it's the Product that gets used for Rich snippet
>>>>             extraction, and that therefore the WebPage is the 'Subject':
>>>>
>>>>             code example:
>>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>>                <span itemprop="name">Page title</span>
>>>>
>>>>                <div itemscope itemtype="http://schema.org/Product">
>>>>                  <span itemprop="name">Product name</span>
>>>>                  <!-- Product properties -->
>>>>                </div>
>>>>             </body>
>>>>
>>>>             Now since "Every web page is implicitly assumed to be
>>>>             declared to be of type WebPage" I was wondering if there
>>>>             also is a property that is 'implicitly assumed to be
>>>>             declared' (something like @contains) on the first entity
>>>>             that comes after it, like Product in this case, which
>>>>             indicates that the Product is the 'Object'?
>>>>
>>>>             And if not, then how does a parser 'know' which of the
>>>>             entities is the 'Subject' and which is the 'Object'?
>>>>             Shouldn't there be a predicate for this?
>>>>
>>>>             3] When a WebPage contains a bunch of 'root' entities, how
>>>>             does a parser make sense of this? Does the DOM have anything
>>>>             to do with it?
>>>>
>>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>>                <span itemprop="name">Page title</span>
>>>>
>>>>                <div itemscope itemtype="http://schema.org/Product">
>>>>                  <span itemprop="name">Product 1 name</span>
>>>>                  <!-- Product properties -->
>>>>                </div>
>>>>
>>>>                <div itemscope itemtype="http://schema.org/Product">
>>>>                  <span itemprop="name">Product 2 name</span>
>>>>                  <!-- Product properties -->
>>>>                </div>
>>>>
>>>>                <div itemscope itemtype="http://schema.org/LocalBusiness">
>>>>                  <span itemprop="name">Business name</span>
>>>>                  <!-- LocalBusiness properties -->
>>>>                </div>
>>>>             </body>
>>>>
>>>>             Now the above could be full of misunderstandings because I
>>>>             still lack theoretical knowledge, but that's exactly the
>>>>             thing I'm hoping to change. Who can enlighten me?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
> --
> -Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>
> Thad on LinkedIn <http://www.linkedin.com/in/thadguidry/>
>
Received on Sunday, 20 April 2014 17:05:23 UTC
