Precision of data and specifications in schema.org design (was Re: Article Proposal) from Dan Scott on 2013-12-13 (public-schemabibex@w3.org from December 2013)

From: Dan Scott <denials@gmail.com>
Date: Fri, 13 Dec 2013 10:36:51 -0500
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Message-ID: <CAAY5AM3XVRKzGpTe+vD6JqDocH2p5rUP2g8=snJn2UvO-UZkeQ@mail.gmail.com>

On Thu, Dec 12, 2013 at 6:40 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>
>
> On 12/12/13, 2:38 PM, Dan Scott wrote:
>
>> Hmm, I don't think possible limitations of the software creating the
>> data should be a strong rationale for designing the vocabulary. If a
>> developer claimed they couldn't parse extremely simple cases like
>> "123-456" or "pp. 34-45" into the appropriate pageStart / pageEnd
>> properties, they should probably start looking for a new set of tools
>> or a new job :)
>
>
> Dan, It is not up to us to decide what people's programs will or will not do
> nor what their data is or should be, or whose code is not "good enough." Our
> task here is NOT to determine what people SHOULD be doing, but to help them
> mark up what they do have. And when what they have could be just about
> *anything*, it is best to be generous.

I don't think this perspective can survive first contact with
http://schema.org/OpeningHoursSpecification, the affiliated primitive
types http://schema.org/DateTime and http://schema.org/Time, and the
http://schema.org/DayOfWeek enumeration on which it relies. Those all
have clear definitions of what people should be doing in their
schema.org markup if they hope to be understood correctly by search
engines.

I'm sure you know that in many cases the human-readable content will
remain as-is, but there will be a @content attribute or link@href
embedded in the markup that provides the machine-readable version of
the same content. As an example of where numberOfPages could handle
library-centric description/display practices and thus not have to be
conflated with the proposed "pagination" property, one could mark up
the number of pages for a book with 21 pages of preface/intro/ToC or
whatever and 356 pages of core content to conform to the current
definition of the property like so:

<span itemprop="numberOfPages" content="377">xxi, 356p.</span>
<!-- Good for the humans, good for the machines! -->
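
For what it's worth, here is a minimal sketch in Python of how a
publisher's software might derive that machine-readable value from the
display string, and how it might split the simple page ranges mentioned
earlier in this thread into pageStart / pageEnd. The function names are
hypothetical (they are not part of schema.org or of any library), and
the parsing only covers the simple cases discussed here: a token such as
"ill." would be misread as a Roman numeral, so real pagination
statements would need more care.

```python
import html
import re

# Values of the Roman numeral letters used in preliminary page counts.
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xxi' to an integer."""
    total = 0
    largest_seen = 0
    # Walk right to left: a letter smaller than one already seen to its
    # right is subtracted (the 'i' in 'iv'), otherwise it is added.
    for letter in reversed(numeral.lower()):
        value = ROMAN[letter]
        total += -value if value < largest_seen else value
        largest_seen = max(largest_seen, value)
    return total


def count_pages(statement: str) -> int:
    """Sum the page sequences in a simple statement such as 'xxi, 356p.'"""
    total = 0
    # Assumption: each sequence is either a standalone Roman numeral
    # or a run of digits; anything else in the statement is ignored.
    for token in re.findall(r"\b[ivxlcdm]+\b|\d+", statement.lower()):
        total += int(token) if token.isdigit() else roman_to_int(token)
    return total


def to_microdata(statement: str) -> str:
    """Wrap the human-readable statement in numberOfPages markup."""
    shown = html.escape(statement, quote=False)
    return (
        f'<span itemprop="numberOfPages" '
        f'content="{count_pages(statement)}">{shown}</span>'
    )


def parse_page_range(text: str):
    """Return (pageStart, pageEnd) for '123-456' or 'pp. 34-45', else None."""
    match = re.search(r"(\d+)\s*[-\u2013]\s*(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(to_microdata("xxi, 356p."))
    print(parse_page_range("pp. 34-45"))
```

Run against "xxi, 356p.", to_microdata() produces the same span as the
hand-written example above, with the displayed text left untouched and
377 in the content value.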

schema.org sets standards for data quality because that makes it
possible for search engines and other processors to treat the data
uniformly across sites. We know, of course, that the search engines
will apply Postel's Law and try to deal with what they come across (in
areas where they are motivated to do so). But that's not a reason for
our group to conflate property definitions or put forth an ambiguous
specification in the first place. It is up to us to set the targets
for web publishers in the hope that the web will offer better
quality machine-readable data as a result.

> I find this kind of remark to be ... unfortunate, let's say.

On the contrary, I think it's really important to have a clear idea of
what schema.org is about so we have a common understanding and common
goals in our efforts as a group.

Received on Friday, 13 December 2013 15:37:22 UTC