Re: revisiting 'position', 'episodeNumber', 'seasonNumber' modeling and Periodicals from Peter F. Patel-Schneider on 2014-08-12 (public-vocabs@w3.org from August 2014)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 12 Aug 2014 14:16:57 -0700
To: Thad Guidry <thadguidry@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-ID: <53EA8449.5040404@gmail.com>
On 08/12/2014 12:47 PM, Thad Guidry wrote:
> Someone has to absorb the costs of interpreting the data.  This should always
> fall on the client side.  This always means me, and I am used to it, and
> gladly pay the costs.

> Schema.org already does quite a bit to help with interpretation, but only so
> much, as Martin aptly put it with regard to "web-scale" in some cases, and
> that is OK for now.
>
> The costs of interpreting data should be kept to a minimum.  That is
> happening now, and will continue to happen if all of us keep doing our
> homework and make Schema.org as viable a publishing standard for structured
> data as possible.
>
> I live and breathe (and work) as a Data Architect, and I am one of those
> stakeholders absorbing much of what we put together.  So far, it has been
> rather trivial for my own tool sets to handle.
>
> Please, do not put any more burden on the Publishers...they are doing pretty
> darn great so far, and I want them to publish faster and more...and I will
> happily pay the costs and develop my own client processing rules, when and
> where I need to. (and I am one of the small stakeholders...size: 1 - me) :-)

The question is how best to get usable information published on the web.
On one side any web document (including HTML, svg, gif, etc.) could be 
considered to contain relevant information and all consumers should be able to 
extract this information.  This minimizes the burden on publishers in some 
sense, but probably doesn't end up with much usable information in reality 
(web search notwithstanding) as it is currently too hard to construct the 
required information consumers.  The opposite situation would be where all 
information is presented in some formal machine-understandable fashion that 
permits all the relevant consequences to be drawn.  This actually might also 
be an easy situation for publishers, but is not currently viable as we do not 
know how to set up this kind of information representation in general.  So we 
are stuck in some middle ground where publishers are expected to map their 
information into some language that partially captures the information that 
they are trying to make available.  What should this language look like?  How
should it work?  How should it be described?  These are hard questions, and 
opinions vary on what is best given our current situation.

> The real problem and hard part is getting Publishers to always use a property
> like http://schema.org/episodeNumber !  (having 'Number' as a term within the
> property might cause some confusion) Make it easy for them to always use the
> property by explaining well how it can be used, good definitions, good
> examples, etc (whatever it may be called), and you make my job / life easier
> as a consumer of all their data.

Precisely.  But how do we make it easy for publishers to transmit the correct
(or at least close-to-correct) information?  Is episodeNumber with type integer
good for this?  I expect not, particularly as there are lots of exceptions to 
episode numbers being integers.  Is episode with type string good for this? 
Maybe.  At least it allows for just about every conceivable situation, at the 
price of maybe losing too much of the desired meaning of episode ordering. 
Perhaps instead there should be a new type, something like 
partially-numeric-string that would have most of the freedom of string but 
preserve some of the meaning of integer.  Perhaps there should instead be a
division between providing episode IDs, which can be numbers or strings, and
providing explicit ordering information.  Or perhaps episode ordering is even
more complex and should be published in some other fashion.  (Consider the
various orderings of The Prisoner episodes.)
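To make the ID-plus-explicit-ordering option concrete, here is a minimal Python sketch.  The Episode class, the ordered() helper, the episode data, and the ordering names are all hypothetical and are not part of schema.org.  The only point is that episode IDs stay opaque strings that are never parsed for ordering, while each ordering (broadcast, production, and so on) is published as its own explicit sequence, so a series can carry several of them.

```python
from dataclasses import dataclass


# Hypothetical record: the identifier is an opaque label (number or string)
# and is never parsed to work out where the episode falls in a sequence.
@dataclass(frozen=True)
class Episode:
    episode_id: str  # e.g. "1", "2a", "Special"
    title: str


def ordered(episodes, ordering):
    """Return episodes in the sequence given by an explicit list of IDs.

    IDs listed in the ordering but absent from `episodes` are skipped;
    episodes not mentioned in the ordering are left out.
    """
    by_id = {e.episode_id: e for e in episodes}
    return [by_id[i] for i in ordering if i in by_id]


# Hypothetical data, for illustration only.
episodes = [
    Episode("1", "Pilot"),
    Episode("2", "Second"),
    Episode("2a", "Unaired"),
    Episode("Special", "Holiday Special"),
]

# Each named ordering is published separately from the IDs themselves.
orderings = {
    "broadcast": ["1", "2", "Special"],
    "production": ["1", "2a", "2", "Special"],
}

for name, ids in orderings.items():
    print(name, [e.episode_id for e in ordered(episodes, ids)])
# broadcast ['1', '2', 'Special']
# production ['1', '2a', '2', 'Special']
```

The cost of this design falls on publishers, who must state every ordering explicitly, but in exchange consumers never have to guess an ordering from the shape of an ID.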

> My personal preference for all of these kinds of datatype issues, where
> "Numeric values that might NOT ALWAYS GET PUBLISHED as numeric values", has
> always been "string", since I (and the Publishers) will be hanging off of a
> really cool "Thing property" we never had before, and then the
> interpretation is typically very trivial for me.

One problem with just using string for everything is that the meaning of the
values can be hard to decipher.  How, for example, should you sort the values
you get?
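As an illustration of the sorting problem, the Python sketch below compares plain string sorting with one possible "partially numeric" interpretation (often called natural sort), in which runs of digits compare as integers.  The natural_key() helper and the sample values are hypothetical, and natural sort is only one convention a consumer might guess at; nothing in a plain string property tells the consumer which convention the publisher intended.

```python
import re


def natural_key(value: str):
    """Sort key that treats runs of ASCII digits as integers.

    Each run becomes a tagged tuple so that integers and strings are never
    compared with each other: digit runs sort before non-digit runs at the
    same position, and digit runs compare numerically.
    """
    key = []
    for digits, other in re.findall(r"(\d+)|(\D+)", value, re.ASCII):
        if digits:
            key.append((0, int(digits), ""))
        else:
            key.append((1, 0, other))
    return key


values = ["10", "2", "1", "2a", "9", "Special"]

# Plain string order: "10" lands between "1" and "2".
print(sorted(values))                   # ['1', '10', '2', '2a', '9', 'Special']
# Partially numeric order: "2a" follows "2", and "10" follows "9".
print(sorted(values, key=natural_key))  # ['1', '2', '2a', '9', '10', 'Special']
```

Even this heuristic breaks down for values such as Roman numerals or spelled-out numbers, which is the sense in which plain strings lose the ordering meaning that an integer would have carried.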

> --
> -Thad

peter
Received on Tuesday, 12 August 2014 21:17:32 UTC