- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 12 Aug 2014 14:16:57 -0700
- To: Thad Guidry <thadguidry@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
On 08/12/2014 12:47 PM, Thad Guidry wrote:
> Someone has to absorb the costs of interpreting the data. This should always
> fall on the client side. This always means me, and I am used to it, and
> gladly pay the costs.
> Schema.org already does quite a bit to help with interpretation, but only so
> much as Martin aptly puts with regard to "web-scale" in some cases, but that
> is OK for now.
>
> The costs of interpreting data should try to be kept to a minimum. This
> happens now and if all of us continue to do our homework and make Schema.org a
> viable publishing standard for structured data as much as possible.
>
> I live and breathe (and work) as a Data Architect, and I am one of those
> stakeholders absorbing much of all that we put together. So far, it has been
> rather trivial for my own tool sets to handle.
>
> Please, do not put any more burden on the Publishers...they do pretty darn
> great so far, and I want them to publish faster and more...and I will happily
> pay the costs and develop my own client processing rules, when and where I
> need to. (and I am one of the small stakeholders...size: 1 - me) :-)

The question is how best to have usable information published on the web.

On one side, any web document (including HTML, SVG, GIF, etc.) could be considered to contain relevant information, and all consumers should be able to extract this information. This minimizes the burden on publishers in some sense, but probably doesn't end up with much usable information in reality (web search notwithstanding), as it is currently too hard to construct the required information consumers.

The opposite situation would be where all information is presented in some formal, machine-understandable fashion that permits all the relevant consequences to be drawn. This might actually also be an easy situation for publishers, but it is not currently viable, as we do not know how to set up this kind of information representation in general.
So we are stuck in some middle ground where publishers are expected to map their information into some language that partially captures the information that they are trying to make available. What should this language look like? How should it work? How should it be described? These are hard questions, and opinions vary on what is best given our current situation.

> The real problem and hard part is getting Publishers to always use a property
> like http://schema.org/episodeNumber ! (having 'Number' as a term within the
> property might cause some confusion) Make it easy for them to always use the
> property by explaining well how it can be used, good definitions, good
> examples, etc (whatever it may be called), and you make my job / life easier
> as a consumer of all their data.

Precisely. But how can we make it easy for publishers to transmit the correct (or at least close-to-correct) information? Is episodeNumber with type integer good for this? I expect not, particularly as there are lots of exceptions to episode numbers being integers. Is episode with type string good for this? Maybe. At least it allows for just about every conceivable situation, at the price of possibly losing too much of the intended meaning of episode ordering. Perhaps instead there should be a new type, something like partially-numeric-string, that would have most of the freedom of string but preserve some of the meaning of integer. Perhaps there should instead be a division into providing episode IDs, which can be numbers or strings, and also providing explicit ordering information. Or perhaps episode ordering is even more complex and should be published in some other fashion. (Consider the various orderings of The Prisoner episodes.)
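As a rough illustration of what a "partially-numeric-string" might buy a consumer, here is a minimal sketch, assuming episode values arrive as plain strings such as "2", "10", or "3a". The helper `episode_sort_key` is hypothetical and is not part of schema.org. It sorts digit runs as integers and the rest as text, so "2" comes before "10" and "3" before "3a", where a plain string sort would not. It cannot recover orderings that are not encoded in the labels at all (The Prisoner case), which is where explicit ordering information would still be needed.

```python
import re

# Hypothetical helper, not part of schema.org: a "natural" sort key for
# episode labels that are mostly numeric but sometimes contain letters.
_DIGITS = re.compile(r"(\d+)")


def episode_sort_key(value: str) -> tuple:
    # re.split with a capturing group always alternates
    # text, digits, text, digits, ... starting with a (possibly empty) text part,
    # so even positions are text and odd positions are digit runs.
    parts = _DIGITS.split(value.strip())
    # Digit runs compare as integers, text compares case-insensitively.
    # Because positions alternate consistently, ints are only ever compared
    # with ints and strings with strings.
    return tuple(
        int(part) if i % 2 else part.casefold()
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    episodes = ["10", "2", "1", "3a", "3", "Special"]
    # Plain string sort puts "10" before "2".
    print(sorted(episodes))  # ['1', '10', '2', '3', '3a', 'Special']
    # The partially-numeric key keeps the integer-like ordering.
    print(sorted(episodes, key=episode_sort_key))  # ['1', '2', '3', '3a', '10', 'Special']
```

The design choice here is to keep the published value a string (so publishers lose no freedom) while letting the consumer reconstruct some of the integer meaning; whether labels like "Special" should sort before or after numbered episodes is an arbitrary choice of this sketch.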
> My personal preference for all of these kind of datatype issues where "Numeric
> values that might NOT ALWAYS GET PUBLISHED as numeric values" has always been
> "string", since I (and the Publishers) will be hanging off of a really cool
> "Thing property" we never had before, then the interpretation is very trivial
> typically for me.

One problem with just using string for everything is that the meaning of the values can be hard to decipher. How, for example, should you sort the values you get?

> --
> -Thad

peter
Received on Tuesday, 12 August 2014 21:17:32 UTC