Re: revisiting 'position', 'episodeNumber', 'seasonNumber' modeling and Periodicals

Peter:
Quite clearly, a more granular and better-specified data representation makes the consumption of data easier for any client in any environment. However, the Web is, as we all know, a vast, distributed data environment. It is not possible to turn the Web into a cleanroom for data. Merely raising the bar for data will reduce both the number of sites that publish data and the compliance of published data with the specs.

The challenge in building Web vocabularies lies in finding sweet spots in terms of entity types, relationship types, and data granularity that are easy to populate from existing data sources and that reduce the effort for the consuming client compared with pure information extraction from HTML.

And of course 10.2 can be a float, decimal or other numerical datatype, but it can also be something entirely different, e.g. a numerical identifier with delimiters. 

Classroom 10.3 minus classroom 10.2 is not a valid operation on classroom identifiers and does not return 0.1.
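To make the classroom point concrete, here is a minimal Python sketch. The helper name label_key is hypothetical, and the rule that "10.10" sorts after "10.2" is only an assumption about how such labels are usually meant, not something any spec prescribes.

```python
def label_key(label: str) -> tuple[int, ...]:
    """Split a dotted label such as "10.2" into integer parts.

    The parts are only used for ordering; no arithmetic is implied.
    """
    return tuple(int(part) for part in label.split("."))


# Treating the labels as floats invites arithmetic that has no meaning
# for identifiers, and the result is not even exactly 0.1 in binary
# floating point.
as_floats = 10.3 - 10.2

# Treating them as identifiers keeps "10.10" distinct from "10.1" and
# (under the assumption stated above) orders it after "10.2".
labels = ["10.10", "10.2", "9.1"]
ordered_as_labels = sorted(labels, key=label_key)
ordered_as_floats = sorted(labels, key=float)
```

The two orderings differ, which is exactly the kind of silent misreading a client risks when it coerces such a value to a numerical datatype.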

Telephone "numbers" are not really integers but numerical identifiers. This is why +49-89-6004-4217 is not -10261 but my office phone number.
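The same contrast as a small Python sketch. Stripping everything but the digits is just one hypothetical way a client might normalize the value; it is an illustration, not a recommendation for any particular vocabulary.

```python
phone = "+49-89-6004-4217"

# Reading the delimiters as minus signs turns the identifier into an
# arithmetic expression with a meaningless result.
as_arithmetic = 49 - 89 - 6004 - 4217

# Keeping the value as text preserves the identifier. Here the
# delimiters are dropped and only the digits are retained.
digits_only = "".join(ch for ch in phone if ch.isdigit())
```

Here as_arithmetic evaluates to -10261, while digits_only is still recognizably the phone number.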

So let's not be too normative when defining conceptual data structures for data interchange at Web scale.

Martin


On 12 Aug 2014, at 17:19, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> That's putting quite a burden on the client.
> 
> Big clients, such as those that can be produced by Google and Yahoo, might have access to sufficient information to handle situations where data routinely does not match the schema.  However, small and medium clients are going to be in a much worse situation.  How are they to proceed?
> 
> peter
> 
> 
> On 08/12/2014 06:44 AM, Martin Hepp wrote:
>> In general, I would leave the interpretation / cleansing to the client in here rather than constraining the publication of data too much. In the end, data quality and data semantics are twins, see
>> 
>> http://link.springer.com/chapter/10.1007%2F978-3-540-39648-2_2
>> 
>> Best
>> 
>> Martin
>> 
>> 
>> 
>> On 12 Aug 2014, at 15:08, Dan Brickley <danbri@google.com> wrote:
>> 
>>> On 12 August 2014 14:03, Evain, Jean-Pierre <evain@ebu.ch> wrote:
>>>> By non-integer I guess you mean e.g. string (or else) - then fine
>>> 
>>> Yes. It's possible people sometimes number with 10.1, 10.2 etc too,
>>> which technically looks like a Float though they're not intended for
>>> mathematical use really...
>>> 
>>> Dan
>>> 
>> 
>> 

Received on Tuesday, 12 August 2014 15:56:49 UTC