Re: schema.org and proto-data, was Re: schema.org as reconstructed from the human-readable information at schema.org

I think our ongoing discussion in this thread is evidence that consensus and thus consistent representations are hard at Web scale.

Every single query on Google is non-deterministic, in terms of results, ranking, timing, etc.

Anyway, my feeling is that this discussion is getting more and more useless, so I will step out.

---------------------------------------
martin hepp
www:  http://www.heppnetz.de/
email: mhepp@computer.org


> On 30.10.2013, at 19:56, Peter Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> I don't understand.
> 
> You appear to be saying that there is some non-determinism in the processing of Web data.  That doesn't make sense.
> 
> Perhaps you meant to say that there is always going to be some non-determinism in the creation of Web data, perhaps because its creators aren't attuned to the same concerns as the specifiers of Web schemas and other formalisms. Maybe so, and maybe this is unavoidable, but there are certainly good avenues for reducing the problem, including both good examples of use and good descriptions of the formalism itself.
> 
> Absent both of these, there is certain to be a lot of seemingly random data and seemingly random processing. My hope is that much better information will very soon be available on schema.org, and that it will help a lot.
> 
> peter
> 
> 
> 
>> On Wed, Oct 30, 2013 at 8:24 AM, Martin Hepp <martin.hepp@unibw.de> wrote:
>> > Peter,
>> >
>> >  I don't think Martin implied that there was some kind of mystical, non-deterministic process involved in using schema.org markup or that it could only be consumed by major search players.
>> >
>> >  I believe what he was saying (and I concur) is that when you have millions of authors providing data, there are so many errors and misinterpretations (i.e., noise) that consuming it and constructing something meaningful out of it can be non-trivial. Expecting all authors to make the kinds of subtle distinctions that are important to certain academic communities might be too much.
>> >
>> > guha
>> 
>> Indeed, this is what I tried to say.
>> 
>> With "non-deterministic" I mean that schemas at Web scale do not "guarantee" the outcomes of computational operations over the respective instance data in any way near to how schemas in closed, controlled database settings do (at least in theory). Instead, they are limited to influencing the probabilities of the respective operations.
>> 
>> My main claim, outlined in my EKAW 2012 keynote [1] (video: https://vimeo.com/51152934), is that consuming data based on shared conceptual structures at Web scale is a probabilistic setting: there are no guarantees as to which results will come out of computational operations over the data.
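>> 
>> To make this concrete, here is a toy Python sketch (my own illustration, not part of the keynote; the item structure and signal weights are invented for the example) of a consumer that treats extracted schema.org markup as evidence which merely shifts probabilities, rather than as facts with a guaranteed interpretation:
>> 
>>     # Score how likely an extracted item really denotes a schema.org Product,
>>     # instead of trusting the declared type outright. Signals and weights
>>     # below are invented purely for illustration.
>>     def product_confidence(item: dict) -> float:
>>         score = 0.0
>>         if item.get("@type") == "Product":
>>             score += 0.5   # declared type is evidence, not a guarantee
>>         if "name" in item:
>>             score += 0.2   # an expected property is present
>>         price = str(item.get("offers", {}).get("price", ""))
>>         if price.replace(".", "", 1).isdigit():
>>             score += 0.3   # price parses cleanly -> plausible markup
>>         return min(score, 1.0)
>> 
>>     items = [
>>         {"@type": "Product", "name": "Espresso machine",
>>          "offers": {"price": "249.00", "priceCurrency": "EUR"}},
>>         {"@type": "Product", "name": "Espresso machine",
>>          "offers": {"price": "EUR 249,-"}},  # common author error: non-numeric price
>>     ]
>> 
>>     for item in items:
>>         print(item["name"], round(product_confidence(item), 2))
>> 
>> A consumer of a closed, controlled database would simply trust the schema; at Web scale, in my view, shifting such scores is all a shared conceptual structure can do.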
>> 
>> In essence, I claim that shared conceptual structures at Web scale do to the nature of data processing something similar to what Heisenberg's uncertainty principle [2] did to the world of physics.
>> 
>> For instance, the more precisely you define the semantics of a conceptual element, the less likely it becomes that the provider and the consumer of data associate exactly the same set of entities with that type.
>> 
>> I hope to elaborate on this a bit further in writing, but that is what I can contribute at this point.
>> 
>> Note that this view goes radically beyond notions of "noise", "data quality issues", and "data provenance", because those terms are rooted in the idea of a controlled, relatively static setting, which the Web clearly is not.
>> 
>> Martin
>> 
>> 
>> [1] From Ontologies to Web Ontologies: Lessons Learned from Conceptual Modeling for the WWW. Keynote talk at EKAW 2012, Galway, Ireland. https://vimeo.com/51152934
>> [2] http://en.wikipedia.org/wiki/Uncertainty_principle

Received on Wednesday, 30 October 2013 22:46:48 UTC