Re: schema.org and proto-data, was Re: schema.org as reconstructed from the human-readable information at schema.org from Peter Patel-Schneider on 2013-10-30 (public-vocabs@w3.org from October 2013)

From: Peter Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 30 Oct 2013 11:56:55 -0700
To: Martin Hepp <martin.hepp@unibw.de>
Cc: Guha <guha@google.com>, W3C Vocabularies <public-vocabs@w3.org>
Message-ID: <CAMpDgVzAj0yscELnhQLG3w3OwQuWSQevexJdWjjoQjOjyx1f8Q@mail.gmail.com>

I don't understand.

You appear to be saying that there is some non-determinism in the
processing of Web data.  That doesn't make sense.

Perhaps you meant to say that there is always going to be some
non-determinism in the creation of Web data, perhaps because the creators
aren't attuned to the same concerns that specifiers of Web schemas and
other formalisms are.  Well, maybe, and maybe this is also unavoidable,
but certainly there are good avenues to reduce this problem, including both
good examples of use and good descriptions of the formalism itself.

Absent both of these it is certain that there is going to be a lot of
seemingly random data and seemingly random processing.  My hope is that
there will very soon be much better information available on
schema.org that will help a lot.

peter



On Wed, Oct 30, 2013 at 8:24 AM, Martin Hepp <martin.hepp@unibw.de> wrote:

> > Peter,
> >
> >  I don't think Martin implied that there was some kind of mystical,
> non-deterministic process involved in using schema.org markup or that it
> could only be consumed by major search players.
> >
> >  I believe what he was saying (and I concur) is that when you have
> millions of authors providing data, there are so many errors and
> misinterpretations (i.e., noise) that consuming it and constructing
> something meaningful out of it could be non-trivial. Expecting all authors
> to make the kind of subtle (but important to certain academic communities)
> distinctions might be too much.
> >
> > guha
>
> Indeed, this is what I tried to say.
>
> With "non-deterministic" I mean that schemas at Web scale do not
> "guarantee" the outcomes of computational operations over the respective
> instance data in any way near to how schemas in closed, controlled database
> settings do (at least in theory). Instead, they are limited to influencing
> the probabilities of the respective operations.
>
> My main claim, outlined in my EKAW 2012 keynote [1] (video here:
> https://vimeo.com/51152934) is that consuming data based on shared
> conceptual structures at Web scale is a probabilistic setting. There are no
> guarantees of which results will come out of computational operations over
> the data.
>
> In essence, I state that shared conceptual structures at Web scale do to
> the nature of data processing something similar to what Heisenberg's
> uncertainty principle [2] did to the world of physics.
>
> For instance, the more precisely you define the semantics of a conceptual
> element, the less likely it becomes that the provider and consumer of
> data associate the exact same set of entities with that type.
>
> I hope to elaborate on that a little bit further in writing, but that is
> what I can contribute at this point.
>
> Note that this view goes radically further than the idea of "noise", "data
> quality issues", and "data provenance", because those terms are rooted in
> the notion of a controlled, relatively static setting, which the Web is
> clearly not.
>
> Martin
>
>
> [1] From Ontologies to Web Ontologies: Lessons learned from Conceptual
> Modeling for the WWW
> Keynote talk at EKAW 2012, Galway, Ireland. https://vimeo.com/51152934
> [2] http://en.wikipedia.org/wiki/Uncertainty_principle
>
>

Received on Wednesday, 30 October 2013 18:57:23 UTC