Re: schema.org and proto-data, was Re: schema.org as reconstructed from the human-readable information at schema.org from Peter Patel-Schneider on 2013-10-28 (public-vocabs@w3.org from October 2013)

From: Peter Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 28 Oct 2013 08:31:12 -0700
To: Martin Hepp <martin.hepp@unibw.de>
Cc: Guha <guha@google.com>, W3C Vocabularies <public-vocabs@w3.org>
Message-ID: <CAMpDgVyBQEj52oyG-P71qjA0_oKktbPFBEAhOpcqRzC+jNCZqw@mail.gmail.com>

That's an awfully depressing view of schema.org.  Do you really mean to say
that there is no role for small or medium players in this game at all, even
if they are only interested in some of the data?  Just why do you say
this?  Is it because there is something inherent in the data that requires
this processing?  Is it because there is something inherent in the
producers of the data that requires this processing?  Is it because there
is something inherent in the current consumers of the data that requires
this processing?  Is there something inherent in the schema.org
specification that requires this processing?  Is there something that can
be fixed that will allow small or medium players to consume schema.org data?

My hope here, and maybe it is a forlorn one, is precisely that more
consumers can use the information that is being put into web pages using
the schema.org setup.  Right now it appears that only the major search
players know enough about schema.org to be able to consume the
information.  Of course, I do think that conceptual clarity will help here,
but I do realize that in an endeavor like this one there are always going
to be problems with underspecified or incorrectly specified or incorrect
data.  I don't think, however, that this prevents small and medium players
from being consumers of schema.org data.

Knowledge representation, just like databases, has from the beginning been
concerned with more than just simple notions of entailment and
computational complexity, so I would not say that issues related to data
quality and intent are outside of traditional knowledge representation.

peter

PS: Do you really mean to say that processing schema.org data requires
non-deterministic computation?  What would require this?  What sorts of
non-traditional computation are required?

On Mon, Oct 28, 2013 at 2:46 AM, Martin Hepp <martin.hepp@unibw.de> wrote:

> Peter:
> Note that schema.org sits between millions of owners of data
> (Webmasters) and large, centralized consumers of Big Data, who apply
> hundreds of heuristics before using the data.
> Schema.org is an interface between Webmaster minds, data structures in
> back-end RDBMS driving Web sites, and search engines (and maybe other types
> of consumers).
>
> The whole environment heavily relies on
> 1. probabilistic processing
> 2. the quality of the understanding between the many minds (developers) in
> this ecosystem.
>
> Traditional measures of conceptual clarity and guarantees / deterministic
> data processing are of very limited relevance in that setting.
>
> For instance, if you introduce a conceptual distinction which is very
> valuable and justified from an expert's perspective, this may often not lead
> to more reliable data processing, since the element may be used more
> inconsistently among Web developers (or the distinctions may not be
> reliably represented in the databases driving the sites).
>
> Looking at schema.org from the perspective of knowledge representation in
> the traditional sense is insufficient, IMO. You have to look at the data
> ecosystem as a whole.
>
> Information exposed using schema.org meta-data is what I would call
> proto-data, not ready for direct consumption by deterministic computational
> operations.
>
> Martin
>

Received on Monday, 28 October 2013 15:31:40 UTC