- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Wed, 27 Aug 2008 18:42:08 -0400
Kristof Zelechovski wrote:
> We have two options for having both human-readable and machine-readable
> information in a document: write the structure and generate the text or
> write the text and recover the structure. At the very least, if you
> insist on having both, there must be a mechanism to verify that they
> are consistent.

As both Ben and Toby already pointed out, there doesn't have to be any sort of consistency mechanism. The web today works without a logical consistency checking mechanism in any of the current publishing languages. We are not striving for perfect semantic data with RDFa on day one - we are striving for an easy mechanism for expressing semantic data in HTML family languages (see the short markup example at the end of this message). It is understood that there may be data that is corrupt, and that is okay.

There is an area of semantic web development that deals with the concepts of provenance and validation. You can even apply statistical models to catch logical inconsistencies, but those models need a core set of semantic information to be of any use. RDFa must happen before any sort of statistical model can be created for checking logical consistency between HTML text and semantic data.

The other approach is the use of Natural Language Processing (NLP) to address the HTML/RDFa logical consistency issue. Solving the problem of NLP has proven to be quite a tough nut to crack. Computer scientists have been working on it for decades and a general-purpose solution is nowhere in sight. The good news is that we don't need to solve this problem, as the web contains logical inconsistencies in its content today without affecting the positive global utility of the system.

Assume, however, that the NLP problem is solved. RDFa still provides a mechanism that is useful in a post-NLP web: that of content validation. There will come a time when we mis-state facts on a web page but the machine-generated semantic data on the page is correct. The counter to this scenario is also possible - where we state the correct facts on a web page, but the machine-generated semantic data on the page is incorrect. In both cases, the consistency verification process would use both the RDFa data and the NLP data to determine the location of the inconsistency. A human could then be brought in to correct the inconsistency - or a solution could be reasoned out by the computer using the same statistical method mentioned previously.

The possibilities don't stop there - we can use RDFa to get to the stage of NLP by using RDFa markup to teach computers how to reason. Think of RDFa as an NLP stepping-stone that could be used by researchers in the field. Building NLP databases is a very expensive, manual and thus time-consuming process. However, humans are publishing data that would be useful to an NLP system every day - in blog posts, Wikipedia entries, e-commerce sites, news feeds, maps and photo streams. RDFa-assisted NLP is one approach that researchers could use to get us to the eventual goal of true NLP.

I hope that outlines the possibilities and shows how RDFa plays a major part in realizing each scenario outlined above. Any questions on the concepts outlined in this e-mail?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches
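
P.S. For anyone who hasn't seen RDFa markup before, here is a minimal hand-written example of the "easy mechanism" described above. The URL, title, date and author are invented purely for illustration, and dc: refers to the Dublin Core vocabulary:

  <!-- illustrative only: the subject URL and literal values are made up -->
  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       about="http://example.org/posts/42">
    <h2 property="dc:title">An Example Blog Post</h2>
    <p>Published on
       <span property="dc:date" content="2008-07-03">July 3rd, 2008</span>
       by <span property="dc:creator">Alice Example</span>.</p>
  </div>

An RDFa-aware parser reading that fragment extracts the post's title, date and author as triples about http://example.org/posts/42, straight from the same markup a human reads - no separate machine-readable feed required.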
Received on Wednesday, 27 August 2008 15:42:08 UTC