- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Wed, 27 Aug 2008 18:42:08 -0400
Kristof Zelechovski wrote:
> We have two options for having both human-readable and machine-readable
> information in a document: write the structure and generate the text or
> write the text and recover the structure. At the very least, if you
> insist on having both, there must be a mechanism to verify that they
> are consistent.

As both Ben and Toby already pointed out, there doesn't have to be any sort of consistency mechanism. The web today works without a logical consistency checking mechanism in any of the current publishing languages. We are not striving for perfect semantic data with RDFa on day one - we are striving for an easy mechanism for expressing semantic data in HTML family languages (see the short markup example at the end of this message). It is understood that there may be data that is corrupt, and that is okay.

There is an area of semantic web development that deals with the concepts of provenance and validation. You can even apply statistical models to catch logical inconsistencies, but those models need a core set of semantic information to be of any use. RDFa must happen before any sort of statistical model can be created for checking logical consistency between HTML text and semantic data.

The other approach is the use of Natural Language Processing (NLP) to address the HTML/RDFa logical consistency issue. Solving the problem of NLP has proven to be quite a tough nut to crack. Computer scientists have been working on it for decades and a general-purpose solution is nowhere in sight. The good news is that we don't need to solve this problem, as the web contains logical inconsistencies in its content today without affecting the positive global utility of the system.

Assume, however, that the NLP problem is solved. RDFa still provides a mechanism that is useful in a post-NLP web: that of content validation. There will come a time when we mis-state facts on a web page but the machine-generated semantic data on the page is correct. The counter to this scenario is also possible - where we state the correct facts on a web page, but the machine-generated semantic data on the page is incorrect. In both cases, the consistency verification process would use both the RDFa data and the NLP data to determine the location of the inconsistency. A human could then be brought in to correct the inconsistency - or a solution could be reasoned out by the computer using the same statistical method mentioned previously.

The possibilities don't stop there - we can use RDFa to get to the stage of NLP by using RDFa markup to teach computers how to reason. Think of RDFa as an NLP stepping-stone that could be used by researchers in the field. Building NLP databases is a very expensive, manual and thus time-consuming process. However, humans are publishing data that would be useful to an NLP system every day - in blog posts, Wikipedia entries, e-commerce sites, news feeds, maps and photo streams. RDFa-assisted NLP is one approach that researchers could use to get us to the eventual goal of true NLP.

I hope that outlines the possibilities and shows how RDFa plays a major part in realizing each scenario outlined above. Any questions on the concepts outlined in this e-mail?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches
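
P.S. For anyone who hasn't seen RDFa markup before, here is a minimal hand-written example of the "easy mechanism" described above. The URL, title, date and author are invented purely for illustration, and dc: refers to the Dublin Core vocabulary:

  <!-- illustrative only: the subject URL and literal values are made up -->
  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       about="http://example.org/posts/42">
    <h2 property="dc:title">An Example Blog Post</h2>
    <p>Published on
       <span property="dc:date" content="2008-07-03">July 3rd, 2008</span>
       by <span property="dc:creator">Alice Example</span>.</p>
  </div>

An RDFa-aware parser reading that fragment extracts the post's title, date and author as triples about http://example.org/posts/42, straight from the same markup a human reads - no separate machine-readable feed required.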
Received on Wednesday, 27 August 2008 15:42:08 UTC