GRDDL (split off from: Structured vs. Unstructured)

> When I first raised my "confusion" (not "objection") over Davide's
> "unstructured-to-structured" wording, my intention was to clarify what
> kind of problems the bioRDF group attempts to tackle. More specifically,
> I was referring to it within the context of "GRDDL" because, IMHO, I
> don't think GRDDL is designed to help RDF-ize natural language; GRDDL is
> designed specifically to target XML-based documents. Because the draft
> proposal of the bioRDF only says: "Learn about GRDDL, SPARQL, OWL, etc.",
> I want to clarify where they are heading.

There are several threads ongoing here, and I'm going to split this  
one off from "Structured vs. Unstructured".

Xiaoshu, like you, my focus of interest here is GRDDL specifically.  
Let me just give you my "take" on GRDDL and hopefully Eric Miller and/or
others can help correct any misconceptions I have.

My understanding of GRDDL is that it was originally proposed in the
(X)HTML community. The problem it was intended to address is that
there is no way of validating arbitrary RDF using XML Schema (in
other words, there is no XSD for RDF, because XML Schema is
insufficiently expressive). Consequently, for XML instances that are
intended to be validated according to some schema--and this could
include (X)HTML--RDF embedding requires some kind of "expedient";
otherwise the RDF will "break" the schema and render the instance
non-validatable.

Many "expedients" for embedding the RDF will work--for example,
separating out the RDF into an appinfo element, attaching it as a
separate file, hiding it inside CDATA--and all of these have been
tried successfully in one or another application setting. But the
(X)HTML community wanted a *web-standard* way of embedding RDF in such a
way that the semantic intent ("I hereby officially declare to the WWW
that this RDF is inseparably part of the semantics of this XML
instance.") would be clear.

GRDDL allows the instance author to make the public declaration above
by referencing the URL of some XML transform, which the author thereby
publicly identifies as the "key" to extract the intended RDF from his/her
instance. In this very nice way, GRDDL allows the instance author the
freedom to package his/her RDF any way he/she pleases, so long as he/she
also provides the "decoder ring" of an XML transform to extract
it. Furthermore, the author's statement of semantic inseparability is
explicitly entailed by his/her use of the GRDDL standard to render
the RDF.
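
To make the "decoder ring" idea concrete, here is a minimal sketch of how a consumer might find the transform an author has declared. It assumes the convention, as I understand it from the GRDDL drafts, of a transformation attribute in the http://www.w3.org/2003/g/data-view# namespace on the root element of a plain XML instance. The record element, its namespace and the example.org URL are made up for illustration, and the sketch only discovers the transform; it does not fetch or run it.

```python
import xml.etree.ElementTree as ET

# Namespace of the GRDDL data-view vocabulary (assumed convention, see above).
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical instance document; element names and URLs are illustrative only.
INSTANCE = """\
<record xmlns="urn:example:records"
        xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/xform/record2rdf.xsl">
  <patient id="p1"/>
</record>
"""


def transformation_urls(xml_text):
    """Return the transform URLs declared on the root element, if any."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    declared = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute may carry several whitespace-separated URLs.
    return declared.split()


print(transformation_urls(INSTANCE))
```

A consumer that gets back a non-empty list knows which transform(s) the author has publicly nominated; an instance with no such attribute yields an empty list, i.e. no declaration was made.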

Eric, once again, if I'm getting any of this wrong, correct me...

It's always been my understanding that the primary use case for GRDDL  
is the one where the instance author explicitly has in mind a  
"finished" set of RDF triples that he/she wants to embed. He/she  
"encodes" these triples, packages them into the instance XML, assigns  
the intended extraction transform a URL, attaches that, and sends the
resulting instance document off into the world. Easy peasy.

But now here's the part that I (and I think maybe also Xiaoshu)
am not sure about.

Question #1 (which Eric has already answered in the affirmative):  
Will this work for non-(X)HTML too? Answer: yes. And this is  
important because most healthcare records documents aren't (X)HTML.

Question #2: Will this work for the case where the instance author  
**doesn't** explicitly know the actual RDF triple set up front, and  
the referenced extraction transform is actually acting as a "language  
processor" to generate triples "that thereby see the light for the  
first time"?

Question #3: If the answer to #2 is "yes", then is there a
conceivable extension to GRDDL where the GRDDL URL is not just an XML
transform, but--for example--a web service fronting for some kind
of natural language processor?
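
To illustrate the distinction I am drawing in Question #2, here is a rough sketch with two Python functions standing in for two kinds of transform (a real GRDDL transform would be something like an XSLT stylesheet, not Python). All element names, URNs and function names are hypothetical. The first merely unpacks triples the author wrote down in advance; the second computes triples from ordinary domain markup that never mentions them.

```python
import xml.etree.ElementTree as ET


def decode_packaged_triples(root):
    """Case A: the author already chose the triples and packaged them."""
    return [(t.get("s"), t.get("p"), t.get("o")) for t in root.iter("triple")]


def derive_triples(root):
    """Case B: the author wrote ordinary domain XML; triples are computed."""
    subject = "urn:example:patient:" + root.get("id")
    return [(subject, "urn:example:hasDiagnosis", d.text)
            for d in root.iter("diagnosis")]


# Hypothetical instance with pre-encoded triples (Case A).
PACKAGED = ET.fromstring(
    '<doc><triple s="urn:example:patient:p1" '
    'p="urn:example:hasDiagnosis" o="asthma"/></doc>')

# Hypothetical instance with plain domain markup only (Case B).
DOMAIN = ET.fromstring(
    '<patient id="p1"><diagnosis>asthma</diagnosis></patient>')

print(decode_packaged_triples(PACKAGED))
print(derive_triples(DOMAIN))
```

In this toy setup both functions hand the consumer the same triple, so the two cases look alike from the outside. What I am asking is whether GRDDL means to sanction the second case as well as the first.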

Received on Tuesday, 14 February 2006 17:48:03 UTC