- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 19 Mar 2003 22:48:07 -0500
- To: Joseph Reagle <reagle@w3.org>
- cc: www-archive@w3.org
[taken to www-archive, bcc'd to one or more private lists]

I want to argue for a position which I think EricP has espoused but which does not fit into your strawman deliverable. It seems aligned with something you said recently about XML Schema's "any" being counterproductive. Basically, it is that using open/wildcard grammar productions is not much better than using no grammar at all, so let's solve this problem within the constraints of DTDs.

I can't quite grasp the 5-year perspective on XML right now, but I do know that it is nice when emacs (in XML-mode) tells me what elements and attributes are legal at any given point, and fills in the required ones. It's nice when C-c C-v (validate) tells me about any syntax errors in my data/document. How can we support that kind of behavior in a mixed-namespace environment?

Validation comes from having a grammar for the language being parsed. If the language is extensible in an unknown way, you can't validate it! At best you can validate the parts you happen to recognize; how predictable is that going to be? The only way I can see to get validation in an extensible language is to get the grammars for the extensions via the network. If we want validation, syntax-directed editing, PSVI datatype information, etc., we need to either (1) give up ad-hoc extensibility (manually configure your system for each DOCTYPE) or (2) use the web.

[ I guess there's a whole camp of XML folks who have no use for validation. Well-formedness is plenty of syntax checking for them. I think that view is out of scope here, though, since they can just stick the RDF/XML in their HTML and everything works fine, more or less. ]

Using the web to download grammars can be done in two ways, depending on whether the document consumer or the producer gets the burden. Someone has to gather up the grammars for each sub-language (namespace) and find the intersection language. That should probably be the producer (the instance publisher). In short, if you want to author using HTML, SVG, and MathML, you should construct a downloadable DTD/schema/etc. for that combined language. This is of course being done already for certain namespaces, but Eric's idea (as I understand it) is to make this trivially easy and recommended for the consumer. How many XML vocabularies are there? Obviously constructing every combination by hand is not feasible.

RDF complicates this further, but can be greatly helped by the same grammar-merge/construction idea. RDF/XML of course has no grammar in the XML sense, because it's kind of off a level. Or half a level. To compare apples to apples you need to compare HTML, SVG, and MathML with RDF/XML-DublinCore, RDF/XML-FOAF, and RDF/XML-CreativeCommons. For each RDF vocabulary it's possible to construct a DTD. And if you do that, then people authoring the RDF information get validation, syntax-directed editing, etc. (Arguably they also get datatyping, but I'm too afraid to think through the implications of that.)

Yeah, RDF/XML is stuck between levels. The obvious, clean levels are

   <Triple>
     <subject><URI>http://www.w3.org/People/EM/#me</URI></subject>
     <predicate><URI>http://example.com/LastModified</URI></predicate>
     <object><Literal>2003-03-04</Literal></object>
   </Triple>

and

   <PageInformationRecord>
     <page>http://www.w3.org/People/EM/#me</page>
     <lastMod>2003-03-04</lastMod>
   </PageInformationRecord>

which are both easy to handle with DTDs. In the second, the DTD depends on the domain of discourse, so there's XML support for validating the data itself.
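Roughly, the DTDs for those two levels might look like this (just a sketch; the content models are my guesses, nothing normative):

   <!-- domain-neutral triple form: the grammar knows nothing about pages or dates -->
   <!ELEMENT Triple (subject, predicate, object)>
   <!ELEMENT subject (URI)>
   <!ELEMENT predicate (URI)>
   <!ELEMENT object (URI | Literal)>
   <!ELEMENT URI (#PCDATA)>
   <!ELEMENT Literal (#PCDATA)>

   <!-- domain-specific record form: the grammar encodes the domain of discourse -->
   <!ELEMENT PageInformationRecord (page, lastMod)>
   <!ELEMENT page (#PCDATA)>
   <!ELEMENT lastMod (#PCDATA)>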
The first is domain-neutral, and validation does much less for you. RDF/XML makes you straddle the fence. This is just about as painful as it sounds. (But changing this fact is chartered to be out of scope for the RDF Core WG. It was supposed to be follow-on work, but RDF Core WG was supposed to be done long ago.)

The proposal, then, is that when you author in XML, you should identify all the vocabularies you are going to use, use some tool to construct a DTD/schema/grammar for a markup language which combines all their elements, and then use that markup language.

As with all XML content, if you want computer systems to be able to use data from multiple sources using different markup languages, you'll need to program them with the semantics of those languages. That programming can often be automatic (e.g. annotated or translation grammars, schema annotation), and isn't really any harder with merged grammars than with the original separate grammars.

Now I just need to get my prototype of that integration program (xmortar [1]) working.... I'm approaching it backwards, generating DTDs from a merged ontology. That solves the merging problem and makes grammar annotation fairly easy, at the cost of some hairy issues about handling collections, defaults, cardinality, ....

Of course any attempt to convey RDF in XML runs afoul of the doc#name naming convention conflicting with XPointer.

     -- sandro

[1] http://www.w3.org/2003/04/xmortar