- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 19 Mar 2003 22:48:07 -0500
- To: Joseph Reagle <reagle@w3.org>
- cc: www-archive@w3.org
[taken to www-archive, bcc'd to one or more private lists]

I want to argue for a position which I think EricP has espoused but which does not fit into your strawman deliverable. It seems aligned with something you said recently about XML Schema's "any" being counterproductive. Basically, it is that using open/wildcard grammar productions is not much better than using no grammar at all, so let's solve this problem within the constraints of DTDs.

I can't quite grasp the 5-year perspective on XML right now, but I do know that it is nice when emacs (in XML-mode) tells me what elements and attributes are legal at any given point, and fills in the required ones. It's nice when C-c C-v (validate) tells me about any syntax errors in my data/document. How can we support that kind of behavior in a mixed-namespace environment?

Validation comes from having a grammar for the language being parsed. If the language is extensible in an unknown way, you can't validate it! At best you can validate the parts you happen to recognize; how predictable is that going to be? The only way I can see to get validation in an extensible language is to get the grammars for the extensions via the network. If we want validation, syntax-directed editing, PSVI datatype information, etc., we need to either (1) give up ad-hoc extensibility (manually configure your system for each DOCTYPE) or (2) use the web.

[ I guess there's a whole camp of XML folks who have no use for validation. Well-formedness is plenty of syntax checking for them. I think that view is out of scope here, though, since they can just stick the RDF/XML in their HTML and everything works fine, more or less. ]

Using the web to download grammars can be done in two ways, depending on whether the document consumer or the producer gets the burden. Someone has to gather up the grammars for each sub-language (namespace) and find the intersection language. That should probably be the producer (the instance publisher). In short, if you want to author using HTML, SVG, and MathML, you should construct a downloadable DTD/schema/etc. for that combined language. This is of course being done already for certain namespaces, but Eric's idea (as I understand it) is to make this trivially easy and recommended for the consumer. How many XML vocabularies are there? Obviously constructing every combination by hand is not feasible.

RDF complicates this further, but can be greatly helped by the same grammar-merge/construction idea. RDF/XML of course has no grammar in the XML sense, because it's kind of off a level. Or half a level. To compare apples to apples you need to compare HTML, SVG, and MathML with RDF/XML-DublinCore, RDF/XML-FOAF, and RDF/XML-CreativeCommons. For each RDF vocabulary it's possible to construct a DTD. And if you do that, then people authoring the RDF information get validation, syntax-directed editing, etc. (Arguably they also get datatyping, but I'm too afraid to think through the implications of that.)

Yeah, RDF/XML is stuck between levels. The obvious, clean levels are

   <Triple>
     <subject><URI>http://www.w3.org/People/EM/#me</URI></subject>
     <predicate><URI>http://example.com/LastModified</URI></predicate>
     <object><Literal>2003-03-04</Literal></object>
   </Triple>

and

   <PageInformationRecord>
     <page>http://www.w3.org/People/EM/#me</page>
     <lastMod>2003-03-04</lastMod>
   </PageInformationRecord>

which are both easy to handle with DTDs. In the second, the DTD depends on the domain of discourse, so there's XML support for validating the data itself.
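Roughly, the DTDs for those two levels might look like this (just a sketch; the content models are my guesses, nothing normative):

   <!-- domain-neutral triple form: the grammar knows nothing about pages or dates -->
   <!ELEMENT Triple (subject, predicate, object)>
   <!ELEMENT subject (URI)>
   <!ELEMENT predicate (URI)>
   <!ELEMENT object (URI | Literal)>
   <!ELEMENT URI (#PCDATA)>
   <!ELEMENT Literal (#PCDATA)>

   <!-- domain-specific record form: the grammar encodes the domain of discourse -->
   <!ELEMENT PageInformationRecord (page, lastMod)>
   <!ELEMENT page (#PCDATA)>
   <!ELEMENT lastMod (#PCDATA)>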
The first is domain-neutral, and validation does much less for you. RDF/XML makes you straddle the fence. This is just about as painful as it sounds. (But changing this fact is chartered to be out of scope for the RDF Core WG. It was supposed to be follow-on work, but RDF Core WG was supposed to be done long ago.)

The proposal, then, is that when you author in XML, you should identify all the vocabularies you are going to use, use some tool to construct a DTD/schema/grammar for a markup language which combines all their elements, and then use that markup language.

As with all XML content, if you want computer systems to be able to use data from multiple sources using different markup languages, you'll need to program them with the semantics of those languages. That programming can often be automatic (e.g. annotated or translation grammars, schema annotation), and isn't really any harder with merged grammars than with the original separate grammars.

Now I just need to get my prototype of that integration program (xmortar [1]) working.... I'm approaching it backwards, generating DTDs from a merged ontology. That solves the merging problem and makes grammar annotation fairly easy, at the cost of some hairy issues about handling collections, defaults, cardinality, ....

Of course any attempt to convey RDF in XML runs afoul of the doc#name naming convention conflicting with XPointer.

     -- sandro

[1] http://www.w3.org/2003/04/xmortar