- From: Paul Tyson <phtyson@sbcglobal.net>
- Date: Tue, 15 Jul 2008 22:10:26 -0500
- To: SW-forum Web <semantic-web@w3.org>
- CC: public-lod@w3.org
Mark Birbeck wrote:
>
> I did think though, that one of the things about the RDF/XML structure
> was an attempt to enable many XML layouts to be interpreted as RDF.
> But obviously that's enormously difficult.
>

The striping design of RDF/XML, by design or accident, makes it very well suited to be the target of XSLT transformations. See http://lists.w3.org/Archives/Public/semantic-web/2008Jul/0037.html for a stylesheet that will transform any XML document to Infoset RDF/XML. You could of course write out the RDF graph in any other notation you choose, but RDF/XML is no more difficult than another.

Infoset RDF might not be a big step forward, but at least it puts you into the RDF world where you can merge graphs and do whatever semantic processing you like.

What we would really like to do is vivify the meaning that the XML author was aiming for when he marked up the character stream in the first place. We won't get at that meaning from the grammar alone; we must look at the semantics of the markup itself. The direction was pointed years ago in this article: http://xml.coverpages.org/xmlAndSemantics.html, and possibly in other articles undiscovered to me.

In this discussion I will set aside DTDs and XML Schemas and all other such tools of the grammarians and computer scientists; for I wish to focus on the basic semantic gestures of markup itself.

Structural markup, as in SGML and XML, is a means of breaking up a sequence of characters into components of interest. The syntactical rules for well-formed XML enable a primitive--yet reliable and robust--set of semantic gestures, to wit:

- naming (components of interest can be named)
- attributing (components can have properties)
- sequence (a component can have a positional predecessor)
- containment (a component can be contained in another)

Nothing could be easier than making an RDFS vocabulary of these notions.
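To make the four gestures concrete, here is a minimal Python sketch of how they might be read off an arbitrary XML instance as subject-predicate-object triples. It is an illustration only, not the stylesheet mentioned in this message: the namespace `http://example.org/gesture#`, the term names `name`, `attribute`, `contains`, and `follows`, and the function `gestures` are all hypothetical. The sketch ignores character data and emits plain tuples rather than real RDF/XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary namespace; these terms are assumptions for illustration.
G = "http://example.org/gesture#"


def gestures(xml_text):
    """Return (subject, predicate, object) triples for the four markup gestures."""
    root = ET.fromstring(xml_text)
    triples = []
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        counter += 1
        node = f"_:n{counter}"  # blank-node style identifier for this element

        # naming: the component carries a name
        triples.append((node, G + "name", elem.tag))

        # attributing: the component has properties
        for key, value in elem.attrib.items():
            triples.append((node, G + "attribute", (key, value)))

        # containment: the component sits inside another
        if parent_id is not None:
            triples.append((parent_id, G + "contains", node))

        # sequence: each child after the first has a positional predecessor
        previous = None
        for child in elem:
            child_id = visit(child, node)
            if previous is not None:
                triples.append((child_id, G + "follows", previous))
            previous = child_id
        return node

    visit(root, None)
    return triples


if __name__ == "__main__":
    for triple in gestures('<a x="1"><b/><c/></a>'):
        print(triple)
```

Element identifiers are assigned in document order, so the same input always yields the same triples. A fuller treatment would also have to represent text content, for example with something like the "Chunk" class suggested below.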
And it is only slightly harder to modify the stylesheet referenced above to emit RDF/XML using this vocabulary. (If I were to implement this I would add a "Chunk" class to contain character strings, instead of representing them as sequences of named things with a common parent.) So you can have, with very little effort, a system that reveals, for any XML instance, the fundamental semantic gestures of its author.

In XML, as in natural language, we have many ways of expressing nearly the same meaning. If we must decide if two utterances have the same meaning, we cannot do it by comparing the sounds of the utterances--we must consult some rules about the language: word definitions, grammatical rules, and usage conventions. Just so with XML--it is useless to compare the surface structure. We must first of all expose the semantic structure of each instance, then apply some rules of synonymy. Putting an XML document into some such RDF as described above makes it easier to apply these rules.

--Paul
Received on Wednesday, 16 July 2008 03:09:50 UTC