easy RDF from XML (was RDFa + RDF/XML Considered Harmful?) from Paul Tyson on 2008-07-16 (public-lod@w3.org from July 2008)

From: Paul Tyson <phtyson@sbcglobal.net>
Date: Tue, 15 Jul 2008 22:10:26 -0500
To: SW-forum Web <semantic-web@w3.org>
CC: public-lod@w3.org
Message-ID: <487D66A2.4080105@sbcglobal.net>

Mark Birbeck wrote:
> 
> I did think though, that one of the things about the RDF/XML structure
> was an attempt to enable many XML layouts to be interpreted as RDF.
> But obviously that's enormously difficult.
> 

The striping design of RDF/XML, by design or accident, makes it very 
well suited to be the target of XSLT transformations.  See 
http://lists.w3.org/Archives/Public/semantic-web/2008Jul/0037.html for a 
stylesheet that will transform any XML document to Infoset RDF/XML.  You 
could of course write out the RDF graph in any other notation you 
choose, but RDF/XML is no more difficult than any other.

Infoset RDF might not be a big step forward, but at least it puts you 
into the RDF world where you can merge graphs and do whatever semantic 
processing you like.
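
To make the infoset idea concrete without reproducing the stylesheet at 
that URL, here is a minimal Python sketch using only the standard 
library. It walks any XML document, describes each element by its name, 
attributes, text and children as (subject, predicate, object) triples, 
and merges two such graphs by set union. The namespace 
http://example.org/infoset# and the property names (elementName, 
attr_*, text, child) are hypothetical placeholders, not the vocabulary 
the stylesheet actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; not the one used by the referenced stylesheet.
INF = "http://example.org/infoset#"


def infoset_triples(xml_text, doc_id):
    """Return a set of (subject, predicate, object) triples describing the
    element infoset of xml_text. Blank-node labels are prefixed with doc_id
    so that graphs from different documents stay distinct when merged."""
    root = ET.fromstring(xml_text)
    triples = set()
    counter = 0

    def visit(elem):
        nonlocal counter
        node = f"_:{doc_id}n{counter}"  # fresh blank node for this element
        counter += 1
        triples.add((node, INF + "elementName", elem.tag))
        for name, value in elem.attrib.items():
            triples.add((node, INF + "attr_" + name, value))
        text = (elem.text or "").strip()
        if text:
            triples.add((node, INF + "text", text))
        for child in elem:
            triples.add((node, INF + "child", visit(child)))
        return node

    visit(root)
    return triples


a = infoset_triples('<person id="p1"><name>Bob</name></person>', "a")
b = infoset_triples('<person id="p2" name="Alice"/>', "b")
merged = a | b  # merge: union of the two graphs, blank nodes kept apart
```

Prefixing blank-node labels with a per-document id keeps the two graphs 
standardized apart, so the union is a proper RDF merge rather than an 
accidental conflation of nodes from different documents.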

What we would really like to do is vivify the meaning that the XML 
author was aiming for when he marked up the character stream in the 
first place.  We won't get at that meaning from the grammar alone; we 
must look at the semantics of the markup itself.  The way forward was 
pointed out years ago in this article: 
http://xml.coverpages.org/xmlAndSemantics.html, and possibly in other 
articles I have not come across.

In this discussion I will set aside DTDs and XML Schemas and all other 
such tools of the grammarians and computer scientists; for I wish to 
focus on the basic semantic gestures of markup itself.  Structural 
markup, as in SGML and XML, is a means of breaking up a sequence of 
characters into components of interest.  The syntactical rules for 
well-formed XML enable a primitive--yet reliable and robust--set of 
semantic gestures, to wit:
	- naming (components of interest can be named)
	- attributing (components can have properties)
	- sequence (a component can have a positional predecessor)
	- containment (a component can be contained in another)

Nothing could be easier than making an RDFS vocabulary of these notions.
And it is only slightly harder to modify the stylesheet referenced above 
to emit RDF/XML using this vocabulary.  (If I were to implement this, I 
would add a "Chunk" class to contain character strings, instead of 
representing them as sequences of named things with a common parent.) So 
you can have, with very little effort, a system that reveals, for any 
XML instance, the fundamental semantic gestures of its author.
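
A sketch of such a vocabulary, and of an emitter that uses it, might 
look like the following. This is an assumption-laden Python 
illustration rather than the modified stylesheet: the namespace 
http://example.org/markup# and every term in it (Component, Attribute, 
Chunk, name, hasAttribute, value, follows, containedIn, chars) are 
hypothetical. Character data, including whitespace and the tail text 
after a child element, becomes a Chunk that takes its place in the 
sibling sequence.

```python
import xml.etree.ElementTree as ET
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = RDF + "type"
MK = "http://example.org/markup#"  # hypothetical vocabulary namespace

# The four gestures, plus a Chunk class, as a tiny RDFS vocabulary.
VOCAB = {(MK + c, RDF_TYPE, RDFS + "Class")
         for c in ("Component", "Attribute", "Chunk")}
VOCAB |= {(MK + p, RDF_TYPE, RDF + "Property")
          for p in ("name",          # naming
                    "hasAttribute",  # attributing
                    "value",
                    "follows",       # sequence: positional predecessor
                    "containedIn",   # containment
                    "chars")}        # character data of a Chunk


def gestures(xml_text):
    """Describe an XML instance purely by its markup gestures."""
    triples = set()
    ids = count()

    def new_node():
        return f"_:n{next(ids)}"

    def link(node, parent, prev):
        if parent is not None:
            triples.add((node, MK + "containedIn", parent))  # containment
        if prev is not None:
            triples.add((node, MK + "follows", prev))        # sequence

    def add_chunk(text, parent, prev):
        # Character data (whitespace included) becomes a Chunk.
        node = new_node()
        triples.add((node, RDF_TYPE, MK + "Chunk"))
        triples.add((node, MK + "chars", text))
        link(node, parent, prev)
        return node

    def visit(elem, parent, prev):
        node = new_node()
        triples.add((node, RDF_TYPE, MK + "Component"))
        triples.add((node, MK + "name", elem.tag))           # naming
        link(node, parent, prev)
        for name, value in elem.attrib.items():              # attributing
            attr = new_node()
            triples.add((node, MK + "hasAttribute", attr))
            triples.add((attr, RDF_TYPE, MK + "Attribute"))
            triples.add((attr, MK + "name", name))
            triples.add((attr, MK + "value", value))
        last = None
        if elem.text:
            last = add_chunk(elem.text, node, None)
        for child in elem:
            last = visit(child, node, last)
            if child.tail:
                last = add_chunk(child.tail, node, last)
        return node

    visit(ET.fromstring(xml_text), None, None)
    return triples


g = gestures('<p>Hello <b>big</b> world</p>')
```

For <p>Hello <b>big</b> world</p> the sketch yields five nodes: the p 
and b Components and three Chunks, with follows and containedIn 
recording the sequence and containment gestures.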

In XML, as in natural language, we have many ways of expressing nearly 
the same meaning.  If we must decide whether two utterances have the same 
meaning, we cannot do it by comparing the sounds of the utterances--we 
must consult some rules about the language: word definitions, 
grammatical rules, and usage conventions.  Just so with XML--it is 
useless to compare the surface structure.  We must first of all expose 
the semantic structure of each instance, then apply some rules of 
synonymy.  Putting an XML document into RDF of the kind described above 
makes it easier to apply these rules.
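
As a toy example of one such rule of synonymy, assume that an attribute 
x="v" and a text-only child element <x>v</x> say the same thing. The 
Python sketch below applies that single hypothetical rule while walking 
the XML tree and compares the resulting canonical forms. It works on 
the tree directly for brevity, which sidesteps blank-node isomorphism; 
in the RDF setting described above, the same rule would be written 
against the triples. Real rules would need the word definitions and 
usage conventions of the particular markup language.

```python
import xml.etree.ElementTree as ET


def normalized(elem):
    """Canonical form of elem under one hypothetical synonymy rule:
    attribute x="v" and a text-only child <x>v</x> mean the same thing."""
    props = set()   # (name, value) statements, however they were marked up
    parts = []      # structured children, kept in document order
    for name, value in elem.attrib.items():
        props.add((name, value.strip()))
    for child in elem:
        if len(child) == 0 and not child.attrib:
            # Text-only child: rewrite it into the same form as an attribute.
            props.add((child.tag, (child.text or "").strip()))
        else:
            parts.append(normalized(child))
    # Mixed-content text of elem itself is ignored in this sketch.
    return (elem.tag, frozenset(props), tuple(parts))


def same_meaning(xml_a, xml_b):
    """True if both documents normalize to the same canonical form."""
    return normalized(ET.fromstring(xml_a)) == normalized(ET.fromstring(xml_b))


print(same_meaning('<person name="Bob" age="42"/>',
                   '<person><name>Bob</name><age>42</age></person>'))  # True
```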

--Paul

Received on Wednesday, 16 July 2008 03:09:50 UTC