Proposal: "Canonical" RDF/XML

Hi all,

I've been thinking about how to "canonicalize" the RDF/XML syntax, so 
that the same graph (with namespaces/anonymous nodes labeled the same 
way) always produces the same output file. A major application would be 
to interact well with textual 'diff'/'merge' and versioning systems like 
CVS: if the RDF is formatted differently on every save, these tools 
lose their value.

Does anybody know whether there are proposals/implementations for 
something like this already?

My idea is to use the following rules:

- All triples with the same subject are collected in a single 
<rdf:Description> element (which is a child of the <rdf:RDF> element). 
Each <rdf:Description> has an rdf:about or rdf:nodeID attribute.
- A triple "a x:prop b" is represented as <x:prop rdf:resource="b"/> 
inside the <rdf:Description> of a. Triples with literal values are 
handled similarly, with the literal as the element's text content. Blank 
node objects are referenced through rdf:nodeID.
- The <rdf:Description> elements are ordered by subject.
- The property elements inside an <rdf:Description> are ordered first by 
property, then by object of the triple.
- Each <rdf:Description> and </rdf:Description> is on its own line, not 
indented. Each property element is on its own, single line (except for 
multiline literals), indented two spaces.
- All namespace declarations are on the <rdf:RDF> element.
- Canonical XML is applied.

For example, the following graph:

     <http://example.org/DOC/12>   dc:author   _:lucia
     <http://example.org/DOC/12>   dc:title    "Kitchen Can Openers (II)"
     <http://example.org/DOC/24>   dc:author   _:lucia
     <http://example.org/DOC/24>   dc:title    "About Frogs"

     _:lucia                       rdf:type    ex:Person
     _:lucia                       ex:age      "27"

would be serialized like this:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:dc="http://purl.org/dc/elements/1.1/" 
xmlns:ex="http://example.org/stuff/1.0/">
<rdf:Description rdf:nodeID="lucia">
  <ex:age>27</ex:age>
  <rdf:type rdf:resource="http://example.org/stuff/1.0/Person"/>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/DOC/12">
  <dc:author rdf:nodeID="lucia"/>
  <dc:title>Kitchen Can Openers (II)</dc:title>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/DOC/24">
  <dc:author rdf:nodeID="lucia"/>
  <dc:title>About Frogs</dc:title>
</rdf:Description>
</rdf:RDF>
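
To make the rules concrete, here is a rough Python sketch of such a 
serializer. It is only an illustration, not an existing implementation, 
and it rests on several assumptions of mine: terms are encoded as 
(kind, value) pairs, blank-node subjects sort before URI subjects, 
property elements are sorted by the full property URI, and the names 
canonical_rdfxml, qname and attr are made up for this example. It does 
not run a real Canonical XML pass; it only imitates two of its effects 
by leaving out the XML declaration and by writing the namespace 
declarations on one line, sorted by prefix. It also does not check that 
the local part of a property name is a valid XML name.

```python
from xml.sax.saxutils import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical term encoding for this sketch: subjects and objects are
# (kind, value) pairs with kind in {"bnode", "literal", "uri"};
# properties are plain URI strings.


def attr(value):
    """Double-quote an attribute value, escaping &, <, > and the quote itself."""
    return '"' + escape(value, {'"': "&quot;"}) + '"'


def qname(uri, prefixes):
    """Abbreviate a property URI using the longest matching declared namespace."""
    by_length = sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, ns in by_length:
        if uri.startswith(ns) and len(uri) > len(ns):
            return prefix + ":" + uri[len(ns):]
    raise ValueError("no declared namespace matches " + uri)


def canonical_rdfxml(triples, prefixes):
    """Serialize triples following the ordering and layout rules listed above."""
    prefixes = {"rdf": RDF_NS, **prefixes}

    # One <rdf:Description> per subject; a graph is a set, so drop duplicates.
    by_subject = {}
    for subj, prop, obj in set(triples):
        by_subject.setdefault(subj, []).append((prop, obj))

    # All namespace declarations go on <rdf:RDF>, sorted by prefix.
    decls = " ".join(
        "xmlns:%s=%s" % (p, attr(ns)) for p, ns in sorted(prefixes.items())
    )
    lines = ["<rdf:RDF " + decls + ">"]

    for subj in sorted(by_subject):  # descriptions ordered by subject
        kind, value = subj
        id_attr = "rdf:about" if kind == "uri" else "rdf:nodeID"
        lines.append("<rdf:Description %s=%s>" % (id_attr, attr(value)))
        # Property elements ordered by property URI, then by object.
        for prop, (okind, ovalue) in sorted(by_subject[subj]):
            name = qname(prop, prefixes)
            if okind == "literal":
                lines.append("  <%s>%s</%s>" % (name, escape(ovalue), name))
            elif okind == "bnode":
                lines.append("  <%s rdf:nodeID=%s/>" % (name, attr(ovalue)))
            else:
                lines.append("  <%s rdf:resource=%s/>" % (name, attr(ovalue)))
        lines.append("</rdf:Description>")

    lines.append("</rdf:RDF>")
    return "\n".join(lines) + "\n"


# The example graph from this message, deliberately listed out of order.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/stuff/1.0/"
lucia = ("bnode", "lucia")
doc12 = ("uri", "http://example.org/DOC/12")
doc24 = ("uri", "http://example.org/DOC/24")
graph = [
    (doc24, DC + "title", ("literal", "About Frogs")),
    (lucia, RDF_NS + "type", ("uri", EX + "Person")),
    (doc12, DC + "author", lucia),
    (doc12, DC + "title", ("literal", "Kitchen Can Openers (II)")),
    (lucia, EX + "age", ("literal", "27")),
    (doc24, DC + "author", lucia),
]
output = canonical_rdfxml(graph, {"dc": DC, "ex": EX})
print(output, end="")
```

Because both the triples and the namespace declarations are sorted 
before anything is written, feeding the same graph in a different order 
should give byte-identical output, which is the property that 'diff' 
and CVS need.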


What do you think: is this a sensible approach? (Can it serialize 
everything that can be serialized in RDF/XML? I think so.)

Thanks,
- Benja

Received on Saturday, 28 June 2003 22:37:15 UTC