Change tracking processing instructions and deletion from Nigel Whitaker on 2014-08-27 (public-change@w3.org from August 2014)

From: Nigel Whitaker <nigel.whitaker@deltaxml.com>
Date: Wed, 27 Aug 2014 15:59:04 +0100
To: public-change@w3.org
Message-Id: <36B04505-DBD9-47C9-9E7E-F273A4AD594C@deltaxml.com>

Hello everyone,

There's an aspect of the existing change tracking PIs, as used in a number of systems, that I've often wondered about:

The PIs that are used follow the convention of using an attribute-like syntax.  It's a convention that's been adopted for standard PIs such as xml-stylesheet and xml-model.
While it's a convention, the XML spec itself doesn't say a lot about what you can/can't do in a PI.

When content including elements and attributes is deleted in change tracking systems, the content is typically escaped so that it's a legal attribute value.

Suppose I was to delete this paragraph:  <p xml:lang="en">Hello World</p>

We may see something like this (I'm generalising from what I've seen in a number of systems):

<?change user="nigel" time="2014-08-27 15:12:00" delete="&lt;p xml:lang=&quot;en&quot;&gt;Hello World&lt;/p&gt;" ?>


The angle brackets and quotes have been 'escaped' to make it a legal attribute value.  I've got code to deal with this process, but I do wonder if it's necessary and if things could be simplified?
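
A minimal sketch of that escaping round trip, in Python, might look like the following. The helper names (escape_deleted, unescape_deleted, parse_pseudo_attributes) are hypothetical and not taken from any particular change tracking system, and the regular expression assumes values are always double-quoted, as in the example PI above.

```python
import re
from xml.sax.saxutils import escape, unescape

# saxutils handles &, < and > itself; quotes need an extra mapping.
_TO_ENTITY = {'"': "&quot;"}
_FROM_ENTITY = {"&quot;": '"', "&apos;": "'"}

def escape_deleted(markup: str) -> str:
    """Escape deleted markup so it can sit inside a double-quoted pseudo-attribute."""
    return escape(markup, _TO_ENTITY)

def unescape_deleted(value: str) -> str:
    """Reverse escape_deleted, recovering the original markup text."""
    return unescape(value, _FROM_ENTITY)

def parse_pseudo_attributes(pi_data: str) -> dict:
    """Split PI data of the form name="value" name="value" into a dict.

    Assumption: every value is double-quoted and contains no raw '"'.
    """
    return dict(re.findall(r'([\w:.-]+)\s*=\s*"([^"]*)"', pi_data))

# The data part of the example <?change ... ?> PI.
pi_data = ('user="nigel" time="2014-08-27 15:12:00" '
           'delete="&lt;p xml:lang=&quot;en&quot;&gt;Hello World&lt;/p&gt;" ')
attrs = parse_pseudo_attributes(pi_data)
restored = unescape_deleted(attrs["delete"])
```

Here restored holds the deleted paragraph as a string again, which would still need to be parsed as XML before it could be reinserted into the document.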

If we don't use attributes we could perhaps do this:

<?change
  <delete>
    <dc:creator>nigel</dc:creator>
    <dc:time>2014-08-27 15:12:00</dc:time>
    <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
  </delete>
?>
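
As a rough sketch of how such an 'element based PI' could be processed, the Python below parses the PI content as an XML fragment. It makes two assumptions that are not part of the proposal above: the dc: prefix is bound to the Dublin Core elements namespace, and, because the PI content carries no namespace declarations of its own, the content is wrapped in a hypothetical wrapper element that supplies that binding.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the dc: prefix; the PI content does not declare it.
DC_NS = "http://purl.org/dc/elements/1.1/"

def parse_element_pi(pi_data: str) -> ET.Element:
    """Parse element-style PI content and return its single top-level element."""
    # Hypothetical wrapper element, used only to bind the dc: prefix.
    wrapped = f'<wrapper xmlns:dc="{DC_NS}">{pi_data}</wrapper>'
    return ET.fromstring(wrapped)[0]

# The data part of the example PI (everything after the 'change' target).
pi_data = """
  <delete>
    <dc:creator>nigel</dc:creator>
    <dc:time>2014-08-27 15:12:00</dc:time>
    <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
  </delete>
"""

delete = parse_element_pi(pi_data)
creator = delete.findtext(f"{{{DC_NS}}}creator")
when = delete.findtext(f"{{{DC_NS}}}time")
deleted = delete.find("deletedContent")[0]   # the original <p> element
restored = ET.tostring(deleted, encoding="unicode")
```

No unescaping step is needed in this form, since the deleted content comes back as an element tree directly. One constraint remains whichever representation is used: PI content cannot contain the string ?>, so deleted content that itself includes a PI would still need some special handling.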

The pseudo-attribute based representation is more compact for small cases certainly, but if there's a large amount of deleted content then the size needed for escaping grows.
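
To put a number on the escaping cost, a small Python sketch can count the extra characters added when markup is escaped for a double-quoted pseudo-attribute. The function name escaping_overhead is hypothetical, and the count covers only the four characters escaped here (&, <, > and ").

```python
from xml.sax.saxutils import escape

def escaping_overhead(markup: str) -> int:
    """Return how many extra characters escaping adds to the markup."""
    escaped = escape(markup, {'"': "&quot;"})
    return len(escaped) - len(markup)

sample = '<p xml:lang="en">Hello World</p>'
small = escaping_overhead(sample)         # one deleted paragraph
large = escaping_overhead(sample * 100)   # a hundred of them
```

Each < or > costs three extra characters and each quote costs five, so the overhead grows in proportion to the amount of markup in the deleted content rather than to its text.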

And with the XPath3/XSLT3 parse/serialize functions coming soon (and saxon:parse()), would an 'element based PI' be easier for newcomers to read and process?  And perhaps there could be a convention of using a .xsd or .rng grammar to specify the PI content.


It's not a big issue for me - I've written the code for handling the escaping, but I've often wondered if things could be easier.

I wonder if there are any advantages to the attribute-like notation, other than that it's a convention that's always been followed.  Does anyone know the history here?

Thanks,

Nigel

 
-- 
Nigel Whitaker - DeltaXML Ltd - nigel.whitaker@deltaxml.com

Received on Wednesday, 27 August 2014 14:59:33 UTC