RE: Change tracking processing instructions and deletion from Dennis E. Hamilton on 2014-08-27 (public-change@w3.org from August 2014)

From: Dennis E. Hamilton <dennis.hamilton@acm.org>
Date: Wed, 27 Aug 2014 09:21:00 -0700
To: <public-change@w3.org>
Cc: <nigel.whitaker@deltaxml.com>
Message-ID: <005001cfc212$e4664060$ad32c120$@acm.org>

Hi Nigel,

I think one reason for the escaped attribute content (although it could just as easily be escaped element content) is that the deletion need not be well-formed, something that matters for cross-cutting deletions in schemes such as whatever it is that ODF implementations actually do.  

Furthermore, XML forbids “<” in attribute values and forbids “?>” inside PIs.

So you can’t avoid some escaping scheme in PIs, and if attribute-like components are to follow the XML rules for attributes, I think you end up having to do &amp;lt; and &amp;gt; to keep things straight.

Although ODF does not use Processing Instructions, its <text:deletion> element is along the lines of your <delete> example (although the provenance information is separated from the deleted material differently and well-formedness of cross-cutting extractions is achieved by adding start and end tags as necessary).

- Dennis

PS: Some list-management systems turn plaintext into HTML without properly escaping literal appearances of angle brackets and ampersands.  I am hopeful this list does better than that, considering where we are.

 - - - - Original Message - - - -
From: Nigel Whitaker [mailto:nigel.whitaker@deltaxml.com] 
Sent: Wednesday, August 27, 2014 07:59
To: public-change@w3.org
Subject: Change tracking processing instructions and deletion

Hello everyone,

There's an aspect of the existing change tracking PIs that are used in a number of systems that I've often wondered about:

The PIs that are used follow the convention of using an attribute-like syntax.  It's a convention that's been adopted for standard PIs such as xml-stylesheet and xml-model.
While it's a convention, the XML spec itself doesn't say a lot about what you can/can't do in a PI.

When content including elements and attributes is deleted in change tracking systems, the content is typically escaped so that it is a legal attribute value.

Suppose I was to delete this paragraph:  <p xml:lang="en">Hello World</p>

We may see something like this (I'm generalising from what I've seen in a number of systems):

<?change user="nigel" time="2014-08-27 15:12:00" delete="&lt;p xml:lang=&quot;en&quot;&gt;Hello World&lt;/p&gt;" ?>


The angle brackets and quotes have been 'escaped' to make it a legal attribute value.  I've got code to deal with this process, but I do wonder if it's necessary and if things could be simplified?
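
For illustration only, here is a minimal Python sketch of the escaping round trip described above. It is not the code referred to in this message; the function names make_delete_pi and read_pseudo_attrs are hypothetical, and the sketch assumes pseudo-attributes are always double-quoted.

```python
import re
from xml.sax.saxutils import escape, unescape

# Extra mappings so double quotes survive inside a double-quoted pseudo-attribute.
_QUOT = {'"': "&quot;"}
_UNQUOT = {"&quot;": '"'}


def make_delete_pi(user, time, deleted_markup):
    """Build a change PI whose pseudo-attributes follow XML attribute escaping.

    Hypothetical helper: the PI target and attribute names mirror the
    example in this message, not any particular product.
    """
    parts = [("user", user), ("time", time), ("delete", deleted_markup)]
    body = " ".join('%s="%s"' % (name, escape(value, _QUOT)) for name, value in parts)
    return "<?change %s ?>" % body


def read_pseudo_attrs(pi_data):
    """Recover the pseudo-attributes from the PI data and undo the escaping."""
    pairs = re.findall(r'([\w:.-]+)="([^"]*)"', pi_data)
    return {name: unescape(value, _UNQUOT) for name, value in pairs}


deleted = '<p xml:lang="en">Hello World</p>'
pi = make_delete_pi("nigel", "2014-08-27 15:12:00", deleted)
attrs = read_pseudo_attrs(pi[len("<?change "):-len("?>")])
```

With the paragraph above as input, make_delete_pi produces the same PI as the example, and read_pseudo_attrs returns the original markup under the "delete" key.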

If we don't use attributes we could perhaps do this:

<?change
  <delete>
    <dc:creator>nigel</dc:creator>
    <dc:time>2014-08-27 15:12:00</dc:time>
    <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
  </delete>
?>
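
As a rough check of this form, the following Python sketch feeds such a PI to a standard parser. The <doc> wrapper element and the helper names change_pis and is_well_formed are assumptions made for the example. The parser hands the PI content back as plain text, so the markup inside it is not parsed or namespace-checked; the one sequence that cannot appear is "?>", which would matter if the deleted content itself contained a PI.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical host document: the <doc> wrapper is only there so the PI
# has somewhere to live.
DOC = """<doc><?change
  <delete>
    <dc:creator>nigel</dc:creator>
    <dc:time>2014-08-27 15:12:00</dc:time>
    <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
  </delete>
?></doc>"""


def change_pis(xml_text):
    """Return the data of every <?change ...?> PI directly under the root."""
    root = minidom.parseString(xml_text).documentElement
    return [
        node.data
        for node in root.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "change"
    ]


def is_well_formed(xml_text):
    """True if the parser accepts the text as a well-formed document."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


pi_texts = change_pis(DOC)

# Deleted content that itself contains a PI closes the outer PI early.
nested = "<doc><?change <deletedContent><?x y?></deletedContent> ?></doc>"
nested_ok = is_well_formed(nested)
```

The PI data comes back as an ordinary string, so recovering the deleted elements would still need a second parse of that string, with the dc prefix declared somewhere for a namespace-aware parser.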

[ ... ]

Received on Wednesday, 27 August 2014 16:32:42 UTC