Integrated versus Injected Change-Tracking (was RE: Change tracking processing instructions and deletion) from Dennis E. Hamilton on 2014-09-20 (public-change@w3.org from September 2014)

From: Dennis E. Hamilton <dennis.hamilton@acm.org>
Date: Sat, 20 Sep 2014 11:49:08 -0700
To: <public-change@w3.org>
Message-ID: <006401cfd503$8fd2f8a0$af78e9e0$@acm.org>
There is something that has been nagging at me about the use of XML Processing Instructions to inject change-tracking into the XML employed in expression of documents where the XML documents and application do not provide for change-tracking.
 
I can’t rule out its usage, and it is clearly successful, but I wonder about the scope in which it is applicable.
 
For example, there are cases where it is not possible to produce application-valid change-tracking without awareness of the schema and the additional document model/structure that is imposed on the way XML documents are used as a carrier of a higher-level document-file format.
 
Let’s call this schema awareness, for simplicity.  Now, in deriving the injected change-tracking by comparison of two files, this is not a problem because each of the two files is presumably a valid document.  However, the presentation of the tracked changes in human-readable form, and the constraints that may apply on acceptance and rejection of changes, do require schema awareness in this sense.
 
Furthermore, although XML says nothing about processing instructions beyond forbidding “?>” and indicating what is needed for a prelude to the rest of the PI, there will eventually have to be a schema for whatever is in the PIs.  There may well be a need for coupling among them.  (A change could consume a PI and there may be needs for couplings that are neither hierarchical nor tied to the XML hierarchy.)
 
I suppose the point is that there are many cases where injection is not an orthogonal act.  In that case, there needs to be some sort of indication of the scope of application of injected changes that makes it clear that some integrative profile applies.  To avoid arriving at the previously-unsolved problem, there needs to be a lot of engineering attention around failure modes where injected material is mangled or removed, and where changes are made post-change-tracking that are not themselves change-tracked.
 
I suppose there is at least one use case in here.  I have not teased it out.
 
It would be good to know whether orthogonality and injectability of change-tracking is an assumption or not, though.
 
-   Dennis
 
From: Robin LaFontaine [mailto:robin.lafontaine@deltaxml.com] 
Sent: Thursday, September 4, 2014 08:45
To: public-change@w3.org
Subject: Re: Change tracking processing instructions and deletion
 
Hi Claudius,

I think one issue with XQuery Update for change tracking is that although it looks good for one transaction, it does not work well for multiple transactions because the XPaths will change as the document is changed. In theory you could undo the changes in order (though only if every change had been tracked), but it would be almost impossible to undo in any other order with predictable results.
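The drift described above can be seen in a few lines. This is only a minimal sketch, assuming Python's xml.etree.ElementTree as a stand-in for an XQuery Update engine; the sample document and the helper delete_first_p are hypothetical, and "./p[1]" stands in for the "//p[1]" used elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from any real system.
doc = ET.fromstring("<body><p>first</p><p>second</p><p>third</p></body>")

def delete_first_p(root):
    """Rough equivalent of 'delete node //p[1]'; returns the removed text."""
    target = root.find("./p[1]")
    root.remove(target)
    return target.text

# The same recorded expression is applied in two successive transactions.
removed_a = delete_first_p(doc)
removed_b = delete_first_p(doc)
remaining = [p.text for p in doc.findall("./p")]

print(removed_a, removed_b, remaining)
```

Both transactions record the identical path, yet they removed different paragraphs, so the path alone does not say where to restore either one if the undo order differs from the order of application.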

Consider also a document with a series of XQuery Update changes. It would be quite hard to work out how to display these changes, but much easier if the changes are embedded in PIs within the document (or indeed in markup).

There is a design decision about where to keep the changes:
1. Within the document
2. Separate, e.g. in a series of transactions in XQuery Update

Most current systems I believe use choice 1, either as markup (e.g. Word, ODF, Arbortext) or PIs (e.g. oXygen, XMetaL, Xopus). I think there are good reasons for this choice.

Robin

-- 
Robin La Fontaine
Director
DeltaXML Ltd "Experts in information change"
 
T: +44 1684 592 144 
E: robin.lafontaine@deltaxml.com
W: http://www.deltaxml.com
Malvern Hills Science Park, Malvern, Worcs, WR14 3SZ, UK
Registered in England 02528681 Reg. Office: Monsell House, WR8 0QN, UK
 
On 28/08/2014 17:49, Claudius Teodorescu wrote:
Hi,


It is very nice to see PIs used to track changes. Last year, when I mentioned PIs on this list as a method for tracking changes, I had no clue they were already in use.
As for a syntax for effectively tracking changes, why not use XQuery Update?
Your example would become: 

<?change user="nigel" time="2014-08-27 15:12:00" change="delete node //p[1]" ?>
for example.
Another example:

<?change user="nigel" time="2014-08-27 15:12:00" change='insert node <p xml:lang="en">Hello World</p> into //body' ?>
Thus, it is plain which p element was deleted (as the respective XPath expression identifies it clearly for the respective version of the document).
 
The only overhead is that the deleted content has to be stored, too, in order to revert the changes, etc.
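What "storing the deleted content" might involve can be sketched as follows. This is an illustration under stated assumptions, not part of any XQuery Update implementation: it uses Python's xml.etree.ElementTree, and the helpers apply_delete and revert_delete and the shape of the change record are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_delete(root, path):
    """Delete the first node matching path and return a record for undoing it."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_map = {child: parent for parent in root.iter() for child in parent}
    target = root.find(path)
    parent = parent_map[target]
    index = list(parent).index(target)
    parent.remove(target)
    # The path alone is not enough to revert: keep the content and position too.
    return {
        "path": path,
        "parent": parent,
        "index": index,
        "content": ET.tostring(target, encoding="unicode"),
    }

def revert_delete(record):
    """Re-insert the stored content at its recorded position."""
    restored = ET.fromstring(record["content"])
    record["parent"].insert(record["index"], restored)

doc = ET.fromstring("<body><p>first</p><p>second</p></body>")
before = ET.tostring(doc, encoding="unicode")
record = apply_delete(doc, "./p[1]")
after_delete = [p.text for p in doc]
revert_delete(record)
after_revert = ET.tostring(doc, encoding="unicode")
```

The record holds a live reference to the parent element, which a serialized change log could not do; a real system would need some stable way of identifying that position.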


Claudius
 
On Wed, Aug 27, 2014 at 5:59 PM, Nigel Whitaker <nigel.whitaker@deltaxml.com> wrote:
Hello everyone,
 
There's an aspect of the existing change tracking PIs that are used in a number of systems that I've often wondered about:
 
The PIs that are used follow the convention of using an attribute-like syntax.  It's a convention that's been adopted for standard PIs such as xml-stylesheet and xml-model.
While it's a convention, the XML spec itself doesn't say a lot about what you can/can't do in a PI.
 
When content including elements and attributes is deleted in change tracking systems, the content is typically escaped so that it's a legal attribute value.
 
Suppose I was to delete this paragraph:  <p xml:lang="en">Hello World</p>
 
We may see something like this (I'm generalising from what I've seen in a number of systems):
 
<?change user="nigel" time="2014-08-27 15:12:00" delete="&lt;p xml:lang=&quot;en&quot;&gt;Hello World&lt;/p&gt;" ?>
 
 
The angle brackets and quotes have been 'escaped' to make it a legal attribute.  I've got code to deal with this process, but I do wonder if it's necessary and if things could be simplified?
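For readers who have not written such code, the escaping round trip looks roughly like this. It is a sketch only, assuming Python's standard xml.sax.saxutils; the helpers make_delete_pi and read_pseudo_attrs are hypothetical and are not the code referred to above, and the regular expression handles only double-quoted pseudo-attributes.

```python
import re
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; double quotes need an extra mapping.
QUOTE = {'"': "&quot;"}
UNQUOTE = {"&quot;": '"'}

def make_delete_pi(user, time, deleted_xml):
    """Build a change PI carrying the deleted markup as an escaped value."""
    return '<?change user="{}" time="{}" delete="{}" ?>'.format(
        escape(user, QUOTE), escape(time, QUOTE), escape(deleted_xml, QUOTE)
    )

def read_pseudo_attrs(pi_text):
    """Split a change PI into pseudo-attributes and undo the escaping."""
    body = pi_text[len("<?change"):-len("?>")]
    pairs = re.findall(r'([\w:-]+)="([^"]*)"', body)
    return {name: unescape(value, UNQUOTE) for name, value in pairs}

pi = make_delete_pi(
    "nigel", "2014-08-27 15:12:00", '<p xml:lang="en">Hello World</p>'
)
attrs = read_pseudo_attrs(pi)
print(pi)
print(attrs["delete"])
```

Every markup character in the deleted content is replaced by an entity reference several characters long, which is the growth in size that becomes noticeable for large deletions.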
 
If we don't use attributes we could perhaps do this:
 
<?change
  <delete>
    <dc:creator>nigel</dc:creator>
    <dc:time>2014-08-27 15:12:00</dc:time>
    <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
  </delete>
?>
 
The pseudo-attribute based representation is more compact for small cases certainly, but if there's a large amount of deleted content then the size needed for escaping grows.
 
And with the XPath 3/XSLT 3 parse/serialize functions coming soon (and saxon:parse()) would an 'element based PI' be easier for newcomers to read and process?  And perhaps there could be a convention of using a .xsd or .rng grammar to specify the PI content.
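Processing an 'element based PI' could then amount to handing the PI content to an ordinary XML parser. The following is a rough sketch, assuming Python's xml.etree.ElementTree in place of the XPath/XSLT parse functions; the function parse_change is hypothetical, and the xmlns:dc declaration is an addition, on the assumption that PI content cannot inherit namespace bindings from the surrounding document.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the dc: prefix (the Dublin Core elements namespace).
DC = "http://purl.org/dc/elements/1.1/"

# The PI content only, i.e. what sits between '<?change' and '?>'.
pi_data = """
<delete xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>nigel</dc:creator>
  <dc:time>2014-08-27 15:12:00</dc:time>
  <deletedContent><p xml:lang="en">Hello World</p></deletedContent>
</delete>
"""

def parse_change(data):
    """Parse element-based PI content into a small change record."""
    # XML forbids '?>' inside a PI, so such content could never be stored as-is.
    if "?>" in data:
        raise ValueError("PI content must not contain '?>'")
    delete = ET.fromstring(data.strip())
    ns = {"dc": DC}
    content = delete.find("deletedContent")
    return {
        "user": delete.findtext("dc:creator", namespaces=ns),
        "time": delete.findtext("dc:time", namespaces=ns),
        "deleted": [ET.tostring(child, encoding="unicode") for child in content],
    }

change = parse_change(pi_data)
print(change["user"], change["time"])
print(change["deleted"])
```

No unescaping step is needed here, but two constraints remain: the deleted content must not itself contain "?>" (so it cannot hold another PI verbatim), and it must carry its own namespace declarations.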
 
 
It's not a big issue for me - I've written the code for handling the escaping, but I've often wondered if things could be easier.
 
I wonder if there are any advantages to the attribute-like notation, other than that it's a convention that's always been followed.  Does anyone know the history here?
 
Thanks,
 
Nigel
 
 
-- 
Nigel Whitaker - DeltaXML Ltd - nigel.whitaker@deltaxml.com
 



-- 
http://kuberam.ro
http://kuberam.ro/art
Received on Saturday, 20 September 2014 18:49:31 UTC