Re: Representation of strings and characters in XML version of ixml

My off-the-cuff reaction is to think that when the ixml spec says
the output should be XML, the escaping of < and & is covered
implicitly.  (It’s easy for me to be cavalier about it, since my underlying
XQuery or XSLT engine will do all the dirty work.  It’s very slightly
trickier for processors written in other languages.)
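
For a processor with no XML serializer underneath, the escaping can
usually be delegated to a standard library.  A minimal sketch in
Python, assuming literals end up in an attribute: the helper names
and the <literal string="..."/> shape are for illustration only, not
anything the spec prescribes.  It uses quoteattr() and escape() from
xml.sax.saxutils and checks the result by parsing it back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def literal_element(value: str) -> str:
    # Hypothetical helper: element and attribute names are illustrative only.
    # quoteattr() escapes &, < and > and picks a safe quote delimiter,
    # falling back to &quot; when the value contains both quote characters.
    return f"<literal string={quoteattr(value)}/>"


def round_trip(value: str) -> str:
    # Parse the serialized element and read the attribute value back.
    return ET.fromstring(literal_element(value)).get("string")


if __name__ == "__main__":
    for sample in ["<=", ">=", "&", "'", '"', "it's \"quoted\""]:
        print(literal_element(sample), "->", round_trip(sample) == sample)
    # In text content only & and < strictly need escaping;
    # escape() also rewrites > as &gt;, which is harmless.
    print(escape("a <= b & c"))
```

The round trip is the point: whatever delimiter or entity the
serializer picks, an XML parser hands back the original characters.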

I hope everyone had a wonderful Christmas Day!  I certainly did: I 
spent part of the day building rudimentary support for alternate 
versions of code into my literate programming system, and have an
idea for improving performance.  (I’d really like to be able to run
Steven’s test suite in less than five hours.)

Michael


> On 25,Dec2021, at 10:32 AM, John Lumley <john@saxonica.com> wrote:
> 
> There is still a problem with some characters that need escaping in the XML representation, which I encountered in the grammar for XPath: e.g. the production for a value comparison uses ‘<=’, which needs to be serialised as ‘&lt;=’, and there are similar cases with apostrophe and double quote. I’m not sure how that should be handled…
> 
> Forgive me, but I am a thousand miles away from my code-full laptop, so I cannot get to the editable examples…
> 
> John Lumley 
> 
>> On 25 Dec 2021, at 16:38, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> 
>> On 24,Dec2021, at 11:29 PM, Liam R. E. Quin <liam@fromoldbooks.org> wrote:
>>> 
>>>> On Fri, 2021-12-24 at 11:53 -0700, C. M. Sperberg-McQueen wrote:
>>>> 
>>>> That is, I like knowing, from whether something is a dstring
>>>> or an sstring, … hmm.  I don’t know, from that, what I was
>>>> about to say I knew. 
>>> 
>>> I've not been very active here (sorry) so maybe this is off kilter a
>>> little - some languages/formats distinguish, though - e.g. in the Unix
>>> shells, "$$" is the process ID of the current shell, and '$$' is two
>>> dollar signs: interpolation happens in "..." and not in '...' so they
>>> have to be treated differently.
>>> 
>>> But maybe that's not what's going on here?
>>> 
>> 
>> Good point, but you are right that what is going on here is 
>> different.  In ixml double and single quotes are used interchangeably;
>> the only difference is that within single quotes, double quotes
>> do not need to be escaped while single quotes do, and vice
>> versa.
>> 
>> The only tasks affected by the decision about whether the
>> XML representation should preserve or lose the distinction between 
>> single and double quotes in the ixml representation are, as far as
>> I know, re-serializing the grammar from XML to ixml, and serializing
>> in ixml a grammar first composed in XML.
>> 
>> If the distinctions are preserved, then when we re-serialize a
>> grammar, the quotation marks will be unchanged, and when we
>> serialize a born-XML grammar the grammar author can control
>> whether the ixml grammar uses single or double quotes.
>> 
>> However, since we have agreed to suppress whitespace outside
>> of literal strings and to lose the distinctions between ‘:’ and ‘=’
>> and between ‘;’ and ‘|’, re-serializing is not going to reproduce the
>> original character stream in all cases.
>> 
>> The more I think about it, the more I think that preserving
>> the distinction between dstring and sstring is just a relic of the
>> time when the design wanted to preserve the accidentals of the
>> ixml grammar, and is at best misleading.  So I now lean towards
>> “let us mark them both as @string”.
>> 
>> Hex notation, on the other hand, I continue to regard as 
>> something I’d like to preserve.
>> 
>> Michael
>> 
>> 
> 

Received on Sunday, 26 December 2021 01:21:04 UTC