- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 27 Dec 2021 09:27:44 -0700
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Liam R. E. Quin" <liam@fromoldbooks.org>, ixml <public-ixml@w3.org>
> On 27,Dec2021, at 6:40 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
>> The more I think about it, the more I think that preserving
>> the distinction between dstring and sstring is just a relic of the
>> time when the design wanted to preserve the accidentals of the
>> ixml grammar, and is at best misleading. So I now lean towards
>> “let us mark them both as @string”.
> This discussion has persuaded me of that too.
>
>> Hex notation, on the other hand, I continue to regard as
>> something I’d like to preserve.
>
> I agree, but now you need to explain to me why that doesn't count for @from and @to.

Because in @from and @to, the string ‘#a’ is unambiguously a reference to the character U+000A and the strings ‘#’ and ‘a’ are unambiguously references to characters U+0023 and U+0061, respectively.

If we reduced sstring, dstring, and hex encoded strings all to the same attribute, we would need to find a way to determine whether the string ‘#a’ denoted

- the character sequence U+0023, U+0061, or
- the character sequence U+000A

We could introduce some escaping mechanism for ‘#’, I suppose. By analogy with the escaping mechanisms for single and double quotes we might say that a single hash mark introduces a hex sequence denoting a single character, and ‘##’ denotes a literal hash mark. And then we need a way to signal the end of the hex sequence. Maybe another hash mark? Is the end-of-sequence marker obligatory or can it be omitted if the next character is not a legal hex character? I.e. can we write “yes,#asir!” or must we write “yes,#a#sir!”?

You will note that without even perceiving it myself I have shifted from thinking of hex-encoded strings as an alternative to literal strings, as they are now, to hex-encoding as something we can embed in a larger string. And suddenly I find myself at the bottom of a slope that turned out to be a little slippery at the top. Not too bad a slope, but still …

If we serialize both dstring and sstring as @string, and hex as @hex, the spec becomes simpler. If we serialize all three as @string, the spec becomes slightly more complex because of the rules about how to tell when hex escaping is being used.

When I started this mail I was preparing to say that if we did the same for literals as we do for from and to (allow hex encoding or conventional encoding), that would be OK, too. But I have persuaded myself that unless there is a simpler way to do it than I have come up with so far, it would not be an improvement, because the simplification is only apparent, not real.

The reason from and to don’t need distinct attributes for their conventional form and their hex-encoded form is that the length of the value reliably distinguishes them. The reason literals do need distinct attributes is that the distinct attributes are a simple way to carry the distinction, and appear to be simpler than any alternative.

I think I’ve persuaded myself; did I also persuade you?

Michael
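
For readers who want the length argument spelled out, here is a minimal Python sketch. It is not taken from the ixml spec or from any implementation. The function names `decode_range_bound` and `readings_of_merged_string` are hypothetical, and the sketch assumes a hex value is written as ‘#’ followed by hex digits, as in the ‘#a’ example in the message.

```python
import string


def _is_hex(digits: str) -> bool:
    """True if digits is a non-empty run of hexadecimal digits."""
    return bool(digits) and all(c in string.hexdigits for c in digits)


def decode_range_bound(value: str) -> str:
    """Decode a @from or @to value into the one character it denotes.

    Assumption: a value of length 1 is the character itself (even '#');
    any longer value must be '#' followed by hex digits.
    """
    if len(value) == 1:
        return value
    if value.startswith("#") and _is_hex(value[1:]):
        return chr(int(value[1:], 16))
    raise ValueError(f"not a single character or hex reference: {value!r}")


def readings_of_merged_string(value: str) -> list[str]:
    """List what a value could mean if literals and hex shared one attribute.

    With no escaping rule, a value such as '#a' has two readings:
    the two-character literal, and the single character U+000A.
    """
    readings = [value]  # the literal reading is always available
    if value.startswith("#") and _is_hex(value[1:]):
        readings.append(chr(int(value[1:], 16)))
    return readings
```

Under these assumptions `decode_range_bound` never needs to guess, because a range bound is always a single character, while `readings_of_merged_string("#a")` returns two candidates, which is the ambiguity that a separate @hex attribute avoids.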
Received on Monday, 27 December 2021 16:28:04 UTC