Re: DOM L3 Core spec.: textContent specification ambiguity from Daniel Barclay on 2010-06-21 (www-dom@w3.org from April to June 2010)

From: Daniel Barclay <daniel@fgm.com>
Date: Mon, 21 Jun 2010 11:42:55 -0400
To: Robin Berjon <robin@berjon.com>
CC: <www-dom@w3.org>
Message-ID: <4C1F887F.5080603@fgm.com>

Robin Berjon wrote:
> Hi Daniel,
> 
> On Jun 8, 2010, at 19:43 , Daniel Barclay wrote:
>> The wording in the definition of the textContent attribute of the
>> Node interface seems to be ambiguous (or at least misleading).
>>
>> The text says:
>>
>>  "On getting, no serialization is performed, the returned string
>>   does not contain any markup."
>>
>> The intent of the latter part of that sentence is to say that the
>> string does not contain any added markup to represent any child
>> elements, etc.
>>
>> However, that wording sounds like it's saying that the string cannot
>> contain any text that looks like markup.
> 
> I find the sentence to be rather clear in fact. 

Note that how you find the sentence isn't necessarily the issue.
If it's ambiguous, some are going to find the sentence to be saying
one thing, and some are going to find it to be saying something else.
Such ambiguity is inappropriate for a technical specification.

For this particular "contains no markup" phrase, note how that might
be used in, say, a description of a database field or web service
parameter for a web application to try to specify that it is
restricted to values that can be inserted into an HTML page without
having to encode the value.  Yes, it might not be good practice to
skip encoding in such cases because you might then forget to encode
in other cases, but that existing wording usage certainly could
influence how readers interpret that same phrase in the DOM
specification.
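
To illustrate that web-application sense of "contains no markup," here
is a minimal sketch in Python.  The helper name needs_encoding is
hypothetical; it only assumes the standard library's html.escape and
reports whether a value would change when encoded for insertion into
HTML text:

```python
import html


def needs_encoding(value):
    """Hypothetical check: True if the value would have to be escaped
    before being inserted into HTML text content."""
    return html.escape(value, quote=False) != value


# A value that "contains no markup" in the web-application sense.
plain_needs = needs_encoding("plain words")

# A perfectly legal textContent value that does not meet that restriction.
markup_needs = needs_encoding("<e/>")
encoded = html.escape("<e/>", quote=False)

print(plain_needs, markup_needs, encoded)
```

A reader carrying that sense of the phrase over to the DOM specification
could conclude that a value like the second one can never be returned.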


> It says that the
> returned string contains no markup, which to me sounds like it's
> saying that it contains no markup;

Huh?  Saying that "contains no markup" sounds like "contains no markup"
isn't much of an argument; it certainly doesn't address the issue.
The ambiguity is in the phrase "contains no markup."

It means to say that it contains no markup at the relevant level of
interpretation, but doesn't limit itself to sounding like only that.


> if it said that the returned
> string doesn't contain anything that could be mistakenly interpreted
> as containing markup, then it'd probably sound like it's saying that
> the string cannot contain any text that might perhaps look like markup.
> But it doesn't :)

If A implies B and A is false, trying to argue that therefore B is
false is an invalid argument.  (Other things can imply B.)


>> If the difference isn't clear, consider getting the text content of
>> the root element of this document:
>>
>>  <root><sub>&lt;e/&gt;</sub></root>
>>
>> The textContent attribute string would be "<e/>", right?
> 
> Which is fine: it's not markup. It's just text. 

Not quite.  Yes, it is true that it is text and not markup _at_the_
_intended_ level of interpretation.  However, it is markup at a
different level of interpretation.  And yes, that other level of
interpretation (taking text content from one level and re-interpreting
it, that is, parsing it again) usually is irrelevant to XML/HTML
specifications.

However, wording that sounds like it covers that other level makes
the level relevant at least to the degree of avoiding mistaken
implications/inferences about it.
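
The two levels can be sketched in Python, assuming xml.dom.minidom as a
stand-in for a DOM implementation.  The helper text_content is
hypothetical (minidom has no textContent attribute) and only
approximates the DOM L3 behavior for elements by concatenating the data
of descendant text nodes:

```python
from xml.dom import minidom


def text_content(node):
    """Hypothetical approximation of DOM L3 textContent for an element:
    concatenate descendant text/CDATA data without adding any markup."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_content(child))
    return "".join(parts)


# The example document from the thread, written with a closing </root>
# tag so that it parses.
doc = minidom.parseString("<root><sub>&lt;e/&gt;</sub></root>")
value = text_content(doc.documentElement)

# Intended level: the original document has no element named e.
e_elements_in_original = len(doc.getElementsByTagName("e"))

# Other level: the same string, parsed again, is an empty element e.
reparsed = minidom.parseString(value)
reparsed_name = reparsed.documentElement.tagName

print(value, e_elements_in_original, reparsed_name)
```

The string is character data at the first level and markup at the
second; "contains no markup" is true only if it is read as referring to
the first.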


Ah, maybe here's part of why we're arguing.  You write:

   it's not markup. It's just text.

But markup _is_ text.  (In SGML/HTML/XML, markup is not binary codes
marking beginnings and endings of ranges of represented text; it is
_text_ marking up beginnings and endings of ranges of represented
text.)

You (and the above wording in the spec) probably need to distinguish
more clearly between text and represented text, or text and marked-up
text, or something like that.


> You can then go
> el.textContent = "<e/>" and it'll roundtrip because it's not markup.
> 
>> That string _does_ contain markup
> 
> No, it doesn't. That's like saying that the following XML document 
> isn't well-formed because the "b" element isn't closed:
> 
>   <a><![CDATA[<b>]]></a>

Yes, it's true that one can't say that in that XML document, the b
element isn't closed, or the b start tag isn't balanced with an end
tag, because there is no b start tag in _that_ XML element.

However, if someone refers to the b tag, you can't say that there
is no b tag at all.

Yes, the b tag is only in the XML document that results from taking
the text represented by the a element in the given XML document (and
interpreting it as an XML document), and, yes, usually that is
completely irrelevant.

However, if one says there's no b tag, something has to limit that
to saying there's no b tag in the _given_ XML document, or one is
saying that there is no b tag at all, which isn't true.
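
The same point can be checked mechanically.  This is only a sketch,
again assuming Python's xml.dom.minidom as the parser; the variable
names are illustrative:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# The given document is well-formed and has no b element in it.
doc = minidom.parseString("<a><![CDATA[<b>]]></a>")
b_count = len(doc.getElementsByTagName("b"))

# The text represented by the a element (its CDATA/text children).
inner = "".join(child.data for child in doc.documentElement.childNodes)

# Interpreted as an XML document in its own right, that text does
# contain a b start tag, and that tag is not balanced by an end tag.
try:
    minidom.parseString(inner)
    inner_well_formed = True
except ExpatError:
    inner_well_formed = False

print(b_count, inner, inner_well_formed)
```

So "there is no b tag" holds for the given document and fails for the
document obtained by re-interpreting its text.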


Why not have the spec say what it means but not sound like it means
more than it intends to mean?

Daniel
Received on Monday, 21 June 2010 15:43:25 UTC