Re: Made rdf:HTML/XMLLiteral non-normative

On 17 Dec 2013, at 14:22, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:
> On Monday, December 16, 2013 7:56 PM, Richard Cyganiak wrote:
> [...]
>> 
>> In my eyes, the correct thing to do is this:
>> 
>> 1. Make the datatypes normative again
>> 2. Define the value space and L2V mapping as "implementation-defined"
>> 3. Add informative material that describes the DOM4-based
>> implementations of these two concepts, and state that they are
>> informative simply because the DOM4 spec is still subject to change at
>> time of writing.
>> 
>> The result is that an implementation technically conforms to the spec
>> regardless of how it implements the value space of these datatypes.
> 
> And what would we gain by that?
> 
> IMHO normative statements, especially MUSTs, have to be used sparingly and
> only to specify things that are necessary for interoperability.

That is, how shall I put it, a strange view.

Back to basics. What is the purpose of a specification? It exists to promote interoperability. It achieves this by defining what an implementation has to do to be considered conforming.

It follows that anything that isn’t necessary for interoperability (i.e., informative content) doesn’t *really* belong in a specification. Informative content cannot, by definition, affect conformance. The purpose of informative content is to help a reader through the trickier bits of a specification by providing rationale, context, advice, and so on.

Furthermore, the only thing that creates interoperability is MUST. Every SHOULD and MAY decreases interoperability, because they allow conforming implementations that are not actually fully interoperable with each other. Too many SHOULDs and MAYs, and your spec will fail to create interoperability because every feature is optional.

> Given that we can't improve interoperability at this point in time, what is
> it that we are trying to achieve?

The current wording has a conformance statement that depends on a “definition” that is in an informative section. That is logically ill-formed. It's like a null pointer exception, like dividing by zero. This is what needs fixing.

> I think all we want is an IRI to label
> literals as HTML or XML snippets so that applications won't lose that
> information. It's simply a marker.

For HTML, yes, that’s essentially it. For XML it’s a lot trickier, because the contents need to be well-formed. We want validators to check that kind of stuff, for example. Remember that in RDF 2004, XMLLiteral had to be not just well-formed but canonicalised.
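
To illustrate (a rough Python sketch; the standard library implements Canonical XML 1.0, not the exclusive-C14N-with-comments that RDF 2004 actually required, and I’m using a single-element fragment for simplicity):

    # Well-formed content that is nevertheless not in canonical form;
    # RDF 2004 would have rejected this lexical form for rdf:XMLLiteral.
    from xml.etree.ElementTree import canonicalize  # Python 3.8+

    lex = "<b title='x'>bold</b>"            # well-formed, but single-quoted attribute
    canonical = canonicalize(xml_data=lex)   # '<b title="x">bold</b>'

    print(lex == canonical)                                # False: not canonical
    print(canonicalize(xml_data=canonical) == canonical)   # True: canonical form is stable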

> Whether that marker is normative or not
> doesn't change anything IMO, especially considering the prominence they'll
> get by just being in the "rdf" namespace.

If it’s not normative, then an implementation might use rdf:HTML and rdf:XMLLiteral to identify absolutely anything it wants, and may — correctly — claim that it conforms to RDF Concepts, including all optional parts. For example, a Turtle parser that fails with an XML parse error on “<br>”^^rdf:HTML *would be conforming*. It would *not be a bug*. It goes without saying that this would violate user expectations.
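
To make the example concrete (a quick Python sketch; html.parser is only a lenient tokenizer, not the HTML5 fragment parsing algorithm that the DOM4-based definition points at, but it shows the asymmetry):

    from html.parser import HTMLParser
    from xml.etree.ElementTree import ParseError, fromstring

    lexical_form = "<br>"

    parser = HTMLParser()
    parser.feed(lexical_form)     # no error: <br> is a perfectly good HTML fragment
    parser.close()

    try:
        fromstring(lexical_form)          # "<br>" is not well-formed XML
    except ParseError as e:
        print("XML parse error:", e)      # the surprise such a Turtle parser would produce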

By making the datatype normative, and the value space implementation-defined, an implementation that supports rdf:HTML *MUST* at least treat the lexical space correctly to claim conformance. Granted, it could define the value space and mapping as absolutely anything, but there’s no way around that at this stage. 

>> That's the best we can do. It doesn't seem like a big deal, because
>> actually implementing equivalence checking on these literals seems to
>> have little benefit anyways.
> 
> The simplest thing to do then would be to make the value space equal to the
> lexical space, i.e., a string as I've proposed in the past. Unfortunately,
> doing that at this stage would be a significant change and require us to go
> back to LC I believe.

This has been extensively discussed before.

This would make it impossible to have an implementation that conforms both to RDF 2004 and RDF 1.1, because in RDF 2004, they *had* to canonicalise, and with this design, they would have to *not* canonicalise.

Also, many XML and HTML environments throw away certain information (e.g., order of attributes) on parsing, so it’s *very difficult* to re-serialise to the exact same string. With the L2V mapping as currently in the spec, that’s fine; the re-serialised literal will be not identical but equal.
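
A small illustration (a sketch using Canonical XML from the standard library, not the spec’s DOM4-based mapping): two serialisations of the same element differ as strings but canonicalise to the same form, i.e., they are equal as values even though the lexical forms are not identical.

    from xml.etree.ElementTree import canonicalize  # Python 3.8+

    original      = '<img src="a.png" alt="x"/>'
    re_serialised = '<img alt="x" src="a.png"></img>'   # attribute order changed on round-trip

    print(original == re_serialised)              # False: not identical
    print(canonicalize(xml_data=original) ==
          canonicalize(xml_data=re_serialised))   # True: equal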

> So, to move forward, my proposal would simply be
> 
> PROPOSAL: Drop the "If the IRI ...#XMLLiteral/#HTML is recognized then it
> refers to the datatype rdf:XMLLiteral/rdf:HTML" statements in section 5.4 of
> RDF Concepts.

In that case, an implementation that accepts “<not>well-formed”^^rdf:XMLLiteral would be conforming. This behaviour would not constitute a bug. It would not be broken.
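
For comparison, the kind of lexical-space check I’d expect a conforming implementation or validator to run is tiny. A rough Python sketch with a hypothetical helper (the dummy wrapper is just a trick to allow multi-node fragments, and this only approximates the lexical space):

    from xml.etree.ElementTree import ParseError, fromstring

    def xmlliteral_lexical_form_ok(lex):
        # Hypothetical helper: accept only well-formed XML content.
        # Fragments may have several top-level nodes, hence the wrapper.
        try:
            fromstring("<dummy-root>" + lex + "</dummy-root>")
            return True
        except ParseError:
            return False

    print(xmlliteral_lexical_form_ok("<b>fine</b> plus text"))   # True
    print(xmlliteral_lexical_form_ok("<not>well-formed"))        # False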

The DOM4 problem we have is with the value space only. There’s no problem with the lexical space. There’s every reason to *require* implementations to do the right thing for the lexical space. The lexical space is more critical for interoperability anyways.

Best,
Richard



> 
> 
> I checked Semantics and saw that their only mention is in
> 
>  "Two other datatypes rdf:XMLLiteral and rdf:HTML are defined in
>  [RDF11-CONCEPTS]. D interpretations MAY fail to recognize these
>  datatypes."
> 
> Which could simply be dropped. They are also mentioned in the incomplete
> "some rdfs-valid triples". It doesn't really matter whether they stay there
> or not.
> 
> 
> --
> Markus Lanthaler
> @markuslanthaler
> 
> 

Received on Tuesday, 17 December 2013 15:18:44 UTC