Re: Request for clarifications re XMLLiteral in RDFa (was: Re: XML Literals poll) from Gregg Kellogg on 2011-11-24 (public-rdf-comments@w3.org from November 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Wed, 23 Nov 2011 19:37:35 -0500
To: Richard Cyganiak <richard@cyganiak.de>
CC: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <B0287B17-280E-4E53-B6D2-6EA07507B947@greggkellogg.net>

Hi Richard,

On Nov 23, 2011, at 4:15 PM, Richard Cyganiak wrote:

> Hi Gregg,
> 
> On 23 Nov 2011, at 23:04, Gregg Kellogg wrote:
>> In most cases, an L2V mapping of the content is not important; it only becomes useful when querying or comparing graphs. In any case, comparing based on equivalent literal content seems fragile here. I would hate to require environments to perform this transformation where it's not likely to be useful.
> 
> I note that environments that don't have to do value comparisons never need to perform L2V mapping, and hence don't need to implement it.
> 
>> As an alternative, I would consider relaxing the Exclusive C14N requirements regarding namespace promotion. This is often not done correctly, or results in extra namespaces being handled. The advice for people running the RDFa test harness is to ignore failing tests that use XMLLiterals because of these problems.
> 
> Namespace promotion isn't a feature of rdf:XMLLiterals – it's a feature of certain RDF syntaxes (RDF/XML for @parseType="literal", and RDFa). Changing this for RDFa is not something that the RDF WG can help with, I think.

Good point, it's RDFa that says the XMLLiteral should be in Exclusive Canonical representation, not the datatype. I'll bring this to the group.
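
To make the value-comparison point concrete, here is a minimal sketch, not anything either spec mandates. It assumes Python 3.8+ and uses the standard library's xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the Exclusive C14N that RDFa refers to, so it only approximates the idea. The helper name same_xml_value and the dummy <root> wrapper are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET


def same_xml_value(lexical_a: str, lexical_b: str) -> bool:
    """Compare two XML fragments by canonical form instead of raw string.

    Assumption: C14N 2.0 (what the standard library offers) stands in for
    Exclusive C14N here; namespace handling differs between the two.
    """
    # Wrap each fragment in a dummy root so that fragments with several
    # top-level nodes can still be parsed as a single document.
    canon_a = ET.canonicalize(f"<root>{lexical_a}</root>")
    canon_b = ET.canonicalize(f"<root>{lexical_b}</root>")
    return canon_a == canon_b


# Same element, different attribute order and quoting.
a = '<b class="x" id="y">hi</b>'
b = "<b id='y' class='x'>hi</b>"

print(a == b)                # plain string comparison
print(same_xml_value(a, b))  # comparison after canonicalization
```

A store that never compares literal values can skip this step entirely, which is Richard's point above.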

>>> Given that HTML and XML have very different syntactic constraints, two separate datatypes seem like a natural approach to take?
>> 
>> RDFa 1.0 will always need to deal with XML Literals. We could potentially change this for RDFa 1.1, but it would still be different depending on the host language being used: XHTML, SVG and XML would probably continue to use XML Literal, while HTML4 and HTML5 could use a hypothetical HTML Literal.
> 
> I wouldn't see the choice between XMLLiteral and HTMLLiteral as one that is made by the parser or syntax, but one made by the document author. If I know that I'm producing a well-balanced XML fragment, then I can datatype it as an XMLLiteral. If I want to ship tag soup in my document, then I can datatype it as an HTMLLiteral. (There's a bit of an issue with shipping tagsoup inside an XML-based syntax such as XHTML+RDFa or RDF/XML – I suppose it would call for CDATA sections.)

Now that I think more about it, you're right that the author could choose between the two datatypes. In RDFa 1.0, creating an XMLLiteral was an implicit operation; in 1.1 it's explicit, so this should work okay.

CDATA would probably be required for XHTML, or it could be coerced to an infoset and have normalized XHTML inserted in an application, which would probably serve the needs of an author. Realistically, I don't think we'll see too much XHTML for these types of applications, and HTML would be fine with tag soup, for all practical purposes.

>> It would be much easier, and more likely to come out right, if we stuck with a single datatype (XML Literal), but relaxed the C14N rules to achieve greater interoperability.
> 
> (Realistically, I don't see this group agreeing to shipping non-XML in an XML datatype, so at least the well-formedness constraint is very likely to stay. This would mean shipping tag soup in rdf:XMLLiteral will remain an error. We'll have to see about non-canonical XML.)

This would be an advantage for an HTML Literal datatype, where we could potentially avoid the C14N situation.
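
As a rough illustration of the well-formedness constraint discussed above, the sketch below checks whether a lexical form parses as a well-balanced XML fragment. It is only an approximation under my own assumptions: the function name is_well_formed_fragment and the dummy <root> wrapper are hypothetical, and the check says nothing about canonical form.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(lexical: str) -> bool:
    """Return True if the string parses as well-balanced XML content.

    Assumption: wrapping in a dummy <root> element is a good enough test
    for a sketch; a lexical form that closes and reopens <root> itself
    would fool it.
    """
    try:
        ET.fromstring(f"<root>{lexical}</root>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed_fragment("<b>ok</b> and some text"))  # balanced markup
print(is_well_formed_fragment("</bar !!!>"))               # the Q4 example
print(is_well_formed_fragment("<p>one<br>two"))            # typical tag soup
```

Under this check, tag soup that an HTML parser would happily accept is rejected, which is the case a separate HTML Literal datatype would cover.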

>> OTOH, adding an optional transformation to an infoset could be useful in some cases, but I would make this a MAY, or perhaps a SHOULD, but not a MUST.
>> 
>>>>> Q4. Should *invalid XML* be allowed in the lexical space?
>>>>> 
>>>>> In other words, should "</bar !!!>"^^rdf:XMLLiteral be ill-typed (just like "AAA"^^xsd:integer) or well-typed (just like "</bar !!!>"^^xsd:string)?
>>>> 
>>>> +1. If we depend on authors only using "correct" markup, we'll invalidate many common cases, even where the HTML is incorrect. 
>>> 
>>> I note that the lexical form isn't necessarily what authors write – there's always a parser in between.
>> 
>> I disagree that there's always a parser in between; if I write Turtle containing an XML Literal, this doesn't have to involve an HTML (or XML) tool chain. I commonly write my HTML by hand; of course, I try to do so correctly.
> 
> There's still a parser in between – the Turtle parser. It may not do anything special with the literal. The point is that the parser could be specified to clean up the mess before generating a literal.
> 
>> One thing XMLLiteral implies is that an RDFa processor should use the HTML/XML content to form the literal, not the innerText content. This is really what most people care about, IMO. Within a different context, say JSON-LD, it might be useful in an Ajax response to know that the result should update the text or HTML of the element; for example using jQuery $("#id").html(literal value) vs $("#id").text(literal value). None of this requires any reasoning over the literal value itself; I think that's a more common use of the datatype.
> 
> What you're saying is that rdf:XMLLiteral is being abused to indicate the presence of general HTML markup. This abuse indicates the existence of an important unmet need. The response should be a call for meeting that need, and not necessarily a call for changing rdf:XMLLiteral to legalize the abuse.

I'll bring it to the attention of the RDFWA WG.
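
For reference, the html-versus-text distinction quoted above can be sketched outside of jQuery as well. This is a minimal illustration under my own assumptions: the function name to_markup is hypothetical, and a consumer is assumed to know each literal's datatype IRI.

```python
from html import escape
from typing import Optional

# The datatype IRI for rdf:XMLLiteral.
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def to_markup(value: str, datatype: Optional[str] = None) -> str:
    """Return a string that is safe to insert into an HTML page.

    Assumption: an rdf:XMLLiteral already carries markup and is inserted
    as-is (like jQuery's .html()); anything else is treated as plain text
    and escaped (like jQuery's .text()).
    """
    if datatype == XML_LITERAL:
        return value
    return escape(value)


print(to_markup("<em>a</em> &amp; b", XML_LITERAL))  # markup kept as markup
print(to_markup("a < b & c"))                        # plain text escaped
```

Note that nothing here reasons over the literal's value; the datatype is only used as a hint about how to insert the content.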

>>>> C14N is a pain. I'd remove any requirement that in-scope namespace definitions be added to top-level elements within the nodeset too.
>>> 
>>> I note that this would likely invalidate most existing RDF/XML content.
>> 
>> As with RDFa, I presume there will always be backwards compatibility issues. Within the context of RDF/XML, C14N may continue to be required, but why require it for Turtle and/or RDFa? I can't think of a real-world use case for this in those environments.
> 
> You suggested dropping the requirement that namespace declarations be propagated into XML literals. I said that this would invalidate existing RDF/XML content. It's a non-issue for Turtle. It's an issue for RDFa, but one that the RDFa WG has to look into – the requirement isn't imposed by the datatype or by anything else in the scope of the RDF WG.

Yes, thanks for pointing this out; I realized that at one time :).

My takeaway is that the RDFWA WG might want to look more at the HTML Literal as being more appropriate for our purposes.

Thanks,

Gregg

> Best,
> Richard
Received on Thursday, 24 November 2011 00:38:21 UTC