Re: Preserving markup when distilling @property values in xhtml from Sebastian Heath on 2012-12-22 (public-rdfa-wg@w3.org from December 2012)

From: Sebastian Heath <sebastian.heath@gmail.com>
Date: Sat, 22 Dec 2012 00:38:52 -0500
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <CACsb_1ofJ9+Y_FFKME0NT7N3s_-tt9L3=s2oS5dBxLxi9azg3w@mail.gmail.com>
Manu,

 Thanks for the reply. It makes sense in terms of "I understand that I
can take these steps," but more fundamentally I think the decision to
discard the markup was wrong.

 As an aside, I definitely don't want to default to RDFa 1.0 behavior.
I take it that would mean giving up @prefix, @vocab, etc.

 And having to put @datatype on all my elements that have @property is
bad. I'll try not to just repeat the principle that the default
behavior of an RDFa distiller should not be to toss out intentionally
authored information. But that does remain my basic point.
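For reference, the defaulting rule being argued over can be summarised as a small decision function. This is a simplified Python sketch, not the normative processing sequence in RDFa Core. literal_kind() and its parameters are hypothetical, CURIEs such as rdf:XMLLiteral are compared as plain strings instead of being expanded to IRIs, and attributes that can change the outcome (@rel, @rev, @resource, @href, @src, @typeof) are ignored.

```python
# Simplified sketch of the literal-defaulting rule discussed in this thread.
# literal_kind() and its parameters are hypothetical; CURIEs are compared as
# plain strings, and @rel/@rev/@resource/@href/@src/@typeof are ignored.
RDF_XMLLITERAL = "rdf:XMLLiteral"
RDF_HTML = "rdf:HTML"


def literal_kind(rdfa_version, has_child_elements, datatype=None, content=None):
    """Return "plain", "typed", "xml" or "html" for an element with @property.

    datatype is None when @datatype is absent and "" when it is present but empty.
    """
    if datatype == RDF_XMLLITERAL:
        return "xml"    # explicit opt-in preserves markup in both versions
    if datatype == RDF_HTML and rdfa_version == "1.1":
        return "html"   # rdf:HTML is not part of RDFa 1.0
    if datatype:
        return "typed"  # any other non-empty datatype
    if datatype == "" or content is not None:
        return "plain"  # empty @datatype or @content yields a plain literal
    if rdfa_version == "1.0" and has_child_elements:
        return "xml"    # RDFa 1.0: nested markup became an XMLLiteral by default
    return "plain"      # RDFa 1.1: text content only, markup dropped


if __name__ == "__main__":
    # <h1 property="dc:title">Foo <i>Bar</i></h1> under each configuration
    print(literal_kind("1.0", True))                           # xml
    print(literal_kind("1.1", True))                           # plain
    print(literal_kind("1.1", True, datatype=RDF_XMLLITERAL))  # xml
```

Under these simplifications, and apart from rdf:HTML, the two versions differ only in the second-to-last branch: an element with nested markup and neither @datatype nor @content.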

 Looking at your response:

"That is, simple things like dc:title contained a slew of XHTML markup"

No surprise that dc:titles have markup in them. Titles are complex
entities. This use supports my point. Preserve the markup.


* ... something simple like "Foo <i>Bar</i>" would expand into a
gigantic string ...

Why is the working group deciding that the <i> element is "simple" and
therefore can be discarded? An author has put it there intentionally.
Respect that and keep it. Does this mean many namespaces in the
output? Perhaps, but what's so wrong with that? Tools can handle it no
problem and the original (x)html is the human friendly representation.
Or think of this as the same problem as found in XSLT and provide
mechanisms to suppress the output of namespaces. I do that all the
time in my transforms.
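The namespace bloat Manu describes, and the suppression suggested here, can both be seen in a small Python sketch. It is illustrative only: the helper names are hypothetical, it uses just the standard library's ElementTree, it assumes well-formed XHTML in which each prefix is declared once (e.g. on <html>), and it is not the serializer of any particular RDFa processor. xml_literal() copies every in-scope declaration onto each top-level child of the @property element; with only_used=True it keeps only the namespaces that child's subtree actually uses, roughly what XSLT's exclude-result-prefixes does for literal result elements.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_NS = "http://www.w3.org/XML/1998/namespace"  # implicitly bound to "xml:"

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <body><h1 property="dc:title">Foo <i>Bar</i></h1></body>
</html>"""


def parse_with_prefixes(xml_text):
    """Parse XML; also return {prefix: uri} for every xmlns declaration seen.
    Assumption: each prefix is declared only once in the document."""
    prefixes, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif root is None:
            root = item  # the first start event is the document element
    return root, prefixes


def split_tag(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def qname(tag, prefixes):
    """Turn '{uri}local' back into 'prefix:local' ('local' for the default ns)."""
    uri, local = split_tag(tag)
    if uri == XML_NS:
        return "xml:" + local
    for prefix, ns in prefixes.items():
        if uri and ns == uri:
            return f"{prefix}:{local}" if prefix else local
    return local


def used_namespaces(elem):
    """Namespace URIs used by elem, its descendants, or their attributes."""
    uris = set()
    for node in elem.iter():
        uris.add(split_tag(node.tag)[0])
        uris.update(split_tag(name)[0] for name in node.attrib)
    return uris - {"", XML_NS}


def serialize(elem, prefixes, decls=""):
    """Serialize one element subtree, putting decls on its start tag."""
    name = qname(elem.tag, prefixes)
    attrs = "".join(f" {qname(k, prefixes)}={quoteattr(v)}" for k, v in elem.attrib.items())
    inner = escape(elem.text or "")
    for child in elem:
        inner += serialize(child, prefixes) + escape(child.tail or "")
    return f"<{name}{decls}{attrs}>{inner}</{name}>"


def plain_literal(elem):
    """RDFa 1.1 default: the concatenated text content, with markup dropped."""
    return "".join(elem.itertext())


def xml_literal(elem, prefixes, only_used=False):
    """Serialize the content of elem (not elem itself), repeating namespace
    declarations on each top-level child so the fragment is well-formed alone."""
    parts = [escape(elem.text or "")]
    for child in elem:
        keep = used_namespaces(child) if only_used else set(prefixes.values())
        decls = "".join(
            f" xmlns:{p}={quoteattr(uri)}" if p else f" xmlns={quoteattr(uri)}"
            for p, uri in sorted(prefixes.items()) if uri in keep)
        parts.append(serialize(child, prefixes, decls) + escape(child.tail or ""))
    return "".join(parts)


if __name__ == "__main__":
    root, prefixes = parse_with_prefixes(DOC)
    h1 = root.find(".//{http://www.w3.org/1999/xhtml}h1")
    print(plain_literal(h1))                          # Foo Bar
    print(xml_literal(h1, prefixes))                  # <i> carries all three declarations
    print(xml_literal(h1, prefixes, only_used=True))  # <i> carries only the XHTML namespace
```

With only three prefixes declared, the literal for "Foo <i>Bar</i>" already repeats three declarations on <i>; a page declaring dozens of prefixes would repeat all of them on every top-level element inside the literal unless unused ones are dropped.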


Issue 19 says:

"Anecdotal evidence via deployment experience has shown that this is a
common authoring mistake and that most authors do not intend to
generate XMLLiterals when there is no @datatype specified. We need to
collect a statistically significant sample set to determine if this is
true and if so, ensure that plain literals are generated instead of
XMLLiterals."

 Was this collection ever done? Who made the determination that the
markup being looked at was a "mistake"? The relevant telecon doesn't
seem to give a detailed justification for the decision.

 I really think we need to revisit this decision with an eye to
respecting authorial intent.

 Thanks,

 -Sebastian



On Fri, Dec 21, 2012 at 11:54 PM, Manu Sporny <msporny@digitalbazaar.com> wrote:
> On 12/20/2012 05:39 PM, Sebastian Heath wrote:
>> My issue is that the '<i>' element has been dropped out. I guess
>> this is because the original XMLLiteral is being coerced into a
>> plain string. If that's the explanation, I think that is the
>> incorrect default behavior. I understand that I can add an @datatype,
>> but that will make my markup very messy. Particularly as I've chosen
>> a simple case. There are lots of places where I want to preserve the
>> markup in @property as that markup communicates important aspects of
>> the data. Again, the underlying data is an XML literal and I suggest
>> that the default behavior should be to preserve that when distilling
>> RDFa in XHTML contexts.
>>
>> It is possible that such preservation of markup should only be
>> defined for RDFa in (X)HTML(5). Again, why destroy good structured
>> information in a host-language context?
>
> Hi Sebastian,
>
> Yes, we debated this for a very long time in the XHTML+RDFa 1.0 days
> (between 2006 and 2008) and came to the same conclusion you did: that any
> markup should be preserved if found.
>
> As it turns out, that was exactly the wrong decision to make. When we
> did a post-REC analysis on how XHTML+RDFa 1.0 was being used in the
> wild, we found many, many examples on the Web where people were
> expressing strings with markup that they never intended to express. That
> is, simple things like dc:title contained a slew of XHTML markup. Even
> worse, something simple like "Foo <i>Bar</i>" would expand into a
> gigantic string if there were lots of RDFa prefix declarations in the
> document (because all xmlns: definitions need to be preserved in
> XMLLiterals in RDFa to ensure the snippet stays well-formed).
>
> So, we reversed the decision for RDFa 1.1 and made the processor strip
> out markup if found:
>
> http://www.w3.org/2010/02/rdfa/track/issues/19
>
> As you stated, in XHTML+RDFa 1.1 you can continue to preserve markup if
> you add datatype="rdf:XMLLiteral" to the element containing the
> @property attribute and the markup you want to preserve.
>
> If you want the XHTML+RDFa 1.0 default behavior of preserving markup,
> you can force the processor into XHTML+RDFa 1.0 mode by adding a
> version="XHTML+RDFa 1.0" on the HTML element of the document. You can
> also set it by declaring the XHTML+RDFa 1.0 DTD at the top of the document.
>
> So, there are 3 ways to achieve what you want, but as you say, it might
> make your markup a bit more verbose. Generating XMLLiterals by default
> was creating too much garbage data on the Web, which is why we do it the
> other way now: Plain Literals by default; XMLLiterals (or HTML Literals
> if you're in HTML mode) only if you explicitly ask for them.
>
> Does that make sense?
>
> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: The Problem with RDF and Nuclear Power
> http://manu.sporny.org/2012/nuclear-rdf/
Received on Saturday, 22 December 2012 05:39:20 UTC