Re: ISSUE-147 (preserve markup by default): RDFa Processors should preserve markup by default [RDFa 1.1 in HTML5]

> On Sat, Dec 29, 2012 at 8:25 AM, Ivan Herman <ivan@w3.org> wrote:
>> I also do not see any new evidence in

*chair hat on*

Hey Ivan,

I raised the issue because I saw a glimmer of new evidence that
Sebastian expanded upon in his latest e-mail. We had not considered
different LTR/RTL written languages or WAI issues when we made the
decision. I can't find anything in the minutes, nor do I remember us
discussing those two points at the time.

That said, I think the new evidence doesn't override the greater
concerns: 1) backward compatibility with RDFa Core 1.1, RDFa Lite 1.1,
XML+RDFa 1.1, and XHTML1+RDFa 1.1, and 2) mistakenly including markup in
triples due to authoring errors.

Unless I've missed something, I think we're on solid procedural ground
to re-open the issue. Sebastian, you're more than welcome to join us on
our next telecon on January 10th if you think that would help.

On 12/29/2012 02:47 PM, Sebastian Heath wrote:
> 1) The consideration of ISSUE 147 within the context of the Working
> Draft of "RDFa 1.1 in HTML5" is timely.

This is another reason the issue was re-opened: we also didn't
explicitly discuss the issue in this particular context, even
though it was implied at the time.

> 2) There is new evidence in the form of the existence of RDFa Lite, 
> the introduction of a use case in which child elements are not a 
> mistake, full consideration of multi-lingual issues as they appear
> in HTML5 as used in the real world, and the possibility of WAI
> impact.

This is also true.

> 3) There is a substantive case for the default production of 
> rdf:XMLLiteral and rdf:HTML in the context of HTML5 and its
> variants. See "2)" immediately above.

There is a case; it is yet to be determined whether it is substantive.

> I do hope the above discussion allows us to move beyond procedural 
> issues to full consideration of the merits of ISSUE 147.

I think we're on firm procedural ground to re-open the issue, unless I
made a mistake and missed something... which is always quite possible. :)

*chair hat off*

> My final point is that I hope it's clear that this issue is of great
> importance to me. I want to use XHTML+RDFa but this default behavior
> is a real impediment.

Keep in mind that support for HTML Literals was added in the latest
Working Draft (look at the last bullet item in the "Additional RDFa
Processing Rules" section):

http://www.w3.org/TR/2012/WD-rdfa-in-html-20121213/#additional-rdfa-processing-rules

So, all you would need to do is change this:

   <span property="dc:bibliographicCitation">

to this:

   <span property="dc:bibliographicCitation" datatype="rdf:HTML">
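
To make the difference concrete, here is a rough Python sketch of the
literal a processor would produce for each of the two spans above. It
rests on several assumptions: the citation text is made up, the snippets
must be well-formed XML because it uses the standard library's
xml.etree.ElementTree rather than an HTML5 parser, and the hypothetical
object_literal() helper ignores @content, @lang, and CURIE expansion
(rdf:HTML is compared as a plain string).

```python
import xml.etree.ElementTree as ET

RDF_HTML = "rdf:HTML"  # compared as an unexpanded CURIE, for simplicity


def text_content(el):
    """All descendant text with the tags dropped (plain literal value)."""
    return "".join(el.itertext())


def inner_markup(el):
    """The element's children serialized with their tags (HTML literal value)."""
    # ET.tostring() also writes each child's tail text, so text between
    # and after child elements is preserved.
    return (el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )


def object_literal(snippet):
    """Hypothetical helper: (value, datatype) for one @property element."""
    el = ET.fromstring(snippet)
    datatype = el.get("datatype")
    if datatype == RDF_HTML:
        return inner_markup(el), RDF_HTML
    # No @datatype (the RDFa 1.1 default) or datatype="": plain literal.
    # Any other datatype would be a typed literal over the same text.
    return text_content(el), (datatype or None)


citation = "Smith, <em>Roman Coins</em>, 2012"  # made-up example content
default = f'<span property="dc:bibliographicCitation">{citation}</span>'
opt_in = f'<span property="dc:bibliographicCitation" datatype="rdf:HTML">{citation}</span>'

print(object_literal(default))  # ('Smith, Roman Coins, 2012', None)
print(object_literal(opt_in))   # ('Smith, <em>Roman Coins</em>, 2012', 'rdf:HTML')
```

The only change an author makes is the added @datatype; the <em> markup
survives in the second case and is flattened to text in the first.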

Granted, this is not done by default. The reasoning for not doing so
boils down to this:

Non-experts don't understand the difference between an XMLLiteral, an
HTML Literal, and a plain literal. While it may be more "correct" to
express data that contains markup as an XML literal or an HTML literal
by default, we have seen numerous cases where authors screw this up in
the wild.

It is easier to tell an expert like yourself, Sebastian, that they
should use datatype="rdf:HTML" if they want to preserve the markup than
it is to tell a newbie to use datatype="" if they want just a plain
literal.
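
The trade-off in the paragraph above is essentially a choice of default
in a small decision rule. The Python sketch below is a simplification,
not the normative processing rules: literal_kind() and the
preserve_markup_by_default flag are hypothetical names, where False
stands for the current RDFa 1.1 behavior and True for the ISSUE-147
proposal (assumed here to yield rdf:HTML; an XHTML document might use
rdf:XMLLiteral instead).

```python
from typing import Optional


def literal_kind(datatype: Optional[str], has_child_elements: bool,
                 preserve_markup_by_default: bool) -> str:
    """Hypothetical rule: which kind of literal a @property element yields.

    datatype is None when @datatype is absent and "" for datatype="".
    """
    if datatype in ("rdf:HTML", "rdf:XMLLiteral"):
        return datatype             # explicit opt-in: markup is kept
    if datatype == "":
        return "plain"              # explicit opt-out: markup is dropped
    if datatype is not None:
        return "typed " + datatype  # e.g. xsd:date, over the text content
    if has_child_elements and preserve_markup_by_default:
        return "rdf:HTML"           # proposed default (assumption): keep markup
    return "plain"                  # RDFa 1.1 default: drop the markup


# Compare both policies for an element that contains child markup.
for label, datatype in [("no @datatype", None),
                        ('datatype=""', ""),
                        ('datatype="rdf:HTML"', "rdf:HTML")]:
    current = literal_kind(datatype, True, preserve_markup_by_default=False)
    proposed = literal_kind(datatype, True, preserve_markup_by_default=True)
    print(f"{label:22} current={current:10} proposed={proposed}")
```

Only the first row differs between the two policies: markup written
without any @datatype. Under the current default it becomes a plain
literal and an expert opts in with datatype="rdf:HTML"; under the
proposal it would keep its markup, and a newbie would have to know the
datatype="" trick to get a plain literal.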

One of the driving goals behind the decision was to maximize the
amount of correct data being generated on the Web. We introduced the
initial context for this reason. We also made plain literals the
default for this reason. More harm can be done by an army of newbies who
don't know the datatype="" trick than by an expert such as yourself who
cares about making sure that the data they're expressing is correct. In
fact, I would bet that you're going to get your markup right, whereas a
newbie wouldn't know whether their markup was correct. I realize that
it's a pain for you, specifically because of the type of content that
you mark up, but your pain is outweighed by that of the thousands of
others who would otherwise be unknowingly publishing "ugly" data.

We can certainly boil the argument down to its essence, which I believe
is "what is more correct - destroying markup or preserving it?".

The answer to that question wholly depends on what problem you're trying
to solve. If you are trying to say that "we should never destroy
markup", then I see that as a theoretically pure argument. It's correct
from a design purity standpoint.

If you care more about creating a language that is easy to use for
beginners, then I think it's acceptable to "destroy some markup" because
the data generated on the Internet is, in aggregate, going to be closer
to what most authors intended. This may come across as a "lowest common
denominator" feature, and it is. We didn't make this decision because it
was the theoretically pure thing to do. We made it because we saw many
(albeit anecdotal) examples of bad RDFa markup online wrt. XML literals,
because authors didn't know that their markup would become part of the
data produced by the RDFa processor.

To put this another way, iirc, you have been the only person to raise
this as an issue since the decision was made in 2010. If we had made the
wrong call on this particular feature, I would have expected to see much
more push-back on this change in RDFa 1.1.

That said, the thing that worries me about this is that you've been very
involved with RDFa from an implementation side for quite a while. You
know what you're doing. So, it would be foolish of us to not hear you
out, as sometimes the process of doing that raises other concerns that
might weigh more heavily on the decision.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/

Received on Saturday, 29 December 2012 21:58:21 UTC