Re: ISSUE-147: PROPOSAL for rdfa:defaultDatatype from Sebastian Heath on 2013-01-09 (public-rdfa-wg@w3.org from January 2013)

From: Sebastian Heath <sebastian.heath@gmail.com>
Date: Wed, 9 Jan 2013 00:33:07 -0500
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <CACsb_1o2TcMmneofWii6gy3itJwOJb7WsxqDr1Uu0D7XdBdE1g@mail.gmail.com>
Ivan,

 Thank you for your response. Some reactions...

Again, I am proposing an in-document mechanism, both global and
section-specific, for defining the default literal produced for
elements marked as RDFa properties that have child elements. You call
it a "major incompatibility". I disagree. This is a far less radical
supplement to RDFa 1.1 Core than @itemref, so I believe that if it has
been deemed appropriate to adopt @itemref, then it is somewhat
arbitrary to reject my proposal.

You raise the issue of presentational/structural markup, which is
important. HTML5 has moved towards greater adoption of structural
markup and the W3C encourages its use (passim in section 3 of "HTML5
differences from HTML4" [1]). This is all the more reason to provide a
robust mechanism to preserve it when it is in place, especially as
RDFa 1.1 in HTML addresses both HTML4 and HTML5. The greater emphasis
on the structural role of elements such as HTML5 'b', 'i', 'u', and
'strong' should be accommodated as we develop the spec. With the
refined definitions of these elements, it is much more likely that
their presence is not a mistake. Indeed, we should presume they are
intentional. Providing a global mechanism to ensure these are
preserved is appropriate in HTML5. I have already introduced specific
use cases and general discussion of bidirectional texts and WAI
issues. Those are also relevant here. I hope the WG will address this
accumulating evidence.

[[ As somewhat of a side-note: in your example, and as I'm sure you've
considered, moving the '<abbr>' outside the span that defines the
dc:title is a simple solution. But were that my FOAF file, I would want
the <abbr> in the RDF triple for the very accessibility reasons you
suggest. Again, "different approaches by different people", so let's
make good mechanisms to support them all. And I'd love to test those
mechanisms against more sophisticated examples than FOAF profiles. The
RDFa standards process has been ill-served by the dominance of such
simple cases. I've tried to avoid them in this discussion.]]

 You call @datatype a "hack". That seems a little strong, but I take
that to mean that you understand it has drawbacks. The current
proposal addresses those by suggesting a global and robust mechanism.
If it is a "hack", let's do better.
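To make the difference concrete, here is a minimal Python sketch (standard library only) of the two values at stake for Ivan's CWI example quoted below: the concatenated text content that RDFa 1.1 uses for a plain literal when no @datatype is given, and the inner markup that an rdf:XMLLiteral or rdf:HTML value would keep. The function names are hypothetical, and the serialization only approximates the canonical forms the RDF specs require.

```python
import xml.etree.ElementTree as ET

# Ivan's example span, without the enclosing <a> (prefixes appear only in attribute values).
SPAN = ('<span property="dc:title">Centre Mathematics and Computer Sciences '
        '(<abbr title="Centrum voor Wiskunde en Informatica">CWI</abbr>)</span>')


def plain_literal(el):
    """Concatenated text content: what RDFa 1.1 produces with no @datatype."""
    return "".join(el.itertext())


def markup_literal(el):
    """Inner markup kept verbatim, roughly what rdf:XMLLiteral/rdf:HTML would carry.

    Approximation only: no XML canonicalization or HTML fragment serialization.
    """
    parts = [el.text or ""]
    for child in el:
        tail = child.tail
        child.tail = None  # serialize the child element without its trailing text
        parts.append(ET.tostring(child, encoding="unicode"))
        child.tail = tail
        parts.append(tail or "")  # then append the trailing text ourselves
    return "".join(parts)


span = ET.fromstring(SPAN)
print(plain_literal(span))
print(markup_literal(span))
```

The first string is the dc:title value RDFa 1.1 generates today; the second is what a markup-preserving default would retain, including the <abbr> and its title.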

 You suggest a parameter and then worry that it will be forgotten. I
had considered a parameter but didn't propose it for the reason you
suggest. An "in-document" approach that is well articulated in the
spec is much more robust.

As Manu indicated, there are no procedural reasons to reject a robust
infrastructure for @property processing, and my proposal to allow
in-document global specification of behavior is a reasonable
middle ground between the two behaviors that have heretofore been
considered either/or options.
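A minimal sketch of how the proposed in-document mechanism might resolve a literal's datatype, under assumptions that are not specified anywhere: the attribute lives in the RDFa namespace http://www.w3.org/ns/rdfa# (so an XML parser exposes it as {http://www.w3.org/ns/rdfa#}defaultDatatype), the nearest ancestor carrying it wins, an explicit @datatype always takes precedence, and the default applies only to @property elements with child elements. The helpers in_scope_default and literal_datatype are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute from ISSUE-147, in Clark notation (RDFa namespace assumed).
DEFAULT_DT = "{http://www.w3.org/ns/rdfa#}defaultDatatype"


def in_scope_default(el, parents):
    """Value of rdfa:defaultDatatype on the nearest ancestor-or-self, else None."""
    node = el
    while node is not None:
        value = node.get(DEFAULT_DT)
        if value is not None:
            return value
        node = parents.get(node)  # None once we pass the root
    return None


def literal_datatype(el, parents):
    """Datatype for a @property literal under the proposed rule (simplified).

    Returns None for a plain literal.
    """
    explicit = el.get("datatype")
    if explicit is not None:
        return explicit or None  # explicit @datatype wins; datatype="" means plain
    if len(el) > 0:  # only elements with child elements pick up the default
        return in_scope_default(el, parents)
    return None


DOC = """<div xmlns:rdfa="http://www.w3.org/ns/rdfa#" rdfa:defaultDatatype="rdf:HTML">
  <span property="rdfs:label">Palmyra (<span xml:lang="grc">Πάλμυρα</span>)</span>
  <span property="dc:title" datatype="">Plain <b>anyway</b></span>
  <span property="dc:date">2013</span>
  <section rdfa:defaultDatatype="rdf:XMLLiteral">
    <span property="dc:description">An <i>old</i> city</span>
  </section>
</div>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
results = {el.get("property"): literal_datatype(el, parents)
           for el in root.iter() if el.get("property")}
print(results)
```

The nearest-ancestor lookup is what makes such a mechanism both global (set on the root) and section-specific (overridden on a <section>), in the same way @vocab and @prefix are scoped to an element and its descendants in RDFa 1.1.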

 And as always, I enjoy the opportunity to pursue this important topic.

 Thanks,

 Sebastian.

[1] http://www.w3.org/TR/html5-diff/#language

On Tue, Jan 8, 2013 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
> Sebastian,
>
> - I understand your use case, and I do not deny it is a valid one. The problem we have is that there are many use cases which are just about the opposite, namely when the markup is really for presentation or for HTML purposes and has no meaning for the RDF part. I have run into this problem many, many times myself, doing something as simple as writing a FOAF profile. A typical example is from my own FOAF profile, where I have something like:
>
> <a rel="foaf:workplaceHomepage" href="http://www.cwi.nl"><span property="dc:title">Centre Mathematics and Computer Sciences (<abbr title="Centrum voor Wiskunde en Informatica">CWI</abbr>)</span></a>
>
> The current RDFa 1.1 generates
>
>  <> foaf:workplaceHomepage <http://www.cwi.nl> .
>  <http://www.cwi.nl> dc:title "Centre Mathematics and Computer Sciences (CWI)" .
>
> whereas, with XMLLiteral as the default, it would say:
>
>  <> foaf:workplaceHomepage <http://www.cwi.nl> .
>  <http://www.cwi.nl> dc:title """Centre Mathematics and Computer Sciences (<abbr title="Centrum voor Wiskunde en Informatica">CWI</abbr>)""" .
>
> Obviously, the second set of triples is not what one wants; the <abbr> is used here for user interface/accessibility purposes.
>
> In the run-up to RDFa 1.1 we received a lot of feedback about such structures, and the overall feedback was that generating XML Literals by default creates lots of problems for users, who are forced to use the @datatype="" hack.
>
> So, after long discussions, we made a decision to change this. That boat has sailed.
>
> - Your solution means, if my understanding is correct, that we would have some sort of parametrization of the RDFa processor's behaviour. I.e., by adding some parameters into the file, the behaviour of the processor changes insofar as it would generate an XML (or HTML) literal. This creates a major change in the way processing works and also opens the floodgates: I could imagine a number of other such cases that could depend on parameters (a good example: should we keep whitespace in the text as it is in the source, or should we merge it? The current decision is the former although, I must admit, I would have preferred the latter.)
>
> I would be extremely uneasy adding this to HTML5+RDFa; HTML5+RDFa just defines a language/host profile for RDFa Core, and it is not supposed to introduce such a radical departure from the way an RDFa processor works (and this sort of approach would fairly radically change existing implementations). It would also mean a major incompatibility between HTML5+RDFa and XHTML+RDFa.
>
> Finally... you reject the current solution whereby the author would have to put an explicit @datatype into the code because the author may get it wrong. Well, isn't this the same problem? We have had the experience with authors forgetting to put namespace/prefix declarations into the RDFa source all the time, hence our approach of defining a default context for RDFa 1.1; what makes you think that this extra parameter will not be forgotten all the time?
>
> I can imagine particular RDFa implementations introducing non-standard parameters that would govern whether HTML/XML literals are generated by default or not. If there is a real need for that, implementations would do so. I could even go as far as defining that parameter's name in the HTML5+RDFa document, or even in a revised version of RDFa 1.1 Core, as long as implementing that parameter is not required (but if an implementation wants to do it, it would use a predefined parameter name); but this is as far as I can see going.
>
> Sorry...
>
> Ivan
>
>
> On Jan 8, 2013, at 14:35 , Sebastian Heath <sebastian.heath@gmail.com> wrote:
>
>> First I'd like to thank Ivan [1], Gregg [2], and Manu [3] for their
>> thoughtful replies on 29/12/2012. Other commitments kept me from
>> responding right away but I am hoping to provide more context before
>> the teleconference on the 10th.
>>
>> Preliminarily, and germane to the PROPOSAL I'll make in this e-mail,
>> I'd like to consider a few of the points made in those messages.
>>
>> Gregg wrote that it seemed I was very concerned with the archival
>> document that I am creating. This is true but not my only concern. My
>> most immediate concern is the workflows I am developing for present
>> processing and analysis of XHTML+RDFa 1.1 documents.
>>
>> The generic scenario is as follows:
>>
>> 1) Use a command line processor to extract triples from an XHTML+RDFa 1.1 document
>> 2) Identify the parts of the document that are of interest to users
>> on the basis of a SPARQL query
>> 3) Display those parts of the document to users.
>>
>> I hope it is clear that the current XHTML5 spec will by default
>> discard information in this workflow. A more specific example is
>> that I have texts which make reference to geographic entities. We mark
>> these up in the following way:
>>
>> <span rel="dcterms:references" typeof="dcterms:Location">
>>  <a rel="rdfs:isDefinedBy" property="rdfs:label"
>>     href="http://pleiades.stoa.org/places/668331">Palmyra (<span
>>     xml:lang="grc">Πάλμυρα</span>)</a>
>> </span>
>>
>>
>> I want rapper to identify all the dcterms:Location entities in my
>> documents, discover their lat/long via reference to the Pleiades URI,
>> and then create a map labelled with the rdfs:label value as it appears
>> in the text. Sure, I can do various code-driven things to go back into
>> the text and grab the original, but that is a practical burden. Yes, I
>> could have my editors add a @datatype. But these are Manu's
>> "beginners". They will make mistakes. It is much simpler for RDFa to
>> respect the fact that these strings are XML and to retain the markup
>> by default.
>>
>> This overlaps with Manu's suggestion that I am asking for purity. I
>> am not. I am in the real world here and do not want RDFa to coerce
>> (a term I use in its CS meaning) my text to the simple, dare I say
>> "pure", form of a plain literal. Instead I would like it to preserve
>> the messy reality of my actual data. I don't just mean to say "Hey
>> wait, you're being pure. I'm the realist" but to point out that there
>> are many, many real and practical uses for RDFa that are negatively
>> impacted by not pursuing ISSUE-147 to a more robust solution than
>> just closing it. The real world is messy, and currently the RDFa 1.1
>> in HTML5 spec makes it hard to deal with that messiness: hard for
>> developers and hard for beginners, especially in that it silently
>> discards [4] intentional markup.
>>
>> So....
>>
>> PROPOSAL: Define an rdfa:defaultDatatype attribute that can be used
>> in XHTML5 texts. This would take the form of:
>>
>>
>>  rdfa:defaultDatatype="rdf:HTML" (slightly more technically, it would
>> have an rdfs:range of rdfs:Resource).
>>
>> When this attribute is in scope, @property processing will produce
>> that datatype for elements that have child elements. If the value is
>> "rdf:HTML" or "rdf:XMLLiteral", processing will follow the W3C
>> rules defined for those types.
>>
>> I am not a spec writer but I hope the intent is clear. I believe this
>> is a flexible mechanism that provides for robust preservation of the
>> original intent of markup undertaken by both beginners and experts. I
>> believe it accommodates the other use cases and evidence I have raised
>> previously on this issue. I believe it is timely in that we are
>> considering more substantial incompatibilities such as @itemref.
>>
>> Thanks,
>>
>> Sebastian.
>>
>>
>>
>> [1] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0083.html
>> [2] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0084.html
>> [3] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0086.html
>> [4] While I have used "destroy" in recent e-mails, "discard" is more
>> accurate and maybe we should keep to that.
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
Received on Wednesday, 9 January 2013 05:33:35 UTC