ISSUE-147: PROPOSAL for rdfa:defaultDatatype from Sebastian Heath on 2013-01-08 (public-rdfa-wg@w3.org from January 2013)

From: Sebastian Heath <sebastian.heath@gmail.com>
Date: Tue, 8 Jan 2013 08:35:49 -0500
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <CACsb_1qS3kEPKgOtBX7zZDSuS3zSuw0VLL1td2ZBucZAFePgkg@mail.gmail.com>
First, I'd like to thank Ivan [1], Gregg [2], and Manu [3] for their
thoughtful replies on 29/12/2012. Other commitments kept me from
responding right away, but I hope to provide more context before
the teleconference on the 10th.

 Preliminarily, and because they are germane to the PROPOSAL I'll make
in this e-mail, I'd like to consider a few of the points made in those
messages.

 Gregg wrote that I seemed very concerned with the archival document
that I am creating. This is true, but it is not my only concern. My
most immediate concern is the workflows I am developing for the
present-day processing and analysis of XHTML+RDFa 1.1 documents.

 The generic scenario is as follows:

 1) Use a command-line processor to extract triples from an XHTML+RDFa 1.1 document
 2) Use a SPARQL query to identify the parts of the document that are
of interest to users
 3) Display those parts of the document to users.

 I hope it is clear that, by default, the current XHTML5 spec will
discard information in this workflow. A more specific example: I have
texts that refer to geographic entities, and we mark them up as
follows:

<span rel="dcterms:references" typeof="dcterms:Location">
  <a rel="rdfs:isDefinedBy" property="rdfs:label"
     href="http://pleiades.stoa.org/places/668331">Palmyra (<span xml:lang="grc">Πάλμυρα</span>)</a>
</span>
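
 To make the loss concrete, here is a minimal Python sketch (standard library only) that parses just the <a> element from this markup and computes two candidate values for its rdfs:label: the concatenated text content, which is all a plain literal keeps, and the leading text plus serialized child elements, which roughly approximates what an rdf:HTML or rdf:XMLLiteral value would keep. The helper names are hypothetical, and inner_markup is not an implementation of the W3C serialization rules for those datatypes.

```python
import xml.etree.ElementTree as ET

# The <a> element from the example, as a standalone well-formed fragment.
# The xml: prefix is predeclared in XML, so no namespace declaration is needed.
FRAGMENT = (
    '<a rel="rdfs:isDefinedBy" property="rdfs:label" '
    'href="http://pleiades.stoa.org/places/668331">'
    'Palmyra (<span xml:lang="grc">Πάλμυρα</span>)</a>'
)


def plain_literal(elem: ET.Element) -> str:
    """Concatenated text content: all that a plain literal keeps."""
    return "".join(elem.itertext())


def inner_markup(elem: ET.Element) -> str:
    """Leading text plus each serialized child element (including its tail text).

    Only a rough stand-in for what an rdf:HTML or rdf:XMLLiteral value preserves.
    """
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


a = ET.fromstring(FRAGMENT)
print(plain_literal(a))  # Palmyra (Πάλμυρα)
print(inner_markup(a))   # Palmyra (<span xml:lang="grc">Πάλμυρα</span>)
```

Only the second value still records that Πάλμυρα is Greek (xml:lang="grc").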


I want rapper to identify all the dcterms:Location entities in my
documents, discover their lat/long via the Pleiades URI, and then
create a map labelled with the rdfs:label value as it appears in the
text. Sure, I can do various code-driven things to go back into the
text and grab the original, but that is a practical burden. Yes, I
could have my editors add a @datatype. But these are Manu's
"beginners". They will make mistakes. It is much simpler for RDFa to
respect the fact that these strings are XML and to retain the markup
by default.
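
 The rapper step described above might feed a script like the following minimal Python sketch. It assumes rapper was run with its RDFa parser and N-Triples serializer (for example, rapper -i rdfa -o ntriples doc.xhtml) and that the output is available as text. The sample triples are hand-written to mirror the markup above rather than captured from rapper, a small regex-based reader stands in for a real RDF library, and the Pleiades coordinate lookup is left out. The names locations, term and unescape are hypothetical.

```python
import re
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_LOCATION = "http://purl.org/dc/terms/Location"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

# Hand-written sample in the shape of N-Triples output for the markup above;
# non-ASCII characters are \u-escaped, as N-Triples serializers commonly do.
SAMPLE = r"""
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/Location> .
_:b0 <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://pleiades.stoa.org/places/668331> .
_:b0 <http://www.w3.org/2000/01/rdf-schema#label> "Palmyra (\u03A0\u03AC\u03BB\u03BC\u03C5\u03C1\u03B1)" .
"""

# Subject (IRI or blank node), predicate (IRI), object (IRI, blank node or literal).
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.$'
)
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)")
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def unescape(body: str) -> str:
    """Decode N-Triples string escapes (\\uXXXX, \\UXXXXXXXX, \\n, \\", ...)."""
    def repl(m):
        code = m.group(1)
        if len(code) > 1:  # \u or \U followed by hex digits
            return chr(int(code[1:], 16))
        return SIMPLE_ESCAPES.get(code, code)
    return ESCAPE.sub(repl, body)


def term(token: str) -> str:
    """IRI without brackets, literal lexical form (tag/datatype dropped), or bnode label."""
    if token.startswith("<"):
        return token[1:-1]
    if token.startswith('"'):
        return unescape(token[1:token.rindex('"')])
    return token


def locations(ntriples: str) -> dict[str, dict[str, str]]:
    """Map each dcterms:Location node to its rdfs:label and rdfs:isDefinedBy values."""
    props = defaultdict(dict)
    typed = set()
    for line in ntriples.splitlines():
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # blank lines and comments
        s, p, o = term(m.group(1)), m.group(2), term(m.group(3))
        if p == RDF_TYPE and o == DCT_LOCATION:
            typed.add(s)
        elif p in (RDFS_LABEL, RDFS_DEFINED_BY):
            props[s][p] = o
    return {s: dict(props[s]) for s in typed}


for node, values in locations(SAMPLE).items():
    # Coordinates would be looked up via the Pleiades URI here (not shown).
    print(values.get(RDFS_DEFINED_BY), "->", values.get(RDFS_LABEL))
```

Whatever the coordinate lookup returns, the label that reaches the map is the flattened string "Palmyra (Πάλμυρα)"; the xml:lang="grc" span is already gone.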

 This overlaps with Manu's suggestion that I am asking for purity. I
am not. I am in the real world here and do not want RDFa to coerce
(a term I use in its CS sense) my markup to the simple, dare I say
"pure", form of a plain literal. Instead, I would like it to preserve
the messy reality of my actual data. I don't just mean to say "Hey
wait, you're being pure; I'm the realist," but to point out that there
are many, many real and practical uses of RDFa that are negatively
affected by not pursuing ISSUE-147 to a more robust solution than
simply closing it. The real world is messy, and the current RDFa 1.1
in HTML5 spec makes it hard to deal with that messiness, hard for
developers and hard for beginners, especially because it silently
discards [4] intentional markup.

 So....

 PROPOSAL: Define an rdfa:defaultDatatype attribute that can be used
in XHTML5 texts. It would take the form:

  rdfa:defaultDatatype="rdf:HTML"

(Slightly more technically, it would have an rdfs:range of
rdfs:Resource.)

 When this attribute is in scope, @property processing will produce a
literal of that datatype for elements that have child elements. If the
value is "rdf:HTML" or "rdf:XMLLiteral", processing will follow the
W3C rules defined for those types.
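
 Read as an algorithm, the proposed rule might look like the following minimal Python sketch. It rests on assumptions the proposal leaves open: "elements that have children" is read as "elements that have child elements", an explicit @datatype overrides the default, the default is inherited down the tree the way xml:lang is, the rdfa: prefix is bound to http://www.w3.org/ns/rdfa# only so that an XML parser accepts the attribute, and CURIEs such as rdf:HTML are compared as strings rather than expanded. The names property_literals, lexical_value and DOC are hypothetical, and the markup serialization only approximates the rdf:HTML and rdf:XMLLiteral rules.

```python
import xml.etree.ElementTree as ET

# Assumption: the proposed attribute is bound to the RDFa vocabulary namespace
# so that an XML parser accepts the rdfa: prefix. The proposal fixes no namespace.
RDFA_NS = "http://www.w3.org/ns/rdfa#"
DEFAULT_DATATYPE_ATTR = f"{{{RDFA_NS}}}defaultDatatype"
MARKUP_DATATYPES = {"rdf:HTML", "rdf:XMLLiteral"}  # CURIEs compared as plain strings

DOC = """<div xmlns:rdfa="http://www.w3.org/ns/rdfa#" rdfa:defaultDatatype="rdf:HTML">
  <p property="rdfs:label">Palmyra (<span xml:lang="grc">Πάλμυρα</span>)</p>
  <p property="dcterms:title">Palmyra</p>
  <p property="dcterms:date" datatype="xsd:date">2013-01-08</p>
</div>"""


def lexical_value(elem, datatype):
    """Inner markup for the markup datatypes, plain text content otherwise."""
    if datatype in MARKUP_DATATYPES:
        # Rough approximation of the rdf:HTML / rdf:XMLLiteral serialization rules.
        return (elem.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in elem
        )
    return "".join(elem.itertext())


def property_literals(elem, default_datatype=None, out=None):
    """Collect (property, lexical value, datatype) tuples under the proposed rule.

    Deliberately simplified: subjects, @content, @rel and CURIE expansion are ignored.
    """
    if out is None:
        out = []
    # Scoping: the nearest ancestor-or-self declaration wins, as with xml:lang.
    default_datatype = elem.get(DEFAULT_DATATYPE_ATTR, default_datatype)
    prop = elem.get("property")
    if prop is not None:
        datatype = elem.get("datatype")  # assumption: an explicit @datatype wins
        if datatype is None and len(elem) > 0:
            # Proposed rule: only elements with child elements get the default.
            datatype = default_datatype
        if datatype == "":
            datatype = None  # an empty @datatype means a plain literal in RDFa 1.1
        out.append((prop, lexical_value(elem, datatype), datatype))
    for child in elem:
        property_literals(child, default_datatype, out)
    return out


for prop, value, datatype in property_literals(ET.fromstring(DOC)):
    print(prop, repr(value), datatype or "(plain literal)")
```

On the sample, the first paragraph comes out as an rdf:HTML literal with its Greek span intact, the childless paragraph stays a plain literal, and the explicitly typed date is unaffected. Without the attribute in scope, the first paragraph would fall back to the plain literal "Palmyra (Πάλμυρα)".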

 I am not a spec writer, but I hope the intent is clear. I believe this
is a flexible mechanism that provides for robust preservation of the
original intent of markup undertaken by both beginners and experts. I
believe it accommodates the other use cases and evidence I have raised
previously on this issue. I believe it is timely in that we are
considering more substantial incompatibilities such as @itemref.

 Thanks,

 Sebastian.



[1] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0083.html
[2] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0084.html
[3] http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Dec/0086.html
[4] I have used "destroy" in recent e-mails, but "discard" is more
accurate and perhaps we should keep to that.
Received on Tuesday, 8 January 2013 13:36:18 UTC