Re 2: More specific proposal on the vocabulary expansion (a.k.a. vocab proxy) feature from Ivan Herman on 2011-08-05 (public-rdfa-wg@w3.org from August 2011)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 5 Aug 2011 10:54:43 +0200
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: W3C RDFWA WG <public-rdfa-wg@w3.org>
Message-Id: <6E68BBAB-5C84-4E03-877B-9FEEF6CD5B8D@w3.org>

Forgetting the admin issues:-)

On Aug 4, 2011, at 20:10 , Gregg Kellogg wrote:
> 
> We had also discussed that datatype coercion would be something that a proxy vocabulary might accomplish, however, as I mentioned, I don't think this can be done through normal entailment regimes. It's also a theme in recent Microdata-RDF discussions [5] which have a similar issue. @itemvaltype is not universally loved, and it's questionable how much @datatype is used properly in RDFa. Clearly, being able to use some combination of rdfs:range and lexical value space could be useful in reproducing this, but it would seem to need to be done at processing time, not after the fact.
> 
> If we could solve this issue, it would be useful across all RDF serializations where publishers are often lazy about using typed literals.

So... I have given this some more thought, and I have to retract what I said yesterday evening. Indeed, if we stick to RDFS or even OWL 2, this will not be solved. After all, what I think we are talking about is a literal->literal mapping that would add a datatype to a literal. No entailment regime that I know of does that.

The issue I have is that, as you say, this has to be done during the core RDFa processing, not as a post-processing step. Otherwise you may end up duplicating triples, something like

<a> <b> "1234", "1234"^^xsd:int.

which is not really good. But the other danger is that the outputs of RDFa processors might differ, depending on whether the processor implements this whole mechanism or not, and I think that would be even worse.
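To make the duplication concrete, here is a minimal sketch. It assumes triples are plain Python tuples with the object written as a (lexical form, datatype) pair, and that the post-processing step behaves like an entailment rule, i.e. it can only add triples, never remove them. `add_typed_literals` is a hypothetical helper for illustration, not part of any RDFa processor.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def add_typed_literals(triples):
    """Hypothetical post-processing step: for every plain literal that looks
    like an integer, add a typed copy. Existing triples are never removed."""
    result = set(triples)
    for s, p, (lexical, datatype) in triples:
        if datatype is None and lexical.isascii() and lexical.isdigit():
            result.add((s, p, (lexical, XSD_INT)))
    return result


# Output of the core RDFa processing: one triple with a plain literal.
graph = {("a", "b", ("1234", None))}
expanded = add_typed_literals(graph)

# Both the plain and the typed literal are now present for the same
# subject and predicate.
for triple in sorted(expanded, key=repr):
    print(triple)
```

The plain literal survives next to the typed one, which is exactly the `"1234", "1234"^^xsd:int` situation shown above.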

I wonder whether it is not possible to do something slightly different, i.e., that *some* literals could be automatically interpreted with a datatype by the RDFa processor. I would expect only the very simple ones, like integers and floats. But we could say that, for example,

<span property="bla">1234</span>

would automatically generate

<> <bla> "1234"^^xsd:integer .

Note that, in Turtle, this could be written as 

<> <bla> 1234 .

so there is an analogy there...

In Turtle, decimal integers (as xsd:integer), floating point numbers (as xsd:double), and decimals (as xsd:decimal) are accepted this way. Maybe we could extend that with dates if they are in ISO formats (although dates are really, really messy :-( ).
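A rough sketch of what such lexical detection could look like follows. The three patterns are modeled on the numeric shorthand of Turtle as I understand it and should be treated as an approximation rather than a normative grammar. The function name `infer_datatype` and the decision to strip surrounding whitespace are assumptions of this example, not something agreed in the group. Dates are deliberately left out.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# The three patterns are mutually exclusive: integers have no dot and no
# exponent, decimals have a dot but no exponent, doubles need an exponent.
NUMERIC_PATTERNS = [
    (re.compile(r"[+-]?[0-9]+"), XSD + "integer"),
    (re.compile(r"[+-]?[0-9]*\.[0-9]+"), XSD + "decimal"),
    (re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+"),
     XSD + "double"),
]


def infer_datatype(text):
    """Return a datatype IRI if the whole text looks numeric, else None."""
    lexical = text.strip()  # assumption: surrounding whitespace is ignored
    for pattern, datatype in NUMERIC_PATTERNS:
        if pattern.fullmatch(lexical):
            return datatype
    return None


for sample in ["1234", "12.5", "1.0e3", "1234 Main Street"]:
    print(repr(sample), "->", infer_datatype(sample))
```

Requiring the whole text to match keeps content such as "1234 Main Street" a plain literal, which seems the safer default.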

The question is what happens if the user does _not_ want that? Well, if there _is_ a datatype attribute set, then that takes priority, i.e.,

<span property="bla" datatype="xsd:string">1234</span>

will generate

<> <bla> "1234"^^xsd:string .

Note that, with the latest resolutions of the W3C RDF WG, this could also be serialized as:

<> <bla> "1234" .

I believe that by introducing this we would cover most of the datatype use cases without mixing in the post processing.
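The precedence rule could be sketched roughly as follows. This is only an illustration: `literal_for` is a hypothetical function, the datatype argument is assumed to be an already expanded IRI rather than a CURIE, and the automatic detection is reduced to integers to keep the example short.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER = re.compile(r"[+-]?[0-9]+")


def literal_for(text, datatype=None):
    """Return a (lexical form, datatype) pair for an element's text content."""
    if datatype is not None:
        # An explicit @datatype always wins over any automatic guess.
        return (text, datatype)
    if INTEGER.fullmatch(text):
        # No @datatype given and the content looks like an integer.
        return (text, XSD + "integer")
    # Everything else stays a plain literal.
    return (text, None)


print(literal_for("1234"))
print(literal_for("1234", datatype=XSD + "string"))
print(literal_for("hello"))
```

Because the explicit attribute is checked first, an author can always opt out of the automatic typing by stating the datatype they want.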

Thoughts?

Ivan

P.S. Yes, this is a little bit analogous to the issue on whether

<span property="bla">http://www.w3.org</span>

could be interpreted as 

<> <bla> <http://www.w3.org> .

because it is some sort of automatic interpretation of the literal. This has been discussed before; maybe this is an issue we should reopen? Or is it a can of worms?
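For comparison, a similarly naive check for the IRI case might look like the sketch below. It is purely illustrative: `looks_like_absolute_iri` is a hypothetical name, and the rule used here (a scheme plus an authority part, no whitespace) is an assumption of this example that would, for instance, reject mailto: IRIs.

```python
from urllib.parse import urlsplit


def looks_like_absolute_iri(text):
    """Heuristic: True if the text has both a scheme and an authority part."""
    candidate = text.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. bad IPv6 brackets.
        return False
    return bool(parts.scheme) and bool(parts.netloc)


for sample in ["http://www.w3.org", "1234", "see http://www.w3.org"]:
    print(repr(sample), "->", looks_like_absolute_iri(sample))
```

How far such a heuristic should go is precisely the part that makes this look like a can of worms.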

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Friday, 5 August 2011 08:52:58 UTC