- From: Pat Hayes <phayes@ihmc.us>
- Date: Fri, 13 May 2011 15:49:51 -0500
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Alex Hall <alexhall@revelytix.com>, RDF Working Group WG <public-rdf-wg@w3.org>
- Message-Id: <09730824-04D0-4A49-B486-6492B6AA70F2@ihmc.us>
On May 13, 2011, at 10:00 AM, Richard Cyganiak wrote:

> On 13 May 2011, at 15:33, Alex Hall wrote:
>> It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the core RDF specs and reserve it for exchanging language-tagged literals with systems that don't support that notion. Having to deal with the extraneous '@' for literals without language tags seems like needless complexity for what should be a simple string manipulation.
>
> Strong +1. Earlier I tried to work out the changes to the spec that would be required to make rdf:PlainLiteral the unified representation of strings, and it's a bloody mess and I really don't want to go there.

I agree, but if we have to (a) include lang tags and (b) fit within the current RDF description of a datatype (which mentions a mapping from a string to a value, not from a pair to a value) then this is about the best that can be done, I think. (I was part of the debates that led to this design, and tried very hard myself to get rid of the trailing @ at the time, but couldn't find a way to do it.) Actually, I don't think it is ALL that much of a mess: one trailing @ character isn't a bone-breaker, surely, to anyone who has to take a URI apart every now and again. But it would be neater without it, for sure.

HOWEVER... I think we do have another way out. Unlike the designers of rdf:PlainLiteral, who were obliged to work within the constraints of the current RDF design, we can re-design RDF. See below.

> I kept my notes on the wiki anyways:
> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/SyntacticSugarProposal
>
>> If we're going to say that everything has a datatype, I'd prefer to see "foo" get normalized to "foo"^^xsd:string. But my reasons there are more aesthetic; it just seems wrong to single out that one particular primitive datatype and say that it should not be used.
>>
>
>> FWIW, my preferred approach would be to:
>> 1. Say that every literal has *either* a datatype *or* a language tag.
>> 2. Say that the datatype of the surface form "foo" is xsd:string.
>
> This feels weird. Ok, "foo" is of type string, even though the type is implicit, I can understand that. But why is it no longer a string if I tag it as English? Shouldn't it still have an implicit type of string?

The string itself is still a string, but the literal is not just that string, it's that string plus a tag, i.e. a pair. Which is why it (the literal rather than the string) can't be typed with xsd:string. Sigh.

But try this for size. Plain literals are a very special case, unique to RDF, and it is the language tag which makes them so special and strange. Datatypes are defined currently as mappings from a string to a value (so rdf:PlainLiteral had to smush the tag into the string, hence all the @ business). But we can define a special datatype which maps pairs into values, just for this purpose. We can even call it rdf:PlainLiteral without contradicting the current specs.

It applies to two kinds of lexical forms: strings (these will be the ones with the @ in them), and pairs of a string with a lang tag. The lang tag may be the empty tag, but still we distinguish between S and <S, empty>. Thus, every plain literal is assumed to have a lang tag in it, even when there is no @ in the syntax.

Its value space is the set of strings containing at least one '@' character, and pairs of a string and a language tag. The mapping follows the current rdf:PlainLiteral spec when applied to strings, so that "foo@en"^^rdf:PlainLiteral maps to <"foo", "en">; but in addition, it applies to current plain literal syntax, treated as being a pair of a string and a lang tag, so that "foo"@en also maps to <"foo", "en">.

Here is the complete mapping as a table:

Lexical form      value
"foo@"            "foo"
"foo@tag"         <"foo", tag>
"foo", empty      "foo"
"foo", tag        <"foo", tag>    when tag =/= empty

and the plain literal syntax is understood thus:

"foo" parses to "foo", empty
and
"foo"@tag parses to "foo", tag.
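
As an illustration only, the mapping table above can be sketched in Python. None of these names come from the specs or from this thread: LangString, value_of_string_form, value_of_pair_form and parse_plain_literal are hypothetical helpers. The sketch assumes that the string form is split at its last '@', that the empty tag is represented by the empty string, and that the surface syntax has no escape sequences.

```python
import re
from typing import NamedTuple, Tuple, Union


class LangString(NamedTuple):
    """Hypothetical value type: a string paired with a non-empty language tag."""
    text: str
    tag: str


Value = Union[str, LangString]


def value_of_string_form(lexical: str) -> Value:
    """Rows 1-2 of the table: "foo@" -> "foo", "foo@tag" -> <"foo", tag>.

    Assumption: the text/tag boundary is the LAST '@' in the string.
    """
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError("string lexical form must contain at least one '@'")
    return LangString(text, tag) if tag else text


def value_of_pair_form(text: str, tag: str = "") -> Value:
    """Rows 3-4 of the table: <"foo", empty> -> "foo", <"foo", tag> -> <"foo", tag>."""
    return LangString(text, tag) if tag else text


def parse_plain_literal(surface: str) -> Tuple[str, str]:
    """Parse the surface forms "foo" and "foo"@tag into a (string, tag) pair.

    Simplified: no escape handling; the empty tag is the empty string.
    """
    match = re.fullmatch(r'"(.*)"(?:@([A-Za-z0-9-]+))?', surface, re.DOTALL)
    if match is None:
        raise ValueError("not a plain literal: %r" % surface)
    return match.group(1), match.group(2) or ""


if __name__ == "__main__":
    for surface in ['"foo"', '"foo"@en', '"foo@"']:
        print(surface, "->", value_of_pair_form(*parse_plain_literal(surface)))
    for lexical in ["foo@", "foo@en"]:
        print(repr(lexical), "->", value_of_string_form(lexical))
```

Under these assumptions the plain literal "foo@" (pair form, empty tag) keeps its trailing '@' in the value, while the string form "foo@" maps to "foo", which is the distinction the empty-tag convention is meant to preserve.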

The reason for this empty-tag shuffle is to keep a plain literal string distinguished from the rdf:PlainLiteral string with the trailing @ added, of course. If we could ignore the current rdf:PlainLiteral specs, this would be easier and we could simply map "foo" to itself and "foo"@en to <"foo", en>. But I think the shuffling is worth doing to avoid having even more inter-spec contradictions in this area.

Advantages: Gives a type to plain literals; preserves the rdf:PlainLiteral specs (extending them, but not contradicting them); allows people to use plain literals without getting involved with the trailing @; and allows xsd:string to be deprecated in favor of plain literal syntax (or the reverse, of course).

Disadvantages: might be thought too complicated; takes the notion of type slightly outside the current RDF datatype specs.

Thoughts?

Pat

> So you have replaced one weird thing (multiple ways of representing a string) with another weird thing (a notion of string datatypes that doesn't make sense).
>
> I think the sensible way would be:
> 1) every literal has *both* a datatype and a (possibly empty) language tag;

EVERY literal? What about numbers and dates and times and ... ?

> 2) of the built-in datatypes, only xsd:string can have non-empty language tags;
> 3) plain literals and rdf:PlainLiterals don't exist;
> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
> 5) "foo"@en in concrete syntaxes is syntactic sugar for "foo"^^xsd:string@en.
>
> This *might* work better than the rdf:PlainLiteral mess when translated into spec changes, but raises BC issues, and requires changes to syntax specs to add the syntactic sugar, so I prefer the proposal that says implementations MAY unify to plain literals, as it doesn't require changes to the abstract syntax.
>
>> As long as the surface forms "foo" and "foo"^^xsd:string get normalized to the same thing (or systems have permission to do such normalization) then I'm happy.
>
> Good to hear that.
>
> Best,
> Richard

------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   mobile
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Friday, 13 May 2011 20:50:29 UTC