Re: Lang and dt in the graph. Was: Dumb SPARQL query problem from Ross Horne on 2013-12-02 (public-lod@w3.org from December 2013)

From: Ross Horne <ross.horne@gmail.com>
Date: Mon, 2 Dec 2013 12:24:35 +0600
To: public-lod community <public-lod@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, Hugh Glaser <hugh@glasers.org>
Message-ID: <CAHBrK_hb-kMQkaYJkN_16Y4d+D3Bnt2wM8hfvfFbHEj_d1kE=g@mail.gmail.com>

Andy is right (as usual!). With the proposed bnode encoding, the graph
becomes fatter each time the same triple is loaded.

RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer
and the parser, as summarised by David recently:
http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html

Please don't get back into mixing up the lexer and the parser. The lexical
spaces of the basic datatypes are disjoint, so in any language we can just
write:
 - 999  instead of "999"^^xsd:integer
 - 9.99 instead of "9.99"^^xsd:decimal
 - "WWV" instead of "WWV"^^xsd:string
 - 2013-06-06T11:00:00+01:00 instead of
   "2013-06-06T11:00:00+01:00"^^xsd:dateTime

As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and
turns the characters into a token. A token consists of a string, called an
attribute value, plus a token name, e.g. "999"^^xsd:integer. Only the
handful of people who write compilers for languages should have to care
about how tokens are represented, not the end users of those languages.
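
To make the lexer's job concrete, here is a minimal sketch in Python. It
is only an illustration: the function name tokenize and the patterns are
my own assumptions (loosely Turtle-like), not taken from any specification
or existing library. It turns a bare lexeme into a pair of attribute value
and token name.

```python
import re

# Hypothetical, simplified patterns for four basic datatypes. Each lexeme
# matches at most one of them, so the order of the list does not matter.
TOKEN_PATTERNS = [
    ("xsd:dateTime", re.compile(
        r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?")),
    ("xsd:decimal", re.compile(r"[+-]?\d*\.\d+")),
    ("xsd:integer", re.compile(r"[+-]?\d+")),
    ("xsd:string", re.compile(r'"([^"\\]|\\.)*"')),
]


def tokenize(lexeme):
    """Return (attribute value, token name) for a single lexeme."""
    for name, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(lexeme):
            # Strip the surrounding quotes from strings; escape sequences
            # are left untouched in this sketch.
            value = lexeme[1:-1] if name == "xsd:string" else lexeme
            return (value, name)
    raise ValueError(f"unrecognised lexeme: {lexeme!r}")
```

For instance, tokenize("999") yields the value "999" with the token name
xsd:integer, which is exactly the pair that "999"^^xsd:integer spells out
by hand.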

For language tags, a little conventional datatype subtyping (as opposed
to rdfs:subClassOf) could help the programmer further [2]. For example, a
programmer who writes regex("WWV2013"@en, "WWV") clearly means
regex("WWV2013", "WWV") and shouldn't have to care about the distinction,
unless I am mistaken.
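
The subtyping idea can be sketched in a few lines of Python. The Literal
class and the regex function here are hypothetical names for illustration
only, and Python's re module stands in for the regular expression dialect
a real query engine would use. This is not a description of how any
particular SPARQL implementation behaves.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: a lexical form plus an optional language tag."""
    lexical: str
    lang: Optional[str] = None  # None stands for a plain xsd:string


def regex(text: Literal, pattern: str) -> bool:
    # Treat a language-tagged string as a subtype of xsd:string: the tag
    # is ignored and only the lexical form is matched.
    return re.search(pattern, text.lexical) is not None
```

Under this convention regex(Literal("WWV2013", "en"), "WWV") and
regex(Literal("WWV2013"), "WWV") give the same answer, so the programmer
never has to think about the tag.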

Regards,

Ross

[1] Ullman, Aho, Lam and Sethi. Compilers: principles, techniques and
tools. 1986
[2] Local Type Checking for Linked Data Consumers.
http://dx.doi.org/10.4204/EPTCS.123.4

Received on Monday, 2 December 2013 06:56:24 UTC