
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem

From: Ross Horne <ross.horne@gmail.com>
Date: Mon, 2 Dec 2013 12:24:35 +0600
Message-ID: <CAHBrK_hb-kMQkaYJkN_16Y4d+D3Bnt2wM8hfvfFbHEj_d1kE=g@mail.gmail.com>
To: public-lod community <public-lod@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, Hugh Glaser <hugh@glasers.org>

Andy is right (as usual!). With the proposed bnode encoding, the graph
becomes fatter each time the same triple is loaded, because every load mints
fresh blank nodes that never merge with the ones already in the graph.
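To make the growth concrete, here is a minimal sketch that models a graph as
a Python set of triples. The blank-node encoding below (a fresh node carrying
hypothetical `:value` and `:datatype` properties) is an assumption for
illustration, not the exact encoding proposed in the thread. Ordinary literal
triples are deduplicated by set semantics; the bnode-encoded ones are not,
because each reload mints new blank-node labels.

```python
import itertools

# Hypothetical blank-node labels: each call mints a new one, as a parser
# does for every blank node it reads from a document.
_counter = itertools.count()

def fresh_bnode():
    return f"_:b{next(_counter)}"

def load_literal(graph, s, p, lexical, datatype):
    """Load one triple whose object is an ordinary typed literal."""
    graph.add((s, p, (lexical, datatype)))

def load_bnode_encoded(graph, s, p, lexical, datatype):
    """Load the same statement with the literal split across a blank node.
    The ':value' and ':datatype' predicates are hypothetical."""
    b = fresh_bnode()
    graph.add((s, p, b))
    graph.add((b, ":value", lexical))
    graph.add((b, ":datatype", datatype))

literal_graph, bnode_graph = set(), set()
for _ in range(3):  # load the "same" data three times
    load_literal(literal_graph, ":x", ":count", "999", "xsd:integer")
    load_bnode_encoded(bnode_graph, ":x", ":count", "999", "xsd:integer")

print(len(literal_graph))  # 1: identical triples merge
print(len(bnode_graph))    # 9: three fresh blank nodes, three triples each
```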

RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer
and the parser, as summarised by David recently:

> Please don't get back into mixing up the lexer and the parser. The lexical
> spaces of the basic datatypes are disjoint, so in any language we can just
> write:
>  - 999  instead of "999"^^xsd:integer
>  - 9.99 instead of "9.99"^^xsd:decimal
>  - "WWV" instead of "WWV"^^xsd:string
>  - 2013-06-06T11:00:00+01:00 instead of

As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and
turns the characters into a token. A token consists of a string, called an
attribute value, plus a token name, e.g. "999"^^xsd:integer. Only a
relatively small number of people who write compilers should have to care
about how tokens are represented; end users of a language should not.
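The division of labour can be sketched with a toy lexer that turns bare
literal shorthand into tokens, each pairing an attribute value (the lexical
form) with a token name (the datatype). The regular expressions are
simplified stand-ins, not the actual Turtle or SPARQL grammar productions,
and the `Token` type and `lex` function are hypothetical.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    value: str  # attribute value: the lexical form
    name: str   # token name: here, the datatype

# Simplified, hypothetical patterns. They are mutually exclusive (digits only,
# digits with a dot, quoted text, date-time shape), so check order is irrelevant.
PATTERNS = [
    (re.compile(r'[+-]?\d+'), "xsd:integer"),
    (re.compile(r'[+-]?\d*\.\d+'), "xsd:decimal"),
    (re.compile(r'"([^"]*)"'), "xsd:string"),
    (re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?'),
     "xsd:dateTime"),
]

def lex(text: str) -> Token:
    for pattern, datatype in PATTERNS:
        m = pattern.fullmatch(text)
        if m:
            # For strings, keep only the characters between the quotes.
            value = m.group(1) if datatype == "xsd:string" else text
            return Token(value, datatype)
    raise ValueError(f"unrecognised literal: {text!r}")

for src in ['999', '9.99', '"WWV"', '2013-06-06T11:00:00+01:00']:
    tok = lex(src)
    print(f'{src:28} -> "{tok.value}"^^{tok.name}')
```

The end user only ever writes the left-hand column; the typed-literal form on
the right is the lexer's internal business.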

For language tags, a little simple, conventional datatype subtyping (as
opposed to rdfs:subClassOf) could help the programmer further [2]. For
example, a programmer who writes regex("WWV2013"@en, "WWV") clearly meant
regex("WWV2013", "WWV") and shouldn't have to care about the distinction,
unless I am mistaken.
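A minimal sketch of the kind of conventional subtyping meant here, assuming
language-tagged strings are modelled as a subclass of plain strings, so any
function typed against strings also accepts tagged ones. The class names and
the `regex` helper are hypothetical and only mimic the shape of the SPARQL
function; this is ordinary nominal subtyping, not necessarily the system
described in [2].

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class XsdString:
    lexical: str

@dataclass(frozen=True)
class LangString(XsdString):
    # A language-tagged string has everything a plain string has, plus a tag
    # that string functions are free to ignore.
    lang: str

def regex(text: XsdString, pattern: str) -> bool:
    """Hypothetical string function typed against the supertype only."""
    return re.search(pattern, text.lexical) is not None

print(regex(XsdString("WWV2013"), "WWV"))         # True
print(regex(LangString("WWV2013", "en"), "WWV"))  # True: subtype accepted
```

Python does not enforce the annotation at runtime, but a static checker such
as mypy accepts the second call precisely because LangString is declared a
subtype of XsdString.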



[1] Ullman, Aho, Lam and Sethi. Compilers: Principles, Techniques, and
Tools. 1986
[2] Local Type Checking for Linked Data Consumers. http:/
Received on Monday, 2 December 2013 06:56:24 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:22:00 UTC