- From: Ross Horne <ross.horne@gmail.com>
- Date: Mon, 2 Dec 2013 12:24:35 +0600
- To: public-lod community <public-lod@w3.org>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, Hugh Glaser <hugh@glasers.org>
- Message-ID: <CAHBrK_hb-kMQkaYJkN_16Y4d+D3Bnt2wM8hfvfFbHEj_d1kE=g@mail.gmail.com>
Andy is right (as usual!). With the proposed bnode encoding, the graph becomes fatter each time the same triple is loaded.

RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer and the parser, as summarised by David recently:

http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html

Please don't get back into mixing up the lexer and the parser. The lexical spaces of the basic datatypes are disjoint, so in any language we can just write:

- 999 instead of "999"^^xsd:integer
- 9.99 instead of "9.99"^^xsd:decimal
- "WWV" instead of "WWV"^^xsd:string
- 2013-06-06T11:00:00+01:00 instead of "2013-06-06T11:00:00+01:00"^^xsd:dateTime

As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and turns them into a token. A token consists of a string, called an attribute value, plus a token name, e.g. "999"^^xsd:integer. Only the relatively small handful of people writing compilers for languages should have to care about how tokens are represented, not the end users of those languages.

For language tags, a little simple, conventional datatype subtyping (as opposed to rdfs:subClassOf) could help the programmer further [2]. For example, a programmer who writes regex("WWV2013"@en, "WWV") clearly meant regex("WWV2013", "WWV") and shouldn't have to care about the distinction, unless I am mistaken.

Regards,

Ross

[1] Ullman, Aho, Lam and Sethi. Compilers: principles, techniques and tools. 1986
[2] Local Type Checking for Linked Data Consumers. http://dx.doi.org/10.4204/EPTCS.123.4
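
To make the lexer point above concrete, here is a minimal Python sketch of a lexer that reads bare lexical forms and emits tokens of the "value"^^datatype shape. The names Token, TOKEN_SPEC, lex and render are hypothetical, and the regular expressions are simplified approximations of the XSD lexical spaces, not the full grammars; treat it as an illustration under those assumptions rather than a reference implementation.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    name: str   # token name, here an xsd datatype (hypothetical naming)
    value: str  # attribute value: the characters the lexer consumed


# Ordered so the most specific pattern is tried first.
# These patterns are simplified assumptions, not the complete XSD grammars.
TOKEN_SPEC = [
    ("xsd:dateTime",
     re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?"
                r"(?:Z|[+-]\d{2}:\d{2})?")),
    ("xsd:decimal", re.compile(r"[+-]?\d*\.\d+")),
    ("xsd:integer", re.compile(r"[+-]?\d+")),
    ("xsd:string", re.compile(r'"(?:[^"\\]|\\.)*"')),
]


def lex(text: str) -> List[Token]:
    """Turn a run of characters into a list of (name, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                lexeme = match.group(0)
                if name == "xsd:string":
                    # Drop the surrounding quotes; escapes are left untouched.
                    lexeme = lexeme[1:-1]
                tokens.append(Token(name, lexeme))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
    return tokens


def render(token: Token) -> str:
    """Show a token in the long-hand form that end users should not need to write."""
    return f'"{token.value}"^^{token.name}'


if __name__ == "__main__":
    for tok in lex('999 9.99 "WWV" 2013-06-06T11:00:00+01:00'):
        print(render(tok))
```

The user writes only the left-hand forms from the list above; the token representation stays an internal concern of the lexer, which is the separation argued for in the message.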
Received on Monday, 2 December 2013 06:56:24 UTC