csv/tsv lexical forms from Gregory Williams on 2011-10-16 (public-rdf-dawg@w3.org from October to December 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Sun, 16 Oct 2011 16:31:30 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <0DAA41DE-AD2D-4E03-A7C6-4E5132C37BC0@evilfunhouse.com>

Speaking of datatype lexical forms, I'd also like to suggest that we need some changes to the csv test data and results. Since CSV is a lossy format, there's no way to do value-based comparisons of numeric literals in test results. For example, csv-tsv-res/csvtsv03.csv contains this row:

http://example.org/s6,http://example.org/p6,1.0e6

Since the CSV format doesn't indicate that the "1.0e6" value started out as an xsd:double, it's impossible to tell if the test should succeed if the query results contain:

http://example.org/s6,http://example.org/p6,1.0E6

(with capitalized "E"). If the data is numeric (as it is in this case), then the difference between "e" and "E" between expected and actual results should be fine. If, however, the data was a plain literal that just looked like an xsd:double value, then the difference should mean that the test fails. The trouble is there's no way to distinguish these cases.
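To make the ambiguity concrete, here is a minimal Python sketch of the two comparison modes a test harness would need. The function name cells_match and the numeric flag are hypothetical, invented for illustration; the flag stands in for exactly the datatype information that CSV does not carry.

```python
def cells_match(expected: str, actual: str, numeric: bool) -> bool:
    """Compare two CSV cells.

    `numeric` is a hypothetical flag: it represents datatype knowledge
    that the CSV results format itself does not provide.
    """
    if numeric:
        # Value-based comparison: "1.0e6" and "1.0E6" are the same double.
        return float(expected) == float(actual)
    # Plain literal: only an exact lexical match counts.
    return expected == actual


# The same pair of cells passes or fails depending on the missing flag.
print(cells_match("1.0e6", "1.0E6", numeric=True))   # True
print(cells_match("1.0e6", "1.0E6", numeric=False))  # False
```

Without the flag, a harness has to pick one mode for every cell, and either choice is wrong for some data.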

I think we should update the data and expected results files to use the canonical lexical form for the double value (using capitalized "E"). That way the test can pass both for implementations that pass lexical values through unchanged and for those that canonicalize data on input. I think we should also consider marking this test (via an mf:requires triple) as requiring that an implementation either canonicalize lexical values or leave lexical forms untouched.
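As a rough illustration of what canonicalizing on input could look like, the following Python sketch rewrites an xsd:double lexical form into the canonical shape described in XML Schema Datatypes (one digit before the decimal point, at least one after, a capital "E", and an exponent with no plus sign or leading zeros). The function name canonical_double is hypothetical, and the sketch assumes that Python's shortest round-trip repr of a float is an acceptable choice of mantissa digits.

```python
import math
from decimal import Decimal


def canonical_double(lexical: str) -> str:
    """Return a canonical-style xsd:double lexical form (illustrative sketch)."""
    value = float(lexical)
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "INF" if value > 0 else "-INF"
    if value == 0:
        # Preserve the sign of zero.
        return "-0.0E0" if math.copysign(1.0, value) < 0 else "0.0E0"

    # repr() gives the shortest digit string that round-trips the double;
    # Decimal splits it into sign, digits, and a base-10 exponent.
    sign, digits, exponent = Decimal(repr(value)).as_tuple()
    text = "".join(str(d) for d in digits)
    sci_exponent = len(text) - 1 + exponent
    fraction = text[1:].rstrip("0") or "0"
    return f"{'-' if sign else ''}{text[0]}.{fraction}E{sci_exponent}"


print(canonical_double("1.0e6"))  # 1.0E6
print(canonical_double("1.0E6"))  # 1.0E6
```

With the test data already in this form, an implementation that canonicalizes and one that passes lexical forms through unchanged would both emit the same string for the row above.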

.greg

Received on Sunday, 16 October 2011 20:32:32 UTC