- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sat, 6 Apr 2013 16:48:15 -0400
- To: Ruben Verborgh <ruben.verborgh@ugent.be>
- Cc: public-rdf-comments@w3.org, Gregg Kellogg <gregg@greggkellogg.net>, gavin@carothers.name
* Ruben Verborgh <ruben.verborgh@ugent.be> [2013-04-06 22:15+0200]
> Dear all,
>
> I have been working on making my JavaScript streaming Turtle parser node-n3 [1] compatible with the CR spec tests [2].
> I’ve come across some issues with normalization that I’d like to have your feedback on.
>
> My current test setup is:
> 1. parse action file, write as N-triples, send to cwm
> 2. download correct N-triples result, send to cwm
> 3. compare both cwm outputs string-wise
>
> With this setup, I’m experiencing the following normalization issues:
> - The result of bareword_double is a bit inconvenient because it includes an uppercase E to indicate the double’s exponent,
>   instead of a lowercase e found in other tests such as turtle-subm-19 and turtle-subm-20.
>   While this is of course not wrong, it is inconvenient with parsers that normalize the exponent (to either lowercase or uppercase).
>   If I choose to normalize to uppercase, bareword_double fails. If I choose to normalize to lowercase, turtle-subm-19 and turtle-subm-20 fail.
>
> - The result of positive_numeric includes "+1"^^<http://www.w3.org/2001/XMLSchema#integer>.
>   Although correct, it is more convenient when normalized to "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
>
> - The result of numeric_with_leading_0 includes "01"^^<http://www.w3.org/2001/XMLSchema#integer>.
>   Although correct, it is more convenient when normalized to "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
>   (In that case, the result could be shared with positive_numeric.)
>
> - The result of turtle-subm-11 includes leading zeros on two lines, although the test is called “decimal integer canonicalization”.
>   I’d expect canonicalization to be applied indeed and the leading zeros removed.
>
> - The Turtle draft spec part about quoted literals [3] points to the RDF 1.1 Concepts and Abstract Syntax [4],
>   which says that the language tag must be normalized to lowercase.
>   However, this normalization does not happen in the result of “langtagged_LONG_with_subtag”, which uses @en-UK.
>
> Therefore, I wonder:
> - Would it be meaningful to change the test results to make them use normalization?

The tests check that the term generated from e.g. '"+1"^^xsd:integer' is
distinct from '1' (which is the same as '"1"^^xsd:integer').
  https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-literal-equality
This might be an opportunity to comment out some code.

> - If not, are there any suggestions to change my test setup?

If you write both as N-triples, you can use Jena's isIsomorphicWith to compare them.
  http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html#isIsomorphicWith(com.hp.hpl.jena.graph.Graph)
I use SWObjects with a command line "-d test.nt --compare ref.nt".

> Right now, the tests are difficult for parsers that apply normalization,
> i.e., you are forced to remember the initial serialization to get correct results.
> This is probably not desirable.
>
> Best regards,
>
> Ruben
>
> PS I expect to have passing EARL reports soon.
>
> [1] https://github.com/RubenVerborgh/node-n3/tree/cr-spec
> [2] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Feb/0037.html
> [3] http://www.w3.org/TR/turtle/#turtle-literals
> [4] http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#dfn-literal

-- 
-ericP
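
A minimal sketch of the distinction made in the reply above, assuming literals are compared as (lexical form, datatype, language tag) triples. The names Literal, term_equal and integer_value_equal are hypothetical and are not taken from node-n3, Jena, or SWObjects; the sketch only illustrates why "+1", "01" and "1" count as different terms even though they denote the same integer.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical literal term: the lexical form is kept exactly as parsed."""
    lexical: str
    datatype: str
    language: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Term equality: every component must match character for character,
    # so no normalization of the lexical form is applied.
    return a == b


def integer_value_equal(a: Literal, b: Literal) -> bool:
    # Value equality (assumption: both literals are well-formed xsd:integer).
    # This is the comparison a normalizing parser implicitly performs.
    if a.datatype != XSD_INTEGER or b.datatype != XSD_INTEGER:
        raise ValueError("both literals must be xsd:integer")
    return int(a.lexical) == int(b.lexical)


plus_one = Literal("+1", XSD_INTEGER)
leading_zero = Literal("01", XSD_INTEGER)
one = Literal("1", XSD_INTEGER)

print(term_equal(plus_one, one))              # distinct terms
print(integer_value_equal(plus_one, one))     # same integer value
print(integer_value_equal(leading_zero, one)) # same integer value
```

Under this reading, a parser that rewrites "+1" or "01" to "1" before emitting N-triples produces a different term from the one in the expected result, which is what the tests are meant to catch.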
Received on Saturday, 6 April 2013 20:48:48 UTC