Re: normalization issues with Turtle spec tests

* Ruben Verborgh <ruben.verborgh@ugent.be> [2013-04-06 22:15+0200]
> Dear all,
> 
> I have been working on making my JavaScript streaming Turtle parser node-n3 [1] compatible with the CR spec tests [2].
> I’ve come across some issues with normalization that I’d like to have your feedback on.
> 
> My current test setup is:
> 1. parse action file, write as N-triples, send to cwm
> 2. download correct N-triples result, send to cwm
> 3. compare both cwm outputs string-wise
> 
> With this setup, I’m experiencing the following normalization issues:
> - The result of bareword_double is a bit inconvenient because it includes an uppercase E to indicate the double’s exponent,
>    instead of a lowercase e found in other tests such as turtle-subm-19 and turtle-subm-20.
>    While this is of course not wrong, it is inconvenient with parsers that normalize the exponent (to either lowercase or uppercase).
>    If I choose to normalize to uppercase, bareword_double fails. If I choose to normalize to lowercase, turtle-subm-19 and turtle-subm-20 fail.
>    
> - The result of positive_numeric includes "+1"^^<http://www.w3.org/2001/XMLSchema#integer>.
>    Although correct, it is more convenient when normalized to "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
> 
> - The result of numeric_with_leading_0 includes "01"^^<http://www.w3.org/2001/XMLSchema#integer>
>    Although correct, it is more convenient when normalized to "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
>    (In that case, the result could be shared with positive_numeric.)
> 
> - The result of turtle-subm-11 includes leading zeros on two lines, although the test is called “decimal integer canonicalization”.
>   I’d expect canonicalization to be applied indeed and the leading zeros removed.
> 
> - The Turtle draft spec part about quoted literals [3] points to the RDF 1.1 Concepts and Abstract Syntax [4],
>    which says that the language tag must be normalized to lowercase.
>    However, this normalization does not happen in the result of “langtagged_LONG_with_subtag”, which uses @en-UK.
> 
> Therefore, I wonder:
> - Would it be meaningful to change the test results to make them use normalization?

The tests check that the term generated from e.g. '"+1"^^xsd:integer' is distinct from '1' (which is the same as '"1"^^xsd:integer').
  https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-literal-equality
This might be an opportunity to comment out some normalization code in your parser.
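
To illustrate the distinction, here is a minimal Python sketch. The Literal class and the same_term and same_value helpers are hypothetical names made up for this example, not part of any parser or of the spec. It assumes literal term equality means identical lexical form, datatype IRI and language tag, and it only handles xsd:integer for the value comparison.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal term."""
    lexical: str                 # lexical form exactly as written
    datatype: str                # datatype IRI
    lang: Optional[str] = None   # language tag, if any


def same_term(a: Literal, b: Literal) -> bool:
    # Term equality: every component must match character for character.
    return a == b


def same_value(a: Literal, b: Literal) -> bool:
    # Value equality, sketched for xsd:integer only: compare parsed integers.
    if a.datatype == b.datatype == XSD_INTEGER:
        return int(a.lexical) == int(b.lexical)
    return same_term(a, b)


plus_one = Literal("+1", XSD_INTEGER)
leading_zero = Literal("01", XSD_INTEGER)
one = Literal("1", XSD_INTEGER)

print(same_term(plus_one, one))       # False: the lexical forms differ
print(same_value(plus_one, one))      # True: both denote the integer 1
print(same_value(leading_zero, one))  # True: both denote the integer 1
```

A parser that rewrites "+1" or "01" to "1" preserves the value but produces a different term, which is what these tests are meant to catch.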


> - If not, are there any suggestions to change my test setup?

If you write both as N-triples, you can use Jena's isIsomorphicWith to compare them.
  http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html#isIsomorphicWith(com.hp.hpl.jena.graph.Graph)
I use SWObjects with the command-line arguments "-d test.nt --compare ref.nt".
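
If neither tool is at hand, a rough substitute for graphs without blank nodes is to compare the two N-Triples files as sets of lines. The Python sketch below is only that: triple_lines and same_graph_without_bnodes are hypothetical helper names, this is not a real isomorphism check (blank node labels would need to be matched up), and it assumes both files were written with the same spacing and escaping conventions.

```python
def triple_lines(ntriples_text: str) -> set:
    """Collect the non-empty, non-comment lines of an N-Triples document."""
    lines = set()
    for raw in ntriples_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        lines.add(line)
    return lines


def same_graph_without_bnodes(actual: str, expected: str) -> bool:
    # A graph is a set of triples, so ordering and duplicates do not matter.
    return triple_lines(actual) == triple_lines(expected)


actual = '<http://a.example/s> <http://a.example/p> "1" .\n'
expected = '# reference result\n\n<http://a.example/s> <http://a.example/p> "1" .\n'
print(same_graph_without_bnodes(actual, expected))  # True
```

Because the lines are compared as written, a literal serialized as "+1" still differs from "1", which matches what the tests expect.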


> Right now, the tests are difficult for parsers that apply normalization,
> i.e., you are forced to remember the initial serialization to get correct results.
> This is probably not desirable.
> 
> Best regards,
> 
> Ruben
> 
> PS I expect to have passing EARL reports soon.
> 
> [1] https://github.com/RubenVerborgh/node-n3/tree/cr-spec
> [2] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Feb/0037.html
> [3] http://www.w3.org/TR/turtle/#turtle-literals
> [4] http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#dfn-literal

-- 
-ericP

Received on Saturday, 6 April 2013 20:48:48 UTC