Re: Test case identification

* Boris Villazón Terrazas <bvillazon@fi.upm.es> [2011-03-08 02:46+0100]
> Hi Bob
> 
> First of all, thank you very much for starting to implement the Test
> Cases ... ;-)
> 
> Second of all, apologies, but last week I was quite busy with
> some project deadlines.
> 
> Third of all, I would like to announce that Alex is back in Madrid,
> and he will join the test case definition team.
> 
> Finally, my comments inline
> 
> On 07/03/2011 17:10, Robert Scanlon wrote:
> >Hi all,
> >
> >We've started implementing some of the test cases.  We very much
> >like the general approach of having multiple unique databases, and
> >attaching mapping test cases to those; and in general, the use of
> >short unique identifiers for test cases ('tc0000a', etc).
> >However, coming at it from a perspective of large-scale automation
> >as well as browsing and understandability, we do have some
> >concerns about how both the data sets and test cases are
> >identified.  Is there a chance that the WG might consider some
> >minor tweaks?
> Yes, of course ... if everyone agrees
> >The 2 main concerns are:
> >
> >   1. there's no explicit short, sortable unique identifier for
> >      databases (schemas or 'data sets')
> >
> You mean, something like database1, database2, etc?
> >
> >   2. the test case identifiers are not consistent and scalable
> >
> It is our first approach, and we know we can improve it ... thanks
> for pointing this out.
> >
> >
> >_*Summary*_
> >
> >To address these, an identification system similar to the one
> >below is proposed.  Note that the organization is identical to
> >what's there now, it is simply the identifiers (and associated
> >file names using those identifiers) that are changed.  Details are
> >below for those who want more background.
> >
> >*D000* - Simple 1 table, 1 column database; empty.
> >    TC000-G000 - Direct graph - map_g000.ttl, ...
> >    TC000-R000 - <description> - map_r000.ttl, ...
> >    TC000-R001 - <description> - map_r001.ttl, ...
> >    ...
> >*D001* - Simple 1 table, 1 column database; 1 record.
> >    TC001-G000 - Direct graph - map_g000.ttl, ...
> >    TC001-R000 - <description> - map_r000.ttl, ...
> >    TC001-R001 - <description> - map_r001.ttl, ...
> >    ...
> >*D002* - Simple 1 table, 2 columns database; 1 record.
> >    TC002-G000 - Direct graph - map_g000.ttl, ...
> >    TC002-R000 - <description> - map_r000.ttl, ...
> >    TC002-R001 - <description> - map_r001.ttl, ...
> >    TC002-R002 - <description> - map_r002.ttl, ...
> >    TC002-R003 - <description> - map_r003.ttl, ...
> >  ...
> >
> >I show capital letters above, which is appropriate for use in
> >docs, but in directories and file names, all identifiers should be
> >lowercase.
> >
> >
> Ok, I'll check it and I'll modify them.
> >*_Background, Item 1_*
> >
> >For the first item, currently the names are 'munged' short
> >descriptions, such as *1table0rows*,
> >*1table1compositeprimarykey3columns1row*, etc.  These can get
> >quite long, but more importantly, they don't sort nicely in
> >directory/file listings (such as under
> >dvcs.w3.org/hg/rdb2rdf-tests/file
> ><https://dvcs.w3.org/hg/rdb2rdf-tests/file/92f2b4b0c9ca>).
> >Besides file system sorting, the names begin with numbers, which
> >keeps them from being used in any way as part of a database schema
> >(for some/many systems).
> These short descriptions were suggested by Eric ;) ... but I see the
> problem ...
> Eric, do you want to say something about this?

In my experience, descriptive test names save a human lookup and cut
down on test redundancy (which means you can focus more on coverage).
The names are never perfect as they can only capture a subset of the
interesting characteristics, but in e.g. SPARQL, I found the
descriptively-named tests more useful than the ones given arbitrary
names.

I find descriptive filenames even more helpful as they encourage us to
re-use files for multiple tests. I won't lie down in the road over
this, but I do find descriptive names more helpful than harmful.

The leading numbers are an interesting point. I continually wrestle
with XML's adoption of the lexical name restrictions in common
programming languages (most notably, no leading digits) and hadn't
even thought of it being a pain here.
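
To make the leading-digit problem concrete, here is a minimal sketch
in Python. The regular expression is an assumption: a conservative
"letter or underscore first" rule that many SQL dialects and
programming languages apply to unquoted names, not the rule of any
particular system. The function name usable_unquoted is hypothetical.

```python
import re

# Assumption: a conservative identifier rule (letter or underscore first,
# then letters, digits, or underscores). Real systems differ in detail.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def usable_unquoted(name: str) -> bool:
    """Return True if the whole name matches the conservative rule above."""
    return IDENTIFIER.fullmatch(name) is not None


# Two of the current descriptive names and two of the proposed identifiers.
names = ["1table0rows", "1table1compositeprimarykey3columns1row", "d000", "d001"]
for name in names:
    print(name, usable_unquoted(name))
```

Under that rule the descriptive names are rejected only because of
their first character, so a descriptive name with a leading letter
would pass the same check.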


> >I would suggest we use a unique identifier such as 'D000', 'D001',
> >etc, for 'data' or 'database'.  This addresses all the issues
> >raised above.
> >
> >[Note that while I say there's no _explicit_ sortable identifier
> >for data sets, there does seem to be an /implicit/ one, bound into
> >the names of all test cases under a data set.  E.g., tc0000,
> >tc0000a, tc0000b, etc are all tied to the first data set (which
> >one could infer had an identifier along the lines of '0000').
> >Similarly for tc0001, etc.  So really I'm just suggesting that the
> >data set identifier be made explicit (and start with an alpha
> >character).  It can be 4 digits in length (as the test cases
> >currently are), but 3 seems sufficient (maybe even 2; too many
> >unique data sets will make test case management more difficult).]
> >
> >
> Ok, I see ... if everyone agrees I'll update the test cases.
> any comments?
> 
> >*_Background, Item 2_*
> >
> >For the second item, the test case (mapping) identifiers have a
> >couple variations depending on where you look, are not explicitly
> >tied to their parent data set, and seem limited to 26
> >mappings/tests per database.
> >
> >As far as variations, in the wiki
> >(http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1), some
> >of the test cases have an 'Id' property, with values such as
> >R2RMLTC0001a, R2RMLTC0001b, etc.  (Not all do, and some have a
> >header with a space, such as 'Direct Graph TC0000').  Whereas the
> >'official' test case spec
> >(http://www.w3.org/2001/sw/rdb2rdf/test-cases/) has similar names,
> >but no explicit 'Id', and the R2RML headers have a space like the
> >Direct Graph ones.  It would be nice to make these Id's
> >consistent, and short, e.g., use a code letter instead of 'Direct
> >Graph' and 'R2RML' in the Id.
> >
> >As far as links to database/data set, because there are no
> >explicitly defined data set Id's, one has to infer that the
> >'0000', '0001', etc parts of these are tying them to the (not
> >explicitly identified) data sets.  That can be addressed by adding
> >explicit data set Id's as suggested above.
> >
> >As far as scalability, the use of the letters 'a', 'b', 'c', etc
> >for the individual mappings/test cases attached to each data set
> >seems to limit the number of test cases to 26 per database.  That
> >seems insufficient.
> >
> >To resolve these, I'd suggest we uniquely identify each mapping
> >test case as follows:
> >
> >test case id = TC<data set #>-<mapping id>
> >
> >where <mapping id> would be local to a data set, with values G000,
> >G001, etc, for direct graph mappings and R000, R001, R002, etc for
> >R2RML mappings.  (I'm assuming that there *could* eventually be
> >more than 1 direct mapping test case per data set, based on
> >configuration; otherwise the 'G' and 'R' for the mapping part
> >would not be needed, with '000' representing the direct graph,
> >like it is now).  As with the data set Id, the # of digits can
> >probably safely be 3 (or perhaps 2, though that *could* come up
> >short in some cases).
> >
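
The proposed "TC<data set #>-<mapping id>" layout could be built and
parsed roughly as follows. This is only a sketch in Python: the
helper names are hypothetical, and it assumes the 3-digit variant
with 'G' for direct graph and 'R' for R2RML mappings.

```python
import re

# Assumption: 3-digit data set number, then 'G' or 'R' and a 3-digit
# mapping number, as in the proposal (e.g. TC002-R003).
TEST_CASE_ID = re.compile(r"TC(\d{3})-([GR])(\d{3})")


def make_test_case_id(dataset: int, kind: str, mapping: int) -> str:
    """Build an id such as 'TC002-R003' from its numeric parts."""
    if kind not in ("G", "R"):
        raise ValueError(f"unknown mapping kind: {kind!r}")
    if not (0 <= dataset <= 999 and 0 <= mapping <= 999):
        raise ValueError("dataset and mapping must fit in 3 digits")
    return f"TC{dataset:03d}-{kind}{mapping:03d}"


def parse_test_case_id(test_case_id: str) -> tuple[str, str]:
    """Split an id into (data set id, local mapping id), case-insensitively."""
    match = TEST_CASE_ID.fullmatch(test_case_id.upper())
    if match is None:
        raise ValueError(f"not a test case id: {test_case_id!r}")
    return f"D{match.group(1)}", f"{match.group(2)}{match.group(3)}"


print(make_test_case_id(2, "R", 3))
print(parse_test_case_id("tc002-r003"))
```

Parsing upper-cases its input so that the lowercase form used in
directories and file names resolves to the same data set and mapping.
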
> >Ideally, the mapping files for any given data set would be named
> >using the local mapping ID, e.g., map_g000.ttl, map_r000.ttl,
> >map_r001.ttl, etc, so they can be unambiguously linked to spec
> >definitions and so they (and output filenames generated from them)
> >sort nicely within the context of a given data set's directory.
> >
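
As a small illustration of the sorting point, the sketch below
derives lowercase file names from local mapping ids and sorts them.
It is in Python; mapping_filename is a hypothetical helper, and the
"map_" prefix and ".ttl" extension are taken from the examples in
the proposal.

```python
def mapping_filename(mapping_id: str, extension: str = "ttl") -> str:
    """Turn a local mapping id such as 'R001' into 'map_r001.ttl'."""
    return f"map_{mapping_id.lower()}.{extension}"


# Deliberately unordered input; the ids themselves are made up for the demo.
mapping_ids = ["R010", "G000", "R002", "R000", "R001"]
filenames = sorted(mapping_filename(m) for m in mapping_ids)
print(filenames)

# Without zero padding, plain string sorting puts 10 before 2.
unpadded = sorted(["map_r2.ttl", "map_r10.ttl"])
print(unpadded)
```

The zero padding is what keeps r002 ahead of r010 in an ordinary
directory listing; the unpadded names sort the other way round.
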
> >
> Ok, I see ... if everyone agrees I'll update the test cases.
> any comments?
> 
> Thank you very much for your suggestions Bob
> 
> Boris
> 
> >Thanks,
> >Bob Scanlon
> >Revelytix
> >
> 

-- 
-ericP

Received on Tuesday, 8 March 2011 03:44:34 UTC