RE: [epubcheck] naming and issues conventions from Siegman, Tzviya - Hoboken on 2017-06-14 (public-epub3@w3.org from June 2017)

From: Siegman, Tzviya - Hoboken <tsiegman@wiley.com>
Date: Wed, 14 Jun 2017 14:38:33 +0000
To: Wolfgang Schindler <ws.schindler@googlemail.com>, Charles LaPierre <charlesl@benetech.org>
CC: Romain <rdeltour@gmail.com>, "public-epub3@w3.org" <public-epub3@w3.org>
Message-ID: <SN1PR0201MB16151AABBC76DBAA022AD3D2D5C30@SN1PR0201MB1615.namprd02.prod.outlook.>
Hi Wolfgang,

Thank you for your feedback. Garth has already suggested that prefixes such as “OPF” will likely be irrelevant once he has proposed a proper site structure.

Part of the proposed cleanup is exactly what you’ve suggested: ensuring that each test tests only one thing. See [1] for more details.

[1] https://github.com/IDPF/epubcheck/wiki/TestSuiteCleanup#cleanup-for-each-test


Tzviya Siegman
Information Standards Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: Wolfgang Schindler [mailto:ws.schindler@googlemail.com]
Sent: Wednesday, June 14, 2017 9:36 AM
To: Charles LaPierre
Cc: Romain; Siegman, Tzviya - Hoboken; public-epub3@w3.org
Subject: Re: [epubcheck] naming and issues conventions

Many thanks, Tzviya!

Garth is defining a function-oriented directory structure, which should be the first and most important guide when you are trying to find a specific test case. The file names of the test EPUBs should be as descriptive as possible, but I'm afraid that file names that are too long and complicated are rather hard to read. Would we really still need the prefix in capital letters once a meaningful directory structure has been established?

Wouldn't a lookup table or an index based on the descriptive texts also help to search the test cases efficiently?
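
The lookup table suggested here could start as a simple in-memory index from description to file names, which also surfaces descriptions shared by several files (as with issue265.epub and issue265b.epub below). A minimal Java sketch: where the descriptions come from (commit messages or a separate metadata file) is left open, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: an index of test EPUBs keyed by their descriptive text. */
public class TestCaseIndex {

    // Description -> file names carrying that description (TreeMap for stable ordering).
    private final Map<String, List<String>> byDescription = new TreeMap<>();

    public void add(String fileName, String description) {
        byDescription.computeIfAbsent(description, d -> new ArrayList<>()).add(fileName);
    }

    /** Case-insensitive substring search over all descriptions. */
    public List<String> search(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byDescription.entrySet()) {
            if (e.getKey().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    /** Descriptions used by more than one file: candidates for a closer look. */
    public Map<String, List<String>> sharedDescriptions() {
        Map<String, List<String>> shared = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byDescription.entrySet()) {
            if (e.getValue().size() > 1) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        TestCaseIndex index = new TestCaseIndex();
        String description = "Add checks for duplicate ZIP entries. Fixes issue 265";
        index.add("30/epub/invalid/issue265.epub", description);
        index.add("30/epub/invalid/issue265b.epub", description);
        System.out.println(index.search("duplicate zip"));
        System.out.println(index.sharedDescriptions().keySet());
    }
}
```

Flagging shared descriptions is one cheap way to find cases like issue265/issue265b before renaming anything.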

I've found a test case, src/test/resources/30/epub/invalid/issue265.epub in the epubcheck repository (https://github.com/IDPF/epubcheck/tree/master/src/test/resources/30/epub/invalid), that is described as "Add checks for duplicate ZIP entries. Fixes issue 265". It is valid according to epubcheck 4.0.2, although it is in the "invalid" directory, and I can't find any duplicate ZIP entry in the EPUB file. The issue tracker entry for #265 leads on to other test cases with different numbers. But there is also src/test/resources/30/epub/invalid/issue265b.epub, with the same description, which does indeed contain a duplicate content document and is invalid. Validating this file produces the following output:

Using the EPUB 3.0.1 checks
WARNING(OPF-003): issue265b.epub/issue265b.epub(-1,-1): The file 'EPUB/loremA?.xhtml' exists in the EPUB but was not declared in the OPF file. (XHTML not declared in OPF, which is wrong!)
WARNING(PKG-012): issue265b.epub/EPUB/loremA?.xhtml(-1,-1): File name contains the following non-ASCII characters: '?'. Try changing the file name! (encoding of file name as UTF-8 not recognized)
WARNING(OPF-061): issue265b.epub/issue265b.epub(-1,-1): Duplicate file in the EPUB archive (after Unicode NFC normalization): '%1$f' :f != java.lang.String (duplicate content doc!)
WARNING(PKG-012): issue265b.epub/EPUB/loremÁ.xhtml(-1,-1): File name contains the following non-ASCII characters: 'Á'. Try changing the file name! (encoding of file name as UTF-8 not recognized)
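
The OPF-061 warning reports two archive entries that become identical after Unicode NFC normalization. The 'A?' in the output suggests (an assumption, not checked against the file) that one entry name uses a plain "A" followed by a combining acute accent (U+0301) and the other the precomposed "Á" (U+00C1). The Java sketch below shows the kind of check that message describes, using java.text.Normalizer; it is not epubcheck's implementation, and the class name and the "EPUB/package.opf" entry are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: find archive entry names that collide after Unicode NFC normalization. */
public class NfcCollisions {

    /** Groups of original entry names that share the same NFC form. */
    public static List<List<String>> findCollisions(List<String> entryNames) {
        // LinkedHashMap keeps the groups in first-seen order.
        Map<String, List<String>> byNfc = new LinkedHashMap<>();
        for (String name : entryNames) {
            String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
            byNfc.computeIfAbsent(nfc, k -> new ArrayList<>()).add(name);
        }
        List<List<String>> collisions = new ArrayList<>();
        for (List<String> group : byNfc.values()) {
            if (group.size() > 1) {
                collisions.add(group);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "EPUB/package.opf",
                "EPUB/lorem\u00C1.xhtml",   // precomposed: LATIN CAPITAL LETTER A WITH ACUTE
                "EPUB/loremA\u0301.xhtml"); // decomposed: "A" + COMBINING ACUTE ACCENT
        // The two lorem entries differ code point by code point but are equal after NFC.
        for (List<String> group : findCollisions(entries)) {
            System.out.println("Duplicate after NFC: " + group);
        }
    }
}
```

Both names in the reported group print the same on screen, which is exactly why such a test file is hard to understand by just looking at its contents.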

I have described this case in more detail because I think it demonstrates several issues:

  *   It is not always obvious what a test case is meant to test: an invalid test case that validates successfully, two cases with the same descriptive text, issues that you can't see in the EPUB contents, etc. By the way, this is definitely not meant as criticism! I think we need some guidance from @Romain or @Tobias to define a procedure for such cases, because renaming a test case presupposes a proper understanding of what it is for.
  *   The warnings in my example show that there are at least two issues: OPF (duplicate content docs) vs. PKG (issues with UTF-8 support). It would make sense to define "pure" test cases that each cover only one issue. Would that mean that, in this case, "issue265.epub" and "issue265b.epub" should both be deleted, and two new EPUBs, "content-docs-with-same-file-name.epub" and "content-doc-with-Unicode-file-name.epub", should be created and uploaded?
  *   Ideally, we would have step-by-step instructions and could discuss any problems with a specific EPUB as an issue on GitHub.
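
To make the "one issue per test case" idea checkable, one could list the message IDs a test EPUB triggers. A minimal Java sketch, assuming the output line shape shown above (SEVERITY(GROUP-NNN): location: message) and defining "pure" loosely as "exactly one distinct message ID"; the class and that definition are hypothetical, not an agreed convention.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: which message IDs does one test EPUB trigger? */
public class MessageGroups {

    // Assumed line shape, taken from the output above: SEVERITY(GROUP-NNN): location: text
    private static final Pattern MESSAGE = Pattern.compile("^([A-Z]+)\\(([A-Z]+)-(\\d+)\\):");

    /** Distinct message IDs such as "OPF-003"; non-message lines are ignored. */
    public static Set<String> messageIds(List<String> lines) {
        Set<String> ids = new TreeSet<>();
        for (String line : lines) {
            Matcher m = MESSAGE.matcher(line);
            if (m.find()) {
                ids.add(m.group(2) + "-" + m.group(3));
            }
        }
        return ids;
    }

    /** Distinct ID prefixes such as "OPF" and "PKG". */
    public static Set<String> groups(List<String> lines) {
        Set<String> groups = new TreeSet<>();
        for (String id : messageIds(lines)) {
            groups.add(id.substring(0, id.indexOf('-')));
        }
        return groups;
    }

    /** Assumed definition: a "pure" test case reports exactly one distinct message ID. */
    public static boolean isPure(List<String> lines) {
        return messageIds(lines).size() == 1;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList(
                "Using the EPUB 3.0.1 checks",
                "WARNING(OPF-003): issue265b.epub/issue265b.epub(-1,-1): (message text)",
                "WARNING(PKG-012): issue265b.epub/EPUB/loremA?.xhtml(-1,-1): (message text)",
                "WARNING(OPF-061): issue265b.epub/issue265b.epub(-1,-1): (message text)",
                "WARNING(PKG-012): issue265b.epub/EPUB/loremÁ.xhtml(-1,-1): (message text)");
        System.out.println(messageIds(output) + " " + groups(output) + " pure=" + isPure(output));
    }
}
```

For the issue265b.epub output above, this reports OPF-003, OPF-061 and PKG-012 across the OPF and PKG groups, so the file would not count as "pure" under that definition.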
Have a nice afternoon!

Best,
Wolfgang

2017-06-13 23:22 GMT+02:00 Charles LaPierre <charlesl@benetech.org>:
Thanks for the clarification, Romain. Then yes, this sounds good, and I agree with Vincent that adding a category at the beginning might be a good idea as well: "strong prefix for categorizing files: OPF, CONTENT, NAV, CONTAINER…"


Thanks
EOM

Charles LaPierre
Technical Lead, DIAGRAM and Born Accessible
E-mail: charlesl@benetech.org
Twitter: @CLaPierreA11Y
Skype: charles_lapierre
Phone: 650-600-3301


On Jun 13, 2017, at 1:36 PM, Romain <rdeltour@gmail.com> wrote:


On 13 Jun 2017, at 17:32, Charles LaPierre <charlesl@benetech.org> wrote:

For unit tests, the convention is that every test must start with the word "test" in lowercase. Some testing environments require this, I believe.
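
Name-based discovery is what makes the lowercase prefix matter in some frameworks: JUnit 3, for example, treats public, no-argument void methods whose names start with "test" as tests (JUnit 4 and later use the @Test annotation instead). A small reflection-based sketch that roughly follows that rule; it is not JUnit's code, and ExampleTests is a hypothetical class.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch of name-based test discovery, in the style of JUnit 3. */
public class PrefixDiscovery {

    /** Public, no-argument, void methods whose names start with lowercase "test". */
    public static List<String> testMethodNames(Class<?> testClass) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getMethods()) { // public methods, including inherited ones
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class) {
                names.add(m.getName());
            }
        }
        names.sort(null); // getMethods() returns methods in no particular order
        return names;
    }

    /** Hypothetical test class for illustration. */
    public static class ExampleTests {
        public void testDuplicateZipEntries() { }
        public void testUnicodeFileName() { }
        public void helperNotATest() { } // no "test" prefix: ignored
        public void TestWrongCase() { }  // capital "T": ignored
    }

    public static void main(String[] args) {
        System.out.println(testMethodNames(ExampleTests.class));
    }
}
```

The TestWrongCase method is skipped only because of its capital "T", which is why the prefix has to be lowercase in such environments.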

This is a convention for the test method names in the Java classes, indeed. What Tzviya suggested was about the test data (the actual EPUB files).

As far as I can see, there should be a 1-1 mapping of test data to test methods, and it's easy to convert the naming convention from one to the other (e.g. from file name to Java name: add a "test" prefix and make it camel case).
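
The file-name-to-Java-name rule can be sketched as follows. The details are assumptions: words in a file name are taken to be separated by any non-alphanumeric character (hyphens in the examples from this thread), and each word is capitalized with the rest lowercased, so an uppercase category prefix such as "OPF" becomes "Opf". The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: derive a Java test method name from a test EPUB file name. */
public class TestNames {

    /** e.g. "content-docs-with-same-file-name.epub" -> "testContentDocsWithSameFileName" */
    public static String toTestMethodName(String fileName) {
        // Drop the extension, if any.
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;

        StringBuilder sb = new StringBuilder("test");
        // Assumption: any run of non-alphanumeric ASCII characters separates words,
        // which also keeps the result a valid Java identifier.
        for (String word : base.split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // leading separator produces an empty first token
            }
            sb.append(Character.toUpperCase(word.charAt(0)));
            sb.append(word.substring(1).toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTestMethodName("content-docs-with-same-file-name.epub"));
        System.out.println(toTestMethodName("content-doc-with-Unicode-file-name.epub"));
        System.out.println(toTestMethodName("issue265b.epub"));
    }
}
```

With this choice the reverse direction (Java name back to file name) is lossy, because the original capitalization of words like "OPF" or "Unicode" is not preserved; all-lowercase file names would make the mapping fully reversible.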

Best,
Romain.
Received on Wednesday, 14 June 2017 14:39:11 UTC