Re: [epubcheck] naming and issues conventions from Wolfgang Schindler on 2017-06-14 (public-epub3@w3.org from June 2017)

From: Wolfgang Schindler <ws.schindler@googlemail.com>
Date: Wed, 14 Jun 2017 15:35:42 +0200
To: Charles LaPierre <charlesl@benetech.org>
Cc: Romain <rdeltour@gmail.com>, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>, "public-epub3@w3.org" <public-epub3@w3.org>
Message-ID: <CAH2qv+xoUsZQSGf4vhGqj+E7nMLtBUVSkudE+wgYw58QFabgFg@mail.gmail.com>
Many thanks, Tzviya!

Garth is defining a function-oriented directory structure which should give
a first, decisive orientation if you try to find a specific test case. The
file names of the test epubs should be as telling as possible, but I'm
afraid that file names that are too long and complicated are rather
difficult to read. Would we really need the prefix in capital letters after
a meaningful directory structure has been established?

Wouldn't a lookup table or an index based on the descriptive texts also
help in searching the test cases efficiently?
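
A minimal sketch of what such an index could look like, in Python for
brevity. The catalogue format, the function names `build_index` and
`search`, and the word-matching rule are all assumptions made for
illustration; the two entries reuse the description text quoted later in
this message.

```python
import re
from collections import defaultdict

# Hypothetical catalogue: test file name -> descriptive text.
DESCRIPTIONS = {
    "issue265.epub": "Add checks for duplicate ZIP entries. Fixes issue 265",
    "issue265b.epub": "Add checks for duplicate ZIP entries. Fixes issue 265",
}


def build_index(descriptions):
    """Map each lowercase word of a description to the files that use it."""
    index = defaultdict(set)
    for file_name, text in descriptions.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(file_name)
    return index


def search(index, query):
    """Return the files whose description contains every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(DESCRIPTIONS)
print(sorted(search(index, "duplicate zip")))
```

With identical descriptions both files come back for the same query, which
is exactly the ambiguity an index would make visible.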

I've found a case:
epubcheck/src/test/resources/30/epub/invalid/issue265.epub
(directory:
<https://github.com/IDPF/epubcheck/tree/master/src/test/resources/30/epub/invalid>),
described as "Add checks for duplicate ZIP entries. Fixes issue 265". It
is valid (according to epubcheck 4.0.2) although it is in the "invalid"
directory, and I can't find any duplicate ZIP entry in the epub file. The
issue tracker entry for #265 leads on to other test cases with a different
number. But in the same directory there is also issue265b.epub, with the
same description, which does indeed contain a duplicate content doc and is
invalid. Validating this file produces the following output:

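Verwendung der EPUB 3.0.1 Prüfungen
  [English: Using the EPUB 3.0.1 checks]

WARNING(OPF-003): issue265b.epub/issue265b.epub(-1,-1): Die Datei
'EPUB/loremA?.xhtml' ist im EPUB vorhanden, wurde jedoch nicht in der
OPF-Datei deklariert.
  [English: The file 'EPUB/loremA?.xhtml' is present in the EPUB but was
  not declared in the OPF file.]
  (xhtml not declared in opf - which is wrong!)

WARNING(PKG-012): issue265b.epub/EPUB/loremA?.xhtml(-1,-1): Dateiname
enthält die folgenden Nicht-ASCII-Zeichen: '?'. Versuche den Dateinamen zu
ändern!
  [English: File name contains the following non-ASCII characters: '?'.
  Try changing the file name!]
  (encoding of file name as UTF-8 not recognized)

WARNING(OPF-061): issue265b.epub/issue265b.epub(-1,-1): Doppelte Datei im
EPUB-Archiv (nach Unicode-NFC-Normalisierung): '%1$f' :f !=
java.lang.String
  [English: Duplicate file in the EPUB archive (after Unicode NFC
  normalization): ...]
  (duplicate content doc!)

WARNING(PKG-012): issue265b.epub/EPUB/loremÁ.xhtml(-1,-1): Dateiname
enthält die folgenden Nicht-ASCII-Zeichen: 'Á'. Versuche den Dateinamen zu
ändern!
  [English: File name contains the following non-ASCII characters: 'Á'.
  Try changing the file name!]
  (encoding of file name as UTF-8 not recognized)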
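
The OPF-061 message mentions Unicode NFC normalization, so the kind of
check it refers to can be sketched roughly as follows. This is an
illustrative Python sketch, not epubcheck's actual implementation. The
function name `find_nfc_duplicates` and the entry list are hypothetical,
and it is only an assumption that the two file names in issue265b.epub
differ as "A" plus a combining acute accent (U+0301) versus the precomposed
"Á" (U+00C1).

```python
import unicodedata
from collections import defaultdict


def find_nfc_duplicates(entry_names):
    """Group entry names that become identical after NFC normalization."""
    groups = defaultdict(list)
    for name in entry_names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    # Keep only normalized names that more than one raw entry maps to.
    return {norm: raw for norm, raw in groups.items() if len(raw) > 1}


# Hypothetical entry list: decomposed "A" + U+0301 vs. precomposed U+00C1.
entries = [
    "EPUB/loremA\u0301.xhtml",
    "EPUB/lorem\u00c1.xhtml",
    "EPUB/package.opf",
]
duplicates = find_nfc_duplicates(entries)
for normalized, raw_names in duplicates.items():
    # ascii() escapes non-ASCII characters so the code points stay visible.
    print(ascii(normalized), [ascii(name) for name in raw_names])
```

Under that assumption the two names are different byte sequences in the ZIP
but the same name after normalization, which would explain why a duplicate
is reported even though no two entries look identical at first glance.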

I have described this case in more detail because I think it demonstrates
several issues:

   - It might happen that it is *not always obvious what the function of a
   test case is* - an invalid test case that validates successfully, two
   cases with the same descriptive text, issues that you don't see in the epub
   contents, etc. This is by the way definitely not meant as a criticism! I
   think we would need some guidance from @Romain or @Tobias to define a
   procedure for such cases because the renaming operation presupposes a
   proper understanding of the function of a test case.

   - The warnings in my example show that there are at least two issues:
   OPF (duplicate content docs) vs. PKG (issues with UTF-8 support). It
   would make sense to define "pure" test cases where only one issue is
   covered. Would that mean that in this case "issue265.epub" should be
   deleted, "issue265b.epub" should also be deleted, and two new epubs
   "content-docs-with-same-file-name.epub" and
   "content-doc-with-Unicode-file-name.epub" should be generated and uploaded?

   - Ideally we would have step-by-step instructions and could discuss any
   problems with a specific epub as an issue on GitHub.


Have a nice afternoon!

Best,
Wolfgang

2017-06-13 23:22 GMT+02:00 Charles LaPierre <charlesl@benetech.org>:

> Thanks for the clarification, Romain. Then yes, this sounds good, and I
> agree with Vincent that adding a category at the beginning might be a
> good idea as well: "strong prefix for categorizing files: OPF, CONTENT,
> NAV, CONTAINER…"
>
>
> Thanks
> EOM
>
> Charles LaPierre
> Technical Lead, DIAGRAM and Born Accessible
> E-mail: charlesl@benetech.org
> Twitter: @CLaPierreA11Y
> Skype: charles_lapierre
> Phone: 650-600-3301
>
>
>
> On Jun 13, 2017, at 1:36 PM, Romain <rdeltour@gmail.com> wrote:
>
>
> On 13 Jun 2017, at 17:32, Charles LaPierre <charlesl@benetech.org> wrote:
>
> For unit tests the convention is that every test must start with the word
> test in lowercase.  Some testing environments require this I believe.
>
>
> This is a convention for the test method names in the Java classes, indeed.
> What Tzviya suggested was about the test data (the actual EPUB files).
>
> As far as I can see there should be a 1-1 mapping of test data to test
> methods, and it's easy to convert the naming convention from one to the
> other (e.g. from file name to Java name: add a test prefix and make it
> camel case).
>
> Best,
> Romain.
>
>
>
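
The file-name-to-method-name conversion Romain describes in the quoted
message (add a "test" prefix and make it camel case) could be sketched like
this. It is written in Python for brevity; the function name
`to_test_method_name` and the rule of splitting on any non-alphanumeric
character are assumptions, not an agreed convention.

```python
import re


def to_test_method_name(file_name):
    """Derive a camel-case test method name from a test data file name."""
    stem = file_name.rsplit(".", 1)[0]  # drop the extension, if any
    # Split on hyphens, underscores, or any other non-alphanumeric character.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", stem) if w]
    # Capitalize the first letter of each word and keep the rest unchanged.
    return "test" + "".join(w[0].upper() + w[1:] for w in words)


print(to_test_method_name("content-docs-with-same-file-name.epub"))
print(to_test_method_name("issue265b.epub"))
```

Because the mapping only adds a prefix and removes separators, a descriptive
file name carries over directly into a descriptive method name.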
Received on Wednesday, 14 June 2017 13:36:18 UTC