Last call comments on WOFF from Bert Bos on 2011-01-12 (www-font@w3.org from January to March 2011)

From: Bert Bos <bert@w3.org>
Date: Wed, 12 Jan 2011 16:16:26 +0100
To: www-font@w3.org
Message-Id: <201101121616.26719.bert@w3.org>

Hello Fonts WG,

Here are my personal comments on the last call for   
http://www.w3.org/TR/2010/WD-WOFF-20101116/

(Sorry for being so late. I wrote these originally as part of the review 
by the CSS WG, and then had to wait until the CSS WG decided which 
comments it would send as a group. Unfortunately, my Christmas holidays 
started before I knew the outcome. :-( )



1) I'd like to say one more time that letting a URL carry information  
about the meaning of a resource is counter to W3C's common  
architecture for the Web and simply a bad idea. If I move a file to a  
different server (and hopefully leave a redirect behind), the file  
still means the same thing. If I distribute it over p2p, on a CD, or  
coin a URN for it, it is still the same file and should not act any  
differently. Going against this architecture *will* lead to problems.

And it's not like we don't know how to do it right. The way to encode  
usage metadata for fonts, in a protocol-independent and machine  
readable way, was invented by Microsoft for EOT more than ten years  
ago. The exact syntax doesn't matter, but the data has to be at the  
application level, not in the URL and not in the protocol.

2) I like the clear and careful language in the introduction. (So  
this is not an issue, not even a criticism, but it *is* a  
comment. :-) ) Especially the way it explains what WOFF is not (a new  
font format), and what it is for (@font-face).

That doesn't mean there will be no confusion, but I think the spec  
does as well as it can. EXI can explain that it is just XML with a  
syntax optimized for streaming, but people still see it as a new  
format. With WOFF it will be the same.

3) Section 7 Private data block: Why is the padding at the end a  
"should"? I could understand "must" (something you can test), or  
"may" (just ignore it). But if you are going to ignore the padding  
anyway, why should generators try hard to not write it? Ditto for the  
padding of the extended metadata block.

It is also ironic that the specification accuses the OpenType spec of  
not being clear about the padding of the final table, and then itself  
allows that padding to vary. (Sure, WOFF is not _unclear_, but the  
effect is the same. Imagine that some future Meta-WOFF wants to  
encode WOFF: it will have the same problems as WOFF in ensuring  
roundtrip encoding...)

Which means that a "must" seems the best choice. Whether it is "must  
be omitted" or "must be included" is less important, although doing  
the same for all blocks, whether the last or not, seems easiest.

4) Section 6 Extended metadata: If it is in XML and is metadata, it  
would seem logical to have chosen XMP. Existing XMP and RDF tools  
would be able to read it, no need for new parsers; it could be linked  
to other RDF ontologies, to enable Semantic Web tools to make  
inferences; and it would be extensible without the need to have  
different syntaxes for predefined and extended elements.

5) Section 4 / section 8. I like the rigorous error handling: if the  
decompressed length is not what was declared as origLength, the file  
_must_ be rejected. No attempts to second-guess what the encoder  
"intended" to do.

On the other hand, it's a bit wasteful to use four bytes to store an  
origLength. One bit to indicate compression would have been enough.  
There is no actual need to check the length, because there is already  
a checksum.
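
As an illustration of the strict check praised in this item, here is a
minimal sketch in Python. The function name decode_table and its
parameters are hypothetical. It assumes, as a reading of the draft,
that a table whose stored length equals origLength is uncompressed and
that anything shorter is zlib-compressed. It does not verify the
checksum.

```python
import zlib


def decode_table(stored: bytes, orig_length: int) -> bytes:
    """Return the original table bytes, or raise ValueError to reject the file.

    Hypothetical helper: `stored` is the table data as found in the WOFF
    file (without padding), `orig_length` is the declared origLength.
    """
    # Assumption: equal lengths mean the table was stored uncompressed.
    if len(stored) == orig_length:
        return stored
    try:
        data = zlib.decompress(stored)
    except zlib.error as exc:
        raise ValueError("table data is not valid zlib data") from exc
    # The strict rule: no guessing what the encoder intended.
    if len(data) != orig_length:
        raise ValueError("decompressed length does not match origLength")
    return data
```

A caller would treat any ValueError as a reason to reject the whole
file rather than to attempt a repair.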

6) A context-free grammar for WOFF in the spec would have been nice,  
even if there are long-distance dependencies a CFG cannot express  
(such as that some number must correspond to the number of bytes  
somewhere else). A grammar gives a concise view of the structure of a  
file, better than the English text can, and thus helps programmers.

7) Is there really an advantage to aligning tables to 4-byte  
boundaries? It's another bit of extra work for a generator, another  
place where a programmer can make mistakes.
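
For a sense of how much work the alignment is, this is roughly what a
generator has to do for each block. It is a sketch only: pad_to_4 is a
hypothetical name, and zero bytes are assumed as the padding value.

```python
def pad_to_4(data: bytes) -> bytes:
    """Append zero bytes so that the length is a multiple of four."""
    # -len(data) % 4 is the number of bytes missing to the next boundary
    # (0 when the data is already aligned).
    return data + b"\x00" * (-len(data) % 4)
```

The computation is short, but it is easy to write len(data) % 4 by
mistake, which pads aligned data correctly and most other data
wrongly.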

8) Section 4: It is a pity that there are multiple ways to encode the  
same font, and even to encode the same OpenType file: each table  
may be compressed or not, extended metadata may be added or not,  
private data may be added or not. That means you cannot do a simple  
binary compare to see if two files encode the same OpenType file, let  
alone the same font. A unique (canonical) format would also have  
helped with digital signing: Now it is possible to decode and re- 
encode the font without doing anything else and still end up with a  
broken digital signature.

Is there an advantage to having different ways to encode the same  
OpenType file?
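
The effect can be seen with zlib alone, without any WOFF code. In this
sketch the table contents are made up and encodings is a hypothetical
helper. The same bytes are stored uncompressed and compressed at two
different levels; all three results differ, yet all decode to the same
data.

```python
import zlib


def encodings(table: bytes) -> list[bytes]:
    """Return three different byte strings that all encode `table`."""
    return [
        table,                    # stored uncompressed
        zlib.compress(table, 1),  # fastest compression
        zlib.compress(table, 9),  # best compression
    ]


table = b"made-up table contents " * 16
stored, fast, best = encodings(table)

# A binary compare says the three encodings are different...
print(stored != fast, fast != best)
# ...although both compressed forms decode to the same table.
print(zlib.decompress(fast) == zlib.decompress(best) == table)
```

A signature computed over any one of these byte strings will not
validate against the other two.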

9) The specification is called "1.0" but the actual format contains  
neither a version number, nor a way to define extensions. It is  
probably a good thing that there is only one WOFF format. It means  
that a correct implementation cannot be incompatible with another  
correct implementation, just because one was written later than the  
other. But why then is there that "1.0" in the title of the spec?

(The optional metadata has a version. Is that what the "1.0"  
corresponds to? Although that part of the file can be ignored by many  
kinds of implementations, it is still a pity that there can, in the  
future, be different formats that are all called WOFF, with the same  
file extension, the same magic string, and, probably, the same MIME  
type.)

10) Section 8: I didn't check that the summary is indeed compatible  
with the earlier sections, but it is clear that it contains some  
things that were already said earlier. I get uncomfortable when a  
spec repeats things in a normative section. There is almost certainly  
a contradiction somewhere. And if not now then in the next version of  
the draft. Shouldn't this section be labeled as informative instead?

11) Section 8: The note about "extra" data in OpenType files between  
the tables doesn't seem to belong in this section. It doesn't explain  
anything about the conformance of WOFF files. It relates to  
roundtripping, and while an interesting remark in itself, it was  
already mentioned earlier.



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Wednesday, 12 January 2011 15:16:59 UTC