Re: Potential conformance statements in WOFF2, sections 1 to 3

On 19/8/14 18:22, Chris Lilley wrote:

> 3.1. Data types, 255UInt16 Data Type
> "An encoder may produce any of these"
>
> This is checkable by a validator so should be marked as [FF] normative.
>
> An example of a test would be two fonts, one of which expresses a
> value < 253 as, for example
>
> code: 123
> and another which uses the longer but allowed form
> code: 253
> value: 123
>
> the validator passes if both fonts are checked as valid.

Is there any benefit to allowing multiple encodings of the same value 
here? Does the existing code actually use multiple forms, or could we 
simply tighten up the spec to require one of the forms and prohibit the 
others?
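
To make the question concrete, here is a minimal Python sketch of a 255UInt16 reader. It assumes the draft's code assignments (253 = a 16-bit word follows, 255 = next byte plus 253, 254 = next byte plus 506); the function and constant names are illustrative only, not taken from any existing implementation.

```python
# Sketch of a 255UInt16 reader, assuming the draft's code assignments.
WORD_CODE = 253            # a big-endian 16-bit value follows
ONE_MORE_BYTE_CODE2 = 254  # next byte + 2 * LOWEST_U_CODE
ONE_MORE_BYTE_CODE1 = 255  # next byte + LOWEST_U_CODE
LOWEST_U_CODE = 253


def read_255_uint16(data, pos=0):
    """Decode one 255UInt16 at data[pos]; return (value, next_pos)."""
    code = data[pos]
    if code == WORD_CODE:
        return (data[pos + 1] << 8) | data[pos + 2], pos + 3
    if code == ONE_MORE_BYTE_CODE1:
        return data[pos + 1] + LOWEST_U_CODE, pos + 2
    if code == ONE_MORE_BYTE_CODE2:
        return data[pos + 1] + 2 * LOWEST_U_CODE, pos + 2
    # Any code below 253 is the value itself.
    return code, pos + 1


# The two forms from the quoted example both decode to 123.
short_form = bytes([123])
long_form = bytes([253, 0, 123])
print(read_255_uint16(short_form)[0], read_255_uint16(long_form)[0])
```

Under these assumptions a value such as 506 has three distinct encodings (255 253, 254 0, and 253 1 250), so a validator following the current wording has to accept all of them.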

> 3.1. Data types, 255UInt16 Data Type
> "encoders should choose shorter encodings, and should be consistent in
> choice of encoding for the same value"
>
> The two "should" statements are not obviously testable in a
> pass/fail sense, only in a warning sense. If we want to do warnings,
> that would be [FF].
>
> 3.1. Data types, UIntBase128 Data Type
> "suitable for values up to 2^32-1"
>
> Thus, since 7 bits per byte are used for all but the last byte, the
> maximum length would be 32/7 = 4.57, rounded up to 5 bytes, right? If so, add
> [FF]
>
> "A valid UIntBase128 encoded form MUST take up one to five bytes."
>
> Test with a UIntBase128 of six bytes, first five all have MSB set,
> sixth has MSB unset. It is too long so
> [FF] a validator passes if it checks the font as invalid
> [UA] a UA passes if it does what? Discuss.
>
> We clearly need some conformance statement on what a UA does when the
> input contains an invalid UIntBase128 form. As far as I can see, too
> long is the only detectable error condition.

What about overflow of the 32-bit result value? AFAICS, the five-byte 
sequence

   0xFF 0xFF 0xFF 0xFF 0x7F

might in principle decode to

   34359738367

which requires 35 bits as an unsigned value (0x7FFFFFFFF).
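
The arithmetic can be checked with a naive decoder that accumulates 7 bits per byte and does no range checking. This is only a sketch; the function name is hypothetical, and Python's unbounded integers stand in for what would silently wrap or overflow in a fixed-width 32-bit accumulator.

```python
def decode_uintbase128_naive(data):
    """Decode a UIntBase128 with no length or range checks.

    Returns (value, bytes_consumed). Each byte contributes its low
    7 bits; a clear top bit marks the final byte.
    """
    value = 0
    for i, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            return value, i + 1
    raise ValueError("truncated UIntBase128: no terminating byte")


value, length = decode_uintbase128_naive(bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x7F]))
print(value, hex(value), value.bit_length())
# A range check such as `value > 0xFFFFFFFF` would flag this sequence.
```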

Also, should "leading zeros" (i.e. initial 0x80 bytes, which extend the 
length of the encoded form but contribute nothing to the value) be 
permitted, or treated as an error? I.e. is it legal to encode the value 
1000 as

   0x80 0x80 0x80 0x87 0x68

or MUST it be encoded in its shortest form,

   0x87 0x68

(if I've got that right)?
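
A small Python sketch to check the two byte sequences above: it produces the shortest encoding of a value and tests whether a given sequence starts with a redundant 0x80 byte. The names are hypothetical, and whether a leading 0x80 should actually be rejected is exactly the open question.

```python
def encode_uintbase128_shortest(value):
    """Return the shortest UIntBase128 encoding of a 32-bit value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of range for UIntBase128")
    groups = [value & 0x7F]          # least significant 7 bits first
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                 # most significant group goes first
    # Every byte except the last carries the continuation bit.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def has_leading_zero(data):
    """True if the first byte is 0x80: continuation set, no value bits."""
    return len(data) > 1 and data[0] == 0x80


def decode_uintbase128(data):
    """Plain decode without validity checks, used here for comparison."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            break
    return value


padded = bytes([0x80, 0x80, 0x80, 0x87, 0x68])
shortest = encode_uintbase128_shortest(1000)
print(shortest.hex(), decode_uintbase128(padded), has_leading_zero(padded))
```

Both sequences decode to 1000; only the padded one trips the leading-zero test.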

JK

Received on Wednesday, 3 September 2014 13:28:50 UTC