- From: xmlplus custodians <xmlplus.custodians@gmail.com>
- Date: Tue, 12 Apr 2011 00:24:50 +0530
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: xmlschema-dev@w3.org
- Message-ID: <BANLkTi=kV3NAuA9vqMyf5BpBTJmWCD75FQ@mail.gmail.com>
Hi Sperberg-McQueen,

The concept of 3 octets accommodating 4 base64 chars is what gave me the
wrong idea that octet lengths have to be a multiple of 3. I guess I had
not accounted for the use of padding chars while encoding to base64. With
the use of 1 or 2 padding chars, the base64-encoded string, when stripped
of whitespace, would always have a length that is a multiple of 4.

It's clear now! Thanks for the detailed and patient reply! It was both
very insightful and helpful.

--
Best Regards,
Satya Prakash Tripathi

On Mon, Apr 11, 2011 at 9:50 PM, C. M. Sperberg-McQueen
<cmsmcq@blackmesatech.com> wrote:

> On Apr 9, 2011, at 2:49 PM, xmlplus custodians wrote:
>
> > Hi
> >
> > The XSD1.1 DataTypes spec in the base64Binary section gives following
> > pseudo-code for calculating octet length of a base64Binary encoded
> > string.
> >
> > ---------------------------------------------------------------------------------
> > 1) lex2 := killwhitespace(lexform)         -- remove whitespace characters
> > 2) lex3 := strip_equals(lex2)              -- strip padding characters at end
> > 3) length := floor (length(lex3) * 3 / 4)  -- calculate length
> > ---------------------------------------------------------------------------------
> >
> > My understanding is that, for a base64Binary encoded string, its
> > lexical length would be a multiple of 4 and its octet length would be
> > a multiple of 3.
>
> It's been a while since I read the base64 spec, but my recollection is
> that base64 encodes octet sequences of any length, not just octet
> sequences whose length is a multiple of three.
>
> The lexical length (ignoring whitespace) will indeed always be a
> multiple of four; the padding characters are added at the end in order
> to ensure that this is so.
>
> > As an example if we take a base64Binary encoded string, which doesn't
> > contain whitespace or padding chars (=), so that lexform is same as
> > lex3 in above code. Now let us take a lex3 of length 10 then,
> > according to above code, the octet length would be 7 (not a multiple
> > of 4).
>
> Yes, precisely. If the lexical form, ignoring whitespace, is twelve
> characters long and the last two characters are equals signs, then what
> you have is two clusters of four characters, each of which encodes
> three octets, followed by a final cluster of two non-padding
> characters, which encodes the final octet.
>
> > Are octet-lengths which are not multiple of 4, valid in case of
> > base64Binary encoded string ?
>
> Yes.
>
> > Also, what should be the formula for calculating lexical-length from
> > the octet-length of a base64Binary string ?
> > Should it be something like this:
> >
> > lexical-length := ceil( octet-length*4/3)
> >
> > If we take an example with octet-length=10, the lexical-length is not
> > a multiple of 4.
> > I am clueless here. Appreciate your help on the same.
>
> In base64 encoding, any input octet stream is subdivided into 24-bit
> (i.e. three-octet) groups, each of which is encoded in four base64
> digits. If there are fewer than 24 bits in the final group of bits,
> then padding characters are used. So if you wish to calculate the
> minimum length of the base64 encoding for an arbitrary sequence of
> octets (i.e. the length of an encoding without any white space), then I
> think the formula you want will be 4 * ceil( octet-length / 3). It is a
> good idea, though, to follow the recommendations in the RFC for adding
> whitespace and newlines; it makes debugging problems easier, if nothing
> else.
>
> You may find it helpful to read RFC 3548, which is normatively referred
> to from the XSD spec.
>
> http://www.ietf.org/rfc/rfc3548.txt
>
> I hope this helps.
>
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
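
For anyone following the thread, the two calculations discussed above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names `octet_length` and `lexical_length` are made up for this example and do not come from the XSD spec or the RFC, and Python's standard `base64` module is used purely as a cross-check. The first function follows the three steps of the quoted pseudo-code; the second uses the 4 * ceil( octet-length / 3) formula suggested in the reply.

```python
import base64
import math


def octet_length(lexform: str) -> int:
    """Octet length of a base64Binary lexical form (hypothetical helper).

    Mirrors the three steps of the pseudo-code quoted in the thread.
    """
    lex2 = "".join(lexform.split())  # 1) remove whitespace characters
    lex3 = lex2.rstrip("=")          # 2) strip padding characters at end
    return len(lex3) * 3 // 4        # 3) floor(length(lex3) * 3 / 4)


def lexical_length(n_octets: int) -> int:
    """Minimum (whitespace-free) base64 length for n_octets octets
    (hypothetical helper), using 4 * ceil(octet-length / 3)."""
    return 4 * math.ceil(n_octets / 3)


# Cross-check both formulas against the standard library encoder.
for n in range(50):
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    assert len(encoded) == lexical_length(n)
    assert octet_length(encoded) == n
```

With these helpers, the twelve-character case from the reply (ten non-padding characters followed by two equals signs) gives an octet length of 7, and an octet length of 10 gives a lexical length of 16 rather than the 14 that ceil( octet-length*4/3) would suggest.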
Received on Monday, 11 April 2011 18:55:20 UTC