RE: Decoding routines (here they are)

From: <Linus.Walleij@ecs.ericsson.se>
Date: Wed, 17 Mar 1999 14:35:33 +0100
To: frystyk@w3.org, www-lib@w3.org
Message-id: <81314DE3B27DD0119A7400609719CF4B04A48012@eseldnt100.ericsson.se>
> It would be really cool if you could make the converter the other way
> and make these two function public and move them to HTWWWStr.c.
> Interested?

OK here is what I cooked up for HTWWWStr.c:

/**********************************************************
 **  This routine encodes x-www-urlencoded, the string   **
 **  must be ISO 8859-1 dammit! If you compile on        **
 **  MSDOS / Macintosh etcetera, make sure you convert   **
 **  your host-specific character layouts to conform to  **
 **  ISO 8859-1.                                         **
 **  Linus Walleij March 1999                            **
 **********************************************************/
char *HTURLEncode(char *cStrIn)
{
  char *cNew;
  char *cPtr;
  char cHexChars[] = "0123456789ABCDEF";
  unsigned char c;
  size_t NewLength;

  /* Worst case every byte becomes "%XX"; add one for the terminating NUL */
  NewLength = 3 * strlen(cStrIn) + 1;
  cNew = (char *) malloc(NewLength);
  if (cNew == NULL) return NULL;
  cPtr = cNew;
  while (*cStrIn) {
    /* Work on an unsigned value so bytes above 0x7F are handled safely */
    c = (unsigned char) *cStrIn;
    if (c==' ') *cPtr++ = '+';
    else if ( (c>='0' && c<='9') ||
              (c>='A' && c<='Z') ||
              (c>='a' && c<='z'))
      *cPtr++ = (char) c;
    else {
      /* Everything else is escaped as '%' plus two hex digits */
      *cPtr++ = '%';
      *cPtr++ = cHexChars[(c >> 4) & 0xF];
      *cPtr++ = cHexChars[c & 0xF];
    }
    cStrIn++;
  }
  *cPtr = '\0';
  return cNew;
}

/**********************************************************
 **  This routine decodes x-www-urlencoded, any content  **
 **  that is not valid will be thrown away, and the      **
 **  string must be ISO 8859-1 dammit! If you compile on **
 **  MSDOS / Macintosh etcetera, make sure you convert   **
 **  your host-specific character layouts to conform to  **
 **  ISO 8859-1.                                         **
 **  Linus Walleij March 1999                            **
 **********************************************************/
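char *HTURLDecode(char *cStrIn)
{
  char *cNew;
  char *cPtr;
  char *cShrunk;
  char cHexDigits[] = "0123456789ABCDEFabcdef";
  char loNyb, hiNyb;
  size_t NewLength;

  /* Decoded output is never longer than the input; add one for the NUL */
  NewLength = strlen(cStrIn) + 1;
  cNew = (char *) malloc(NewLength);
  if (cNew == NULL) return NULL;
  cPtr = cNew;
  while (*cStrIn) {
    if (*cStrIn=='+') *cPtr++ = ' ';
    else if ( (*cStrIn>='0' && *cStrIn<='9') ||
              (*cStrIn>='A' && *cStrIn<='Z') ||
              (*cStrIn>='a' && *cStrIn<='z'))
      *cPtr++ = *cStrIn;
    else if (*cStrIn=='%' &&
             cStrIn[1] && strchr(cHexDigits, cStrIn[1]) &&
             cStrIn[2] && strchr(cHexDigits, cStrIn[2])) {
      /* Only reached when two hex digits really follow the '%' */
      cStrIn++;
      hiNyb = *cStrIn;
      hiNyb -= (hiNyb < 0x60) ? 0 : 0x20;      /* fold a-f to A-F */
      hiNyb -= (hiNyb <= 0x39) ? 0x30 : 0x37;  /* '0'-'9' -> 0-9, 'A'-'F' -> 10-15 */
      cStrIn++;
      loNyb = *cStrIn;
      loNyb -= (loNyb < 0x60) ? 0 : 0x20;
      loNyb -= (loNyb <= 0x39) ? 0x30 : 0x37;
      *cPtr++ = (char) ((hiNyb << 4) | loNyb);
    }
    /* Anything else is invalid and is thrown away: nothing is written */
    cStrIn++;
  }
  *cPtr = '\0';
  /* Shrink the buffer to the decoded length, keeping room for the NUL */
  NewLength = (size_t) (cPtr - cNew) + 1;
  cShrunk = (char *) realloc(cNew, NewLength);
  return cShrunk ? cShrunk : cNew;
}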

The routines work fine, but if you want to add them to HTWWWStr.h I guess you
have to look at how I allocate / reallocate memory, as I remember libwww
using its own memory allocation routines (HT_MALLOC?). I don't have a
compiling codebase here, so unfortunately I can't try it out :-/ Also, size_t
as the string size type could be replaced to avoid porting problems;
int is just as good, I guess.
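
For illustration, here is a minimal round-trip sketch of how the two routines
are meant to be used, in particular that the caller owns and frees the returned
buffers. The names sketch_encode, sketch_decode, sketch_hex_value and
roundtrip_demo are hypothetical stand-ins so the example compiles on its own
with plain malloc / free; they are not libwww functions, and a libwww version
would presumably go through the library's own allocation routines instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an encoder like HTURLEncode (not libwww code). */
char *sketch_encode(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(3 * strlen(in) + 1);   /* worst case: every byte -> %XX */
    char *p = out;
    if (out == NULL) return NULL;
    for (; *in; in++) {
        unsigned char c = (unsigned char) *in;
        if (c == ' ')
            *p++ = '+';
        else if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                 (c >= 'a' && c <= 'z'))
            *p++ = (char) c;
        else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0xF];
        }
    }
    *p = '\0';
    return out;
}

/* Value of one hex digit, or -1 if c is not a hex digit (including NUL). */
int sketch_hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical stand-in for a decoder like HTURLDecode (not libwww code). */
char *sketch_decode(const char *in)
{
    char *out = malloc(strlen(in) + 1);       /* output never exceeds input */
    char *p = out;
    if (out == NULL) return NULL;
    while (*in) {
        if (*in == '+') {
            *p++ = ' ';
            in++;
        } else if (*in == '%' && sketch_hex_value(in[1]) >= 0 &&
                   sketch_hex_value(in[2]) >= 0) {
            *p++ = (char) (sketch_hex_value(in[1]) * 16 + sketch_hex_value(in[2]));
            in += 3;
        } else if ((*in >= '0' && *in <= '9') || (*in >= 'A' && *in <= 'Z') ||
                   (*in >= 'a' && *in <= 'z')) {
            *p++ = *in++;
        } else {
            in++;                             /* invalid input is dropped */
        }
    }
    *p = '\0';
    return out;
}

/* Encode, decode, compare, and release both buffers. Returns 1 on success. */
int roundtrip_demo(void)
{
    const char *original = "Hej v\xE4rlden & co";  /* 0xE4: a-umlaut in ISO 8859-1 */
    char *encoded = sketch_encode(original);
    char *decoded = encoded ? sketch_decode(encoded) : NULL;
    int ok = encoded && decoded && strcmp(decoded, original) == 0;
    free(decoded);                            /* the caller owns both buffers */
    free(encoded);
    return ok;
}
```

Sizing the encode buffer for the worst case up front trades a little memory
for not having to realloc inside the loop.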

After they've been added, I'd recommend going through the query_url_encode
and form_url_encode functions in HTAccess and correcting them to call
HTURLEncode / HTURLDecode, as they don't work as they stand today.

All right, I bet someone could easily hack up better encode/decode routines,
but these work at least...

Linus Walleij
Received on Wednesday, 17 March 1999 08:40:09 GMT
