Only fetch certain formats
I'm writing a client that only needs to fetch text or HTML data,
but I'm not sure of the best way to screen out all the other
formats I don't need. Here's my first cut at it:
1. For every URL the client gets, examine the file extension.
If the extension is recognized as a format the client
doesn't want, ignore the URL.
2. If the URL passes the first test, fetch the HEAD.
3. When the HEAD is loaded, look at the Format info inside the
anchor. If the format is not one the client wants, stop.
4. Otherwise, GET the document.
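For an HTTP URL, the four steps above might be sketched roughly as follows. This is a minimal illustration, not a tested client: it assumes the format information is read from the Content-Type header returned by the HEAD request, and the extension list, WANTED_TYPES, and the function names are hypothetical choices made up for the example.

```python
import posixpath
import urllib.parse
import urllib.request

# Hypothetical list of extensions the client never wants (step 1).
UNWANTED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".au", ".mpg",
                       ".ps", ".dvi", ".z", ".gz", ".tar", ".zip", ".exe"}
# Hypothetical set of MIME types the client does want (step 3).
WANTED_TYPES = {"text/plain", "text/html"}


def extension_allowed(url):
    """Step 1: reject URLs whose path ends in a known-unwanted extension."""
    path = urllib.parse.urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext not in UNWANTED_EXTENSIONS


def type_allowed(content_type):
    """Step 3: accept only wanted types, ignoring parameters such as charset."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in WANTED_TYPES


def fetch_if_wanted(url, opener=urllib.request.urlopen):
    """Steps 1-4 for an HTTP URL; returns the body as bytes, or None if skipped."""
    if not extension_allowed(url):
        return None                                     # step 1: skip by name
    head = urllib.request.Request(url, method="HEAD")   # step 2: HEAD request
    with opener(head) as resp:
        if not type_allowed(resp.headers.get("Content-Type")):
            return None                                 # step 3: wrong format
    with opener(url) as resp:                           # step 4: plain GET
        return resp.read()
```

Passing the opener in as a parameter is only there so the logic can be exercised without a network connection; a real client would probably also want timeouts and error handling around both requests.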
I'm looking for an algorithm that will also work smoothly on
FTP URLs and directory listings.
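One complication with FTP is that the protocol has no HEAD equivalent that reports a MIME type, so step 1 (the file name) is about the only information available before fetching. A possible sketch, assuming a URL whose path is empty or ends in "/" names a directory listing, and assuming extensionless files such as README should be given the benefit of the doubt (both are hypothetical policies, as are the function names):

```python
import mimetypes
import posixpath
import urllib.parse

# Hypothetical set of wanted types, matching the text/HTML requirement.
WANTED_TYPES = {"text/plain", "text/html"}


def is_directory(url):
    """Assume an empty path, or one ending in '/', names a directory listing."""
    path = urllib.parse.urlsplit(url).path
    return path == "" or path.endswith("/")


def ftp_wanted(url):
    """Decide from the file name alone, since FTP offers no HEAD.

    Directory listings are always fetched; files with no extension
    (README, INDEX) are accepted; anything else must map to a wanted
    type in the mimetypes table and must not be compressed.
    """
    if is_directory(url):
        return True
    path = urllib.parse.urlsplit(url).path
    if posixpath.splitext(path)[1] == "":
        return True
    mime, encoding = mimetypes.guess_type(path)
    return encoding is None and mime in WANTED_TYPES
```

Rejecting anything with an encoding (for example notes.txt.gz) is a deliberate choice here, since the client would have to decompress it before it became text.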
Am I missing anything here? Is there a better way to do it?
Thanks for any suggestions.