Re: Should (some of the) ContactField objects use URLs rather than free-form strings?

Hi Dom,

thanks for following up on this in email.

On Jun 1, 2011, at 18:52 , Dominique Hazael-Massieux wrote:
> Richard and Robin have argued that it wouldn't be possible in every case
> to turn user-entered data into URLs.

I have several concerns. One of them is indeed that we should only require translation to URIs when we know for certain that it's always possible. Address book data is often a mess. The last thing we want is data being returned as invalid URIs instead of perfectly valid arbitrary text; the former case breaks a promise.

I know that adding a |uri| attribute that is only present when the URI could be correctly inferred addresses that part. But it's still a problem. It means that we need to specify the heuristics — for they will be heuristics in some cases — that map from possibly arbitrary values to URIs. I'm not saying it's necessarily huge work, but it's extra stuff. And we also need to require that implementations validate those URIs. In some cases, that's more of a burden than meets the eye.

To be honest, as much as I like URIs, going too far with this feels like a mere convenience to me. It's something that JS libraries can do, and if they do it poorly then it's not an interoperability problem and it's something that can be fixed in the field.

> That's probably true for ims and phone numbers:
> * ims, because the type of instant messaging is not always recorded with
> the data, and when it is, it's not necessarily in a way that can lend to
> be turned into a URI scheme (when one exists)

I think that handling IMs is very likely impossible in the general case. I'd really rather we didn't try for these.

> * phone numbers, because they need to be complete enough to be turned
> into URIs (e.g. they need the international calling code?), and they may
> also contain non phone number information (e.g. manually indicated
> extension number)

Actually, they don't need to be complete. The tel: scheme allows for an awful lot of variations, and my understanding is that this includes partial numbers. That being said, it is a complicated scheme: we would have to take care of things like extensions, and a bunch of other aspects that I believe are more complex than I feel comfortable with.

> It seems to me that these concerns don't apply to emails, photos, and
> urls (!).

For photos I think it's an easy win. The current draft has Base64 or URI. We could just have URI, indicating that if it's Base64 it can be a data: URI (and it could also be a blob: URI).
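
To make the photo case concrete, here is a minimal sketch of collapsing "Base64 or URI" into "always a URI". It is written in Python purely for brevity. The helper name photo_to_uri and the image/jpeg default are assumptions for illustration, not anything from the draft, which would have to say where the media type comes from.

```python
import base64
import binascii
import re

# Hypothetical helper, not part of the draft. Anything that already starts
# with a URI scheme is passed through; everything else is treated as Base64.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def photo_to_uri(value, media_type="image/jpeg"):
    """Return a URI for a photo value, or None if it is neither a URI nor Base64."""
    if SCHEME_RE.match(value):
        return value
    # Validate the payload so a broken data: URI is never handed back.
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None
    # media_type is an assumed default; real data would need the actual type.
    return "data:" + media_type + ";base64," + value
```

The Base64 alphabet contains no colon, so the scheme check cannot mistake a Base64 payload for a URI.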

For the others, I'm still not convinced :)

With email, what's the value of getting mailto:foo@bar.com over foo@bar.com? The extra processing required to ensure that the former is valid in the face of user error seems a high price to pay to me. We'd also have to handle issues such as what to do if the user enters foo@bar.com?subject=blah into their address book. I'm not at all saying that's impossible, but it's a number of extra conformance requirements, more testing, etc. It does raise the bar for implementation more than it seems to be worth.
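
As a rough illustration of the kind of check this implies, here is a sketch (again in Python for brevity) of a deliberately conservative conversion. The function name to_mailto and the pattern are assumptions made up for this example; the pattern is far stricter than the mail RFCs and simply refuses anything containing characters that carry meaning inside a mailto: URI.

```python
import re

# Conservative pattern (an assumption for this sketch): exactly one "@",
# no whitespace, and none of "?", "&", "#", "%", which are significant
# inside a mailto: URI.
ADDRESS_RE = re.compile(r"^[^\s@?&#%]+@[^\s@?&#%]+$")

def to_mailto(value):
    """Return a mailto: URI for a plain address, or None when unsure."""
    value = value.strip()
    if not ADDRESS_RE.match(value):
        return None  # cannot be inferred safely; only the raw text remains
    return "mailto:" + value
```

With this, foo@bar.com becomes mailto:foo@bar.com, while foo@bar.com?subject=blah yields no URI at all. Whether to refuse, escape, or interpret that second value is exactly the sort of decision a spec would have to pin down and test.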

As for URLs, I don't see that it's a given. We have to think about round-tripping even if we don't support it directly (yet). There are two options for turning user-entered values into proper URLs when they aren't already (users often drop the "http://"): modify the value directly, or keep the value as is and add a corrected uri field. Again the issues with testing, heuristics, etc. crop up. But we also have to think about updates. What if I put two completely different values in the value and in the corrected uri field? How does it get serialised?
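
The second option (keep the value, add a corrected uri) might look something like the sketch below, in Python for brevity. The function name with_inferred_uri, the dictionary shape, and both patterns are hypothetical; the "looks like a host name" test in particular is just one crude heuristic among many possible ones.

```python
import re

# Both patterns are assumptions for this sketch, not proposed spec text.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")
# Crude "looks like a host name" check: dot-separated labels, optional path.
HOSTISH_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/\S*)?$")

def with_inferred_uri(value):
    """Keep the user-entered value and add "uri" only when one can be inferred."""
    entry = {"value": value}  # what the user typed, untouched
    candidate = value.strip()
    if SCHEME_RE.match(candidate):
        entry["uri"] = candidate
    elif HOSTISH_RE.match(candidate):
        entry["uri"] = "http://" + candidate  # the dropped-scheme case
    return entry  # no "uri" key when nothing could be inferred
```

Here berjon.com gains a uri of http://berjon.com while its value stays as typed, and free text such as "ask Robin for the link" gets no uri at all. The sketch says nothing about the update case: if a script later writes a value and a uri that disagree, something still has to decide which one wins.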

All in all, given that it can be handled in script with minimal trouble, I think that we should change photos and keep the rest as is.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Wednesday, 1 June 2011 21:28:21 UTC