Re: Thinking about fn:build-uri (PR #1388)

Sasha Firsov <suns@firsov.net> writes:
> Norm, if we start from business requirements which would materialize in acceptance criteria, your doubts would vanish.

Perhaps.

> What URL/URIs are used for?
> * For comparison, to figure out "uniqueness". By definition, that is a unique resource
> locator/identifier.

Yes, we use them for comparison.

> * For resolving. The standards define the match of encoded and partial variations to "final",
> which is the subject for uniqueness/distinctiveness.
> * For denomination of URL/URI parts. Whether it is a schema, domain, path, query, hash -
> each(!!!) needs capability to 
>
> * get it from URL
> * split on parts, not just to ordered collection but also with semantic tagging

That’s what fn:parse-uri is trying to do.

> * For working with URL parts
>
> * CRUD operate with part, and parts set (i.e. subset substitution)  
> * during operations ^^ apply en- & de-coding

Having constructed a map with fn:parse-uri, the CRUD operations on the parts are straightforward enough. The fn:build-uri function is supposed to do the re-encoding.

> I hope the XML community is worthy of thoughtful API design and implementation.

I hope so too. We’re trying.

I’m not sure how those observations help answer the technical questions though.

Given: “http://example.com/path/to%2fplace?a=b#spoon!”

We need to identify the parts. We do that by decomposing it into a map.

  { "authority":"example.com",
    "path":"/path/to%2fplace",
    "scheme":"http",
    "path-segments":("","path","to/place"),
    "query":"a=b",
    "absolute":true(),
    "host":"example.com",
    "hierarchical":true(),
    "uri":"http://example.com/path/to%2fplace?a=b#spoon!",
    "query-parameters":map{"a":"b"},
    "fragment":"spoon!"
  }
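
For anyone who wants to poke at the idea outside XPath, here is a rough Python analogue of that decomposition. It is only a sketch: parse_uri_sketch is a made-up name, it leans on Python's urllib.parse rather than the rules in the PR, and it reproduces only some of the keys shown above.

```python
from urllib.parse import urlsplit, unquote, parse_qsl

def parse_uri_sketch(uri):
    """Hypothetical helper: split a URI into a map of named parts."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "host": parts.hostname,
        # The path stays encoded; the segments are decoded one by one.
        "path": parts.path,
        "path-segments": [unquote(seg) for seg in parts.path.split("/")],
        "query": parts.query,
        "query-parameters": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }

result = parse_uri_sketch("http://example.com/path/to%2fplace?a=b#spoon!")
for key, value in result.items():
    print(f"{key}: {value!r}")
```

Note that the “%2f” survives in “path” but becomes a literal “/” inside the third path segment, which is exactly why the segments have to be kept as a sequence rather than rejoined naively.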

For better or worse, users often provide things that aren’t exactly URIs in contexts where URIs are expected.

Given: \\host\Users\My Docs\test[$$].zip

We need to identify the parts of that too:

  { "filepath":"/Users/My Docs/test[$$].zip",
    "authority":"host",
    "path":"/Users/My Docs/test[$$].zip",
    "path-segments":("","Users","My Docs","test[$$].zip"),
    "host":"host",
    "hierarchical":true(),
    "uri":"\\host\Users\My Docs\test[$$].zip"
  }

And having identified those parts, we need to consider how best to reconstruct a URI (an actual URI) from the parts. The decomposition is necessarily a bit heuristic. The recomposition isn’t heuristic, but we can make different choices and the choices we make will impact the (re)constructed URI. Some choices will work better for some users than others.
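
To make “different choices” concrete, here is a small Python sketch that rebuilds just the path from the segments above under two encoding policies. The join_path helper and both policies are my own illustration, not anything the PR specifies.

```python
from urllib.parse import quote

# Path segments from the decomposition of the UNC-style string above.
segments = ["", "Users", "My Docs", "test[$$].zip"]

def join_path(segments, safe=""):
    """Hypothetical helper: percent-encode each segment, then join with "/"."""
    return "/".join(quote(seg, safe=safe) for seg in segments)

# Choice 1: encode everything outside the unreserved set.
strict = join_path(segments)
# Choice 2: additionally leave "$" alone (an assumption for illustration).
lenient = join_path(segments, safe="$")

print(strict)   # /Users/My%20Docs/test%5B%24%24%5D.zip
print(lenient)  # /Users/My%20Docs/test%5B$$%5D.zip
```

Both results are deterministic given the policy; the only question is which policy suits the user.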

(Remember, however, that we’re offering a convenience here. A very, very useful convenience, but there’s nothing preventing a user from writing code to do *any* sort of parsing and construction that they want.)

I think the questions that have been identified are:

1. Does the user need to be able to identify a query separator character? (Historically, though it seems less common now, you’d see URIs that used “;” instead of “&” to delimit query parameters.)

2. Does the user need to be able to identify a path separator character?

3. When encoding the parts of a URI (path, query parameters, fragment identifier) back into a URI, should the characters that get encoded vary by part? For example, should “/” be encoded in the path but not in the query parameters?
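
A Python sketch of what questions 1 and 3 mean in practice, assuming a parts map like the first example with a second query parameter added. Everything here is illustrative: build_uri_sketch, its query_separator parameter, and the three “safe” sets are assumptions of mine, not the signature or the rules proposed for fn:build-uri.

```python
from urllib.parse import quote

# Characters left unencoded in each part (assumed sets, for illustration).
PATH_SEGMENT_SAFE = ""   # "/" inside a segment must be encoded
QUERY_SAFE = "/?"        # "/" may stay literal in the query
FRAGMENT_SAFE = "/?!"

def build_uri_sketch(parts, query_separator="&"):
    """Hypothetical helper: rebuild a URI, encoding each part differently."""
    path = "/".join(
        quote(seg, safe=PATH_SEGMENT_SAFE) for seg in parts["path-segments"]
    )
    query = query_separator.join(
        f"{quote(k, safe=QUERY_SAFE)}={quote(v, safe=QUERY_SAFE)}"
        for k, v in parts["query-parameters"].items()
    )
    uri = f"{parts['scheme']}://{parts['authority']}{path}"
    if query:
        uri += "?" + query
    if parts.get("fragment"):
        uri += "#" + quote(parts["fragment"], safe=FRAGMENT_SAFE)
    return uri

parts = {
    "scheme": "http",
    "authority": "example.com",
    "path-segments": ["", "path", "to/place"],
    "query-parameters": {"a": "b", "dir": "x/y"},
    "fragment": "spoon!",
}

with_amp = build_uri_sketch(parts)
with_semi = build_uri_sketch(parts, query_separator=";")
print(with_amp)   # http://example.com/path/to%2Fplace?a=b&dir=x/y#spoon!
print(with_semi)  # http://example.com/path/to%2Fplace?a=b;dir=x/y#spoon!
```

The “/” in the path segment is encoded while the “/” in the query value is not, which is the per-part behaviour question 3 asks about; the separator argument is the knob question 1 asks about.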

The current PR says “yes” to all three questions.

My initial message on this thread said “no” to 3 (and left unmentioned 1 and 2 so I guess that’s a “yes” on those).

Christian suggested the alternative that we say “no” to 1 and 2 but “yes” to 3.

I think Christian is probably right. That would simplify things with no loss in utility for the overwhelming majority of users.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Tuesday, 3 September 2024 09:10:05 UTC