Type matching algorithm for web intents

Below is proposed language for the "type" matching algorithm to be
implemented by the user agent. Note that "action" matching is simple
string equivalence. "type" matching is more complicated, and is
described as follows:

Registration Algorithm:

1. The |type| field of a registration is a whitespace-separated type
list: a string of text in which any run of whitespace characters
demarcates type strings (MIME types or string literals), which
themselves contain no internal whitespace.

type-list = (whitespace* type-string whitespace*)+

type-string = mime-type OR string-literal

1.1. The type is considered to be a MIME type if it parses as one (see
the BNF in RFC 2045). In addition to the explicit MIME types, a type
may be equal to the wildcard type "*/*" or "*". Caveat: since a type
cannot contain internal whitespace, a MIME type specifier cannot
contain whitespace either, e.g. between MIME parameters.

1.2. The type is considered a string literal if it does not parse as a
MIME type as in (1.1).

1.3. An empty type list or a type list containing only whitespace is
not a legal specifier. Any registrations with such a specifier will be
ignored. The User Agent MAY signal the failure with an appropriate
error indication.

2. When a service is registered, all types in the service
registration's type list are registered.
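
The registration steps above can be sketched in Python as follows.
This is only an illustration, not proposed spec text: the names
parse_type_list and classify are hypothetical, the set of top-level
types is my reading of RFC 2045 (the named types plus "x-" extension
tokens), and parameter values are limited to bare tokens (no quoted
strings) to keep the pattern short.

```python
import re

# Assumption: the top-level types named in RFC 2045; "x-" tokens are also accepted.
TOP_LEVEL_TYPES = {"text", "image", "audio", "video",
                   "application", "message", "multipart"}

# RFC 2045 token: printable ASCII except space and tspecials.
TOKEN = r"[A-Za-z0-9!#$%&'*+\-.^_`{|}~]+"

# type "/" subtype, then zero or more ";name=value" (a bare ";" is tolerated).
MIME_RE = re.compile(rf"({TOKEN})/({TOKEN})((?:;(?:{TOKEN}={TOKEN})?)*)")


def classify(type_string):
    """Return "mime" or "literal" for one whitespace-free type string."""
    if type_string == "*":
        return "mime"  # bare wildcard, step 1.1
    m = MIME_RE.fullmatch(type_string)
    if m is None:
        return "literal"  # step 1.2
    top = m.group(1).lower()
    if top == "*" or top in TOP_LEVEL_TYPES or top.startswith("x-"):
        return "mime"
    return "literal"  # looks like type/subtype, but not a MIME top-level type


def parse_type_list(type_field):
    """Split a |type| field on whitespace and classify every type string."""
    types = type_field.split()  # splits on any whitespace run, drops empties
    if not types:
        # Step 1.3: empty or whitespace-only lists are not legal.
        raise ValueError("empty type list")
    return [(t, classify(t)) for t in types]


print(parse_type_list("text/plain; encoding=utf8"))
# [('text/plain;', 'mime'), ('encoding=utf8', 'literal')]
```

Raising ValueError is just one way to signal the failure in (1.3); a
user agent could equally ignore the registration silently.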

For invocation:

1. The type specifier may contain only one type. Specifying multiple
types is an invocation error.

2. The type specifier is parsed the same way as in (1.1)-(1.3) of the
registration algorithm above.
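
A minimal sketch of the invocation-side check, again in Python. The
function name parse_invocation_type is hypothetical, and raising
ValueError is only an assumption about how an "invocation error"
might surface. Classifying the result as a MIME type or string
literal is left out here.

```python
def parse_invocation_type(type_field):
    """Return the single type string of an invocation, or raise ValueError."""
    types = type_field.split()  # same whitespace splitting as registration
    if not types:
        raise ValueError("empty type specifier")
    if len(types) > 1:
        # Invocation rule 1: only one type may be specified.
        raise ValueError("multiple types in invocation: %r" % (types,))
    return types[0]


print(parse_invocation_type(" text/plain "))  # text/plain
```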

For matching to services, the type specified by the client is matched
to those registered by services as follows:

1. If one is a MIME type and the other is a string literal, they do not match.

2. If both are string literals, then they match only if they define an
identical sequence of Unicode code points. (That is, the strings may
be in different encodings, or use different Unicode escaping, but must
define an identical sequence of code points.)

3. If both are MIME types, then the MIME type matching algorithm is
followed. The top-level type and the subtype must each match exactly,
or be represented by the MIME wildcard ("*"). If there are MIME
parameters present, they must be present and match exactly, although
order does not matter.
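
The three matching rules can be sketched in Python as below. The names
parse_mime and types_match are hypothetical. Two points are my own
assumptions rather than agreed text: parameters are treated as
one-directional (every parameter on the registered type must appear
with the same value on the invoked type, while extra parameters on the
invoked type are ignored), and type and subtype names are compared
case-insensitively, as RFC 2045 defines them. The top-level type set
and the token-only parameter values are simplifications too.

```python
import re

# Assumption: the top-level types named in RFC 2045; "x-" tokens are also accepted.
TOP_LEVEL_TYPES = {"text", "image", "audio", "video",
                   "application", "message", "multipart"}
TOKEN = r"[A-Za-z0-9!#$%&'*+\-.^_`{|}~]+"
MIME_RE = re.compile(rf"({TOKEN})/({TOKEN})((?:;(?:{TOKEN}={TOKEN})?)*)")


def parse_mime(type_string):
    """Return (type, subtype, params) for a MIME type, None for a string literal."""
    if type_string == "*":
        return ("*", "*", {})  # bare wildcard treated like */*
    m = MIME_RE.fullmatch(type_string)
    if m is None:
        return None
    top, sub = m.group(1).lower(), m.group(2).lower()
    if top != "*" and top not in TOP_LEVEL_TYPES and not top.startswith("x-"):
        return None
    params = {}
    for item in m.group(3).split(";"):
        if item:  # skip the empty pieces around ";"
            name, value = item.split("=", 1)
            params[name.lower()] = value
    return (top, sub, params)


def types_match(registered, invoked):
    reg, inv = parse_mime(registered), parse_mime(invoked)
    if reg is None and inv is None:
        # Rule 2: Python str equality compares code point sequences.
        return registered == invoked
    if reg is None or inv is None:
        return False  # rule 1: MIME type vs. string literal
    # Rule 3: type and subtype must be equal, or either side a wildcard.
    for a, b in zip(reg[:2], inv[:2]):
        if a != b and "*" not in (a, b):
            return False
    # Assumed direction: registered parameters must all be present on the
    # invoked type. A dict lookup makes parameter order irrelevant.
    return all(inv[2].get(name) == value for name, value in reg[2].items())


print(types_match("text/*", "text/plain;encoding=ASCII"))  # True
print(types_match("*/*", "foo/bar"))                       # False
```

If the intent is instead that parameters must match in both
directions, the last line of types_match would become a plain
comparison of the two parameter dictionaries.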


Type parsing examples:

A registered type string of "a b c" maps to three string-literal
types: ["a", "b", "c"].

A type string of "foo/bar" looks like a MIME type, but "foo" doesn't
match a MIME top-level type, so it parses as the string literal type
"foo/bar".

A type string of "text/plain" parses as a MIME type, so it is treated
as a MIME type text/plain.

A type string of "text/*" parses as a MIME type, so it is treated as a
MIME type specifier text/* (with a wildcard).

A type string of "text/plain; encoding=utf8" parses as two types. The
first is a MIME type text/plain;, which is equivalent to text/plain
(it has no parameters). The second is the string literal
"encoding=utf8". In this case, this is probably an error on the
author's part, but this is the way it will parse.

A type string of "schema.org/Person   " parses as the single string
literal "schema.org/Person" -- the whitespace is treated as
demarcating empty types, but since there is a non-empty type in the
list, the parsed type list is ["schema.org/Person"].

A type string of "", " ", "\r\n", etc. will not be registered.

Matching examples:

A registered MIME type of text/* will match text/plain,
text/plain;encoding=ASCII, and text/html.

A registered MIME type of text/plain will match text/plain and
text/plain;encoding=ASCII, but not text/html.

A registered MIME type of */* will match a passed text/plain type, but
not a string literal "foo/bar" (since that is not a legal MIME type).

A registered string literal type "schema.org/Person" will match a
passed "schema.org/Person" type, even if the string is registered as a
UTF-8 string and passed as UTF-16.


------

Feedback requested! Please treat this as a code review request for
adequate specificity, clarity, and correctness. I think this covers
the main agreement we reached at the F2F, so I'm hoping to fine-tune
details and root out any problems in the particulars.


-Greg

Received on Monday, 9 April 2012 23:58:52 UTC