Some thoughts on effective access to "primary" vs "secondary" resources, consistency of descriptions, and bootstrapping the semantic web...

Howdy folks,

I recently put together some thoughts on a few key issues that
several recent discussions in this and other forums have touched
upon. I want to offer some pointed comments on them, which I think
are very important to consider.

I've touched upon all of these issues in various ways at various
times, but I think that many of the points I aimed to make have
been lost in the myriad threads and discussions, so I present
them here in (hopefully) a clear and comprehensive form.

I'm not looking to see these comments turn into any number of long,
drawn-out debates which merely generate a lot of "digital" heat and
go nowhere. I ask that, before giving in to the impulse to point out
how my views are the babbling of a deranged lunatic, you first
make a reasonable effort to understand what I am trying to say, and
give me the benefit of the doubt about being insane if you simply
do not understand something.

I'm very happy to discuss any of these issues in a friendly and
mutually respectful manner with anyone. 

This posting is as much "for the record" as it is for the (presumed)
benefit of others and for the sake of further discussion.

Here goes...  ;-)

--

1. A Bootstrapping Mechanism for the Semantic Web is Essential

If the semantic web is to become truly globally ubiquitous, with
arbitrary semantic web agents intercommunicating in a truly 
dynamic fashion, then we cannot presume that any given agent will
have any pre-existing knowledge about the resource denoted by
any particular URI it may encounter, including URIs denoting
vocabulary terms.

And if we abandon that presumption, as we should, then there must
be an efficient, standardized way to obtain an answer to the
question "what does this URI mean?" from an authoritative source.
Asking that question should require of the agent no knowledge beyond
the URI itself and the generic, application agnostic, standardized
machinery of the web and semantic web.

The solution offered by URIQA is to utilize the proven, globally
deployed web, and web resolvable URIs, to allow agents to ask
that question of the web authority specified in the URI itself.
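
To make the question concrete, here is a minimal sketch of what
asking it could look like on the wire. It assumes the MGET method
described in the URIQA specification and the RDF/XML media type; the
helper name build_mget_request is hypothetical. The sketch only
handles http URIs without a fragment, since a fragment is never sent
in an HTTP request.

```python
from urllib.parse import urlsplit


def build_mget_request(uri: str) -> str:
    """Build the text of an HTTP request asking the URI's own web
    authority for a description of the resource the URI denotes.

    Assumption: the server understands the URIQA MGET method.
    """
    if "#" in uri:
        # The fragment is interpreted client-side and never reaches the server.
        raise ValueError("sketch handles only URIs without a fragment")
    parts = urlsplit(uri)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("sketch handles only http URIs with an authority")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (
        f"MGET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Accept: application/rdf+xml\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    print(build_mget_request("http://example.com/foo/bar"))
```

Note that the agent needs nothing beyond the URI itself: the host to
contact and the request target are both derived from it.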

There may be other, third party sources of information about the
resource denoted by that URI, and various query interfaces with
differing degrees of functionality providing access to such
information. The agent may be aware of such sources, and may even
utilize some of them. But interaction with otherwise known third
party sources should not be a necessary element of the bootstrapping
solution, nor does it alleviate the need for a proper bootstrapping
solution based solely on the possession of a particular URI.

--

2. Consistency in Interchange of Resource Descriptions is Essential

If the semantic web is to scale, as the web did, then there
needs to be a high degree of consistency in the form and
scope of typical responses to the semantic web request 
"tell me about this thing" (i.e. a CBD in RDF/XML) comparable 
to the high degree of consistency in the form and scope of
typical responses to the web request "give me a representation
of this thing" (i.e. a document presentable in a browser).

That doesn't mean that CBDs would be the only form of description,
any more than all representations returned by a web server are
primarily intended for presentation in a browser. CBDs simply
constitute a standardized default form of description when no other
more specialized form is either requested or required. However,
a completely unpredictable "free for all" or "lottery" of description
types will severely hinder the semantic web from reaching a critical
mass of applications which truly facilitate the free and dynamic
interaction of arbitrary agents.

Excessive variability in default forms of description will increase the 
complexity of our agents and/or limit the scope of effective interaction 
between agents.

--

3. Primary vs Secondary Web Access to Representations and Descriptions is Critical

URIrefs with fragment identifiers pose significant practical
problems to semantic web applications attempting to employ
the existing, proven, standardized web machinery to access
knowledge about resources. 

For the sake of easier discussion (and typing), let me introduce
three new terms (the last of which we won't actually concern ourselves
with herein):

"primary URI"       

   pURI = scheme ":" hier-part

"secondary URI"     

   sURI = pURI "#" fragment 

"query URI"         

   qURI = pURI "?" query 

where 'scheme', 'hier-part', 'fragment' and 'query' are defined
in RFC 2396-bis.
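
The three forms above can be told apart mechanically. The following
is a minimal sketch; the function names classify and base_puri are
hypothetical. It also assumes that a URI carrying both a query and a
fragment is classed as secondary, a case the definitions above do not
cover.

```python
def classify(uri: str) -> str:
    """Classify a URI as 'primary', 'secondary' or 'query'.

    The first '#' delimits the fragment; a '?' before it delimits
    the query.
    """
    before_fragment, hash_sep, _fragment = uri.partition("#")
    if hash_sep:
        return "secondary"
    if "?" in before_fragment:
        return "query"
    return "primary"


def base_puri(suri: str) -> str:
    """Return the part of a secondary URI that precedes the '#'."""
    return suri.partition("#")[0]


if __name__ == "__main__":
    for uri in ("http://example.com/foo/bar",
                "http://example.com/foo#bas",
                "http://example.com/foo?x=1"):
        print(uri, "->", classify(uri))
```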

A "primary URI" denotes a "primary resource", which is a resource
which may, potentially, have directly accessible representations
(by resolving the primary URI to one or more representations).

A "secondary URI" denotes a "secondary resource", as defined in
AWWW, which is only accessible indirectly, through a particular
representation of the resource denoted by the base pURI of that sURI.

This distinction is particularly crucial if semantic web agents are
to have efficient web access to representations (or descriptions via
URIQA).

For example, consider the following two URIs, each of which
denotes a distinct vocabulary term. The first is a pURI and
the second is a sURI:

   http://example.com/foo/bar
   http://example.com/foo#bas

Let us also presume that these are the only URIs which are
known to denote these two particular terms.

The "secondary resource" denoted by the sURI

   http://example.com/foo#bas

is accessible via the web only indirectly, via the representation
of some other resource, namely the resource denoted by the base pURI
of the sURI. Thus, in order to access a (kind of) representation of
whatever term is denoted by
 
   http://example.com/foo#bas

one must first obtain a representation of the resource denoted by

   http://example.com/foo

and within the context of that representation, outside the
scope of the web machinery proper, on the client side,
interpret the fragment identifier "#bas" in order to obtain
a (kind of) representation of the term. Note also that the
(kind of) representation extracted is not officially considered
a representation by the web architecture (hence the qualification
'kind of').

The problem for a requesting agent, one which may very well be
running on an embedded or mobile device with limited capacities
(and that is, by the way, the beautiful vision of the ubiquitous
semantic web painted for us), is that the representation of the
resource denoted by 

   http://example.com/foo

may be several megabytes in size, constituting the complete
definition of a complex ontology consisting of thousands of
terms; yet downloading that mass of (mostly irrelevant)
information is the only way to access the needed, limited,
information required about the particular term

   http://example.com/foo#bas

Real world examples of such problems exist (Cyc, WordNet) and
more are likely to surface as the semantic web gains critical mass.

Arbitrary semantic web agents simply cannot be expected to be
"force fed" huge masses of information in order to obtain the
small bits of information needed to accomplish a given task.

For agents running on mobile devices, which is obviously an 
application area of particular interest to Nokia, this is a 
critical issue.

In contrast, if the term is treated as a "primary resource" by
being denoted by a primary URI

   http://example.com/foo/bar

then the semantic web agent can access representations of that
term specifically, directly, and efficiently, independently
of any representation of any other resource.
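
The difference in cost between the two cases can be illustrated with
a toy, in-memory stand-in for a web server. Everything here is
hypothetical: the "name: description" line format, the number of
terms, and the function names. It is a sketch of the access pattern,
not a measurement of any real ontology.

```python
from urllib.parse import urldefrag


def build_toy_web(term_count=1000):
    """Map each pURI to one representation (a stand-in for a server)."""
    # One large document defines every '#'-named term ...
    lines = ["bas: description of bas"]
    lines += [f"bas{i}: description of term {i}" for i in range(term_count)]
    web = {"http://example.com/foo": "\n".join(lines)}
    # ... while the '/'-named term has its own small document.
    web["http://example.com/foo/bar"] = "bar: description of bar"
    return web


def fetch_term(web, uri):
    """Return (description, characters transferred) for the given URI."""
    base, fragment = urldefrag(uri)
    document = web[base]  # the only step the web machinery takes part in
    if not fragment:
        return document, len(document)
    # Client-side step: interpret the fragment within the full document.
    for line in document.splitlines():
        if line.partition(": ")[0] == fragment:
            return line, len(document)
    raise KeyError(uri)


if __name__ == "__main__":
    web = build_toy_web()
    for uri in ("http://example.com/foo#bas", "http://example.com/foo/bar"):
        description, transferred = fetch_term(web, uri)
        print(uri, "->", repr(description), "cost:", transferred)
```

In both cases the agent ends up with a one-line description, but for
the sURI the whole document must be transferred before the fragment
can be interpreted.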

Primary resources denoted by primary URIs are first class citizens
of the web, and semantic web agents can directly interact with
representations (and descriptions) of those resources, and can
do so in an efficient manner, employing the full richness of
the web machinery.

If the existing, proven web machinery is to be re-used and employed
by the semantic web, and I think that is a widely held presumption and
desire, then this issue regarding naming methodology and the continued
use of secondary URIs for vocabulary terms, which should be considered
primary resources, is critical.

The use of secondary URIs as the official URIs denoting resources
which a large number of semantic web agents are likely to refer
to and inquire about constitutes an inefficient and non-scalable
methodology.

Secondary resources denoted by secondary URIs are second class
citizens of the web. 

Vocabulary terms should be considered first class citizens of the
web, and therefore vocabulary terms should always be denoted by
primary URIs.

It should be considered a "best practice" to avoid the use of secondary
URIs, except for particular cases where the secondary resources in
question constitute logical or functional subcomponents of the resource
denoted by the base URI (e.g. a section of a web page, or a line segment
in an SVG graphic, etc.) and access to such component secondary resources
is not expected to happen independently from access to the encompassing
resource.

Applying this "best practice" includes avoiding the use of XML Namespaces
ending in '#', and instead ending all XML Namespaces in '/' (or some other
character which does not result in the creation of secondary URIs).
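
The effect of the namespace's final character can be seen in a small
sketch. Both namespaces and the local names below are hypothetical,
as is the function name; the point is only which URIs an agent must
actually request to reach each term.

```python
from urllib.parse import urldefrag

HASH_NS = "http://example.com/vocab#"   # hypothetical namespace ending in '#'
SLASH_NS = "http://example.com/vocab/"  # hypothetical namespace ending in '/'


def retrievable_uris(namespace, local_names):
    """Return the set of URIs that must be requested to reach the terms.

    A term URI is the namespace followed by the local name; anything
    after a '#' is dropped before the request is made.
    """
    return {urldefrag(namespace + name).url for name in local_names}


if __name__ == "__main__":
    names = ["title", "creator", "date"]
    print(retrievable_uris(HASH_NS, names))
    print(retrievable_uris(SLASH_NS, names))
```

With the '#' namespace every term collapses onto the single base URI,
whereas with the '/' namespace each term can be requested on its own.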

--

Patrick Stickler                (+358 40) 801 9690
Senior Architect                Hatanpäänkatu 1
Forum Nokia Web Services        33900 Tampere FINLAND
Nokia Technology Platforms      patrick.stickler@nokia.com

Received on Friday, 8 October 2004 08:11:02 UTC