URI string substructure queries - labelling/filtering use case from Dan Brickley on 2004-09-28 (public-rdf-dawg-comments@w3.org from September 2004)

From: Dan Brickley <danbri@w3.org>
Date: Tue, 28 Sep 2004 07:29:22 -0400
To: public-rdf-dawg-comments@w3.org
Cc: danbri@w3.org, kal@techquila.com, phil@icra.org
Message-ID: <20040928112922.GB16621@homer.w3.org>

DataAccess WG,

some more notes re http://www.w3.org/TR/rdf-dawg-uc/ and design choices.

While acknowledging http://www.w3.org/Provider/Style/URI I'd like to
report on an RDF query use case from the labelling and filtering
application area. Specifically there are a number of systems which allow
content labels to be 'attached' to every resource (typically a document)
that has a URI which begins-with, ends-with, contains or matches some
candidate string. W3C's own PICS format (precursor to PICS-NG aka RDF 
itself) has such a mechanism. I wouldn't expect DAWG to try to replicate
this entire feature of PICS, but I do suggest that the WG consider
allowing string match operations on URIs.

While keying metadata off of URI structure is not to everyone's tastes,
in practice there are many sites that do organise their information in
ways that are reliably reflected into URI structure. It seems natural
that an RDF database, exposed via DAWG's QL + protocol, should be usable
within a system that exploits such regularities to associate general
labels with classes of document that share certain naming patterns. (I'd
also hope to be able to use OWL to reason about those common labels, but
that's a separate story).

Possible use case story: 
[[
XYZ has collected a very large database of Web content labels expressed in
RDF. The labels use a custom vocabulary which associates generally applicable
content descriptions with information about the rules for applying these
general labels to specific documents. These application rules are
typically expressed using string-matches against document URIs. The ability
to manage this metadata in terms of generalisations against classes of
document is important to XYZ, since it lowers costs and increases the
likelihood that a description can be found that applies to any given
URI. Some of these labels are provided by the content creator, others
are 3rd party annotations. The intent is to store them all in an
off-the-shelf RDF system and build applications that exploit this information.

XYZ is attending an evening class on Description Logic and hopes to
eventually make use of the generalised description facilities in W3C's
OWL standard. In the meantime, millions of RDF triples have to be stored
and retrieved in an efficient manner, so XYZ looks to DAWG-compatible
RDF data storage systems. XYZ would be delighted to be able to have a
product-neutral, standard way of asking such a database questions like
"what documents have URIs that begin with http://example.com/pics/adult/ ?",
"what documents have URIs that end with '.png'?", "what documents have URIs
that contain the string '/adult/'?", so that such matching could be done
within the database rather than in application code.
]]
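
To make the story concrete, here is a rough sketch (in Python) of the kind
of application-side filtering XYZ would otherwise have to write by hand.
The URI list, the helper name uris_matching and the regular expression are
all made up for illustration; nothing here is a proposed DAWG syntax. It
only shows the begins-with, ends-with, contains and pattern-match tests
that one would prefer the query language to evaluate inside the database.

```python
import re

# Hypothetical document URIs, standing in for subjects held in the RDF store.
DOC_URIS = [
    "http://example.com/pics/adult/photo1.png",
    "http://example.com/pics/adult/index.html",
    "http://example.com/pics/kids/cartoon.png",
    "http://example.org/docs/adult/essay.html",
]


def uris_matching(uris, predicate):
    """Return the URIs for which the string test `predicate` holds."""
    return [u for u in uris if predicate(u)]


# "what documents have URIs that begin with http://example.com/pics/adult/ ?"
begins = uris_matching(
    DOC_URIS, lambda u: u.startswith("http://example.com/pics/adult/")
)

# "what documents have URIs that end with '.png'?"
ends = uris_matching(DOC_URIS, lambda u: u.endswith(".png"))

# "what documents have URIs that contain the string '/adult/'?"
contains = uris_matching(DOC_URIS, lambda u: "/adult/" in u)

# A general pattern match (assumed regex): any .png directly under /pics/<dir>/
pattern = re.compile(r"/pics/[^/]+/[^/]+\.png$")
matches = uris_matching(DOC_URIS, lambda u: pattern.search(u) is not None)

for name, result in [
    ("begins", begins),
    ("ends", ends),
    ("contains", contains),
    ("matches", matches),
]:
    print(name, result)
```

With millions of triples, pulling every URI back to the client just to run
tests like these is exactly the cost the use case is trying to avoid.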

Hope this helps. IMHO being able to do this would be hugely useful,

cheers,

Dan

ps. copying Kal and Phil, whose work this relates to. All mistakes are
mine etc etc.

Received on Tuesday, 28 September 2004 11:29:22 UTC