
Re: [announce] URF: a new replacement for RDF, XML, and JSON

From: Stephen D. Williams <sdw@lig.net>
Date: Tue, 16 Oct 2007 14:58:41 -0400
Message-ID: <471509E1.4020703@lig.net>
To: Garret Wilson <garret@globalmentor.com>
CC: Semantic Web <semantic-web@w3.org>

I like it so far.  Several areas of comment come to mind
initially.  I definitely have an interest in at least talking through
some solutions and possibly helping with specification if they are not 
already solved.  I like RDF and various semantic web efforts and I don't 
want to slow anything down, but alternatives should be examined as there 
is a bit of pain here and there.

What specific improvements does URF have over RDF and RDF/*?  Have you 
built a clear comparison yet?

There are reasons to have data / knowledge in an XML format, even if the 
native / better format isn't XML.  While RDF/XML is somewhat reasonable, 
what improvements might result from URF->XURF?  You say that URF is a 
superset.  What would need to be added?

How does URF handle reification?  I've come to believe that reification 
needs to be handled better; in some sense everything should be reified 
by default in a way that doesn't require you to burst a statement into 
3+ statements just to refer to it (what I think of as "long-hand 
reification").
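To make "long-hand reification" concrete: with RDF's reification vocabulary, describing one triple takes four more (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object), each piece of metadata adds another, and the original triple still has to be asserted separately.  Below is a minimal Python sketch comparing that with a hypothetical statement-ID (quad) form.  Triples are plain tuples, and the ex: names are placeholders.

```python
# Minimal sketch: long-hand RDF reification vs. a hypothetical statement-ID form.
# Triples are plain tuples; the ex: names are placeholders.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(stmt_node, s, p, o):
    """Return the four triples RDF reification uses to describe (s, p, o)."""
    return [
        (stmt_node, RDF + "type", RDF + "Statement"),
        (stmt_node, RDF + "subject", s),
        (stmt_node, RDF + "predicate", p),
        (stmt_node, RDF + "object", o),
    ]

triple = ("ex:janedoe", "ex:age", "23")
source = ("ex:stmt1", "ex:source", "ex:census")

# Long-hand: the asserted triple, four reification triples, then the metadata.
long_hand = [triple] + reify("ex:stmt1", *triple) + [source]

# Statement-ID form (hypothetical): the statement carries its own handle as a
# fourth field, so the metadata is a single statement that refers to it.
quad = triple + ("ex:stmt1",)
short_hand = [quad, source]

print(len(long_hand), len(short_hand))  # 6 2
```

Each further attribute (trust, timestamp) still costs one statement in both forms.  What the handle removes is the fixed four-statement overhead for every triple you want to talk about.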

Related to reification, there are many types of properties that might
need to be tracked pervasively across a set of statements.  These should not
have to trigger long-hand reification, and there should be some way to
avoid statement explosion.  Examples include the source of a
statement, probability or trust, a date/time stamp, and similar.  Ideally,
there would be a concise way to optionally associate such metadata 
without resorting to reification or even to separate statements.  This 
may sound like I'm trying to drag relational concepts into the Semantic 
Web, but I don't think of it that way.  It is more like giving depth to 
data that is being treated in a purely semantic web fashion, dragging 
metadata along while keeping queries or reasoning simple.  The result 
set can then be reasoned on in either the data/metadata bound or unbound 
way.  Here is some more detail along these lines that I wrote back on 
Sept. 6th:

> Why Triples and not Quads?  The response of some is that they are 
> quads, just not explicit in the typical syntaxes, except N3 where you 
> can (re)state a triple as the subject of another triple, thereby 
> meta-referencing it.  (This is still ambiguous, as noted in: [1].)
>
> In my mind, this type of quad, the idea of named graphs, and the idea of
> an RDF document's URL/URI as the ID of the resulting graph are all the
> same or overlapping concepts with a little semantic sugar.  Triples
> are always quads where the statement "handle" is implicit.  More 
> clearly, there are two implicit things about a triple: the identity of 
> the triple (which, traditionally I think, is most clearly represented 
> by the complete value of that triple) and the context of that triple.  
> In that sense, statements are actually _quints_.  There are many 
> reasons to make statements, or otherwise draw conclusions, based on 
> the identity and context of a triple, yet there is no easy way to do 
> this in many cases and fewer ways to interchange this effectively.
>
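One possible reading of the quint idea, offered as a sketch only: take a triple's identity from its complete value and keep the context (a named graph or source document URI) as a separate field.  Here the identity is a SHA-256 digest of the serialized triple, which is an assumption and not a standard.  The example.org graph URIs are hypothetical.

```python
import hashlib
from typing import NamedTuple

class Quint(NamedTuple):
    stmt_id: str   # identity of the triple, derived from its complete value
    context: str   # e.g. a named graph or source document URI (hypothetical)
    s: str
    p: str
    o: str

def triple_id(s, p, o):
    """Identity as a hash of the complete triple value (one possible choice)."""
    # NUL separators keep ("a", "bc", ...) and ("ab", "c", ...) distinct.
    data = "\x00".join((s, p, o)).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def make_quint(context, s, p, o):
    return Quint(triple_id(s, p, o), context, s, p, o)

a = make_quint("http://example.org/graphA", "ex:janedoe", "ex:age", "23")
b = make_quint("http://example.org/graphB", "ex:janedoe", "ex:age", "23")

# Same triple asserted in two contexts: same identity, different context.
print(a.stmt_id == b.stmt_id, a.context == b.context)  # True False
```

Because the identity depends only on (s, p, o), the same triple from two sources gets one identity and two contexts.  Provenance can then hang off the context rather than off each copy.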
> This has to be fixed, sooner or later.  I understand that it has taken 
> time to absorb and react to the first steps of the knowledge 
> representation capabilities and implications of the Semantic Web / RDF 
> / OWL work.  We now are increasingly bumping into the limitations of 
> simple triples.  Reification, meta-chains of statements, and (worst of 
> all) one-for-one mapping statements can all technically solve parts of 
> the "advanced" problems encountered in the real world, but they are 
> all very clumsy in practice and make search and traversal needlessly 
> complex.
>
> In some of the work I do, I need to solve problems that RDF/OWL/etc. 
> are seemingly perfect for, except that I need the following:
>
>     * Statements versioned by time (all versions in the same knowledge
>       base (KB), and the ability to reason over them by time) with
>       both happened-at and known-by timestamps.
>     * Provenance for statements and contexts, including various
>       measures of likelihood, trust, probability.
>     * Security levels, ownership, ACLs, etc.
>     * Dependency - derived from chains for tracking, explaining, and
>       cleaning up after (i.e. retraction / knowledge maintenance)
>       automated reasoning engines.
>     * Alternate versions of statements / properties from different
>       provenance or even different likelihoods or theories from the
>       same source.
>     * Views of subsets of large KBs of this data, including flat
>       temporal, series temporal, security policy, viewpoint/provenance
>       filtering/merging, etc.
>     * The ability to generate, share, and efficiently make use of a
>       "delta" or stack of "deltas" between a parent document / KB and
>       updates.  Ideally, this or similar mechanism would allow rapid
>       access to the result of combining many clumps that resulted in a
>       particular view.
>
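The last requirement in that list, a stack of "deltas" over a parent KB, can be sketched minimally in Python.  Each delta is modeled as a set of removed triples plus a set of added triples, applied oldest first.  The squash step shows one way to collapse many deltas into one, so that a frequently used view can be materialized quickly.  This is an illustrative assumption, not a proposal for an interchange format, and the names and data are hypothetical.

```python
from functools import reduce
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]

class Delta(NamedTuple):
    removes: FrozenSet[Triple]
    adds: FrozenSet[Triple]

def apply_delta(kb, delta):
    """Apply one delta: drop removed triples first, then add new ones."""
    return (kb - delta.removes) | delta.adds

def squash(d1, d2):
    """Compose 'd1 then d2' into one delta with the same effect on any KB."""
    return Delta(d1.removes | d2.removes, (d1.adds - d2.removes) | d2.adds)

def view(parent, deltas):
    """Apply a stack of deltas, oldest first, to a parent KB."""
    return reduce(apply_delta, deltas, frozenset(parent))

parent = {("ex:jane", "ex:city", "ex:Paris")}
d1 = Delta(frozenset({("ex:jane", "ex:city", "ex:Paris")}),
           frozenset({("ex:jane", "ex:city", "ex:Rome")}))
d2 = Delta(frozenset(), frozenset({("ex:jane", "ex:age", "23")}))
stack = [d1, d2]

combined = reduce(squash, stack)
assert view(parent, stack) == apply_delta(frozenset(parent), combined)
print(sorted(view(parent, stack)))
# [('ex:jane', 'ex:age', '23'), ('ex:jane', 'ex:city', 'ex:Rome')]
```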
> The resulting views are slices through the KB, which can be thought of
> as planes cut through clumps of statements and their versions, either
> "horizontally" (at a point in time) or "vertically" (over a period of
> time).  The slices themselves can be simple RDF or something of
> higher K-arity.  K-arity refers to the degree and type of data beyond 
> K3=RDF triples.
>
> Minimally, explicit quads would be a huge improvement, while implicit 
> quads would still exist in certain contexts.  A (locally or globally) 
> unique statement ID allows concise triples instead of reification, and
> provides a handle for indicating any provenance, context/group/URI membership,
> etc.  Versioning with quads is doable as a new quad could have a 
> statement pointing to the old or alternate versions.  This is somewhat 
> unsatisfying because it would require analysis and maintenance to make 
> changes that should be simple "insert this triple-plus-timestamp" 
> which would, in most cases, logically replace the old version.  One 
> option is to reuse the same statement ID with a different timestamp 
> (or provenance or other K-arity attribute) and different content.  A 
> flat view sees only a single version now or at a particular point in time.
>
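Below is a minimal Python sketch of the "reuse the statement ID with a new timestamp" option, covering both slice directions.  It assumes a single integer happened-at timestamp per version (the known-by timestamp is omitted) and in-memory storage.  The IDs and values are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

class VersionedKB:
    """Statements keyed by ID; each ID holds (timestamp, (s, p, o)) versions."""

    def __init__(self):
        self._versions = defaultdict(list)  # stmt_id -> [(t, triple)] sorted by t

    def insert(self, stmt_id, t, triple):
        # "Insert this triple-plus-timestamp": logically supersedes older versions.
        versions = self._versions[stmt_id]
        versions.append((t, triple))
        versions.sort(key=lambda v: v[0])

    def flat_view(self, t):
        """Horizontal slice: the latest version of each statement as of time t."""
        view = {}
        for stmt_id, versions in self._versions.items():
            i = bisect_right([v[0] for v in versions], t)
            if i:
                view[stmt_id] = versions[i - 1][1]
        return view

    def history(self, stmt_id, start, end):
        """Vertical slice: every version of one statement within [start, end]."""
        return [v for v in self._versions.get(stmt_id, []) if start <= v[0] <= end]

kb = VersionedKB()
kb.insert("s1", 2005, ("ex:jane", "ex:city", "ex:Paris"))
kb.insert("s1", 2007, ("ex:jane", "ex:city", "ex:Rome"))
print(kb.flat_view(2006))  # {'s1': ('ex:jane', 'ex:city', 'ex:Paris')}
print(kb.flat_view(2007))  # {'s1': ('ex:jane', 'ex:city', 'ex:Rome')}
print(len(kb.history("s1", 2000, 2010)))  # 2
```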
> A full-blown representation might have statements that include, in 
> addition to S-P-O: statement identity, context, both timestamps, 
> provenance id/context, security context, and dependency context:  
> K10.  Many of these might point to a node that might link to many 
> values and in turn be shared by many statements.  Some of the
> time, that may be desirable.  In some applications, however, these
> meta-properties of a statement, which are sometimes fundamental, are
> used pervasively and become cumbersome if they don't have special status.  Queries
> and results could be greatly simplified if filtering were done in 
> layered and mostly automatic ways and results were simplified into key 
> statements with most metainformation being more subtly managed and 
> represented.  This can all be done, technically, with triples and 
> reification.  In practice, however, both in memory (during queries,
> responses, iteration, and other operations) and for interchange, it
> seems much better for key pervasive metainformation to have a standard
> ontology / slots.  This could possibly be managed as a combination
> of tuples and context graphs (which commonize the shared 
> metainformation to reduce per-statement K-arity).  I have some SPARQL 
> extensions designed that work well with time for instance, greatly 
> simplifying certain knowledge filtering constraints.
>
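As a rough illustration of "tuples and context graphs", the sketch below stores pervasive metainformation (provenance, trust, security level) in a shared context object that many statements point to.  Filtering is a layer applied before the query sees plain triples.  The field names and the numeric trust and security scales are hypothetical, and this is not a rendering of the SPARQL extensions mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    """Metainformation shared by many statements (hypothetical fields)."""
    source: str
    trust: float         # 0.0 .. 1.0
    security_level: int  # higher means more restricted

@dataclass(frozen=True)
class Statement:
    s: str
    p: str
    o: str
    context: Context     # one shared reference instead of several per-statement slots

def visible(statements, clearance, min_trust):
    """Layered filter: drop what the reader may not see or should not trust,
    then hand plain triples to the query layer."""
    return [
        (st.s, st.p, st.o)
        for st in statements
        if st.context.security_level <= clearance and st.context.trust >= min_trust
    ]

census = Context("ex:census2007", trust=0.9, security_level=1)
rumor = Context("ex:forumPost", trust=0.2, security_level=0)

kb = [
    Statement("ex:jane", "ex:age", "23", census),
    Statement("ex:jane", "ex:city", "ex:Rome", census),
    Statement("ex:jane", "ex:city", "ex:Paris", rumor),
]

print(visible(kb, clearance=1, min_trust=0.5))
# [('ex:jane', 'ex:age', '23'), ('ex:jane', 'ex:city', 'ex:Rome')]
```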
> I call this set of requirements the "Polyphasic Knowledge 
> Representation Problem" and my partial solutions "Polyphasic Knowledge 
> Representation" (PKR).  (I'm open to a better name if you can 
> summarize better.  "Polyphasic" seems like a good physics analogy 
> where different versions and provenances of overlapping information 
> are available in overlapping "phases" of knowledge.  Some people think 
> it's a little too Trek-kitsch.)  Many of these may seem special-case 
> or "advanced" to many, but I feel this is where things are going.  It
> is not hard to find direct uses for this kind of data and metadata in
> various businesses, including retail analysis, credit/banking, research,
> sales tracking and analysis, etc.
>
> Additionally, I have been active in the area of efficient (both size 
> and processing) XML interchange and representation.  This has been the 
> topic of the Binary XML (now completed) and Efficient XML Interchange 
> [2] (now in progress) working groups.  As I am now defining an 
> efficient RDF interchange and representation, the question of what is
> actually needed for an "advanced" and efficient solution provides key
> requirements.  The K-arity PKR effective structure of knowledge, where
> K={3-10}, seems to cover it.  Is there a good, strong argument against 
> this kind of representation, given that conversion to or through K3 
> should be possible?
>
> Additionally, part of my thinking and work, but not the XBC or EXI 
> working group consensus, is the idea of a type of format that is 
> directly and randomly accessible _and_ modifiable in place in a 
> reasonably efficient way, in addition to support for low-level deltas 
> and stable virtual pointers.  Knowledge representation for high 
> performance applications is the application that led to those
> concepts in the first place.
>
> Comments and interest are welcome.  I could use suggestions on 
> solution ideas and best venues to publish papers.
>
> [1] http://www.w3.org/TR/rdf-primer/#reification
> [2] http://www.w3.org/TR/2007/WD-exi-20070716/


sdw

Garret Wilson wrote:
>
> I'd like to announce a new semantic framework named the Uniform 
> Resource Framework (URF), an alternative to RDF, XML, and JSON. The 
> complete specification is here:
>
> http://www.urf.name/
>
> If you'd like to jump right in and try out URF, you can find an online 
> URF processor here:
>
> http://www.guiseframework.com/demo/urfprocess
>
> The online processor allows you to paste in any valid TURF (Text URF, 
> the default text format of URF) and see the resulting assertions, a 
> TURF representation, and an interactive URF resource tree. You can 
> even paste in RDF/XML, and it will be converted to URF, allowing you 
> to see directly how URF differs from RDF. I suggest copying some 
> examples from the URF specification and/or from the RDF Primer to 
> start out.
>
> I've also provided an entire open-source source code library for 
> parsing URF from TURF, parsing URF from RDF/XML, creating a Java 
> instance tree from URF, generating TURF from URF, generating URF from 
> a Java instance tree, and manipulating URF in-memory using a simple 
> but comprehensive API. This is not a set of hacks, but robust, 
> thread-safe, fully-commented production code. The source code can be 
> retrieved from the Subversion repository below, with corresponding 
> online API documentation. The Guise(TM) development library is 
> distributed with the URF library already compiled.
>
> Source: https://svn.globalmentor.com/java/src/com/garretwilson/urf/
> API: http://www.guiseframework.com/api/com/garretwilson/urf/package-summary.html
>
> Library: http://www.guiseframework.com/com.guiseframework-unlicensed.jar
>
> A few highlights from the URF specification:
>
> * Everything in URF is a resource---even a string, a character, an 
> integer, a real, a boolean, a date, or a regular expression.
> * There are all the data types you need, including binary, sets, 
> lists, maps, date time, language, and even UTC offset.
> * Resources can have scoped properties that are contextual, making 
> complex values an easy task.
> * Resource properties can be ordered (solving the vCard name problem, 
> among many others).
> * Say good-bye to untyped strings. URF provides classes covering
> resources with existing string-based types (such as the built-in
> language class, which uses RFC 4646 language tags, e.g. 
> «"en-US"(urf.Language)»), as well as an enum facility for creating 
> your own such classes.
> * A resource can be represented in TURF the same regardless of its 
> context.
> * Symbols in TURF have the meanings you've come to expect: <> is a 
> URI, "" is a string, # is a number, [] is a list, {} is a set, // is a 
> regular expression, and = assigns properties. One of many TURF 
> innovations: «URI» indicates the resource identified by the URI <URI>.
> * There's even support for programming language initializers.
>
> A little background: I've long been an advocate of RDF. On the Open 
> eBook Forum as early as 1999-04-27, before I fully understood RDF, I 
> was encouraging Microsoft to use RDF for OEBF metadata. Later I 
> created an RDF-based packaging format and led a long (but 
> unsuccessful) campaign to get it into the OEB Publication Structure. 
> (This format, XPackage, does appear in the RDF Primer, however.) One 
> of my last OEBF discussions was with an OEBF member (who is now a member of
> semantic-web), trying to convince him that RDF would be better than XML
> Schema as a basis for OEB. I advocated the conversion of vCard to RDF,
> and even helped edit the new W3C vCard RDF specification. I've created 
> several RDF ontologies, including MAQRO and PLOOP. I had based the 
> configuration files of my Guise(TM) Internet application framework on 
> RDF, and my latest unannounced Internet project was using RDF at its 
> very core.
>
> As I've worked with RDF over the years, numerous problems with RDF 
> have come to light, however. These include:
>
> * A huge disparity between how resources and literals are represented 
> by the framework.
> * Redundant semantics (e.g. string-typed literals and plain literals 
> with no language tags; rdf:type and the typed literal datatype).
> * An inability to represent property values that may appear multiple 
> times but for which order is important without resorting to list-like 
> classes.
> * No way to represent values only valid in certain contexts. (The 
> plain literal language tag is an exception, but is hard-coded into the 
> framework for one special case.)
> * A lack of rigorous definitions for namespaces and namespace URI 
> formation, leading to namespace URIs with ending fragment identifiers, 
> as well as some resource URIs for which the namespace URI cannot be 
> determined algorithmically.
> * A horribly inadequate high-profile serialization format, RDF/XML 
> (although admittedly there are alternatives).
>
> So I have provided my own solution, fixing RDF's problems and updating 
> vCard all in one go. URF, together with its default representation 
> format, Text URF (TURF), is a better RDF, data-oriented XML, and
> JSON all in one package. I've already converted Guise(TM) to use URF, 
> and it's refreshingly elegant and consistent. (The online URF 
> processing demo above uses Guise.)
>
> Here's just a quick example of URF encoded in TURF:
>
> *declare and label the example and foaf namespaces*
> |example|<http://example.com/example>,
> |foaf|<http://xmlns.com/foaf/0.1/>,
> *describe a FOAF person identified by the URI 
> <http://example.com/example#janedoe>*
> example.janedoe(foaf.Person):
>  example.name="Jane Doe",
>  example.birthday=@1980-01-01,
>  example.salary=#1000000:
>    example.currency~«"usd"(example.Currency)»
>  ;
> ;
>
> The above is semantically identical to the following long-form TURF 
> without comments:
>
> «http://example.com/example#janedoe»:
>  «http://www.urf.name/urf#type»=«http://xmlns.com/foaf/0.1/Person»,
>  «http://example.com/example#name»=«info:lexical/http%3A%2F%2Furf.name%2Furf%23String#Jane%20Doe»,
>  «http://example.com/example#birthday»=«info:lexical/http%3A%2F%2Furf.name%2Furf%23Date#1980-01-01»,
>  «http://example.com/example#salary»=«info:lexical/http%3A%2F%2Furf.name%2Furf%23Integer#1000000»:
>    «http://example.com/example#currency»~«info:lexical/http%3A%2F%2Fexample.com%2Fexample%23Currency#usd»
>  ;
> ;
>
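Reading the short and long forms of this example side by side, the expansion seems to follow two rules.  These are inferred from this one example, not taken from the URF specification.  First, a prefixed name appends the local name to the namespace URI, with a "#" unless the namespace already ends in "/".  Second, a literal becomes an info:lexical URI built from the percent-encoded class URI, a "#", and the percent-encoded lexical form.  Below is a Python sketch under those assumptions.  The function names are hypothetical, and this is not the URF library's API.

```python
from urllib.parse import quote

URF = "http://urf.name/urf"  # namespace used by the lexical URIs in the example

def expand_name(namespace, local):
    """prefix.local -> URI (rule inferred from the example; an assumption)."""
    # Namespaces ending in "/" (FOAF) take the local name directly;
    # others (the example namespace) get a "#" fragment.
    return namespace + local if namespace.endswith("/") else namespace + "#" + local

def lexical_uri(class_uri, lexical_form):
    """Literal -> info:lexical URI: encoded class URI, '#', encoded lexical form."""
    return "info:lexical/" + quote(class_uri, safe="") + "#" + quote(lexical_form, safe="")

print(expand_name("http://example.com/example", "janedoe"))
# http://example.com/example#janedoe
print(expand_name("http://xmlns.com/foaf/0.1/", "Person"))
# http://xmlns.com/foaf/0.1/Person
print(lexical_uri(expand_name(URF, "String"), "Jane Doe"))
# info:lexical/http%3A%2F%2Furf.name%2Furf%23String#Jane%20Doe
```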
> And for comparison with RDF, here are two sets of identical 
> information, first in RDF/XML and then in TURF. (Paste in the 
> information at http://www.guiseframework.com/demo/urfprocess and see 
> for yourself.)
>
> <?xml version="1.0"?>
> <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
> <rdf:RDF
>  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>  xmlns:dc="http://purl.org/dc/elements/1.1/"
>  xmlns:example="http://example.com/example#"
>  xmlns:foaf="http://xmlns.com/foaf/0.1/"
>  xmlns:xhtml="http://www.w3.org/1999/xhtml"
> >
>  <foaf:Person rdf:about="http://example.com/example#janedoe">
>    <foaf:nick xml:lang="pt-BR">Janinha</foaf:nick>
>    <example:age rdf:datatype="&xsd;integer">23</example:age>
>    <example:birthdate rdf:datatype="&xsd;date">1980-04-05</example:birthdate>
>    <example:motto rdf:parseType="Literal">Do it. Do it <xhtml:em>right</xhtml:em>.</example:motto>
>    <example:favoriteSites rdf:parseType="Collection">
>      <rdf:Description rdf:about="http://www.globalmentor.com/"/>
>      <rdf:Description rdf:about="http://www.garretwilson.com/"/>
>    </example:favoriteSites>
>    <example:possibleVacationDestinations>
>      <rdf:Alt>
>        <rdf:li>Paris</rdf:li>
>        <rdf:li>Rome</rdf:li>
>      </rdf:Alt>
>    </example:possibleVacationDestinations>
>  </foaf:Person>
> </rdf:RDF>
>
> |content|<http://urf.name/content>,
> |dc|<http://purl.org/dc/elements/1.1/>,
> |example|<http://example.com/example>,
> |foaf|<http://xmlns.com/foaf/0.1/>,
> |urf|<http://urf.name/urf>,
> example.janedoe(foaf.Person):
>  example.age=#23,
>  example.birthdate=@1980-04-05,
>  example.favoriteSites=
>  [
>    «http://www.globalmentor.com/»,
>    «http://www.garretwilson.com/»
>  ],
>  example.motto="Do it. Do it <xhtml:em xmlns:xhtml=\"http://www.w3.org/1999/xhtml\">right</xhtml:em>.":
>    content.type~«"text/xml-external-parsed-entity/xml-external-parsed-entity"(urf.MediaType)»
>  ;,
>  example.possibleVacationDestinations=
>  {
>    "Paris",
>    "Rome"
>  },
>  nick="Janinha":
>    dc.language~«"pt-BR"(urf.Language)»
>  ;
> ;
>
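For comparison, the rdf:parseType="Collection" shorthand in the RDF/XML above is not a list in the RDF graph itself.  It expands to a chain of blank nodes linked by rdf:first and rdf:rest and terminated by rdf:nil.  A consumer has to walk that chain to recover the order that TURF's [] states directly.  Below is a minimal Python sketch of the expansion and the walk.  The blank node labels (_:b0, _:b1, ...) are generated for illustration only.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_triples(subject, prop, items):
    """Expand an ordered list into RDF collection triples (first/rest/nil)."""
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = [(subject, prop, nodes[0] if nodes else RDF + "nil")]
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return triples

def read_collection(triples, subject, prop):
    """Walk rdf:first/rdf:rest from the property value back into a Python list."""
    index = {(s, p): o for s, p, o in triples}  # assumes one value per (s, p)
    node = index[(subject, prop)]
    items = []
    while node != RDF + "nil":
        items.append(index[(node, RDF + "first")])
        node = index[(node, RDF + "rest")]
    return items

sites = ["http://www.globalmentor.com/", "http://www.garretwilson.com/"]
triples = collection_triples("ex:janedoe", "ex:favoriteSites", sites)
print(len(triples))  # 5
print(read_collection(triples, "ex:janedoe", "ex:favoriteSites") == sites)  # True
```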
> If there are any questions or comments, or if you find problems in the
> specification, code, or online processor, let me know.
>
> Sincerely,
>
> Garret

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-371-9362C 703-995-0407Fax 20147 AIM: sdw
Received on Tuesday, 16 October 2007 18:58:42 GMT
