Re: About computer-optimized RDF format. from Stephen D. Williams on 2008-07-26 (semantic-web@w3.org from July 2008)

From: Stephen D. Williams <sdw@lig.net>
Date: Fri, 25 Jul 2008 17:30:27 -0700
To: Sampo Syreeni <decoy@iki.fi>
CC: Bijan Parsia <bparsia@cs.man.ac.uk>, Sandro Hawke <sandro@w3.org>, Damian Steer <pldms@mac.com>, Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <488A7023.1030207@lig.net>
Sampo Syreeni wrote:
>
> On 2008-07-25, Stephen Williams wrote:
>
>> It's a little confusing I think to talk about XML vs. RDF as the 
>> encoding itself, which is most of what XML is, isn't really the 
>> issue. XML is about a basic encoding format with only a few rules 
>> about how anything should be represented, giving maximum flexibility. 
>> RDF (and OWL etc.), which already has several low-level encodings, is 
>> completely about representing ideas.  It is flexibility at a 
>> different level of abstraction.
>
> That is, it's an intermediate level metaformat just as XML is, but at 
> a lower level of abstraction, with more assumptions and structure 
> which enable you to reason with the data more easily. The highest 
> level of abstraction would probably be something like a binary file, 
> which can represent anything and everything but which doesn't allow 
> you to presume *anything* about what is being represented. The lowest 
> would be some specific, closed, completely specified, single-use file 
> format like .ico files.
>
> I think this sort of reasoning is useful because people often seem to 
> think that generality is the source of power in data representation. 
> That is not true by a long shot: sure, generality helps you at the 
> highest levels where you don't care about the specifics, but then at 
> the highest levels you can't really accomplish anything useful. You 
> have to specialize and bring in more structure/assumptions. Which is 
> what RDF does, over unstructured SGML/XML documents.
I think I agree with you; however, I would have reversed highest / lowest 
"abstraction" in most of what you said.  RDF is higher level 
conceptually: it has more constraints on construction in the detailed 
sense, but fewer constraints on construction at the semantic, meaning 
level.  XML is lower level conceptually, with fewer constraints on 
combining data, while almost paradoxically resulting, in typical usage, 
in a more restrictive expression space.  The elemental example of this 
is that XML is tree-structured while RDF is graph-structured.  Pointers 
of some kind have to be layered onto XML to express anything beyond a 
tree, while anything can be represented with an arbitrary graph capability.
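
To make the tree-versus-graph point concrete, here is a minimal Python 
sketch.  The "ex:" names and the id/ref attribute convention are made up 
for illustration; they are not taken from any schema or from the RDF/XML 
syntax.  A cyclic "knows" relation is just two statements in a triple 
set, while the XML tree needs a pointer convention that the application 
has to know how to resolve.

```python
import xml.etree.ElementTree as ET

# Graph view: a set of (subject, predicate, object) triples.
# The cycle alice -> bob -> alice needs no special machinery.
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:alice"),
}


def objects(graph, subject, predicate):
    """Return every object linked from subject by predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Tree view: elements can only nest, so the cross links are expressed
# with a hypothetical id/ref attribute convention layered on top.
doc = ET.fromstring(
    "<people>"
    '<person id="alice"><knows ref="bob"/></person>'
    '<person id="bob"><knows ref="alice"/></person>'
    "</people>"
)


def tree_to_triples(root):
    """Resolve the id/ref convention back into plain triples."""
    out = set()
    for person in root.findall("person"):
        for knows in person.findall("knows"):
            out.add(
                ("ex:" + person.get("id"), "ex:knows", "ex:" + knows.get("ref"))
            )
    return out
```

Both views carry the same information here, but only because 
tree_to_triples() encodes knowledge of the id/ref convention; the triple 
set needs no such side agreement.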
>
>> Anything here can be represented in an XML encoding, just with a more 
>> sophisticated model than "typical" XML. The conversation at hand was 
>> about a much more efficient encoding than XML (or N3, et al) for the 
>> RDF semantics, however it would still be equivalent to some XML 
>> encoding.
>
> Of course. But I for one don't find much value in such musings. After 
> all, given a trivial encoding rule, you could embed every single bit 
> of human knowledge in a single real number.
>
> I think the question is not whether one metaformat is more general 
> than another. Rather it's about what you're trying to accomplish and 
> how well your chosen format/encoding deals with the problem at hand. 
> RDF deals rather well with semistructured information, its XML 
> encoding could be a bit simpler, that's why N3 and N-Triples were born 
> and why I like them, and finally XML in its full generality deals 
> better with things like structured, annotated, linear text/documentation.
>
> When you start with that sort of pragmatic reasoning, binary XML and 
> RDF become really simple to deal with. You start with an application, 
> like some space constrained mobile device where even O(n) gains can 
> potentially be significant. Then you derive a binary encoding that 
> suits you. You don't go around saying that's the be all and end all of 
> XML or RDF representation, because in other environments, it usually 
> makes more sense to just stick with
For a long, long time, my philosophy when solving a problem has been to 
consider feasible solutions that solve the widest version of the problem 
possible, given the knowledge and resources at hand.  We should try, 
periodically, to come up with unifying, broadly usable solutions.  If 
nothing else, the failure points will help us understand many things 
better.  I've worked on and understand a fairly wide range of things.  
There's a lot I haven't examined, of course, but I'm not prairie dogging 
from doing payroll programs for 25 years.  If others need to stick to 
their knitting, that's fine.  I tend to experience bad architectures and 
practices as pain or an itch that I need to address somehow.  I'd rather 
fail than give up before starting because it was hard.
> the current representations. And if you then find yet another 
> environment where different constraints apply (say, random, OLTP-style 
> access), you find yet another representation (say, a schema under an 
> RDBMS, or perhaps an external indexing structure to a textual or a 
> binary representation of the XML or RDF you're dealing with). In any 
> case, there's no silver bullet, and so if you want to standardize 
> something, you'll have to start with a specific, broad enough 
> application area, with clear enough requirements, and go from there.
I don't have a problem with that.
>
>> I have often in the past explained XML as being more about a great 
>> set of idioms that had come of age, compared to older methods, than 
>> about the encoding.
>
> Quite. At the bottom, XML is a textually encoded, ordered tree with a 
> little bit of namespace management sprinkled on top, derived from a 
> single level of annotation on top of linear text. That's a proven 
> model, especially for structured text, but we shouldn't make more of 
> it than it really is.
>
>> Many of the innovations of XML could have been done with older 
>> technologies, just as many of the innovations of Java could have been 
>> implemented in prior technologies.  Similarly, but more so, RDF has a 
>> much different set of idioms than XML for representing data and 
>> solving problems, even if the result is expressed in XML. The new set 
>> of idioms are better than the old, still very good, set.
>
> And at least from my point of view, the new set answers a different 
> question, posed at a different level of abstraction. Because of that, 
> comparing the two would be a case of apples and oranges.

Absolutely, it is a different answer for a different question; however, 
it can be used to solve the same AND new application-level problems.  
Some existing problems might be solved better with the new methods, 
which leads to the comparison.  It is more a comparison of the stacks 
at some point than of individual levels in the stacks, which may be 
apples and oranges when directly juxtaposed.

sdw
Received on Saturday, 26 July 2008 00:31:08 UTC