Re: About computer-optimized RDF format. from Sampo Syreeni on 2008-07-25 (semantic-web@w3.org from July 2008)

From: Sampo Syreeni <decoy@iki.fi>
Date: Fri, 25 Jul 2008 22:20:48 +0300 (EEST)
To: Stephen Williams <sdw@lig.net>
cc: Bijan Parsia <bparsia@cs.man.ac.uk>, Sandro Hawke <sandro@w3.org>, Damian Steer <pldms@mac.com>, Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>, hpti-tech@hpti.info
Message-ID: <Pine.SOL.4.62.0807252201070.14922@kruuna.helsinki.fi>

On 2008-07-25, Stephen Williams wrote:

> It's a little confusing I think to talk about XML vs. RDF as the 
> encoding itself, which is most of what XML is, isn't really the issue. 
> XML is about a basic encoding format with only a few rules about how 
> anything should be represented, giving maximum flexibility. RDF (and 
> OWL etc.), which already has several low-level encodings, is 
> completely about representing ideas.  It is flexibility at a different 
> level of abstraction.

That is, it's an intermediate-level metaformat just as XML is, but at a 
lower level of abstraction, with more assumptions and structure which 
enable you to reason with the data more easily. The highest level of 
abstraction would probably be something like a binary file, which can 
represent anything and everything but which doesn't allow you to presume 
*anything* about what is being represented. The lowest would be some 
specific, closed, completely specified, single-use file format like .ico 
files.

I think this sort of reasoning is useful because people often seem to 
think that generality is the source of power in data representation. 
That is not true by a long shot: sure, generality helps you at the 
highest levels where you don't care about the specifics, but then at the 
highest levels you can't really accomplish anything useful. You have to 
specialize and bring in more structure/assumptions. Which is what RDF 
does, over unstructured SGML/XML documents.

> Anything here can be represented in an XML encoding, just with a more 
> sophisticated model than "typical" XML. The conversation at hand was 
> about a much more efficient encoding than XML (or N3, et al) for the 
> RDF semantics, however it would still be equivalent to some XML 
> encoding.

Of course. But I for one don't find much value in such musings. After 
all, given a trivial encoding rule, you could embed every single bit of 
human knowledge in a single real number.
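
To make the "trivial encoding rule" concrete, here is a minimal sketch in 
Python. It reads a byte string as base-256 digits after the radix point, 
which gives one exact number between 0 and 1. The names encode and decode 
and the 0x01 sentinel are assumptions made for illustration, and a 
Fraction is a rational standing in for the real number.

```python
from fractions import Fraction

def encode(data: bytes) -> Fraction:
    """Map a byte string to an exact rational number in (0, 1)."""
    # Append a 0x01 sentinel so trailing zero bytes are not lost. It also
    # makes the numerator odd, so the fraction is never reduced.
    digits = data + b"\x01"
    return Fraction(int.from_bytes(digits, "big"), 256 ** len(digits))

def decode(x: Fraction) -> bytes:
    """Invert encode(): recover the original byte string."""
    # The denominator is exactly 256**k, that is 2**(8*k).
    k = (x.denominator.bit_length() - 1) // 8
    digits = x.numerator.to_bytes(k, "big")
    return digits[:-1]  # drop the sentinel

number = encode(b"all of human knowledge")
assert decode(number) == b"all of human knowledge"
```

The round trip works, but the number tells you nothing about what it 
holds until you decode it, which is the point being made above.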

I think the question is not whether one metaformat is more general than 
another. Rather it's about what you're trying to accomplish and how well 
your chosen format/encoding deals with the problem at hand. RDF deals 
rather well with semistructured information. Its XML encoding could be a 
bit simpler, which is why N3 and N-Triples were born and why I like them. 
And finally, XML in its full generality deals better with things like 
structured, annotated, linear text/documentation.

When you start with that sort of pragmatic reasoning, binary XML and RDF 
become really simple to deal with. You start with an application, like 
some space-constrained mobile device where even O(n) gains can 
potentially be significant. Then you derive a binary encoding that suits 
you. You don't go around saying that's the be-all and end-all of XML or 
RDF representation, because in other environments, it usually makes more 
sense to just stick with the current representations. And if you then 
find yet another environment where different constraints apply (say, 
random, OLTP-style access), you find yet another representation (say, a 
schema under an RDBMS, or perhaps an external indexing structure to a 
textual or a binary representation of the XML or RDF you're dealing 
with). In any case, there's no silver bullet, and so if you want to 
standardize something, you'll have to start with a specific, broad 
enough application area, with clear enough requirements, and go from 
there.
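
As a rough sketch of what "derive a binary encoding that suits you" 
might look like for the space-constrained case, the Python below stores 
each distinct term once in a table and packs every triple as three 
32-bit integers. The layout, the function names, and the example terms 
are all assumptions made for illustration; this is not any standardized 
binary RDF format.

```python
import struct

TRIPLE = "<III"  # three little-endian unsigned 32-bit term ids

def encode_triples(triples):
    """Return (term_table, packed_bytes) for an iterable of 3-tuples."""
    ids = {}  # term -> integer id, assigned in order of first appearance
    packed = bytearray()
    for triple in triples:
        packed += struct.pack(TRIPLE, *(ids.setdefault(t, len(ids)) for t in triple))
    return list(ids), bytes(packed)

def decode_triples(table, packed):
    """Rebuild the list of triples from the term table and packed ids."""
    return [tuple(table[i] for i in row) for row in struct.iter_unpack(TRIPLE, packed)]

triples = [
    ("ex:sampo", "ex:wrote", "ex:msg1"),
    ("ex:stephen", "ex:wrote", "ex:msg0"),
    ("ex:msg1", "ex:repliesTo", "ex:msg0"),
]
table, packed = encode_triples(triples)
assert decode_triples(table, packed) == triples
```

Each triple costs 12 bytes plus its share of the term table, which helps 
when terms repeat a lot. It does nothing for random access, so a 
different environment would still want a different representation.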

> I have often in the past explained XML as being more about a great set 
> of idioms that had come of age, compared to older methods, than about 
> the encoding.

Quite. At the bottom, XML is a textually encoded, ordered tree with a 
little bit of namespace management sprinkled on top, derived from a 
single level of annotation on top of linear text. That's a proven model, 
especially for structured text, but we shouldn't make more of it than it 
really is.
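
The "ordered tree over linear text" view can be checked with a few lines 
of Python and the standard library's ElementTree. The sample document is 
made up for illustration. Children come back in document order, and 
dropping the markup leaves the underlying linear text.

```python
import xml.etree.ElementTree as ET

# A made-up fragment: inline annotation on top of a run of text.
doc = "<p>RDF is <em>not</em> a <code>tree</code>.</p>"
root = ET.fromstring(doc)

# The children of an element form an ordered sequence.
child_tags = [child.tag for child in root]

# Concatenating all text nodes in document order recovers the text
# that the markup annotates.
linear_text = "".join(root.itertext())

assert child_tags == ["em", "code"]
assert linear_text == "RDF is not a tree."
```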

> Many of the innovations of XML could have been done with older 
> technologies, just as many of the innovations of Java could have been 
> implemented in prior technologies.  Similarly, but more so, RDF has a 
> much different set of idioms than XML for representing data and 
> solving problems, even if the result is expressed in XML. The new set 
> of idioms are better than the old, still very good, set.

And at least from my point of view, the new set answers a different 
question, posed at a different level of abstraction. Because of that, 
comparing the two would be a case of apples and oranges.
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Friday, 25 July 2008 19:41:47 UTC