Re: Fwd: Re: Document fragment vocabulary from Erik Wilde on 2011-08-30 (uri@w3.org from August 2011)

From: Erik Wilde <dret@berkeley.edu>
Date: Tue, 30 Aug 2011 09:41:27 -0700
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, uri@w3.org
CC: John Cowan <cowan@mercury.ccil.org>, Michael Hausenblas <michael.hausenblas@deri.org>
Message-ID: <4E5D12B7.5070003@berkeley.edu>

hello.

On 2011-08-29 10:08 , Sebastian Hellmann wrote:
> Maybe http://en.wikipedia.org/wiki/Text_file
> is closest to your definition of text, i.e. what can be edited in a text
> editor.

in that case XML would be plain text, which does not make a whole lot of 
sense. XML is a tree which happens to be text-encoded, but there is a 
reason why all XML technologies are based on the tree (XDM) and not on 
the text serialization. if something has a text-based serialization 
that's convenient, but if the standard application-level access to that 
data uses parsing into some form of higher-level data structure, then 
it's not plain text anymore.

> I would argue this directly.
> If e.g. file://myfile.csv or file://myfile.xml
> have a syntax error (not well-formed) then #line=10,11 or #range=88,105
> will perform much better than CSV specific things or XPath, which do not
> work any more.

that simply depends on how you define "working". ranges/lines always 
select something, but not necessarily what you wanted them to select. 
the fact that (some) fragment identifiers can break is a good thing, in 
the same way as it is good that the web has 404s. in decentralized 
systems things change and break and you have to deal with it. if an XML 
document is broken, you cannot feed it into an XML pipeline, and 
therefore it's just not suitable for processing anymore.
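
a minimal sketch of that difference, assuming a hypothetical 1-indexed, inclusive reading of "#line=" and using python's standard ElementTree path lookup as a stand-in for XPath. the helper names and sample documents are made up for illustration:

```python
import xml.etree.ElementTree as ET

def select_lines(text, first, last):
    """Return lines first..last (assumed 1-indexed, inclusive)."""
    return text.splitlines()[first - 1:last]

def select_path(text, path):
    """Parse text as XML and return the text of elements matching path."""
    root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
    return [el.text for el in root.findall(path)]

good = "<list>\n<item>a</item>\n<item>b</item>\n</list>"
# the same document after an edit added a line and broke well-formedness
broken = "<list>\n<note>\n<item>a</item>\n<item>b</item>\n</list>"

# the line selector always returns something, but after the edit
# it silently points at a different line than before
lines_good = select_lines(good, 2, 2)
lines_broken = select_lines(broken, 2, 2)

# the structural selector works on the good document and fails
# loudly on the broken one
path_good = select_path(good, "item")
try:
    path_broken = select_path(broken, "item")
except ET.ParseError:
    path_broken = None
```

here the line selector "works" on the broken document only in the sense that it returns a line; whether that is still the intended fragment is a separate question.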

> Furthermore, it will be much more interoperable as implementors could
> implement fragment identification once and it will work for many other
> formats.

how would that work? even if you had some cross-media-type fragment 
identifiers, the actual mapping of identifiers to fragments would need 
to be implemented for each individual media type.
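
a rough sketch of what i mean, assuming a shared "id" style fragment identifier. the resolver names, the registry, and the convention that the first CSV column acts as a row id are all hypothetical and not taken from any specification:

```python
import csv
import io
import xml.etree.ElementTree as ET

def resolve_id_xml(body, ident):
    """Return the text of the first element whose id attribute equals ident."""
    root = ET.fromstring(body)
    for el in root.iter():
        if el.get("id") == ident:
            return el.text
    return None

def resolve_id_csv(body, ident):
    """Return the first row whose first column equals ident (assumed convention)."""
    for row in csv.reader(io.StringIO(body)):
        if row and row[0] == ident:
            return row
    return None

# the identifier syntax is shared, but each media type still needs its own mapping
RESOLVERS = {
    "application/xml": resolve_id_xml,
    "text/csv": resolve_id_csv,
}

def resolve(media_type, body, ident):
    """Dispatch to the resolver registered for media_type."""
    if media_type not in RESOLVERS:
        raise ValueError("no fragment resolver for " + media_type)
    return RESOLVERS[media_type](body, ident)
```

the shared part is only the dispatch; every entry in the registry is media-type-specific code that somebody has to write and agree on.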

> So there is another usefulness to it. I agree that matching the semantic
> model has certain benefits, reusing general Fragment Ids , however,
> should also be considered.

it's a good idea in theory, but very hard in practice. pretty much the 
only thing you can probably do would be to have ids, and even then the 
lexical structure of these probably would start to interfere badly with 
some of the targeted media types. i think there's an important reason 
why cross-media-type fragment identifiers never got off the ground: it 
would make the decentralized nature of media type definition much harder 
(they would need to be coordinated to support fragment identifiers of a 
certain kind), and it would be impossible to enforce retroactively.

this is just my opinion, of course, and i am looking forward to seeing 
what you will end up doing. cheers,

dret.

-- 
erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-6432253 |
            | UC Berkeley  -  School of Information (ISchool) |
            | http://dret.net/netdret http://twitter.com/dret |

Received on Tuesday, 30 August 2011 16:41:09 UTC