RE: dwbp-ISSUE-46 (PIDs): How should we handle the issue of persistent URI design? [Use Cases & Requirements Document]

Phil,

> "URI persistence is a matter of policy ..."  -  http://www.w3.org/TR/webarch/#URI-persistence

>
> Having restated this, data should be identifiable *forever* - not for foreseeable future.

True, but no one can make promises forever, only for the foreseeable 
future ;-)

# Tomas
The intention must be *forever*, though it will eventually disappear: it is a matter of policy
##

  URI syntax is a different matter: one can put up with almost any 
syntax as long as it can identify the data.

And there's a can of worms. The identifier may identify the data, or it 
may identify a landing page about it or something else (and some 
communities don't understand the difference and glaze over when you try 
and say it's important).

# Tomas
Agree. This is the reason why in COMURI:
 -  "The approach is syntactic and it does not specifies the semantics of the URI ..."
 - Direct identification of variants
 - Direct identification of metadata

For example:
http://example.com/foo           # landing page
http://example.com/foo.zip     # direct identification of data
http://example.com/foo?         # metadata
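
A minimal sketch of how the three example URIs above could be told apart by syntax alone. The function name `classify` and the rules (trailing "?" means metadata, a file extension means data, anything else is a landing page) are assumptions drawn only from these three examples; they are not taken from the COMURI text itself.

```python
import posixpath
from urllib.parse import urlsplit


def classify(uri: str) -> str:
    """Guess what a URI identifies, using only its syntax (hypothetical rules)."""
    # A bare trailing "?" marks metadata in the example above.
    # urlsplit() discards an empty query, so the raw string is tested instead.
    if uri.endswith("?"):
        return "metadata"
    # A file extension in the path marks a concrete variant of the data.
    _, extension = posixpath.splitext(urlsplit(uri).path)
    if extension:
        return "data"
    # Anything else is treated as the landing page.
    return "landing page"


for uri in ("http://example.com/foo",
            "http://example.com/foo.zip",
            "http://example.com/foo?"):
    print(uri, "->", classify(uri))
```

Nothing is dereferenced here: the distinction is made without any HTTP request, which is the point of a purely syntactic approach.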
##

>
> One has to assume that "web-based" means accessed with HTTP(S), so this implies that the data is always accessible with HTTP(S)  and in the *same* environment: this is not the case. For example, data accessible with:
>
>   - HTTP(S): data can be archived without the original environment - dynamic data will not be accessible

Huh?

http://example.com?service=weather&date=today


dynamic data can certainly be returned from a URI (which takes us back 
to a discussion we had ages ago about URIs being APIs).

# Tomas
I did not express myself clearly enough. Though a URI should be forever, suppose this wonderful URI weather service disappears and some kind people archive it into:
 http://archive.org/example.com


The original data was produced dynamically by the "foo-weather" system behind the server and (for whatever reason) it is not possible to run "foo-weather" on the archival server. Hence, it would be hard to get the data.
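
A rough illustration of the problem, assuming a toy in-memory archive. The names `foo_weather`, `StaticArchive`, and the dated query strings are hypothetical stand-ins: the archive can replay only the responses that were captured, because the program that generated them is not running there.

```python
from datetime import date
from typing import Dict, Optional


def foo_weather(day: date) -> str:
    """Hypothetical stand-in for the dynamic system behind the original server."""
    return f"weather report generated for {day.isoformat()}"


class StaticArchive:
    """Stores response bodies by URI; it cannot generate new ones."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, str] = {}

    def capture(self, uri: str, body: str) -> None:
        self._snapshots[uri] = body

    def get(self, uri: str) -> Optional[str]:
        # Only previously captured URIs can be served.
        return self._snapshots.get(uri)


archive = StaticArchive()

# While the original service is alive, one response is captured.
captured_uri = "http://example.com?service=weather&date=2014-10-01"
archive.capture(captured_uri, foo_weather(date(2014, 10, 1)))

# After the service is gone, any URI that was never captured has no data.
never_captured_uri = "http://example.com?service=weather&date=2014-10-02"

print(archive.get(captured_uri))
print(archive.get(never_captured_uri))
```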

Archiving data is challenging, but it is child's play compared to keeping legacy programs running; this happens even to the experts :-)

  -  Third World Wide Web Conference  1995 - 19 years ago : where is the data?
  -  http://info.cern.ch - about 23 years ago: try to run the original server 
##

>   - FILE: no server side processing - dynamic data will not be accessible

More true.

>
> The real world is far nastier.

Very true.

>
> In a nutshell:
>
>   - Long-term.- Think in terms of at least 25 to 50 years: data must be readable, and hence also identifiable

If we can justify those figures (or any other), I'd be happy to include 
them. The UK National Archives reckons it can't promise beyond the next 
5 years although it plans for its URIs to be as persistent as the 
original Magna Carta that it houses.

# Tomas
Good example: we need "Magna Carta URIs" :-)
Can we justify not aiming for forever?
The URI is a component of long-term data preservation. It might be useful to look at
   http://www.ietf.org/rfc/rfc4810.txt

##

>   - Simple.- Keep it very simple - minimal processing (this includes URI redirections) to get the data

Ideally yes. But URIs that are not URLs will need to return something 
and that might be a 303 redirect (and PLEASE let's not open up 
HttpRange-14 today... or any other day)
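
For readers unfamiliar with the 303 pattern mentioned here, a small sketch under stated assumptions: the `/id/` and `/doc/` path prefixes, the `DOCUMENTS` table, and the `respond` function are all hypothetical. A URI that names a thing rather than a document answers with "303 See Other" pointing at a document about it, which costs the client one extra round trip.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical documents the server can return directly.
DOCUMENTS: Dict[str, str] = {"/doc/foo": "<html>About foo</html>"}


def respond(path: str) -> Tuple[HTTPStatus, Dict[str, str], str]:
    """Return (status, headers, body) for a request path."""
    if path.startswith("/id/"):
        # The identifier is not itself a document: redirect to one about it.
        target = "/doc/" + path[len("/id/"):]
        return HTTPStatus.SEE_OTHER, {"Location": target}, ""
    if path in DOCUMENTS:
        return HTTPStatus.OK, {"Content-Type": "text/html"}, DOCUMENTS[path]
    return HTTPStatus.NOT_FOUND, {}, ""


status, headers, _ = respond("/id/foo")
print(int(status), headers["Location"])
```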

# Tomas
True: most of the time we talk about URIs, but in fact we should be referring to URLs.
##

>   - Full life-cycle.- original site, archiving into archival sites, and offline data - http://dragoman.org/comuri.html#ultrapersistent-uri


Bear in mind my issue here is about phrasing the requirements that the 
WG needs to meet (whether by COMURI, the BP doc, the vocabs or anything 
else).

# Tomas
True. What is in scope?
Data preservation (online and offline archiving) was taken into account in COMURI because of the email exchange a few months ago.
COMURI, URI, URL: these are probably the smallest part.
##

Phil

Regards
Tomas


> -----Original Message-----
> From: Data on the Web Best Practices Working Group Issue Tracker [mailto:sysbot+tracker@w3.org]
> Sent: Wednesday, October 01, 2014 9:47 AM
> To: public-dwbp-wg@w3.org
> Subject: dwbp-ISSUE-46 (PIDs): How should we handle the issue of persistent URI design? [Use Cases & Requirements Document]
>
> dwbp-ISSUE-46 (PIDs): How should we handle the issue of persistent URI design? [Use Cases & Requirements Document]
>
> http://www.w3.org/2013/dwbp/track/issues/46

>
> Raised by: Phil Archer
> On product: Use Cases & Requirements Document
>
> As of 2014-10-01, the UCR does not explicitly call for advice on URI design/design for persistence. It is, however, implied in R-PersistentIdentification which says "Data should be persistently identifiable."
>
> Do we need to add any detail to this? Or an additional requirement? Or do we think we've covered it?
>
> Context is all. In W3C space, persistent identifier means persistent URI. For some communities, that doesn't match the culture (scientific publishing for example).
>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/


http://philarcher.org

+44 (0)7887 767755
@philarcher1

Received on Wednesday, 1 October 2014 15:02:25 UTC