Re: As an aside, a possibly interesting read....

On Sep 25, 2014, at 3:58 PM, Todd Carpenter (Gmail) <tcarpenter@niso.org> wrote:

> There is a tremendous problem with distributed systems when it comes to canonical information and standard identifiers.  That being the metadata that is associated with that identifier.  An identifier is (or better put should be) just a dumb (i.e., without embedded meaning), unique set of string of characters. The structure of that string, while systematically important is beside the point. Whether an identifier is expressed as a 16-digit string, or as an URI or anything else is not finally the point.
> 
> The real power is in the associated metadata related to that identifier. While there is tremendous overhead in a centralized system, they are critically important in a well-functioning ID system. Without a controlling system, then there will be no standard set of associated metadata.  Now, how well that metadata is created, managed, curated and controlled are open questions (as Laura certainly knows), but without some authority driving compliance than inevitably there will be an increasing divergence of metadata quality, practice and interoperability.  

Thank you for clarifying the problem you’re trying to solve. That gives me two new thoughts.

First, I think it’s better to split this into two different issues: one for IDs, and the other for meta-data. Solving ID alone is still quite useful and covers a lot of use cases, such as links between publications or identifying publications in Open Annotations. Also, once a unique ID system is established, anyone can build a system that associates the necessary meta-data with the ID. That meta-data system can be centralized; in my previous post I was primarily talking about the ID system.

The other thought is that, reading your comments, I sensed a gap between what I meant by “distributed” and what you understood. DNS is a distributed system, but it has hierarchy and identity, because there’s an authority for each level; what makes DNS distributed is actually delegation. One authority defines the top-level domains and delegates the rest to other authorities. Git is a distributed system too, but in practice there’s usually a central repository where people merge changes and resolve conflicts. When I said “distributed”, that is the kind of thing I had in mind. Does this explain what I meant by “distributed” more accurately?

I’m just not in favor of one authority giving a unique number or string to every single publication on the planet. It’s just not realistic.

Take the example I gave: <isbn-international.org/123456789>. As long as we agree that an ID has to start with a domain name the organization owns, DNS guarantees that the prefix is unique. The Whois database will tell you whom to contact. The organization should do whatever it needs to do to make sure the rest is unique. I’m not insisting on this specific example, but it is one example of a "distributed but still guaranteed unique" ID system.
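
To make the idea concrete, here is a minimal sketch in Python. It is only an illustration, not a proposal for an actual format: the Authority class, the split_id helper, and the example.org domain are all hypothetical names, and I assume an ID is simply "domain/local part" split at the first slash. Checking that an organization really owns its domain is left to DNS and Whois and is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Authority:
    """Hypothetical organization that mints IDs under a domain it owns."""
    domain: str
    issued: set = field(default_factory=set)

    def mint(self, local_part: str) -> str:
        # Uniqueness below the domain is the organization's own job.
        if local_part in self.issued:
            raise ValueError(f"{local_part!r} already issued by {self.domain}")
        self.issued.add(local_part)
        return f"{self.domain.lower()}/{local_part}"


def split_id(identifier: str) -> tuple[str, str]:
    """Split an ID into (domain, local part) at the first slash."""
    domain, sep, local_part = identifier.partition("/")
    if not sep or not domain or not local_part:
        raise ValueError(f"not a domain-prefixed ID: {identifier!r}")
    return domain.lower(), local_part


# Two independent authorities can reuse the same local part
# without ever coordinating with each other.
isbn = Authority("isbn-international.org")
other = Authority("example.org")  # hypothetical second organization
id_a = isbn.mint("123456789")
id_b = other.mint("123456789")
```

The two IDs differ even though both organizations chose the same local part, because the domain prefix already separates them. Each organization only has to avoid duplicates within its own set.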

I agree with you that meta-data quality is a big issue, and we’ll need a good system and/or guidelines to improve it. But as I said above, I think that’s a separate issue from building a good ID system, and I’d like to solve each one separately.

/koji

Received on Thursday, 25 September 2014 11:32:24 UTC