Identifiers for learning objects – a discussion paper, part 2

In my response to the discussion I address a number of the issues raised earlier, such as whether or not to allow multiple IDs per object, URN vs URL, and whether or not to use GUIDs, and then look at the ten points as drawn up by Andy.
The full response:

First, I would like to respond to some of the issues raised in the discussion so far:

* Only one or multiple identifiers per object

I feel less strongly about the need for a single unique identifier for an object than Phil Barker says he does in his reply to the mail by Lorna Campbell, at least as long as we don’t have a system to verify that an object has not been modified without its identifier being changed to reflect the fact that it is no longer the same object (or rather that it is a new version of the old object, in which case you might store a link back to the old ID in the RELATION element as an “isversionof”). But then again, when is it a different object? Does it need a new ID if I add an annotation, or just a new version number? And am I even allowed to change the version number, given that there is no way for me to know whether the other system already uses that version number? So, even if that third repository retrieves two metadata records for one resource that have identical identifiers, it can’t be sure that they are duplicate metadata records. Both records live and are valid in their own universe (yes Lorna, I’m one of those who think so). Because of this I also don’t mind if repositories assign their own IDs to objects, as long as they store the original ID so I can (also) use that to find the object. After all, the original ID is also valuable metadata.
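To make the idea of keeping the original ID next to a repository-assigned one concrete, here is a minimal Python sketch. The names (`Record`, `Repository`, `local_id`, `original_id`, `is_version_of`) and the identifier strings are hypothetical and chosen for illustration only; they are not taken from any specification mentioned in this discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    local_id: str                        # ID assigned by this repository
    original_id: str                     # ID the object carried when it arrived
    title: str
    is_version_of: Optional[str] = None  # link back to an older ID, if any


class Repository:
    """Toy in-memory store that can be searched by either identifier."""

    def __init__(self) -> None:
        self._by_local: Dict[str, Record] = {}
        self._by_original: Dict[str, List[Record]] = {}

    def add(self, record: Record) -> None:
        self._by_local[record.local_id] = record
        # Several local records may share one original ID.
        self._by_original.setdefault(record.original_id, []).append(record)

    def find(self, identifier: str) -> List[Record]:
        """Look up by local ID first, then fall back to the original ID."""
        if identifier in self._by_local:
            return [self._by_local[identifier]]
        return list(self._by_original.get(identifier, []))


repo = Repository()
repo.add(Record("repB:42", "repA:id0001", "Intro to metadata"))
repo.add(Record("repB:43", "repA:id0001", "Intro to metadata (annotated)",
                is_version_of="repB:42"))
```

Looking up `repA:id0001` returns both records, which illustrates the point above: sharing an original identifier does not by itself tell you that the two records are duplicates.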

While thinking about this I compared it with the way the DNS and the WWW work. There I have a URL that points to a specific resource on the web. More than one URL may point to the same resource, but often the same URL is also redirected to a different copy of the resource, which might be personalised for my region (language or even contents). High-volume websites redirect based on my location (determined by IP address). Just by looking at the identifier for the resource (the URL) I’m not able to determine whether different identifiers mean that it actually is a different resource, or whether someone I share that identifier with would get the exact same resource. That last situation would bother me if it happened to a learning object, since in most cases I would want to know what a learner retrieves.

* URL vs URN

A URL such as ‘http://www.mydomain.com/rep/id0001’ implies that you use the DNS to resolve the IP address of the server, use the HTTP protocol to connect to port 80 on that server and request ‘/rep/id0001’. An identifier that is a URL implies that typing the identifier as an address in your browser returns the object. If that is not the case, I think you shouldn’t use URLs.
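What a URL implies can be read straight off its parts. The sketch below uses Python’s standard `urllib.parse.urlsplit`; the helper name `describe_url` and the `DEFAULT_PORTS` table are my own, for illustration only.

```python
from urllib.parse import urlsplit

# Default ports for the schemes used here; a small table assumed for illustration.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Split a URL into the pieces a browser needs in order to fetch it."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # how to talk to the server
        "host": parts.hostname,     # name that DNS resolves to an IP address
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,         # what to request from the server
    }


info = describe_url("http://www.mydomain.com/rep/id0001")
print(info)
```

For the example URL this gives protocol `http`, host `www.mydomain.com`, port 80 and path `/rep/id0001`. An identifier that merely looks like a URL carries all of these promises, whether or not anything answers at that address.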

When going through a resolver, URNs get transformed into URLs that point to the object. The PURL-based Object Identifier uses a structure based on a (fixed?) resolver URL + a namespace identifier + a local identifier, which is basically the same as what Ben Ryan uses (URNs that can be transformed into URLs by feeding them to a resolver), except that the resolver URL here is part of the identifier. I think keeping them separate makes sense though, because it enables me to dynamically assign a different resolver URL, for example as part of the base href of the page in case I’m sending the information to a browser. It would also enable me to harvest the information, store it locally and just recheck from time to time that it is still correct (again, like DNS does). Then again, if I harvested an identifier from the UKOLN system, it would also provide me with the information I need to resolve it to a URL (because that information is part of the identifier), where otherwise I would first have to find out what the URL for the resolver is.
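A small Python sketch of keeping the resolver URL separate from the identifier. Everything here is an assumption for illustration: the `urn:<namespace>:<local-id>` layout, the resolver addresses under `example.org` and `example.net`, and the rule of appending namespace and local ID to the resolver URL. It is not the actual PURL-based Object Identifier scheme or Ben Ryan’s.

```python
from urllib.parse import quote


def split_urn(urn: str) -> tuple:
    """Split 'urn:<namespace>:<local-id>' into (namespace, local_id)."""
    prefix, namespace, local_id = urn.split(":", 2)
    if prefix.lower() != "urn" or not namespace or not local_id:
        raise ValueError(f"not a URN: {urn!r}")
    return namespace, local_id


def resolve(urn: str, resolver_url: str) -> str:
    """Build a URL for the URN using whichever resolver the caller chooses."""
    namespace, local_id = split_urn(urn)
    base = resolver_url.rstrip("/")
    return f"{base}/{quote(namespace)}/{quote(local_id)}"


urn = "urn:myrep:id0001"  # hypothetical identifier
print(resolve(urn, "http://resolver.example.org/"))
print(resolve(urn, "http://mirror.example.net/resolve"))
```

The identifier stays the same in both calls; only the resolver changes. When the resolver URL is part of the identifier, that second call is not possible without rewriting the identifier itself.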

* GUIDs or not

The advantage of using GUIDs is that they are unique; the disadvantage is of course that they need to be generated by the system and aren’t easy to remember. If I have two objects, one with GUID “8AC504BC-654B-40A3-B392-DE798CE49892” and the other with GUID “58162400-AF40-4C47-9599-3129E554FC2C”, it would be difficult for me to tell which one points to the draft mail I’m writing and which one to a project proposal I’m working on. For that I would want to see the title, not the identifier. You could compare it with the favourites in my browser. Remembering the complete URL of some pages on some sites (like http://www.cetis.ac.uk/groups/20010801162745/viewGroup) can be somewhat difficult. The title “CETIS Metadata SIG”, which I assigned to it when I stored it as a favourite, is much easier. As long as I don’t have to share it, the title is enough; if I want to share it, I need both (but I don’t want to see the GUID). The problem with this discussion is that whether or not I as a user should see the GUID in the editor is implementation related, so at best it is something you would find in the non-normative Best Practice Guides of a specification.
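As a rough illustration of letting the tool handle GUIDs while the user only sees titles, here is a short Python sketch using the standard `uuid` module. The titles and the helper names `label` and `share` are made up for the example.

```python
import uuid

# The tool generates the GUIDs; the user never has to type or remember them.
draft_id = str(uuid.uuid4()).upper()
proposal_id = str(uuid.uuid4()).upper()

# The tool keeps a human-readable title next to each GUID.
titles = {
    draft_id: "Draft mail",
    proposal_id: "Project proposal",
}


def label(guid: str) -> str:
    """Return what the user sees: the title, not the identifier."""
    return titles.get(guid, "(untitled)")


def share(guid: str) -> str:
    """When sharing, the recipient needs both the title and the GUID."""
    return f"{label(guid)} <urn:uuid:{guid.lower()}>"


print(label(draft_id))
print(share(proposal_id))
```

Locally the title is all I need to see; the GUID only appears in the string that is handed to someone else.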

Then the list as provided by Andy:

1) Persistent:

For learning objects, 10-15 years is probably long; for assets I can understand it. Don’t try to look too far into the future though; you won’t succeed anyhow.

2) Unique:

Agree, even with the part about more than one identifier assigned to a single object as I explained above.

3) Resolvable:

Agree, though not sure that storing the resolution service ID as part of the object ID is the best idea.

4) Usable in Web browsers:

Agree, but since the ID could be parsed to something a browser can use before sending it to the browser I don’t think it should influence the ID structure. We all think XML is good for storage, but we parse it to HTML before sending it to a browser.

5) Transportable:

Agree, though I don’t fully understand why that is different from 2).

6) Simple to assign:

Agree, but that doesn’t mean the ID has to be simple to create. This should be handled by the tool and hidden from the person creating the object.

7) Assignable in devolved environments:

Agree; GUIDs are a reasonably sure way of doing that.

8) Usable in non-digital environments:

Agree and disagree. An ISBN can be printed and dictated over the phone, but it isn’t short and doesn’t make any sense to me (I know there is a certain structure in it, but I don’t know all the publisher or country codes used).

9) URI compliant:

Agree, no problem with this; it just makes life easier.

10) Free at the point of use:

Agree and disagree; it would be nice, but even when I use a URL as an identifier it isn’t completely free. I have to pay for the domain name. Though not expensive, it is not free.

So much for my 2 euro cents.

Incidentally, the response can also be read via the CETIS list.