February 2008

Just how important are controlled subject vocabularies in the context of an IR?

Researchers submitting their work to IRs are presumably experts in their field, and so should be fully conversant with the preferred terminology in their subject area. They can therefore be relied on to choose appropriate subject keywords to improve the findability of their work. Ditto, researchers looking for works by subject – presumably, in their own specialisation – should be fully conversant with the terminology; if they don’t find what they’re looking for under brown coal, for example, they should have a good enough sense of the literature to think of also looking under lignite. So controlled subject vocabularies are simply not relevant in IRs.

At least, so goes one argument. I’m not convinced. Brown coal might be the preferred term in the Victorian coal industry, but lignite is the term most commonly used elsewhere. Victorian coal researchers, well aware of this fact, are likely to try both terms; but coal researchers from other parts of the world may not be aware that one small part of the coal industry prefers the term brown coal. At least, not unless they’re using a controlled vocabulary of some kind.

To maximise findability of individual works, IRs must use controlled subject vocabularies. Ideally these vocabularies should be authoritative in their field, and either built into the IRs or directly accessible from them. To assist with harvesting and sharing of metadata, the vocabulary used needs to be explicitly named in each record’s metadata.
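For the more code-minded, here is a minimal sketch of the idea in Python. Everything in it is an assumption made up for illustration: the scheme name `example-coal-thesaurus`, the two-entry vocabulary, and the `Record` and `SubjectEntry` classes do not come from any real repository software. It simply shows variant terms being mapped to one preferred term, and the vocabulary being named alongside each subject in the record.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mini-vocabulary: every variant maps to one preferred term.
# The scheme name and its entries are invented for this example.
SCHEME = "example-coal-thesaurus"
PREFERRED = {
    "lignite": "lignite",
    "brown coal": "lignite",
}


def normalise(keyword: str) -> Optional[str]:
    """Return the preferred term for a keyword, or None if it is not in the vocabulary."""
    return PREFERRED.get(keyword.strip().lower())


@dataclass
class SubjectEntry:
    scheme: str  # which vocabulary the term comes from, stated explicitly
    term: str    # the preferred term within that vocabulary


@dataclass
class Record:
    title: str
    subjects: List[SubjectEntry] = field(default_factory=list)

    def add_subject(self, keyword: str) -> bool:
        """Store the preferred term for a keyword; return False if it is not controlled."""
        preferred = normalise(keyword)
        if preferred is None:
            return False
        entry = SubjectEntry(SCHEME, preferred)
        if entry not in self.subjects:
            self.subjects.append(entry)
        return True


def search(records: List[Record], keyword: str) -> List[Record]:
    """Find records by subject, whichever variant of the term the searcher typed."""
    preferred = normalise(keyword)
    if preferred is None:
        return []
    return [
        r for r in records
        if any(s.scheme == SCHEME and s.term == preferred for s in r.subjects)
    ]
```

With this in place, a record submitted under “brown coal” is stored under lignite and is found by a search on either term, while a harvester reading the record can see which vocabulary the term belongs to.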


… starting out a repository, and hoping to encourage users to self-submit their work.

Is it better to have a minimal number of input screens, and make things as simple as possible for the user?

Or is it better to offer as many fields as may possibly be needed, hence most likely overwhelming the user and maybe even scaring them off?

I’ve already determined which option I prefer … but it may not be compatible with possible future FRBRing of repositories, as per Andy Powell’s excellent presentation at VALA 2008.

… to customise the code to better meet your library’s needs?

… or to put up with bugs and awkward features until they’re officially fixed?

The former endangers a smooth upgrade path, and the latter fosters frustration for users.

Even if your library is lucky enough to have someone with the skills to customise the code, chances are that person won’t always be around. And even the smallest of code fixes can seriously endanger the ease of future upgrades.

Thinking of self-submission of one’s own work into a repository and the quality of the resulting metadata, particularly subject metadata …

Any cataloguer who has ever tried to catalogue a technical report which helpfully includes user-submitted keywords will be conversant with the potential inadequacies of such keywords. It is not unusual, in my experience, for reports on highly complex subjects to include only vague keywords, such as “aspects”, “experiments” and “research”.

It is possible to build highly complex controlled vocabulary infrastructures into repository software, so that different user groups can access their own specialised vocabularies, which is great as far as it goes. But sometimes freeform keywords are the best option in a particular case. So how best to encourage users to choose high-quality, relevant keywords, without having to provide them with pages-long advice?

Now that the 23 Things project is finished, the Techapillan mind has been busy pondering the direction of this blog. To keep or not to keep? If to keep, to change or not to change?

After some deep Techapillan thought, the many fans of this blog will be delighted to know that they’ll still be able to get their occasional fix of Techapillan wisdom, albeit with a rather more focused approach.

As Techapilla’s work focus currently revolves around institutional repositories, open access and electronic resources, those are the areas most likely to be the subject of Techapilla’s much sought-after wisdom.

Welcome to the journey. May it be a productive one!