Quite un-apropos of an earlier post on Library 2.0, I’ve just read Steven Johnson’s Everything Bad Is Good for You. The basic premise of the book is that - contrary to generally held assumptions - today’s popular culture in the form of TV, video games and the Internet is actually more intellectually stimulating and socially expansive than the popular culture of days past. TV shows, for example, have become more complex, with multiple plot-lines, layers of subtlety, and hidden references to movies or contemporary events. Social networks have built up around specific shows, where plots and subplots and relationships are dissected in infinite and minuscule detail. The outcome of all this is that humans have become more intelligent over the years - which Johnson relates to the Flynn effect - as our brains are exercised and stimulated in increasingly complex ways.

Johnson acknowledges that not all popular culture necessarily has redeeming qualities, just as not all popular culture of the past did. I’d have liked to see more discussion of the seamy trend of some TV shows and video games - Johnson just pays lip service to that. But overall, a great read.

So, Johnson’s book and Library 2.0? Johnson talks about how social networks, a fundamental precept of Library 2.0, operate, although the book was written in 2005 and so doesn’t really do justice to the massive growth in social networking that has occurred over the past couple of years.

More particularly, Johnson’s book and video gaming in libraries? Has Johnson convinced me to change my view that gaming nights (as opposed to having video games for loan) have no place in libraries? Not at all. But I might be a little more flexible on the concept of libraries installing video games on library computers and allowing patrons to actually play, rather than just try them out before borrowing.

Having spent a fair bit of time the past few weeks checking out other institutions’ IRs, one thing is clearly evident - very few are using controlled subject vocabularies, except in the most rudimentary way. Most of the Australian IRs are using the Australian Research Council’s Research Fields, Courses and Disciplines (RFCD) codes. Logical, since research activity must be reported by these codes under the Higher Education Research Data Collection (HERDC) scheme; and laudable since they provide a common search point across Australian IRs.

Most IRs are also using user-suggested keywords, sometimes (but not often) supplemented by metadata specialists. Very few are using formal controlled vocabularies apart from RFCD. This is understandable, as implementing controlled vocabularies in IRs can be quite a complex undertaking.

To enumerate just a few of the problems -

  • Which controlled vocabulary to use? Different disciplines may have different preferences.
  • Not all repository software supports the building in of controlled vocabularies; so how to ensure users use recommended vocabularies in this situation?
  • If multiple vocabularies are supported by the IR (Fez comes to mind here), how to manage them and their different user groups without overly complicating repository administration?
  • How to ensure that users select appropriate terms? One user’s “car” is another user’s “1925 Ford Model T Tudor sedan”.
  • If considering using metadata specialists to vet user-submitted terms, how to resource such a labour-intensive task, especially for potentially high volume submissions?
  • How to balance the need for making controlled vocabularies compulsory with user frustration when encountering required fields? (link to follow)

So the challenge for repositories is to determine not just whether fully-fledged controlled subject vocabularies are worth using in (and building into) their IRs, but if so, which ones, and the best way to implement them with a limited amount of resources and without alienating users and compromising usability.

Just how important are controlled subject vocabularies in the context of an IR?

Researchers submitting their work to IRs are presumably experts in their field, and so should be fully conversant with the preferred terminology in their subject area. They can therefore be relied on to choose appropriate subject keywords to improve the findability of their work. Ditto, researchers looking for works by subject - presumably, in their own specialisation - should be fully conversant with the terminology; if they don’t find what they’re looking for under brown coal, for example, they should have a good enough sense of the literature to think of also looking under lignite. So controlled subject vocabularies are simply not relevant in IRs.

At least, so goes one argument. I’m not convinced. Brown coal might be the preferred term in the Victorian coal industry, but lignite is the term most commonly used elsewhere. Victorian coal researchers, well aware of this fact, are likely to try both terms; but coal researchers from other parts of the world may not necessarily be aware that one small part of the coal industry prefers the term brown coal. At least, not unless they’re using a controlled vocabulary of some kind.

To maximise findability of individual works, IRs must use controlled subject vocabularies; ideally these vocabularies must be authoritative in their field - and either built into the IRs, or directly accessible from them. To assist with harvesting and sharing of metadata, the vocabulary needs to be explicitly specified in the record’s metadata.
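As a rough illustration of the brown coal / lignite case, here is a minimal Python sketch of a vocabulary lookup that maps a variant term to its preferred term and records the scheme alongside it. Everything in it is an assumption made up for the example: the scheme name `example-coal-thesaurus`, the single vocabulary entry, and the function names are hypothetical and do not come from any real thesaurus or repository software.

```python
# Hypothetical mini-vocabulary: each preferred term lists its known variants.
# The scheme name and the terms are illustrative only, not a real thesaurus.
SCHEME = "example-coal-thesaurus"
VOCABULARY = {
    "lignite": ["brown coal"],
}


def build_index(vocabulary):
    """Map every preferred term and variant (lower-cased) to its preferred term."""
    index = {}
    for preferred, variants in vocabulary.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def subject_entry(keyword, index, scheme=SCHEME):
    """Return a subject entry that names its scheme, or a free-text entry if unmatched."""
    cleaned = keyword.strip()
    preferred = index.get(cleaned.lower())
    if preferred is None:
        # No match: keep the user's keyword, but mark it as uncontrolled.
        return {"term": cleaned, "scheme": None}
    return {"term": preferred, "scheme": scheme}


INDEX = build_index(VOCABULARY)
print(subject_entry("Brown coal", INDEX))
print(subject_entry("peat", INDEX))
```

With an index like this, a search or a submission under either term ends up at the same preferred heading, and a harvester reading the record can tell which vocabulary the term came from.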

… starting out a repository, and hoping to encourage users to self-submit their work.

Is it better to have a minimal number of input screens, and make things as simple as possible for the user?

Or is it better to offer as many fields as may possibly be needed, hence most likely overwhelming the user and maybe even scaring them off?

I’ve already determined which option I prefer … but it may not be compatible with possible future FRBRing of repositories, as per Andy Powell’s excellent presentation at VALA 2008.

… to customise the code to better meet your library’s needs?

… or to put up with bugs and awkward features until they’re officially fixed?

The former endangers a smooth upgrade path, and the latter fosters frustration for users.

Even if your library is lucky enough to have someone with the skills to customise the code, chances are that that person won’t always be around. And even the smallest of code fixes can seriously endanger the ease of future upgrades.

Thinking of self-submission of one’s own work into a repository and the quality of the resulting metadata, particularly subject metadata …

Any cataloguer who has ever tried to catalogue a technical report which helpfully includes user-submitted keywords will be conversant with the potential inadequacies of such keywords. It is not unusual, in my experience, for reports on highly complex subjects to include only vague keywords, such as “aspects”, “experiments” and “research”.

It is possible to build highly complex controlled vocabulary infrastructures into repository software, so different user groups can access their own specialised vocabularies; which is great as far as that goes. But sometimes freeform keywords are the best option in a particular case. So how best to encourage users to choose high-quality, relevant keywords, without having to provide them with pages-long advice?
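One possible approach, sketched below in Python, is to have the submission form give a short prompt at the point of entry instead of pointing users to a page of guidance. This is only a sketch under stated assumptions: the stop-list contains just the three vague keywords mentioned above, and the function name, the minimum-keyword threshold and the message wording are all hypothetical.

```python
# Hypothetical stop-list of keywords too generic to help retrieval.
VAGUE_KEYWORDS = {"aspects", "experiments", "research"}


def review_keywords(keywords, minimum=3):
    """Split submitted keywords into accepted and vague ones, and build short prompts."""
    accepted, vague = [], []
    for raw in keywords:
        keyword = raw.strip()
        if not keyword:
            continue  # ignore empty entries
        if keyword.lower() in VAGUE_KEYWORDS:
            vague.append(keyword)
        else:
            accepted.append(keyword)

    # One short, specific prompt per vague keyword.
    messages = [
        f'"{keyword}" is very general; could you name the specific topic?'
        for keyword in vague
    ]
    if len(accepted) < minimum:
        missing = minimum - len(accepted)
        messages.append(f"Please add at least {missing} more specific keyword(s).")
    return accepted, messages


accepted, messages = review_keywords(["lignite", "Research", "combustion"])
print(accepted)
for message in messages:
    print(message)
```

The prompts stay to a sentence each, so the user only sees advice for the keywords that actually need it.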
