Readers of Web4Lib will have seen a version of this post on that list. Take the rest of the day off!
The discussion on Web4Lib concerns the relative merits of metadata-based retrieval and full-text, link-enhanced retrieval. It raises interesting questions of great import to libraries in particular and information retrieval in general.
Certainly we all agree that Google-like searching is powerful and useful. Our further hope and prejudice (given that metadata puts food on many of our tables) is that augmenting it with metadata search will improve retrieval in some use-cases with some resource classes.
Testing this hypothesis has always been fraught with overwhelming experimental difficulty and a substantial component of ideological bias. Indeed, as far as I know, there has never even been a serious attempt at arriving at an estimate of the cost-effectiveness of MARC. Of course it's good! Right? RIGHT???
As Google Print and its various spawn develop, the possibility of tractable experimentation is upon us. Students of information retrieval will know of the TREC effort: information retrieval experimentation based on formal test collections. Perhaps it is time for ReMIX: Resource Metadata and IndeXing Experiments?
What are the domains of investigation? A quick list from the top of my head:
- Nature of metadata
  - Library-created versus...?
  - Richness (MARC, DC, MODS, IEEE-LOM, ONIX...)
- Nature of resources
  - Type (books, articles, web resources, collections...)
- Information use cases
  - Scholarly discovery
  - End-user medical, legal, Government information....
- A neutral home
- A standard experimental corpus, balanced (whatever that means) and freely available
- Open access indexes and linking information
- Open access metadata of various types available to all
- Open-Data repositories for the experimental results
In other words, an open-access, community-based project in which the gradual accretion of knowledge on the subject would help all players understand the benefits of each mode, and of combined modes as well, so as to improve retrieval performance and promote the development of more powerful systems over time.
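For the code-minded, here is a minimal sketch of what one such experiment might look like in the TREC style: the same queries are run against several retrieval modes, and each ranked list is scored against a shared set of relevance judgments. Everything in it is an assumption for illustration only. The document IDs, the judgments (`qrels`), and the three runs are invented toy data, not results, and mean average precision is just one common measure among many that a real project might choose.

```python
# Toy TREC-style comparison of retrieval modes.
# All IDs, judgments, and rankings below are hypothetical, made up for illustration.

def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """Mean of average precision over every judged query."""
    scores = [average_precision(run.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Relevance judgments: query ID -> set of documents judged relevant (hypothetical).
qrels = {"q1": {"d1", "d4", "d7"}}

# One ranked result list per query for each retrieval mode (hypothetical).
runs = {
    "fulltext": {"q1": ["d1", "d2", "d4", "d9", "d3"]},
    "metadata": {"q1": ["d7", "d1", "d5", "d6", "d8"]},
    "combined": {"q1": ["d1", "d7", "d4", "d2", "d5"]},
}

results = {name: mean_average_precision(run, qrels) for name, run in runs.items()}

for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:10s} MAP = {score:.3f}")
```

The point of the sketch is the shape of the experiment rather than the numbers: with a shared corpus, shared judgments, and openly published runs, anyone could swap in a new metadata scheme or resource type and compare like with like.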