3/8/00 discussion
Wellisch, H.H. (1995). Index: the word, its history and meanings (p. 199-210). Indexing languages: natural and controlled (p. 214-217). Indexing from A to Z, 2nd ed. New York: H.W. Wilson. [On reserve in the SLIS Library: Z695.9 .W45 1991]
The word ``index'' is used in many different ways by different fields. The basic meaning of a reference point to information is retained through most uses of the word. The aspects of the definition that change most are the scope of the word and the type of information referenced. For librarians, an index is a comprehensive list of topics and pointers to their locations (either within a single document, or within a collection of documents). For mathematicians, an index is usually a single number that refers to an abstract location, where the location usually holds some numerical value of interest.
The number of different meanings is very large, but not surprising. Many commonly used words have been appropriated for specialized uses, simply because the common meaning of the word serves as a convenient starting point for the new definition. For example, a person familiar with the common (librarian) definition of index can quickly understand the mathematical definition by recognizing the change in scope.
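To make the change in scope concrete for myself, here is a minimal sketch of the two senses side by side. The page texts, the `build_index` helper, and the sample values are all made up for illustration; none of this comes from Wellisch.

```python
# Librarian sense: a list of topics, each pointing to its locations in a document.
# The page contents are hypothetical sample data.
pages = {
    1: "classification and indexing",
    2: "indexing languages",
    3: "classification schemes",
}

def build_index(pages):
    """Map each word to the sorted list of page numbers where it occurs."""
    index = {}
    for page_number, text in pages.items():
        for word in set(text.split()):
            index.setdefault(word, []).append(page_number)
    return {word: sorted(locations) for word, locations in sorted(index.items())}

book_index = build_index(pages)
print(book_index["indexing"])  # pages on which the topic appears

# Mathematical sense: a single number that refers to one abstract location,
# and that location holds a numerical value of interest.
values = [2.5, 7.0, 1.5]
i = 1
print(values[i])
```

In both cases the index is a pointer to information; only the scope (a whole list of topics versus one number) and the thing pointed to (pages versus a value) change.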
3/8/00 discussion
Rayward, W.B. (1997). The origins of information science and the International Institute of Bibliography/International Federation for Information and Documentation (FID). JASIS, 48(4), 289-300.
I'm not quite sure what to make of this. I was very surprised to find out that UDC is ``one of the earliest and perhaps grandest of modern faceted classification systems.'' Now I'll have to make an effort to learn something about UDC. The dates are confusing. This article indicates that UDC and the Dewey system were being created at roughly the same time. Otlet was influenced by Dewey in 1895 (p. 291), and the first edition of UDC was published in 1904-1907 (p. 293), only about ten years later. Given this, it's very surprising that Dewey's system caught on while faceted systems are still considered ``up and coming'' ideas. Perhaps this is because we need computer technology to implement a true faceted system?
3/8/00 discussion
Vickery, B.C. (1966). Aspects of information retrieval (p. 23-39). Faceted classification schemes. New Brunswick: Rutgers Graduate School of Library Service. [On reserve in the SLIS Library: Z696 .A1 R97 v.5]
After reading this article and looking through some of last week's recommended readings, I have to take back my previous statement about faceted systems being ``up and coming''. It seems that some types of faceted systems have been in use for quite a while. This also revises my opinion of the definition of a faceted system. Faceted systems can be strict, as in the Lego domain, or looser, as described here (and in the Priss & Jacob article). These looser systems don't seem to have true slots and fillers, but rather a number of optional facets, each of which may contain zero, one, or many fillers. This is just slightly more restrictive than a simple controlled vocabulary. The terms in each facet may have a hierarchical structure, and terms used in one facet are generally excluded from other facets.
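Here is a rough sketch of how I picture one of these looser faceted systems as a data structure. The facet names and terms are hypothetical (they are not from Vickery), and `validate` and `facets_are_disjoint` are names I made up; the hierarchy within a facet is left out to keep it short.

```python
from itertools import combinations

# Hypothetical facets and terms, chosen only for illustration.
FACETS = {
    "material": {"wood", "steel", "plastic"},
    "process": {"casting", "welding", "painting"},
    "product": {"chair", "bridge"},
}

def facets_are_disjoint(facets):
    """True if no term is shared between any two facets."""
    return all(a.isdisjoint(b) for a, b in combinations(facets.values(), 2))

def validate(record):
    """Check a loosely faceted record.

    Every facet is optional and may hold zero, one, or many fillers,
    but each filler must be a term of the facet it is filed under.
    """
    errors = []
    for facet, fillers in record.items():
        if facet not in FACETS:
            errors.append(f"unknown facet: {facet}")
            continue
        for term in fillers:
            if term not in FACETS[facet]:
                errors.append(f"{term!r} is not a term of facet {facet!r}")
    return errors

# One filler, many fillers, and zero fillers are all acceptable.
record = {"material": ["steel"], "process": ["welding", "painting"], "product": []}
# A term filed under the wrong facet is rejected.
bad = {"material": ["welding"]}

print(validate(record))
print(validate(bad))
```

The only constraint beyond a flat controlled vocabulary is that each term lives in exactly one facet, which is what makes this ``just slightly more restrictive''.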
3/8/00 abstract
Soergel, D. (1985). Chapter 12: Terminological control (p. 213-222). Chapter 13: Index language functions (p. 225-249). Organizing information. San Diego, CA: Academic Press. [On reserve in the SLIS Library: Z699 .S539 1985]
Discusses challenges to controlling index vocabularies. Vocabularies should be controlled because indices using uncontrolled vocabularies are difficult to search. A word may have several morphological forms, spelling variants, and synonyms that should be collapsed into one index term. A word may also have several senses, which should be expanded into several index terms.
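The two operations (collapsing variants, expanding senses) can be sketched as two lookup tables. This is only a toy under my own assumptions: the vocabulary entries, the `PREFERRED` and `SENSES` tables, and the `index_terms` function are hypothetical, and real terminological control would need far more than exact-match lookup.

```python
# Hypothetical lead-in vocabulary: morphological forms, spelling variants,
# and synonyms all collapse into one preferred index term.
PREFERRED = {
    "colors": "color",      # morphological form
    "colour": "color",      # spelling variant
    "colours": "color",
    "hue": "color",         # synonym
    "cars": "car",
    "automobile": "car",
}

# Hypothetical homograph table: one word expands into several qualified terms.
SENSES = {
    "mercury": ["mercury (element)", "mercury (planet)", "mercury (mythology)"],
}

def index_terms(word):
    """Return the controlled index term(s) for a word as found in a text."""
    word = word.lower()
    if word in SENSES:
        # Several senses: expand into several index terms.
        return SENSES[word]
    # Variants collapse to the preferred term; unknown words pass through.
    return [PREFERRED.get(word, word)]

print(index_terms("Colours"))
print(index_terms("Mercury"))
```

In practice an indexer (or some disambiguation step) would pick one of the expanded senses for a given document; the sketch only shows that the vocabulary has to offer them.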
3/8/00 abstract
Ambroziak, J., and Woods, W.A. (1998). Natural language technology in precision content retrieval. Palo Alto, CA: Sun Microsystems Laboratories. Available on the web.
Describes ConceptStore, an information retrieval system that uses phrase searching and linguistic rules to find documents. ConceptStore creates a concept hierarchy based on phrases and collocations that actually occur in the documents being indexed. 90 queries on UNIX topics produced 61% correct results, as opposed to 48% for probabilistic retrieval, 44% for a commercial search engine, and 29% for standard tf-idf searching. Even without morphological rules and linguistic knowledge, the ``relaxation ranking'' metric of ConceptStore produced results comparable to the commercial search engine.
3/8/00 assignment
After reading the following selection from Lancaster, evaluate the effectiveness of indexing by extraction and indexing by assignment through a comparison of natural language and controlled vocabulary.
Lancaster, F. W. (1998). Natural language versus controlled vocabulary: some general considerations. In Indexing and abstracting in theory and practice, 2nd ed. (p. 227-232). Champaign, IL: Graduate School of Library and Information Science, University of Illinois.
For indexing documents ``correctly''
Indexing by extraction uses the author's terms, which are useful for representing the true nature of the document, especially if the terms are specific to the author or have recently entered use. It also has the advantage of being easily automated. Indexing by assignment uses the terms of the classification scheme, which are helpful in placing similar documents in the same class.
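A minimal sketch of the difference, assuming a tiny stopword list and a made-up classification scheme with cue words (`STOPWORDS`, `CLASSES`, and both function names are my own inventions, not Lancaster's). Extraction keeps the author's words; assignment maps the document onto the scheme's labels.

```python
# Hypothetical stopword list for the sketch.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Hypothetical classification scheme: class label -> cue words that trigger it.
CLASSES = {
    "Information retrieval": {"retrieval", "search", "query"},
    "Classification": {"facet", "faceted", "classification"},
}

def index_by_extraction(text):
    """Use the author's own words (minus stopwords) as index terms."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    return sorted({w for w in words if w and w not in STOPWORDS})

def index_by_assignment(text):
    """Assign class labels from the scheme whenever a cue word appears."""
    words = set(index_by_extraction(text))
    return sorted(label for label, cues in CLASSES.items() if words & cues)

text = "Faceted classification for the retrieval of documents."
print(index_by_extraction(text))
print(index_by_assignment(text))
```

The extraction half is trivially automated, which matches the point above. The assignment half only works as well as the cue-word table, which stands in here for the judgment of a human indexer.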
For locating documents
Indexing by assignment is likely to produce better precision, since items placed in a class should all be related. This assumes that the classification structure contains a class (or multiple classes) that matches the user's query, and that the user is able to locate this class. Indexing by extraction may produce lower precision, but recall will be much higher because there are more access points to a given document.
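For reference, precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The sketch below computes both for two hypothetical result sets. The document IDs are invented purely to illustrate the trade-off described above; they are not data from Lancaster or from any experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: four documents are actually relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Assignment: the user finds one small, tightly related class.
by_assignment = {"d1", "d2"}

# Extraction: many access points pull in more documents, some off-topic.
by_extraction = {"d1", "d2", "d3", "d5", "d6", "d7"}

print(precision_recall(by_assignment, relevant))  # (1.0, 0.5)
print(precision_recall(by_extraction, relevant))  # (0.5, 0.75)
```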
The general population doesn't want to put in the effort to learn a controlled vocabulary. Therefore, indexing by extraction is more useful for general searching. Indexing by assignment can accommodate natural language searching if appropriate linguistic analyses and thesaurus structures are put in place.
5/1/00 class notes
For some reason, I don't have notes from this class. Either I lost the notes I took this week, or there wasn't anything interesting outside the Official Lecture Notes.