Next: Week 12 - Hypertext Up: L505 Journal Previous: Week 10 - Indexing

Week 11 - Indexing Systems: Postcoordinate Systems

3/25/00 Personal Library

I really need to organize my personal library. At about 500 books, it's getting just large enough that I have trouble finding certain things. So I've been trying to decide what system to use.

My first thought was to come up with my own (enumerative) scheme. But I'm now fairly convinced that it would be better to use an existing scheme. This would be less work for me, and in the process, I would learn more about the existing scheme, so I could apply that knowledge elsewhere.

I've always hated the Library of Congress classification, with its ugly mixture of letters and numbers. So, the obvious choice is the Dewey Decimal Classification, which is widely used and fairly simple.

See this web page for an interesting essay on why you should use DDC to organize your computer. I haven't gotten to this point yet, but it's an interesting thought.

(At this point, I'm asking the class for any suggestions to compare my train of thought with theirs.)

Colon classification is interesting, but the codes can get just as long and ugly as in LCC. Here is an article on the connections between Colon classification and Yahoo! It somewhat misses the point, since there is a big difference between Colon classification's (systematic, orderly) faceted scheme and Yahoo's (unfocused, amorphous) polyhierarchy, but it's an interesting read.

My first response came back from the class. It was basically "come up with your own scheme". After looking over some of the more confusing items again, I still think creating my own scheme would be more trouble than it's worth. I can imagine myself looking for a particular book, and then wondering where I put it, much as I do now. I've already tried that approach for electronic documents. I have a fairly good scheme for electronic things now, but if I didn't have search facilities, I would lose a lot of things. I may end up using Dewey, and then rearranging things that I think are poorly located.

I finally read the Ranganathan article in the readings. It was very interesting (my full thoughts are in the Week 8 section), but I won't use his scheme for my personal library, since it's far too complex for my purposes.

3/25/00 thoughts

There are a lot of tradeoffs in selection of a classification scheme. One of the big problems is that eventually, there must be a shelving order for the actual documents. No matter how documents are arranged, there will always be situations in which a different arrangement would be better. The best we can do (with physical documents) is find a shelving order that is reasonably useful, and provide multiple access points through other means (catalogs, OPACs, etc.). For a small collection, like my own, it isn't necessary to create the additional access points as long as the shelving order is good enough that we can quickly search for an item in two or three areas.

3/27/00 abstract

Aitchison, J., and Gilchrist, A. (1987). Planning and design of thesauri (p. 3-10). Vocabulary control (p. 12-22). Specificity and compound terms (p. 23-33). Structure: basic relationships and classification (p. 34-60). In Thesaurus construction: a practical manual, 2nd ed. London: Aslib.

Describes issues and heuristics for creating a thesaurus. The thesaurus should be considered in relation to its associated system and users. If possible, an existing thesaurus should be used or adapted. Indexing terms should generally be factored into preferred terms. Relationships between terms should be represented. An abbreviated notation may be used to represent the terms.

3/28/00 abstract

Eddison, B., and Batty, D. (1988). Database design: words, words, words -- descriptors, subject headings, index terms. Database 11 (6), 109-113.

Presents background material about thesauri. A thesaurus is a controlled set of terms used to index information in a database. Free-text systems save on time and cost for indexing, but pass this expenditure on to the users. In the 1950s, the United States opted for simpler, computer-based indexing systems, while Europe opted for more complex, faceted indexing systems. The United States is now returning to a more structured approach.

3/28/00 abstract

Batty, D. (1989). Thesaurus construction and maintenance: a survival kit. Database 12 (1), 13-20.

Describes a process for creating a thesaurus. The range and depth of the thesaurus must be defined, based on anticipated users. Raw vocabulary must be collected from source documents. The raw vocabulary should be clustered and refined. A notation system may be used to impose an order on terms. A system for thesaurus maintenance should be developed.

3/28/00 abstract

Johnson, E. H. (1995). A hypertext interface for a searcher's thesaurus. Available on the web.

Describes a graphical interface to the INSPEC thesaurus. Users can enter keywords to search the thesaurus. The thesaurus will present a hierarchical display showing the word in relation to its broader and narrower terms, as well as a "cloud" displaying related terms. Users may click on any term shown in the display to navigate to that term. Preliminary user tests indicate that users prefer the related terms to the term hierarchy.
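To make the display described in this abstract concrete for myself, here is a minimal sketch of a thesaurus entry with broader (BT), narrower (NT), and related (RT) links, plus a "click" that navigates to another term. The `Term` class, the `display` and `follow` functions, and the toy entries are all my own assumptions for illustration; none of this comes from Johnson's interface or from INSPEC.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry with its three kinds of links."""
    name: str
    broader: list = field(default_factory=list)   # BT links
    narrower: list = field(default_factory=list)  # NT links
    related: list = field(default_factory=list)   # RT links (the "cloud")


# Hypothetical toy entries, not taken from INSPEC.
THESAURUS = {
    "indexing": Term(
        "indexing",
        broader=["information retrieval"],
        narrower=["subject indexing", "automatic indexing"],
        related=["thesauri", "classification"],
    ),
    "thesauri": Term(
        "thesauri",
        broader=["vocabulary control"],
        related=["indexing"],
    ),
}


def display(name, thesaurus=THESAURUS):
    """Return display lines for one term: hierarchy first, then related terms."""
    term = thesaurus[name]
    lines = [f"BT {t}" for t in term.broader]
    lines.append(f"== {term.name}")
    lines += [f"NT {t}" for t in term.narrower]
    lines += [f"RT {t}" for t in term.related]
    return lines


def follow(name, target, thesaurus=THESAURUS):
    """Simulate a click on `target` from the display of `name`.

    Returns the new display, or None if the target is not shown
    for `name` or has no entry of its own in this toy thesaurus.
    """
    term = thesaurus[name]
    shown = term.broader + term.narrower + term.related
    if target in shown and target in thesaurus:
        return display(target, thesaurus)
    return None
```

For example, `follow("indexing", "thesauri")` moves from the indexing display to the thesauri display through a related-term link, which is the kind of navigation the test users reportedly preferred.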

3/29/00 assignment
|                              | LCSH                                          | ERIC Thesaurus                           | Reader's Guide                        |
|------------------------------|-----------------------------------------------|------------------------------------------|---------------------------------------|
| Audience                     | Librarians                                    | Education                                | General                               |
| Specific Content             | Subject Headings                              | Indexing Terms & relationships           | terms & citations                     |
| Coverage                     | General                                       | General (Education-oriented?)            |                                       |
| Frequency of Cumulation      | Varies (yearly?)                              | Quarterly                                | Yearly/Quarterly                      |
| Distance from Citations      | 2                                             | 2                                        | 1                                     |
| Coordination of Categories   | Post-coordinate                               | Post-coordinate                          | Pre-coordinate                        |
| Type of Vocabulary           | semi-controlled                               | natural language?                        | natural language, proper names        |
| Composition of Vocab.        | 1-3 word terms, mix of -- and NT              | hyphenated phrases                       | 1-3 word terms                        |
| Currency of Vocab.           | reasonable                                    | reasonable                               | very                                  |
| Consistency of Vocab. (time) | many additions                                | ??                                       | steady additions, some focus changes  |
| Consistency of Indexing      | ??                                            | reasonable                               | difficult to tell                     |
| Specificity of Descriptors   | very                                          | newspaper/conversational                 | proper names, newspaper/magazine      |
| Structure of Organization    | polyhierarchical                              | polyhierarchical                         | terms with see also refs.             |
| Levels in Hierarchy          | 4+                                            | 4                                        | 1                                     |
| Presentation                 | alphabetical                                  | alphabetical                             | alphabetical                          |
| Lead-in Vocabulary           | yes                                           | yes                                      | yes, but not much                     |
| Syndetic Structure           | yes                                           | yes                                      | yes                                   |
| Definitions (Scope notes)    | yes                                           | yes                                      | no                                    |
| Strengths                    | Very large & comprehensive                    | Available online                         | Combined with citations               |
| Weaknesses                   | Related terms not always noted, terms get lost | many floating terms outside the hierarchy | changes based on published content    |

3/29/00 class notes

Official Lecture Notes (continued)

When indexing, make sure the document is about each of the terms chosen.

Faceted and enumerative schemes both create hierarchies. The major difference is that enumerative schemes must specify all categories, while a faceted scheme allows them to be systematically created.
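A small sketch helps me see this difference. The enumerative table below has to list every class by hand, while the faceted version lists only the values of each facet and builds class codes by rule. The facet names, subjects, places, and notation are invented for illustration and do not come from any real scheme.

```python
from itertools import product

# Enumerative: every class must be written down in advance.
ENUMERATED = {
    "history of France": "H1",
    "history of Japan": "H2",
    "geography of France": "G1",
}

# Faceted: list the values of each facet once, then combine them by rule.
# (Hypothetical facets and notation.)
FACETS = {
    "subject": {"history": "H", "geography": "G", "economics": "E"},
    "place": {"France": "1", "Japan": "2", "Brazil": "3"},
}


def faceted_code(subject, place):
    """Build a class code from one value per facet."""
    return FACETS["subject"][subject] + FACETS["place"][place]


def all_faceted_classes():
    """Every class the facets can express, generated rather than listed."""
    return {
        f"{s} of {p}": faceted_code(s, p)
        for s, p in product(FACETS["subject"], FACETS["place"])
    }
```

With three values in each facet, the faceted scheme can express nine classes from six listed values, including "economics of Brazil", which the enumerated table never anticipated.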

Using a faceted thesaurus, you can create:

1. a post-coordinate system
2. pre-coordinate subjects (possibly many subjects per document)
3. pre-coordinate classification (one class per document)
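The three options can be sketched on the same toy collection. In the post-coordinate case, documents are indexed under single terms and the searcher combines them at search time; in the two pre-coordinate cases, the indexer has already combined the terms. The document numbers, terms, and headings below are made up for illustration.

```python
# 1. Post-coordinate: single-term postings, combined by the searcher.
POSTINGS = {
    "libraries": {1, 2, 4},
    "automation": {2, 3, 4},
    "history": {1, 4},
}


def post_coordinate_search(*terms):
    """Intersect the postings of the requested terms at search time."""
    sets = [POSTINGS.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


# 2. Pre-coordinate subjects: the indexer builds compound headings,
#    and one document may appear under several of them.
SUBJECT_HEADINGS = {
    "libraries -- automation": {2, 4},
    "libraries -- history": {1, 4},
}

# 3. Pre-coordinate classification: exactly one class per document,
#    which is what a shelving order needs.
CLASS_OF = {
    1: "libraries -- history",
    2: "libraries -- automation",
    3: "automation",
    4: "libraries -- automation",
}


def shelf(class_name):
    """Documents shelved under a single class."""
    return {doc for doc, cls in CLASS_OF.items() if cls == class_name}
```

Document 4 is about libraries, automation, and history. It is found by any post-coordinate combination of those terms and under both subject headings, but it sits on only one shelf, so `shelf("libraries -- history")` misses it.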

We need to build thesauri that can standardize our representations of concepts over time. This way, the language can change, but the concepts will stay the same, so we won't lose any documents. This must be done in limited domains, since the concepts become ambiguous if we try to use too many domains at once.
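As a rough sketch of this idea, each concept can be given a stable identifier, with the changing vocabulary stored as lead-in terms that point to it. Keying the lookup by domain as well as term is one way to handle the ambiguity problem. The domains, terms, concept identifiers, and document names here are all hypothetical.

```python
# Lead-in vocabulary: (domain, term) -> stable concept identifier.
# Old and new wordings point at the same concept.
LEAD_IN = {
    ("film studies", "moving pictures"): "F1",
    ("film studies", "motion pictures"): "F1",
    ("film studies", "movies"): "F1",
    ("computing", "mouse"): "C7",
    ("zoology", "mouse"): "Z3",
}

# Documents are attached to concepts, never to particular words.
DOCUMENTS = {
    "F1": ["doc-1925", "doc-1999"],
    "C7": ["doc-1990"],
    "Z3": ["doc-1960"],
}


def find(domain, term):
    """Look up documents by whatever word the searcher happens to use."""
    concept = LEAD_IN.get((domain, term.lower()))
    if concept is None:
        return []
    return DOCUMENTS.get(concept, [])
```

A searcher who asks for "movies" still finds the document indexed when "moving pictures" was the current term, and "mouse" resolves to different concepts in computing and zoology, which is why the domain has to be limited.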

Extra note: There is an article entitled "Artificial Intelligence Meets Natural Stupidity" (can't remember the author right now) which addresses the subject of choosing natural language names for the concepts we develop and the problems this can cause.

Check out "Introduction to Metadata" (the full book is online).

After class discussions, I made some changes to the comparison table:
|                              | LCSH                                          | ERIC Thesaurus                           | Reader's Guide                        |
|------------------------------|-----------------------------------------------|------------------------------------------|---------------------------------------|
| Audience                     | Librarians                                    | Education                                | General                               |
| Specific Content             | Subject Headings                              | Indexing Terms & relationships           | terms & citations                     |
| Coverage                     | Books & Journals                              | Books & Journals                         | Periodical articles                   |
| Frequency of Cumulation      | Yearly/Quarterly/Weekly                       | Varies                                   | Yearly/Quarterly/Monthly              |
| Distance from Citations      | 2                                             | 2                                        | 1                                     |
| Coordination of Categories   | Pre-coordinate                                | Pre & Post-coordinate                    | Pre-coordinate                        |
| Type of Vocabulary           | semi-controlled                               | controlled                               | natural language, proper names        |
| Composition of Vocab.        | 1-3 word terms, mix of -- and NT              | hyphenated phrases                       | 1-3 word terms                        |
| Currency of Vocab.           | reasonable                                    | reasonable                               | very                                  |
| Consistency of Vocab. (time) | many additions                                | ??                                       | steady additions, some focus changes  |
| Consistency of Indexing      | ??                                            | reasonable                               | difficult to tell                     |
| Specificity of Descriptors   | very                                          | newspaper/conversational, specific for education | proper names, newspaper/magazine |
| Structure of Organization    | polyhierarchical                              | polyhierarchical                         | terms with see also refs.             |
| Levels in Hierarchy          | 4+                                            | 4                                        | 1                                     |
| Presentation                 | alphabetical                                  | alphabetical                             | alphabetical                          |
| Lead-in Vocabulary           | yes                                           | yes                                      | yes, but not much                     |
| Syndetic Structure           | yes                                           | yes                                      | yes                                   |
| Definitions (Scope notes)    | yes                                           | yes                                      | no                                    |
| Strengths                    | Very large, widely used & comprehensive       | Available online                         | Combined with citations, easy to use  |
| Weaknesses                   | Related terms not always noted, terms get lost, indirect access to citations | many floating terms outside the hierarchy, indirect access to citations | changes based on published content, limited coverage |


Ryan Scherle
2000-06-15