Competency G

Introduction

The seventh competency in the SJSU iSchool MLIS program is to "demonstrate understanding of basic principles and standards involved in organizing information such as classification and controlled vocabulary systems, cataloging systems, metadata schemas or other systems for making information accessible to a particular clientele."

The organization and analysis of information is of great significance to society in today's information age. The information, knowledge, and wisdom generated within societies are of little value without the ability to analyze, classify, and catalog them. Classification and controlled vocabulary systems, cataloging systems, and metadata schemas are human-created tools used to analyze the content or subject of information items. In addition to identifying the core essence of an information item, these tools reveal a network of related terms, concepts, and ideas found within and between information items. Identifying what an information item is about and how it relates to similar items leads to easier and more effective retrieval of that item in time of need.

The Library of Congress Classification (LCC) and Dewey Decimal Classification (DDC) are examples of classification systems. Libraries use these systems to categorize and locate materials by subject. Each item in the library is assigned a call number, based on the item's subject, that corresponds to either an LCC or DDC classification. LCC is used by the United States Library of Congress as well as by many academic and research libraries. DDC is widely used in public and school libraries. Both are standard systems that information organizations use to classify the items in their collections.

The Library of Congress Subject Headings (LCSH) is an example of a controlled vocabulary (or thesaurus) that is used to identify the subject of an information item in the item's metadata or bibliographic record. These subject heading terms help to categorize and classify information items by their major content subjects. While the LCSH is widely used as a standard controlled vocabulary, some organizations hold collections of a more specialized nature that the LCSH cannot properly address. Where the LCSH is not appropriate, an information organization may use another controlled vocabulary or thesaurus that better meets its classification needs and is more focused on its area of expertise. For example, the U.S. National Library of Medicine created its own thesaurus of medical terms called the Medical Subject Headings (MeSH).

Human language is variable, rich, and complex. This richness and complexity bring ambiguity and nuance that can lead to confusion and misunderstanding. A controlled vocabulary is a pre-defined set of terms that represent an information item's subject and content. Controlled vocabularies include subject headings, classification systems, thesauri, and approved keywords (Lancaster, 1986, p. 3). Controlled vocabularies and classification systems are designed to reduce this ambiguity by combining similar or related terms and distinguishing between homographs (Lancaster, 1986, p. 7). They organize and structure information items for later retrieval. The lack of controlled vocabularies in classification activities and systems leads to "wasted effort and a certain degree of searching failure" (Cleveland & Cleveland, 1990, p. 77). The presence of these systems ensures that indexers and searchers of information items use a common set of keywords and terms to consistently and predictably categorize, search, and find information items.

Cataloging systems provide a way to organize information items and materials that leverages classification and controlled vocabulary systems. A cataloging system holds records for the information items available from the information organization. These records are metadata about the information items. Metadata is data about data. It describes the information items that an information organization makes available for searching and use. The metadata records expose pre-determined access points that patrons can search against to find items that match their information needs. Example access points for a library catalog include author, title, subjects, and call number. Staff of an information organization catalog or categorize the information items the organization provides by creating metadata records that are stored in the cataloging system. This categorization process uses classification and controlled vocabulary systems to add structure and control, ensuring that items are placed in the catalog in ways that facilitate later retrieval. Patrons or clientele of an information organization perform searches using the cataloging system to find and retrieve the desired information items.

Metadata schemas are standards adopted by information organizations for describing information items and their content. These schemas provide consistency and predictability in how data is described and formatted. The American National Standards Institute (ANSI) and International Organization for Standardization (ISO) are standards bodies that help develop and publish standards, including metadata standards, that can be used by information organizations throughout the United States and the world. Dublin Core is a standard set of metadata elements used to describe information items. Many metadata schemas adopt or map to it, which allows for interoperability between metadata and cataloging systems. All of these concepts, processes, and standards impose bibliographic control over information items for the fundamental purpose of making information easily accessible. Coursework and projects in the SJSU MLIS program provided practical experience in the basic principles of organizing information and information items.

Evidence

The following evidence from previous coursework demonstrates use and understanding of the basic concepts and principles related to the organization and control of information items.

  1. Controlled vocabulary group project
  2. Cataloging exercises and assignments
  3. Metadata schema projects: Finding Aid (EAD) and Text Encoding Initiative (TEI) XML

LIBR 247 Controlled Vocabulary Group Project

The final project in LIBR 247 required constructing a thesaurus within a given knowledge domain. The project group consisted of three members. The first steps in the thesaurus construction process included identifying the thesaurus' target audience and narrowing the knowledge domain to something specific. The group decided to create a thesaurus within the knowledge domain of container vegetable gardening, targeted to beginning, hobbyist, and urban gardeners growing vegetables in containers. This focus helped manage the scope of the thesaurus: the domain was narrow enough to keep term selection from becoming too broad, yet not so narrow that the thesaurus lacked enough terms.

Next, terms for the thesaurus were extracted from online and print literature within the knowledge domain and entered into a spreadsheet. The initial list contained over 400 extracted terms. Duplicate terms and terms that were not relevant or did not fall within the knowledge domain were removed from the list. The remaining terms were grouped into facets, and the relationships between facets and terms were analyzed. Scope notes drawn from a variety of sources were added to terms to help clarify and limit their definitions.

The final thesaurus consists of 110 preferred terms. The boundaries of the thesaurus' domain were limited to vegetables, environment, pests, diseases, beneficial insects, and plant care. These are the central subject areas used to categorize and organize all the terms. Each of these facets had its own sub-facets to further categorize and classify the terms and the relationships among them. The final thesaurus is presented using classified and alphabetical indices for the terms. The construction and organization of the thesaurus are designed to help users find the terms that are useful for their research. One organizational feature is the inclusion of USE and UF (Used For) references, which direct a user from a non-preferred term to the correct preferred term. Another is the use of broader and narrower terms to create an appropriate hierarchy across the thesaurus. This project validates the adage that constructing a thesaurus or controlled vocabulary is an art as well as a science.
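The USE/UF references and the broader/narrower term hierarchy described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sample terms, the dictionary layout, and the function names are assumptions made for the example and are not taken from the project thesaurus.

```python
# Hypothetical sample entries; the terms are illustrative only and are not
# taken from the project thesaurus.
# UF = Used For (non-preferred variants), BT = Broader Term, NT = Narrower Term.
THESAURUS = {
    "Vegetables": {"UF": ["Veggies"], "BT": [], "NT": ["Tomatoes"]},
    "Tomatoes": {"UF": ["Love apples"], "BT": ["Vegetables"], "NT": ["Cherry tomatoes"]},
    "Cherry tomatoes": {"UF": [], "BT": ["Tomatoes"], "NT": []},
}


def build_use_index(thesaurus):
    """Map each non-preferred (UF) term to its preferred term (the USE reference)."""
    index = {}
    for preferred, entry in thesaurus.items():
        for variant in entry["UF"]:
            index[variant.lower()] = preferred
    return index


def resolve(term, thesaurus, use_index):
    """Return the preferred term for a search term, or None if it is not in the vocabulary."""
    for preferred in thesaurus:
        if preferred.lower() == term.lower():
            return preferred
    return use_index.get(term.lower())


def broader_chain(term, thesaurus):
    """Follow BT links upward from a preferred term to the top of its hierarchy."""
    chain = []
    current = term
    while thesaurus[current]["BT"]:
        current = thesaurus[current]["BT"][0]
        chain.append(current)
    return chain


USE_INDEX = build_use_index(THESAURUS)
print(resolve("love apples", THESAURUS, USE_INDEX))
print(broader_chain("Cherry tomatoes", THESAURUS))
```

A searcher who enters a non-preferred variant is redirected to the preferred term, and the BT links let a display or search tool walk up the hierarchy from a narrow term to its facet.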

LIBR 248 Cataloging Exercises and Assignments

MARC (Machine-Readable Cataloging) is a record standard used by libraries and information organizations for cataloging information items. This format stores metadata about an information item, such as author, subject, title, place of publication, and copyright date. These pieces of metadata act like digital signposts that tell a computer or user about the information item.

Cataloging exercises and assignments in LIBR 248 provided practical experience with the MARC bibliographic standard. These exercises and assignments taught the field and tag numbers most frequently used by catalogers and other information professionals as well as the general MARC syntax for each field. Each assignment required cataloging books of varying genres. This experience demonstrated the challenges and nuances involved in deciding how best to represent an item's bibliographic metadata in MARC format. Sometimes an information item presents a piece of information (e.g., the book title) in multiple ways. These differences make it more challenging to know which representation is best for the MARC record.
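As a rough illustration of fields, tags, and subfields, the following Python sketch models a small bibliographic record. The tags shown (100 for the main personal name entry, 245 for the title statement, 260 for publication information, 650 for topical subjects) are standard MARC 21 bibliographic tags, but the record content is invented, indicators are omitted, and the data layout and function names are assumptions made for the example rather than any real MARC library's API.

```python
# Each field is (tag, [(subfield code, value), ...]).
# The record content is invented for illustration; indicators are omitted for brevity.
RECORD = [
    ("100", [("a", "Doe, Jane.")]),
    ("245", [("a", "Container gardening :"), ("b", "a beginner's guide /"), ("c", "Jane Doe.")]),
    ("260", [("a", "Anytown :"), ("b", "Example Press,"), ("c", "2010.")]),
    ("650", [("a", "Container gardening.")]),
    ("650", [("a", "Vegetable gardening.")]),
]


def format_field(tag, subfields):
    """Render one field in a common display style: tag, then $code value pairs."""
    parts = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {parts}"


def get_fields(record, tag):
    """Return the subfield lists of every field with the given tag (tags may repeat)."""
    return [subfields for field_tag, subfields in record if field_tag == tag]


def access_points(record):
    """Collect the $a subfield of the author, title, and subject fields."""
    def subfield_a(tag):
        return [dict(subfields).get("a") for subfields in get_fields(record, tag)]

    return {
        "author": subfield_a("100"),
        "title": subfield_a("245"),
        "subjects": subfield_a("650"),
    }


for tag, subfields in RECORD:
    print(format_field(tag, subfields))
print(access_points(RECORD))
```

The repeated 650 field shows why a record is better modeled as a list of fields than as a dictionary keyed by tag: an item can carry several subject headings.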

Exercises and assignments also introduced the concept of authority control records. When describing an information item or creating metadata about it, several pieces of information could take more than one form. Names, titles, subject headings, and geographic place names are examples of fields that, if left freeform, would lack consistency and data integrity. Authority control records establish recognized, standard forms for field values. The Library of Congress Name Authority File and the Library of Congress Subject Headings are used to standardize the content of name fields and subject heading fields. Like controlled vocabulary design and construction, cataloging is as much an art as it is a science.

LIBR 246 Metadata Schema Projects

Like data, metadata comes in many shapes and sizes. Many metadata formats and standards exist to provide structure and organization in how data is described. XML (Extensible Markup Language) is a metalanguage for defining markup languages that describe data. Many metadata formats take the form of documents that adhere to the XML specification. Text Encoding Initiative (TEI) and Encoded Archival Description (EAD) are examples of metadata formats (or schemas) that are used to describe information. TEI is a specification for representing texts in digital form. EAD is a specification for describing and encoding archival finding aids.

Two projects were completed that offered experience with these metadata schemas. The first project consisted of creating an XML file using the TEI schema to digitally represent two handwritten letters from the 19th century. This project required understanding the basic structure and schema required for a TEI document. After a TEI file was created to represent a handwritten letter, the XML file was checked for well-formedness (that the XML structure and syntax were correct) and validated against the schema used. Schema validation checks that the elements and values entered in the XML match the corresponding metadata schema.
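The difference between the two checks can be illustrated with a short Python sketch that uses only the standard library. The sample document is a placeholder, not one of the project letters, and the function names are assumptions. The second function is only a coarse structural check; it is not a substitute for validating against the TEI schema, which requires a validating XML tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Placeholder document for illustration; not one of the project letters.
SAMPLE_TEI = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Letter (placeholder title)</title></titleStmt>
      <publicationStmt><p>Unpublished student transcription.</p></publicationStmt>
      <sourceDesc><p>Handwritten letter, 19th century.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Placeholder transcription text.</p>
    </body>
  </text>
</TEI>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML (tags balanced, nested, and closed)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def has_basic_tei_structure(xml_text):
    """Coarse check for a TEI root with a header and a text body.

    This is NOT schema validation; it only looks for a few expected elements.
    """
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    return (
        root.tag == f"{{{TEI_NS}}}TEI"
        and root.find("tei:teiHeader", ns) is not None
        and root.find("tei:text/tei:body", ns) is not None
    )


print(is_well_formed(SAMPLE_TEI))
print(has_basic_tei_structure(SAMPLE_TEI))
print(is_well_formed("<TEI><text></TEI>"))
```

A document can be well-formed and still fail validation: a file with balanced tags but no header parses without error, yet it does not match the structure the schema requires.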

The second project created two XML files using the EAD schema to represent two finding aids. One of the finding aids needed to be transformed into a PDF (Portable Document Format) for print purposes. The other needed to be transformed into HTML (Hypertext Markup Language) for online display and browsing by patrons. Transforming the EAD finding aids into other document formats required knowing the structure and standards specified in the EAD metadata schema. Both of these projects taught valuable skills in working with metadata schemas.
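The idea behind such a transformation can be sketched in Python with the standard library. The source does not say which tool performed the project transformations, so this is not the method used in the course; it only shows how knowledge of the EAD structure drives the output. The sample is a simplified, namespace-free fragment using real EAD element names (archdesc, did, unittitle, unitdate, scopecontent), not a complete valid finding aid, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified fragment for illustration; a complete EAD file contains much more,
# including a header, and this content is invented.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1850-1899</unitdate>
    </did>
    <scopecontent><p>Correspondence and diaries (placeholder description).</p></scopecontent>
  </archdesc>
</ead>"""


def ead_to_html(ead_text):
    """Build a minimal HTML page from the title, dates, and scope note of a finding aid."""
    root = ET.fromstring(ead_text)
    title = root.findtext("archdesc/did/unittitle", default="Untitled collection")
    dates = root.findtext("archdesc/did/unitdate", default="")
    paragraphs = [p.text or "" for p in root.findall("archdesc/scopecontent/p")]

    lines = ["<html>", "<body>", f"<h1>{escape(title)}</h1>"]
    if dates:
        lines.append(f"<p>Dates: {escape(dates)}</p>")
    lines.extend(f"<p>{escape(text)}</p>" for text in paragraphs)
    lines.extend(["</body>", "</html>"])
    return "\n".join(lines)


print(ead_to_html(SAMPLE_EAD))
```

Each path in the function mirrors the nesting defined by the schema, which is why the transformation cannot be written without knowing where the schema places titles, dates, and descriptive notes.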

Conclusion

Information's ultimate value lies in the ability to find and retrieve it in time of need. Unorganized (or poorly organized) information is difficult to find, retrieve, process, analyze, and understand. When information is organized, classified, categorized, and cataloged, it becomes easier to find, retrieve, process, analyze, and understand. The bibliographic control mechanisms and tools used to organize, classify, catalog, and describe information items and resources are designed to make information more easily accessible to patrons and clientele. Coursework completed in the SJSU MLIS program gave first-hand experience designing and constructing a controlled vocabulary, working with current cataloging formats and standards, and learning the structure and standards defined in the TEI and EAD metadata schemas. This coursework illustrates the science as well as the art of organizing information resources.

References

Cleveland, D. B., & Cleveland, A. D. (1990). Introduction to indexing and abstracting. Englewood, Colorado: Libraries Unlimited, Inc.

Lancaster, F. W. (1986). Vocabulary control for information retrieval. Arlington, Virginia: Information Resources Press.