Competency E

Introduction

The fifth competency in the SJSU iSchool MLIS program is to "design, query, and evaluate information retrieval systems". Information retrieval (IR) is the process of finding information resources that answer a particular information need by searching a collection of stored data. An information retrieval system (IR system) is a system designed to help search and find specific information from one or more stored data repositories. Harter (1986) defined an IR system as "a device interposed between a potential end-user of an information collection and the information collection itself" (p. 2).

IR systems are composed of two basic components: 1) the database that stores the data and 2) a mechanism for retrieving information from the database. The term database refers to a structured set of data stored in a way that facilitates information retrieval. IR systems can be analog or digital in nature. The library can be considered a type of IR system (Rubin, 2010, p. 127). Printed indexes and bibliographies are other examples of analog IR systems. Since the late 20th century, computer and networking technologies have replaced many analog IR systems with digital or electronic systems. Bibliographic records have been converted into digital formats, including the MARC (Machine-Readable Cataloging) and BIBFRAME (Bibliographic Framework) record formats. These digital record formats enable computers to store, connect, parse, and retrieve bibliographic information in more automated and efficient ways than are possible with analog formats. The online public access catalog (OPAC) is an electronic IR system that stores and retrieves bibliographic information for use by an information organization. While originally designed as an electronic replacement for the analog library card catalog, OPACs have developed into sophisticated IR systems with improved search and discovery technologies that information organizations rely on for access to physical and electronic resources. Web search engines are another type of electronic IR system. The development and ubiquity of the World Wide Web led to the creation of web search engines that help end-users search for information available on the Web.

The design, querying, and evaluation of an IR system are based upon the system's ability to match information from the IR system with the end-user's information need. Designing an IR system begins with identifying the intended end-users of the system. Knowing who will use the system leads to identifying their information needs and shapes what data is entered into the IR system and how it is entered. The design of an IR system also includes identifying and developing controlled vocabularies, keywords, and indexes of the data in the system to facilitate efficient information retrieval. Querying refers to an end-user's interaction with an IR system by entering search queries and terms relevant to their information need. The aim of an IR system is to return (or retrieve) the information in the system that is most relevant to the end-user's information need.

Evaluating an IR system assesses how effectively the system meets the information needs of end-users. An effective IR system distinguishes between relevant and non-relevant documents in the IR system's collection of data. Precision and recall are two basic measures of information retrieval effectiveness (Perry, Kent, & Berry, 1955). Precision is the ratio of relevant items retrieved to all retrieved items. It is the number of correct results that match the search query divided by the number of all items returned by the search. Recall is the ratio of relevant items retrieved to all relevant items in the IR system's collection. It is the number of correct results divided by the number of items that should have been returned by the search query. An IR system with high precision returns more relevant items than non-relevant items. An IR system with high recall returns a majority of the relevant items in the collection. An IR system should optimize for precision or recall based on the information needs of the system's end-users. Some end-users may want the IR system to have high recall and will tolerate low precision to ensure they get everything that could be relevant. Other end-users may expect the IR system to return the most relevant items first (high precision) and may not be concerned with whether every relevant item is retrieved (low recall). Coursework and projects in the SJSU iSchool MLIS program provided opportunities to design, query, and evaluate IR systems.
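The precision and recall ratios described above can be illustrated with a short Python sketch. This is a minimal illustration, not part of the original coursework: the function name precision_recall and the document identifiers are hypothetical, and relevance is assumed to be a simple yes-or-no judgment for each item.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: identifiers of the items the system returned
    relevant:  identifiers of all items in the collection judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved

    # Guard against division by zero when either set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 items returned, 5 relevant items in the collection,
# and 2 items that appear in both sets.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d1", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.40
```

In this hypothetical search, half of the returned items are relevant (precision of 0.50), but only two of the five relevant items were found (recall of 0.40).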

Evidence

The following evidence from previous coursework demonstrates experience in designing, querying, and evaluating information retrieval systems.

  1. Reflective essay about information retrieval systems
  2. Two-part database design group project
  3. Research paper about Geographic Information Systems (GIS) and information retrieval
  4. Series of assignments to practice querying various academic databases

LIBR 200: Reflective Essay on Information Retrieval Systems

Effectively designing, using, and evaluating information retrieval systems requires a basic understanding of these systems and their value within library and information science. A reflective essay exploring the definition of an information retrieval system and various examples of analog and digital systems helped establish this basic understanding. The essay also provided examples of information retrieval systems that were familiar from previous experience and use. These include a library catalog and library classification system, the phone book, printed indexes and bibliographies, internet search engines, online shopping websites, online data portals and data warehouses, and online mapping websites like MapQuest and Google Maps. Knowing and using a variety of information retrieval systems prepares the librarian or information scientist to answer the information needs of the patrons and clientele of their organization.

LIBR 202: Database Design Group Project

A two-part group project explored designing information retrieval systems. The first part of the project required creating a hypothetical collection of items useful to a specified user group and set of information needs. A database table was created to model the information stored for each item or record in the collection. The collection's unit of description was chosen based on how users would search the collection's database. The group decided to create a database to store metadata about a collection of Star Trek action figures manufactured by the fictional ACME TOY company. The first step in designing an information retrieval system is to identify the purpose of the retrieval system. This step also included describing the main intended user group of the retrieval system and their information needs. A set of possible questions was formulated for potential users of the database. The database structure was then designed based on the user group and their needs. Field names and field types were chosen to capture the essential data for each item in the collection. After the names, types, rules, and standards for each field were determined, online database software (i.e., WebData Pro) was used to create a database following this record and field structure. A set of hypothetical data was created and entered into the database. This allowed the group to test searching the database to determine how well it met the needs of the intended user group.
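The record and field structure described above can be sketched in Python. This is only an illustration under stated assumptions: the project used WebData Pro, whereas this sketch substitutes Python's built-in sqlite3 module, and the table name, field names, field rules, and sample records are all hypothetical rather than the group's actual design.

```python
import sqlite3


def build_collection_db():
    """Create an in-memory database with one table of action figure records."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical field names and types; the CHECK clause stands in for a
    # field rule that restricts entries to a controlled list of values.
    conn.execute(
        """
        CREATE TABLE action_figure (
            figure_id      INTEGER PRIMARY KEY,
            character_name TEXT NOT NULL,
            series         TEXT NOT NULL,
            release_year   INTEGER,
            item_condition TEXT CHECK (item_condition IN ('mint', 'good', 'fair'))
        )
        """
    )
    # Hypothetical test data, entered to try out searches against the design.
    conn.executemany(
        "INSERT INTO action_figure "
        "(character_name, series, release_year, item_condition) "
        "VALUES (?, ?, ?, ?)",
        [
            ("Spock", "The Original Series", 1994, "mint"),
            ("Kirk", "The Original Series", 1995, "good"),
            ("Picard", "The Next Generation", 1993, "fair"),
        ],
    )
    return conn


def find_figures(conn, series):
    """Answer a sample user question: which figures belong to a given series?"""
    cursor = conn.execute(
        "SELECT character_name, release_year FROM action_figure "
        "WHERE series = ? ORDER BY character_name",
        (series,),
    )
    return cursor.fetchall()


conn = build_collection_db()
print(find_figures(conn, "The Original Series"))
# prints: [('Kirk', 1995), ('Spock', 1994)]
```

Running a sample user question against test data in this way mirrors how the group checked whether the chosen fields supported the searches its intended users were expected to make.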

The second part of the project involved evaluating another group's database design project from the class. Evaluating the database design involved answering a set of questions about the design.

In addition to providing experience evaluating an information retrieval system, the second part of this project clarified what is required to effectively design a retrieval system. A well-designed information retrieval system has a clear intent and purpose. Items or records in the system are structured and described with useful information that meets the needs of the system's intended user group. The structure of the database also needs to be robust enough to account for exceptions and variations in the data modeled in the retrieval system. Effectively designed information retrieval systems make data entry easy and make it easy for users of the system to find information that matches their needs.

LIBR 202: Research Paper about GIS and Information Retrieval

In an information age of advanced digital computing and networking technologies, massive amounts and varieties of information are created each day. One type of digital information that continues to increase in quantity and value is geospatial information (also referred to as spatial or geographic information). Geospatial information is information that is tied to a physical location on Earth. A Geographic Information System (GIS) is a collection of components designed to create, manage, store, and retrieve geospatial information. A GIS enables spatial analyses and data visualizations of the geographic data within the system. Understanding and evaluating geographic information systems makes it possible for librarians or information scientists to better meet the geographic information needs of their patrons and clientele.

GIS in an information organization setting enables patrons or clientele to search, find, and visualize digital maps for personal use and edification. It also provides opportunities for digital scholarship by students and faculty in an academic community who need to explore, analyze, and visualize geographic data. Information organizations can also use GIS to make their metadata repositories spatially aware where the metadata contains geospatial information. For example, a library catalog that contains geographic information within bibliographic entries could use GIS to help patrons locate items spatially within the library. Another application of GIS is on the World Wide Web (WWW) in the form of web mapping and web mapping services. There are also vast online repositories and portals of geographic data that vary widely in their collection, description, and organization of that data. This disparity in geographic metadata quality can prevent users from finding the data that best answers their information query. The ability to identify and evaluate GIS, both as a service provided by an information organization and as an online tool for exploring and connecting geographic data within the context of the WWW, is a skill needed by current librarians and information scientists. Evaluating GIS as an information retrieval system within these contexts provides opportunities to determine how GIS can best meet the information needs of the 21st century user.
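One simple way a metadata repository can be made spatially aware is to store a coordinate pair with each record and retrieve the records that fall within a search area. The Python sketch below illustrates that idea only. The MapRecord class, the within_bbox function, and the records and coordinates are hypothetical, and the bounding-box test is an assumption that ignores complications such as search areas crossing the 180th meridian; a real GIS would typically rely on a spatial index and true geometries.

```python
from dataclasses import dataclass


@dataclass
class MapRecord:
    """A hypothetical metadata record tagged with a point location."""
    title: str
    lon: float  # longitude in decimal degrees
    lat: float  # latitude in decimal degrees


def within_bbox(records, min_lon, min_lat, max_lon, max_lat):
    """Return the records whose point falls inside the bounding box."""
    return [
        r for r in records
        if min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat
    ]


# Hypothetical records; the coordinates are illustrative only.
records = [
    MapRecord("Hypothetical map A", -121.9, 37.3),
    MapRecord("Hypothetical map B", -118.2, 34.1),
    MapRecord("Hypothetical map C", -122.4, 37.8),
]

matches = within_bbox(records, min_lon=-123.0, min_lat=37.0, max_lon=-121.0, max_lat=38.0)
print([r.title for r in matches])
# prints: ['Hypothetical map A', 'Hypothetical map C']
```

A search like this depends entirely on the coordinates recorded in the metadata, which is one reason inconsistent geographic metadata makes relevant data harder to find.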

INFO 244: Database Search Assignments

In an online searching class (INFO 244), a series of assignments provided hands-on experience querying various information retrieval systems. The assignments required searching various databases available in ProQuest. Each assignment contained an information request from a hypothetical researcher to guide the information query and keywords. The first assignment involved performing searches for information related to health and medicine and included searches in the MEDLINE database. The information request was to find the five best results reporting clinical trials of malaria vaccines tested on humans. The second assignment involved finding five citations of articles matching an information request in the aquatic sciences for strategies to manage the impact of climate change on freshwater lake ecosystems. The cited articles needed to include figures or tables. The third assignment required finding five relevant citations of editorials or letters to the editor from American minority newspapers about Jesse Owens' participation and subsequent snubbing in the 1936 Olympics. The fourth assignment involved searching for citations by the information scientist Marydee Ojala using the Web of Science search platform.

For each assignment, the information question had to be analyzed to understand the information need. Key terms were extracted to construct an initial Boolean search query. The first query was executed and the result set was analyzed. Facets such as publication date, subject heading, classification, and document features (i.e., tables, graphs, maps, diagrams, illustrations, and charts) were used to narrow the result sets. The search results were analyzed, and searching continued if the result set was either too large or did not contain enough relevant results. Multiple queries were tried for each assignment to narrow the search results to a manageable size. Bates' (1989) berrypicking-evolving search technique was used in each assignment to query and find relevant search results that answered the specified information need. These assignments gave practical experience querying information retrieval systems.
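The step of turning extracted key terms into an initial Boolean query can be sketched in Python. This is a minimal illustration assuming a common approach in which synonyms for one concept are joined with OR and the separate concepts are joined with AND. The function name build_boolean_query and the term variants are hypothetical, and the output uses generic Boolean syntax rather than the exact query syntax of ProQuest or any other platform.

```python
def build_boolean_query(concept_groups):
    """Join synonyms with OR inside each concept and join concepts with AND."""
    clauses = []
    for terms in concept_groups:
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Concepts drawn from the malaria vaccine information request; the singular
# and plural variants are illustrative assumptions.
concepts = [
    ["malaria"],
    ["vaccine", "vaccines"],
    ["clinical trial", "clinical trials"],
    ["human", "humans"],
]

query = build_boolean_query(concepts)
print(query)
# prints: (malaria) AND (vaccine OR vaccines) AND ("clinical trial" OR "clinical trials") AND (human OR humans)
```

Adding terms to a concept group broadens the search, while adding another concept group narrows it, which corresponds to the iterative adjustments made between queries.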

Conclusion

Librarians or information scientists learn and master the principles of information science to better meet the needs of their users. Information is of little value if it is not organized, classified, and stored for later retrieval. Information retrieval systems are a core part of information science that facilitate organizing, classifying, and storing data so that patrons and clientele of the systems can search and retrieve information they need. Coursework completed in the SJSU MLIS program demonstrates direct experience designing, querying, and evaluating information retrieval systems.

References

Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. https://pages.gseis.ucla.edu/faculty/bates/berrypicking.html

Perry, J. W., Kent, A., & Berry, M. M. (1955). Machine literature searching X. Machine language; factors underlying its design and development. American Documentation, 6(4), 242-254. http://doi.wiley.com/10.1002/asi.5090060411