SUPARS: Intelligent Librarian

In the sunny autumn of 1970, hundreds of students and teachers at Syracuse University took turns sitting in front of a printer terminal (similar to an electric typewriter) connected to an IBM 360 mainframe on the university's campus in New York State. Almost none of them had used a computer before, let alone a computer-based information retrieval system. Their hands trembled as they touched the keyboard; several later reported being afraid of breaking the entire system as they typed.

The participants were conducting their first online search, entering carefully selected words to find relevant psychology abstracts in a brand-new database. They entered only one keyword or command per line: for example, "motivation" on the first line, "respect" on the second, and "L1 and L2" on the third, which combined the first two lines to find papers containing both terms. After running the query, the terminal printed how many documents matched each line of the search; users could then narrow or expand the search and generate a list of article citations. Many participants laughed when they saw the results returned by the remote computer.
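The line-by-line session described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of the SUPARS software: the three abstracts are invented, and the function names `matching_ids` and `run_session` are hypothetical.

```python
import re

# Invented mini-corpus for illustration; not real Psychological Abstracts entries.
ABSTRACTS = {
    1: "Motivation and respect among first-year teachers",
    2: "Motivation in laboratory rats",
    3: "Respect for authority in adolescent groups",
}

def matching_ids(term):
    """Return the ids of abstracts that contain the term as a whole word."""
    term = term.lower()
    return {doc_id for doc_id, text in ABSTRACTS.items()
            if term in re.findall(r"[a-z]+", text.lower())}

def run_session(lines):
    """Evaluate one entry per line; 'Lm and Ln' or 'Lm or Ln' combines earlier lines."""
    results = []
    for line in lines:
        combo = re.fullmatch(r"L(\d+) (and|or) L(\d+)", line.strip(), re.IGNORECASE)
        if combo:
            left = results[int(combo.group(1)) - 1]
            right = results[int(combo.group(3)) - 1]
            hits = left & right if combo.group(2).lower() == "and" else left | right
        else:
            hits = matching_ids(line.strip())
        results.append(hits)
    return results

session = run_session(["motivation", "respect", "L1 and L2"])
for number, hits in enumerate(session, start=1):
    print(f"L{number}: {len(hits)} documents")
```

Each line keeps its own result set, so the count reported for the combined line shows at a glance whether the search needs to be narrowed or widened.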

As part of a follow-up telephone survey, participants were asked to provide two or three words to describe their experience. Out of a total of 78 words provided, 21 were the same adjective: "frustrating." Participants encountered difficulties logging into the system, experienced unpredictable failures and "irrelevant output," and most importantly, didn't know "what words to use in searching." However, they also found the system interesting and exciting ("fun," "I like computers"), with 94% saying they would use SUPARS (the Syracuse University Psychological Abstracts Retrieval Service) again if it were available. Several people proposed continuing the experiment and requested funding from their departments for the project.

The majority of these academic subjects were graduate students in education, psychology, and library science, and they were taking part in a radical online search experiment conducted by the School of Library Science at Syracuse University. SUPARS was one of many ambitious information retrieval studies conducted on American university campuses from the late 1960s to the mid-1970s. Several factors contributed to this surge in research. Advances in computer processing speed and storage capacity allowed academic databases and catalogs to be digitized and moved online. Computer terminals were new modular devices that could be distributed across a campus for decentralized access to the mainframe. And funding for computer-based research from the military and industrial sectors was more abundant than ever before. Academic librarians seized the opportunity to experiment with this expensive new technology. In turn, universities provided an unclassified environment for collaboration with technology companies and military organizations; SUPARS was sponsored by the Rome Air Development Center, a United States Air Force laboratory.

It is easy to understand why librarians in the 1970s embarked on the search revolution. The scale of academic work was expanding, and soon there would not be enough human librarians to support it all. Yet finding information still meant time-consuming, labor-intensive processes that depended on librarians. Academic researchers could browse new journals in their field, but a comprehensive search for everything previously published required consulting reference librarians and looking up the correct Library of Congress subject headings in multi-volume handbooks. Armed with a set of subject headings, researchers could then search library catalogs, citation indexes of journal articles (including subscription databases like Science Citation Index), and bibliographies compiled by hand by subject librarians at their universities. Finally, they would physically locate the books and bound journals they deemed potentially relevant, if those volumes happened to be on the library shelves.

No wonder SUPARS participants found the system remarkable, despite its limitations. Given the familiarity of academic librarians with the challenges of searching, it made sense for them to design a system that bypassed subject headings and citation indexes. What is even more surprising is that of all the online search experiments happening during this period (including commercial search systems like Lockheed's Dialog, which later became a corporate product), SUPARS came closest to approximating several key features of contemporary web search, foreshadowing the protocols we rely on more than 50 years later.

SUPARS and several other nearly forgotten systems were the precursors of today's search engines. While the popular history of the internet celebrates the programmers of Silicon Valley (and sometimes former US Vice President Al Gore), many of the initial concepts of search came from library scientists who were concerned with the accessibility of documents in time and space. Funded by the military and industry, their advances are evident throughout today's online information landscape, from the general methods of acquiring and indexing full-text documents to the complex algorithms behind free-text searching and the reuse of other people's saved searches, the building blocks of contemporary query expansion and autocomplete. In fact, these methods and many others developed by campus pioneers are still used by billion-dollar web search and commercial library database systems, from Google to WorldCat.

SUPARS was designed by a librarian named Pauline Atherton (now Pauline Atherton Cochrane). In 1960, at the age of 30, she began her library career as a cross-reference editor for the revised edition of the World Book Encyclopedia, ensuring comprehensive and accurate cross-links between entries. By 1966, she was working at the Syracuse University Library and School of Library Science, and in 1968, she demonstrated the first use of online decimal classification files to aid searching. That same year, she established the first computer-based instructional laboratory (LEEP) at the School of Library Science, integrating online searching into regular classroom instruction. (In the pre-internet world, "online" meant establishing networked real-time connections between large computers and other remote devices, such as terminals.)

The following year, in 1969, Atherton and her collaborator Jeffrey Katzer, another library science professor at Syracuse University, designed SUPARS. The primary goal of the SUPARS project was to provide large-scale online searching in order to learn as much as possible about how users searched online, what their experiences were, and what they needed to search better. To do this, the team built a searchable corpus of academic content for campus-wide use: over 35,000 articles from Psychological Abstracts of the American Psychological Association. The corpus was indexed and retrievable in the SUPARS system, making it the first large-scale database available online in an uncontrolled vocabulary environment. The user base and the searchable content were both significant, although far smaller in scale and scope than today's web search.

Two decisions made by Atherton and her team made SUPARS truly innovative. First, they removed all subject headings from the Psychological Abstracts entries, allowing all words to be searched directly, except for connecting words like "and" and articles like "a" or "the." This made SUPARS the first system to search and output large amounts of free text online. (Their final report was titled "Free Text Retrieval Evaluation.") Second, they saved each SUPARS search in a parallel database that could be queried alongside the abstracts themselves, making SUPARS the first experiment to allow users to access and use previous searches to find alternative terms or approaches.
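As a rough illustration of these two decisions, the sketch below indexes every word of a text except a short stop list and saves each query in a second store that can be searched the same way. It is a minimal sketch under stated assumptions, not the SUPARS implementation: the stop list, the invented abstracts, and the names `FreeTextStore` and `logged_search` are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed stop list: the article names only "and", "a", and "the".
STOP_WORDS = {"a", "an", "and", "the"}

def tokenize(text):
    """Lowercase the text and keep every word that is not a stop word."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

class FreeTextStore:
    """A tiny inverted index: each word maps to the ids of entries containing it."""

    def __init__(self):
        self.texts = {}
        self.index = defaultdict(set)

    def add(self, text):
        entry_id = len(self.texts) + 1
        self.texts[entry_id] = text
        for word in tokenize(text):
            self.index[word].add(entry_id)
        return entry_id

    def search(self, query):
        """Return ids of entries containing every non-stop word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self.index.get(w, set()) for w in words))

abstracts = FreeTextStore()   # stands in for the document database
search_log = FreeTextStore()  # stands in for the parallel database of past searches

def logged_search(query):
    """Search the abstracts and save the query so later users can find it."""
    search_log.add(query)
    return abstracts.search(query)

abstracts.add("The motivation of children in a classroom")  # invented abstract
abstracts.add("Anxiety and achievement in the classroom")   # invented abstract

print(logged_search("classroom motivation"))
print(logged_search("classroom anxiety"))

# A later user looks up earlier searches that mention "classroom".
earlier = [search_log.texts[i] for i in sorted(search_log.search("classroom"))]
print(earlier)
```

Using the same class for both stores mirrors the idea that past searches are just another body of free text: a user unsure what words to try can query the log before querying the abstracts.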

Each of these features alone was novel, but to understand how forward-thinking this combination was at the time, it is necessary to look at how contemporary web search services operate today. Search engines like Google and Bing index web pages using two main components: crawlers that search for new pages and periodically re-crawl pages already found, and parsers that analyze the content of pages and store the resulting information (including all the free text) in an internal database. When a user enters a search query, Google attempts to match the words and phrases in the query with the pages in its database and provides the most relevant results to the user.

In addition to the words entered by the searcher themselves, contemporary web search algorithms also consider other words closely related to the words in the search query, including synonyms (e.g., searching "bike" returns results for "bicycle" and "cycle") and other directly related words.

Most search engines also include words from other similar queries performed by others as part of their internal thesaurus, which adds search terms to the user's query. This process of including related words is called query expansion and can significantly improve the relevance of returned records. Similarly, Google and other search engines also suggest additional search terms to users through autocomplete, creating predictions based on previous searches to help users complete their queries quickly.
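The two ideas in the preceding paragraphs, query expansion and autocomplete, can be illustrated with a toy query log. This is a deliberately simplified sketch and makes no claim about how Google or any other search engine works internally: the synonym table, the log entries, and the function names `related_terms`, `expand_query`, and `autocomplete` are hypothetical.

```python
from collections import Counter

# Hypothetical synonym table, using the article's own "bike" example.
SYNONYMS = {"bike": ["bicycle", "cycle"]}

# Hypothetical log of queries previously entered by other users.
PAST_QUERIES = [
    "bike repair",
    "bike repair",
    "bike helmet",
    "bicycle repair",
    "bicycle helmet safety",
    "motivation theory",
]

def related_terms(term, log, top_n=2):
    """Return the words that most often appeared alongside `term` in past queries."""
    counts = Counter()
    for query in log:
        words = query.lower().split()
        if term in words:
            counts.update(w for w in words if w != term)
    # Sort by frequency, then alphabetically, so ties are deterministic.
    return sorted(counts, key=lambda w: (-counts[w], w))[:top_n]

def expand_query(term, log):
    """Add synonyms and co-occurring words from the log to a one-word query."""
    term = term.lower()
    candidates = [term] + SYNONYMS.get(term, []) + related_terms(term, log)
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order

def autocomplete(prefix, log, top_n=3):
    """Suggest the most frequent past queries that start with the typed prefix."""
    prefix = prefix.lower()
    counts = Counter(q.lower() for q in log if q.lower().startswith(prefix))
    return sorted(counts, key=lambda q: (-counts[q], q))[:top_n]

print(expand_query("bike", PAST_QUERIES))
print(autocomplete("bi", PAST_QUERIES))
```

Both functions draw on nothing but what earlier searchers typed, which is the same raw material the SUPARS search database collected.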

Thus, by empowering users to search free text directly within documents and allowing searchers to draw on previous search strategies, SUPARS foreshadowed the arrival of web search. SUPARS also had a vocabulary that continuously adapted to user input and behavior, in contrast to systems built on a fixed, one-time controlled vocabulary. The SUPARS team did not know that AI-driven web search algorithms would accomplish this precise task decades later, but they clearly sensed that it would be a new and effective way to keep search results continuously up to date.

In a letter to the editor of the Journal of the American Society for Information Science in 1972, Katzer described the rationale behind the database that provided all previous search queries:

The purpose of this search database is to assist users in formulating queries to the document database (Psychological Abstracts). Since SUPARS currently uses an unrestricted vocabulary, the output from the search database can help users discover other approaches to attacking their subject in the document database: it will provide the keywords used by other subject experts and a representation of their thought processes... We believe this is the beginning of an area that has not been fully explored: the use of user intelligence to enhance machine intelligence.

It is easy to portray Atherton's team as utopian futurists, but the design of the SUPARS experiment was not guided by a vision like that of the open internet. It was designed specifically for a future in which librarians would be increasingly unable to assist researchers in person. Drawing on the collective intelligence of other users was a practical solution rather than an idealistic one.

The SUPARS team observed that because the new computer terminal locations at Syracuse University were "far from any human expert in the area of interest to the reference librarian or user," additional sources of help would be needed, which could be found in the "human wisdom of all other users of the system." They wrote that the collective decisions of other researchers were merely a substitute for expert librarians:

Ideally, users would be able to talk with people familiar with their area of interest and get a variety of vocabulary and other hints. Then, users could develop or formulate search queries to the system that maximize the specificity or exhaustiveness desired in the retrieval.

As they used modular terminals on campus, the SUPARS team saw the impending future and what would be lost in a world based on distributed, networked computing: more and more researchers would work independently outside the library, needing support that librarians couldn't provide. Atherton's team didn't predict a world without professional librarians; they were preparing for a world where research would take place in many different locations, too far from the reference desk for them to be of assistance.

The SUPARS experiment also concluded that while drawing on others' search terms was a promising alternative to subject-based searching, it had limitations. One of the team's final recommendations was to continue developing controlled vocabularies; they explained that "interactive free-text searching still has a need for some form of user vocabulary or synonym control." They reached this conclusion after seeing participants frequently run into problems with search terms, as in their example of a search for "people" instead of "human" that returned no results. Participants themselves also missed the comprehensiveness of subject headings. In fact, as part of the SUPARS survey, they were asked whether they preferred a free-text system or a system with more controlled vocabulary: 42% preferred a free-text system, 36% preferred controlled vocabulary, and 12% wanted a combination of both.

The significance of SUPARS thus lies in its being both a design far ahead of its time and a counterexample to the established utopian history of the internet and the World Wide Web. In that history, the visionaries are almost always imagined as foreseeing a world where technology unambiguously improves human communication, intellect, and efficiency.

For example, one of the most famous figures in this history is J.C.R. Licklider, whose idea of the Intergalactic Network directly inspired the invention of ARPANET, often referred to as the "first internet." (Licklider was also deeply involved in similar campus online search experiments of the 1960s and 1970s; he both funded and advised several studies at the MIT Libraries that took place around the same time as SUPARS.)

In 1968, the year before the design of SUPARS, Licklider's paper "The Computer as a Communication Device" proclaimed, "In a few years, men will be able to communicate more effectively through a machine than face to face," and described a beneficial, happy society mediated by human-computer interaction. Licklider predicted that "the lives of online individuals will be more fulfilling" and "communication will be more efficient and productive, and therefore more enjoyable." Licklider's article is typical of the predictive and utopian writings of this genre about the potential of information technology.

The cultural celebration of visionaries like Licklider is easy to understand. But Atherton and the SUPARS research team deserve celebration too, because they saw what the future might lose and designed for it. When we expand our group of internet dreamers to include people like Atherton, we see a more nuanced picture of how different kinds of researchers imagined the future. Licklider saw what we would gain from being able to communicate online with anyone in the world, while Atherton's team saw what we would lose in expert intermediaries, and they designed for that cost.

As the first generation of generative AI search engines (including academic search engines like Elicit and Consensus) reaches a wider audience in 2022 and 2023, stirring both excitement and skepticism, it is equally useful to analyze what researchers will lose by relying on these tools. When we can simply type in a research question and receive an instant literature review, for example, the result is not purely a leap forward. Even as it enables remarkable new discoveries, this technology will cost researchers grounding and context, a loss different from the one Atherton saw but equally intangible and far-reaching. Anticipating these consequences, not to mourn them like Luddites but to actively consider how to help researchers overcome them, is a lesson we can learn from the SUPARS team.