Sunday, January 2, 2011

Next generation Library Systems (Nov. 16, 2007)

The problem
With the backdrop of the widely touted lessons of Amazoogle—an expression I can barely stand to write—three of the more interesting emerging developments of late have been OCLC’s WorldCat Local, Google Book Search, and Google Scholar. As Lorcan Dempsey argued, the "massive computational and data platforms [of Google, Amazon and EBay] exercise [a] strong gravitational web attraction," a sort of undeniable central force in the solar system of our users’ web experience. What has happened with WorldCat Local, Google Book Search and Google Scholar has extended that same sort of pull to key scholarly discovery resources. No one needed the OCLC environmental scans to be reminded that our users look to Google before they turn to the multi-million dollar scholarly resources that we purchase for them, and everyone was aware that Amazon satisfied a broad range of discovery needs more effectively than the local catalog. Now, however, mainstream “network services” like Amazon and Google web search, deficient in their ability to satisfy scholarly discovery, are complemented by similarly “massive computational and data platforms” that specialize in just that—finding resources in the scholarly sphere. These forces, and perhaps more like them in the future, should influence the way that we design and build our library systems. If we ignore these types of developments, choosing instead to build systems with ostensibly superior characteristics, systems that sit on the margins, we effectively ensure our irrelevance, building systems for an idealized user who is practically non-existent.

Our resources, skills and investments have helped to create an opportunity for us to shape a next generation of library systems, simultaneously cognizant of the strong network layer and our needs and responsibilities as a preeminent research library. At Michigan, we have designed and built our past systems, each in partial isolation from the other system, reflecting the state of library technology and our response to user needs. We were not wrong in the way that we developed our systems, but rather we were right for those times. In building things in this way, we have developed an LMS support team with extraordinary talent and responsiveness, a digital library systems development effort that blazed trails and continues to be valued for the solidity of its product, and base-funded IT infrastructure that is utterly rock-solid--all great, but generally as independently conceived efforts.[1] What libraries like ours must do now is reconceive our efforts in light of the changed environment. The reconceptualization should, as mentioned, not only be built with an awareness of the new destinations our users choose, but also with a recognition that we have a special responsibility for the long-term curation of library assets. Even at its most successful, Google Scholar does not include all of the roughly $8m in electronic resources that we purchase for the campus, and Google Book Search is not designed to support the array of activities that we associate with scholarship.

Knowing that we must change where we invest our resources is one thing; knowing where we must invest is another. I don’t believe I should (or could) paint an accurate picture of the sorts of shifts we should make. On the other hand, I can lay out here a number of key principles that should guide our work.

1. Balanced against network services
: I believe this is probably the most important principle in the design of what we must build. We must not try to do what the network can do for us. We must find ways to facilitate integration with network services and ensure that our investment is where our role is most important (e.g., not trying to compete with the network services unless we think we can and should displace them in a key area). For example, we have recognized that Google will be a point of discovery, and so rather than trying to duplicate what they do well for the broad masses of people, we should (1) put all things online in a way that Google can discover; and (2) because we recognize that Google won’t build services in ways that serve all scholarly needs, work to strategically complement what they do. In the first instance (i.e., making sure that Google can discover resources), we will always need to block them, for legal or other reasons, from discovering content.[2] These types of exceptions should add nuance to what we do in exposing content. In the second instance, when it comes to building complementary services, we’ll need to be both smart (and well-informed) and strategic.

2. Openness: What we develop should easily support our building services and, even more importantly, should allow others to build them. It should take advantage of existing protocols, tools and services. Throughout this document, I want to be very clear that these principles or criteria don’t necessarily point to a specific tool or a specific way of doing things. Here, I would like to note that the importance of openness, though great, does not necessarily point to the need to do things as open source. As O’Reilly has written in his analysis of the emergence of Web 2.0, this is what we see in Amazon’s and Google’s architectures, where the mechanisms for building services are clearly articulated, but no one sees the code for their basic services: the investment shifts from shareable software to services. Similarly, our being open to having external services built on top of our own should not imply that our best or only route is open source software. What is particularly important is the need to have data around which others would like to build tools and services: openness in resources that few wish to include is really only beautifying a backwater destination.

3. Open source: Despite what I noted above about openness, we should try, wherever possible, to do our work with open source licensing models and we should try to leverage existing open source activities. In part, this is simply because, in doing so, we’ll be able to leverage the development efforts of others. We should also aim for this because of the increasing cost of poorly functioning commercial products in the library marketplace. Note, though, that when we choose to use open source software, it’s important to pick the right open source development effort—one that is indeed open and around which others are developing. Much open source software is isolated, with few contributions. We should aim for openness in our services over slavish devotion to open source. We should also choose this route when we can simply because it's the best economic model for software in our sphere.

4. Integration: Tight integration is not the most important characteristic of the systems we should build, nor should this sort of integration be an end in itself; however, we have an opportunity to optimize integration across all or most of our systems, making an investment in one area count for others. In Michigan’s MBooks repository, we have already begun to demonstrate some of the value in this type of integration by relying on the Aleph X-Server for access to bibliographic information, and we should continue to make exceptions to tighter integration only after careful deliberation. A key example is the use of metasearch for discovery of remote and local resources: we should need to address only a single physical or virtual repository for locally-hosted content. We should give due consideration to the value of “loose” integration (e.g., automatically copying information out of sources and into target systems), but the example of the Aleph X-Server has been instructive and shows the way this sort of integration can provide both increased efficiency and greater reliability in results.

5. Rapid development: If we take a long time to develop our next generation architecture, it will be irrelevant before we deploy it. I know this pressure is a classic tension point between Management and Developers: one perspective holds that we’re spending our time on fine-looking code rather than getting a product to the user, and the other argues that work done rapidly will be done poorly. This dichotomy is false. The last few years of Google’s “perpetual beta” and a rapidly changing landscape have underlined the need to build services quickly, while the importance of reliability and unforgiving user expectations have helped to emphasize the value of a quality product. We can’t do one without the other, and I think the issue will be scaling our efforts to the available resources, picking the right battles, and not being overambitious.

These sorts of defining principles are familiar and perhaps obvious, but what is less obvious is where all of this points. Although there are some clear indications that these sorts of principles are at play in, for example, the adoption of WorldCat Local or the integration of Fedora in VTLS’s library management system, there are also contradictory examples (e.g., the rush to enhance the local catalog, and many more silo-like systems like DSpace), and I’ve heard no articulations of an overarching integrated environment. If we undertake a massive restructuring of our IT infrastructure rather than strategic changes in some specific areas, or tweaking in many areas, it may appear to be an idiosyncratic and expensive development effort that robs one's larger library organization of limited cycles for enhancements to existing systems. On the other hand, if we don’t position ourselves to take advantage of the types of changes I mentioned at the outset, we will polish the chrome on our existing investments for a few years until someone else gets this right or libraries are entirely irrelevant. Moreover, if we make the right sorts of choices in the current environment, we should also be able to capitalize on the efforts of others, thus compounding the return on each library’s investment. And of course, situating this discussion in a multi-institutional, cooperative effort minimizes the possibility that building the new architecture robs our institutions of scarce cycles.

It’s important, also, to keep in mind that this kind of perspective (i.e., the one I’m positing here) doesn’t presume to replace our existing technologies with something different. Many libraries have made many good choices on technologies that are serving their institutions well, and to the extent that they are the best or most effective tool for aligning with the principles I’ve laid out, we should use them. The X-Servers of Aleph and MetaLib are excellent examples of tools that allow the sort of integration we imagine. At UM, our own DLXS and the new repository software we developed are powerful and flexible tools without the overhead of some existing DL tools. But in each case, it may make more sense to migrate to a new technology because we are elaborating a model of broader integration (both locally and with the ‘net) that others may also use. Where there is a shared development community (e.g., Fedora, Evergreen or LibraryFind), we can benefit from a community of developers. In all of this, we’ll need a strategy, and a strategy that remains flexible as the landscape changes.

It’s time to see our environment as being comprised of a set of inventory management responsibilities (both print and digital, both local and remote) that leverages a growing and maturing array of network services so that our users can effectively discover and use the resources available to them. I think that requires a change in the way we think about our technologies and a much more strategic arrangement of those technologies in relation to each other. We may be stuck with a bunch of local print “repositories” because of the nature of print and the history of library development. That’s not the case for our digital repository, however. On top of this, we need to conceptualize the sorts of services we need (e.g., ingest, exposure, other types of dissemination, archiving, etc.) and the tools that can best accomplish these things.

[1] Incidentally, I also believe that Michigan’s organizational model, comprised as it is of five distinct IT departments, is ideally suited to building the next generation of access and management technologies. Core Services should continue to provide a foundation of technology relevant to all of our activities, and should continue to develop and maintain system integration services used by all of the Library’s IT units. Library Systems will need to continue to support operational activities such as circulation and cataloging at the same time that it manages our most important database of descriptive metadata. DLPS should continue to focus on technologies that manage and provide access to the digital objects themselves—the data described by those metadata. Web Systems is ideally suited to provide a top layer of discovery and “use” tools that tap into both local data resources and those things we license remotely. I believe that our current organizational model shares out responsibility effectively and allows for a sort of specialization that is complementary; however, I wouldn’t rule out different organizational models if they made sense in the course of this process. For those readers outside the UM Library, the fifth department is Desktop Support Services, responsible not only for the desktop platform but also for the infrastructure supporting it.

[2] For example, with regard to Deep Blue, our institutional repository, in Michigan’s agreement with Wiley, approximately 33% of the Wiley-published/UM-authored content is restricted to UM users; and in our agreement with Elsevier, we may make it possible for Google to discover metadata but not fulltext. Similar things are bound to occur in the materials we put online in services other than Deep Blue.


  1. When I first published this piece, Andy Ashton attached a couple of comments that raised important questions about next gen discovery in the larger web space. He began by noting that he doesn't "see the discovery services such as those implemented by Google et al to be the inevitable points-of-discovery. At the risk of arguing the same point: the more open and ubiquitous web services become, the less discovery becomes reliant on a single provider. Instead of opening our data to Google or Amazon, I see emerging projects opening data to anyone, including ourselves. This makes the point-of-access less defined and encourages a more fluid interchange of data via network services. If library systems open up, provided the metadata is there and accessible, academic networks can be just as valuable as Google in discovery. Obviously we have a long way to go, but it is encouraging to see a bunch of community-driven projects popping up that seem to be embracing this idea."

    He also added the following:
    "Although I come from a relatively small institution without a significant research mandate, I’m still fairly shocked at the lack of discussion and awareness among rank-and-file librarians, and within library “scholarship”, regarding many of the core issues you bring up in the piece.

    At a recent symposium on “Next-Generation Catalogs”, it became clear that the overwhelming majority of librarians believe that their Next-Gen system will be a product purchased from a traditional ILS vendor, the difference being that it looks and acts more like Amazon. And maybe they’re right. However, the dialog about web services, open repositories – or even real discussions about a wholesale shift from MARC for standard library collections – is still the domain of researchers and developers. The most promising library-oriented projects that I’m aware of are often emerging from outside of libraries.

    As you mention at one point in this piece, the most open, accessible services are useful without metadata that people want to use, but I have yet to see a broad-based, universal discussion among librarians to that end.

    Unfortunately, so many libraries don’t make room for participation in community-driven development projects, and so remain largely outside the sphere from which their future systems may be developed, and largely ignorant of the issues surrounding them. We risk continuing to be passive consumers of interfaces – a role that doesn’t argue well for librarians’ long-term relevance."

  2. At the time I posted this, Tito Sierra provided this very helpful comment on the mode of development, saying "I have noticed a tendency towards premature optimization in many library systems and standards. How can one optimize a system before a user has had an opportunity to use the system? I believe a better approach is to start small and grow your system based on continual feedback from your users. This requires a willingness to experiment, take risks, and sometimes fail. Though with an iterative approach the costs of failure are small in comparison to the costs of elaborately designed architectures that seem to become irrelevant too quickly."