Historical Background

 

The National Science Foundation has demonstrated an interest in supporting technology, driven by the increasing promise of, and demand for, technology in the scientific community. Although past discussions have occurred most often in the CISE directorate, cyberinfrastructure has become a foundation-wide area of interest. Workshops and panels on cyberinfrastructure at NSF have all indicated that the scientific community believes that NSF is in the best position to support such research. NSF cyberinfrastructure panels have emphasized the importance of quick action and adequate funding, both of which are crucial to the outcome of this undertaking. The purpose of this web site is to help the National Science Foundation build cyberinfrastructure within the Biology Directorate.

 

As one reflects on the partnership between NSF and the community over the past decades, it is apparent that the core of that partnership is a shared commitment to the value of well-organized, integrated, synthesized information, or knowledge. Knowledge is enabling for our society, not just our science, in the deepest sense. Our ability to use science for public benefit depends on the set of transactions and translations that take data to information and to knowledge, or in the broadest scheme, to the kinds of empowering understanding we call wisdom. The opportunities in the biological sciences today to advance our fundamental understanding of life and to apply that understanding to societal needs, ranging from the environment to personal well-being, are simply extraordinary. We have entered an era characterized by data-intensive research observations. Collecting, managing, and in particular, connecting data from various modalities and on multiple scales of biological systems, from molecules to ecosystems, is essential to turn that data into information. Each biological science discipline also now requires the tools of information technology to probe that information, to interconnect experimental observations and modeling, and to contribute to an enriched understanding, or knowledge. This challenge is central to why this century is widely recognized as the "Century of the Biological Sciences"; that is, the very complexity of biology means that the information technology challenges for achieving wisdom, or acumen, in basic and applied life science research are at a level and scale at least as significant as, and often more significant than, those in other research areas.

From Bytes to Biology: The Origins of CI

The BIO Directorate is well positioned, given its history, to play a major role in this transformation of science. In 1984, as the High Performance Computing program began, BIOAC held a workshop at Airlie House to evaluate whether, and how, biologists could use supercomputers. The answer to the first question was a resounding yes, and the answer to the second, with full support from BIOAC management, led the Instrumentation Program to undertake entirely new directions. For example, as a direct result of the commitment by BIO to sustain access, biological science applications using high performance computing grew from a fraction of a percent of the available compute time in 1985 to nearly 30% in 1998, making them the major application at NSF supercomputer centers. The single most important information resource for the next decade is the repository for the architectures of macromolecules, the Protein Data Bank, or PDB. NSF funded the creation of that database decades ago, when few could see its ultimate impact. The revision, updating, and expansion of the PDB, which NSF has driven and enabled over the past decade so that it serves the entire community of biologists superbly, will turn out to be essential for understanding the mechanisms by which the cell’s supramolecular machines work. Similarly, the BIOAC development of an infrastructure of computer and information technology for the extraordinarily successful LTER program has been singularly important for ecosystems research. Analysis of plant and microbial genomes relies on computational tools. The neurosciences, too, have central requirements for simulation and computational modeling.

With PDB, LTER, and numerous, growing core research needs, from molecules to the mind, providing the pull, and the High Performance Computing and Information Technology initiative providing the push, BIO began, at a particularly prescient point in time (over a decade ago), the first programs in government to fund computational biology and bioinformatics (at the time, database activities). In the decade since, BIO has strengthened this interface across all its own programs and partnered with CISE on each of its subsequent initiatives, including HPCCIT, KDI and ITR. In sum, BIO already has a foundation upon which to build a cyberinfrastructure for the biological sciences. It is natural and even imperative that the Directorate for the Biological Sciences take a leadership role in its full implementation.

Creating a Cyberinfrastructure for the Biological Sciences (CIBIO)

The revolution in the computer science and information technology world, driven by the academic research community and the commercial sector, has happened at the same time as the revolution in the biological sciences. The two have now become ideal partners for each other. Life along the frontier between biology and computing is truly exciting, and contributions from this frontier are already essential for progress in life science research. The development of the computational grid services model, from data and information grids to compute grids to communication grids, will be especially enabling to the biological science community and its widely distributed laboratory environment of individual investigators, as well as to research teams. Grid and cluster computing bring what had been rare and difficult-to-access technology to the entire biological sciences community. Today, linking knowledge resources together with readily exploited portals is equally essential. Of particular importance in achieving the vision of NSF BIO, the growing world of cyberinfrastructure promises democracy in action for biology, with participation by the entire spectrum of basic scientists supported by NSF BIO, ranging from minority serving institutions to research intensive universities. Even K-12 education will be facilitated by a cyberinfrastructure for biology. Our entire community, the community of biologists in the broadest sense, will participate more fully in the power and joy of discovery and the impact of its consequences. The World Wide Web has promoted a new kind of dialogue in the life sciences, in which everyone can access the same information and participate in discovery; a high school student can send out a question concerning some biological detail and, mere hours later, a Nobel Laureate from another continent will send back the answer.

The expansion of today’s information technologies to create a cyberinfrastructure for biology will accelerate that access, that openness and inclusiveness, while accelerating progress across all biological science research domains.

NSF BIO must integrate today’s ad hoc first steps, architect and organize, and then build out a comprehensive cyberinfrastructure for biology to ensure this vision of democratic access. As early as a decade ago, a molecular biology Nobel Laureate proposed that access to global data would be critical both for driving biological science research and for individuals seeking to sustain their competitiveness (W. Gilbert, Nature, 1991; "Toward a New Paradigm for Molecular Biology"). The Laureate predicted that experimentalists, in deciding what to pursue each day, would come to depend on the world’s production of new findings over the past 24 hours, and that a comprehensive biological information infrastructure and associated computing tools would become deeply embedded in experimental biology. At that time, coupling biology and computing seemed oxymoronic to many; today, that partnership is obvious and must be a central feature of NSF BIO activities from this point forward, if BIO is to lead and accelerate the movement of the life sciences research community into 21st Century Biology.

To establish a cyberinfrastructure for biology, we should first define the science drivers that will draw on commercial technology to address our needs. There will continue to be a major technology push arising from the academic and industrial sectors within the computer and information sciences and engineering, and that technology push will interconnect with science pulls from across the entire domain of scientific investigation. From geochemistry to astronomy, from engineering to oceanography, the scientific communities supported by NSF recognize the opportunities from cyberinfrastructure and are rapidly addressing their own needs through workshops and white papers. While biology writ large, and BIO specifically, will be able to utilize these visions and advances, BIO must establish the path for the life sciences, both to ensure our needs are met and because our community absolutely requires this infrastructure and will drive it further than we can now possibly imagine.

CI: Stepping Stones for Biology in the 21st Century

The maturation of the biological sciences as we implement 21st Century Biology underpins the expectations for cyberinfrastructure. Beyond the great success of the reductionist approaches of the past five decades, biology is moving into an environment in which it consolidates these gains through information integration. The entry of biology into discovery and synthetic analysis, that is, genome-enabled biology and systems biology, as well as the hardening of many biological research tools into high throughput pipelines, also drives the need for cyberinfrastructure. Biodiversity and biocomplexity in the environment are two areas in BIO’s domain for which active scientists and policy makers have already begun to think about cyberinfrastructure needs, needs that have even been recognized at the White House level through PCAST. The most prominent example of an early application of cyberinfrastructure in biology is BIRN, which links remote neuroscience data, utilizing a multiscale, multimodal database to accelerate new discovery. BIRN can be generalized to most NSF communities beyond cognitive neuroscience and basic neurobiology, and is particularly obvious as a model for LTER, NEON, and numerous activities within environmental biology. Similarly, there are already hundreds of molecular biology databases. Connecting them to research conducted on higher levels of biological organization is important for 21st Century Biology.

Last Updated May 19, 2004