Grant: NSF Science and Engineering Information Integration and Informatics (SEIII) Program: Planning Information Infrastructure Through a New Library-Research Partnership (Small Grant for Exploratory Research) (with Janet McCue, director, Albert R. Mann Library)

Proposal Summary: Scientists are generating data regarding language acquisition at a fast pace. Although essential to research in Developmental Psychology and Linguistics as well as Cognitive Science, data are accumulating without an infrastructure to enhance collaborative research or to support preservation, storage, access, or dissemination. The collective academic community is forced to rely solely on the individual researcher to maintain data in conjunction with individual research grants, and data are consequently seldom available for scientifically sound comparative analysis. The Cornell Language Acquisition Laboratory (CLAL) holds data collected over twenty years and across more than twenty languages and countries. These data include audio/video recordings of language samples from thousands of children (e.g., over 900 tapes in English; 500 in South Asian languages).

With support from the National Science Foundation in 2003, Cornell's CLAL joined with seven other national institutions and several international institutions to create a new Virtual Center for the Study of Language Acquisition. The goal of this new collaborative effort is to enhance cross-linguistic research by sharing data and sharing materials on best practices for the scientific study of language acquisition. Recognizing the unique strength of the library to support archiving, access, cataloging, and outreach, staff in Cornell's Mann Library and researchers in CLAL began examining metadata and preservation issues as they relate to language acquisition data. With the establishment of the Virtual Center and the strong partnership between the library and CLAL, we are now uniquely poised to accomplish a twofold purpose: 1) systematic preservation of data; and 2) testing the viability of a new infrastructure between research lab and University Library for scientific collaboration involving cross-linguistic data.

Through the SGER grant, we will develop and document a prototype infrastructure for preservation and access. This planning will include further collaboration between the research lab (CLAL) and Cornell's Albert R. Mann Library, and the preparation of documentation on principles of data preservation, access, and dissemination, together with the creation of a functional prototype for testing and demonstration. The documentation will cover best practices for data digitizing and archiving, metadata systems that comply with the standards of the scientific community (i.e., OLAC), and standardized annotation systems for existing collections and materials. In addition, we will hold a meeting with MIT in preparation for a future node-to-node proof of concept and the development of a protocol for interinstitutional library cooperation. We will also consult with industrial (e.g., EMC) and academic (e.g., Cornell Information Technologies) support organizations with regard to outsourcing of data processing and storage. We will assess and report both the initial and the ongoing costs required for the planned infrastructure and research data management.

The plans we will develop will provide the basis for the project's broader impact, i.e., a test of whether our infrastructure can be extended on a large scale to research institutions and libraries. This planning grant will allow the partners to identify the barriers and test the feasibility of extending the role of large research libraries in supporting value-added services for research data, including access, metadata, outreach, training, and archiving. Research institutions already rely on the library for print and digital collections; the library can similarly ensure that research data are collected and preserved for the benefit of the community, even after a research project ends.