CyRDAS Recommendations

1 March 2004

 

 

The convergence of computation, information management, networking and intelligent sensing is poised to transform the conduct of science. This emerging cyberinfrastructure (CI) has the potential to provide powerful new tools, research methodologies, and processes that will enable scientists to investigate the atmosphere in new and previously unimaginable ways. To ensure rapid progress in the atmospheric sciences, it is essential that the technology that enables frontier research and effective education continues to be developed, deployed and maintained. Given the computing-intensive and data-intensive nature of the atmospheric sciences, an important component of the enabling technology is the CI that undergirds computation and data management and analysis. Development, deployment and evolution of the most effective CI must be driven by science priorities, and the science community must be engaged and remain involved in the planning process.

The planning process for how best to implement a CI that serves the research and educational needs of the atmospheric sciences can be viewed from four interrelated perspectives: human resources, data, computing infrastructure and capacity building, and software life cycle. Of these perspectives, human resources is the most important for the development and widespread use of CI within the atmospheric sciences, for improving diversity, and for improving access to research and educational tools and capabilities by traditionally underserved populations.

The table below summarizes the issues associated with each perspective, along with recommendations for addressing them. In addition to the specific issues and recommendations listed for each theme area, several issues cut across these areas and stand out as problems that must be addressed in any CI initiative.

 

Cross-Cutting Issues

 

Barriers to entry: There are real barriers to effective development and use of CI in the atmospheric sciences. Among the barriers are the following:

 

Operational and real time needs: A unique aspect of atmospheric sciences is the close relationship among research, education, and operational meteorology – pollution alerts, weather prediction, climate prediction, space weather alerts, etc. The CI that is developed for atmospheric sciences should, at all stages of design and implementation, support this relationship, whenever practical. This has implications for the design of networks (e.g., high-speed connections between research laboratories and operational forecasting centers), data distribution (e.g., real-time meteorological observations in the classroom), computing infrastructure (e.g., on-demand computing), and management and access to data sets and community models.

 

Software life cycle: The high concentration of research resources on innovative development has led to neglect of funding for the full software life cycle, including hardening and deployment.  Further, there is a critical need for funding for training, support, and maintenance of the underlying infrastructure. Without viable options for supporting software throughout its useful lifetime, investments in software infrastructure and tools will be lost.

 

Coordination: There are already many diverse CI planning and implementation efforts within different divisions of the Geosciences, different directorates of the NSF and among the various federal agencies with sizable portfolios in research and development. These activities should be coordinated, whenever possible, to optimize the return on investment and to minimize the duplication of effort.

 

Challenges in the Atmospheric Sciences

 

The atmospheric sciences research agenda is an ambitious one, whose individual elements each require a significant enhancement of and substantial investment in the national computing and data infrastructure. The questions that have been articulated in several venues and reports include the following representative examples:

 

· Weather: What is the limit of predictability of the atmosphere and how does it change as a function of location, spatial scale or temporal scale? For example, to what extent can tornadoes be predicted and what will it take to make skillful predictions with lead times that can reduce the associated loss of human life and property?

 

· Climate: What are the global, regional and local effects of climate variability and change on the world’s production of food and use of energy, on local ecosystems, and on the spread of disease? What will it take to build integrated Earth system models that include the treatment of atmospheric chemistry, dynamical vegetation and aquatic and terrestrial biogeochemical cycles deemed necessary to realize the predictability of the Earth’s climate and make reliable estimates of regional climate change?

 

· Air Quality: What emission control strategies can effectively reduce the health and ecosystem impacts associated with air pollution? How can we optimally combine measurements with models to provide operational forecasts of air quality and chemical weather for use in emergency response and public health protection? How can we better quantify and reduce the uncertainties regarding the role of aerosols in the atmosphere, their specific connection to health effects, and their role in cloud and climate systems?

 

· Space Physics and Space Weather: Why does solar activity vary in a regular 11-year cycle? How does the Earth's global space environment respond to solar variations? What is the probability that specific types of space weather phenomena will occur over periods from hours to days?

 

Addressing these questions, as well as other challenging scientific problems that cut across discipline boundaries within the atmospheric sciences and across the geosciences as a whole, will require substantial new research that will push the limits of the observing systems and the computing and data infrastructure that are currently in place. In particular, these questions will require:

· massive ensembles of cloud-resolving and eddy-resolving weather, climate, and space physics models;

· advanced data assimilation systems capable of accurately analyzing immense volumes of observational data from radar and satellites;

· observing and analysis systems capable of delivering accurate and adequately-resolved measurements of atmospheric quantities from the Sun to the Earth’s surface;

· data acquisition, data management, data integration, data analysis, data mining, and data visualization tools capable of advancing scientific understanding and forecasting that is carried out in a distributed environment, that can support on-demand and real-time activities, and that involves both small and very large data streams;

· configurable workflow orchestration and web services capable of coupling observations, modeling, and data handling, analysis, publishing and visualization through grid-enabled facilities.

 

Providing these capabilities requires computing infrastructure, data systems, software and, importantly, human resources that go far beyond what is currently available. Substantial investments in CI will be needed to develop, enhance and sustain the systems that can meet these challenges.

 

General Recommendations

 

In order to address these pressing issues, it is recommended that the NSF change the way in which it invests in CI for the geosciences in the following ways:

 

 

While these recommendations are very broad, they are deemed necessary to accelerate progress in the atmospheric sciences. To further articulate the nature of the recommended changes, the table below provides more detail on both the issues and the short-term (2-5 years) and long-term (5-10 years) recommendations, organized according to the perspectives described above.

 

Expected Outcomes

 

If the recommendations for potentially very large investments by the NSF in CI, as summarized above and detailed below, are adopted, we foresee that the Nation in general, and the atmospheric sciences in particular, will gain both scientifically and more broadly in several ways. The potential outcomes include:

· A diverse, informed, educated and well-connected work force that is better able to meet the emerging complex scientific challenges, which inherently lie at the boundaries among disciplines, to determine and quantify the impacts of natural variations in the environment on society, economic interests and ecosystems;

· A Nation that is more resilient to disruption by intentional or unintentional influences and better equipped to assist other nations through accelerated progress on the utilization of forecast information to reduce the loss of life and property and the adverse effects on ecosystems associated with severe weather, hurricanes, toxic or lethal pollutant dispersal, floods, droughts, solar flares and other space weather events, and regional climate change;

· A Nation that maintains its international competitiveness in the atmospheric sciences;

· A more effective and efficient environment for accelerating scientific discovery, in which researchers spend more time conducting their scientific investigations and less time addressing information technology issues; and

· New knowledge, improved forecasts, and better understanding of the Earth as a complex environmental system, through increased use of atmospheric data and improved communication and information sharing among researchers and educators.


 

Issues

Short-Term Recommendations

Long-Term Recommendations

General

1. Barriers to entry: Human resources, an essential element of CI, have been significantly underemphasized

Recognize “duality” in CI – physical and human resources

Give human resources issues top priority in CI development and investment

Barriers to entry: Lack of awareness

* Support establishment of a refereed CI journal for AS.

* Include a permanent schedule of CI activities (public forum, student sessions) at the major AMS and AGU meetings.

* Support AS faculty in technical sabbaticals to IT groups

* Establish a “clearing house” organization for promotion and dissemination of CI information to AS community

* Support CI involvement of domain scientists at all levels via targeted solicitations

Barriers to entry: Uneven distribution of CI resources

Assess the types and distribution of resources across the AS in relation to AS objectives in research, education and operational meteorology

* Distribute additional resources to medium-sized projects, which have best chance of producing sustained benefits

* Seek broader distribution of resources to inform the AS community and empower scientists to select and use the tools that are most effective

Barriers to entry: Opaqueness of CI to domain scientists and students

Establish a clearinghouse for CI information (e.g., digital library)

Concentrate notable expertise at each large center in a small number of domains

Barriers to entry: Computing infrastructure is not deployed or managed such that major AS challenges can be addressed

This report includes a short list of AS problems that can be addressed with a modest, O(10^2) increase in computing capability

Develop and maintain a catalog of the largest AS challenges, including expected outcomes and broader impacts, whose large computational requirements are quantified and monitored

2. Two unique aspects of AS are its operational and real-time elements

Encourage more interaction between operations and educational and research activities in the AS

Support development of Geosciences computing environment that supports transition from research to operations

3. The emphasis on investments in software R&D has led to neglect of the full software life cycle, which also includes training, support and maintenance

Begin to develop methods for determining applicability and longevity of given software projects

Develop set of options and criteria for renewal of successful software projects that provide adequate funding to ensure usefulness to AS community

4. A wide and complex diversity of CI planning activities threatens to lead to inefficiency, duplication of effort, and retardation of progress

Continue CyRDAS-like planning activity and ensure that CI planning activities across the Geosciences continue to communicate and collaborate on recommendations and implementation plans

Coordinate with other Geosciences, remainder of NSF, and other federal agencies via cross-cutting panels with budget authority

 

 

Issues

Short-Term Recommendations

Long-Term Recommendations

Social and Cultural

1. The academic culture inhibits development/application of CI.

* Encourage NCAR to take lead in rewarding scientists who engage in CI contributions

* Encourage short-term exchanges of technologists among labs

* Establish a peer-reviewed journal for CI in the geosciences

* Encourage sabbaticals that include a major CI component

* Encourage academic reward process to recognize “published” code and code usage metrics

* Work with AMS Heads and Chairs to develop suitable procedures to reward CI work

2. Many departments are inadequately provided with computing professionals.

Survey AS departments to determine the level of computing support they have and the level they need

Support departmental level investments in CI personnel.

3. Inadequate education and training: AS education needs to be closely associated with tools and data sets that are used in research

Support workshops and collaboration opportunities to bring together AS educators, tool developers, and data providers with educational technologists, instructional designers and educational evaluators

 

* Support development of a Geosciences computing environment that delivers research-quality data sets to graduate classrooms and substantial data and analysis capabilities to K-12 classrooms

* Encourage departments to train next generation of technologists to work in AS environments

* Encourage degree programs in technical management and development in the Geosciences.

* Encourage interdisciplinary programs and degrees that provide a geosciences emphasis in computer science and engineering departments

Inadequate education and training: Programming languages and networks change faster than domain sciences.

Organize “workplace of the future workshops”

Support software development and open software practices (including verification) to allow researchers to concentrate on science rather than programming

Inadequate education and training: Students are unprepared to understand computational physics, work with complex programming environments and interpret large volumes of data.

* Encourage undergraduate courses in programming for scientists and engineers, including exposure to practical tools (e.g. debuggers)

* Work with AMS Heads and Chairs to develop undergraduate sequence in computational science (also coordinate with other disciplines through their professional societies)

* Require students to demonstrate competency in computational physics, programming and data analysis (analogous to math proficiency)

* Develop computational courses

* Encourage AS/IT courses intended for CS students

4. Culture clash between IT and applications, especially in AS – applications are not attractive for CS researchers; AS does not influence technology trends

* Encourage AS researchers to visit, teach in or work in IT groups

* Encourage connections among professional societies, e.g., AMS, ACM and IEEE

* Encourage a requirements process to bridge the gap between technical collaborators and Geoscientists

* Support small teams of software “craftspersons”

5. AS research and education are increasingly difficult because of too many data streams and sources

Deepen coordination with NOAA and NASA

Encourage development of data grids and consider virtual observatories (metadata standards, common portal design etc.) in AS and Geosciences

6. Collaboratories have not fulfilled their potential

* Consider sociological studies to determine how best to structure collaborative facilities.

* Improve awareness in AS of computer-supported collaborative work and computer-supported collaborative learning.

Support collaboratories only when success depends on collaboration


 

Issues

Short-Term Recommendations

Long-Term Recommendations

Data

1. In AS, data and metadata from diverse geography- and sub-discipline-specific sources must be universally available in a timely manner and seamlessly interoperable

Educate the community about availability of data across the Geosciences.

Support a Geosciences computing environment that enables the development, documentation and distribution of diverse data sets as widely as possible

2. Improve data utility: The value of individual data sets can be greatly enhanced when used with other data sets.

Encourage sharing existing data sets via distributed servers.

 

Support development of many interoperable data sets and catalogs of data sets

Improve data utility: Finding and using data that may be relevant in AS is very difficult, primarily due to lack of metadata standards.

* Develop Geosciences data catalogs

* Encourage development of innovative information retrieval applications for geosciences data sets

* Encourage development of standards for metadata, mass storage and data transport among large centers

* Encourage development of tools for data integration and data assimilation

Improve data utility: Data from different sources must be rigorously compared and validated.

Encourage thematic data grids on specific science problems

Encourage adoption of standards and tools for rigorous comparison of models and observations

3. Scientists must be able to publish and distribute (both formally and informally) their analysis, data mining and machine learning results, visualizations, etc. and tie them to the underlying data.

* Work with AMS to ensure that planning is cognizant of development of online journals etc.

* Ensure that planning is cognizant of intellectual property rights issues

Support development of tools and capabilities for linking publication to underlying data and entire processing stream

4. Valuable, unique data that cannot be recreated must be sustained for the long term.

Provide support for curating AS data sets in multiple locations with on-site data stewards

* Support planning for ongoing media migration, ongoing testing of back-up and recovery plans

* Support the development of strategies for curating and reviewing data archives

 


 

Issues

Short-Term Recommendations

Long-Term Recommendations

Computing Infrastructure and Capacity Building

1. AS will continue to drive the high-end computing requirements of the Nation for the foreseeable future, but it will not necessarily drive the computing infrastructure market

* Begin to forge an effective linkage between the sciences and other drivers of high-end computing and CI

* Assess common requirements among AS and other high-end technology drivers

Continuous, sustained improvements in capacity and capability computing in the US must be supported and made universally available to AS researchers

2. Distribution of computational resources – individuals, departments, campuses, national centers – is not adequate, balanced, or seamless

Survey AS practitioners in education, research and operations to determine how they do their computing today

* Invest in computing infrastructure and capacity building at all levels (centers, campuses and departments)

* Prepare for Grid computing

3. The proliferation of architectures and computing paradigms, and the related lack of effective systems-level tools (e.g., compilers), presents difficulties in tracking and using evolving technology in the AS; opinion is sharply divided on the best architecture for AS

Survey AS practitioners in education, research and operations to determine range of architectures and priorities for investment in system-level tools

* Support the development of a universal Geosciences computing environment that allows the seamless transport of codes from desktop to supercomputers and the Grid

* Conduct in-depth survey to determine how best to invest scarce resources

 


 

Issues

Short-Term Recommendations

Long-Term Recommendations

Software

1. A proliferation of specialized software tools limits productivity with steep learning curves and a fog of ignorance of what is available

Develop a catalog of available tools and capabilities

Support the development of a Geosciences computing environment that provides for universal interoperability of tools for working with data and metadata

2. Many community codes suffer from poor performance and a high error rate due to slow adoption of the “open source” model of software development

Encourage academic recognition for open development activities

Support broader adoption of open source practices in the AS, especially for community codes, to ensure optimally effective code development

3. Students do not have adequate knowledge of operating environments used in AS research and education

 

Encourage training and educational programs that support students transitioning from Windows to AS operating environments

4. Software life cycle:

* Software frameworks and community models suffer from the absence of sustained funding for training, support and maintenance

* Large investments in community models have led to neglect of model development in the academic community

Fund applications groups and IT development groups jointly to motivate users to adopt new frameworks

* Integrate data and software tools at all stages for science and decision making through development of standards and frameworks

 

* Plan investments in software development with long-term support in mind

* Invest in a wider variety of models targeted at various AS sub-disciplines

5. Steep learning curve of many tools makes them impractical in educational setting

Support development of training modules and documentation for most widely-used tools

 

Encourage partnerships with specialists in educational technology, cognitive science, and human-computer interaction

6. GIS software has the potential to offer a computational framework for integrating weather, climate, environmental, and socio-economic data

Encourage projects that seek greater levels of integration and interoperability between scientific data systems and GIS data sets and tools

Encourage participation by AS community in OpenGIS and other organizations to promote seamless integration of scientific information systems and GIS