CyRDAS Recommendations
The convergence of computation, information management, networking and intelligent sensing is poised to transform the conduct of science. This emerging cyberinfrastructure (CI) has the potential to provide powerful new tools, research methodologies, and processes that will enable scientists to investigate the atmosphere in new and previously unimaginable ways. To ensure rapid progress in the atmospheric sciences, it is essential that the technology that enables frontier research and effective education continues to be developed, deployed and maintained. Given the computing-intensive and data-intensive nature of the atmospheric sciences, an important component of the enabling technology is the CI that undergirds computation and data management and analysis. Development, deployment and evolution of the most effective CI must be driven by science priorities, and the science community must be engaged and remain involved in the planning process.
The planning process for how best to implement a CI that serves the research and
educational needs of the atmospheric sciences can be viewed from four interrelated
perspectives: human resources, data, computing infrastructure and capacity building,
and software life cycle. Of these, human resources is the most important for the
development and widespread use of CI within the atmospheric sciences, for improving
diversity, and for improving access to research and educational tools and capabilities
by traditionally underserved populations.
The table below summarizes the issues associated with each perspective and lists
recommendations for addressing them. In addition to the specific issues and
recommendations listed for each theme area, there are several issues that cut
across these areas and stand out as problems that must be addressed in any CI
initiative.
Barriers to entry: There are real barriers to effective development and use of CI in the atmospheric
sciences. Among the barriers are the following:
Operational and real-time needs: A unique aspect of
atmospheric sciences is the close relationship among research, education, and
operational meteorology – pollution alerts, weather prediction, climate
prediction, space weather alerts, etc. The CI that is developed for atmospheric
sciences should, at all stages of design and implementation, support this relationship,
whenever practical. This has implications for the design of networks (e.g.,
high-speed connections between research laboratories and operational
forecasting centers), data distribution (e.g., real-time meteorological
observations in the classroom), computing infrastructure (e.g., on-demand
computing), and management and access to data sets and community models.
Software life cycle: The high concentration of
research resources on innovative development has led to neglect of funding for
the full software life cycle, including hardening and deployment. Further, there is a critical need for funding
for training, support, and maintenance of the underlying infrastructure.
Without viable options for supporting software throughout its useful lifetime,
investments in software infrastructure and tools will be lost.
Coordination: There are already many diverse CI planning and implementation efforts within
different divisions of the Geosciences, different directorates of the NSF and
among the various federal agencies with sizable portfolios in research and
development. These activities should be coordinated, whenever possible, to
optimize the return on investment and to minimize the duplication of effort.
The atmospheric sciences research agenda is an
ambitious one, whose individual elements each require a significant enhancement
of and substantial investment in the national computing and data
infrastructure. The questions that have been articulated in several venues and
reports include the following representative examples:
· Weather: What is the limit of predictability of the
atmosphere and how does it change as a function of location, spatial scale or
temporal scale? For example, to what extent can tornadoes be predicted and what
will it take to make skillful predictions with lead times that can reduce the
associated loss of human life and property?
· Climate: What are the global, regional and local effects of
climate variability and change on the world’s production of food and use of
energy, on local ecosystems, and on the spread of disease? What will it take to
build integrated Earth system models that include the treatment of atmospheric
chemistry, dynamical vegetation and aquatic and terrestrial biogeochemical
cycles deemed necessary to realize the predictability of the Earth’s climate
and make reliable estimates of regional climate change?
· Air Quality: What emission control strategies can
effectively reduce the health and ecosystem impacts associated with air
pollution? How can we optimally combine measurements with models to provide
operational forecasts of air quality and chemical weather for use in emergency
response and public health protection? How can we better quantify, and reduce
the uncertainties regarding, the role of aerosols in the atmosphere, their
specific connection to health effects, and their role in cloud and climate
systems?
· Space Physics and Space Weather: Why does solar activity
vary in a regular 11-year cycle? How does the Earth's global space environment
respond to solar variations? What is the
probability that specific types of space weather phenomena will occur over
periods from hours to days?
Addressing these questions,
as well as other challenging scientific problems that cut across discipline
boundaries within the atmospheric sciences and across the geosciences as a
whole, will require substantial new research that will push the limits of the
observing systems and the computing and data infrastructure that is currently
in place.
In particular, these questions will require:
· massive ensembles of cloud-resolving and
eddy-resolving weather, climate, and space physics models;
· advanced data assimilation systems capable of
accurately analyzing immense volumes of observational data from radar and
satellites;
· observing and analysis systems capable of delivering
accurate and adequately-resolved measurements of atmospheric quantities from
the Sun to the Earth’s surface;
· data acquisition, data management, data integration,
data analysis, data mining, and data visualization tools capable of advancing
scientific understanding and forecasting that is carried out in a distributed
environment, that can support on-demand and real-time activities, and that
involves both small and very large data streams;
· configurable workflow orchestration and
web services capable of coupling observations, modeling, and data handling,
analysis, publishing and visualization through grid-enabled facilities.
Providing these capabilities requires computing
infrastructure, data systems, software and, importantly, human resources that
go far beyond what is currently available. Substantial investments in CI will
be needed to develop, enhance and sustain the systems that can meet these
challenges.
In order to address these pressing issues, it is
recommended that the NSF change the way in which it invests in CI for the
geosciences in the following ways:
While these recommendations are very broad, they are
deemed necessary to accelerate progress in the atmospheric sciences. To further
articulate the nature of changes that are recommended, the table below provides
more details on both the issues and the short-term (2-5 years) and long-term
(5-10 years) recommendations, organized according to the perspectives described
above.
If the recommendations for potentially very large
investments by the NSF in CI, as summarized above and detailed below, are
adopted, we foresee that the Nation in general, and the atmospheric sciences in
particular, will gain both scientifically and more broadly in several ways. The
potential outcomes include:
· A diverse, informed, educated and well-connected
workforce that is better able to meet the emerging complex scientific
challenges, which inherently lie at the boundaries among disciplines, to
determine and quantify the impacts of natural variations in the environment on
society, economic interests and ecosystems;
· A Nation that is more resilient to disruption by
intentional or unintentional influences and better equipped to assist other
nations through accelerated progress on the utilization of forecast information
to reduce the loss of life and property and the adverse effects on ecosystems
associated with severe weather, hurricanes, toxic or lethal pollutant
dispersal, floods, droughts, solar flares and other space weather events, and
regional climate change;
· A Nation that maintains its international
competitiveness in the atmospheric sciences;
· A more effective and efficient environment for
accelerating scientific discovery, in which researchers spend more time
conducting their scientific investigations and less time addressing information
technology issues; and
· New knowledge, improved forecasts, and better understanding
of the Earth as a complex environmental system, through increased use of
atmospheric data and improved communication and information sharing among
researchers and educators.

| Issues | Short-Term Recommendations | Long-Term Recommendations |
|---|---|---|
| General | | |
| 1. Barriers to entry: Human resources, an essential element of CI, have been significantly underemphasized | Recognize “duality” in CI – physical and human resources | Give human resources issues top priority in CI development and investment |
| Barriers to entry: Lack of awareness | * Support establishment of a refereed CI journal for AS. * Include a permanent schedule of CI activities (public forum, student sessions) at the major AMS and AGU meetings. * Support AS faculty in technical sabbaticals to IT groups | * Establish a “clearinghouse” organization for promotion and dissemination of CI information to AS community * Support CI involvement of domain scientists at all levels via targeted solicitations |
| Barriers to entry: Uneven distribution of CI resources | Assess the types and distribution of resources across the AS in relation to AS objectives in research, education and operational meteorology | * Distribute additional resources to medium-sized projects, which have the best chance of producing sustained benefits * Seek broader distribution of resources to inform the AS community and empower scientists to select and use the tools that are most effective |
| Barriers to entry: Opaqueness of CI to domain scientists and students | Establish a clearinghouse for CI information (e.g., digital library) | Concentrate notable expertise at each large center in a small number of domains |
| Barriers to entry: Computing infrastructure is not deployed or managed such that major AS challenges can be addressed | This report includes a short list of AS problems that can be addressed with a modest, O(10^2) increase in computing capability | Develop and maintain a catalog of the largest AS challenges, including expected outcomes and broader impacts, whose large computational requirements are quantified and monitored |
| 2. Two unique aspects of AS are its operational and real-time elements | Encourage more interaction between operations and educational and research activities in the AS | Support development of a Geosciences computing environment that supports transition from research to operations |
| 3. The emphasis on investments in software R&D has led to neglect of the full software life cycle that also includes training, support and maintenance | Begin to develop methods for determining applicability and longevity of given software projects | Develop a set of options and criteria for renewal of successful software projects that provide adequate funding to ensure usefulness to AS community |
| 4. A wide and complex diversity of CI planning activities threatens to lead to inefficiency, duplication of effort, and retardation of progress | Continue CyRDAS-like planning activity and ensure that CI planning activities across the Geosciences continue to communicate and collaborate on recommendations and implementation plans | Coordinate with other Geosciences, remainder of NSF, and other federal agencies via cross-cutting panels with budget authority |

| Issues | Short-Term Recommendations | Long-Term Recommendations |
|---|---|---|
| Social and Cultural | | |
| 1. The academic culture inhibits development/application of CI. | * Encourage NCAR to take the lead in rewarding scientists who engage in CI contributions * Encourage short-term exchanges of technologists among labs * Establish a peer-reviewed journal for CI in the geosciences | * Encourage sabbaticals that include a major CI component * Encourage academic reward process to recognize “published” code and code usage metrics * Work with AMS Heads and Chairs to develop suitable procedures to reward CI work |
| 2. Many departments are inadequately provided with computing professionals. | Survey AS departments to determine the level of computing support they have and the level they need | Support departmental level investments in CI personnel. |
| 3. Inadequate education and training: AS education needs to be closely associated with tools and data sets that are used in research | Support workshops and collaboration opportunities to bring together AS educators, tool developers, and data providers with educational technologists, instructional designers and educational evaluators | * Support development of a Geosciences computing environment that delivers research-quality data sets to graduate classrooms and substantial data and analysis capabilities to K-12 classrooms * Encourage departments to train next generation of technologists to work in AS environments * Encourage degree programs in technical management and development in the Geosciences. * Encourage interdisciplinary programs and degrees that provide a geosciences emphasis in computer science and engineering departments |
| Inadequate education and training: Programming languages and networks change faster than domain sciences. | Organize “workplace of the future” workshops | Support software development and open software practices (including verification) to allow researchers to concentrate on science rather than programming |
| Inadequate education and training: Students are unprepared to understand computational physics, work with complex programming environments and interpret large volumes of data. | * Encourage undergraduate courses in programming for scientists and engineers, including exposure to practical tools (e.g. debuggers) * Work with AMS Heads and Chairs to develop undergraduate sequence in computational science (also coordinate with other disciplines through their professional societies) | * Require students to demonstrate competency in computational physics, programming and data analysis (analogous to math proficiency) * Develop computational courses * Encourage AS/IT courses intended for CS students |
| 4. Culture clash between IT and applications, especially in AS – applications are not attractive for CS researchers; AS does not influence technology trends | * Encourage AS researchers to visit, teach in or work in IT groups * Encourage connections among professional societies, e.g., AMS, ACM and IEEE | * Encourage a requirements process to bridge the gap between technical collaborators and Geoscientists * Support small teams of software “craftspersons” |
| 5. AS research and education are increasingly difficult with too many data streams and sources | Deepen coordination with NOAA and NASA | Encourage development of data grids and consider virtual observatories (metadata standards, common portal design etc.) in AS and Geosciences |
| 6. Collaboratories have not fulfilled their potential | * Consider sociological studies to determine how best to structure collaborative facilities. * Improve awareness in AS of computer-supported collaborative work and computer-supported collaborative learning. | Support collaboratories only when success depends on collaboration |

| Issues | Short-Term Recommendations | Long-Term Recommendations |
|---|---|---|
| Data | | |
| 1. In AS, data and metadata from diverse geography- and sub-discipline-specific sources must be universally available in a timely manner and seamlessly interoperable | Educate the community about availability of data across the Geosciences. | Support a Geosciences computing environment that enables the development, documentation and distribution of diverse data sets as widely as possible |
| 2. Improve data utility: The value of individual data sets can be greatly enhanced when used with other data sets. | Encourage sharing existing data sets via distributed servers. | Support development of many interoperable data sets and catalogs of data sets |
| Improve data utility: Finding and using data that may be relevant in AS is very difficult, primarily due to lack of metadata standards. | * Develop Geosciences data catalogs * Encourage development of innovative information retrieval applications for geosciences data sets | * Encourage development of standards for metadata, mass storage and data transport among large centers * Encourage development of tools for data integration and data assimilation |
| Improve data utility: Data from different sources must be rigorously compared and validated. | Encourage thematic data grids on specific science problems | Encourage adoption of standards and tools for rigorous comparison of models and observations |
| 3. Scientists must be able to publish and distribute (both formally and informally) their analysis, data mining and machine learning results, visualizations, etc. and tie them to the underlying data. | * Work with AMS to ensure that planning is cognizant of development of online journals etc. * Ensure that planning is cognizant of intellectual property rights issues | Support development of tools and capabilities for linking publication to underlying data and entire processing stream |
| 4. Valuable, unique data that cannot be recreated must be sustained for the long term. | Provide support for curating AS data sets in multiple locations with on-site data stewards | * Support planning for ongoing media migration, ongoing testing of back-up and recovery plans * Support the development of strategies for curating and reviewing data archives |

| Issues | Short-Term Recommendations | Long-Term Recommendations |
|---|---|---|
| Computing Infrastructure and Capacity Building | | |
| 1. AS will continue to drive the high-end computing requirements of the Nation for the foreseeable future, but it will not necessarily drive the computing infrastructure market | * Begin to forge an effective linkage between the sciences and other drivers of high-end computing and CI * Assess common requirements among AS and other high-end technology drivers | Continuous, sustained improvements in capacity and capability computing in the US must be supported and made universally available to AS researchers |
| 2. Distribution of computational resources – individuals, departments, campuses, national centers – is not adequate, balanced, or seamless | Survey AS practitioners in education, research and operations to determine how they do their computing today | * Invest in computing infrastructure and capacity building at all levels (centers, campuses and departments) * Prepare for Grid computing |
| 3. The proliferation of architectures and computing paradigms and the related lack of effective systems-level tools, e.g., compilers, present difficulties in tracking and using evolving technology in the AS; opinion is sharply divided on the best architecture for AS | Survey AS practitioners in education, research and operations to determine range of architectures and priorities for investment in system-level tools | * Support the development of a universal Geosciences computing environment that allows the seamless transport of codes from desktop to supercomputers and the Grid * Conduct in-depth survey to determine how best to invest scarce resources |

| Issues | Short-Term Recommendations | Long-Term Recommendations |
|---|---|---|
| Software | | |
| 1. A proliferation of specialized software tools limits productivity with steep learning curves and a fog of ignorance of what is available | Develop a catalog of available tools and capabilities | Support the development of a Geosciences computing environment that provides for universal interoperability of tools for working with data and metadata |
| 2. Many community codes suffer from poor performance and a high error rate due to slow adoption of the “open source” model of software development | Encourage academic recognition for open development activities | Support broader adoption of open source practices in the AS, especially for community codes, to ensure optimally effective code development |
| 3. Students do not have adequate knowledge of operating environments used in AS research and education | | Encourage training and educational programs that support students transitioning from Windows to AS operating environments |
| 4. Software life cycle: * Software frameworks and community models suffer from the absence of sustained funding for training, support and maintenance * Large investments in community models have led to neglect of model development in the academic community | Fund applications groups and IT development groups jointly to motivate users to adopt new frameworks | * Integrate data and software tools at all stages for science and decision making through development of standards and frameworks * Plan investments in software development with long-term support in mind * Invest in a wider variety of models targeted at various AS sub-disciplines |
| 5. Steep learning curve of many tools makes them impractical in educational settings | Support development of training modules and documentation for the most widely used tools | Encourage partnerships with specialists in educational technology, cognitive science, and human-computer interaction |
| 6. GIS software has the potential to offer a computational framework for integrating weather, climate, environmental, and socio-economic data | Encourage projects that seek greater levels of integration and interoperability between scientific data systems and GIS data sets and tools | Encourage participation by AS community in OpenGIS and other organizations to promote seamless integration of scientific information systems and GIS |