TASK 5: Extend the VisTrails infrastructure to provenance-enable the Uintah problem-solving environment, so that it captures provenance information, tracks simulation parameters, and supports publishing and sharing of simulation results and associated data products.
Exploration of large-scale scientific systems using computational simulations produces massive amounts of data that must be managed and analyzed. Because of the volume of data manipulated and the complexity of the simulations and analysis workflows, it is crucial to maintain detailed provenance (i.e., an audit trail) of the derived results. Provenance is necessary to ensure reproducibility and to enable verification and validation of the simulation codes and results.
To manage large-scale simulations and the analysis of their results, we will build upon and substantially extend VisTrails (http://www.vistrails.org), an open-source provenance management and scientific workflow system. A distinguishing feature of VisTrails is its comprehensive provenance infrastructure, which maintains detailed information about the steps followed and the data derived in the course of an exploratory task.
In this project, our focus will be on scalability and knowledge sharing. VisTrails was originally designed as a data exploration tool for single-user environments, where computations are performed on a desktop. To support large-scale simulations and the manipulation of large volumes of data, we will extend the system in two directions: we will investigate mechanisms for supporting pipeline execution in distributed computing environments within the Uintah framework, and we will design mechanisms for visualizing the simulation results on a 98-million-pixel display wall.
To facilitate knowledge sharing, we will build a system that adopts the model used by social Web sites to provide our scientists with a rich collaborative environment. This system will allow scientists to share not only their data but also the specifications of their analyses and visualizations, together with the associated provenance. Shared information lets project members benefit from the collective wisdom of the group. By querying analysis specifications that make sophisticated use of tools, along with data products and their provenance, users can learn by example from the reasoning and analysis strategies of experts, expedite their scientific training in disciplinary and inter-disciplinary settings, and potentially reduce the time lag between data acquisition and scientific insight.