Stimulating Reproducible Software Artifacts
Proceedings of the Third International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2020)
Concerns about the reproducibility of software-based science are ubiquitous throughout the research community, as nearly every field of science depends on experiments that use computational artifacts. Recently, federally funded research has been mandated to release data artifacts, but not software and workflow artifacts. When software artifacts are accessible, they are typically hard to install and use, and experiment workflows are poorly documented or ad hoc. This increases the difficulty of replicating results and reusing computational artifacts, slowing down scientific progress. Several software systems have emerged to address this challenge. In this paper, we describe an approach to evaluate these software systems and to determine how well they meet the needs of replicating and reusing software, data, and workflow artifacts. As an example of the approach, we evaluate our own Occam system.
Long-term Preservation of Repeatable Builds in Occam
CANOPIE-HPC, SC'19
To provide transparency, wide availability, and easier reuse of scholarly software, greater emphasis must be placed on code preservation: not just mirroring the source code, but preserving the ability to build it. Occam is a tool that offers preservation and distribution of software, using containerization to provide repeatable execution when both building and running it. This paper details the design of Occam and its potential use within the scholarly community and beyond.
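As a rough sketch of the underlying idea (not Occam's actual interface or packaging format), a repeatable build can be expressed as running a fixed build command inside a pinned container environment; the image, paths, and build command below are hypothetical:

```python
# Illustrative sketch only: running a build inside a pinned container image so
# that the build environment itself is preserved and repeatable. The image,
# paths, and build command are hypothetical; Occam's actual packaging and
# execution model is richer than this.
import subprocess

# In practice the environment would be pinned by an immutable digest rather
# than a mutable tag, so the same bytes are used for every rebuild.
BUILD_IMAGE = "gcc:12"

def build_in_container(source_dir: str, output_dir: str) -> None:
    """Run a fixed build command inside the pinned image, mounting the source
    tree read-only and collecting artifacts in a separate output directory."""
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{source_dir}:/src:ro",
            "-v", f"{output_dir}:/out",
            BUILD_IMAGE,
            "sh", "-c",
            # Copy the read-only sources into a writable directory, build, and
            # export the (hypothetical) resulting artifact.
            "cp -r /src /tmp/build && cd /tmp/build && make && cp ./artifact /out/",
        ],
        check=True,
    )

build_in_container("./experiment-code", "./artifacts")
```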
Supporting Thorough Artifact Evaluation with Occam
ResCuE-HPC 2018, SC'18
Efforts such as Artifact Evaluation (AE) have been growing, gradually making software evaluation an integral part of scientific publication. In this paper, we describe how Occam can help mitigate some of the challenges faced by both authors and reviewers. For authors, Occam provides the means to package their artifacts with enough detail that they can be used in experiments that are easily repeated. For reviewers, Occam provides the means to thoroughly evaluate artifacts: it allows them to repeat the authors' experiments, and it lets them modify inputs, parameters, and software to run different experiments.
Software Provenance: Track the Reality not the Virtual Machine
Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018)
The growing use of computers and massive storage by individuals is driving interest in digital preservation. The scientific method demands accountability through digital reproducibility, adding another strong motivation for preservation. However, data alone can become obsolete if the interactive software required to interpret it is lost. Virtual machines (VMs) may preserve that interactivity, but they do so at the cost of obscuring what lies within. Occam, instead, builds VMs on the fly from well-described software packages that it stores and distributes. The system can therefore track the exact components inside each VM without storing the machines themselves, allowing software to be repeatably built and executed. To recreate a VM, Occam needs to know exactly what software it contained; through this tracking, that software can even be modified and rebuilt. Occam records all such components in manifests, so anybody can know exactly what is in each VM and where each component came from.
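To make the idea of component tracking concrete, the following is a hypothetical sketch of the kind of information such a manifest could record; the field names, versions, URLs, and checksums are illustrative only and do not reflect Occam's actual manifest schema.

```python
# Hypothetical sketch of a per-VM component manifest: what was built, from
# which sources, and what it depended on. All values are illustrative.
manifest = {
    "environment": "x86_64-linux",
    "components": [
        {
            "name": "sst-core",                  # simulator built inside the VM
            "version": "10.0",
            "source": {
                "url": "https://example.org/sst-core-10.0.tar.gz",  # hypothetical origin
                "sha256": "<checksum of the exact sources used>",
            },
            "build": ["./configure", "make", "make install"],
            "depends": ["gcc-9", "python-3.8", "openmpi-4.0"],
        },
        {
            "name": "gcc",
            "version": "9",
            "source": {
                "url": "https://example.org/gcc-9.tar.xz",
                "sha256": "<checksum>",
            },
        },
    ],
}
```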
Supporting Long-term Reproducible Software Execution
Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018)
A recent, widespread realization that software experiments are not as easily replicated as once believed has brought software execution preservation into the scientific spotlight. As a result, scientists, institutions, and funding agencies have been pushing for the development of methodologies and tools that preserve software artifacts. Despite these efforts, long-term reproducibility still eludes us. In this paper, we present the requirements for software execution preservation and discuss how to improve long-term reproducibility in science. In particular, we discuss why preserving binaries and pre-built execution environments is not enough, and why preserving the ability to replicate results is not the same as preserving software for reproducible science. Finally, we show how these requirements are supported by Occam, an open curation framework that fully preserves software and its dependencies from source to execution, promoting transparency, longevity, and reuse. Specifically, Occam can automatically deploy workflows in a fully functional environment that not only runs them, but also makes them easily replicable.
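The contrast with preserving pre-built binaries can be illustrated with a toy sketch (again, not Occam's actual object model): every artifact is reconstructed from preserved sources by walking its dependency closure in order, rather than by restoring an opaque binary whose provenance is unknown. The package names and build commands below are hypothetical.

```python
# Toy sketch: rebuild an artifact from preserved sources by walking its
# dependency closure. The registry contents and build commands are
# hypothetical; Occam's real builders and metadata are far richer.
from typing import Dict, List, Set

# Each preserved "package" records what it depends on and how to build it.
REGISTRY: Dict[str, dict] = {
    "simulator": {"depends": ["toolchain"], "build": "make simulator"},
    "toolchain": {"depends": [], "build": "make toolchain"},
}

def build(name: str, built: Set[str]) -> List[str]:
    """Return build commands in dependency order, building each package once."""
    commands: List[str] = []
    if name in built:
        return commands
    for dep in REGISTRY[name]["depends"]:
        commands.extend(build(dep, built))
    commands.append(REGISTRY[name]["build"])
    built.add(name)
    return commands

print(build("simulator", set()))   # ['make toolchain', 'make simulator']
```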
Evaluating Interactive Archives
Science Gateways 2017
The concept of reproducibility has been the keystone of both ancient and modern scientific methods. In spite of this, digital science has recently been taken to task for its failing record of repeatable experimentation. A plethora of digital archives have appeared in response, yet the community has not defined the end goal: there is no means of comparing or evaluating digital archives, nor the quality of the software they preserve, and thus no way of knowing whether these tools move us toward that goal. We provide a metric for evaluating software sustainability and use it to define a metric for evaluating and comparing interactive software archives.
Active Curation of Artifacts is Changing the Way Digital Libraries will Operate
4th Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016)
Open Curation and Repeatability for Scientific Artifact Evaluation
Science Gateways 2017
Supporting Long-term Reproducible Software Execution
Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018)
Artifact Execution Curation for Repeatability in Artifact Evaluation
Private Tutorial: OCCAM / SST 2020
University of Pittsburgh
October 14, 2020
Laboratory for Physical Sciences (LPS) Tutorial 2018
University of Maryland, BC
July 31, 2018
Laboratory for Physical Sciences (LPS) Tutorial, 2017
University of Maryland, BC
July 18, 2017
Solving and Sharing the Puzzle: Modeling and Simulation of Computer Architectures with SST and OCCAM
44th International Symposium on Computer Architecture (ISCA, 2017)
June 24, 2017
Solving and Sharing the Puzzle: Modeling and Simulation of Computer Architectures with SST and OCCAM
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16)
November 14, 2016
Laboratory for Physical Sciences (LPS) Tutorial 2016
University of Pittsburgh
June 23, 2016
Open Curation for Computer Architecture Modeling
42nd International Symposium on Computer Architecture (ISCA, 2015)
June 14, 2015