Simulated domain-specific provenance

Pinar Alper; Elliot Fairweather; Vasa Curcin

doi:10.1007/978-3-319-98379-0_6

Pinar Alper, Elliot Fairweather*, Vasa Curcin

*Corresponding author for this work

University of Luxembourg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

The main driver for provenance adoption is the need to collect and understand knowledge about the processes and data that occur in some environment. Before analytical and storage tools can be designed to address this challenge, exemplar data is required both to prototype the analytical techniques and to design infrastructure solutions. Previous attempts to address this requirement have tried to use existing applications as a source, either by collecting data from provenance-enabled applications or by building tools that can extract provenance from the logs of other applications. However, provenance sourced this way can be one-sided, exhibiting only certain patterns, or can exhibit correlations or trends present only at the time of collection, and so may be of limited use in other contexts. A better approach is to use a simulator that conforms to explicitly specified domain constraints and generates provenance data synthetically, replicating the patterns, rules and trends present within the target domain; we describe such a constraint-based simulator here. At the heart of our approach are templates: abstract, reusable provenance patterns within a domain that may be instantiated by concrete substitutions. Domain constraints are configurable and solved using a Constraint Satisfaction Problem solver to produce viable substitutions. Workflows are represented by sequences of templates using probabilistic automata. The simulator is fully integrated within our template-based provenance server architecture, and we illustrate its use in the context of a clinical trials software infrastructure.
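
The three ingredients named in the abstract (templates instantiated by substitutions, constraints that filter those substitutions, and a probabilistic automaton that orders the templates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`TEMPLATE_VARS`, `CONSTRAINTS`, `WORKFLOW`, the clinical-trial variables and the transition probabilities) is a hypothetical assumption made for illustration. Brute-force enumeration stands in for a real Constraint Satisfaction Problem solver, and a single variable set is reused for every workflow step for brevity.

```python
import itertools
import random

# Hypothetical template variables, each with a finite candidate domain.
TEMPLATE_VARS = {
    "patient_age": range(18, 91),
    "arm": ["treatment", "control"],
    "dose_mg": [0, 50, 100],
}

# Hypothetical domain constraints; each takes a candidate substitution.
CONSTRAINTS = [
    # Only the control arm receives a zero dose, and it always does.
    lambda s: (s["dose_mg"] == 0) == (s["arm"] == "control"),
    # The highest dose is restricted to patients under 65.
    lambda s: s["dose_mg"] <= 50 or s["patient_age"] < 65,
]


def viable_substitutions(variables, constraints):
    """Enumerate substitutions that satisfy every constraint.

    Exhaustive search is used here as a stand-in for a CSP solver.
    """
    names = list(variables)
    for values in itertools.product(*(variables[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(check(candidate) for check in constraints):
            yield candidate


# Hypothetical probabilistic automaton: state -> [(next_state, probability)].
WORKFLOW = {
    "enrol": [("randomise", 1.0)],
    "randomise": [("visit", 1.0)],
    "visit": [("visit", 0.6), ("complete", 0.4)],
    "complete": [],
}


def sample_workflow(automaton, start, rng, max_steps=20):
    """Random walk over the automaton; returns the visited template names."""
    path, state = [start], start
    while automaton[state] and len(path) < max_steps:
        successors, weights = zip(*automaton[state])
        state = rng.choices(successors, weights=weights, k=1)[0]
        path.append(state)
    return path


def simulate(rng):
    """Pair each sampled workflow step with one viable substitution."""
    solutions = list(viable_substitutions(TEMPLATE_VARS, CONSTRAINTS))
    steps = sample_workflow(WORKFLOW, "enrol", rng)
    return [(step, rng.choice(solutions)) for step in steps]
```

Calling `simulate(random.Random(0))` yields a list of `(template_name, substitution)` pairs. Passing a seeded `random.Random` keeps a synthetic trace reproducible, which matters when the generated data is used to prototype analysis tools.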

Original language	English
Title of host publication	Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings
Editors	Khalid Belhajjame, Ashish Gehani, Pinar Alper
Publisher	Springer Verlag
Pages	71-83
Number of pages	13
ISBN (Print)	9783319983783
DOIs	https://doi.org/10.1007/978-3-319-98379-0_6
Publication status	Published - 1 Jan 2018
Event	7th International Provenance and Annotation Workshop, IPAW 2018 - London, United Kingdom
Duration	9 Jul 2018 → 10 Jul 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11017 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	7th International Provenance and Annotation Workshop, IPAW 2018
Country/Territory	United Kingdom
City	London
Period	9/07/2018 → 10/07/2018

Access to Document

10.1007/978-3-319-98379-0_6

Cite this

Alper, P., Fairweather, E., & Curcin, V. (2018). Simulated domain-specific provenance. In K. Belhajjame, A. Gehani, & P. Alper (Eds.), Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings (pp. 71-83). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11017 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-98379-0_6

Alper, Pinar ; Fairweather, Elliot ; Curcin, Vasa. / Simulated domain-specific provenance. Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings. editor / Khalid Belhajjame ; Ashish Gehani ; Pinar Alper. Springer Verlag, 2018. pp. 71-83 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{88635364175e4d9cb8255233bf39c490,

title = "Simulated domain-specific provenance",

abstract = "The main driver for provenance adoption is the need to collect and understand knowledge about the processes and data that occur in some environment. Before analytical and storage tools can be designed to address this challenge, exemplar data is required both to prototype the analytical techniques and to design infrastructure solutions. Previous attempts to address this requirement have tried to use existing applications as a source; either by collecting data from provenance-enabled applications or by building tools that can extract provenance from the logs of other applications. However, provenance sourced this way can be one-sided, exhibiting only certain patterns, or exhibit correlations or trends present only at the time of collection, and so may be of limited use in other contexts. A better approach is to use a simulator that conforms to explicitly specified domain constraints, and generate provenance data synthetically, replicating the patterns, rules and trends present within the target domain; we describe such a constraint-based simulator here. At the heart of our approach are templates - abstract, reusable provenance patterns within a domain that may be instantiated by concrete substitutions. Domain constraints are configurable and solved using a Constraint Satisfaction Problem solver to produce viable substitutions. Workflows are represented by sequences of templates using probabilistic automata. The simulator is fully integrated within our template-based provenance server architecture, and we illustrate its use in the context of a clinical trials software infrastructure.",

author = "Pinar Alper and Elliot Fairweather and Vasa Curcin",

year = "2018",

month = jan,

day = "1",

doi = "10.1007/978-3-319-98379-0_6",

language = "English",

isbn = "9783319983783",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "71--83",

editor = "Khalid Belhajjame and Ashish Gehani and Pinar Alper",

booktitle = "Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings",

address = "Germany",

note = "7th International Provenance and Annotation Workshop, IPAW 2018 ; Conference date: 09-07-2018 Through 10-07-2018",

}

Alper, P, Fairweather, E & Curcin, V 2018, Simulated domain-specific provenance. in K Belhajjame, A Gehani & P Alper (eds), Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11017 LNCS, Springer Verlag, pp. 71-83, 7th International Provenance and Annotation Workshop, IPAW 2018, London, United Kingdom, 9/07/2018. https://doi.org/10.1007/978-3-319-98379-0_6

Simulated domain-specific provenance. / Alper, Pinar; Fairweather, Elliot; Curcin, Vasa.
Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings. ed. / Khalid Belhajjame; Ashish Gehani; Pinar Alper. Springer Verlag, 2018. p. 71-83 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11017 LNCS).

TY - GEN

T1 - Simulated domain-specific provenance

AU - Alper, Pinar

AU - Fairweather, Elliot

AU - Curcin, Vasa

PY - 2018/1/1

Y1 - 2018/1/1

AB - The main driver for provenance adoption is the need to collect and understand knowledge about the processes and data that occur in some environment. Before analytical and storage tools can be designed to address this challenge, exemplar data is required both to prototype the analytical techniques and to design infrastructure solutions. Previous attempts to address this requirement have tried to use existing applications as a source; either by collecting data from provenance-enabled applications or by building tools that can extract provenance from the logs of other applications. However, provenance sourced this way can be one-sided, exhibiting only certain patterns, or exhibit correlations or trends present only at the time of collection, and so may be of limited use in other contexts. A better approach is to use a simulator that conforms to explicitly specified domain constraints, and generate provenance data synthetically, replicating the patterns, rules and trends present within the target domain; we describe such a constraint-based simulator here. At the heart of our approach are templates - abstract, reusable provenance patterns within a domain that may be instantiated by concrete substitutions. Domain constraints are configurable and solved using a Constraint Satisfaction Problem solver to produce viable substitutions. Workflows are represented by sequences of templates using probabilistic automata. The simulator is fully integrated within our template-based provenance server architecture, and we illustrate its use in the context of a clinical trials software infrastructure.

UR - http://www.scopus.com/inward/record.url?scp=85053858459&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-98379-0_6

DO - 10.1007/978-3-319-98379-0_6

M3 - Conference contribution

AN - SCOPUS:85053858459

SN - 9783319983783

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 71

EP - 83

BT - Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings

A2 - Belhajjame, Khalid

A2 - Gehani, Ashish

A2 - Alper, Pinar

PB - Springer Verlag

T2 - 7th International Provenance and Annotation Workshop, IPAW 2018

Y2 - 9 July 2018 through 10 July 2018

ER -

Alper P, Fairweather E, Curcin V. Simulated domain-specific provenance. In Belhajjame K, Gehani A, Alper P, editors, Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings. Springer Verlag. 2018. p. 71-83. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-98379-0_6
