Second Provenance Challenge Template
Participating Team
Differences from First Challenge
Note here any changes in your provenance representation, workflow enactment or system since the first challenge. Alternatively, if you did not participate in the first challenge, please provide the same details as were required for those who did (particularly workflow representation and provenance representation).
Karma has a provenance collection part, in the form of provenance activities collected from workflow executions, and a provenance dissemination part, in the form of views generated from those activities. In the First Challenge, we exposed views of the collected provenance, such as Workflow Trace and Data Provenance. The views alone were not sufficient to answer all queries; the actual provenance activities are also necessary. Hence, in addition to the provenance views, we are also submitting the provenance activities we collect, which give a complete description of the workflow run and contain sufficient information to answer all queries.
Provenance Data for Workflow Parts
Give links here to your provenance data files for the workflow parts of the challenge: three parts for the original workflow and three parts for the modified workflow (as per provenance query 7). The data files could be attached to the results page.
Workflow Model
You can view the documents in the XBaya Workflow Composer [a Java Web Start application].
Provenance Activities (Provenance Data to Import)
Each activity is an individual XML document. For convenience, the activities for each workflow have been concatenated into a single XML document under a single root element.
(For an example of a provenance view that is generated in Karma from these activities, see Stage I WorkflowTrace, Stage II WorkflowTrace, Stage III WorkflowTrace, and Alternate Stage III WorkflowTrace. These views are provided as samples only and are not expected to be used for importing.)
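Because the activities are concatenated under one root element, an importer can recover the individual documents by walking the root's direct children. The following is a minimal sketch of that step using Python's standard library. The root and child element names in the sample are placeholders chosen for illustration, not the actual Karma schema, and real documents may be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names are placeholders, not the Karma schema.
SAMPLE = """<activities>
  <serviceInvoked><description>request received</description></serviceInvoked>
  <dataConsumed><description>input read</description></dataConsumed>
  <sendingResult><description>response sent</description></sendingResult>
</activities>"""

def split_activities(xml_text):
    """Return each direct child of the root as a standalone XML string."""
    root = ET.fromstring(xml_text)
    # strip() drops the whitespace that followed each child in the parent.
    return [ET.tostring(child, encoding="unicode").strip() for child in root]

activities = split_activities(SAMPLE)
# Each recovered string parses on its own as a separate document.
activity_types = [ET.fromstring(doc).tag for doc in activities]
print(activity_types)
```

Each string in `activities` can then be handed to whatever importer processes one activity at a time.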
Provenance Activity Description & Utilities
Here is a description of the key provenance activities that can help with importing the data. The activity types described here encapsulate all information required to answer the Challenge queries. Other activity types present in the dump above are useful for more complex querying and monitoring requirements.
In our model, we make a distinction between an abstract service and a service instance. Workflows are composed by connecting abstract services, while instances of services are used during workflow invocation through late binding. Service instances are identified using a globally unique 'serviceID'. Workflows are also considered a type of service. Hence, you can compose an abstract workflow out of abstract services and out of other abstract workflows. Workflow instances also have a globally unique 'serviceID' that identifies the workflow instance. Data products are identified by a globally unique ID, and they optionally have a replica URL associated with them when they appear in activities.
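The distinction between abstract services and instances can be sketched as a small data structure. This is only an illustration of the model described above: the class names, the `parts` field, the `bind` helper, and the use of UUIDs as globally unique IDs are all assumptions made for the example, not part of Karma.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class AbstractService:
    """An abstract service; an abstract workflow is one with nested parts."""
    name: str
    parts: tuple = ()  # abstract services or abstract workflows it connects

@dataclass(frozen=True)
class ServiceInstance:
    """A concrete instance, identified by a globally unique serviceID."""
    abstract: AbstractService
    service_id: str

def bind(abstract):
    """Late binding: pick an instance at invocation time (hypothetical helper)."""
    return ServiceInstance(abstract, "urn:uuid:" + str(uuid.uuid4()))

align = AbstractService("align_warp")
reslice = AbstractService("reslice")
inner = AbstractService("inner_workflow", parts=(reslice,))
outer = AbstractService("outer_workflow", parts=(align, inner))

# Two bindings of the same abstract service yield distinct instances.
first = bind(align)
second = bind(align)
workflow_instance = bind(outer)
```

Treating a workflow as just another `AbstractService` is what allows `outer` to contain `inner` without a separate type.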
Service instances are usually invoked in the context of a workflow. An invocation is identified by four parts:
- the 'serviceID' of the service being invoked,
- the 'workflowID' of the (parent) workflow instance in whose context the invocation takes place (its value equals the serviceID of the workflow),
- the 'workflowTimestep', which gives a logical time for the service invocation in the workflow lifecycle, and
- the 'workflowNodeID', which uniquely identifies a node in the workflow graph (the same abstract service used multiple times in an abstract workflow will have different workflowNodeIDs).
All activities have these four IDs set as the notification source for that activity. They also optionally carry the ID of the client that invokes the service and receives the response. All activities have a timestamp field that gives the time at which they were generated, a human-readable description, and an optional XML 'annotation' field for extensions.
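When importing, the four IDs can be read from the notification source and used as a key that ties together all activities of one invocation. The sketch below assumes the IDs appear as attributes named 'serviceID', 'workflowID', 'workflowTimestep', and 'workflowNodeID' on an element called `notificationSource`; that element name, the lack of namespaces, and all ID values are hypothetical and should be checked against the actual activity documents.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

class InvocationKey(NamedTuple):
    service_id: Optional[str]
    workflow_id: Optional[str]
    workflow_timestep: Optional[str]
    workflow_node_id: Optional[str]

# Hypothetical activity documents; element names and IDs are placeholders.
INVOKED = """<serviceInvoked>
  <notificationSource serviceID="urn:example:service:align_warp"
                      workflowID="urn:example:workflow:challenge"
                      workflowTimestep="3"
                      workflowNodeID="align_warp_1"/>
</serviceInvoked>"""

INITIALIZED = """<serviceInitialized>
  <notificationSource serviceID="urn:example:service:align_warp"/>
</serviceInitialized>"""

def invocation_key(xml_text):
    """Read the four IDs from the notification source; missing ones are None."""
    source = ET.fromstring(xml_text).find("notificationSource")
    return InvocationKey(
        source.get("serviceID"),
        source.get("workflowID"),
        source.get("workflowTimestep"),
        source.get("workflowNodeID"),
    )

invoked_key = invocation_key(INVOKED)
initialized_key = invocation_key(INITIALIZED)
```

A tuple key like this can be used directly as a dictionary key to group activities by invocation.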
Some key activities in a workflow's lifecycle (in order of generation) are:
- Service/Workflow Initialized : These activities are generated at the start of a workflow instance or a service instance. They are generated only if a new instance is created, so they may not be published for all workflow runs if a compatible service/workflow instance was already available. Because an instance can be reused by multiple workflows, these activities are not generated in the context of a service/workflow invocation, and the notification source for this activity has only the serviceID field set.
- Service/Workflow Invoked : An invocation consists of a request message and an optional response message, depending on the message exchange pattern. For the challenge workflows, both are present. The Service/Workflow Invoked activity is produced when the request message of the invocation is received by the service instance. In addition to the default fields, it contains optional request header and request body elements that hold the SOAP (or equivalent) request Header and Body.
- Data Consumed : Describes a data product that is used by this invocation. Provides better guarantees than looking for data IDs in SOAP messages.
- Data Produced : Describes a data product that is generated by this invocation.
- Sending Result : This is produced when the response message of the invocation is sent by the service instance to the receiver (client). In addition to the default fields, it contains optional result header and result body elements that hold the SOAP (or equivalent) response Header and Body.
- Service/Workflow Terminated : Describes a service or workflow instance that is shutting down.
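One way to use the Data Consumed and Data Produced activities listed above is to group them by invocation and derive which data products each output depends on. The following is a minimal sketch under stated assumptions: the activities have already been flattened into tuples of (activity type, invocation key, data ID), and all type names and IDs shown are hypothetical. It is not how Karma itself builds its views.

```python
from collections import defaultdict

# Hypothetical flattened activities: (type, invocation key, data ID or None).
KEY_A = ("svc:A", "wf:1", "1", "node1")
KEY_B = ("svc:B", "wf:1", "2", "node2")
ACTIVITIES = [
    ("serviceInvoked", KEY_A, None),
    ("dataConsumed", KEY_A, "data:in1"),
    ("dataConsumed", KEY_A, "data:in2"),
    ("dataProduced", KEY_A, "data:mid"),
    ("sendingResult", KEY_A, None),
    ("serviceInvoked", KEY_B, None),
    ("dataConsumed", KEY_B, "data:mid"),
    ("dataProduced", KEY_B, "data:out"),
    ("sendingResult", KEY_B, None),
]

def derived_from(activities):
    """Map each produced data ID to the data IDs its invocation consumed."""
    consumed = defaultdict(list)
    produced = defaultdict(list)
    for kind, key, data_id in activities:
        if kind == "dataConsumed":
            consumed[key].append(data_id)
        elif kind == "dataProduced":
            produced[key].append(data_id)
    return {out: list(consumed[key])
            for key, outs in produced.items() for out in outs}

def ancestors(data_id, deps):
    """Collect every data product that data_id transitively depends on."""
    seen = []
    stack = list(deps.get(data_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(deps.get(current, []))
    return seen

deps = derived_from(ACTIVITIES)
lineage = sorted(ancestors("data:out", deps))
```

Here `deps` gives the direct dependencies per output, and `ancestors` follows them back through earlier invocations.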
Karma libraries are available for reading the XML activities as Apache XML Bean objects in Java.
Model Integration Results
State here which combinations of teams' models you have managed to perform the provenance query over
Translation Details
Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous
Benchmarks
Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system
Further Comments
Provide here further comments.
Conclusions
Provide here your conclusions on the challenge, and issues that you would like to see discussed at a face-to-face meeting.
-- YogeshSimmhan - 22 Feb 2007