Transactional Processes in Self-Organizing Information Systems

My research on transactional processes in self-organizing systems belongs to research field C (Distributed Information Systems) and subproject D1 of the Graduiertenkolleg Metrik.

Thesis Topics in Short

  • transactional processes/workflows in distributed information systems
  • partitioning algorithm for multiple concurrent and data-dependent processes/workflows
  • parallelizing the physical assignment of workflow partitions by running several constraint solvers
  • efficient recovery strategies for processes/workflows (unified model for atomicity and isolation)
  • selection of cost-optimal serialization graphs

People

Motivation

Recent disasters, such as the 2004 Asian tsunami, Hurricane Katrina in 2005, and the 2007 forest fires in Greece, exposed the shortcomings of existing information systems for disaster rescue and recovery. In particular, better preparation for future disasters and efficient coordination of emergency processes shortly after a disaster are crucial for successful disaster management. Systems that support collaborative work among many distributed entities, such as people, components, software services, and machines, are referred to as groupware. Most existing groupware systems, such as workflow management systems, lack support for scalability and adaptiveness. These process-aware information systems focus on well-known business processes that are executed in a static environment where resources are guaranteed to be available, communication is reliable, and potential partners are known beforehand.

However, workflows can also be applied in other, more dynamic application scenarios, such as wilderness exploration or disaster management. Our graduate school envisions a distributed information system consisting of (wirelessly) communicating sensor networks that act in two roles: (a) as an early-warning system before the disaster event and (b) as an additional information unit that provides humans, i.e. rescue forces equipped with hand-held devices, with important data shortly after a disaster event. In general, process execution has to cope with heterogeneous and distributed participants, including backend servers, sensor motes, software services, distributed databases, and humans.


Figure: Hierarchical System Architecture for Disaster Response and its Functionalities

Efficient disaster response poses new technical challenges for modern information systems. New heuristic scheduling and planning techniques are required that may yield suboptimal results but run faster than cost- and time-intensive algorithms. Existing scheduling algorithms are mostly centralized and therefore fail to compute assignments in a scalable way. Additionally, the resource capabilities of devices and the capacity of humans are limited, which calls for better heuristics for careful resource selection and assignment. Finally, the underlying network topology may change during execution: (physical) nodes, i.e. humans as well as devices, may fail and leave process activities uncompleted. Communication links may be temporarily broken, preventing the flow of information and data from machine to machine, machine to human, or human to human. Hence, new concepts are required to guarantee a robust and correct execution of process activities with respect to transactional properties.

Problem Statement

The overall goal of my dissertation is to support the robust execution of processes in such distributed, dynamically changing information systems. In our understanding, robustness is a system behavior that leads to an appropriate, self-organized reaction to unforeseen failure events. However, an appropriate reaction to a failure event is just one measure for achieving a fault-tolerant system. Preventive strategies are also required to complement the reactive strategy. In my thesis, we focus on the following building blocks:

  • Designing a suitable graph-based process and system model for disaster events
  • Designing an efficient partitioning and assignment algorithm for process activities and process data
  • Designing algorithms to perform recovery for failed process activities and process data

No existing approach, whether industrial or academic, addresses all of these research goals at once.

Suggested Solutions

To address the research challenges described in the problem statement, we focus on the following approaches:

  • Process and System Model: First, we need a suitable model for describing processes in disaster management. Workflow concepts are well suited for scenarios in which many distributed entities collaborate to achieve a common goal. Workflows are conventionally seen as a collection of activities executed in a specific temporal/causal order. In addition to temporal and causal constraints, we also incorporate complex resource allocation constraints into our process model. Since we are interested in the physical execution of workflow instances rather than in analysis/verification at the organizational level, we use graph-based representations for workflows. In our model, we distinguish between a global workflow set and so-called local workflow partitions. The former contains multiple, possibly interconnected workflows, whereas the latter determine the concurrent execution of partial workflows. The system itself is divided into several network groups (e.g. several rescue forces) to which the local partitions are assigned.
  • Distribution Issues: Given the process and system model, we are interested in a suitable distribution of workflow activities and data to workflow participants (e.g. servers, sensor motes, distributed databases, people) as part of a preventive strategy. The goal is to partition activities and data efficiently, on the one hand to extend the life span of the system, including people and devices, and on the other hand to support inter-process concurrency. Several research issues arise regarding the distribution process: How should we partition a workflow set? How finely should we partition? What is a correct decomposition? How should we allocate? What information is necessary for partitioning and allocation? To answer these questions, we propose a multi-stage procedure divided into a logical (partitioning) step and a physical (allocation) step. We first identify good partitions by considering the data flows within as well as among different workflows. Then, these partitions are assigned to the network groups, whereas the actual assignment of activities is conducted locally within each group by running constraint solvers in parallel.
    Figure: Logical Partitioning and Physical Allocation of Workflows
  • Recovery Issues: Changes can occur dynamically at the process level as well as at the system level. Focusing on the latter, our goal is to design a unified model for atomicity and isolation that is applicable to disaster management applications. We distinguish between different dynamic events that can trigger recovery and classify failure events into communication failures and node failures. For each of these events, we aim to provide reactive strategies to re-schedule activities. Re-scheduling is only possible if an alternative execution path exists. So-called retriable activities are activities that eventually succeed even in the case of a (temporary) node failure. We propose a recovery algorithm that finds suitable alternative execution paths locally within a group or, if re-scheduling within the group is not possible, among the network groups. For this purpose, we use the concept of serialization graphs that cover the (observed) concurrent execution of partial processes. From the set of possible serialization graphs, we aim to select the one that is cost-optimal with respect to resource capabilities and other criteria.
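
The two-stage distribution procedure described under Distribution Issues can be illustrated with a minimal Python sketch. It is not the algorithm developed in the thesis. As an assumption made only for illustration, the logical step groups activities that are connected by data flows at or above a volume threshold (using union-find), and the physical step places whole partitions onto network groups greedily. The greedy placement merely stands in for the partition-to-group assignment; the constraint solving inside each group is omitted. All names (partition_by_data_flow, assign_to_groups, min_volume, the activity names, the group names and their capacities) are hypothetical.

```python
from collections import defaultdict


def partition_by_data_flow(activities, data_flows, min_volume=1):
    """Logical step (illustrative heuristic, not the thesis algorithm).

    data_flows is a list of (source, target, volume) triples. Activities
    linked by a flow with volume >= min_volume land in the same partition.
    """
    parent = {activity: activity for activity in activities}

    def find(activity):
        # Walk up to the representative, halving the path on the way.
        while parent[activity] != activity:
            parent[activity] = parent[parent[activity]]
            activity = parent[activity]
        return activity

    for source, target, volume in data_flows:
        if volume >= min_volume:
            parent[find(source)] = find(target)

    members = defaultdict(list)
    for activity in activities:
        members[find(activity)].append(activity)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in members.values())


def assign_to_groups(partitions, capacities):
    """Physical step (greedy stand-in): place the largest partition first,
    always on the network group with the most remaining capacity.

    capacities maps a group name to the number of activities it can host.
    """
    remaining = dict(capacities)
    placement = {}
    for partition in sorted(partitions, key=len, reverse=True):
        # Ties are broken by group name to keep the result deterministic.
        group = min(remaining, key=lambda name: (-remaining[name], name))
        if remaining[group] < len(partition):
            raise ValueError(f"no group can host partition {partition}")
        placement[tuple(partition)] = group
        remaining[group] -= len(partition)
    return placement


# Hypothetical example: two workflows (w1, w2) with one weak link between them.
activities = ["w1.detect", "w1.alert", "w2.locate", "w2.dispatch", "w2.report"]
data_flows = [
    ("w1.detect", "w1.alert", 5),
    ("w1.alert", "w2.dispatch", 1),
    ("w2.locate", "w2.dispatch", 4),
    ("w2.dispatch", "w2.report", 3),
]

partitions = partition_by_data_flow(activities, data_flows, min_volume=2)
placement = assign_to_groups(partitions, {"fire_brigade": 3, "medics": 2})
for partition, group in placement.items():
    print(group, "<-", ", ".join(partition))
```

With min_volume=2, the weak flow between the two workflows is ignored, so each workflow ends up in its own partition. Lowering the threshold to 1 merges all five activities into a single partition.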

Related Work

Publications

  • Artin Avanes, Johann-Christoph Freytag: Adaptive Workflow Scheduling Under Resource Allocation Constraints and Network Dynamics.
    Proc. of the 34th International Conference on Very Large Data Bases (VLDB), pages 1631-1637, Auckland, New Zealand, August 2008.
  • Artin Avanes: An Adaptive Process and Data Infrastructure for Disaster Management.
    The 5th International Conference on Information Systems for Crisis Response and Management (ISCRAM), PhD Colloquium, Washington, D.C., USA, May 2008.
  • Artin Avanes, Johann-Christoph Freytag, Christof Bornhövd: Distributed Service Deployment in Mobile Ad-Hoc Networks.
    Proc. of the 4th IEEE International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous 2007), pages 1-8, Philadelphia, USA, August 2007.
  • Christof Bornhövd, Holger Ziekow, Artin Avanes: Service Composition and Deployment for a Smart Items Infrastructure.
    Proc. of the 14th International Conference on Cooperative Information Systems (CoopIS), pages 10-11, Montpellier, France, November 2006.
  • Timo Mika Gläßer, Markus Scheidgen, Artin Avanes: Self-Organizing Information Systems for Disaster Management.
    3. GI/ITG KuVS Fachgespräch "Ortsbezogene Anwendungen und Dienste", Berlin, Germany, September 2006.