Warning: Parameter 1 to Language::getMagic() expected to be a reference, value given in /usr/share/mediawiki/includes/StubObject.php on line 58
Scheduling Processes in Distributed and Resource-Constrained Systems – GRK-Wiki

Scheduling Processes in Distributed and Resource-Constrained Systems

Aus GRK-Wiki

Wechseln zu: Navigation, Suche

My research activities about Scheduling Processes In Distributed and Resource-Constrained Systems are related to the research field C (Distributed Information Systems) and subproject D1 of the Graduiertenkolleg Metrik.

Inhaltsverzeichnis

Thesis Topics in Short

  • processes and data in distributed information systems
  • resource allocation constraints
  • partitioning algorithm for multiple concurrent and data-dependent process graphs
  • parallelizing physical assignment of workflow partitions by running several constraint systems
  • hypergraph decomposition
  • flexible model for cooperative processes (unified model for atomicity and isolation)
  • efficient, resource-oriented re-scheduling protocol

People

Motivation

Modern information systems will be increasingly built on top of decentralized, highly dynamic and wireless networks, such as sensor networks or hand-held devices equipped with sensing and computational abilities. This fact has become even more evident with the growth of resource capacities of embedded devices allowing the execution of cooperative processes. Because of this development, recent research has been focused on building process-supporting systems to enable new kinds of applications, such as realtime business monitoring, efficient disaster recovery or wilderness exploration. However, the integration of resource-constrained devices poses new technical challenges for ensuring a robust process execution. This thesis presents a general-purpose framework for scheduling and executing cooperative processes in an efficient and failure-tolerant fashion.

New heuristic scheduling and planning techniques are required that are rather suboptimal but faster than cost and time-intensive algorithms. Existing scheduling algorithms are mostly centralized thus failing in calculating a scalable assignment solution. Additionally, resources capabilities of devices as well as the humans capacity are limited to a certain degree requiring better heuristics for careful resource selection and assignment respectively. Finally, the underlying network topology may change during the execution time; (physical) nodes, i.e. humans as well as devices may fail leaving process activities uncompleted. Communication links may be temporary broken preventing the information/data flow from machines to machines, machines to humans or humans to humans. Hence, new concepts are required to guarantee a robust and correct execution of process activities with respect to transactional properties.

Problem Statement

The overall goal of my dissertation thesis is to support a robust execution of processes in such distributed, dynamic changing information system. In our understanding, robustness is defined as a system behavior that leads to a appropriate, self-organized reaction to certain unforeseen failure events. However, an appropriate reaction on a failure event is just one measure to achieve a fault-tolerant system. Preventive strategies are also required as complement to a reactive strategy to achieve robustness. In the thesis, we focus on following building blocks:

  • Designing a suitable graph-based process and system model
  • Designing an efficient partitioning and assigning algorithm for process activities and process data
  • Analyzing the complexity of the distributed/parallel scheduling algorithm based on hypergraphs
  • Designing algorithms to perform a resource-oriented and transactional re-scheduling of failed/conflicting process activities

Considering the mentioned research goals we are focusing on, none of the existing approaches, neither industrial nor academic, have addressed these challenges in an integrated and complete fashion.

Suggested Solutions

To address the research challenges described in the problem statement, we focus on following approaches:

  • Process and System Model: First, we need a suitable model that can be used to describe processes and the underlying execution system. We model a process as a directed, acyclic graph and present the execution system as a strong connected graph which consists of multiple components. Subsequently, we define the scheduling problem as a mapping of a process graph to a strong connected system graph where each component represents a cluster. In addition to the temporal/causality constraints expressed by control and dataflow edges in the process graph, we also incorporate complex resource allocation constraints in our process model. In our model, we distinguish between a global process set and local process partitions. The former contains multiple, possibly interconnected process instances whereas the latter determines the concurrent execution of of partial processes, i.e. it executes a subset of activities given in the process set. The system itself is divided into several network clusters which the local partitions are assigned to.
Parallel Process Scheduling as a Graph Mapping Problem
  • Distribution Issues: Given the process and system model, we are interested in a scalable and efficient distribution of process activities to distributed resources (e.g. server,sensor motes, distributed databases, people) as part of a preventive strategy. Several research issues arise regarding the distribution process: How should we partition a process set? How much should we partition? What is a correct decomposition? How should we allocate? What are the necessary information for partitioning and allocation? To answer these questions, we propose a multi-stage procedure divided into a logical partitioning step and a parallel physical allocation step. We first identify good partitions by considering the given data flows within as well as among different process graphs. Then, these partitions are assigned to each network clluster whereas the actual assignment of activities is conducted locally within each cluster by using constraint-solving techniques in parallel fashion.


Process Partitioning Using Data Links


  • Recovery Issues: Changes can dynamically appear on the process (instance) level as well as on the system level. Focusing on physical node failures, that may appear at any time, the goal is to develop a re-scheduling algorithm that considers transactional properties, limited resource capabilities as well as dynamic topology changes in an integrated fashion. The majority of re-scheduling or failure handling mechansim just considers a subset of these three properties. So, in summary, the re-scheduling protocol should be (1) flexibile - the process designer should be able to determine how far re-scheduling will be conducted, (2) resource-oriented - limited resource capabilities should be considered at run-time, (3) efficient - the distributed scheduling component should be leveraged to achieve a scalable and fast re-deployment.


Re-Scheduling Algorithm - Overview

Related Work

Publications

  • Artin Avanes, Johann-Christoph Freytag: Flexible Failure Handling for Cooperative Processes in Distributed Systems.
    Proc. of the 5th IEEE Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Washington, D.C., USA, November 2009
  • Artin Avanes, Johann-Christoph Freytag: Adaptive Workflow Scheduling Under Resource Allocation Constraints and Network Dynamics.
    Proc. of the 34th International Conference on Very Large Data Bases (VLDB),pages 1631-1637, Auckland, New Zealand, August 2008
  • Artin Avanes: An Adaptive Process and Data Infrastructure for Disaster Management.
    The 5th International Conference on Information Systems for Crisis Response and Management (ISCRAM), Washington,D.C. USA, May 2008
  • Artin Avanes, Johann-Christoph Freytag, Christof Bornhoevd: Distributed Service Deployment in Mobile Ad-Hoc Networks.
    Proc. of the 4th IEEE International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous 2007), pages 1-8, Philadelphia, USA, August 2007.
  • Christof Bornhövd, Holger Ziekow, Artin Avanes: Service Composition and Deployment for a Smart Items Infrastructure.
    Proc. of the 14th International Conference on Cooperative Information Systems (CooPIS), pages 10-11, Montpellier, France, November 2006.
  • Timo Mika Gläßer, Markus Scheidgen, Artin Avanes: Self-Organizing Information Systems for Disaster Management.
    3. GI/ITG KuVS Fachgespräch "Ortsbezogene Anwendungen und Dienste", Berlin, Germany, September 2006
Persönliche Werkzeuge
Sprache