Self-aware Adaptive Service Networks with Dependability Guarantees

Advisor: Prof. Dr. Miroslaw Malek

Abstract

Over the last decades, metropolitan areas have grown considerably, and this trend is expected to continue in the medium term. As centers of societal life, they provide critical infrastructure for a rising number of people. Disasters striking these areas pose a significant risk to the development and growth of modern societies. The impact of any disaster, be it of natural or terrorist origin, would be severe. Disaster prevention and preparedness are therefore important factors when planning projects in metropolitan areas. When a disaster does strike, fast and safe mitigation of damage is important. Information and communication technology (ICT) plays a crucial role in supporting reconnaissance and first response teams on disaster sites.

Every team brings its own network equipment and uses several IT services. The participating teams and the course of mitigation action are likely to differ from site to site. However, several services (e.g. infrastructure, communication) could be shared between teams, and some of them are critical for operation. Although unique in every scenario, the described service networks exhibit certain common properties. First, they arise spontaneously, with administrative configuration only for single teams. Second, depending on the course of action, the numbers of nodes and services are subject to high fluctuation. Third, the capabilities of participating nodes and the types of services they offer also fluctuate strongly throughout the lifetime of the service network.

A single network for all participating nodes, with the ability to publish and discover services, would be of great benefit in the described disaster management scenarios. Because of the properties mentioned above, it would need to configure all involved network layers automatically. Not only is there no distinguished central administrative authority; it is also impossible to configure in advance what arises ad hoc. But given an ad hoc, autoconfiguring service network, how can we guarantee dependability properties such as availability, performability and survivability for critical services? So far, no comprehensive evaluation exists for such a heterogeneous and dynamic service network. The approach of this dissertation is to embed a run-time cycle into the service network. In this cycle, the network is monitored and, based on the monitoring data, dependability properties are evaluated. If necessary, adaptation measures are triggered. This cycle is the basis of a self-aware adaptive service network.
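The cycle of monitoring, evaluation and adaptation can be illustrated with a minimal Python sketch. It is not the dissertation's method: the names `Observation`, `evaluate_availability` and `run_cycle` are hypothetical, and availability is assumed here to be estimated simply as the fraction of successful monitoring probes per service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Observation:
    service: str     # hypothetical service identifier
    reachable: bool  # outcome of one monitoring probe


def evaluate_availability(history: List[Observation]) -> Dict[str, float]:
    """Estimate availability per service as the fraction of successful probes."""
    ok: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for obs in history:
        total[obs.service] = total.get(obs.service, 0) + 1
        ok[obs.service] = ok.get(obs.service, 0) + (1 if obs.reachable else 0)
    return {service: ok[service] / total[service] for service in total}


def run_cycle(
    history: List[Observation],
    required: Dict[str, float],
    adapt: Callable[[str], None],
) -> List[str]:
    """One pass of the cycle: evaluate monitoring data, adapt where needed."""
    estimates = evaluate_availability(history)
    # A service violates its requirement if the estimate is below the threshold.
    violated = sorted(
        service
        for service, availability in estimates.items()
        if availability < required.get(service, 0.0)
    )
    for service in violated:
        adapt(service)  # assumption: adaptation is a caller-supplied callback
    return violated
```

In a real network the observations would come from the distributed monitoring layer and `adapt` would select a concrete measure; both are left abstract here.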

A central idea is to integrate a distributed monitoring layer into the network. Service discovery mechanisms inherently provide a basic form of availability monitoring, so it seems sensible to employ them to monitor availability, the most basic service dependability property. But to provide reliable results for service evaluation, the monitoring inside the network must itself give quality of service guarantees. In the last decade, a number of promising technologies (e.g. Zeroconf, UPnP) have emerged that provide distributed autoconfiguring service environments based on IP networks. These technologies have mostly proven themselves in scenarios where dependability requirements are lax and not clearly defined. To benefit from their distributed autoconfiguration capabilities in disaster management scenarios, and especially for monitoring purposes, it needs to be evaluated whether they can provide these capabilities and with which guarantees. If they cannot, what improvements are needed to make reliable distributed availability monitoring possible? Once these steps are taken, the existing service discovery mechanisms can be extended to monitor arbitrary parameters. There is always a trade-off between monitoring reliability and the efficiency of the network protocols used. This trade-off needs to be evaluated separately for each concrete expected scenario.
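How discovery announcements can double as availability monitoring might be sketched as follows. This is an illustration under stated assumptions, not the API of Zeroconf or UPnP: the class `AnnouncementMonitor` and its time-to-live parameter `ttl_seconds` are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class AnnouncementMonitor:
    """Treats a service as available while its last announcement is fresh."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.last_seen: Dict[str, float] = {}

    def announce(self, service: str, now: float) -> None:
        # Record the time of the most recent discovery announcement.
        self.last_seen[service] = now

    def is_available(self, service: str, now: float) -> bool:
        # Unknown services and services with expired announcements are down.
        seen = self.last_seen.get(service)
        return seen is not None and (now - seen) <= self.ttl

    def available_services(self, now: float) -> List[str]:
        return sorted(s for s in self.last_seen if self.is_available(s, now))
```

The time-to-live makes the trade-off concrete: a shorter value detects failures sooner but requires more frequent announcements and therefore more network traffic.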

Once the service network is monitored, dependability evaluation at run time becomes possible. The problem remains, however, that it is as yet unclear how exactly distributed service-based applications can be evaluated at run time with regard to availability, performability or survivability. Methods for this evaluation need to be developed. Depending on the evaluated service and dependability property, selecting the significant monitoring variables is important to improve accuracy. Based on the gathered monitoring data, fault diagnosis and failure prediction could be realized. Even if there were no way to automatically adapt the system to adverse changes, as described in the next step, run-time awareness of critical states would already be a huge benefit. Depending on the outcome of the evaluation phase, adaptation measures could be triggered. These include measures at the service level, such as retrying a service instance, rebinding to another instance or recomposing the application with different services. Other proactive fault management strategies are possible, such as service placement to improve reliability or graceful degradation of an application. Which measures are feasible depends on the cost and utility functions of the current scenario.
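Two of the service-level measures, retrying an instance and rebinding to another one, can be sketched in a few lines of Python. The function `call_with_adaptation` is hypothetical, service instances are modelled as plain callables, and a failure is assumed to surface as a `ConnectionError`; recomposition and cost or utility functions are out of scope for this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple


def call_with_adaptation(
    instances: Sequence[Callable[[], str]],
    retries_per_instance: int = 2,
) -> Tuple[Optional[str], int]:
    """Retry each instance a few times, then rebind to the next one.

    Returns the result and the index of the instance that served it,
    or (None, -1) if every instance failed.
    """
    for index, instance in enumerate(instances):
        for _ in range(retries_per_instance):
            try:
                return instance(), index
            except ConnectionError:
                continue  # retry the same instance
        # All retries exhausted: fall through and rebind to the next instance.
    return None, -1
```

Returning `(None, -1)` rather than raising keeps the failure visible to the caller, which matches the point that run-time awareness of a critical state is useful even when no adaptation succeeds.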

Using this cycle of monitoring, evaluation and adaptation, it will be possible to provide service networks with guaranteed dependability properties. Those properties are monitored at run time and guaranteed through adaptation. If in certain scenarios it is impossible to give guarantees that satisfy the requirements, at least this will be known at run time. After the various problems described above have been addressed, the described approach will be demonstrated both with theoretical models and in simulation.

