

Synchronization and Symmetry Breaking in Distributed Systems

Diss. ETH No. 19459 Diss. TIK No. 121

Hartung-Gorre Verlag Konstanz 2011

Reprint of Diss. ETH No. 19459

Series in Distributed Computing Volume 14 edited by Roger Wattenhofer

Examiner: Roger Wattenhofer
Co-examiner: Danny Dolev
Co-examiner: Berthold Vöcking

Bibliographic information published by Die Deutsche Nationalbibliothek: Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at http://dnb.d-nb.de.

Copyright © 2011 by Christoph Lenzen First Edition 2011 Hartung-Gorre Verlag Konstanz

ISSN 1861-1591 ISBN 3-86628-390-3 and 978-3-86628-390-9

Abstract

An emerging characteristic of modern computer systems is that, increasingly often, the amount of communication involved in solving a given problem is the determining cost factor. In other words, the convenient abstraction of a random access memory machine performing sequential operations no longer adequately reflects reality. Rather, a multitude of spatially separated agents cooperates in solving a problem, where at any time each individual agent has only a limited view of the entire system's state. As a result, coordinating these agents' efforts in a way that makes the best possible use of the system's resources becomes a fascinating and challenging task. This dissertation treats several such coordination problems arising in distributed systems.

In the clock synchronization problem, devices carry clocks whose times should agree to the best possible degree. As these clocks are not perfect, the devices need to resynchronize perpetually by exchanging messages. We consider two different varieties of this problem. First, we examine the problem in sensor networks, where for the purpose of energy conservation it is mandatory to reduce communication to a minimum. We give an algorithm that achieves an asymptotically optimal maximal clock difference throughout the network using a minimal number of transmissions. Subsequently, we explore a worst-case model allowing for arbitrary network dynamics, i.e., network links may fail and (re)appear at arbitrary times. For this model, we devise an algorithm achieving an asymptotically optimal gradient property: if two devices in a larger network have access to precise estimates of each other's clock values, their clock difference is much smaller than the maximal one. Naturally, this property can only hold for devices that have had such estimates for a sufficiently long period of time. We prove that the time span our algorithm needs to fully establish the gradient property once better estimates become available is also asymptotically optimal.

Many load balancing tasks can be abstracted as distributing n balls as evenly as possible into n bins. In a distributed setting, we assume the balls and bins to act as independent entities that seek to coordinate at a minimal communication complexity. We show that under this constraint, a natural class of algorithms requires a small, but non-constant, number of communication rounds to achieve a constant maximum bin load. We complement the respective bounds by demonstrating that if any of the preconditions of the lower bound is dropped, a constant-time solution is possible.

Finally, we consider two basic combinatorial structures, maximal independent sets and dominating sets. A maximal independent set is a subset of the agents containing no pair of agents that can communicate directly, such that no agent can be added to the set without destroying this property. A dominating set is a subset of the agents that—as a whole—can contact all agents by direct communication. For several families of graphs, we shed new light on the distributed complexity of computing dominating sets of approximately minimal size or maximal independent sets, respectively.

Zusammenfassung

Moderne Computersysteme zeichnen sich in zunehmendem Maße dadurch aus, dass das Kommunikationsvolumen den bestimmenden Kostenfaktor bei der Lösung eines gegebenen Problems darstellt. In der Folge wird die klassische Abstraktion einer random access machine, die sequentielle Operationen ausführt, der Realität heutigen Rechnens nicht mehr gerecht. Vielmehr wird die Lösung durch eine Vielzahl interagierender Systemkomponenten bestimmt, die für sich genommen zu keiner Zeit Zugriff auf den Gesamtzustand des Systems haben. Vor diesem Hintergrund erweist es sich als ebenso fordernde wie fesselnde Aufgabe, die einzelnen Teile des Systems derart zu koordinieren, dass eine optimale Nutzung der verfügbaren Ressourcen erreicht wird. In dieser Dissertation behandeln wir verschiedene Koordinationsprobleme, die in verteilten Systemen auftreten.

Uhrensynchronisation ist eine Aufgabe, die sich in verteilten Systemen daraus ergibt, dass die lokalen Uhren einzelner Komponenten nicht exakt gleich schnell laufen. Wir behandeln zwei Spielarten dieses Themas. Zunächst untersuchen wir Sensornetzwerke, in denen begrenzte Energiereserven es erfordern, den Funkverkehr auf ein Minimum zu beschränken. Wir beschreiben einen Algorithmus, der unter diesen Bedingungen die maximale Uhrendifferenz im System asymptotisch minimiert. Anschliessend diskutieren wir ein Worst-Case-Modell, in dem das Netzwerk sich beliebig ändert, das heißt Verbindungen zu beliebigen Zeiten ausfallen und aufgebaut werden können. Wir präsentieren einen Algorithmus mit optimaler Gradienteneigenschaft. Dies bedeutet, dass wann immer zwei Teilnehmer in einem grösseren Netzwerk für genügend lange Zeit gegenseitig auf zuverlässige Schätzwerte ihrer Uhrenwerte zugreifen können, die Differenz ihrer Uhrenwerte deutlich kleiner als die maximale im System ist. Unser Algorithmus minimiert asymptotisch die Zeitspanne, die eine Verbindung existieren muss, bis sie der Gradienteneigenschaft genügt.

In vielen Fällen können Lastverteilungsaufgaben durch ein abstraktes Modell beschrieben werden, in dem n Bälle n Urnen zugeordnet werden. In einem verteilten System nimmt man dabei an, dass sowohl Bälle als auch Urnen eigenständig operieren. Ziel ist, bei minimaler Kommunikation die maximale Anzahl Bälle in einer Urne konstant zu beschränken. Wir werden zeigen, dass für eine natürliche Klasse von Algorithmen die dafür nötige Anzahl von Kommunikationsrunden zwar langsam wachsend, jedoch nicht unabhängig von n ist. Wir ergänzen dieses Ergebnis durch den Nachweis, dass das Fallenlassen einer beliebigen Voraussetzung der entsprechenden unteren Schranke eine Lösung des Problems in konstant vielen Kommunikationsrunden ermöglicht.

Schliesslich untersuchen wir zwei grundlegende kombinatorische Strukturen. Eine maximale stabile Menge ist eine nicht vergrösserbare Teilmenge der Komponenten, so dass kein Paar aus dieser Menge direkt kommunizieren kann. Eine dominierende Menge ist eine Teilmenge der Komponenten, die zusammengenommen das gesamte System direkt kontaktieren kann. Wir zeigen für verschiedene Graphfamilien Komplexitätsschranken für die Berechnung von maximalen stabilen Mengen beziehungsweise kleinen dominierenden Mengen.

Acknowledgements

First of all, I would like to express my gratitude to my advisor Roger Wattenhofer. Not only did he guide me during the three years of my graduate studies, he also always tried to understand me even when I clearly was not making any sense. Thanks go to my co-examiners Berthold Vöcking and Danny Dolev for reviewing this thesis and giving valuable feedback. I am indebted to my co-authors Fabian Kuhn, Thomas Locher, Rotem Oshman, Yvonne-Anne Pignolet, Philipp Sommer, and Jukka Suomela, who contributed in many ways to this work. In addition, Fabian, Thomas, Yvonne-Anne, and Philipp carefully proofread large parts of this thesis and suggested numerous improvements. I am grateful to all current and former collaborators from the Distributed Computing Group for making my stay a pleasant and joyful experience. In particular, I take my hat off to my office mates Raphael Eidenbenz and Tobias Langner (and for reasons named elsewhere also to Yvonne-Anne Pignolet), who created a good working atmosphere while enduring my unconditional monologues. For varying reasons, special thanks go to Keren Censor, Sebastian Daum, Michael Kuhn, Tobias Langner, Topi Musto, Christian Scheideler, Ulrich Schmid, Reto Spöhel, and Jukka Suomela. Finally, I am beholden to all the people who are not named here explicitly, but supported me and this work during my time as a Ph.D. student.

Contents

1 Introduction

2 Preliminaries
2.1 Basic Model and Notation
2.2 Standard Definitions and Tools

I Clock Synchronization

3 Introduction to Clock Synchronization

4 Synchronization in Wireless Networks
4.1 Model
4.2 Overview
4.3 Lower Bound
4.4 PulseSync
4.5 Analysis
4.6 Concluding Remarks

5 Gradient Clock Synchronization
5.1 Model
5.2 Overview
5.3 Bounding the Global Skew
5.4 An Algorithm with Optimal Gradient Property
5.5 Analysis of Algorithm Aµ
5.6 Discussion

II Load Balancing

6 Introduction to Load Balancing
6.1 Model
6.2 Related Work

7 Lower Bound on Symmetric Algorithms
7.1 Definitions
7.2 Proof of the Lower Bound

8 Balls-into-Bins Algorithms
8.1 Optimal Symmetric Algorithm
8.2 Optimal Asymmetric Algorithm
8.3 Symmetric Solution Using ω(n) Messages
8.4 An Application

III Graph Problems in Restricted Families of Graphs

9 Introduction to Graph Problems
9.1 Model
9.2 Overview

10 MIS on Trees
10.1 Algorithm
10.2 Analysis

11 An MDS Approximation Lower Bound
11.1 Definitions and Preliminary Statements
11.2 Proof of the Lower Bound

12 MDS in Graphs of Bounded Arboricity
12.1 Constant-Factor Approximation
12.2 Uniform Deterministic Algorithm

13 MDS in Planar Graphs
13.1 Algorithm
13.2 Analysis

14 Conclusions

Chapter 1

Introduction

“Can’t you use shorter sentences?”
– My girlfriend, after reading some random lines from this thesis.

In large parts, the invention of electronic computing has shaped—and still is shaping—our modern society. Traditionally, distributed computing contributed to this process in areas like fault-tolerant computing, sensor networks, and the Internet, and it has incessantly grown in importance for day-to-day technology. For one thing, computational power has become incredibly cheap. Today, even the most simple “mobile phone” is in fact a portable computer, faster than the processors employed in supercomputers about three decades ago [20]. Arguably, advances in software development and testing, programming languages, and, last but not least, basic algorithms had an even greater impact on the capabilities of current standard devices. Considering that these devices get more and more interconnected, be it via the Internet or direct wireless communication, all ingredients of a powerful distributed system are present. This opens the door to a multitude of applications, ranging from social networking and exchanging and evaluating data to environmental monitoring and controlling other devices. It is less noticeable, but maybe even more dramatic, that we are hitting physical barriers that prevent amassing ever more sequential computing power in a single processor. It becomes increasingly difficult to miniaturize chip components further, making it harder and harder to maintain the illusion of a monolithic system operating in synchronous steps. This motivates hardware vendors, eager to maintain Moore’s law, to switch to an exponentially growing number of cores. It is important to understand that this constitutes


a fundamental change. In a sequential system, the effort necessary to solve a given task can be concisely expressed in terms of the number of required computational steps. Even in a distributed system where all nodes (i.e., participants) have the same capabilities, one cannot simply divide this measure by the number of nodes to understand the distributed complexity of a problem. Some tasks are inherently sequential and simply cannot be parallelized. Where parallelism is indeed possible, communication becomes an essential part of the process. This communication serves to exchange intermediate results, but also to establish coordination among the nodes in the network.

Coordination can be achieved in various ways. Obviously, one distinguished node could manage the system by commanding the others. However, this trivial approach has considerable drawbacks. On the one hand, it requires collecting and processing all relevant information in a single spot. This might needlessly reduce the amount of concurrency achieved, as merely one node does the respective computations; sometimes it is outright impossible, because no single device has sufficient capabilities. On the other hand, a centralized authority is a single point of failure, throwing away the possibility of completing a given task despite a minority of the individual components failing. One possible alternative is that each node collects information from other nodes which are “close” in the sense that they can be contacted quickly, and acts according to a scheme avoiding conflicting actions. Depending on the task to solve, such local algorithms can be surprisingly efficient. For many of these algorithms, it is imperative to first break symmetry in order to avoid conflicting or redundant actions, which otherwise would thwart progress or waste resources. Moreover, nodes typically need to synchronize their actions. This can be done explicitly by message exchange, or implicitly by means of timing information.

In this thesis, we investigate a number of such basic distributed coordination tasks. We present and analyze primitives for clock synchronization (Part I), randomized load balancing (Part II), and graph problems on restricted families of graphs (Part III). Our main goal is to extend the knowledge on the fundamental limits to the degree of concurrency and efficiency at which these problems can be solved. We aim at a mathematically rigorous assessment of the distributed complexity of our algorithms, as well as of the amount of resources that must be used by any algorithm for the respective task. This demands abstract models, which nevertheless must capture the crucial properties of the considered systems. We hope to have succeeded in the tightrope walk between oversimplifying and getting lost in details, obtaining clear theoretical statements that are meaningful in practice.

Chapter 2

Preliminaries

“Distributed computing? Shouldn’t be that different from ordinary computing, right?” – Synopsis of my knowledge on distributed computing at the time when I began my graduate studies.

In this chapter, we summarize basic notation and some well-known results we will rely on. We will not prove the given statements; the goal of this chapter is to provide a reference in order to avoid lack of clarity in subsequent chapters. Consequently, the reader is encouraged to quickly review the notation, skip the lemmas and theorems, and come back to this chapter if required later on.

2.1 Basic Model and Notation

By $\mathbb{N}$ we denote the set of natural numbers and by $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$ the natural numbers together with 0. Similarly, $\mathbb{R}$ denotes the reals, $\mathbb{R}^+ := \{x \in \mathbb{R} \mid x > 0\}$ the strictly positive reals, and $\mathbb{R}^+_0 := \{x \in \mathbb{R} \mid x \ge 0\}$ the non-negative reals. We will use Landau notation with respect to the asymptotics towards $+\infty$, i.e., according to the following definitions.

Definition 2.1 (Landau Symbols). Given $f, g : A \to \mathbb{R}^+_0$, where $A \subseteq \mathbb{R}$ with $\sup A = \infty$, we define
$$f \in O(g) \;\Leftrightarrow\; \exists C_1, C_2 \in \mathbb{R}^+_0\; \forall x \in A : f(x) \le C_1 g(x) + C_2$$
$$f \in o(g) \;\Leftrightarrow\; \lim_{C \to \infty} \sup_{x \in A,\, x \ge C} \frac{f(x)}{g(x)} = 0$$
$$f \in \Omega(g) \;\Leftrightarrow\; \exists C_1, C_2 \in \mathbb{R}^+_0\; \forall x \in A : C_1 f(x) + C_2 \ge g(x)$$
$$f \in \omega(g) \;\Leftrightarrow\; \lim_{C \to \infty} \sup_{x \in A,\, x \ge C} \frac{g(x)}{f(x)} = 0$$
$$f \in \Theta(g) \;\Leftrightarrow\; f \in O(g) \cap \Omega(g).$$

Definition 2.2 (Logarithms and Polylogarithmic Bounds). For $x \in \mathbb{R}^+$, by $\log x$ and $\ln x$ we denote the logarithms to base 2 and $e$, respectively, where $e = \lim_{x \to \infty} (1 + 1/x)^x$ is Euler's number. We define $\log^{(i)} x$ (for feasible values of $x$) to be the $i \in \mathbb{N}$ times iterated logarithm, whereas $\log^r x := (\log x)^r$ for any $r \in \mathbb{R}^+$. We say that $f(x) \in \operatorname{polylog} x$ if $f(x) \in O(\log^C x)$ for a constant $C \in \mathbb{R}^+$.

Definition 2.3 (Tetration and $\log^*$). For $k \in \mathbb{N}$ and $b \in \mathbb{R}^+$, the $k$th tetration of $b$ is given by
$${}^k b := \underbrace{b^{b^{\cdot^{\cdot^{\cdot^{b}}}}}}_{k \text{ times}}.$$
For $x \in \mathbb{R}^+$, we define $\log^* x$ recursively by
$$\log^* x := \begin{cases} 1 + \log^* \log x & \text{if } x > 1 \\ 0 & \text{otherwise.} \end{cases}$$
In particular, $\log^* {}^k 2 = k$ for all $k \in \mathbb{N}$.
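As a quick sanity check of Definition 2.3, the following sketch (plain Python, with function names of my choosing) computes the tetration ${}^k b$ and the iterated logarithm $\log^* x$, and verifies the stated identity $\log^* {}^k 2 = k$:

```python
import math

def tetration(b: float, k: int) -> float:
    """k-th tetration of b: a power tower b^(b^(...^b)) of height k."""
    result = 1.0  # a tower of height 0
    for _ in range(k):
        result = b ** result
    return result

def log_star(x: float) -> int:
    """Iterated binary logarithm: 0 if x <= 1, else 1 + log*(log2 x)."""
    if x <= 1:
        return 0
    return 1 + log_star(math.log2(x))

# log* of the k-th tetration of 2 equals k (Definition 2.3).
for k in range(1, 5):
    assert log_star(tetration(2, k)) == k
```

The recursion bottoms out extremely quickly: already ${}^5 2 = 2^{65536}$ exceeds floating-point range, which is exactly why $\log^*$ is treated as "almost constant" in distributed complexity bounds.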

Throughout this thesis, we will describe distributed systems according to the standard message passing model. The network is modelled by a simple graph $G = (V, E)$, where $V$ is the set of nodes and $\{v, w\} \in E$ means that $v$ and $w$ share a bidirectional communication link. We will employ the following basic notation.

Definition 2.4 (Paths, Distances, and Diameter). Given a graph $G = (V, E)$, a path of length $k \in \mathbb{N}$ is a sequence of nodes $(v_0, \ldots, v_k)$ such that $\{v_i, v_{i-1}\} \in E$ for all $i \in \{1, \ldots, k\}$. The distance $d(v, w)$ of two nodes $v, w \in V$ is the length of a shortest path between $v$ and $w$. The diameter $D$ of $G$ is the maximum distance between any two nodes in the graph.

Definition 2.5 (Neighborhoods). Given the graph $G = (V, E)$, we define

• the (exclusive) neighborhood $N_v := \{w \in V \mid \{v, w\} \in E\}$ of node $v \in V$,

• the degree $\delta_v := |N_v|$ of $v \in V$,

• the maximum degree $\Delta := \max_{v \in V} \{\delta_v\}$,

• for $k \in \mathbb{N}$ the (inclusive) $k$-neighborhood $N_v^{(k)} := \{w \in V \mid d(v, w) \le k\}$ of $v \in V$,

• and the (inclusive) neighborhood $N_A^+ := \bigcup_{v \in A} N_v^{(1)}$ of a set $A \subseteq V$.

To facilitate intuition, we will denote the inclusive 1-neighborhood of node $v \in V$ by $N_v^+ := N_v^{(1)}$.

In bounded-delay networks (which are considered in Part I of this thesis), nodes react to events, which are triggered by receiving messages or by reaching a (previously defined) value on a local clock. When an event is triggered, a node may perform local computations, send messages that will be received within bounded time, and define future local times at which events will be triggered locally. These actions take no time; in case two events are triggered at a node at the same time, they are ordered arbitrarily and processed sequentially. We will use events triggered by the local clock implicitly, as we employ a high-level description of our algorithms. We point out, however, that one can translate all our algorithms into this framework. The state of each node is thus a function of the real time $t \in \mathbb{R}^+_0$. If at time $t$ the state of a variable (function, etc.) $x$ changes instantaneously, we define $x(t)$ to be the value after this change has been applied.

In contrast, in Parts II and III of our exposition we employ a synchronous model, where computation advances in rounds. In each round, nodes send messages, receive the messages sent by their neighbors, and perform local computations. The state of a node thus becomes a function of the current round $r \in \mathbb{N}$. Despite aiming for simple algorithms, we do not impose any constraints on the nodes' memory or the local computations they may perform. Note, however, that one should avoid techniques like, e.g., collecting the whole topology of the neighborhood up to a certain distance and subsequently solving NP-hard problems locally.
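To make the notation of Definitions 2.4 and 2.5 concrete, here is a small sketch (plain Python, with a hypothetical four-node example graph of my choosing) computing neighborhoods, degrees, the maximum degree, distances, and the diameter:

```python
from itertools import combinations

# Example graph G = (V, E): a path 0-1-2-3 plus the chord {1, 3}.
V = {0, 1, 2, 3}
E = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (1, 3)]}

def neighborhood(v):
    """Exclusive neighborhood N_v = {w | {v, w} in E}."""
    return {w for w in V if frozenset((v, w)) in E}

def distance(v, w):
    """d(v, w): length of a shortest path, via breadth-first search."""
    frontier, visited, dist = {v}, set(), 0
    while frontier:
        if w in frontier:
            return dist
        visited |= frontier
        frontier = {u for x in frontier for u in neighborhood(x)} - visited
        dist += 1
    return float("inf")  # w unreachable from v

def k_neighborhood(v, k):
    """Inclusive k-neighborhood: all nodes within distance k of v."""
    return {w for w in V if distance(v, w) <= k}

degree = {v: len(neighborhood(v)) for v in V}          # delta_v
Delta = max(degree.values())                           # maximum degree
D = max(distance(v, w) for v, w in combinations(V, 2)) # diameter

assert degree == {0: 1, 1: 3, 2: 2, 3: 2}
assert Delta == 3 and D == 2
assert k_neighborhood(0, 1) == {0, 1}  # inclusive 1-neighborhood of node 0
```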

2.2 Standard Definitions and Tools

Probabilistic Tools

All random variables in this thesis will be real-valued; hence we will not repeat this in our statements. We denote the expectation and variance of a random variable $X$ by $E[X]$ and $\operatorname{Var}[X]$, respectively.


Theorem 2.6 (Markov's Bound). For any random variable $X$ and any $C \in \mathbb{R}^+$, it holds that
$$P[|X| \ge C] \le \frac{E[|X|]}{C}.$$

When deriving probabilistic bounds, we will strive for results that are not certain, but almost guaranteed to hold.

Definition 2.7 (With High Probability). A stochastic event $E(c)$, where $c \in \mathbb{R}^+$ is arbitrary, is said to occur with high probability (w.h.p.) if $P[E(c)] \ge 1 - 1/n^c$.

Throughout this thesis, we will use $c$ with this meaning only and will therefore not define it again. When it comes to Landau notation, $c$ is treated as a constant; e.g., the values $C_1$ and $C_2$ from the definition of $O(\cdot)$ may depend on $c$. The advantage of this definition lies in its transitivity: for instance, the statements "Each node completes phase $i$ of the algorithm in $O(\log n)$ rounds w.h.p.", where $i \in \{1, \ldots, O(\log n)\}$, imply the statement "All phases complete in $O(\log^2 n)$ rounds w.h.p." Formally, the following lemma holds.

Lemma 2.8. Assume that events $E_i(c)$, $i \in \{1, \ldots, N\}$, occur w.h.p., where $N \le n^C$ for some constant $C \in \mathbb{R}^+$. Then the event $E(c) := \bigwedge_{i=1}^N E_i(\tilde c)$ occurs w.h.p., where $\tilde c := c + C$.

Proof. The $E_i$ occur w.h.p., so for any value $c \in \mathbb{R}^+$ we may choose $\tilde c := c + C \in \mathbb{R}^+$ and have $P[E_i(\tilde c)] \ge 1 - 1/n^{\tilde c} \ge 1 - 1/(N n^c)$ for all $i \in \{1, \ldots, N\}$. By the union bound, this implies $P[E(c)] \ge 1 - \sum_{i=1}^N \big(1 - P[E_i(\tilde c)]\big) \ge 1 - 1/n^c$.

We will not invoke this lemma explicitly in our proofs. The purpose of this statement rather is to demonstrate that any number of asymptotic statements holding w.h.p. that is polynomial in $n$ is also jointly true w.h.p., regardless of dependencies. With this in mind, we will make frequent implicit use of this lemma.

Definition 2.9 (Uniformity and Independence). A discrete random variable is called uniform if all its possible outcomes are equally likely. Two random variables $X_1$ and $X_2$ are independent if $P[X_1 = x_1] = P[X_1 = x_1 \mid X_2 = x_2]$ for any two $x_1, x_2 \in \mathbb{R}$ (and vice versa). A set $\{X_1, \ldots, X_N\}$ of random variables is independent if, for all $i \in \{1, \ldots, N\}$, $X_i$ is independent from $(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_N)$, i.e., the tuple listing the outcomes of all $X_j \ne X_i$. The set $\{X_1, \ldots, X_N\}$ is uniformly and independently random (u.i.r.) if it is independent and consists of uniform random variables. Two sets of random variables $\mathcal{X} = \{X_1, \ldots, X_N\}$ and $\mathcal{Y} = \{Y_1, \ldots, Y_M\}$ are independent of each other if all $X_i \in \mathcal{X}$ are independent from $(Y_1, \ldots, Y_M)$ and all $Y_j \in \mathcal{Y}$ are independent from $(X_1, \ldots, X_N)$.
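The arithmetic behind the union-bound argument of Lemma 2.8 can be checked numerically. The sketch below (plain Python, with illustrative values of $n$, $c$, and $C$ of my choosing) verifies that $N \le n^C$ events, each failing with probability at most $1/n^{c+C}$, jointly fail with probability at most $1/n^c$:

```python
n, c, C = 1000, 2.0, 1.5
N = int(n ** C)                 # number of events, N <= n^C
p_single = 1.0 / n ** (c + C)   # failure bound for one event E_i(c~), c~ = c + C

# Union bound: the joint failure probability is at most N * p_single.
p_total = N * p_single
assert p_total <= 1.0 / n ** c  # so the conjunction holds w.h.p.
```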


Frequently, w.h.p. results are deduced from Chernoff bounds, which provide exponential probability bounds regarding sums of Bernoulli variables (which are either one or zero). Common formulations assume independence of these variables, but the following more general condition is sufficient.

Definition 2.10 (Negative Association). The set of random variables $X_i$, $i \in \{1, \ldots, N\}$, is negatively associated if and only if for all disjoint subsets $I, J \subseteq \{1, \ldots, N\}$ and all functions $f : \mathbb{R}^{|I|} \to \mathbb{R}$ and $g : \mathbb{R}^{|J|} \to \mathbb{R}$ that are either increasing in all components or decreasing in all components, we have
$$E[f((X_i)_{i \in I}) \cdot g((X_j)_{j \in J})] \le E[f((X_i)_{i \in I})] \cdot E[g((X_j)_{j \in J})].$$
Note that independence trivially implies negative association, but not vice versa.

Theorem 2.11 (Chernoff's Bound: Upper Tail). Given negatively associated Bernoulli variables $X_1, \ldots, X_N$, define $X := \sum_{i=1}^N X_i$. Then for any $\delta \in \mathbb{R}^+$, we have that
$$P[X > (1+\delta)E[X]] < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{E[X]}.$$

Theorem 2.12 (Chernoff's Bound: Lower Tail). Given negatively associated Bernoulli variables $X_1, \ldots, X_N$, define $X := \sum_{i=1}^N X_i$. Then for any $\delta \in (0, 1]$, it holds that
$$P[X < (1-\delta)E[X]] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{E[X]}.$$
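The upper-tail bound of Theorem 2.11 is easy to probe empirically. The following simulation sketch (plain Python, parameters of my choosing; independent Bernoulli variables, which are in particular negatively associated) compares the empirical tail of a Bernoulli sum against the analytical bound:

```python
import math
import random

N, p, delta = 1000, 0.01, 1.0
mu = N * p                    # E[X] = 10
threshold = (1 + delta) * mu  # we estimate P[X > 20]

# Chernoff upper-tail bound of Theorem 2.11: (e^d / (1+d)^(1+d))^E[X].
chernoff = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

random.seed(1)
trials = 3000
hits = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(N))  # X = sum of Bernoullis
    hits += x > threshold
empirical = hits / trials

# The empirical tail frequency respects the analytical bound.
assert empirical <= chernoff
```

As is typical of Chernoff bounds, the empirical tail is considerably smaller than the bound; the bound's value lies in its exponential decay in $E[X]$, not in its tightness for fixed parameters.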

Corollary 2.13. For negatively associated Bernoulli variables $X_1, \ldots, X_N$, define $X := \sum_{i=1}^N X_i$. Then

(i) $X \in E[X] + O\big(\log n + \sqrt{E[X] \log n}\big)$ w.h.p.,

(ii) $E[X] \in O(1) \Rightarrow X \in O\big(\frac{\log n}{\log \log n}\big)$ w.h.p.,

(iii) $E[X] \in O\big(\sqrt{1/\log n}\big) \Rightarrow X \in O\big(\sqrt{\frac{\log n}{\log \log n}}\big)$ w.h.p.,

(iv) $P[X = 0] \le e^{-E[X]/2}$,

(v) $E[X] \ge 8c \log n \Rightarrow X \in \Theta(E[X])$ w.h.p.,

(vi) $E[X] \in \omega(\log n) \Rightarrow X \in (1 \pm o(1))E[X]$ w.h.p.

We need a means to show that random variables are negatively associated.


Lemma 2.14.

(i) If $X_1, \ldots, X_N$ are Bernoulli variables satisfying $\sum_{i=1}^N X_i = 1$, then $\{X_1, \ldots, X_N\}$ is negatively associated.

(ii) Assume that $\mathcal{X}$ and $\mathcal{Y}$ are negatively associated sets of random variables, and that $\mathcal{X}$ and $\mathcal{Y}$ are mutually independent. Then $\mathcal{X} \cup \mathcal{Y}$ is negatively associated.

(iii) Suppose $\{X_1, \ldots, X_N\}$ is negatively associated. Given $I_1, \ldots, I_k \subseteq \{1, \ldots, N\}$, $k \in \mathbb{N}$, and functions $h_j : \mathbb{R}^{|I_j|} \to \mathbb{R}$, $j \in \{1, \ldots, k\}$, that are either all increasing or all decreasing, define $Y_j := h_j((X_i)_{i \in I_j})$. Then $\{Y_1, \ldots, Y_k\}$ is negatively associated.

This lemma and Corollary 2.13 imply strong bounds on the outcome of the well-known balls-into-bins experiment.

Lemma 2.15. Consider the random experiment of throwing $M$ balls u.i.r. into $N$ bins. Denote by $Y^k = \{Y_i^k\}_{i \in \{1, \ldots, N\}}$ the set of Bernoulli variables being 1 if and only if at least (at most) $k \in \mathbb{N}_0$ balls end up in bin $i \in \{1, \ldots, N\}$. Then, for any $k$, $Y^k$ is negatively associated.

The following special case will prove to be helpful.

Corollary 2.16. Throw $M \le N \ln N / (2 \ln \ln n)$ balls u.i.r. into $N$ bins. Then $(1 \pm o(1)) N e^{-M/N}$ bins remain empty w.h.p.

Another inequality that yields exponentially falling probability bounds is typically referred to as Azuma's inequality.

Theorem 2.17 (Azuma's Inequality). Assume that $X$ is a random variable that is a function of independent random variables $X_1, \ldots, X_N$. Assume that changing the value of a single $X_i$ for some $i \in \{1, \ldots, N\}$ changes the outcome of $X$ by at most $\delta_i \in \mathbb{R}^+$. Then for any $t \in \mathbb{R}^+_0$ we have
$$P[|X - E[X]| > t] \le 2 e^{-t^2 / \left(2 \sum_{i=1}^N \delta_i^2\right)}.$$
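Corollary 2.16 is easy to check experimentally. The following sketch (plain Python, with parameters of my choosing that satisfy the corollary's precondition) throws $M$ balls u.i.r. into $N$ bins and compares the number of empty bins to $N e^{-M/N}$:

```python
import math
import random

random.seed(42)
N = 100_000
M = N  # well within the bound M <= N ln N / (2 ln ln n)

loads = [0] * N
for _ in range(M):
    loads[random.randrange(N)] += 1  # each ball picks a bin u.i.r.

empty = loads.count(0)
expected = N * math.exp(-M / N)  # ~ N/e empty bins for M = N

# The deviation is a lower-order term, as Corollary 2.16 predicts.
assert abs(empty - expected) / expected < 0.05
```

The concentration seen here is exactly what Lemma 2.15 together with Corollary 2.13 delivers: the indicator variables "bin $i$ is empty" are negatively associated, so Chernoff-type bounds apply despite their dependence.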

Normally Distributed Random Variables

Definition 2.18 (Normal Distribution). The random variable $X$ is normally distributed if its density function is the bell curve
$$f(x) = \frac{1}{\sqrt{2 \pi \operatorname{Var}[X]}}\, e^{-\frac{(x - E[X])^2}{2 \operatorname{Var}[X]}}.$$

Sums of normally distributed variables are again normally distributed.


Lemma 2.19. Given normally distributed random variables $X_1, \ldots, X_N$, their sum $X := \sum_{i=1}^N X_i$ is normally distributed with expectation $E[X] = \sum_{i=1}^N E[X_i]$ and variance $\operatorname{Var}[X] = \sum_{i=1}^N \operatorname{Var}[X_i]$.

For our purposes, normally distributed random variables exhibit a very convenient behaviour.

Lemma 2.20. For any given normally distributed random variable $X$, we have that
$$P\big[|X - E[X]| > \sqrt{\operatorname{Var}[X]}\big] \in \Omega(1),$$
i.e., the probability to deviate by more than one standard deviation is constant, whereas
$$P\big[|X - E[X]| \le \delta \sqrt{\operatorname{Var}[X]}\big] \in 1 - e^{-\Omega(\delta^2 \log \delta)}$$
for any $\delta \in \mathbb{R}^+$.

Simple Linear Regression

Definition 2.21 (Simple Linear Regression). Given data points $(x_i, y_i)$, $i \in \{1, \ldots, N\}$, such that not all $x_i$ are the same, their linear regression is the line $\hat f(x) = \hat s x + \hat t$, where $\hat s, \hat t \in \mathbb{R}$ minimize the expression
$$\sum_{i=1}^N \big(\hat f(x_i) - y_i\big)^2.$$
Denoting by $\bar{\,\cdot\,}$ the average of the respective values $\cdot_i$, $i \in \{1, \ldots, N\}$, we have
$$\hat s = \frac{\overline{xy} - \bar x\, \bar y}{\overline{x^2} - \bar x^2}, \qquad \hat t = \bar y - \hat s\, \bar x.$$

Using linear regression on a set of measurements of a linear relation that is afflicted with errors, one can significantly reduce the overall deviation of the estimated line from the true one.

Theorem 2.22. Assume that we are given a set of measurements $(x_i, \hat y_i)$ of data points $(x_i, y_i)$, $i \in \{1, \ldots, N\}$, obeying the relation $f(x_i) = y_i$, where $f(x) = s x + t$. Furthermore, assume that $\hat y_i = y_i + X_i$, where the $X_i$ are identically and independently normally distributed random variables with expectation $\mu$ and variance $\sigma^2$. Denote by $\hat f(x) = \hat s x + \hat t$ the linear regression of the data set $\{(x_i, \hat y_i)\}_{i \in \{1, \ldots, N\}}$. Then we have that

(i) $\hat s$ is normally distributed with $E[\hat s] = s$ and $\operatorname{Var}[\hat s] = \sigma^2 / \sum_{i=1}^N (x_i - \bar x)^2$,

(ii) $\hat f(\bar x) - f(\bar x)$ is normally distributed with mean $\mu$ and variance $\sigma^2 / N$.
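The closed-form estimators of Definition 2.21 are straightforward to implement. The sketch below (plain Python, with a hypothetical noisy data set of my choosing) recovers the slope of a known line from Gaussian-perturbed measurements, with accuracy consistent with Theorem 2.22(i):

```python
import random

def linear_regression(xs, ys):
    """Least-squares line: returns (s_hat, t_hat) per Definition 2.21."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    mxy = sum(x * y for x, y in zip(xs, ys)) / n
    mxx = sum(x * x for x in xs) / n
    s_hat = (mxy - mx * my) / (mxx - mx * mx)
    t_hat = my - s_hat * mx
    return s_hat, t_hat

# Measurements of f(x) = 2x + 1 with i.i.d. Gaussian noise (mu = 0).
random.seed(0)
s, t, sigma = 2.0, 1.0, 0.5
xs = [float(i) for i in range(100)]
ys = [s * x + t + random.gauss(0.0, sigma) for x in xs]

s_hat, t_hat = linear_regression(xs, ys)
# Var[s_hat] = sigma^2 / sum (x_i - mean)^2 is tiny here, so the slope
# estimate is far more accurate than any single measurement.
assert abs(s_hat - s) < 0.01
assert abs(t_hat - t) < 0.4
```

This variance-reduction effect is precisely why regression over many timestamped samples is useful for estimating relative clock drifts, the setting in which Chapter 4 applies Theorem 2.22.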

Miscellaneous

In Chapter 5 we will exploit the fact that the maximum of functions which increase at a bounded rate does not grow faster than the maximum of the respective bounds.

Theorem 2.23. Suppose $f_1, \ldots, f_k : T \to \mathbb{R}$ are functions that are differentiable at all but countably many points, where $T \subseteq \mathbb{R}$. Then $f := \max\{f_1, \ldots, f_k\} : T \to \mathbb{R}$ is differentiable at all but countably many points, and it holds for all $t \in T$ for which all involved derivatives exist that
$$\frac{d}{dt} f(t) = \max_{\substack{i \in \{1, \ldots, k\} \\ f_i(t) = f(t)}} \left\{ \frac{d}{dt} f_i(t) \right\}.$$

In Chapter 13 we will need the following basic statements about planar graphs.

Lemma 2.24. A minor of a planar graph is planar. A planar graph of $n \ge 3$ nodes has at most $3n - 6$ edges.

A basic combinatorial structure that will be briefly mentioned in Part III is a node coloring.

Definition 2.25 (Node Coloring). A node coloring with $k \in \mathbb{N}$ colors is a mapping $C : V \to \{1, \ldots, k\}$ such that no two neighbors have the same color, i.e., $\{v, w\} \in E \Rightarrow C(v) \ne C(w)$.

Part I

Clock Synchronization

Chapter 3

An Introduction to Clock Synchronization

“I believed the topic was dead.”
– Christian Scheideler’s opening to a question concerning a talk about clock synchronization.

In distributed systems, many tasks rely on—or are simplified by—a common notion of time throughout the system. Globally coherent local times allow for implicit synchronization of the actions of distant devices [12] or for chronologically ordering events occurring at distinct nodes. If time is not abstract, but to be understood in the physical sense as provided by, e.g., a watch or a system clock, this clears the path for numerous further applications. For instance, the precision up to which an acoustic event can be located by a group of adjacent sensor nodes crucially depends on the exact times at which the sensors detect its sound waves. This distinction between “abstract” and “physical” time is decisive. The goal of synchronizing a distributed system to the degree that nodes have access to a common round counter is addressed by so-called synchronizers [3], and ordering events within the system has been—by and large—understood already in the early days of distributed computing [57]. Having a physically meaningful clock is more demanding in that it requires not only consistent clock values throughout the distributed system, but also clearly defined progress speeds of clocks. This is important, for instance, when a trajectory is to be (re)constructed from sensor readings: if clock speeds are arbitrary, the velocity of the observed target cannot be determined accurately. Putting it simply, a second should last about a second, not between zero and ten seconds. If one does not care about the progress speed of clocks, clock skew,


i.e., the difference between clock values, can easily be kept small, as one can slow down clocks until stragglers catch up whenever necessary. But what makes it difficult to prevent clock skews from arising if clocks must make progress?

To begin with, there is a wide range of scenarios in which it is infeasible that all participants of the system directly access a sufficiently precise source of timing information. Sensor nodes, for instance, can be equipped with GPS receivers, but this might be prohibitively expensive in terms of energy consumption, or the network could be indoors. Giving another example, signal propagation speed on computer chips depends on many uncontrollable factors like (local) temperature, variations in quality of components, or fluctuations in supply voltage. Thus, a canonical approach is to equip the participants of the system with their own clocks, which however will exhibit different and varying clock drifts for very much the same reasons. Depending on the desired quality of synchronization, it may take more or less time until the clock skew that builds up over time becomes critical. In any case, eventually the devices must communicate in order to adjust their clocks. At this point another obstacle comes into play: the time it takes to transmit a message and process it at the target node can neither be predicted nor measured precisely. Even if it could, the obstacle would not be overcome completely. Within the time it takes to communicate a value, the clock value of the sender increases by an amount that cannot be determined by the receiver exactly. Thus, nodes suffer from uncertainty about neighbors' clock values, and even more so about clock values of remote nodes.

In this thesis, we examine two different models of clock synchronization. The first one is tailored to represent the main characteristics of wireless sensor networks with regard to the clock synchronization problem.
In this context, we assume the system to behave comparatively benignly. Clock drifts do not change quickly with respect to the time frame relevant for the respective algorithm and are thus kept constant for analysis purposes. The fluctuations in message transmission times are random and independent between transmissions. Although abstracting away from the peculiarities of wireless communication, our theoretical insights are supported by test results from an implementation of PulseSync, the algorithm we propose in Section 4.4. As is frequently the case with clock synchronization, our results reveal that the precise model matters a lot. Denoting by D the diameter of the communication network, we prove a tight probabilistic bound of Θ(√D) on the global skew of PulseSync, i.e., the maximum clock skew between any pair of nodes in the system. In contrast, traditional worst-case analysis yields a lower bound of Ω(D) on the global skew [16].

In Chapter 5 we examine a worst-case model, where clock drifts and uncertainties may vary arbitrarily within possibly unknown bounds. Moreover, we consider dynamic graphs, where the edges of the graph appear and disappear in a worst-case manner. Thus, any upper bound in this model is highly robust, being resilient to anything but maliciously behaving ("Byzantine") nodes. Note that in a system with Byzantine faults, it is mandatory to make sure that an erroneously behaving node cannot pollute the state of others. Obviously, this is impossible if a Byzantine node controls all communication between two parts of a graph. This observation shows that in a Byzantine environment the problem is much more strongly tied to the topology and requires more complicated algorithms. For these reasons, Byzantine faults are beyond the scope of our work. To the best of our knowledge, so far the literature has been concerned with Byzantine fault-tolerant clock synchronization algorithms under the assumption of full connectivity only [99]. Even then, the problem of achieving both Byzantine fault tolerance and self-stabilization [26] (see Definition 5.27 and Corollary 5.28) is intricate [11, 27, 28, 35].

It is not difficult to show that the best possible worst-case guarantee on the global skew is linear in the (possibly dynamic) diameter [16, 52, 99]. More surprisingly, even if the graph is static, it is impossible to ensure that the local skew, the maximum skew between neighbors, satisfies a bound that is independent of the network diameter [33, 60, 79]. This is of significant interest, as in fact many applications do not necessitate good global synchronization, but merely rely on guarantees on the local skew. For instance, for the aforementioned purpose of acoustic localization we need that nodes that are close to a specific event have closely related clock values. Naturally, these physically clustered nodes will be communicating with each other via a small number of hops.
Similarly, time division multiple access protocols, where a common channel is accessed by the sharing devices according to a mutually exclusive assignment of time slots, depend on the respective devices having tightly synchronized clocks. Alongside the primary designation of the channel, it can be used to directly exchange timing information between the devices using it. Hence, an efficient utilization of the channel can be achieved provided that the local skew is kept small.

We will show that in any graph, an optimal bound on the local skew on the edges that have been continuously present for a sufficiently long period of time can be maintained by a simple algorithm. This bound is logarithmic in D with a large base of the logarithm, implying that even if the global skew is large, applications depending on the local skew can exhibit a good worst-case scaling behaviour. Moreover, for the proposed algorithm the stabilization time, i.e., the time until the strong local skew bounds apply to a newly formed edge, is linear in the bound on the global skew, which is also asymptotically optimal. Remarkably, the stable local skew achieved in the subgraph induced by the edges that have been operational without interruption for this time period is almost identical to the local skew that can be guaranteed in a static graph where nodes and edges never fail.

Chapter 4

Clock Synchronization in Wireless Networks

"All the time you said trees are bad. Now, all of a sudden, you want me to change the entire implementation to a tree protocol?"
– Philipp Sommer's response to my first sketch of PulseSync.

In the last two decades, a lot of research has been dedicated to wireless networks. Since such networks do not require a fixed wiring, they are easy to deploy and can be formed on-the-fly when there is need for cooperation between otherwise unrelated mobile devices. On the downside, wireless communication suffers from interference, complicating information exchange between the participants of the system. The fact that energy is typically a scarce resource in wireless networks aggravates this issue further, as one wants to minimize radio usage. In this chapter, we examine the clock synchronization problem in this particular setting. The presented material is based on work co-authored by Philipp Sommer [63].

4.1 Model

In a wireless network, communication takes place by radio. In theory, in order to send a message, a node powers up its radio, transmits the message, and powers the radio down. In practice, of course, there are a number of issues. Does the receiver listen on the respective channel? Is there interference with other transmissions? Is an acknowledgement to be sent? If so, was it successfully received, etc. We will not delve into these matters, although


Table 4.1: Energy consumption of radios used in common sensor nodes. Active radios draw roughly five orders of magnitude more power than sleeping devices. The vendors' terms for the mode denoted by "sleep" differ.

sensor node     transmit [mA]   power [dBm]   listen [mA]   sleep [µA]
Mica2                16.5             0            9.6          0.2
Tmote Sky            17.4             0           18.8          0.02
Crossb. IRIS         14.5             1           15.5          0.02
TinyNode             33               5           14            0.2

one has to keep the peculiarities of wireless communication in mind when devising clock synchronization protocols for such systems. Having said this, we choose a simplistic description of the network as a static graph G = (V, E), where V is the set of nodes and E is the set of bidirectional, reliable communication links. If node v ∈ V sends a message, all neighbors w ∈ Nv listening on the channel can receive this message. We focus on the following aspects of wireless systems:

• Communication is expensive. The energy consumption of a node whose radio is powered on is orders of magnitude larger than that of a sleeping node (cf. Table 4.1).1 In fact, in many cases radio usage determines the lifetime of a sensor node. Therefore, we want to minimize the amount of communication dedicated to the synchronization routine. Consequently, we require that nodes send and receive (on average) only one message per beacon interval B.

• Communication is inexact. As mentioned before, it is not possible to learn the exact clock values of communication partners. In the wireless setting, this is mainly due to two causes. Firstly, transmission times vary. This effect can be significantly reduced by MAC layer time-stamping [77], yet a fraction of the transmission time cannot be determined exactly. Secondly, the resolution of the sensor nodes' clocks is limited. Thus, rounding errors are introduced that make it impossible to determine the time of arrival of a message precisely (this can also be improved [97]). As these fluctuations are typically not related between different messages, we model them as independently distributed

1 Mica 2, Texas Instruments CC1000, focus.ti.com/lit/ds/symlink/cc1000.pdf;
Tmote Sky, Texas Instruments CC2420, focus.ti.com/lit/ds/symlink/cc2420.pdf;
Crossbow IRIS, Atmel AT86RF230, atmel.com/dyn/resources/prod documents/doc5131.pdf;
TinyNode, Semtech XE1205, semtech.com/images/datasheet/xe1205.pdf


random variables. For the sake of our analysis, we assume their distributions to be identical and refer to the respective standard deviation as the jitter J. Our results hold for most "reasonable" distributions. For simplicity, we will however assume normally distributed variables with zero mean in this thesis, a hypothesis which is supported by empirical study [30].

• Sending times are constrained. We discussed that in wireless networks one cannot simply send a message whenever it is convenient. In order to account for this, we define the time it takes in each beacon interval between a node receiving and sending a message to be predefined and immutable by the algorithm. Since every node receives and transmits only once during every interval, we need only a single value τv ∈ R+ for each node v ∈ V, denoting the time difference between receiving and sending the respective messages. This time span also accounts for the fact that it takes some time to receive, send, and process messages. Note that this is a simplification, as this time span is variable for several reasons. However, the respective fluctuations are small enough to have negligible effects: in a real system, the fact that radios are powered down most of the time necessitates that nodes can predict when the next message arrives in order to activate the receiver and listen on the appropriate channel.

• Message size is constrained. The number of bits in radio messages should be small for various reasons. This is addressed by our algorithm in that the "payload" of a message consists of a small (constant) number of values. We do not formalize this in our model; in particular, we assume unbounded clocks. In practice, a limited number of bits is used to represent clock values and a wrap-around is implemented.

• Dynamics. Which nodes can communicate directly may depend on various environmental conditions, in particular interference from inside or outside the network.
Thus, in contrast to the previous definition of G, the communication graph is typically not static. Moreover, the speed of the nodes' clocks will vary, primarily due to changes in the nodes' temperatures (see Figure 4.1; we remark that nodes equipped with temperature sensors can significantly reduce this influence [97]). We do not capture these aspects in our model, which assumes a static configuration of the system, both with regard to communication and clock rates. Instead, they are addressed by the design of the proposed algorithm, which strives to make the computed clock values depend only on a short, recent period of time. Thus, the algorithm adapts quickly to changes in topology or clock speeds.



Figure 4.1: Hardware clock frequency of a Mica2 sensor node for different ambient temperatures. A difference of five degrees alters the clock speed by up to one microsecond per second.

Let us now formalize the clock synchronization problem in this communication model. Each node v ∈ V has a local hardware clock Hv : R+0 → R+0. It is an affine linear function Hv(t) = ov + hv · t, where ov ∈ R+0 is the offset and hv is the rate of v's clock. Node v has access to Hv(t) only, i.e., it can read its local clock value, but knows neither ov nor hv. The rate hv determines by how much a local measurement of a difference between two points in time deviates from the correct value. We require that the relative drift of Hv is bounded, i.e., ρv := |hv − 1| ≤ ρ < 1. Here ρ is independent of the number of nodes n, meaning that each clock progresses at most by a constant factor slower or faster than real time. Typical hardware clocks in sensor nodes exhibit drifts of at most 50 ppm, i.e., ρ ≤ 5 · 10^−5.
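The hardware clock model above can be sketched in a few lines (an illustration of ours; the helper names and the uniform sampling of the rate are assumptions, not part of the model):

```python
import random

RHO = 5e-5  # drift bound: typical sensor-node crystals stay within 50 ppm

def make_hardware_clock(offset, rate):
    """Affine hardware clock H_v(t) = o_v + h_v * t. The node can read
    H_v(t), but it knows neither o_v nor h_v."""
    assert abs(rate - 1.0) <= RHO  # relative drift rho_v = |h_v - 1| <= rho
    return lambda t: offset + rate * t

random.seed(1)
H_v = make_hardware_clock(offset=random.uniform(0, 10),
                          rate=1 + random.uniform(-RHO, RHO))

# A local measurement of a duration deviates from the true duration by
# at most a factor (1 + rho): here, one second of real time.
measured = H_v(101.0) - H_v(100.0)
assert abs(measured - 1.0) <= RHO
```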


Observe that given an infinite number of messages, two neighboring nodes could estimate each other's clock values arbitrarily well. Sending clock updates repeatedly and exploiting independence of message jitters, a node v ∈ V can approximate the function Hw, w ∈ Nv, arbitrarily precisely in terms of Hv with probability arbitrarily close to 1. For theory, it is thus mainly interesting to study local clocks with fixed drift in combination with algorithms whose output at time t depends on a bounded number of messages only. In light of our previous statements, the same follows canonically from our goals to (i) minimize the number of messages nodes send in a given time period and (ii) enable the algorithm to deal with dynamics by making it oblivious to information that might be outdated. If we relied on clock values from a large period of time (where the meaning of "large" depends on the speed of changes in environmental conditions), the assumption of clock drifts being constant (up to negligible errors) would become invalid.

A clock synchronization algorithm is now asked to derive at each node v ∈ V a logical clock Lv : R+0 → R+0 based on local computations, its hardware clock readings, and the messages exchanged. The algorithm strives to minimize the global skew

G(t) := max_{v,w∈V} |Lv(t) − Lw(t)|

at any time t, using few messages and only recent information. Observe that so far a trivial solution would be to simply set Lv(t) := 0 for all times t and all nodes v ∈ V. As mentioned in the introduction, this is not desired as we want Lv to behave like a "real" clock. In particular, we expect clock speeds to be close to one and clock values to be closely related to real time.

To avoid a cluttered notation, in this chapter we will adopt the following convention. There is a distinguished root node r ∈ V that has a perfect clock, i.e., Hr(t) = t = Lr(t) at all times t, and nodes try to synchronize their logical clocks with Lr. This is known as external synchronization in the literature, as opposed to the closely related concept of internal synchronization that we will consider in Chapter 5. Observe that

max_{v∈V} |Lv(t) − Lr(t)| ≤ G(t) ≤ 2 max_{v∈V} |Lv(t) − Lr(t)|,

i.e., minimizing the global skew is essentially equivalent to synchronizing clocks with respect to the real time t = Lr(t) in this setting. We do not impose explicit restrictions on the progress speeds of the logical clocks in this chapter. However, we note that one can ensure smoothly progressing clocks by interpolation techniques, without weakening the synchronization guarantees.


Figure 4.2: Synchronization error versus distance from the root node for FTSP (left) and PulseSync (right). See [63] for details on the testbed setup.

4.2 Overview

In the following, we study the probabilistic bounds that can be derived on the quality of global synchronization in the presented model. We begin by deriving a lower bound stating that on a path of length d where on average kd, k ∈ N, messages are transmitted in kB time, the expected skew must be Ω(J√(d/k)). Essentially, this is a consequence of the fact that the variance of the transmission delays adds up linearly along the path to J²d, whereas averaging over k repeated transmissions reduces the variance by factor k. The resulting standard deviation thus must be in Ω(J√(d/k)). In our communication model, from this bound it can be derived that for any algorithm, the expected global skew must be Ω(J√(D/k)).

Opposed to that, we present PulseSync, an algorithm matching the stated lower bound. Basically, PulseSync floods estimates of the clock value of the root node r through a breadth-first-search (BFS) tree rooted at r. All nodes strive to synchronize their clocks relative to r. In order to reduce the effect of the jitter, nodes keep track of the last k received values and compute a regression line mapping their own hardware clock to the estimated clock values of the root node. This approach is not new and has been implemented in the well-known Flooding Time Synchronization Protocol (FTSP) [77]. However, the synchronization quality of FTSP becomes poor with growing network diameter due


Figure 4.3: FTSP logical clock computation scheme for the special case of k = 2 data points. Here the first data point m1 is perfectly accurate, while m2 suffers an error of J because the respective message traveled slowly, i.e., the receiving node underestimated the time it took to transmit the message. Hence, the regression line has a value that is too small precisely by J at the receiving time tr. FTSP nodes send clock updates in time slots that are chosen independently at random, once every B time. Thus, at a time ts, which is at least tr + B/2 with probability 1/2, the node will send a message with a clock estimate read from the regression line. This estimate will be J + (ts − tr)J/B smaller than the true value, because the error of J on the value received at time tr also implies that the slope of the regression line is J/B too small. In summary, the error on the second received value is amplified by a factor of at least 3/2 with probability at least 1/2. Since sending slots are chosen independently, this happens independently with probability at least 1/2 at each hop, leading to an exponential amplification of errors.
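The amplification factor derived in the caption can be verified with a tiny calculation. This is our own sketch of the k = 2 extrapolation step only, not actual FTSP code:

```python
def forwarded_error(err_first, err_second, delay_fraction):
    """Error of the value read from the regression line through two data
    points received B time apart, extrapolated delay_fraction * B past
    the second point. The slope error is (err_second - err_first) / B,
    so extrapolating by delay_fraction * B adds that much again."""
    return err_second + (err_second - err_first) * delay_fraction

J = 1.0
# Figure 4.3's scenario: first estimate exact, second one too small by J,
# message sent B/2 after the last reception (probability 1/2 under FTSP's
# random schedule) -> the error is amplified by a factor of 3/2.
assert forwarded_error(0.0, -J, 0.5) == -1.5 * J
```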

to two reasons. Firstly, FTSP sends messages according to a random schedule, where nodes transmit one beacon every B (local) time. Therefore, the expected time it takes until information propagates to a leaf in distance D from the root is DB/2. In contrast, PulseSync aligns sending times in a pattern matching the BFS tree along which clock updates propagate, implying that—in practical networks—new clock values will reach all nodes within a single beacon interval B (a "pulse"). Apart from reducing the global skew, this technique ensures that logical clocks depend on a preceding time period of length Θ(kB) only, as opposed to Ω(DkB) for FTSP.2

2 This is a result of forwarded clock values being read out of the regression constructed from the last k received values. The dependency on very old values is weak, however; only the Ω(D + k) most recent values contribute significantly to the outcome of the computation.


Figure 4.4: Simulation of FTSP for line topologies with different diameters using varying sizes k ∈ {2, 8, 32} of the regression table. Errors clearly grow exponentially in the diameter, for all three values of k. Mean synchronization errors are averaged over five runs, error bars indicate values of runs with maximum and minimum outcome. See [63] for details.

Secondly, despite being designed as a multihop synchronization protocol, FTSP exhibits exponentially growing global skew with respect to the network diameter (see Figure 4.2), rendering the protocol unsuitable for large-diameter deployments (which indeed occur in practice, cf. [46]). This undesired behavior is a result of the way the algorithm makes use of regression lines. The estimates nodes send to their children in the tree are read from the same regression line used to compute logical clocks. Thus, they are extrapolated from the previously received, erroneous estimates. This can lead to an amplification of errors exponential in the hop distance. For k = 2 this can easily be understood (see Figure 4.3). For larger k, the situation gets more complicated, but for any constant k "bad luck" will frequently overcome the dampening effect (see [96]) of the regression if n is large. Moreover, simulation indicates that even for large values of k the problem quickly becomes devastating when the network diameter grows (see Figure 4.4).

This problem can be avoided if one uses independent estimates of the nodes' clock rates to compensate drift during the time period in which nodes locally increase received clock estimates until they can be forwarded to their children. Since PulseSync forwards clock values as quickly as possible, in our test setting it was already sufficient to rely on the unmodified hardware clock readings to resolve this issue. Nonetheless, we will prove that if nodes use independent clock estimates to compute approximations of their hardware clock rates, the bound on the global skew becomes entirely independent of hardware clock drift.

Figure 4.5: A simple unit disk graph (see Definition 9.7). Nodes are arranged into clusters of size four. The clusters form a line. Each node is connected to all nodes within its own and neighboring clusters.

It has been argued that in some graphs one may exploit that the root is connected to each node by multiple paths. Consider for instance the unit disk graph in Figure 4.5. If the "clusters" transmit sequentially from left to right, each node could obtain multiple estimates of equal quality within a single pulse. This will decrease the variance of estimates by the number of nodes in each cluster. In general, one can express the possible gains of this strategy for any node v ∈ V in terms of the resistance between the root r and v if each link in the network is replaced by a unit resistor [39]. However, since message size should remain small, this approach necessitates that nodes


receive multiple messages in each beacon interval. This contradicts our goal of keeping energy consumption low. If on the other hand we accept a larger battery drain, we can achieve better synchronization by simply reducing B and increasing k accordingly (see Corollaries 4.8 and 4.9).
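The variance reduction from averaging multiple equal-quality estimates within one pulse can be checked numerically. This is a Monte Carlo sketch of ours under the zero-mean Gaussian jitter assumption; the parameter values are arbitrary:

```python
import random

random.seed(2)
J, m, runs = 1.0, 4, 20000  # jitter std, cluster size, sample size

def avg_estimate():
    # One pulse: a node averages m independent estimates of equal
    # quality (each with zero mean and standard deviation J).
    return sum(random.gauss(0, J) for _ in range(m)) / m

samples = [avg_estimate() for _ in range(runs)]
var = sum(s * s for s in samples) / runs  # estimates are zero-mean

# The variance drops from J^2 to J^2 / m, m being the cluster size.
assert abs(var - J ** 2 / m) < 0.02
```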

4.3 Lower Bound

In order to derive our lower bound on the global skew, we first examine how well a neighbor of the root can synchronize its clock with respect to Lr.

Lemma 4.1. Assume that r sends at most k ∈ N messages within a time interval T := (t − kB, t] ⊂ R+0. Suppose v ∈ Nr computes an estimate ôv(t) of ov(t) := Hv(t) − t without—directly or indirectly—relying on any events preceding T. Then the probability that ôv(t) has an error of J/√k is at least constant, i.e.,

P[|ôv(t) − ov(t)| ≥ J/√k] ∈ Ω(1).

Proof. We claim that w.l.o.g. (i) no other nodes relay information about r's clock values to v, (ii) r sends all messages at time t, and (iii) each message contains only the clock value at the time of sending. To see this, observe first that even if another node knew the state of r exactly, it could not do better than r itself, as its messages are subject to the same variance in delay as r's. Next, including several values into a single message does not help in estimating ov(t), as the crucial point is that the time of delivery of the message in comparison to the expected time of its delivery is unknown to both sender and receiver. Thus, all estimates that v derives on r's clock values are inflicted with exactly the same error due to jitter. Moreover, sending a different value than the one at the time of sending would only mean that v has to guess, based on its local clock and the messages from r, the value of Hv(t0) at the time t0 when r read the respective clock value. This, however, could only reduce the quality of the estimate. As we deliberately lifted any restrictions r had on sending times, there is no advantage in sending the message at a different time than t. Finally, since we excluded the use of any information on times earlier than t − kB in the preconditions of the lemma, r has no valuable information to share except its hardware clock reading at time t.
In summary, r can at best send k messages containing t at time t, such that v will learn that r sent k messages at time t that have been registered at local times Hv(t + Xi), where Xi, i ∈ {1, . . . , k}, are independent normally distributed random variables with zero mean and variance J². Since Xi is unknown to v, it cannot determine Hv(t) − t. The best it can do is to read the k values Hv(t + Xi) and take each value Hv(t + Xi) − t as an estimate. This


can be interpreted as k measurements of ov(t) suffering from independent normally distributed errors hvXi ∈ Θ(Xi) (as |hv − 1| = ρv ≤ ρ and ρ < 1 is a constant). Hence, ôv(t) is (at best) the mean of v's clock readings minus t. According to Lemma 2.19, this value is normally distributed with mean ov(t) and variance Θ(J²/k), which by Lemma 2.20 gives the claimed bound.

At first glance, it seems tempting to circumvent this bound by just increasing the time interval information is taken from. Indeed, this improves synchronization quality as long as the model assumption that clock rates and topology do not change remains (approximately) valid. As soon as conditions change quickly, the system will however require more time to adapt to the new situation, thus temporarily incurring larger clock skews.

The given bound on the synchronization quality between neighbors generalizes easily to multihop communication.

Corollary 4.2. Given a shortest path (v0 := r, v1, . . . , vd), assume that kd messages, for some k ∈ N, are sent and received by the nodes on the path within a time interval T := (t − kB, t] ⊂ R+0. Suppose vd computes an estimate ôvd(t) of its hardware clock offset ovd(t) at time t that does not rely on any events before T. Then the probability that ôvd(t) has an error of J√d/√k is constant, i.e.,

P[|ôvd(t) − ovd(t)| ≥ J√d/√k] ∈ Ω(1).

Proof. Assume w.l.o.g. that hvi = 1 for all i ∈ {1, . . . , d}. Denote by oi := ovi(t) − ovi−1(t), i ∈ {1, . . . , d}, the offset between the clocks of vi and vi−1. Consider the following scheme. First v1 determines an estimate ô1(t) of ov1(t) = o1, then v2 an estimate ô2 of the offset o2 towards v1, and so on. Thus, by incorporating the results into the messages, vi, i ∈ {1, . . . , d}, can estimate ovi(t) by ôvi(t) = Σ_{j=1}^{i} ôj(t). Since clocks do not drift and there are no "shortcuts" as (v0, . . . , vd) is a shortest path, this scheme is at least as good as an optimal one (obeying the model constraints).

Let ki, i ∈ {1, . . . , d}, denote the number of messages node vi receives from its predecessor. As seen in the proof of Lemma 4.1, ôi is normally distributed with mean oi and variance J²/ki. By Lemma 2.19, it follows that ôvd is normally distributed with mean ovd and variance Σ_{i=1}^{d} J²/ki. Because Σ_{i=1}^{d} ki = kd, this variance is minimized by the choice ki = k for all i ∈ {1, . . . , d}. We get that Var[ôvd(t)] ≥ J²d/k, which by Lemma 2.20 yields the desired statement.

Next, we infer our lower bound on the global skew.
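The variance bookkeeping in the proof above can be checked numerically. Lemmas 2.19 and 2.20 are not reimplemented here; the helper names are ours:

```python
J, d, k = 1.0, 8, 4  # jitter std, path length, messages per hop

def total_variance(ks):
    # Estimate errors are independent and normal, so variances add up:
    # Var[o_hat_vd] = sum_i J^2 / k_i, with k_i messages on hop i.
    return sum(J ** 2 / ki for ki in ks)

balanced = [k] * d                          # k_i = k on every hop
skewed = [1] * (d - 1) + [k * d - (d - 1)]  # same budget k*d, badly split
assert sum(balanced) == sum(skewed) == k * d

# The balanced split attains the minimum variance J^2 * d / k ...
assert total_variance(balanced) == J ** 2 * d / k
# ... and any unbalanced split with the same message budget is worse.
assert total_variance(skewed) > total_variance(balanced)
```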


Theorem 4.3. Suppose that k ∈ N and each node sends and receives on average at most one message in B time. If a clock synchronization algorithm determines Lv(t) at all nodes v ∈ V and times t ∈ R+0 depending on events that happened after time t − kB only, then at uniformly random times t from a sufficiently large time interval we have that

E[|Lv(t) − t|] ∈ Ω(J√d/√k),

where d is the distance of v from the root.

Proof. Let (v0 := r, v1, . . . , vd := v) denote a shortest path from r to v. Because all nodes receive on average at most one message in B time, for symmetry reasons we may w.l.o.g. assume that all estimates v obtains on its offset depend on messages along this path only. Let E be the event that at a time t sampled uniformly at random from a sufficiently large time period, the nodes vi, i ∈ {0, . . . , d − 1}, sent and received in total at most 2kd messages during the interval (t − kB, t]. Because nodes send and receive on average at most one message in B time, linearity of expectation and Markov's bound imply that the probability of E must be at least 1/2. By Corollary 4.2, we have that any estimate v may compute of Hv(t) − t has an error of J√d/√(2k) with at least constant probability, proving the claim.

Seen from a different angle, this result states how quickly the system may adapt to dynamics. It demonstrates a trade-off between the contradicting goals of minimizing message frequency, global skew, and the time period logical clock values depend on. Given a certain stability of clock rates and having fixed a desired bound on the global skew, for instance, one can derive a lower bound on the number of messages nodes must at least send in a given time period to meet these conditions. Similarly, the theorem yields a lower bound on the time span it takes until a node (re)joining the network may achieve optimal synchronization for a given message frequency, granted that the other nodes make no additional effort to support this end.
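The J√(d/k) standard deviation at the heart of this bound can be illustrated with a small Monte Carlo experiment. This is our own sketch under the Gaussian jitter model; the parameters are chosen arbitrarily:

```python
import math
import random

random.seed(0)
J = 1.0  # jitter standard deviation

def path_estimate_error(d, k):
    """One end-to-end offset estimate: at each of d hops, average k
    independent jitter samples (zero mean, std J) and sum the residuals."""
    return sum(sum(random.gauss(0, J) for _ in range(k)) / k
               for _ in range(d))

def empirical_std(d, k, runs=4000):
    errs = [path_estimate_error(d, k) for _ in range(runs)]
    mean = sum(errs) / runs
    return math.sqrt(sum((e - mean) ** 2 for e in errs) / runs)

# The empirical standard deviation tracks J * sqrt(d / k): it grows
# with the path length d and shrinks with the message budget k.
for d, k in [(16, 1), (16, 4), (64, 4)]:
    predicted = J * math.sqrt(d / k)
    assert abs(empirical_std(d, k) - predicted) / predicted < 0.15
```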

4.4 PulseSync

The central idea of the algorithm is to distribute information on clock values as fast as possible, while minimizing the number of messages required to do so. In particular, we would like to avoid that it takes Ω(BD) time until distant nodes learn about clock values broadcast by the root node r. Obviously, a node cannot forward any information it has not received yet, enforcing that information flow is directed. An intermediate node on a line topology has to wait for at least one message from a neighbor. On the other hand, after


reception of a message it ought to forward the derived estimate as quickly as possible in order to spread the new knowledge throughout the network. Thus, we naturally end up with flooding a pulse through the network. In order to keep the number of hops small, the flooding takes place on a breadth-first search tree. To keep clock skews small at all times, each node v ∈ V does not only minimize its offset towards the root whenever receiving a message, but also employs a drift compensation, i.e., tries to estimate hv and increase its logical clock at the speed of Hv divided by this estimate. Considering that we modeled Hv as an affine linear function and the fluctuations of message delays as independently normally distributed random variables, linear regression is a canonical choice as a means to compute Lv(t) out of Hv(t) and the last k clock updates received.

We need to specify how nodes that are not children of the root obtain accurate estimates of r's clock. Recall that nodes are not able to send a message at arbitrary times. Thus, it is necessary to account for the time span τv that passes between the time when node v ∈ V receives a clock estimate from a parent and the time when it can send a (derived) estimate to its children. The simplest approach here is that if v obtains an estimate t̂ of the root's clock value Lr(t) = t from a message received at time t, it sends at time t + τv the value t̂ + (Hv(t + τv) − Hv(t)) to its children. Thus, the quality of the estimate will deteriorate by at most

|(Hv(t + τv) − Hv(t)) − ((t + τv) − t)| = |hv − 1|τv ≤ ρτv.

We will refer to this as simple forwarding. Intuitively, granted that τv is small enough, i.e., max_{v∈V} {ρvτv} ≪ J/√D (here √D comes into play as jitters are likely to cancel out partially), the additional error introduced by simple forwarding is dominated by message jitter and thus negligible. In our test setting, this technique already turned out to be sufficient for achieving good results.
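Simple forwarding can be sketched as follows (our illustration; the function and variable names are assumptions, and the worst-case error bound ρτv is checked on a deliberately fast clock):

```python
def simple_forward(H_v, t_receive, tau_v, t_hat):
    """Simple forwarding: advance the received root-clock estimate t_hat
    by the locally measured wait H_v(t + tau_v) - H_v(t) before resending."""
    return t_hat + (H_v(t_receive + tau_v) - H_v(t_receive))

rho, tau_v = 5e-5, 0.2
h_v = 1 + rho                   # worst-case fast hardware clock
H_v = lambda t: 3.7 + h_v * t   # hardware clock of v (offset is irrelevant)
t, t_hat = 100.0, 100.0         # perfect estimate received at real time t

sent = simple_forward(H_v, t, tau_v, t_hat)
# The estimate deteriorates by at most |h_v - 1| * tau_v = rho * tau_v.
assert abs(sent - (t + tau_v)) <= rho * tau_v + 1e-12
```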
However, this might not be true in general, due to different hardware, larger networks, harsh environments, etc. Hence we devise a slightly more sophisticated scheme we call stabilized forwarding. As discussed before, it is fatal to replace the term Hv(t + τv) − Hv(t) by Lv(t + τv) − Lv(t), i.e., to approximate the progress of real time by means of the regression line that is computed partially based on the estimate t̂ obtained at time t. Instead, we use an independent estimate ĥv of hv to compensate the drift. To this end, given k ∈ 2ℕ, node v ∈ V computes the regression line defining Lv according to the k/2 most recent messages only. The remaining k/2 messages nodes may take information from are used to provide clock


CHAPTER 4. SYNCHRONIZATION IN WIRELESS NETWORKS

estimates with simple forwarding. From these values a second regression line is determined, whose slope s should be close to 1/hv. As we know that hv ∈ [1 − ρ, 1 + ρ], nodes set ĥv to 1 − ρ if the outcome is too small and to 1 + ρ if it is too big. All in all,

\[
\hat h_v := \begin{cases} 1-\rho & \text{if } 1/s \le 1-\rho\\ 1+\rho & \text{if } 1/s \ge 1+\rho\\ 1/s & \text{otherwise.} \end{cases}
\]

Apart from sending t̂ + Hv(t + τv) − Hv(t) at time t + τv after receiving a message at time t, node v now also includes the value

\[
\hat t + \frac{H_v(t+\tau_v) - H_v(t)}{\hat h_v}
\]

into the message. This (usually) more precise estimate is then used to derive the regression line defining Lv from the k/2 most recent messages. Obviously, one cannot use stabilized forwarding until nodes have received sufficiently many clock estimates; for simplicity, we disregard this in the pseudocode of the algorithm. We remark that a similar approach has been proposed for high-latency networks where the drift during message transfer is a major source of error [101].

The pseudocode of the algorithm for non-root nodes is given in Algorithm 4.2, whereas the root follows Algorithm 4.1. In the abstract setting, a message needs to contain the two estimates of the root's clock value only. For clarity, we utilize sequence numbers i ∈ ℕ, initialized to one, in the pseudocode of the algorithm. In practice, a message may contain additional useful information, such as an identifier, the identifier of the (current) root, or the (current) depth of a node in the tree. For the root node, the logical clock is simply identical to the hardware clock. Any other node computes Lv(t) as the linear regression of the k/2 most recently stored pairs of hardware clock values and the corresponding estimates with stabilized forwarding, evaluated at Hv(t). As stated before, the value ĥv(t) is computed out of the k/2 estimates with simple forwarding from the preceding pulses, as the inverse slope of the linear regression of these values.

Algorithm 4.1: Whenever Hr(t) mod B = 0 at the root node r.
1: wait until time t + τr when allowed to send
2: send ⟨t + τr, t + τr, i⟩   // recall that Hr(t + τr) = t + τr
3: i := i + 1


Algorithm 4.2: Node v ≠ r receives its parent's message ⟨t̂, t̃, i⟩ with sequence number i at local time Hv(t).
1: delete ⟨·, ·, ·, i − k + 1⟩
2: store ⟨Hv(t), t̂, t̃, i⟩
3: wait until time t + τv when allowed to send
4: send ⟨t̂ + Hv(t + τv) − Hv(t), t̃ + (Hv(t + τv) − Hv(t))/ĥv, i⟩
5: i := i + 1
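One message-processing step of Algorithm 4.2, together with the clamping of the rate estimate described above, might look as follows in Python. This is a hedged sketch, not the reference implementation: the table layout, the constants, and the function names are our assumptions, and a real implementation must additionally handle MAC-layer timestamping and message loss.

```python
# Illustrative sketch of one step of Algorithm 4.2 and the ĥ_v clamping
# (names, constants, and table layout are assumptions).

RHO = 1e-4  # assumed bound on the hardware clock drift
K = 8       # regression table size k (an even number)

def clamp_rate_estimate(s):
    """Clamp the inverse regression slope to [1 - RHO, 1 + RHO],
    mirroring the case distinction defining ĥ_v."""
    inv = 1.0 / s
    if inv <= 1.0 - RHO:
        return 1.0 - RHO
    if inv >= 1.0 + RHO:
        return 1.0 + RHO
    return inv

def on_message(table, H_recv, t_hat, t_tilde, i, H_send, h_hat):
    """Process message ⟨t̂, t̃, i⟩ received at hardware time H_recv and
    forwarded at hardware time H_send, so H_send - H_recv plays the role
    of H_v(t+τ_v) - H_v(t). Returns the forwarded message: the first
    component is simply forwarded, the second is drift-compensated."""
    table[:] = [e for e in table if e[3] > i - K]  # drop entries older than k pulses
    table.append((H_recv, t_hat, t_tilde, i))      # store ⟨H_v(t), t̂, t̃, i⟩
    elapsed = H_send - H_recv
    return (t_hat + elapsed, t_tilde + elapsed / h_hat, i)
```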

4.5 Analysis

In this section, we will prove a strong probabilistic upper bound on the global skew of PulseSync. To this end, we will first derive a bound on the accuracy of the estimates nodes compute of their hardware clock rates. Then we will proceed to bounding the clock skews themselves.

Definition 4.4 (Pulses). Pulse i ∈ ℕ is complete when all messages with sequence number i have been sent. We say that pulses are locally separated if for all i ∈ ℕ each node sends its message with sequence number i at least αB time before receiving the one with sequence number i + 1, where α ∈ ℝ⁺ is a constant.

After the initialization phase is over, i.e., as soon as all nodes could fill their regression tables, nodes are likely to have good estimates of their clock rates. Interestingly, the quality of the estimates is independent of the hardware clock drifts, as the respective systematic errors are the same for all estimates of the root's clock and thus cancel out when computing the slope of the regression line.

Lemma 4.5. For v ∈ V and arbitrary δ ∈ ℝ⁺ define

\[
\Delta_h := \min\left\{2\rho,\ \frac{\delta J\sqrt{D}}{k^{3/2}B}\right\}.
\]

Suppose that pulses are locally separated. Then, at any time t when at least k ∈ 2ℕ pulses are complete, it holds that

\[
P\left[\left|\frac{h_v}{\hat h_v(t)} - 1\right| \le \Delta_h\right] \in 1 - e^{-\Omega(\delta^2\log\delta)}.
\]

Proof. Assume that (v0 := r, v1, . . . , vd := v) is the path from r to v in the BFS tree (i.e., in particular d ≤ D). Consider a simply forwarded estimate t̂ that has been received by v at time t. Backtracking the sequence of messages

32

CHAPTER 4. SYNCHRONIZATION IN WIRELESS NETWORKS

leading to this value and applying Lemma 2.19, we see that r sent its respective message at a time that is normally distributed around t − Σ_{i=0}^{d−1} τ_{v_i} with variance dJ². Thus, since node v_i increases each simply forwarded estimate at rate h_{v_i}, t̂ − t is normally distributed with mean Σ_{i=0}^{d−1} (h_{v_i} − 1)τ_{v_i} and variance at most

\[
\sum_{i=0}^{d-1} ((1+\rho_{v_i})J)^2 \le 4dJ^2.
\]

By Theorem 2.22, the slope ŝ of the regression line v computes is thus normally distributed with mean 1/hv and variance

\[
O\left(\frac{dJ^2}{h_v k^3 B^2}\right) \subseteq O\left(\frac{DJ^2}{k^3 B^2}\right).
\]

Here we used that pulses are locally separated, implying that Σ_{i=1}^{N} (x_i − x̄)² ∈ Ω(hv k³B²) (in terms of the notation from the theorem). Recall that hv ≥ 1 − ρv ≥ 1 − ρ > 0 and we made sure that ĥv ∈ [1 − ρ, 1 + ρ]. Thus, we can infer that the error |hv/ĥv − 1| = |hv ŝ − 1| is bounded both by 1/(1 − ρ) ∈ O(1) times the deviation of the slope from its mean and by 2ρ. Hence, the claim follows by Lemma 2.20.

Based on the preceding observations, we can now prove a bound on the skew between a node and the root.

Theorem 4.6. Suppose pulses are locally separated. Denote by (v0 := r, v1, . . . , vd := v) the shortest path from the root to v ∈ V along which estimates of r's clock values are forwarded. Set Tv := Σ_{i=0}^{d−1} τ_{v_i}, i.e., the expected time an estimate "travels" along the path. Suppose t1 < t2 are two consecutive times when v receives a message and suppose that at time t1 at least 3k/2, k ∈ 2ℕ, pulses are complete. Then for any δ, ε ∈ ℝ⁺ and Δh as in Lemma 4.5 it holds that

\[
P\left[\forall t\in[t_1,t_2):\ |L_v(t)-t| \le \varepsilon J\sqrt{\frac{D}{k}} + \Delta_h T_v\right] \in 1 - \frac{kD}{2}e^{-\Omega(\delta^2\log\delta)} - e^{-\Omega(\varepsilon^2\log\varepsilon)}.
\]

Proof. Since at least 3k/2 pulses are complete, according to Lemma 4.5 during the last k/2 pulses we had at any time t and for any node v_i, i ∈ {0, . . . , d − 1}, that |h_{v_i}/ĥ_{v_i}(t) − 1| ≤ Δh with probability 1 − e^{−Ω(δ² log δ)}. Denote by E the event that the last k/2 estimates with stabilized forwarding that v received have been increased at all nodes on the way at a rate differing by no more than Δh from 1. Since ĥ_{v_i} only changes when a message is received, we can apply the union bound to see that E occurs with probability at least 1 − (kD/2)e^{−Ω(δ² log δ)}.


Assume now that E happened and also that 1 − (kD/2)e^{−Ω(δ² log δ)} > 0 (as otherwise the bound is trivially satisfied). Consider the errors of the above k/2 estimates. Each estimate has been "on the way" for expected time Tv, i.e., the absolute value of the mean of its error is bounded by Δh Tv. The remaining fraction of the error is due to message jitter. Note that if the estimates ĥ_{v_i} are very bad, this might amplify the effect of message jitter. However, since the ĥ_{v_i} are uniformly bounded by 1 − ρ and 1 + ρ, we can account for this effect by multiplying J with a constant. Thus, we can still assume that at each hop, a normally distributed random variable with zero mean and variance O(J²) is added to the respective estimate of r's current clock value, yielding a random experiment which stochastically dominates the true setting (with respect to clock skews). In summary, by Lemma 2.19 w.l.o.g. each estimate that v obtains suffers an independently and normally distributed error with mean µ ∈ [−Δh Tv, Δh Tv] and variance O(dJ²) ⊆ O(DJ²).

By Theorem 2.22, the slope of the regression line utilized to compute Lv suffers a normally distributed error of zero mean and standard deviation O(J√D/(k^{3/2}B)). Denote by t̄ the mean of the times when v received the k/2 messages it computes the regression from. As for all times t̄ < t ∈ [t1, t2) we have that t − t̄ ≤ (k/2 + 1)B, we can bound³

\[
|L_v(t) - t| \le |L_v(\bar t) - \bar t| + \left|\left(\frac{h_v(t)}{\hat h_v} - 1\right)(\bar t - t)\right|,
\]

where the second term is normally distributed with zero mean and standard deviation O(J√(D/k)). Again by Theorem 2.22, the deviation of the line at the time t̄ itself is normally distributed with mean µ and a standard deviation of O(J√(D/k)). Thus, applying Lemma 2.20 and the union bound yields that, conditional on E, the event E′ that

\[
\forall t\in[t_1,t_2):\ |L_v(t)-t| \le \varepsilon J\sqrt{\frac{D}{k}} + \Delta_h T_v
\]

occurs with probability at least 1 − e^{−Ω(ε² log ε)}. We conclude that

\[
P[E'] \ge P[E]\cdot P[E'\,|\,E] \in \left(1 - \frac{kD}{2}e^{-\Omega(\delta^2\log\delta)}\right)\left(1 - e^{-\Omega(\varepsilon^2\log\varepsilon)}\right) \subseteq 1 - \frac{kD}{2}e^{-\Omega(\delta^2\log\delta)} - e^{-\Omega(\varepsilon^2\log\varepsilon)},
\]

as claimed.

³Note that the use of the expression Lv(t̄) here is an abuse of notation, as we refer to the y-value that the regression line v computes at time t assigns to the x-value Hv(t̄).
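The role of the O(dJ²) variance bound used in the proof can be illustrated numerically: summing d independent per-hop jitters yields an end-to-end standard deviation of order J√d rather than dJ, i.e., jitters partially cancel out. The following Monte Carlo snippet is our own illustration, not from the thesis.

```python
# Monte Carlo illustration (ours): the end-to-end jitter after d hops of
# independent zero-mean Gaussian per-hop jitter with standard deviation J
# has standard deviation about J*sqrt(d), not d*J.

import math
import random

def end_to_end_jitter(d, J, trials, rng):
    """Empirical standard deviation of the sum of d independent
    zero-mean Gaussian per-hop jitters with standard deviation J."""
    samples = []
    for _ in range(trials):
        samples.append(sum(rng.gauss(0.0, J) for _ in range(d)))
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return math.sqrt(var)

rng = random.Random(42)
sigma = end_to_end_jitter(d=25, J=1.0, trials=4000, rng=rng)
# sigma comes out close to sqrt(25) = 5 rather than 25
```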


The term Tv occurring in the bound provided by the theorem motivates the following definition.

Definition 4.7 (Pulse Time). Denote for v ∈ V by (v0 := r, v1, . . . , vd := v) the shortest path from r to v along which PulseSync sends messages and set Tv := Σ_{i=0}^{d−1} τ_{v_i}. The pulse time then is defined as

\[
T := \max_{v\in V}\{T_v\}.
\]
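Definition 4.7 can be evaluated directly from the tree paths. A minimal sketch follows; the data layout and the example delays are our assumptions for illustration.

```python
# Minimal sketch of Definition 4.7 (illustrative; the dictionaries and
# the example delays are hypothetical).

def pulse_time(paths, tau):
    """paths: node -> list of forwarding nodes (r = v_0, ..., v_{d-1}) on
    the path toward that node; tau: node -> forwarding delay τ_v.
    Returns T = max over nodes of T_v = sum of the τ along the path."""
    return max(sum(tau[u] for u in path) for path in paths.values())

tau = {"r": 0.01, "a": 0.02, "b": 0.03}
paths = {"a": ["r"], "b": ["r", "a"]}  # b is reached via r and then a
```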

Theorem 4.6 implies that the proposed technique is indeed optimal provided a comparatively weak relation between B and T is satisfied.

Corollary 4.8. Suppose pulses are locally separated and that

\[
B \ge \frac{T}{k}\sqrt{\frac{\log(kD)}{\log\log(kD)}}.
\]

Then the expected clock skew of any node v ∈ V in distance d from the root at any time t when at least 3k/2 pulses are complete is bounded by

\[
E[|L_v(t)-t|] \in O\left(J\sqrt{\frac{d}{k}}\right).
\]

Proof. W.l.o.g., we assume that d = D (otherwise just consider the subgraph induced by all nodes within distance d from r). For i ∈ ℕ, set

\[
\Delta_h(i) := \frac{iJ}{k^{3/2}B}\sqrt{\frac{D\log(kD)}{\log\log(kD)}}.
\]

By assumption, we have that

\[
\Delta_h(i)T_v \le \frac{iJ T}{k^{3/2}B}\sqrt{\frac{D\log(kD)}{\log\log(kD)}} \le iJ\sqrt{\frac{D}{k}},
\]

giving by Theorem 4.6 for δ = i√(log(kD)/log log(kD)) and ε = i that

\[
P\left[|L_v(t)-t| > 2iJ\sqrt{\frac{D}{k}}\right] \in e^{-\Omega(i^2\log i)}.
\]

It follows that

\[
E[|L_v(t)-t|] \le \sum_{i=0}^{\infty} P\left[|L_v(t)-t| > 2iJ\sqrt{\frac{D}{k}}\right]\cdot 2J\sqrt{\frac{D}{k}} \in \left(1 + \sum_{i=1}^{\infty} e^{-\Omega(i^2\log i)}\right)2J\sqrt{\frac{D}{k}} \subseteq O\left(J\sqrt{\frac{D}{k}}\right).
\]


Note that Corollary 4.8 requires a lower bound on B, while in practice we are interested in choosing B large to minimize energy consumption. Of course, a beacon interval that is too large is undesired because one wants the system to adapt quickly to dynamics. However, the pulse time is a trivial lower bound on this response time, i.e., it does not make sense to choose Bk ∈ o(T). Thus, we remain with a small gap of O(√(log(kD)/log log(kD))) to countervail the fact that the drift compensations the nodes on a path of length D employ are dependent, making best use of the limited number of recent clock estimates that are available.

In practice, it is important to bound clock skews at all times and at all nodes, as algorithms may fail if presumed bounds are violated even once. Naturally, a probabilistic bound cannot hold with certainty; indeed, in our model arbitrarily large skews must occur if we just wait for sufficiently long. However, for time intervals that are bounded by a polynomial in n times B, we can state a strong bound that holds w.h.p.

Corollary 4.9. For i ∈ ℕ, let t_i denote the time when the ith pulse is complete. Provided that the prerequisites of Theorem 4.6 are satisfied, the total number of pulses p is polynomial in n, and B ≥ T/k, we have that

\[
\max_{t\in[t_{3k/2},\,t_p]}\{G(t)\} \in O\left(J\sqrt{\frac{D\log n}{k\log\log n}}\right)
\]

w.h.p.

Proof. Observe that 3k/2 ≤ p, or nothing is to show as t_p < t_{3k/2}. As also D < n, we have that kD is polynomially bounded in n. Thus, values δ, ε ∈ O(√(log n/log log n)) exist such that

\[
1 - \frac{kD}{2}e^{-\Omega(\delta^2\log\delta)} - e^{-\Omega(\varepsilon^2\log\varepsilon)} \ge 1 - \frac{1}{pn^{c+1}}.
\]

We apply Theorem 4.6 to each node v ∈ V and each pulse i ∈ [3k/2, p]. Due to the bound B ≥ T/k ≥ Tv/k and the definition of Δh, we get for all times t from the respective pulse that

\[
|L_v(t)-t| \in O\left(J\sqrt{\frac{D\log n}{k\log\log n}}\right) + \Delta_h T_v \subseteq O\left(J\sqrt{\frac{D\log n}{k\log\log n}}\right)
\]

with probability at least 1 − 1/(pn^{c+1}). The statement of the corollary then follows by the union bound applied to all nodes and all pulses i ∈ [3k/2, p].


[Figure 4.6 shows simulation results: the global skew (µs) as a function of the network diameter, for regression table sizes 2, 8, and 32.]

Figure 4.6: Simulations of PulseSync on line topologies. Results agree well with predictions. In particular, it can be observed that the global skew grows roughly as the square root of the network diameter. For details the reader is referred to [63].

Note that when considering all nodes, the lower bound on B relaxes to T/k, i.e., we can achieve the stated bound despite the fastest possible adaptation to dynamics. Moreover, comparing the previous bounds to the results from simulation and implementation of the algorithm (Figures 4.6 and 4.7), we find the predictions on the synchronization quality of the algorithm well met.

4.6 Concluding Remarks

In this chapter, we proposed a clock synchronization algorithm that exhibits asymptotically optimal synchronization quality in the given model. Testbed results from a prototypical implementation indicate that our abstract model is appropriate for capturing crucial aspects of sensor networks. Considering that we derive skew bounds that are independent of hardware clock drifts


[Figure 4.7 shows the measured global skew (µs) over time (s) for a testbed execution.]

Figure 4.7: Global skew of an execution of PulseSync with B = 30 s and k = 8 on a line topology comprising 20 Mica2 sensor nodes. For times t > 100 s, we observed a maximum skew of 38 µs. See [63] for details.

provided that jitter is not too large, our results may also be of interest for high-latency networks, e.g., acoustic underwater networks. However, our presentation has not been exhaustive in the sense that one can effortlessly derive a protocol suitable for practice, as several issues still need to be resolved. The test runs of the algorithm were executed on a simple line topology in a controlled environment. In order to finish pulses quickly on arbitrary topologies, an efficient broadcast routine is needed. This problem has been studied extensively and is by now well understood [7, 17, 18, 21, 87], hence we refrain from a discussion here. We confine ourselves to mentioning that it is sufficient to solve the broadcast problem on a sparse backbone of the network, since the remaining nodes may simply listen to pulses and derive their clocks without ever transmitting themselves. Computing such a backbone, i.e., a connected dominating set, can also be done efficiently [98]. Finally, in order to exploit the full potential of the algorithm, a good implementation must deal with message loss, changes in topology, and varying


hardware clock rates. Ideally, one would choose kB adaptively, reducing it in the face of high volatility of clock rates and/or connectivity, and increasing k (to gain synchronization quality) or B (to save energy) again when the system becomes more stable.

Chapter 5

Gradient Clock Synchronization

“I got used to switching off once they started talking about this.”
– Yvonne-Anne Pignolet on my lively discussions with Thomas Locher regarding clock synchronization.

In the previous chapter, we considered clock synchronization in the context of a concrete system. Exploiting the properties of wireless networks, we obtained a bound of O(√(D log n/log log n)) on the global skew that holds w.h.p. This result is however fragile with respect to changes in the model. We made strong assumptions, in particular independently normally distributed deviations in message transmission times, constant clock drifts, and a fixed topology. Moreover, PulseSync does not attempt to minimize the local skew. In fact, in a cycle the two leaves of a BFS tree are likely to be the pair of nodes experiencing the largest clock skew, yet they are neighbors.

We drop the former assumptions in favor of a worst-case model with regard to communication, clock drifts, and topology changes. We will devise an algorithm featuring an optimal gradient property as introduced in [33], i.e., the worst-case skew between any two nodes that have been connected by a path of length d for sufficiently long is an asymptotically minimal function of d. At the same time, the algorithm is capable of extending the gradient property to newly appearing edges as quickly as possible.

This chapter is based on joint work with Fabian Kuhn, Thomas Locher, and Rotem Oshman [50]. In large parts, it builds on a preceding line of publications together with Thomas Locher [58, 59, 60]. The proposed gradient clock synchronization algorithm and its analysis have been extended from the model considered in these articles to the one introduced by Kuhn et al. [52], which differs mainly in that the graph changes dynamically. A proof of the


gradient property of the algorithm in the general model is given in [51]; we strive for a simplified, more accessible presentation of the core concepts in this thesis. For a detailed introduction to the static setting and its analysis we refer the interested reader to [71].

5.1 Model

We point out that in the sequel we will adopt a model as simple as possible that still captures the main aspects of the problem. As a consequence, some of the assumptions in the model may seem artificial at first glance. For the sake of a straightforward presentation, we postpone the respective discussion to Section 5.6.

Similarly to Chapter 4, each device v ∈ V is equipped with a differentiable hardware clock Hv : ℝ⁺₀ → ℝ⁺₀, where Hv(0) := 0. The drift of Hv is bounded, i.e.,

\[
h_v(t) := \frac{d}{dt}H_v(t) \in [1-\rho, 1+\rho],
\]

where 0 < ρ ≪ 1 is the maximum clock drift, or simply drift. As before, v neither knows the real time t nor the rate at which Hv (currently) increases, but may read Hv(t). At all times t, node v must compute a (differentiable) logical clock Lv(t), where Lv(0) := 0. We require that the logical clock Lv : ℝ⁺₀ → ℝ⁺₀ also progresses at a controlled speed, i.e., there is a µ ∈ O(1) such that

\[
l_v(t) := \frac{d}{dt}L_v(t) \in [1-\rho, (1+\mu)(1+\rho)]. \tag{5.1}
\]

Node v is called fast at time t if lv(t) = (1 + µ)hv(t) and slow at time t if lv(t) = hv(t). The margin µ is introduced to permit v to increase its logical clock by a factor of 1 + µ faster than its hardware clock in order to catch up with other nodes. Since nodes do not know hv(t), the condition on lv imposes that lv(t) ∈ [hv(t), (1 + µ)hv(t)] at all times. Thus it must hold that

\[
\mu > \frac{2\rho}{1-\rho},
\]

as otherwise it might happen that

\[
(1+\mu)h_v(t) \le \left(1 + \frac{2\rho}{1-\rho}\right)(1-\rho) = 1+\rho = h_w(t)
\]

for some v, w ∈ V, rendering it impossible for v to catch up with w.

We have not specified yet what information v ∈ V may access in order to compute Lv(t). Apart from its hardware clock readings, each node v ∈ V has a dynamic set of neighbors Nv : ℝ⁺₀ → 2^V. This relationship is symmetric,


i.e., it induces a simple dynamic graph G(t) = (V, E(t)), t ∈ ℝ⁺₀. We say that edge e exists during the time interval [t1, t2] if e ∈ E(t) for all t ∈ [t1, t2]. The statement that {v, w} ∈ E(t) at time t is equivalent to v having an estimate L̂_v^w(t) of w's logical clock value Lw(t) at time t and vice versa. This estimate is inaccurate; it may be off by the uncertainty U ∈ ℝ⁺, i.e.,

\[
\forall t\in\mathbb{R}^+_0:\ |\hat L^w_v(t) - L_w(t)| \le U.
\]

The value of U depends on several factors, such as fluctuations in message transmission times, the manner in which estimates are obtained, clock drifts, the frequency at which nodes communicate in order to update their estimates, etc.

A fundamental lower bound [16] shows that in a static network, the global skew grows linearly with the network diameter. In dynamic networks there is no immediate equivalent to a diameter. Informally, the diameter corresponds to the number of hops it takes (at most) for information to spread from one end of the network to the other. To formalize this idea we adopt the following definitions.

Definition 5.1 (Flooding). A flooding that originates at node v is a process initiated when node v sends a flooding message to all its neighbors. We normalize the maximal message delay, i.e., each individual message is in transit for at most one time unit. Each node that receives the message for the first time forwards it immediately to all its neighbors. We say that the flooding is complete when all nodes have received a flooding message.

Definition 5.2 (Diameter and Flooding Jitter). We say that the dynamic graph G has a diameter of D if a flooding originating at any node and at any time always completes in at most D time units. It has a flooding jitter of J_D ∈ [0, D] if each node can determine up to J_D the time span between a flooding being initiated and the node receiving the first flooding message.

Note that in general J_D might be very different from DU. On the one hand, nodes might, e.g., use reference broadcasts [30, 55] to obtain accurate local estimates U, while flooding suffers from large jitters J at each hop, implying that J_D ≥ DJ ≫ DU.¹ On the other hand, nodes might derive local estimates from infrequent direct communication every τ time. Observe that U ≥ 2ρτ because logical clocks must run at least at the hardware clock speed, i.e., in the absence of new information estimated and true clock values may drift apart at rate 2ρ. Thus, if τ ≫ 1 + J/ρ, we have DU ≥ 2ρτD ≫ (ρ + J)D. As it is possible to ensure that J_D ∈ O((ρ + J)D), in this case we have J_D ≪ DU.

¹The first inequality follows from the fact that in the worst case either all jitters may increase the time the flooding takes or all may decrease it. This is formalized by well-known shifting arguments (cf. [74]).
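The two regimes compared above can be made concrete with numbers. The following values are entirely hypothetical illustrations, not measurements from the thesis.

```python
# Hypothetical numbers illustrating the two regimes (ours, not from the thesis).

# Regime 1: reference broadcasts give very accurate local estimates U,
# while each flooding hop suffers jitter J, so J_D >= D*J >> D*U.
D = 100          # diameter (hops)
J = 1.0e-3       # per-hop jitter (s)
U_rb = 1.0e-6    # local uncertainty with reference broadcasts (s)
J_D_lower = D * J

# Regime 2: direct communication only every τ time with drift ρ, so
# U >= 2*ρ*τ, while flooding can achieve J_D in O((ρ + J) * D) << D*U.
rho = 1.0e-4
tau = 60.0
U_direct = 2 * rho * tau
J_D_flood = (rho + J) * D
```

With these numbers, J_D_lower = 0.1 s dwarfs D·U_rb = 10⁻⁴ s in the first regime, whereas D·U_direct = 1.2 s dwarfs J_D_flood = 0.11 s in the second.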


To measure the quality of an algorithm, we consider two kinds of requirements: a global skew constraint, which gives a bound on the difference between any two logical clock values in the system, and a dynamic gradient skew constraint, which becomes stronger if nodes have been connected by a short path for a sufficiently long time. In particular, for nodes that remain neighbors for a long time, the dynamic gradient skew constraint requires a much smaller skew than the global skew constraint. In the probabilistic model of the previous chapter, we considered the maximum clock skew G(t) at a given time t, since skews could become arbitrarily large. In the worst-case setting it makes sense to use the following more stringent definition.

Definition 5.3 (Global Skew). A clock synchronization algorithm A has a global skew of G(D, J_D, ρ), if for any execution of A with drift ρ and flooding jitter J_D on any dynamic graph of diameter D it holds that

\[
\forall v,w\in V,\ t\in\mathbb{R}^+_0:\ |L_v(t) - L_w(t)| \le G.
\]

For notational convenience, we omit the arguments D, J_D, and ρ of G in the following. Informally, no matter what the graph or the execution, skews should at all times be bounded by a function of the parameters D, J_D, and ρ only.

An ideal dynamic gradient skew constraint would state that any two nodes that are close in G(t) have closely synchronized logical clocks. Apparently, this is not possible immediately after a "shortcut" has been formed, as it takes time to reduce skews between previously distant nodes. Rather, any path between two nodes v, w ∈ V that has been part of the graph for sufficiently long imposes a constraint on |Lv − Lw| that is tighter the shorter the path is. To state precisely what "sufficiently long" means, we need the following definition.

Definition 5.4 (Stable Subgraphs). For T ∈ ℝ⁺, the T-stable subgraph of G is defined as

\[
G_T(t) := \left(V,\ \left\{e \in \binom{V}{2}\ \middle|\ \forall t'\in[t-T,t]:\ e\in E(t')\right\}\right).
\]

For two nodes v, w ∈ V we denote by d_T(v, w) the distance of v and w in G_T.

We could now formulate the dynamic gradient skew as a function bounding the skew on each path in terms of its length and the time span it existed without interruption. However, after a certain amount of time T has passed, skew bounds have converged to a value depending on the length of the path


only. It turns out that skews on such a path may remain large for Ω(T) time. For that reason, we simply express the dynamic skew constraint in terms of the distance between two nodes in the T-stable subgraph of G.

Definition 5.5 (Stable Gradient Skew and Stabilization Time). We say that Algorithm A exhibits a stable gradient skew of S(G, ρ, U) : ℝ⁺ → ℝ⁺ with stabilization time T, if for any execution of A with global skew G, drift ρ, and uncertainty U it holds that

\[
\forall v,w\in V,\ t\in\mathbb{R}^+_0:\ |L_v(t) - L_w(t)| \le S(d_T(v,w)).
\]

We will omit the arguments G, ρ, and U of S from the notation. Of particular interest is the stable local skew S(1) of an algorithm, as any two nodes that have been neighbors for T time are guaranteed to have logical clock values that are merely S(1) apart, which for the algorithm we are going to propose is typically small compared to G.
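Definition 5.4 translates directly into code once edge presence is recorded as intervals. The representation below is our own assumption for illustration; the thesis specifies only the set-theoretic definition.

```python
# Sketch (illustrative data representation): given, for each edge, the
# intervals during which it exists, compute the edge set of the T-stable
# subgraph G_T(t), i.e., the edges present throughout [t - T, t].

def stable_edges(edge_intervals, t, T):
    """edge_intervals: dict mapping an edge (a frozenset of two nodes)
    to a list of (start, end) intervals during which the edge exists."""
    stable = set()
    for edge, intervals in edge_intervals.items():
        if any(s <= t - T and t <= e for s, e in intervals):
            stable.add(edge)
    return stable

intervals = {
    frozenset({"u", "v"}): [(0.0, 100.0)],              # long-lived edge
    frozenset({"v", "w"}): [(0.0, 3.0), (9.5, 100.0)],  # recently reappeared
}
# At time t = 10 with T = 5, only {u, v} has existed throughout [5, 10].
```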

5.2 Overview

We will start by describing a simple technique to obtain a global skew of

\[
G := (1-\rho)(J_D + 1) + 2\rho\left(D + 2 + \frac{\Lambda}{1+\rho}\right)
\]

for arbitrary Λ > 0. From results presented in [60], we will infer that this bound is essentially optimal if algorithms are required to guarantee the best possible envelope condition

\[
\forall v\in V,\ t\in\mathbb{R}^+_0:\ |L_v(t) - t| \le \rho t,
\]

i.e., logical clocks do not differ further from the real time t than the hardware clocks do in the worst case. Note that even if this condition is dropped, one cannot reduce the global skew by more than roughly a factor of two [16]. Observe that for maintaining a small global skew it is sufficient that nodes with the largest logical clock value in the system do not run faster than nodes with the smallest clock value whenever skews become "critical". Thus, it is not surprising that the respective rules never conflict with our goal of ensuring a strong gradient property.

Subsequently, we expound our gradient clock synchronization algorithm A_µ. Given G, ρ, and U, for any constant µ ≥ 4ρ/(1 − ρ) it guarantees a stable gradient skew of

\[
S_\mu(d) \in \Theta\left(U d \log_{\mu/\rho}\frac{G}{Ud}\right)
\]


with stabilization time T_µ ∈ Θ(G/µ). In fact, any µ larger than the minimum of 2ρ/(1 − ρ) will do. However, the base of the logarithm in the skew bound tends to 1 as µ approaches 2ρ/(1 − ρ), and the stabilization time deteriorates like the function f(x) = 1/x for x → 0. Thus, for µ ∈ Θ(ρ), we have a large stabilization time of Θ(G/ρ) and a constant base, but in turn the logical clocks mimic the behavior of the hardware clocks with a slightly worse drift of µ + (1 + µ)ρ ∈ Θ(ρ). On the other hand, choosing µ ∈ Θ(1) leads to a stabilization time of Θ(G) and a large base of the logarithm. Considering that typically D is smaller than 1/ρ (and certainly, say, ρ⁻⁵), the latter choice in practice implies a local skew that is at most a constant multiple of U. Moreover, if the longest path in G_{T_µ} has a length of O(D), i.e., the diameter of G is not small because of many short but unreliable paths, we obtain a stronger bound on the maximum clock skew. Slightly abusing notation,² we show that G ∈ O(UD) and thus

\[
S_\mu(d) \in O\left(U d \log_{\mu/\rho}\frac{D}{d}\right).
\]

In other words, the rules employed to keep the local skew small by themselves impose a linear bound on the global skew. In a highly dynamic setting, however, the flooding technique presented in Section 5.3 is more reliable, as it does not build on the existence of (short) stable paths.

The above gradient property is asymptotically optimal, as already in static graphs one can enforce comparable skews [60]. Indeed, A_µ is (1 + o(1))-competitive³ in this regard. It is important to mention that a second lower bound, also given in [60], proves that µ ∈ ω(1) is of no avail when trying to obtain a stronger gradient property. Intuitively, this arises from the fact that increasing clock speeds by more than a constant factor "amplifies" clock drifts, allowing an adversary to introduce skew into the system faster. Since the speed at which information spreads remains the same, this effect negates the advantage of being able to reduce clock skews more quickly.

Given these tight constraints on the stable skew, the stabilization time of A_µ is asymptotically optimal as well [50], a statement that follows from a lower bound by Kuhn et al. [52]. Again, due to the limited speed of information propagation, µ ∈ ω(1) is not sufficient to achieve a stabilization time of o(G). In general, in order to allow for a faster integration of new edges, one must also accept a weaker gradient property (cf. [52]). If edge failures are short-lived, however, there is hope to reintegrate the respective edges sooner. A second algorithm presented in [50] achieves the same stable skew, but uses a different technique than A_µ to stabilize new edges. This

²We demanded that G is a function of D, J_D, and ρ only. The second bound instead takes the maximal diameter of G_{T_µ}, U, and ρ as arguments.
³Here we refer to asymptotics with respect to fixed U and (ρ, S_µ(1)) → (0, ∞).


algorithm gradually increases the relevance of skew observed on a given edge for the computation of the current logical clock rate. As a temporary failure of an edge does not mean that the stronger synchronization guarantees due to that edge are lost instantaneously, one could slowly decrease the relevance of the edge until it reappears and then start to increase it again.
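As a closing aside, the threshold µ = 2ρ/(1 − ρ) from Section 5.1, which the discussion above approaches when trading the stabilization time against the base of the logarithm, can be sanity-checked numerically. The value of ρ below is an arbitrary illustration.

```python
# At µ = 2ρ/(1−ρ), the maximum admissible logical clock rate on the
# slowest hardware clock exactly matches the fastest hardware clock,
# so a node can never catch up; µ must be strictly larger.
# (ρ = 1e-3 is an arbitrary illustrative value.)

rho = 1e-3
mu_threshold = 2 * rho / (1 - rho)
max_logical_on_slow_clock = (1 + mu_threshold) * (1 - rho)  # equals 1 + ρ
fastest_hardware = 1 + rho
```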

5.3 Bounding the Global Skew

Ensuring an (almost) optimal worst-case global skew is surprisingly simple. Essentially, one cannot do better than the uncertainty of the information on clock values between the most far apart nodes. Even if one node of such a pair tries to keep its clock exactly in the center of the interval in which the other must be according to its best guess, a global skew of Ω(J_D + ρD) still cannot be avoided. However, if a node runs fast without knowing that somebody is ahead, this might be a bad choice, as we might unnecessarily lose accuracy offered by the hardware clocks.

Definition 5.6 (Time Envelope). An algorithm satisfies the envelope condition provided that

\[
\forall v\in V,\ t\in\mathbb{R}^+_0:\ |L_v(t) - t| \le \rho t
\]

for any execution on any graph.

Intuitively, this means that the system maintains the best possible worst-case approximation to real time that the hardware clocks permit. For algorithms satisfying this condition, the following stronger lower bound holds.

Corollary 5.7. For clock synchronization algorithms for which the envelope condition holds, the global skew is at least

\[
G \ge (1-\rho)J_D + 2\rho D.
\]

If Condition (5.1) is dropped, i.e., clock rates are unrestricted, we still have

\[
G \ge (1-\rho)J_D + 2\rho\left\lfloor\frac{D}{2}\right\rfloor.
\]

These bounds hold even if ρ, J_D, and D are known to the algorithm and the graph is static.

Proof Sketch. Suppose that U ≥ 1 + ρ, i.e., for two nodes in distance D, the estimate obtained by a flooding is always the most accurate source of information about each other's clock values. For this setting, it is shown in [60] that in a certain execution a skew of (1 − ρ)J_D can be built up and maintained indefinitely. As soon as this


skew is attained, all nodes have hardware and logical clock rate 1 − ρ. If we change the hardware clock rate of the node v_max with the largest clock value to 1 + ρ, we can make sure that it takes at least D time until the node v_min with the smallest clock value learns about this and may increase its logical clock rate in response. In this time, the skew grows by 2ρD. If decreasing logical clock rates is permitted, we make the clocks of all nodes up to distance ⌊D/2⌋ from v_max (with respect to the flooding⁴) run fast, such that both v_max and v_min do not learn about this for ⌊D/2⌋ time. This yields the second bound.

We will now show how to almost match this bound. Each node v ∈ V maintains a local estimate L̂_v^max of the maximum clock value in the system, which satisfies the following definition.

Definition 5.8 (Max Estimates). Denote by L̂_v^max(t) the max estimate of node v ∈ V at time t, where L̂_v^max(0) := 0. We require the following properties.

(i) (d/dt)L̂_v^max(t) = h_v(t) (except when it is set to a larger value, see below).

(ii) For some value Λ ∈ R⁺, each node initiates a flooding distributing L̂_v^max(t) at any time t when L̂_v^max(t) = L̂ ∈ ΛN₀.

(iii) When receiving such a flooding message at time t that contains value L̂ and was sent at time t_s, v sets

L̂_v^max(t) := max{ L̂_v^max(t), L̂ + (1 + ρ)(t − t̂_s) },

where t̂_s ≤ t is such that t − t̂_s ≥ t − (t_s + J_D), i.e., t − t̂_s is at least the time span for which v can be sure that the message was en route. If L̂_v^max(t) was set to a value larger than L̂ + Λ and v has not yet participated in a flooding containing a value of at least the maximum multiple of Λ exceeded by L̂_v^max(t), v distributes L̂_v^max(t) by a flooding.

If we want to satisfy the envelope condition, this simple method guarantees an almost optimal estimator.

Lemma 5.9. Suppose an algorithm satisfies Definition 5.8 and define

L̂^max(t) := max{ max_{v∈V} {L̂_v^max(t)}, max_{v flooded at t_s, flooding not complete at t} {L̂_v^max(t_s) + (1 + ρ)(t − t_s)} }.

Then it holds that

⁴In general this is not clearly defined because we did not specify the flooding mechanism and the graph changes dynamically. For the example of BFS flooding and a static graph, however, it is.

5.3. BOUNDING THE GLOBAL SKEW


(i) ∀v ∈ V, t ∈ R⁺₀ : L̂_v^max(t) ≥ (1 − ρ)t

(ii) ∀t₁ ≤ t₂ ∈ R⁺₀ : L̂^max(t₂) − L̂^max(t₁) ≤ (1 + ρ)(t₂ − t₁)

(iii) ∀v ∈ V, t ∈ R⁺₀ : L̂_v^max(t) > L̂^max(t) − (1 − ρ)(J_D + 1) − 2ρ(D + 2 + Λ/(1 + ρ)).

Proof. Property (i) immediately follows from Property (i) in Definition 5.8, the minimum hardware clock speed, and the fact that the estimates are never decreased. Since the L̂_v^max change only finitely often in finite time, they are differentiable at all but countably many points with (d/dt) L̂_v^max(t) = h_v(t). By Theorem 2.23, this implies that (d/dt) L̂^max(t) exists at all but countably many points and is bounded by 1 + ρ. Hence, for any interval (t₁, t₂] during which L̂^max is continuous, we have

L̂^max(t₂) = L̂^max(t₁) + ∫_{t₁}^{t₂} (d/dτ) L̂^max(τ) dτ ≤ L̂^max(t₁) + (1 + ρ)(t₂ − t₁).

On the other hand, observe that when a value increases according to Property (iii) from Definition 5.8, it becomes at most

L̂ + (1 + ρ)(t − t̂_s) ≤ L̂^max(t_s) + (1 + ρ)(t − t_s).

As L̂^max may only drop at discontinuities, we conclude that Property (ii) is satisfied.

Regarding Property (iii), observe that Properties (i) and (ii) show the statement for times t < D + 1, i.e., we may w.l.o.g. assume that t ≥ D + 1. Thus, any node v ∈ V has received at least one flooding message. Denote by L̂ ∈ ΛN the largest multiple of Λ such that for the time t_L̂ with L̂^max(t_L̂) = L̂ it holds that t_L̂ ≤ t − (D + 1). Similarly, denote by t_{L̂+Λ} the time when L̂^max(t_{L̂+Λ}) = L̂ + Λ. From Property (ii), we get that

L̂^max(t) ≤ L̂ + Λ + (1 + ρ)(t − t_{L̂+Λ}).

By definition of L̂^max and Properties (ii) and (iii) of the max estimates, there must be some node w initiating a flooding containing L̂ at a time t_s ≥ t_L̂, either because L̂_w^max(t_s) = L̂ or because it receives a message from an ongoing flooding causing it to set L̂_w^max(t_s) to a value of at least L̂ + (1 + ρ)(t_s − t_L̂). Note that v receives a respective flooding message at a time t_r ≤ t, as the definition of the dynamic diameter and the normalization of message delays imply that

t_r ≤ t_s + D ≤ t_L̂ + D + 1 ≤ t.


As delays are at most one, the estimate v receives at time t_r is at least L̂ + (1 + ρ)(t_s − t_L̂ − 1). Thus, by Properties (i) and (iii) of the max estimates, we have that

L̂_v^max(t) ≥ L̂ + (1 + ρ)(max{t_r − (t_L̂ + J_D), 0} − 1) + (1 − ρ)(t − t_r).

Observe that decreasing t_r below t_L̂ + J_D will only increase L̂_v^max(t) due to the term (1 − ρ)(t − t_r), without affecting the bound on L̂^max(t). Thus, w.l.o.g., t_r ≥ t_L̂ + J_D and hence

L̂_v^max(t) ≥ L̂ + (1 + ρ)(t_r − (t_L̂ + J_D) − 1) + (1 − ρ)(t − t_r).

Moreover, we have that t < t_{L̂+Λ} + D + 1 by the definitions of D, t_r, and t_{L̂+Λ}. Combining these bounds with the above inequalities yields

L̂^max(t) − L̂_v^max(t)
  ≤ Λ + (1 + ρ)(J_D + 1 − (t_{L̂+Λ} − t_L̂)) + 2ρ(t − t_r)
  < Λ + (1 + ρ)(J_D + 1 − (t_{L̂+Λ} − t_L̂)) + 2ρ(t_{L̂+Λ} + D + 1 − (t_L̂ + J_D))
  = Λ − (1 + ρ)(t_{L̂+Λ} − t_L̂) + (1 − ρ)(J_D + 1) + 2ρ(D + 2 + t_{L̂+Λ} − t_L̂)
  ≤ (1 − ρ)(J_D + 1) + 2ρ(D + 2 + Λ/(1 + ρ)),

where the last step uses Property (ii), which gives t_{L̂+Λ} − t_L̂ ≥ Λ/(1 + ρ).

Since v and t ≥ D + 1 were arbitrary, this shows Property (iii), concluding the proof.

Property (iii) of the max estimates shown in this lemma gives rise to a very simple strategy to get arbitrarily close to the lower bound from Corollary 5.7. If an algorithm makes sure that any node v ∈ V with L_v(t) = L̂_v^max(t) is slow, this guarantees that max_{v∈V}{L_v(t)} ≤ L̂^max(t) at all times. Thus, any node whose logical clock falls by (1 − ρ)(J_D + 1) + 2ρ(D + 2 + Λ/(1 + ρ)) behind max_{v∈V}{L_v(t)} will notice that L_v(t) < L̂_v^max(t) and can increase its clock speed to avoid larger skews. This is formalized as follows.

Definition 5.10 (Max Estimate Algorithms). Suppose an algorithm satisfies Definition 5.8. It is a max estimate algorithm provided that, for all nodes v ∈ V and times t ∈ R⁺₀, it holds that

(i) L_v(t) = L̂_v^max(t) ⇒ l_v(t) = h_v(t)

(ii) L_v(t) < L̂_v^max(t) ⇒ l_v(t) ≥ ((1 + ρ)/(1 − ρ)) h_v(t).

Theorem 5.11. Any max estimate algorithm has a global skew of

G := (1 − ρ)(J_D + 1) + 2ρ(D + 2 + Λ/(1 + ρ))

and satisfies the envelope condition.


Proof. Due to Property (i) from Definition 5.10, the fact that max estimates never decrease, and Property (i) from Definition 5.8, we have that max_{v∈V}{L_v(t)} is bounded by L̂^max(t). Due to Property (ii) from Lemma 5.9 (for t₁ = 0 and t₂ = t) and the fact that l_v(t) ≥ h_v(t) for all nodes v ∈ V and times t, this shows that the algorithm satisfies the envelope condition. Moreover, using the notation from Lemma 5.9, Theorem 2.23 yields that

g(t) := L̂^max(t) − min_{v∈V}{L_v(t)} = L̂^max(t) + max_{v∈V}{−L_v(t)}

is differentiable at all but countably many points with

(d/dt) g(t) ≤ 1 + ρ − min_{v∈V : L̂^max(t)−L_v(t)=g(t)} {l_v(t)}.

At any time when g(t) ≥ G, Property (iii) from Lemma 5.9 gives for all v ∈ V for which L̂^max(t) − L_v(t) = g(t) ≥ G that L_v(t) < L̂_v^max(t). Thus, Property (ii) from Definition 5.10 yields that for any such v we have l_v(t) ≥ (1 + ρ)h_v(t)/(1 − ρ) ≥ 1 + ρ. It follows that at any time t when g(t) ≥ G and the derivative exists, we have (d/dt) g(t) ≤ 0. Note that g can only be discontinuous due to L̂^max, which due to Property (ii) from Lemma 5.9 cannot increase instantaneously. Thus, as g(0) = 0, we must have g(t) ≤ G at all times. Recalling that L_v(t) ≤ L̂^max(t) for all v ∈ V and t, we see that indeed for all v, w ∈ V and t ∈ R⁺₀ it holds that |L_v(t) − L_w(t)| ≤ G as claimed.

Together with Corollary 5.7, this theorem states that for the class of algorithms satisfying the envelope condition the given technique is optimal. In particular, the classical algorithm which—roughly speaking—simply outputs L_v(t) := L̂_v^max(t) as its logical clock value [99] achieves an optimal global skew under the constraints that the envelope condition must hold and logical clocks are never slowed down in comparison to hardware clocks. However, this algorithm has two shortcomings: Neither does it exhibit a (non-trivial) gradient property nor does it provide any upper bound on logical clock rates. Theorem 5.11 demonstrates that the latter can be achieved by an algorithm that is uniform with respect to all model parameters but ρ. We can maintain exactly the same skew bounds by just slightly increasing clock speeds. We remark that speeding up clocks by less than a factor of (1 + ρ)/(1 − ρ) makes it impossible to compensate for the hardware clock drift, i.e., the range [1 − ρ, (1 + ρ)²/(1 − ρ)] for logical clock rates is also best possible. Moreover, due to the following reasons the technique does not conflict with our approach to achieve an optimal gradient property:


• If the Algorithm Aµ presented in the next section requires logical clocks to be fast, this is due to a neighbor whose clock is certainly ahead. Similarly, if it requires clocks to be slow, this is due to a neighbor whose clock is certainly behind or because the node's clock value attains the current maximum.

• Property (iii) from Definition 5.8 can be changed in that any node v ∈ V also increases L̂_v^max if it learns that a neighbor has a larger clock value. This does not change the properties of L̂_v^max and L̂^max that we have shown, yet ensures that nodes will not increase their logical clocks beyond L̂_v^max because of the rules of Aµ.

• To prove Theorem 5.11, it is sufficient that nodes whose clocks attain the minimum clock value in the network increase their logical clocks faster than their hardware clock rate. Hence, the bound on the global skew still holds if nodes are required to be slow because of some neighbor that lags behind.

To simplify the presentation, in the following we assume that a global skew of G as given by Theorem 5.11 is guaranteed. The reader who would like to see a combined algorithm minimizing both global and local skew is referred to [60, 71]; the dynamic setting does not pose new challenges in this regard.
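To make the estimator concrete, the following Python sketch simulates the max-estimate rule of Definition 5.8 on a three-node path graph. All identifiers are ours; for simplicity it assumes unit message delays with zero jitter (J_D = 0) and floods the estimates in every step instead of only at multiples of Λ.

```python
# Discrete-time sketch of the max-estimate rule (Definition 5.8).
# Assumptions (ours): unit message delay, zero jitter, and estimates are
# flooded in every step instead of only at multiples of Lambda.

RHO = 0.01  # bound on hardware clock drift
DT = 1.0    # one simulation step = one (normalized) message delay

def step(rates, est, inbox):
    """Advance all estimates by one step: grow at the hardware rate
    (Property (i)) and merge received values aged by (1+rho)*delay
    (Property (iii)); then flood the new estimates to the neighbors."""
    n = len(est)
    new = []
    for v in range(n):
        e = est[v] + rates[v] * DT
        for val in inbox[v]:
            e = max(e, val + (1 + RHO) * DT)
        new.append(e)
    out = [[] for _ in range(n)]  # messages delivered in the next step
    for v in range(n):
        for w in (v - 1, v + 1):  # neighbors on a path graph
            if 0 <= w < n:
                out[w].append(new[v])
    return new, out

rates = [1 - RHO, 1.0, 1 + RHO]  # hardware clock rates of three nodes
est = [0.0, 0.0, 0.0]
inbox = [[] for _ in rates]
for _ in range(10):
    est, inbox = step(rates, est, inbox)
t = 10 * DT
# Lemma 5.9 in miniature: every estimate stays within [(1-rho)t, (1+rho)t]
assert all((1 - RHO) * t - 1e-9 <= e <= (1 + RHO) * t + 1e-9 for e in est)
```

With the rate rule of Definition 5.10 on top (run at h_v while L_v = L̂_v^max, and at least ((1 + ρ)/(1 − ρ))h_v otherwise), the logical clocks chase these estimates, which is what Theorem 5.11 exploits.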

5.4 An Algorithm with Optimal Gradient Property

In this section, we will present Algorithm Aµ, which is an abstract variant of the algorithm from [51]. We will discuss how to adapt the algorithm and its analysis to a more realistic model in Section 5.6.

Algorithm Aµ operates in lock-step rounds of duration Θ(G/µ), where a global skew of G is ensured by some separate mechanism. By T_1^(r) we denote the time when round r ∈ N begins, i.e., T_1^(1) = 0. Each node v ∈ V maintains for each s ∈ N a dynamic subset of neighbors N_v^s(t) ⊆ N_v(t), initialized to N_v(0). During each round, edges that have newly appeared until the beginning of the round are gradually incorporated into the algorithm's decisions. To this end, each node sets N_v^1(T_1^(r)) := N_v(T_1^(r)) at the beginning of each round r ≥ 2. Similarly, for s ≥ 2 and each r ∈ N, each node sets N_v^s(T_s^(r)) := N_v^{s−1}(T_s^(r)) at the times

T_s^(r) := T_1^(r) + Σ_{s'=1}^{s−1} G/(σ^{s'−1}(1 − ρ)µ),   (5.2)

where

σ := (1 − ρ)µ/(2ρ) > 1.   (5.3)

5.4. AN ALGORITHM WITH OPTIMAL GRADIENT PROPERTY


On the contrary, whenever an edge {v, w} disappears, node w is removed from all sets N_v^s, s ∈ N (and vice versa). For the sake of a concise notation, however, we define that w is still present in N_v(t) and N_v^s(t), s ∈ N, respectively (in contrast to the convention that if a variable changes at time t it already attains the new value at time t). We set the duration of a whole round to

T_µ := (G/((1 − ρ)µ)) (1 + Σ_{s'=1}^{∞} σ^{−s'} + 1) = (2σ − 1)G/((σ − 1)(1 − ρ)µ),   (5.4)

i.e., T_1^(r) := (r − 1)T_µ/2. Note that while appealingly simple, this scheme cannot be implemented in practice, as it requires that all nodes synchronously modify their sets at specific real times; this will be discussed in Section 5.6.

From these sets and the estimated clock values of its neighbors, v determines its logical clock rate. More precisely, for a value κ > 2U, Aµ satisfies the following conditions.

Definition 5.12 (Fast Condition). The fast condition on level s ∈ N states that for all nodes v ∈ V and times t ∈ R⁺₀ we have

∃w ∈ N_v^s(t) : L_w(t) − L_v(t) ≥ sκ
∧ ∀u ∈ N_v^s(t) : L_v(t) − L_u(t) ≤ sκ
⇒ l_v(t) = (1 + µ)h_v(t).

Informally, the fast condition accomplishes the following. If for some v ∈ V and s ∈ N node w ∈ V maximizes the expression L_v − L_w − d_s(v, w)sκ (where d_s(v, w) denotes the distance of v and w in the graph induced by the neighborhoods N_u^s, u ∈ V), then w is fast, as otherwise for one of its neighbors this function would attain an even larger value. Thus, nodes which fall too far behind will catch up, granted that the nodes with large clock values are slow. Ensuring the latter is the goal of the second condition, which enforces that if for some v ∈ V and s ∈ N node w ∈ V maximizes the expression L_w − L_v − d_s(v, w)(s + 1/2)κ, then w is slow.

Definition 5.13 (Slow Condition). The slow condition on level s ∈ N states that for all nodes v ∈ V and times t ∈ R⁺₀ we have

∀w ∈ N_v^s(t) : L_w(t) − L_v(t) ≤ (s + 1/2)κ + δ
∧ ∃u ∈ N_v^s(t) : L_v(t) − L_u(t) ≥ (s + 1/2)κ − δ
⇒ l_v(t) = h_v(t),

where δ ∈ R⁺ is arbitrarily small.

If neither of the conditions holds at node v ∈ V, its logical clock may run at any rate from the range [h_v, (1 + µ)h_v], with the following exception.
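In simplified form, the two conditions translate into the following Python rate selector (identifiers ours; it uses one common neighborhood for all levels and checks the fast condition first on each level, which is one consistent way of realizing the rules when κ > 2U):

```python
def clock_rate(L_v, h_v, nbr_estimates, kappa, mu, delta, max_level=10):
    """Choose the logical clock rate l_v of one node from the estimated
    logical clock values of its neighbors, following the fast condition
    (Definition 5.12) and the slow condition (Definition 5.13)."""
    for s in range(1, max_level + 1):
        # fast: some neighbor is >= s*kappa ahead, none is > s*kappa behind
        if (any(L_w - L_v >= s * kappa for L_w in nbr_estimates)
                and all(L_v - L_u <= s * kappa for L_u in nbr_estimates)):
            return (1 + mu) * h_v
        # slow: no neighbor is more than (s+1/2)*kappa + delta ahead,
        # and some neighbor is >= (s+1/2)*kappa - delta behind
        if (all(L_w - L_v <= (s + 0.5) * kappa + delta for L_w in nbr_estimates)
                and any(L_v - L_u >= (s + 0.5) * kappa - delta for L_u in nbr_estimates)):
            return h_v
    # neither condition triggered: any rate in [h_v, (1+mu)*h_v] is admissible
    return h_v

assert clock_rate(0.0, 1.0, [2.0], kappa=1.0, mu=0.1, delta=0.01) == 1.1   # far ahead -> fast
assert clock_rate(0.0, 1.0, [-2.0], kappa=1.0, mu=0.1, delta=0.01) == 1.0  # far behind -> slow
```

Since the preconditions are evaluated on estimates rather than true clock values, κ > 2U is exactly what keeps the two branches from being triggered in indistinguishable executions.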


Definition 5.14 (Controlled Maximum Clock). If for any node v ∈ V it holds that L_v(t) = max_{w∈V}{L_w(t)} at a time t ∈ R⁺₀, then l_v(t) = h_v(t).

We point out that we introduce this condition mainly for convenience, as the fast and slow conditions alone are sufficient to guarantee a strong gradient property. Nevertheless, in order to ensure a good global skew it is desirable that Definition 5.14 or a similar constraint is satisfied by the algorithm.

The requirement that κ > 2U originates from the fact that the fast and slow conditions must not be contradictory. For this it is not enough that the preconditions for a node being fast respectively slow are mutually exclusive. As the nodes only have access to estimates of their neighbors' clock values that may differ by up to U from the actual values, in case κ/2 ≤ U there are indistinguishable executions where either the fast or the slow condition holds at a node. Thus, κ > 2U is necessary for an algorithm to realize these rules in all circumstances. As the proof that κ > 2U is indeed sufficient to implement all conditions concurrently is a technicality, we omit it here and refer to [51].

Intuitively, the fast and the slow condition on level s ∈ N work together as follows. If on some path an adversary tries to accumulate clock skew beyond an average of (s + 1/2)κ per hop that is not already present somewhere in the graph, this can only happen at rate 2ρ due to the slow condition. On the other hand, this means that there must be much skew exceeding an average of sκ, which is reduced at a rate of at least (1 + µ)(1 − ρ) − (1 + ρ) ∈ Ω(µ) due to the fast condition. Hence, whatever the length of the longest possible path of average skew (s + 1/2)κ is, the longest path with an average skew that is larger by κ will be shorter by a factor of Θ(µ/ρ). The next section deals with formalizing and proving this statement.

5.5 Analysis of Algorithm Aµ

Before we can start our analysis, we need to introduce some definitions. In order to establish the gradient property on edges that appeared recently, Algorithm Aµ sequentially activates the slow and fast conditions starting from the lowest level. The following definition introduces the notions to capture the subgraphs we need to consider at a given time for a given level.

Definition 5.15 (Level-s Edges and Paths). For s ∈ N, we say that an edge {v, w} is a level-s edge at time t ∈ R⁺₀ if w ∈ N_v^s(t) (and vice versa). We define E^s(t) to be the set of level-s edges at time t. For any path p, denote by E_p the set of its edges and by ℓ_p its length. We define for s ∈ N and times t ∈ R⁺₀ the set of level-s paths at time t as

P^s(t) := {path p | E_p ⊆ E^s(t)}.

5.5. ANALYSIS OF ALGORITHM Aµ

53

Moreover, for each v ∈ V, the set of level-s paths starting at node v at time t is

P_v^s(t) := {path p = (v, . . .) | E_p ⊆ E^s(t)}.

Roughly speaking, within a certain range of skews the algorithm is proactive and quickly reacts to clock skews. For each level s ∈ N, the crucial magnitude is the skew on a path that exceeds an average of sκ, motivating the following definition.

Definition 5.16 (Catch-Up Potential). For all paths p = (v, . . . , w), s ∈ N, and times t ∈ R⁺₀, we define

ξ_p^s(t) := L_v(t) − L_w(t) − sκℓ_p.

Moreover, for each v ∈ V,

Ξ_v^s(t) := max_{p∈P_v^s(t)} {ξ_p^s(t)}.

On the other hand, Aµ makes sure to not always act rashly, as otherwise the uncertainty in clock estimates could lead to all nodes being fast, inhibiting their ability to reduce skews. This time, for each level s ∈ N₀, the decisive value is the skew on a path exceeding an average of (s + 1/2)κ.

Definition 5.17 (Runaway Potential). For all paths p = (v, . . . , w), s ∈ N₀, and times t ∈ R⁺₀, we define

ψ_p^s(t) := L_v(t) − L_w(t) − (s + 1/2)κℓ_p.

Moreover,

Ψ^s(t) := max_{p∈P^s(t)} {ψ_p^s(t)}.
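For a small static graph, the two potentials can be evaluated by brute force; the following sketch (identifiers ours) fixes one level-s edge set and enumerates all simple paths explicitly.

```python
from itertools import permutations

def xi(L, path, s, kappa):
    """Catch-up potential xi_p^s = L_v - L_w - s*kappa*l_p (Definition 5.16)."""
    return L[path[0]] - L[path[-1]] - s * kappa * (len(path) - 1)

def psi(L, path, s, kappa):
    """Runaway potential psi_p^s with average (s + 1/2)*kappa (Definition 5.17)."""
    return L[path[0]] - L[path[-1]] - (s + 0.5) * kappa * (len(path) - 1)

def level_s_paths(nodes, edges):
    """All simple paths of length >= 1 using only the given (level-s) edges."""
    es = {frozenset(e) for e in edges}
    for k in range(2, len(nodes) + 1):
        for p in permutations(nodes, k):
            if all(frozenset(e) in es for e in zip(p, p[1:])):
                yield p

def Psi(L, edges, s, kappa):
    """Psi^s: maximum runaway potential over all level-s paths."""
    return max(psi(L, p, s, kappa) for p in level_s_paths(sorted(L), edges))

# path graph 0 - 1 - 2 with logical clocks 3, 1.5, 0 and kappa = 1:
L = {0: 3.0, 1: 1.5, 2: 0.0}
edges = [(0, 1), (1, 2)]
assert xi(L, (0, 1, 2), 1, 1.0) == 1.0  # skew 3 exceeds an average of kappa by 1
assert Psi(L, edges, 1, 1.0) == 0.0     # but no path exceeds an average of 1.5*kappa
```

In the analysis these quantities are of course tracked analytically over time; the brute-force evaluation merely illustrates that ξ penalizes an average skew of sκ per hop and ψ an average of (s + 1/2)κ.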

Essentially, we are going to show that the maximal length of a path with an average skew of at least (s + 1/2)κ decreases exponentially in s, leading to a logarithmic skew bound between neighbors. However, the algorithm perpetually adds edges on the various levels, which may also affect the skew bounds on long-standing paths. In order to reflect this in our analysis, we need to argue more generally, necessitating the following definition.

Definition 5.18 (Gradient Sequences). A non-increasing sequence of positive reals C = {C_s}_{s∈N₀} is a gradient sequence if κC₀ ≥ G.

Depending on the considered gradient sequence, we can now formulate the condition under which the network is in a valid state.


Definition 5.19 (Legality). Given a weighted, dynamic graph G and a gradient sequence C, for each s ∈ N₀ the system is s-legal with respect to C at time t ∈ R⁺₀ if and only if it holds that Ψ^s(t) ≤ κC_s. The system is C-legal at time t if it is s-legal for all s ∈ N at time t with respect to C.

The goal of our analysis is to prove that at all times the network remains in a legal state with respect to a certain gradient sequence whose values decrease—starting from the second level—exponentially. Some basic observations can immediately be deduced from the given definitions.

Lemma 5.20. The following statements hold at all times t ∈ R⁺₀.

(i) The system is 0-legal.

(ii) If for some s ∈ N₀ the system is s-legal and C_{s+1} = C_s, then the system is also (s + 1)-legal.

Proof. Statement (i) holds by definition, as for any path p = (v, . . . , w) and any time t we have

ψ_p^0(t) ≤ |L_v(t) − L_w(t)| ≤ G ≤ κC₀.

For statement (ii), recall that Algorithm Aµ ensures at all times t and nodes v ∈ V that N_v^{s+1}(t) ⊆ N_v^s(t), i.e., E^{s+1}(t) ⊆ E^s(t). Hence, Ψ^{s+1}(t) ≤ Ψ^s(t), which due to the assumptions of s-legality and C_s = C_{s+1} yields the claim.

The first property will serve as an induction anchor. Legality on higher levels will then follow from the bounds on the previous level. The second statement basically shows that we can add new edges level-wise, as this means that even if we "tamper" with the state on level s by inserting new edges, the inductive chain proving the skew bounds on higher levels can be maintained.

We now prove the first key lemma of our analysis, which states that the nodes maximizing Ξ_v^s must be fast whenever Ξ_v^s > 0. This permits us to bound the rate at which Ξ_v^s changes in terms of l_v at all times except T_s^(r), r ∈ N. At these times the algorithm inserts edges on level s, potentially reducing distances in the graph induced by the level-s edges, maybe increasing Ξ_v^s.

Lemma 5.21.
Suppose for a node v ∈ V, some s ∈ N, and a time interval (t₀, t₁] ⊂ R⁺₀ with T_s^(r) ∉ (t₀, t₁] for all r ∈ N it holds that Ξ_v^s(t) > 0 for all t ∈ (t₀, t₁). Then we have that

Ξ_v^s(t₁) − Ξ_v^s(t₀) ≤ L_v(t₁) − L_v(t₀) − (1 + µ)(1 − ρ)(t₁ − t₀).


Proof. For a time t ∈ (t₀, t₁], consider any path p = (v, . . . , w', w) maximizing Ξ_v^s(t), i.e., ξ_p^s(t) = Ξ_v^s(t). It holds that

L_{w'}(t) − L_w(t) ≥ sκ,

as otherwise we must have ξ_{(v,...,w')}^s(t) > ξ_p^s(t). Similarly, for all u ∈ N_w^s(t) it holds that

L_w(t) − L_u(t) ≤ sκ,

as otherwise we must have ξ_{(v,...,w,u)}^s(t) > ξ_p^s(t). Hence, according to the fast condition, we have l_w(t) = (1 + µ)h_w(t) ≥ (1 + µ)(1 − ρ). By Theorem 2.23, Ξ_v^s is thus differentiable at all but countably many points with derivative bounded by l_v(t) − (1 + µ)(1 − ρ). Because the algorithm adds edges on level s only at the times T_s^(r) ∉ (t₀, t₁], paths can only be removed from P_v^s(t) during (t₀, t₁], i.e., Ξ_v^s may only drop at discontinuities. We conclude that

Ξ_v^s(t₁) − Ξ_v^s(t₀) ≤ L_v(t₁) − L_v(t₀) − (1 + µ)(1 − ρ)(t₁ − t₀)

as claimed.

We will need a helper statement showing that under certain circumstances the preconditions of the previous lemma are always met.

Lemma 5.22. Assume that for some node v ∈ V and s ∈ N, we have for all r ∈ N that T_s^(r) ∉ (t₀, t₁] ⊂ R⁺₀. Suppose that L_v(t₁) − L_v(t₀) ≤ (1 + ρ)(t₁ − t₀) and that Ξ_v^s(t₁) > 2ρ(t₁ − t₀). Then

∀t ∈ [t₀, t₁] : Ξ_v^s(t) > 0.

Proof. Let p = (v, . . . , w) ∈ P^s(t₁) maximize Ξ_v^s(t₁), i.e., Ξ_v^s(t₁) = ξ_p^s(t₁). Since no edges are added on level s during (t₀, t₁], we have that p ∈ P^s(t) for all t ∈ [t₀, t₁]. Thus, we can bound

Ξ_v^s(t) ≥ ξ_p^s(t)
  = ξ_p^s(t₁) − (L_v(t₁) − L_v(t)) + L_w(t₁) − L_w(t)
  ≥ ξ_p^s(t₁) − (L_v(t₁) − L_v(t₀)) + L_v(t) − L_v(t₀) + (1 − ρ)(t₁ − t)
  ≥ Ξ_v^s(t₁) − 2ρ(t₁ − t₀)
  > 0.


Before we can move on to the main lemma, we need to derive a technical result from the slow condition. Intuitively, we show that if a node's clock increased faster than at the maximum hardware clock rate during a given time period, the slow condition entails that there must have been a neighbor that was far ahead or all neighbors were close.

Lemma 5.23. Assume that for some node v ∈ V and s ∈ N, we have for all r ∈ N that T_s^(r) ∉ (t₀, t₁] ⊂ R⁺₀. If

t_min := min{t ∈ [t₀, t₁] | L_v(t₁) − L_v(t) ≤ (1 + ρ)(t₁ − t)}

is greater than t₀, then

∃w ∈ N_v^s(t_min) : L_w(t_min) − L_v(t_min) > (s + 1/2)κ   (5.5)

or

∀u ∈ N_v^s(t_min) : L_v(t_min) − L_u(t_min) < (s + 1/2)κ.   (5.6)

Proof. Assuming the contrary, the logical negation of (5.5) ∨ (5.6) is

∀w ∈ N_v^s(t_min) : L_w(t_min) − L_v(t_min) ≤ (s + 1/2)κ
∧ ∃u ∈ N_v^s(t_min) : L_v(t_min) − L_u(t_min) ≥ (s + 1/2)κ.

As T_s^(r) ∉ (t₀, t₁] for all r, no neighbors are added to N_v^s in the time interval (t₀, t₁]. Because edges that are removed at time t_min are still in N_v^s(t_min) and N_v^s is finite at all times, there must be a closed time interval of non-zero length ending at time t_min during which N_v^s does not change. Thus, as logical clocks are continuous and t_min > t₀, a time t'_min ∈ (t₀, t_min) exists such that for all t ∈ [t'_min, t_min] it holds that

∀w ∈ N_v^s(t_min) = N_v^s(t) : L_w(t) − L_v(t) ≤ (s + 1/2)κ + δ
∧ ∃u ∈ N_v^s(t_min) = N_v^s(t) : L_v(t) − L_u(t) ≥ (s + 1/2)κ − δ.

Due to the slow condition, we get that l_v(t) = h_v(t) for all t ∈ [t'_min, t_min] and therefore

L_v(t₁) − L_v(t'_min) = L_v(t₁) − L_v(t_min) + L_v(t_min) − L_v(t'_min) ≤ (1 + ρ)(t₁ − t'_min),

contradicting the minimality of t_min.


We are now ready to advance to the heart of the proof of the gradient property of Aµ. Combining the fast condition and the slow condition on level s in the form of Lemmas 5.21 and 5.23, we show that if no edges are added on level s and the system is (s − 1)-legal for a certain amount of time, the system becomes legal on level s with respect to C_s ∈ Ω(ρC_{s−1}/µ), i.e., we gain a factor of Θ(µ/ρ). Since the proof of the lemma is quite intricate, we sketch the main concepts first.

The proof comprises two parts. Assuming that the claim does not hold for some path p, we backtrack the reason for its violation in time. Starting from p, we will construct a sequence of paths that are to be held "responsible" for the large skews in the system, removing nodes from p in the first part of the proof, then extending the path in the second. Whenever we switch from a path p_i = (v, . . .) to a path p_{i+1}, from Lemma 5.23 we get a positive lower bound on ξ_{p_{i+1}}^s that is comparable to the one we had on ξ_{p_i}^s. In between, the respective node v ran at an amortized rate of at most 1 + ρ, therefore Ξ_v^s must have decreased at an (amortized) rate of at least (1 − ρ)µ − 2ρ according to Lemma 5.21. Since we went back in time sufficiently far, this will finally lead to the conclusion that for some node w ∈ V, Ξ_w^s ≤ Ψ^{s−1} must have been overly large, contradicting the prerequisite that the system is (s − 1)-legal during the considered time interval.

Lemma 5.24. Assume that for some s ∈ N the system is (s − 1)-legal during the time interval [t, t̄] ⊂ R⁺₀ and that for all r ∈ N we have T_s^(r) ∉ [t, t̄]. Recall that

σ = (1 − ρ)µ/(2ρ) > 1

and define

∇_s := κC_{s−1}/((1 − ρ)µ).   (5.7)

Then for all times t ∈ [t + ∇_s, t̄] it holds that

Ψ^s(t) ≤ 2ρ∇_s = κC_{s−1}/σ.

Proof. Assume for contradiction that there is a time t₀ ∈ [t + ∇_s, t̄] with Ψ^s(t₀) > κC_{s−1}/σ, i.e., there is a node v₀ ∈ V and a path p = (v₀, . . . , v_k) ∈ P_{v₀}^s(t₀) with

ψ_p^s(t₀) = Ψ^s(t₀) > 2ρ∇_s.   (5.8)

Part I: Getting closer to v_k. We inductively define a sequence of decreasing times t₀ ≥ t₁ ≥ . . . ≥ t_{l+1}, where t_{l+1} ≥ t₀ − ∇_s ≥ t. Given t_i, we set

t_{i+1} := min{t ∈ [t₀ − ∇_s, t_i] | L_{v_i}(t_i) − L_{v_i}(t) ≤ (1 + ρ)(t_i − t)}.   (5.9)


For i ≤ k and t_i ≥ t₀ − ∇_s, time t_{i+1} is well-defined. We halt the construction at index l if t_{l+1} = t₀ − ∇_s or if

∃w ∈ N_{v_l}^s(t_{l+1}) : L_w(t_{l+1}) − L_{v_l}(t_{l+1}) > (s + 1/2)κ.   (5.10)

We will see later that the construction can never reach node v_k. If the construction does not halt at index i, Lemma 5.23 states that

∀w ∈ N_{v_i}^s(t_{i+1}) : L_{v_i}(t_{i+1}) − L_w(t_{i+1}) < (s + 1/2)κ.   (5.11)

We show by induction that for all i ∈ {0, . . . , l} it holds that

ξ_{(v_i,...,v_k)}^s(t_{i+1}) ≥ Ψ^s(t₀) + (k − i)κ/2 − (1 + ρ)(t₀ − t_{i+1}) + L_{v_k}(t₀) − L_{v_k}(t_{i+1}).   (5.12)

For the base case we compute

ξ_{(v₀,...,v_k)}^s(t₁)
  = ξ_p^s(t₀) − (L_{v₀}(t₀) − L_{v₀}(t₁)) + L_{v_k}(t₀) − L_{v_k}(t₁)
  ≥ ξ_p^s(t₀) − (1 + ρ)(t₀ − t₁) + L_{v_k}(t₀) − L_{v_k}(t₁)   [by (5.9)]
  = ψ_p^s(t₀) + kκ/2 − (1 + ρ)(t₀ − t₁) + L_{v_k}(t₀) − L_{v_k}(t₁)
  = Ψ^s(t₀) + kκ/2 − (1 + ρ)(t₀ − t₁) + L_{v_k}(t₀) − L_{v_k}(t₁).   [by (5.8)]

As for the induction step, assume that the claim holds for i < l. From Inequality (5.11) we know that

L_{v_i}(t_{i+1}) − L_{v_{i+1}}(t_{i+1}) < (s + 1/2)κ.   (5.13)

Thus, we can write

ξ_{(v_{i+1},...,v_k)}^s(t_{i+1})
  = ξ_{(v_i,...,v_k)}^s(t_{i+1}) − L_{v_i}(t_{i+1}) + L_{v_{i+1}}(t_{i+1}) + sκ
  > ξ_{(v_i,...,v_k)}^s(t_{i+1}) − κ/2   [by (5.13)]
  ≥ Ψ^s(t₀) + (k − (i + 1))κ/2 − (1 + ρ)(t₀ − t_{i+1}) + L_{v_k}(t₀) − L_{v_k}(t_{i+1}).   [by (5.12)]   (5.14)


We need to show that i + 1 ≠ k as claimed. Assuming the contrary, Inequality (5.14) leads to the contradiction

0 = ξ_{(v_k)}^s(t_k)
  > Ψ^s(t₀) − (1 + ρ)(t₀ − t_k) + L_{v_k}(t₀) − L_{v_k}(t_k)   [by (5.14)]
  ≥ Ψ^s(t₀) − 2ρ(t₀ − t_k)
  ≥ Ψ^s(t₀) − 2ρ∇_s   [by (5.9)]
  > 0.   [by (5.8)]

Hence it follows that indeed i + 1 < k. Recall that t_{i+1} > t₀ − ∇_s because i ≠ l, and i + 1 < k, i.e., time t_{i+2} is defined. We obtain

ξ_{(v_{i+1},...,v_k)}^s(t_{i+2})
  ≥ ξ_{(v_{i+1},...,v_k)}^s(t_{i+1}) − (L_{v_{i+1}}(t_{i+1}) − L_{v_{i+1}}(t_{i+2})) + L_{v_k}(t_{i+1}) − L_{v_k}(t_{i+2})
  ≥ ξ_{(v_{i+1},...,v_k)}^s(t_{i+1}) − (1 + ρ)(t_{i+1} − t_{i+2}) + L_{v_k}(t_{i+1}) − L_{v_k}(t_{i+2})   [by (5.9)]
  ≥ Ψ^s(t₀) + (k − (i + 1))κ/2 − (1 + ρ)(t₀ − t_{i+2}) + L_{v_k}(t₀) − L_{v_k}(t_{i+2}),   [by (5.14)]

i.e., the induction step succeeds.

Part II: Getting further away from v_k. We define a finite chain of nodes w_l, . . . , w_m and times t_{l+1} ≥ t_{l+2} ≥ . . . ≥ t_{m+1} = t₀ − ∇_s, where w_l := v_l and t_{l+1} is the time at which the previous construction left off. The construction is inductive and maintains that for all i ∈ {l, . . . , m} it holds that

Ξ_{w_i}^s(t_{i+1}) ≥ Ψ^s(t₀) + ((1 − ρ)µ − 2ρ)(t₀ − t_{i+1})   (5.15)

and also that

t_{i+1} = min{t ∈ [t₀ − ∇_s, t_i] | L_{w_i}(t_i) − L_{w_i}(t) ≤ (1 + ρ)(t_i − t)}.   (5.16)

First, we anchor an induction at index l. Observe that Equality (5.16) is satisfied, as it coincides with Definition (5.9) for the index i = l. Evaluating (5.12) for this index, we obtain for any t ∈ [t_{l+1}, t₀] that

ξ_{(v_l,...,v_k)}^s(t)
  = ξ_{(v_l,...,v_k)}^s(t_{l+1}) + L_{v_l}(t) − L_{v_l}(t_{l+1}) − (L_{v_k}(t) − L_{v_k}(t_{l+1}))
  ≥ Ψ^s(t₀) − (1 + ρ)(t₀ − t_{l+1}) + L_{v_l}(t) − L_{v_l}(t_{l+1}) + L_{v_k}(t₀) − L_{v_k}(t)   [by (5.12)]   (5.17)
  ≥ Ψ^s(t₀) − 2ρ(t₀ − t_{l+1})
  ≥ Ψ^s(t₀) − 2ρ∇_s   [by (5.9)]
  > 0.   [by (5.8)]

Note that p ∈ P_{v₀}^s(t) for all t ∈ [t, t₀], as no edges are added to E^s during this time interval due to the precondition that T_s^(r) ∉ [t, t̄] for all r ∈ N. Thus, as E_{(v_l,...,v_k)} ⊆ E_p, for all t ∈ [t, t₀] we have Ξ_{v_l}^s(t) ≥ ξ_{(v_l,...,v_k)}^s(t) > 0 during [t_{l+1}, t₀] ⊆ [t, t̄]. Applying Lemma 5.21 and Inequality (5.17) for time t₀, we see that

Ξ_{v_l}^s(t_{l+1})
  ≥ Ξ_{v_l}^s(t₀) − (L_{v_l}(t₀) − L_{v_l}(t_{l+1})) + (1 + µ)(1 − ρ)(t₀ − t_{l+1})   (5.18)
  ≥ ξ_{(v_l,...,v_k)}^s(t₀) − (L_{v_l}(t₀) − L_{v_l}(t_{l+1})) + (1 + µ)(1 − ρ)(t₀ − t_{l+1})
  ≥ Ψ^s(t₀) + ((1 − ρ)µ − 2ρ)(t₀ − t_{l+1}).   [by (5.17)]   (5.19)

Next, suppose the claim holds for index i ∈ {l, . . . , m − 1}. Thus, as i ≠ m, t_{i+1} > t₀ − ∇_s. If i = l, the previous construction must have halted because Inequality (5.10) was satisfied for some node w ∈ N_{v_l}^s(t_{l+1}). In this case, we define w_{l+1} := w such that

L_{w_{l+1}}(t_{l+1}) − L_{v_l}(t_{l+1}) > (s + 1/2)κ.

On the other hand, if i > l, we claim that there also must exist a node w_{i+1} ∈ N_{w_i}^s(t_{i+1}) fulfilling

L_{w_{i+1}}(t_{i+1}) − L_{w_i}(t_{i+1}) > (s + 1/2)κ.   (5.20)

Assuming for contradiction that this is not true, from the definition of t_{i+1} and Lemma 5.23 we get that

L_{w_i}(t_{i+1}) − L_{w_{i−1}}(t_{i+1}) < (s + 1/2)κ.   (5.21)


Note that because i > l, we already have established Inequality (5.20) for index i − 1. We infer that

L_{w_{i−1}}(t_i) − L_{w_{i−1}}(t_{i+1})
  < (L_{w_i}(t_i) − (s + 1/2)κ) − (L_{w_i}(t_{i+1}) − (s + 1/2)κ)   [by (5.20), (5.21)]
  = L_{w_i}(t_i) − L_{w_i}(t_{i+1})
  ≤ (1 + ρ)(t_i − t_{i+1}),   [by (5.16)]

which yields

L_{w_{i−1}}(t_{i−1}) − L_{w_{i−1}}(t_{i+1}) = L_{w_{i−1}}(t_{i−1}) − L_{w_{i−1}}(t_i) + L_{w_{i−1}}(t_i) − L_{w_{i−1}}(t_{i+1}) < (1 + ρ)(t_{i−1} − t_{i+1}).   [by (5.16)]

This contradicts Equality (5.16), implying that indeed Statement (5.20) must hold true for an appropriate choice of node w_{i+1}. Hence, whatever path maximizes Ξ_{w_i}^s(t_{i+1}), we can extend it by the edge {w_{i+1}, w_i} in order to see that

Ξ_{w_{i+1}}^s(t_{i+1})
  ≥ Ξ_{w_i}^s(t_{i+1}) + L_{w_{i+1}}(t_{i+1}) − L_{w_i}(t_{i+1}) − sκ
  > Ξ_{w_i}^s(t_{i+1})   [by (5.20)]
  ≥ Ψ^s(t₀) + ((1 − ρ)µ − 2ρ)(t₀ − t_{i+1}).   [by (5.15)]   (5.22)

We now can define

t_{i+2} := min{t ∈ [t₀ − ∇_s, t_{i+1}] | L_{w_{i+1}}(t_{i+1}) − L_{w_{i+1}}(t) ≤ (1 + ρ)(t_{i+1} − t)}

in accordance with Equality (5.16). Since we have that

Ξ_{w_{i+1}}^s(t_{i+1}) ≥ Ψ^s(t₀) > 2ρ(t_{i+1} − t_{i+2}),   [by (5.22), (5.8), (5.16)]

Lemma 5.22 shows that Ξ_{w_{i+1}}^s(t) > 0 for all t ∈ [t_{i+2}, t_{i+1}]. Thus, applying Lemma 5.21 yields that

Ξ_{w_{i+1}}^s(t_{i+2})
  ≥ Ξ_{w_{i+1}}^s(t_{i+1}) − (L_{w_{i+1}}(t_{i+1}) − L_{w_{i+1}}(t_{i+2})) + (1 + µ)(1 − ρ)(t_{i+1} − t_{i+2})
  ≥ Ξ_{w_{i+1}}^s(t_{i+1}) + ((1 − ρ)µ − 2ρ)(t_{i+1} − t_{i+2})   [by (5.16)]
  ≥ Ψ^s(t₀) + ((1 − ρ)µ − 2ρ)(t₀ − t_{i+2}),   [by (5.22)]


i.e., the induction step succeeds. Note that the construction must halt after a finite number of steps because clock rates are bounded, we consider a finite time interval, and clock values increase by at least (s + 1/2)κ > 0 whenever we move from a node w_i to a node w_{i+1}. Thus, the induction is complete.

Inserting i = m and t_{m+1} = t₀ − ∇_s into Inequality (5.15), we conclude that

Ξ_{w_m}^s(t_{m+1})
  ≥ Ψ^s(t₀) + ((1 − ρ)µ − 2ρ)∇_s   [by (5.15)]
  > (1 − ρ)µ∇_s   [by (5.8)]
  = κC_{s−1}.   [by (5.7)]

Therefore, using that P_{w_m}^s(t_{m+1}) ⊆ P^{s−1}(t_{m+1}), for any path p' maximizing Ξ_{w_m}^s(t_{m+1}) we get

Ψ^{s−1}(t_{m+1}) ≥ ψ_{p'}^{s−1}(t_{m+1}) ≥ ξ_{p'}^s(t_{m+1}) = Ξ_{w_m}^s(t_{m+1}) > κC_{s−1}.

As t₀ − ∇_s ∈ [t, t̄], this contradicts the precondition that the system is (s − 1)-legal during this time interval, finishing the proof.

From this relation between legality on the levels s − 1 and s we can derive the main theorem of this section. Essentially, the lemma guarantees that the system is legal with respect to a gradient sequence with C_s = C_{s−1}/σ except for the level where we added edges most recently. The careful definition of the insertion times T_s^(r) makes sure that the algorithm waits sufficiently long before proceeding to the next level, i.e., the given level will have stabilized again prior to losing the factor-σ decrease of Ψ^{s+1} compared to Ψ^s on the next level. Therefore, we need to "skip" one level in the respective gradient sequence, which does not affect the asymptotic bounds.

Theorem 5.25. At all times t ∈ R⁺₀, the system is legal with respect to the gradient sequence

C := (G/κ, G/κ, G/(σκ), G/(σ²κ), G/(σ³κ), . . .).

Proof. Define for s ∈ N the gradient sequence C^s by

∀s' ∈ N₀ : C_{s'}^s := G/(σ^{s'}κ) if s' < s, and C_{s'}^s := G/(σ^{s'−1}κ) otherwise.


We claim that for all r, s ∈ N the system is legal with respect to C^s during the time interval [T_s^{(r)}, T_{s+1}^{(r)}). Note that because by definition we have C^s_{s′} ≤ C_{s′} for all s ∈ N and s′ ∈ N_0, the statement of the theorem immediately follows as soon as this claim is established.

Assume for contradiction that the claim is false. Suppose that, for some r, s ∈ N, t_max ∈ [T_s^{(r)}, T_{s+1}^{(r)}) is the infimum of all times when it is violated and s̄ ∈ N is the minimal level for which legality is violated at this time (recall that Statement (i) of Lemma 5.20 gives that the system is 0-legal at all times), i.e.,

    Ψ^{s̄}(t̃_max) > κC^s_{s̄}    (5.23)

for some time t̃_max > t_max that is arbitrarily close to t_max. We make a case distinction.

The first case is that s̄ = s. Since we have C^s_s = C^s_{s−1} and w.l.o.g. the system is (s − 1)-legal until time t̃_max (as s̄ is minimal and t̃_max − t_max is arbitrarily small), this is an immediate contradiction to Lemma 5.20, which states that the system is s-legal throughout the time interval [T_s^{(r)}, t̃_max].

The second case is that s̄ ≠ s and

    t_max < G/((1 − ρ)µ σ^{max{s̄−2, 0}}),

implying that r = s = 1 and thus s̄ ≥ 2. However, as A_µ satisfies Definition 5.14 and no edges are added on level s̄ until that time, we have that

    Ψ^{s̄}(t_max) ≤ 2ρ t_max < 2ρG/((1 − ρ)µ σ^{s̄−2}) =^{(5.3)} G/σ^{s̄−1} = κC^1_{s̄}.

This is a contradiction, as logical clocks are continuous and no edges are added to E^{s̄} during (t_max, t̃_max] ⊂ [T_s^{(r)}, T_{s+1}^{(r)}), implying that Ψ^{s̄} does not increase at discontinuities during (t_max, t̃_max].

The third case is that we have s̄ > s and the second case does not apply. We claim that the system is (s̄ − 1)-legal with respect to C^s during the time interval

    [t_max − κC^s_{s̄−1}/((1 − ρ)µ), t̃_max] ⊆ [t_max − G/((1 − ρ)µ σ^{s̄−2}), t̃_max] ⊂ [T_{s̄−1}^{(r−1)}, T_{s̄−1}^{(r)}),

where T_{s̄−1}^{(0)} is simply to be read as 0. This follows from the observations that (i) t_max is sufficiently large for the left boundary to be at least 0 because Case 2 does not apply, (ii) for r ≥ 2 we have that

    t_max − G/((1 − ρ)µ σ^{s̄−2}) ≥ T_1^{(r)} − G/((1 − ρ)µ) ≥^{(5.2, 5.4)} T_{s̄−1}^{(r−1)},


(iii) the minimality of s̄ ensures that the system must be (s̄ − 1)-legal until some time that is strictly larger than t_max, and (iv) the time difference t̃_max − t_max can be chosen arbitrarily small. Therefore, we may apply Lemma 5.24 to the time t̃_max on level s̄, yielding the contradiction

    Ψ^{s̄}(t̃_max) ≤ κC^s_{s̄−1}/σ = κC^s_{s̄}    (5.24)

to Inequality (5.23).

The fourth and final case is that s̄ < s. We get that the system is (s̄ − 1)-legal with respect to C^s during the interval

    [t_max − κC^s_{s̄−1}/((1 − ρ)µ), t̃_max] ⊆^{(5.2)} [T_{s̄+1}^{(r)} − G/((1 − ρ)µ σ^{s̄−1}), t̃_max] ⊂ [T_{s̄}^{(r)}, T_{s̄}^{(r+1)}),

which by Lemma 5.24 again implies the contradictory Inequality (5.24). Since all possibilities lead to a contradiction, our initial assumption that t_max is finite must be wrong, concluding the proof.

This theorem can be rephrased in terms of the gradient property and stabilization time of A_µ.

Corollary 5.26. Algorithm A_µ exhibits a stable gradient skew of

    S_µ(d) := κd log_σ(G/(κd)) + (5/2)κd

with stabilization time

    T_µ = 2(2σ − 1)G / ((σ − 1)(1 − ρ)µ).

Granted that µ ≥ (2 + ε)ρ/(1 − ρ) for a constant ε > 0, κ ∈ O(U), and G ∈ O(DU), we have

    S_µ(d) ∈ O(U d log_{µ/ρ}(D/d))   and   T_µ ∈ O(G/µ).

Proof. Since one round of A_µ takes T_µ/2 time, any edge that has been present for at least T_µ time existed during a complete round. Setting s(d) := ⌈1 + log_σ(G/(κd))⌉, it is thus sufficient to show that at any time t ∈ R^+_0 it holds for any path p = (v, . . . , w) ∈ P^{s(d)}(t) of length ℓ_p = d that

    L_v(t) − L_w(t) ≤ (s(d) + 3/2)κd.


Observe that we defined s(d) such that C_{s(d)} ≤ d for the gradient sequence C from Theorem 5.25. By the theorem and the definition of s(d)-legality, we have that

    L_v(t) − L_w(t) − (s(d) + 1/2)κd = ψ^{s(d)}_p(t) ≤ Ψ^{s(d)}(t) ≤ κC_{s(d)} ≤ κd,

which can be rearranged to the desired inequality.

We conclude this section by exemplarily demonstrating two more interesting properties of A_µ. The first one is that the algorithm is self-stabilizing (cf. [26]).

Definition 5.27 (Self-Stabilization). An algorithm is called self-stabilizing if it converges from an arbitrary initial state to a correct state with respect to a certain specification, i.e., after finite time the system remains in a correct state. It is called T-self-stabilizing if this takes at most T time units.

A self-stabilizing algorithm is capable of recovering from arbitrary transient faults, since as soon as errors cease the system will restore a correct configuration. In our context, a correct state means that the definitions of A_µ having a stable gradient skew of S_µ with stabilization time T_µ are satisfied.

Corollary 5.28. Suppose A_µ is initialized at time 0 with arbitrary logical clock values, but the global skew of G is still guaranteed at all times. Then A_µ self-stabilizes within T_µ/2 time, i.e., at all times t ≥ T_1^{(2)} the system is legal with respect to the gradient sequence C given in Theorem 5.25.

Proof. We modify the proof of Theorem 5.25 in that for all s ∈ N and times t ∈ [T_s^{(1)}, T_{s+1}^{(1)}), the system is legal with respect to the gradient sequence C̃^s given by

    ∀s′ ∈ N_0:  C̃^s_{s′} := G/(σ^{s′} κ) if s′ < s,  and  C̃^s_{s′} := G/(σ^{s−1} κ) otherwise,

i.e., we give no non-trivial guarantees for levels s′ ≥ s during the first round. Therefore, for r = 1 the case that s̄ ≥ s is covered by Statement (ii) from Lemma 5.20, i.e., all but the fourth case from the case differentiation in the proof become trivial. This case can be treated analogously to the theorem, as C^s_{s′} and C̃^s_{s′} coincide for all s′ ≤ s. Finally, the case that r ≥ 2 is treated analogously as well.


We remark that the sets N_v^s, v ∈ V, s ∈ N, are also part of the system's state. However, these sets are updated during each round of the algorithm and thus also stabilize correctly. Moreover, note that the technique to maintain a small global skew presented in Section 5.3 is also self-stabilizing, however with unbounded stabilization time, as the initial clock skew might be arbitrarily large. This drawback can be overcome by adding a mechanism to detect a (significant) violation of the bound on the global skew, overall resulting in an O(T_µ)-self-stabilizing algorithm (granted that clock estimates are also O(T_µ)-self-stabilizing). An interesting side effect of this property is that, regardless of G, a smaller maximum skew during a time interval of length at least T_µ will temporarily result in a smaller stable gradient skew. However, without further modification, the stabilization time of the algorithm with respect to integrating new edges remains T_µ, as the algorithm is not aware of the fact that edges could be incorporated more quickly.

The second property of A_µ we would like to highlight is that, by slight abuse of notation, the algorithm by itself ensures a global skew linear in U and the "classical" diameter of G_{T_µ}.

Corollary 5.29. For all t ∈ R^+_0, denote by D(t) the diameter of G_{T_µ}(t) and assume that D(t) ≤ D_max for all t. If µ ≥ (1 + ε)4ρ/(1 − ρ) for some constant ε > 0 and κ ∈ O(U), we have a global skew of

    G_∇ := 3(1 − ρ)µ κD_max / (2((1 − ρ)µ − 4ρ)) ∈ O(U D_max).

Proof. Assume w.l.o.g. that G_∇ ≤ G. Suppose for the sake of contradiction that t_max ∈ R^+_0 is the infimum of all times when the global skew of G_∇ is violated, i.e., for all t ≤ t_max we have that max_{v,w∈V} {L_v(t) − L_w(t)} ≤ G_∇. Define

    ∇̃^1 := G_∇/((1 − ρ)µ).

Trivially, we have for any node v ∈ V that Ξ^1_v(t) < G_∇ at all times t ≤ t_max. Reasoning as in Lemma 5.24 for s = 1, but with ∇^1 replaced by ∇̃^1, we see that for all r ∈ N and t ≤ t_max we have

    t ∈ [T_1^{(r)} + ∇̃^1, T_1^{(r+1)})  ⇒  Ψ^1(t) ≤ 2ρ∇̃^1.

Thus, at such times t, we have for any two nodes v, w ∈ V joined by a shortest path p ∈ P^1_v(t) that

    L_v(t) − L_w(t) ≤ Ψ^1(t) + (3/2)κℓ_p ≤ 2ρ∇̃^1 + (3/2)κD_max = G_∇ − 2ρ∇̃^1.


Therefore, t_max ∉ [T_1^{(r)} + ∇̃^1, T_1^{(r+1)}) for any r ∈ N. Moreover, since clock values are continuous and ∇̃^1 ≤ ∇^1 < T_µ/2 because G_∇ ≤ G, this implies for any r ∈ N with T_1^{(r)} ≤ t_max that

    max_{v,w∈V} {L_v(T_1^{(r)}) − L_w(T_1^{(r)})} ≤ G_∇ − 2ρ∇̃^1.

Considering that max_{v,w∈V} {L_v(t) − L_w(t)} grows at most at rate 2ρ since A_µ satisfies Definition 5.14, we see that also t_max ∉ [T_1^{(r)}, T_1^{(r)} + ∇̃^1] for any r with T_1^{(r)} ≤ t_max.

Now let r_max ∈ N be maximal such that T_1^{(r_max)} ≤ t_max. Combining the previous observations, we conclude that t_max ∉ [T_1^{(r_max)}, T_1^{(r_max+1)}], contradicting the definition of r_max.

We remark that since A_µ is also self-stabilizing, indeed it is sufficient that D(t) ≤ D_max for a sufficient period of time in the recent past in order to ensure the stated bound. However, employing a flooding mechanism to ensure a small global skew is more reliable, as it tolerates a much weaker connectivity of the dynamic graph G.

5.6 Discussion

Throughout this chapter, we employed a greatly simplified model in order to facilitate the presentation of A_µ. In this section, we discuss our assumptions and briefly sketch some existing and possible generalizations. Note that all proposed adaptations of A_µ are mutually compatible, i.e., all addressed issues may be dealt with simultaneously.

Bounded local memory, computation, and number of events in finite time. The way we described A_µ, in each round r ∈ N there are infinitely many times T_s^{(r)} when each node v ∈ V needs to perform state changes, infinitely many sets N_v^s to store, etc. However, as we have finite skew bounds, for each node there is a maximal level s_v ∈ N up to which these times and sets are relevant. This value can be bounded by

    s_v ≤ 1 + log_σ(G/κ),

and v needs to check the algorithm's conditions only up to that level.

Asynchronous edge insertions. It is mandatory to drop the assumption that all nodes synchronously insert new edges on level s at the times T_s^{(r)}, r ∈ N. This can be elegantly solved by adding edges at certain logical clock values instead of fixed points in time. Adding an additional "buffer time" ensures that level s has stabilized at least locally before advancing to level s + 1. Putting it simply, the skew bounds from level s − 1 are tight enough to synchronize edge insertions on level s sufficiently precisely to maintain a bound of Θ(G/µ) on the overall stabilization time. More specifically, the key observations are that (i) we stabilize new edges inductively starting from low levels, (ii) at the time when we insert a new edge adjacent to a node v, up to certain distances from v the lower levels s′ < s will already have stabilized, (iii) the skew bounds from these levels s′ < s make sure that nodes that are close to v will not insert their newly appeared adjacent edges much earlier or later than v on level s, and (iv) paths that are longer than C_{s′} for some s′ < s will not maximize Ψ^s, because s′-legality ensures that they exhibit an average skew smaller than (s + 1/2)κ, i.e., we do not need to care about distant nodes. Roughly speaking, instead of reasoning globally about Ψ^s and the time intervals [T_s^{(r)}, T_{s+1}^{(r)}), we argue locally about some properly defined Ψ^s_v and time intervals [L_v^{−1}(T_s^{(r)}), L_v^{−1}(T_{s+1}^{(r)})) measured in terms of the logical time of v. Naturally, localizing the proofs and relating local times of different nodes adds a number of technical difficulties. We refer the interested reader to [51] for details.

Asynchronous edge arrivals and departures. Another unrealistic assumption we made is that both endpoints of an edge detect its appearance or disappearance at exactly the same point in time. Obviously, this is not possible in practice, as nodes are not perfectly synchronized, send only finitely many messages, etc. If there is some delay τ between the two nodes recognizing the event, one of them may act as if the edge were still operational for up to τ time while the other does not.
It is not hard to believe that, since nodes can estimate each other's progress during τ time up to O(µτ) by means of their hardware clocks, increasing κ by O(µτ) is enough to address this problem. Although this is true, the proofs become considerably more complex, e.g. because Ψ^s_v and Ξ^s_v can no longer be related to the nodes' actions as cleanly and directly as before. Again, we point to [51] for a full proof respecting these issues.

Non-differentiable, discrete, and/or finite clocks. We assumed that hardware and logical clocks are differentiable in order to support the reader's intuition. The proofs, however, rely on the progress bounds of the clocks only. In a practical system, hardware and logical clocks will both have finite and discrete values. Putting it simply, having discrete clocks does not change anything except that the clock resolution imposes a lower bound on the uncertainty U (see also [59]). The size of clock values can be kept finite by standard wrap-around techniques, where one needs to make sure that the range of the clocks exceeds 2G in order to be able to compare values correctly at all times.
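As a small illustration of the wrap-around comparison just mentioned, the following sketch (plain Python; the clock range M and the skew bound G are assumed example values, not parameters from this chapter) recovers the signed difference of two wrapped clock readings, which works precisely because the range exceeds 2G:

```python
# Minimal sketch of wrap-around clock comparison; M and G are assumed
# example values satisfying M > 2*G, so the true difference of any two
# clock readings lies in (-G, G) and can be recovered unambiguously.
M = 1 << 16      # clock range: values live in [0, M)
G = 10_000       # assumed bound on the global skew; note 2 * G < M

def skew(a: int, b: int) -> int:
    """Signed difference a - b of two wrapped clock readings."""
    d = (a - b) % M                      # folded difference in [0, M)
    return d if d < M - d else d - M     # map back to [-M/2, M/2)

# A reading slightly "behind" the wrap point compares correctly:
assert skew(5, M - 3) == 8
assert skew(M - 3, 5) == -8
```

If the range were at most 2G, two values at distance close to G could not be distinguished from values at distance close to −G, which is why the range must exceed 2G.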


Initialization. Initialization can be done by flooding. For many reasonable sets of system parameters, the initial clock skews will then be small enough not to violate the gradient skew constraints (cf. [59]). Alternatively, one can simply treat all edges as newly appeared and wait for the algorithm to stabilize. Note that a node that arrives late can set its clock value to the first one it becomes aware of without violating the global skew constraint. However, a (long-lasting) partition of the system necessitates falling back to self-stabilizing solutions, temporarily jeopardizing the gradient property for all but one of the previous components.

Non-uniform uncertainties. In almost any practical setting, links will differ. Consequently, we would like A_µ to deal with a different uncertainty U_e for each edge e ∈ E(t). This is easily achieved by applying the same rules as A_µ with κ replaced by κ_e > 2U_e (respectively κ_e ∈ Ω(U_e + µτ_e), where τ_e replaces the value τ from above). In contrast to the previous issues, this does not affect the analysis at all; essentially, we replace κ by κ_e there as well (see [50, 51, 52, 55]). In general, uncertainties may also be a function of time, e.g. due to fluctuations in wireless link quality. This can be addressed, too, as we will see in a moment.

Short-lived edge failures. If an edge fails for a short period of time, it is not desirable to have to wait for Ω(T_µ) time until a non-trivial gradient property holds again. Here, too, one can make use of the fact that the estimates nodes have of their neighbors deteriorate at rate at most O(µ) in the absence of communication. The idea is to gradually increase the uncertainty of a failed edge accordingly and decrease it again once the edge returns to being operational. This has to happen slowly, though, leading to a suboptimal stabilization time when applied to an edge that has been absent for a long period of time or is completely new [50].
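The uncertainty bookkeeping suggested for short-lived edge failures could be sketched as follows; the class name, the concrete rates, and the numbers are illustrative assumptions, not the construction from [50]:

```python
# Hypothetical sketch of per-edge uncertainty bookkeeping for
# short-lived edge failures: while the edge is down, the uncertainty
# of the neighbor's estimate deteriorates at a rate in O(mu); once the
# edge is operational again, it is lowered back slowly toward the base
# value U_e instead of being reset instantly.
class EdgeUncertainty:
    def __init__(self, base_u: float, mu: float, recovery_rate: float):
        self.base_u = base_u                # uncertainty of a healthy link
        self.mu = mu                        # growth rate while the link is down
        self.recovery_rate = recovery_rate  # slow decrease after recovery
        self.u = base_u

    def advance(self, dt: float, link_up: bool) -> float:
        if link_up:
            self.u = max(self.base_u, self.u - self.recovery_rate * dt)
        else:
            self.u += self.mu * dt
        return self.u

e = EdgeUncertainty(base_u=1.0, mu=0.01, recovery_rate=0.001)
e.advance(50.0, link_up=False)       # outage: uncertainty grows to 1.5
assert abs(e.u - 1.5) < 1e-9
e.advance(200.0, link_up=True)       # slow recovery: back down to 1.3
assert abs(e.u - 1.3) < 1e-9
```

The asymmetry between fast growth during outages and slow decrease afterwards mirrors the remark above: decreasing too quickly would void the skew guarantees, which is what makes this approach slow on long-absent or new edges.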
However, we conjecture that this technique can be freely combined with the edge integration scheme of A_µ in order to achieve fast stabilization in both cases. Moreover, it offers a seamless solution to the aforementioned problem of edge uncertainties that vary over time.

Directed Graphs. Our model considers simple graphs only, i.e., there are no unidirectional links and uncertainties are symmetric. This requirement is crucial for A_µ to work. However, we made no assumption on how nodes obtain clock estimates of their neighbors. To give but one example of how to derive a symmetric estimate graph from an asymmetric communication infrastructure, consider the following setting. Assume that estimates are computed from direct communication and we have a strongly connected directed graph. For any two neighbors, the receiving node may respond to each message with the estimated clock difference via multi-hop communication. The returned estimate will lose accuracy in the order of µτ if it travels for τ time. As long as µτ is not too large (note that choosing µ := √ρ still yields an asymptotically optimal gradient property), accurate estimates can be obtained this way. The value of U_e for a link e ∈ \binom{V}{2} then simply is the larger of the two accuracy bounds.

Unrestrained clock rates. We imposed that µ ∈ O(1). A lower bound from [60] shows that permitting larger clock rates is of no use when trying to achieve a stronger gradient property. One might now ask why we cannot obtain skew bounds that are linear in the distance between each pair of nodes by making σ ∈ Θ(µ/ρ) sufficiently large such that the logarithm in S_µ is bounded by a constant. The problem that arises is that if the algorithm wants to make use of a large value of µ ∈ ω(1), the uncertainty U must also grow linearly in µ, as there is insufficient time to communicate the intended clock progress first. However, as mentioned in Section 5.2, a larger value of µ might help to achieve a better stabilization time if one relaxes the gradient property. Opposed to that, the main impact of permitting clock rates of o(1) indeed is that a skew bound linear in the distance can be maintained. Phrasing it differently, one can ask what the strongest possible guarantee on the progress of logical clocks is if we require a skew of O(U) between neighbors. A simple transformation answers this question by means of Theorem 5.25 and a second lower bound from [60]. If we slow down logical clock rates by a factor f := log_σ(G/(U D)) as soon as the clock skew to any neighbor exceeds Ω(U), nodes are capable of determining logical clock skews to neighbors up to O(U/f) whenever they exceed some threshold Ω(U). Thus, we can "scale down" the fast and slow conditions on higher levels, i.e., substitute κ by some value in O(κ/f) and still guarantee that the conditions can be implemented. Therefore, analogously to Corollary 5.26, we get a skew bound of O(U d) for paths of length d.
The lower bound from [60] shows that this bound is asymptotically optimal for clock rates of Ω(1/f). In some sense, this can be read as a statement about synchronizers: a synchronizer needs to guarantee a skew of O(U) between neighbors in order to locally trigger a synchronous round every O(U) logical time. Thus, the derived bounds state that under this constraint we can guarantee that any node can trigger a round locally at least every O(f U) real time, but not better. In contrast, a traditional synchronizer might stall a node for up to Ω(D) time, where D denotes the diameter of the communication graph in question.

Unknown system parameters. As can be seen from the analysis, asymptotically optimal global skew, stable gradient skew, and stabilization time can be achieved without knowledge of the drift ρ. In fact, we merely need ρ to be bounded away from one in order to compute sufficiently large values µ, T_µ, etc. Knowledge of ρ is required only if we want to determine the minimal feasible value of µ due to the condition that µ > 2ρ/(1 − ρ).


In contrast, it is crucial to have a reasonable upper bound on the uncertainty U, as it affects the gradient property almost linearly through κ. Similarly, we need a good bound on τ if we consider the setting where nodes detect edge arrivals and departures asynchronously. We remark that in practice it is a viable option to employ optimistic bounds on U and τ and adapt them if they turn out to be too small. The fact that the executions leading to large skews require "everything to go wrong" for a considerable amount of time, together with the strong stabilization properties of A_µ, indicates that such a scheme should be quite robust to all kinds of dynamics.

Adaptive global skew. The one "parameter", so to speak, that remains to be understood is the global skew G. In fact, it is relevant for the stabilization time of the algorithm only, as the analysis of the gradient property can be conducted with the "true" global skew, i.e., max_{v,w∈V,t} {L_v(t) − L_w(t)}, as long as the estimate G of this value used by the algorithm is a valid upper bound at all times. However, ideally the algorithm should be able to incorporate new edges quickly whenever the maximum skew is small. This is particularly important in highly dynamic settings, where on the one hand we strive for resilience to temporarily weak connectivity (e.g. a list or even a disconnected graph), but on the other hand also for a small stabilization time whenever the topology is more benign. To this end, a strong estimator is needed that provides the nodes with an upper bound on the current maximum skew in the network. Based on such an estimator, nodes can adapt the duration of a round appropriately to achieve good stabilization times. In a static graph, the diameter can be determined easily by a flooding-echo initiated by some leader. The determined value is then distributed to all nodes by a second flooding.
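The flooding-echo pattern on a static graph can be sketched as follows (a simple BFS simulation; the function name and the example graph are assumptions for illustration). The leader's eccentricity determined by the echo is within a factor of two of the diameter:

```python
# Sketch of flooding-echo on a static graph: the leader floods a token
# (one hop per synchronous round, simulated here by BFS), and the echo
# phase aggregates the maximum distance reached back up the BFS tree.
# A second flooding would then distribute this value to all nodes.
from collections import deque

def flood_echo_ecc(adj: dict, leader: int) -> int:
    dist = {leader: 0}
    frontier = deque([leader])
    while frontier:                    # flooding: BFS wave
        v = frontier.popleft()
        for w in adj[v]:
            if w not in dist:          # first token received: join the tree
                dist[w] = dist[v] + 1
                frontier.append(w)
    return max(dist.values())          # echo: maximum depth aggregated

# Path 0-1-2-3: leader 0 has eccentricity 3, which equals the diameter.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
assert flood_echo_ecc(adj, 0) == 3
assert flood_echo_ecc(adj, 1) == 2
```

Since the eccentricity of any node lies between D/2 and D, this already suffices wherever a constant-factor estimate of the diameter (and hence of the global skew bound) is acceptable.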
In a dynamic graph, this approach becomes difficult, as the dynamic diameter of the graph might increase while a flooding is in progress. However, if the diameter of the graph increases considerably, so does the global skew bound, i.e., it is feasible to spend a larger amount of time on incorporating new edges. Moreover, the actual maximal clock skew between any two nodes increases slowly, at a rate of at most 2ρ. Thus, it is possible to use an approach where only nodes v ∈ V that receive an update on the estimate of the global skew participate in a round, that is, add new edges to their neighborhood subsets N_v^s. The participating nodes can be certain that the used bound remains valid until their new edges have stabilized. The remaining nodes simply wait until a flooding-echo-flooding sequence for the new, larger diameter has terminated successfully (from their perspective) before adding their edges in the respective round. This simple approach achieves the desired goal, but suffers from a very large message size. In order to be certain that a flooding has terminated, all nodes need to report back to the leader. More sophisticated techniques are


in demand in order to obtain an algorithm that is both practical and adaptive with respect to the global skew.

Part II

Load Balancing

Chapter 6

An Introduction to Parallel Randomized Load Balancing

"Divide and conquer." – Gaius Julius Caesar

Apart from fault-tolerance, perhaps the main motivation to study distributed computing is parallelism. If a task cannot be solved fast enough with a single processor, just use more! Nowadays it is common folklore that this approach does help to a certain degree, but eventually hits fundamental barriers. Sorting n items, for instance, can be done in O((n log n)/k) rounds on k ≤ n processors, whereas in general one cannot be faster than Ω(log n) rounds, regardless of the amount of hardware thrown at the problem. In this part of the thesis, we will examine a more subtle obstacle to parallelism in a distributed world. Even if the considered problem is "embarrassingly parallel", coordinating a distributed system's efforts to solve it often incurs an overhead. To achieve perfect scalability, coordination itself must be parallel, i.e., one cannot process information sequentially or collect the necessary coordination information at a single location. A striking and fundamental example of coordination is load balancing, which occurs on various levels: canonical examples are job assignment tasks such as sharing work load among multiple processors, servers, or storage locations, but the problem also plays a vital role in e.g. low-congestion circuit routing, channel bandwidth assignment, or hashing, cf. [86]. A common archetype of all these tasks is the well-known balls-into-bins problem: Given n balls and n bins, how can one place the balls into the bins quickly while keeping the maximal bin load small? As in other areas where centralized control must be avoided (sometimes because it is impossible), the


key to success is randomization. Adler et al. [1] devised parallel algorithms for the problem whose running times and maximal bin loads are essentially doubly-logarithmic. They provide a lower bound that asymptotically matches the upper bound. However, their lower bound proof requires two critical restrictions: algorithms must (i) break ties symmetrically and (ii) be non-adaptive, i.e., each ball restricts itself to a fixed number of candidate bins before communication starts. Dropping these assumptions, we are able to devise more efficient algorithms. More precisely, we present a simple adaptive and symmetric algorithm achieving a maximal bin load of two within log∗ n + O(1) rounds of communication w.h.p. This is achieved using an asymptotically optimal number of messages. Complementing this upper bound, we prove that, given the constraints on bin load and communication complexity, the running time of our first algorithm is (1 + o(1))-optimal for symmetric algorithms. Our bound necessitates a new proof technique; it is not a consequence of the impossibility to gather reliable information in time (e.g. due to asynchronicity, faults, or explicitly limited local views of the system), but rather emerges from bounding the total amount of communication. Thus, we demonstrate that breaking symmetry to a certain degree, i.e., reducing entropy far enough to guarantee small bin loads, comes at a cost exceeding the apparent minimum of Ω(n) total bits and Ω(1) rounds. In this light, a natural question to pose is how much initial entropy is required for the lower bound to hold. We show that the crux of the matter is that bins are initially anonymous, i.e., balls do not know globally unique addresses of the bins. This captures the essence of the aforementioned condition of symmetry imposed by Adler et al.
Discarding this requirement, we give an asymmetric adaptive algorithm that runs in constant time and sends O(n) messages w.h.p., yet achieves a maximal bin load of three. Completing the picture, we show that the same is possible for a symmetric algorithm if we incur a slightly superlinear number of messages or non-constant maximal bin loads. Jointly, the given bounds provide a full characterization of the parallel complexity of the balls-into-bins problem.

6.1 Model

The system consists of n bins and n balls, and we assume it to be fault-free. We employ a synchronous message passing model, where one round consists of the following steps:

1. Balls perform (finite, but otherwise unrestricted) local computations and send messages to arbitrary bins.

2. Bins receive these messages, do local computations, and send messages to any balls they have been contacted by in this or earlier rounds.


3. Balls receive these messages and may commit to a bin (and terminate).

Note that (for reasonable algorithms) the third step does not interfere with the other two. Hence, the literature typically accounts for this step as "half a round" when stating the time complexity of balls-into-bins algorithms; we adopt this convention. The considered task can now be stated concisely.

Problem 6.1 (Parallel Balls-into-Bins). We want to place each ball into a bin. The goals are to minimize the total number of rounds until all balls are placed, the maximal number of balls placed into a bin, and the amount of involved communication.

In order to classify balls-into-bins algorithms, we fix the notions of an algorithm being adaptive or symmetric, respectively.

Definition 6.2 (Adaptivity). A balls-into-bins algorithm is non-adaptive if balls fix the subset of bins they will contact prior to all communication. An algorithm not obeying this constraint is called adaptive.

A natural restriction for algorithms solving Problem 6.1 is to assume that random choices cannot be biased, i.e., that also the bins are anonymous. This is formalized by the following definition.

Problem 6.3 (Symmetric Balls-into-Bins). We call an instance of Problem 6.1 symmetric if balls and bins identify each other by u.i.r. port numberings. We call an algorithm that can be implemented under this constraint symmetric.

In contrast, balls executing an asymmetric algorithm may, e.g., all decide to contact bin 42. Note that this is impossible for symmetric algorithms, for which the uniformly random port numberings even out any non-uniformity in the probability distribution of contacted port numbers.

Problem 6.4 (Asymmetric Balls-into-Bins). An instance of Problem 6.1 is asymmetric if balls identify bins by globally unique addresses 1, . . . , n. An algorithm relying on this information is called asymmetric.
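As a toy illustration of this round structure (this is not one of the algorithms treated in this thesis), consider the naive scheme in which every unplaced ball contacts a single u.i.r. bin per round and each bin accepts at most one new ball per round:

```python
# Toy simulation of the synchronous round structure of Problem 6.1:
# step 1: unplaced balls each message one u.i.r. bin;
# step 2: every bin answers by admitting one of its requests;
# step 3: the admitted balls commit and terminate.
import random

def place_balls(n: int, rng: random.Random):
    unplaced = list(range(n))
    load = [0] * n
    rounds = 0
    while unplaced:
        rounds += 1
        requests = {}                               # bin -> requesting balls
        for ball in unplaced:
            requests.setdefault(rng.randrange(n), []).append(ball)
        accepted = set()
        for bin_id, balls in requests.items():
            load[bin_id] += 1                       # bin admits one new ball
            accepted.add(balls[0])
        unplaced = [b for b in unplaced if b not in accepted]
    return rounds, max(load)

rounds, max_load = place_balls(1000, random.Random(1))
assert max_load <= rounds   # each bin gains at most one ball per round
```

The simulation illustrates the trade-off stated in Problem 6.1: the per-round acceptance cap keeps bin loads small, but collisions force additional rounds of communication.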

6.2 Related Work

Probably one of the earliest applications of randomized load balancing has been hashing. In this context, it was proved that when throwing n balls u.i.r. into n bins, the fullest bin has load (1 + o(1)) log n/ log log n in expectation [41]. It is also common knowledge that the maximal bin load of this simple approach is Θ(log n/ log log n) w.h.p. (e.g. [29]).
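This classical single-choice bound is easy to observe empirically; the following sketch (plain Python; n is an arbitrary example size) compares the maximal load of one u.i.r. choice per ball against log n / log log n:

```python
# Empirical illustration of the single-choice bound: throwing n balls
# u.i.r. into n bins yields a maximal bin load of Theta(log n / log log n).
import math
import random
from collections import Counter

def single_choice_max_load(n: int, rng: random.Random) -> int:
    loads = Counter(rng.randrange(n) for _ in range(n))
    return max(loads.values())

rng = random.Random(42)
n = 100_000
m = single_choice_max_load(n, rng)
bench = math.log(n) / math.log(math.log(n))   # about 4.7 for this n
assert 0.5 * bench <= m <= 3 * bench          # loose sanity bounds only
```

The constants in the assertion are deliberately loose; the point is merely that the observed maximum concentrates around the stated benchmark rather than, say, around log n.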


With the growing interest in parallel computing, the topic has received increasing attention since the beginning of the nineties. Karp et al. [44] demonstrated for the first time that two random choices are superior to one. By combining two (possibly not fully independent) hashing functions, they simulated a parallel random access machine (PRAM) on a distributed memory machine (DMM) with a factor O(log log n log∗ n) overhead; in essence, their result was a solution to balls-into-bins with a maximal bin load of O(log log n) w.h.p. Azar et al. [6] generalized their result by showing that if each ball sequentially and greedily chooses the currently least loaded of d ≥ 2 u.i.r. bins, the maximal load is log log n/ log d + O(1) w.h.p.¹ They prove that this bound is stochastically optimal in the sense that any other strategy to assign the balls majorizes² their approach. The expected number of bins each ball queries during the execution of the algorithm was later improved to 1 + ε (for any constant ε > 0) by Czumaj and Stemann [22]. This is achieved by placing each ball immediately if the load of an inspected bin is not too large, rather than always querying d bins. So far the question remained open whether strong upper bounds can be achieved in a distributed setting. Adler et al. [1] answered this affirmatively by devising a parallel greedy algorithm obtaining a maximal load of O(d + log log n/ log d) within the same number of rounds w.h.p. Thus, choosing d ∈ Θ(log log n/ log log log n), the best possible maximal bin load of their algorithm is O(log log n/ log log log n). On the other hand, they prove that a certain subclass of algorithms cannot perform better with probability larger than 1 − 1/polylog n. The main characteristics of this subclass are that algorithms are non-adaptive, i.e., balls have to choose a fixed number of d candidate bins before communication starts, and symmetric, i.e., these bins are chosen u.i.r.
Moreover, communication takes place only between balls and their candidate bins. In this setting, Adler et al. show also that for any constant values of d and the number of rounds r the maximal bin load is in Ω((log n/ log log n)1/r ) with constant probability. Recently, Even and Medina extended their bounds to a larger spectrum of algorithms by removing some artificial assumptions [32]. A matching algorithm was proposed by Stemann [100], which for d = 2 and r ∈ O(log log n) achieves a load of O((log n/ log log n)1/r ) w.h.p.; for r ∈ Θ(log log n) this implies a constantly bounded bin load. The only “adaptive” algorithm proposed so far is due to 1 There is no common agreement on the notion of w.h.p. Frequently it refers to probabilities of at least 1 − 1/n or 1 − o(1), as so in the work of Azar et al.; however, their proof also provides their result w.h.p. in the sense we use throughout this thesis. 2 Roughly speaking, this means that any other algorithm is as least as likely to produce bad load vectors as the greedy algorithm. An n-dimensional load vector is worse than another, if after reordering the components of both vectors descendingly, any partial sum of the first i ∈ {1, . . . , n} entries of the one vector is greater or equal to the corresponding partial sum of the other.


Even and Medina [31]. If balls cannot be allocated, they get an additional random choice. However, one could also give all balls this additional choice right from the start, i.e., this kind of adaptivity cannot circumvent the lower bound. Consequently, their 2.5-round algorithm uses a constant number of choices and exhibits a maximal bin load of Θ(√(log n/ log log n)) w.h.p., the same asymptotic characteristics as parallel greedy with 2.5 rounds and two choices. In comparison, within this number of rounds our technique is capable of achieving bin loads of (1 + o(1)) log log n/ log log log n w.h.p.³ See Table 6.1 for a comparison of our results to parallel algorithms. Our adaptive algorithms outperform all previous solutions for the whole range of parameters. Given the existing lower bounds, the only remaining possibility for further improvement has been to search for non-adaptive or asymmetric algorithms. Vöcking [103] introduced the sequential “always-go-left” algorithm, which employs asymmetric tie-breaking in order to improve the impact of the number of possible choices d from logarithmic to linear. Furthermore, he proved that dependency between the random choices does not offer asymptotically better bounds. His upper bound also holds true if merely two bins are chosen randomly, but for each choice d/2 consecutive bins are queried [45]. Table 6.2 summarizes sequential balls-into-bins algorithms. Note that not all parallel algorithms can also be run sequentially. Stemann’s collision protocol, for instance, requires bins to accept balls only if a certain number of pending requests is not exceeded. Thus the protocol cannot place balls until all balls’ random choices are communicated. In contrast, our approach translates to a simple sequential algorithm competing in performance with the best known results [22, 103]. This algorithm can be interpreted as a greedy algorithm with d = ∞.

³ This follows by setting a := (1 + ε) log log n/ log log log n (for arbitrarily small ε > 0) in the proof of Corollary 8.6; we get that merely n/(log n)^{1+ε} balls remain after one round, which can then be delivered in 1.5 more rounds w.h.p. using O(log n) requests per ball.

Beyond that, there is a huge body of work studying variants of the basic problem [4, 6, 13, 14, 15, 22, 47, 48, 82, 83, 84, 85, 93, 100, 102, 103, 104]. In this exposition we will focus on the simple version of the task. A brief overview of these works can be found in [68]. Results related to ours have been discovered before for hashing problems. A number of publications present algorithms with running times of O(log∗ n) (or very close) in PRAM models [10, 38, 78, 81]. At the heart of these routines, as well as of our balls-into-bins solutions, lies the idea of using a share of the available resources that grows exponentially in each iteration to deal with the remaining keys or bins, respectively. Implicitly, this approach already occurred in previous work by Raman [94]. For a more detailed review of
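Vöcking's always-go-left rule can be sketched in a few lines. The following Python illustration follows the verbal description above; details such as the concrete group layout are our assumptions, not the original formulation of [103].

```python
import random

def always_go_left(n, d, rng):
    # Split the n bins into d consecutive groups. Each ball draws one
    # u.i.r. bin per group and takes the least loaded candidate,
    # breaking ties in favour of the leftmost (lowest-numbered) group.
    assert n % d == 0
    group = n // d
    loads = [0] * n
    for _ in range(n):
        candidates = [g * group + rng.randrange(group) for g in range(d)]
        best = candidates[0]
        for v in candidates[1:]:
            if loads[v] < loads[best]:  # strict comparison: ties go left
                best = v
        loads[best] += 1
    return loads
```

The asymmetric tie-breaking is the entire trick: replacing the strict comparison by an arbitrary tie-breaking rule recovers the symmetric greedy scheme.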

[Tables 6.1 and 6.2 appear here; their multi-column layout did not survive extraction, so only the captions and the compared algorithms are retained. Table 6.1 compares naive [41], parallel greedy [1] (with 2 and with d ≥ 2 choices), the collision protocol [100], and the algorithms A_b², A_b(r), A_c(l), and A(√log n) of this thesis with respect to symmetry, adaptivity, number of choices, rounds, maximal bin load, and messages. Table 6.2 compares the sequential algorithms naive [41], greedy [6], adaptive greedy [22], always-go-left [103], and A_seq with respect to choices, maximal bin load, and bin queries.]

Table 6.1: Comparison of parallel balls-into-bins algorithms. Committing balls into bins counts as half a round with regard to time complexity.

Table 6.2: Comparison of sequential balls-into-bins algorithms.

these papers, we refer the interested reader to [42]. Despite differences in the models, our algorithms and proofs exhibit quite a few structural similarities to those applicable to hashing in PRAM models. From our point of view, two main differences distinguish our upper bound results on symmetric algorithms. Firstly, the parallel balls-into-bins model permits using the algorithmic idea in its most basic form. Hence, our presentation focuses on the properties decisive for the log∗ n + O(1) complexity bound of the basic symmetric algorithm. Secondly, our analysis shows that the core technique is highly robust and can therefore tolerate a large number of faults. The lower bound by Adler et al. (and the generalization by Even and Medina) is stronger than our lower bound, but it applies only to algorithms that are severely restricted in their abilities. Essentially, these restrictions uncouple the algorithm’s decisions from the communication pattern; in particular, communication is constrained to an initially fixed random graph, where each ball contributes d edges to u.i.r. bins. This prerequisite seems reasonable for systems where the initial communication overhead is large. In general, however, we believe it is difficult to motivate that a non-constant number of communication rounds is feasible, yet only an initially fixed set of bins may be contacted. In contrast, our lower bound also holds for adaptive algorithms; in other words, it arises from the assumption that bins are (initially) anonymous, which fits a wide range of real-world systems. Like Linial in his seminal work on 3-coloring the ring [70], we attain a lower bound of Ω(log∗ n) on the time required to solve the task efficiently. This connection is more than superficial, as both bounds essentially arise from a symmetry breaking problem. However, Linial’s argument just uses a highly symmetric ring topology.
This general approach of arguing about a simple topology has been popular for proving lower bounds (see e.g. [25, 65, 89]). It is entirely different from our setting, where any two parties may potentially exchange information. Therefore, we cannot argue on the basis that nodes will learn only about a specific subset of the global state contained within their local horizon. Instead, the random decisions of a balls-into-bins algorithm define a graph describing the flow of information. This graph is not a simple random graph, as the information gained by this communication feeds back into its evolution over time, i.e., future communication may take the local topology of its current state into account. A different lower bound technique is used by Kuhn et al. [54], where a specific locally symmetric, but globally asymmetric graph is constructed to render a problem hard. As in our work, [54] restricts its arguments to graphs which are locally trees. The structure of the graphs we consider likewise forces us to examine subgraphs which are trees; subgraphs containing cycles occur too infrequently to constitute a lower bound. The bound of Ω(log∗ n) from [38], applicable to hashing in a certain model, which also argues about


trees, has even more in common with our result. However, neither of these bounds needs to deal with the difficulty that the algorithm may influence the evolution of the communication graph in a complex manner. In [54], input and communication graph are identical and fixed; in [38], there is also no adaptive communication pattern, as essentially the algorithm may merely decide on how to further separate elements that share the same image under the hash functions applied to them so far. Various other lower bound techniques exist [34, 75]; however, they are not related to the bound presented in Chapter 7. If graph-based, the arguments are often purely information theoretic, in the sense that some information must be exchanged over some bottleneck link or node in a carefully constructed network with diameter larger than two [72, 92]. In our setting, such information theoretic lower bounds will not work: any two balls may exchange information along n edge-disjoint paths of length two, as the graph describing which edges could potentially be used to transmit a message is complete bipartite. In some sense, this is the main contribution of this part of our exposition: we show the existence of a coordination bottleneck in a system without a physical bottleneck.

Chapter 7

Lower Bound on Symmetric Balls-into-Bins Algorithms

“That’s probably the most complicated thing I’ve ever heard.” – “Seriously?” – “No, but the way you explain it, it sounds as if it was.” – Roger’s comment on my first chaotic proof sketch of the balls-into-bins lower bound.

In this chapter, we derive our lower bound on the parallel complexity of the balls-into-bins problem. We will show that any symmetric balls-into-bins algorithm guaranteeing O(n) total messages w.h.p. requires at least (1 − o(1)) log∗ n rounds w.h.p. to achieve a maximal bin load of o(log∗ n). This chapter is based on [69] and the accompanying technical report [68].

7.1  Definitions

In fact, our lower bound for the symmetric problem holds for a slightly stronger communication model.

Problem 7.1 (Acquaintance Balls-into-Bins). We call an instance of Problem 6.1 an acquaintance balls-into-bins problem if the following holds. Initially, bins are anonymous, i.e., balls identify bins by u.i.r. port numberings. However, once a ball contacts a bin, it learns its globally unique address, by which it can be contacted reliably. Thus, by means of forwarding addresses, balls can learn to contact specific bins directly. The addresses are abstract in the


sense that they can be used for this purpose only.¹ We call an algorithm solving this problem an acquaintance algorithm.

For this problem, we will show the following result.

Theorem 7.2. Any acquaintance algorithm sending in total O(n) messages w.h.p. and at most polylog n messages per node either incurs a maximal bin load of more than L ∈ N w.h.p. or runs for (1 − o(1)) log∗ n − log∗ L rounds, irrespective of the size of messages.

In order to show this statement, we need to bound the amount of information a ball can collect during the course of the algorithm. As each ball may contact any bins it has heard of, this information is a subset of an (appropriately labeled) exponentially growing neighborhood of the ball in the graph where edges are created whenever a ball picks a communication partner at random.

Definition 7.3 (Balls-into-Bins Graph). The (bipartite and simple) balls-into-bins graph G_A(t) associated with an execution of the acquaintance balls-into-bins algorithm A running for t ∈ N rounds is constructed as follows. The node set V := V_□ ∪ V_◦ consists of |V_□| = |V_◦| = n bins and balls. In each round i ∈ {1, . . . , t}, each ball b ∈ V_◦ adds an edge connecting itself to bin v ∈ V_□ if b contacts v by a random choice in that round. By E_A(i) we denote the edges added in round i, and G_A(t) = (V, ∪_{i=1}^{t} E_A(i)) is the graph containing all edges added until and including round t.

In the remainder of this chapter, we will consider such graphs only. The proof will argue about certain symmetric subgraphs in which not all balls can decide on bins concurrently without incurring large bin loads. As can be seen by a quick calculation, any connected subgraph containing a cycle is unlikely to occur frequently. For an adaptive algorithm, it is possible that balls make a larger effort in terms of sent messages to break symmetry once they observe a “rare” neighborhood. Therefore, it is mandatory to reason about subgraphs that are trees.
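Definition 7.3 translates directly into code. The sketch below is illustrative only; `contacts_per_round` is a hypothetical stand-in for the algorithm's adaptive decision of how many bins each ball contacts. It collects the round-wise edge sets E_A(1), …, E_A(t) while keeping the bipartite graph simple.

```python
import random

def balls_into_bins_graph(n, t, contacts_per_round, rng):
    # Build the edge sets E_A(1), ..., E_A(t) of G_A(t): in round i,
    # every ball contacts contacts_per_round(i) bins u.i.r.; each ball
    # and each bin is indexed by 0..n-1 on its own side of the
    # bipartition. Duplicate (ball, bin) pairs are dropped so that the
    # resulting graph stays simple.
    edges_by_round = []
    seen = set()
    for i in range(1, t + 1):
        e_i = set()
        for ball in range(n):
            for _ in range(contacts_per_round(i)):
                edge = (ball, rng.randrange(n))
                if edge not in seen:
                    seen.add(edge)
                    e_i.add(edge)
        edges_by_round.append(e_i)
    return edges_by_round
```

By construction the edge sets of different rounds are disjoint, mirroring the fact that balls can distinguish edges by the round in which they were created.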
¹ This requirement is introduced to prohibit the use of these addresses for symmetry breaking, as is possible for asymmetric algorithms. One may think of the addresses e.g. as being random from a large universe, or the address space might be entirely unknown to the balls.

We would like to argue that any algorithm suffers from generating a large number of trees of uniform ball and bin degrees. If we root such a tree at an arbitrary bin, balls cannot distinguish between their parents and children according to this orientation. Thus, they will decide on a bin that is closer to the root with probability inversely proportional to their degree. If bin degrees are by a factor f(n) larger than ball degrees, this will result in an expected load of the root of f(n). However, this line of reasoning is too simple. As


edges are added to G_A in different rounds, these edges can be distinguished by the balls. Moreover, even if several balls observe the same local topology in a given round, they may randomize the number of bins they contact during that round, destroying the uniformity of degrees. For these reasons, we (i) rely on a more complicated tree in which the degrees are a function of the round number and (ii) show that for every acquaintance algorithm a stronger algorithm exists that indeed generates many such trees w.h.p.

In summary, the proof will consist of three steps. Firstly, for any acquaintance algorithm obeying the above bounds on running time and message complexity, an at least equally powerful algorithm from a certain subclass of algorithms exists. Secondly, algorithms from this subclass generate for (1 − o(1)) log∗ n rounds large numbers of the aforementioned highly symmetric trees in G_A(t) w.h.p. Thirdly, enforcing a decision from all balls in such structures leads to a maximal bin load of ω(1) w.h.p. The following definition clarifies what we understand by “equally powerful” in this context.

Definition 7.4 (W.h.p. Equivalent Algorithms). We call two algorithms A and A′ for Problem 6.1 w.h.p. equivalent if their output distributions agree on all but a fraction of the events occurring with total probability at most 1/n^c. That is, if Γ denotes the set of possible distributions of balls into bins, we have that

  ∑_{γ∈Γ} |P_A[γ] − P_{A′}[γ]| ≤ 1/n^c.

The subclass of algorithms we are interested in is partially characterized by its behaviour on the mentioned subgraphs, hence we need to define the latter first. These subgraphs are special trees in which all involved balls up to a certain distance from the root see exactly the same topology.
This means that (i) in each round, all involved balls created exactly the same number of edges by contacting bins randomly, (ii) each bin has a degree that depends only on the round in which it was contacted first, (iii) all edges of such a bin are formed in exactly this round, and (iv) this scheme repeats itself up to a distance that is sufficiently large for the balls not to see any irregularities that might help in breaking symmetry. These properties are satisfied by the following tree structure.

Definition 7.5 (Layered (∆□, ∆◦, D)-Trees). A layered (∆□, ∆◦, D)-tree of ℓ ∈ N₀ levels rooted at bin R is defined as follows, where ∆□ = (∆□_1, . . . , ∆□_ℓ) and ∆◦ = (∆◦_1, . . . , ∆◦_ℓ) are the vectors of bins’ and balls’ degrees on the different levels, respectively, and D ∈ N. If ℓ = 0, the “tree” is simply a single bin. If ℓ > 0, the subgraph of G_A(ℓ) induced by N_R^{(2D)} is a tree in which ball degrees are uniformly ∑_{i=1}^{ℓ} ∆◦_i.


[Figure 7.1 appears here; three of its bins are marked by an “X”.]

Figure 7.1: Part of a ((2, 5), (3, 5), D)-tree rooted at the topmost bin. Bins are squares and balls are circles; the neighborhoods of all balls and of the bins marked by an “X” are depicted completely, the remainder of the tree is left out. Thin edges and white bins were added to the structure in the first round, thick edges and grey bins in the second. Up to distance 2D from the root, the pattern repeats itself, i.e., the (2D − d)-neighborhoods of all balls up to depth d appear identical.

Except for leaves, a bin that is added to the structure in round i ∈ {1, . . . , ℓ} has degree ∆□_i, with all its edges in E_A(i). See Figure 7.1 for an illustration.

Intuitively, layered trees are crafted to present symmetric neighborhoods to nodes which are not aware of leaves. Hence, if bins’ degrees are large compared to balls’ degrees, not all balls can decide simultaneously without risking to overload bins. This statement will be made mathematically precise later.

We are now in the position to define the subclass of algorithms we will analyze. The main reason to resort to this subclass is that acquaintance algorithms may enforce seemingly asymmetric structures, which complicates proving a lower bound. In order to avoid this, we grant the algorithms additional random choices, restoring symmetry. The new algorithms must be even stronger, since they have more information available, yet they will generate many layered trees. Since we consider such algorithms specifically for this purpose, this is hard-wired in the definition.

Definition 7.6 (Oblivious-Choice Algorithms). Assume that an acquaintance algorithm A running for at most t rounds, ∆□ = (∆□_1, . . . , ∆□_t), and ∆◦ = (∆◦_1, . . . , ∆◦_t) are given. Suppose T = (T_0, . . . , T_t) ∈ (R⁺)^{t+1} is a sequence such that ∆◦_i ∈ O(n/T_{i−1}) for all i ∈ {1, . . . , t} and for all i ∈ {0, . . . , t} the number of disjoint layered ((∆□_1, . . . , ∆□_i), (∆◦_1, . . . , ∆◦_i), 2^t)-trees in G_A(i) is at least T_i w.h.p.


We call A a (∆□, ∆◦, T)-oblivious-choice algorithm if the following requirements are met:

(i) The algorithm terminates at the end of round t, when all balls simultaneously decide into which bin they are placed. A ball’s decision is based on its 2^t-neighborhood in G_A(t), including the random bits of any node within that distance, and all bins within this distance are feasible choices.²

(ii) In round i ∈ {1, . . . , t}, each ball b decides on a number of bins to contact and chooses that many bins u.i.r., forming the respective edges in G_A(i) if not yet present. This decision may resort to the topology of the 2^t-hop neighborhood of the ball in G_A(i − 1) (where G_A(0) is the graph containing no edges).

(iii) In round i ∈ {1, . . . , t}, it holds w.h.p. that all balls in depth d ≤ 2^t of Ω(T_{i−1}) layered ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-trees in G_A(i) each choose ∆◦_i bins to contact.

The larger t can be, the longer it will take until eventually no more layered trees occur and all balls may decide safely.

7.2  Proof of the Lower Bound

We need to show that for appropriate choices of parameters and non-trivial values of t, oblivious-choice algorithms indeed exist. Essentially, this is a consequence of the fact that we construct trees: when growing a tree, each added edge connects to a node outside the tree, therefore leaving a large number of possible endpoints for the edge; in contrast, closing a cycle in a small subgraph is unlikely.

Lemma 7.7. Let ∆◦_1 ∈ N and C > 0 be constants, L, t ∈ N arbitrary, T_0 := n/(100(∆◦_1)²(2C + 1)L) ∈ Θ(n/L), and ∆□_1 := 2L∆◦_1. Define for i ∈ {2, . . . , t} that

  ∆◦_i := ∆◦_1 n / T_{i−1},  ∆□_i := 2L∆◦_i,

and for i ∈ {1, . . . , t} that

  T_i := 2^{−(n/T_{i−1})^{4·2^t}} n.

² This is a superset of the information a ball can get when executing an acquaintance algorithm, since by address forwarding it might learn of and contact bins up to that distance. Note that randomly deciding on an unknown bin here counts as contacting it, as a single round makes no difference with respect to the stated lower bound.


If T_t ∈ ω(√n log n) and n is sufficiently large, then any algorithm fulfilling the prerequisites (i), (ii), and (iii) from Definition 7.6 with regard to these parameters that sends at most Cn²/T_{i−1} messages in round i ∈ {1, . . . , t} w.h.p. is a (∆□, ∆◦, T)-oblivious-choice algorithm.

Proof. Since by definition we have ∆◦_i ∈ O(n/T_{i−1}) for all i ∈ {1, . . . , t}, in order to prove the claim we need to show that at least T_i disjoint layered ((∆□_1, . . . , ∆□_i), (∆◦_1, . . . , ∆◦_i), 2^t)-trees occur in G_A(i) w.h.p. We prove this statement by induction. Since T_0 ≤ n and every bin is a ((), (), 2^t)-tree, we need to perform the induction step only. Hence, assume that for i − 1 ∈ {0, . . . , t − 1}, T_{i−1} lower bounds the number of disjoint layered ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-trees in G_A(i − 1) w.h.p. In other words, the event E_1 that we have at least T_{i−1} such trees occurs w.h.p.

We want to lower bound the probability p that a so far isolated bin R becomes the root of a ((∆□_1, . . . , ∆□_i), (∆◦_1, . . . , ∆◦_i), 2^t)-tree in G_A(i). Starting from R, we construct the 2D-neighborhood of R. All involved balls take part in disjoint ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-trees, all bins incorporated in these trees are not adjacent to edges in E_A(i), and all bins with edges on level i have been isolated until and including round i − 1.

As the algorithm sends at most ∑_{j=1}^{i−1} Cn²/T_{j−1} messages until the end of round i − 1 w.h.p., the expected number of isolated bins after round i − 1 is at least

  (1 − 1/n^c) n (1 − 1/n)^{Cn ∑_{j=1}^{i−1} n/T_{j−1}} ∈ n e^{−(1+o(1))Cn/T_{i−1}} ⊂ n e^{−O(n/T_{t−1})} ⊂ ω(log n).

Thus Lemma 2.15 and Corollary 2.13 imply that the event E_2 that at least n e^{−(1+o(1))Cn/T_{i−1}} such bins are available occurs w.h.p.

Denote by N the total number of nodes in the layered tree. Adding balls one by one, in each step we choose a ball out of w.h.p. at least T_{i−1} − N + 1 remaining balls in disjoint ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-trees, connect it to a bin already in the tree, and connect it to ∆◦_i − 1 of the w.h.p. at least n e^{−(1+o(1))Cn/T_{i−1}} − N + 1 remaining bins that have degree zero in G_A(i − 1). Denote by E_3 the event that the tree is constructed successfully and let us bound its probability.

Observe that because for all i ∈ {1, . . . , t} we have that ∆□_i > 2∆□_{i−1} and ∆◦_i > 2∆◦_{i−1}, it holds that

  N < ∑_{d=0}^{2^t} (∆□_i ∑_{j=1}^{i} ∆◦_j)^d < ∑_{d=0}^{2^t} (2∆□_i ∆◦_i)^d < 2(2∆□_i ∆◦_i)^{2^t}.  (7.1)


Furthermore, the inductive definitions of ∆□_i, ∆◦_i, and T_i, the prerequisite that T_t ∈ ω(√n log n), and basic calculations reveal that for all i ∈ {1, . . . , t} we have the simpler bound

  N < 2(2∆□_i ∆◦_i)^{2^t} < 2(4L + 1)^{2^t} (∆□_1 n/T_{i−1})^{4·2^t} ∈ n e^{−ω(n/T_{i−1})} ∩ o(T_{i−1})  (7.2)

on N.

on N . Recall that A is an oblivious-choice algorithm, i.e., the bins that are contacted in a given round are chosen independently. Thus, provided that E1 occurs, the (conditional) probability that a bin that has already been attached to its parent in the tree is contacted by the first random choice of exactly ∆t i − 1 balls that are sufficiently close to the roots of disjoint t ◦ ◦ t ((∆t , . . . , ∆ 1 i−1 ), (∆1 , . . . , ∆i−1 ), 2 )-trees is lower bounded by ! t ◦ (∆t ∆i −1 i −1)(∆i −1) 1 Ti−1 − N + (∆t 1 i − 1) 1 − ∆t n n i −1 (1+o(1))(∆t −1) i Ti−1 . t n∆i

(7.2)

∈

t Because ∆t i ∈ O(n/Ti−1 ), it holds that ln(n∆i /Ti−1 ) ∈ o(n/Ti−1 ). Thus, going over all bins (including the root, where the factor in the exponent is t ∆t i instead of ∆i − 1), we can lower bound the probability that all bins are contacted by the right number of balls by

Ti−1 n∆t i

(1+o(1))N

∈ e−(1+o(1))N n/Ti−1 ,

as less than N balls need to be added to the tree in total. Note that we have not made sure yet that the bins are not contacted by other balls; E3 is concerned with constructing the tree as a subgraph of GA (t) only. For E3 to happen, we also need that all balls that are added to the tree contact previously isolated bins. Hence, in total fewer than N u.i.r. choices need to hit different bins from a subset of size ne−(1+o(1))Cn/Ti−1 . This probability can be bounded by

ne−(1+o(1))Cn/Ti−1 − N n

N

(7.2)

∈ e−(1+o(1))CN n/Ti−1 .

Now, after constructing the tree, we need to make sure that it is indeed (2D) the induced subgraph of NR in GA (i), i.e., no further edges connect to any nodes in the tree. Denote this event by E4 . As we already “used” all edges


of balls inside the tree and there are no more than Cn²/T_{i−1} edges created by balls outside the tree, E_4 happens with probability at least

  (1 − N/n)^{Cn²/T_{i−1}} ∈ e^{−(1+o(1))CNn/T_{i−1}}.

Combining all factors, we obtain that

  p ≥ P[E_1] · P[E_2 | E_1] · P[E_3 | E_1 ∧ E_2] · P[E_4 | E_1 ∧ E_2 ∧ E_3]
    ∈ (1 − 2/n^c) e^{−(1+o(1))(C+1)Nn/T_{i−1}} e^{−(1+o(1))CNn/T_{i−1}}
    = e^{−(1+o(1))(2C+1)Nn/T_{i−1}}
    ⊆^{(7.1)} 2N e^{−(1+o(1))(2C+1)(2∆□_i ∆◦_i)^{2^t} n/T_{i−1}} e^{(1+o(1))Cn/T_{i−1}}
    ⊆ 2N e^{−(1+o(1))(2C+1)(4L(2∆◦_1 n/T_{i−1})²)^{2^t} n/T_{i−1}} e^{(1+o(1))Cn/T_{i−1}}
    ⊆ 2N 2^{−(n/T_{i−1})^{4·2^t}} e^{(1+o(1))Cn/T_{i−1}}
    = (2N T_i/n) e^{(1+o(1))Cn/T_{i−1}}.

We conclude that the expected value of the random variable X counting the number of disjoint ((∆□_1, . . . , ∆□_i), (∆◦_1, . . . , ∆◦_i), 2^t)-trees is lower bounded by E[X] > 2T_i, as at least e^{−(1+o(1))Cn/T_{i−1}} n isolated bins are left that may serve as roots of (not necessarily disjoint) trees and each tree contains less than N bins. Finally, having fixed G_A(i − 1), X becomes a function of w.h.p. at most O(n²/T_{i−1}) ⊆ O(n²/T_{t−1}) ⊆ O(n log(n/T_t)) ⊆ O(n log n) u.i.r. chosen bins contacted by the balls in round i. Each of the corresponding random variables may change the value of X by at most three: an edge insertion may add one tree or remove two, while deleting an edge removes at most one tree and creates at most two. Due to the prerequisite that T_i ≥ T_t ∈ ω(√n log n), we have E[X] ∈ ω(√n log n). Hence we can apply Theorem 2.17 in order to obtain

  P[X < E[X]/2] ∈ e^{−Ω(E[X]²/(n log n))} ⊆ n^{−ω(1)},

proving the statement of the lemma.

We see that the probability that layered trees occur falls at most exponentially in their size to the power of 4 · 2^t. Since t is very small, i.e., smaller than log∗ n, this rate of growth is comparable to exponentiation by a polynomial in the size of the tree. Therefore, one may expect that the requirement of T_t ∈ ω(√n log n) can be maintained for values of t in Ω(log∗ n). Calculations reveal that even t ∈ (1 − o(1)) log∗ n is feasible.


Lemma 7.8. Using the notation of Lemma 7.7, it holds for

  t ≤ t_0(n, L) ∈ (1 − o(1)) log∗ n − log∗ L

that T_t ∈ ω(√n log n).

Proof. By basic calculus. We refer to [68].

In light of the upper bounds we will show in the next chapter, this interplay between L and t is by no means arbitrary. We will see that if for any r ∈ N one accepts a maximal bin load of log^(r) n/ log^(r+1) n + r + O(1), Problem 6.1 can be solved in r + O(1) rounds.

Since we now know that critical subgraphs occur frequently for specific algorithms, we next prove that this subclass of algorithms is as powerful as acquaintance algorithms obeying certain bounds on time and message complexity.

Lemma 7.9. Suppose the acquaintance algorithm A solves Problem 6.1 within t ≤ t_0(n, L), L ∈ N, rounds w.h.p. (t_0 as in Lemma 7.8), sending w.h.p. at most O(n) messages in total and polylog n messages per node. Then, for sufficiently large n, a constant ∆◦_1 and an oblivious-choice algorithm A′ with regard to the set of parameters specified in Lemma 7.7 exist such that A′ sends at most O(n²/T_{i−1}) messages in round i ∈ {1, . . . , t} w.h.p., terminates at the end of round t, and is w.h.p. equivalent to A.

Proof. Observe that A has only two means of disseminating information: either balls connect randomly to unknown bins, or they send information to bins known from previous messages. Thus, any two nodes at distance larger than 2^t from each other in G_A(i − 1) must act independently in round i. Since degrees are at most polylog n w.h.p., w.h.p. no ball knows more than (polylog n)^{2^t} ⊆ n^{o(1)} bins (w.l.o.g. t_0(n, L) ≤ log∗ n − 2). Assume that in a given round a ball chooses k bins, excluding the ones of which it already obtained the global address. If it contacted k′ := ⌈3ck⌉ bins u.i.r. and dropped any drawn bin that it already knows (including repetitions in the current round), it would make fewer than k new contacts with probability at most

  ∑_{j=k′−k+1}^{k′} (k′ choose j) (1/n^{1−o(1)})^j (1 − 1/n)^{k′−j} ⊂ n^{−2(1−o(1))c} polylog n ⊂ n^{−c}.

Thus, we may modify A such that it chooses O(k) bins u.i.r. whenever it would contact k bins randomly. This can be seen as augmenting G_A(i) by additional edges. By ignoring these edges, the resulting algorithm A′ is capable of (locally) basing its decisions on the probability distribution of the graphs G_A(i). Hence, Condition (ii) from Definition 7.6 is met by the modified algorithm.
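The oversampling step used in this proof — contact k′ = ⌈3ck⌉ bins u.i.r. and drop every bin that is already known — can be sketched as follows (illustrative only; the function name and interface are ours, not the thesis's).

```python
import math
import random

def fresh_contacts(n, k, c, known, rng):
    # Draw k' = ceil(3ck) bins u.i.r. and keep only those that are
    # neither already known nor repetitions within this batch. Since a
    # ball knows only n^{o(1)} of the n bins, fewer than k fresh bins
    # survive only with probability at most n^{-c}.
    k_prime = math.ceil(3 * c * k)
    fresh = []
    drawn = set(known)
    for _ in range(k_prime):
        v = rng.randrange(n)
        if v not in drawn:
            drawn.add(v)
            fresh.append(v)
    return fresh
```

For realistic parameters almost every draw is fresh, which is exactly why the modification costs only a constant-factor overhead in messages.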


Condition (i) forces A′ to terminate in round t even if A does not. However, since A must terminate in round t w.h.p., balls may choose arbitrarily in this case, w.h.p. not changing the output compared to A. On the other hand, we can certainly delay the termination of A until round t if A would terminate earlier, without changing the results. Thus, it remains to show that we can further change the execution of A during the first t rounds in a way ensuring Condition (iii) of the definition, while maintaining the required bound on the number of messages. To this end, we modify A inductively, where again in each round we increase for some balls the number of randomly contacted bins compared to an execution of A; certainly this will not affect Conditions (i) and (ii) from Definition 7.6, and A′ will exhibit the same output distribution as A (up to a fraction of 1/n^c of the executions) if A′ ignores any of the additional edges when placing the balls at the end of round t.

Now, assume that the claim holds until round i − 1 ∈ {0, . . . , t − 1}. In round i ∈ {1, . . . , t}, any pair of balls in depth at most 2^t of disjoint ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-trees in G_{A′}(i) are in distance at least 2^{t+1} from each other, i.e., they must decide mutually independently on the number of bins to contact. Consequently, since A has less information at hand than A′, these balls would also decide independently in the corresponding execution of A. For sufficiently large n, (the already modified variant of) A will send at most Cn messages w.h.p. (for some constant C ∈ R⁺). Hence, the balls that are up to depth 2^t of such a tree send together in expectation fewer than 2nC/T_{i−1} messages, since otherwise Corollary 2.13 and the fact that T_{i−1} ∈ ω(log n) would imply that at least (2 − o(1))Cn messages would be sent by A in total w.h.p.

Consequently, by Markov’s bound (Theorem 2.6), with independent probability at least 1/2, all balls in depth 2^t or less of a layered ((∆□_1, . . . , ∆□_{i−1}), (∆◦_1, . . . , ∆◦_{i−1}), 2^t)-tree together send no more than 4Cn/T_{i−1} messages in round i. Using Corollary 2.13 again, we conclude that for Ω(T_{i−1}) many trees it holds w.h.p. that none of the balls in depth at most 2^t sends more than 4Cn/T_{i−1} messages to randomly chosen bins. When executing A′, we demand that each such ball randomly contacts exactly that many bins, i.e., with ∆◦_1 := 4C, Condition (iii) of Definition 7.6 is met. By Corollary 2.13, this way at most O(n + n²/T_{i−1}) = O(n²/T_{i−1}) messages are sent in round i w.h.p., as claimed. Moreover, the new algorithm can ensure to follow the same probability distribution of bin contacts as A, simply by ignoring the additional random choices made. This completes the induction step and thus the proof.

The final ingredient is to show that randomization is insufficient to deal with the highly symmetric topologies of layered trees. In particular, balls that decide on bins despite not being aware of leaves cannot avoid risking to


choose the root bin of the tree. If all balls in a tree where bin degrees are large compared to ball degrees decide, this results in a large load of the root bin. Lemma 7.10. Suppose after t rounds of some Algorithm A ball b is in depth at most 2t of a layered (∆t , ∆◦ , 2t )-tree of t levels in GA (t). We fix the topology of the layered tree. Let v be a bin in distance d ≤ 2t from b in GA (t) and assume that the edge sequence of the (unique) shortest path from b to v is e1 , . . . , ed . If b decides on a bin in round t, the probability that b places itself in v depends on the sequence of rounds `1 , . . . , `d ∈ {1, . . . , t}d in which the edges e1 , . . . , ed have been created only. Proof. Observe that since b is a ball, it must have an odd distance from the root of the layered (∆t , ∆◦ , 2t )-tree it participates in. Thus, the 2t neighborhood of b is a subset of the (2t+1 − 1)-neighborhood of the root of the layered tree. Therefore, this neighborhood is a balanced tree of uniform ball degrees. Moreover, for all i ∈ {1, . . . , t}, the number of edges from EA (i) balls up to distance 2t from b are adjacent to is the same. Bin degrees depend on the round i in which they have been contacted first only, and all edges of a bin in the tree were created in the respective round (cf. Figure 7.1). Let b1 , . . . , bn and v1 , . . . , vn be global, fixed enumerations of the balls and bins, respectively. Fix a topology T of the 2t -neighborhood of b with regard to these enumerations. Assume that v and w are two bins in T for which the edges on the shortest paths from b to v resp. w were added in the rounds `1 , . . . , `d , d ∈ {1, . . . , 2t }. Assume that x and y are the first distinct nodes on the shortest paths from b to v and w, respectively. The above observations show that the subtrees of T rooted at x and y are isomorphic (cf. Figure 7.2). 
Thus, a graph isomorphism f exists that "exchanges" the two subtrees (preserving their orientation), fixes all other nodes, and satisfies f(v) = w and f ∘ f = id. We choose such an f and fix it. Denote by p(b_i, v_j) ∈ {1, …, n}, for i, j ∈ {1, …, n}, the port number v_j has in the port numbering of b_i, and by p(v_i, b_j) the port number b_j has in the port numbering of bin v_i. Similarly, r(b_i) and r(v_i), i ∈ {1, …, n}, denote the strings of random bits b_i and v_i use for randomization, respectively. Using f, we define the automorphism h : S → S on the set S of tuples of possible port numberings and random strings (p(·,·), r(·)) by h((p(·,·), r(·))) := (p(f(·), f(·)), r(f(·))). Set S_v := {(p(·,·), r(·)) | T occurs ∧ b chooses v} ⊂ S and S_w analogously. We claim that h(S_v) = S_w (and therefore also h(S_w) = h²(S_v) = S_v). This means that when applying h to an element of S_v, the topology


CHAPTER 7. LOWER BOUND ON SYMMETRIC ALGORITHMS

Figure 7.2: Example of the effect of h on the topology of G_A(t). By switching the port numberings and random labels of isomorphic subtrees, as well as the port numberings of their neighborhood, nodes in these subtrees essentially "switch their identity" with their counterparts. Since the local view of the topmost ball is completely contained within the tree, it cannot distinguish between the two configurations.

T is preserved and b chooses w instead of v. To see this, observe that A can be interpreted as a deterministic algorithm on the (randomly) labeled graph in which each node u is labeled r(u) and each edge (u, u′) is labeled (p(u, u′), p(u′, u)). Hence, h simply switches the local views of u and f(u) in that graph, and a node u takes the role of f(u) and vice versa (cf. Figure 7.2). Thus, b will choose f(v) = w in the execution with the new labeling. On the other hand, T is preserved because we chose the function f such that the two subtrees of T rooted at x and y are mapped to each other by a graph isomorphism, i.e., the topology with regard to the fixed enumerations b_1, …, b_n and v_1, …, v_n does not change.

In summary, for any topology T of the 2^t-neighborhood of b in a layered (∆^t, ∆◦, 2^t)-tree such that b is connected to v and w by shortest paths whose sequences of edge-creation rounds coincide, we have that S_v = h(S_w). Since both port numberings and random inputs are chosen independently and uniformly, we conclude that P[(p(·,·), r(·)) ∈ S_v] = P[(p(·,·), r(·)) ∈ S_w], i.e., b must choose v and w with equal probability, as claimed.

We are now in the position to prove our lower bound on the trade-off between maximal bin load and running time of acquaintance algorithms.

Proof of Theorem 7.2. Assume that Algorithm A solves Problem 7.1 within at most t ≤ t_0 ∈ (1 − o(1)) log* n − log* L rounds w.h.p. (t_0 as in Lemma 7.8). Thus, due to Lemma 7.9, there exist a constant ∆◦_1 and an oblivious-choice Algorithm A′ with regard to the parameters from Lemma 7.7 whose maximal bin load is w.h.p. the same as that of A. In the following, we use the notation from Lemma 7.7.

Suppose that R is the root bin of a layered (∆^t, ∆◦, 2^t)-tree in G_{A′}(t). According to Lemma 7.10, for all balls b in distance up to 2^t from R, the probability of choosing a bin v depends solely on the sequence s(b, v) = (s_1, …, s_d) of round numbers in which the edges on the shortest path from b to v were created. Set S := ∪_{i=1}^{2^{t−1}} S_{2i−1}, where S_d denotes the set of round sequences s = (s_1, …, s_d) of (odd) length d(s) := d from balls to bins (inside the tree). Denote for s ∈ S by p(s) the probability that a ball b within distance 2^t from R decides on (any) bin v with s(b, v) = s, and by X the random variable counting the number of balls deciding on R. Recall that for any i ∈ {1, …, t}, we have ∆^t_i = 2L ∆◦_i. Since, by Lemma 7.10, a ball b with s(b, R) = s chooses R with probability p(s)/|{v ∈ V_t | s(b, v) = s}|, we compute

  E[X] = Σ_{s∈S} p(s) · |{b ∈ V_◦ | s(b, R) = s}| / |{v ∈ V_t | s(b, v) = s}|
       = Σ_{s∈S} p(s) · (∆^t_{s_1} Π_{i=1}^{⌊d(s)/2⌋} ∆◦_{s_{2i}} ∆^t_{s_{2i+1}}) / (∆◦_{s_1} Π_{i=1}^{⌊d(s)/2⌋} ∆^t_{s_{2i}} ∆◦_{s_{2i+1}})
       = Σ_{s∈S} p(s) · 2L
       = 2L,

as each ball must decide with probability 1 on a bin within distance 2^t (Condition (ii) of Definition 7.6). On the other hand, the maximal possible load of R is the number of balls up to depth 2^t of the tree, which we observed to be less than (2∆^t_t ∆◦_t)^{2^t} ∈ O((n/T_{t−1})²) ⊂ o(log² n). We infer that for sufficiently large n we have that P[X > L] > 1/log² n, since otherwise

  2L = E[X] ≤ (1 − P[X > L]) L + P[X > L] log² n < 2L,

a contradiction.

As Lemma 7.8 states that the number of disjoint layered (∆^t, ∆◦, 2^t)-trees is at least T_t ∈ ω(√n / log n) w.h.p., we have for the random variable Y counting the number of roots of such trees that get a bin load of more than L that

  E[Y] ≥ (1 − 1/n^c) P[X > L] T_t ∈ ω(log n).

Recall that Lemma 7.10 holds for fixed topologies of the trees, i.e., the estimates for P[X > L] and thus E[Y] follow after first fixing the topology up to distance 2^{t+1} from all the roots. Thus, the bound on the probability that a root bin gets a load of more than L is independent of the other roots' loads, since there is no communication between the involved balls. We conclude that we can apply Corollary 2.13 to Y in order to see that Y > 0 w.h.p., i.e., A′ incurs a maximal bin load larger than L w.h.p. Because A and A′ are w.h.p. equivalent, the same holds true for A, proving the claim.

7.2.1 Generalizations

For ease of presentation, the proof of the lower bound assumed that bins do not contact other bins. This is, however, not necessary.

Corollary 7.11. Theorem 7.2 also holds if bins may directly exchange messages.

Proof Sketch. The presented technique suffices for the more general case, as the following reasoning shows. To adapt the proof, we have to consider trees similar to layered (∆◦, ∆^t, 2^t)-trees, where now also bins form edges. Therefore, bins may also create a number of edges to other bins that grows exponentially in each round. However, the probability that a bin is the root of such a tree structure in round i will still be lower bounded by 2^{−(n/T_{i−1})^{f(t)}}, where f(t) is some function such that log* f(t) ∈ log* t + O(1) and T_{i−1} is a lower bound on the number of such roots in round i − 1. Hence, Lemmas 7.7 and 7.8 can be adapted accordingly. From that point on, the remainder of the proof carries over analogously.

It is important to be aware that this holds only as long as bins initially identify each other according to u.i.r. port numberings as well. If bins are aware of a globally consistent labeling of all bins, an asymmetric algorithm can be executed, as bins may support balls in making asymmetric random choices.

Similarly, the upper bound on the number of messages individual nodes send can be relaxed.

Corollary 7.12. Theorem 7.2 also holds if nodes send at most λn messages in total, where λ ∈ [0, 1) is a constant.


Proof Sketch. The critical point of the argumentation is the proof of Lemma 7.9, where we replace nodes' new contacts by u.i.r. chosen bins. For large numbers of messages, we can no longer guarantee that increasing the number of balls' random choices by a constant factor can w.h.p. compensate for the fact that balls will always contact different bins with each additional message. Rather, we have to distinguish between nodes sending many messages and nodes sending only few (e.g. polylog n); we apply the replacement scheme only to the latter. Of course, this introduces new difficulties. For instance, for the proof of Lemma 7.7 to hold, we need to observe that still a constant fraction of the bins remains untouched by balls sending many messages. The worst case here would be that O(1) nodes send λn messages during the course of the algorithm, since the bound of O(n) total messages w.h.p. must not be violated. Thus, the probability that a bin is not contacted by such a ball is lower bounded by (1 − λ)^{O(1)} ⊆ Ω(1). Using standard techniques, this implies that still many bins are never contacted during the course of the algorithm w.h.p. Similarly, we need to make sure that sufficiently many of the already constructed trees are not contacted by "outsiders"; here we get probabilities that fall exponentially in the size of such a tree, which is sufficient for the applied techniques.

Another aspect is that care now has to be taken when applying Theorem 2.17 to finish the proof of Lemma 7.7. The problem is that the random variable describing the edges formed by a ball of large degree is not the product of independent random variables. On the other hand, when treating it as a single variable in the application of the theorem, it might affect all of the layered trees, rendering the bound from the theorem useless.
Thus, we resort to first observing that not too many nodes send a lot of messages, then fixing their random choices, subsequently bounding the expected number of layered trees conditioned on these choices, and finally applying Theorem 2.17 merely to the random variable depending on the edges created u.i.r. by balls of small degree.

Note that if we remove the upper bound on the number of messages a single node may send entirely, there is a trivial solution:

1. With probability, say, 1/√n, a ball contacts √n bins.

2. These balls perform a leader election on the resulting graph (using random identifiers).

3. Contacting all bins, the leader coordinates a perfect distribution of the balls.


In the first step, O(n) messages are sent w.h.p. Moreover, the subgraph induced by the created edges is connected and has constant diameter w.h.p. Hence step 2, which can be executed with O(n) messages w.h.p., will result in a single leader within a constant number of rounds, implying that step 3 requires O(n) messages as well. However, this algorithm introduces a central coordination instance. If this were a feasible solution, there would be no need for a parallel balls-into-bins algorithm in the first place.

In light of these results, our lower bound essentially boils down to the following: any acquaintance algorithm that w.h.p. guarantees both small bin loads and an asymptotically optimal number of O(n) messages requires (1 − o(1)) log* n rounds.
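The first step of this trivial scheme is easy to simulate. The following hypothetical Python sketch (function names and all parameters are ours, not from the thesis) draws the participating balls and checks the two properties used above, namely the O(n) message count and the connectivity of the induced subgraph:

```python
import math
import random
from collections import defaultdict

def first_step(n, seed=1):
    """Step 1 of the trivial scheme: each ball independently participates
    with probability 1/sqrt(n) and contacts sqrt(n) u.i.r. bins.
    Returns the contact sets and the total number of messages sent."""
    rng = random.Random(seed)
    s = int(math.isqrt(n))
    contacts = {}  # participating ball -> set of contacted bins
    for ball in range(n):
        if rng.random() < 1 / s:
            contacts[ball] = {rng.randrange(n) for _ in range(s)}
    messages = sum(len(bins) for bins in contacts.values())
    return contacts, messages

def is_connected(contacts):
    """Check that the participating balls form a single component, where
    two balls are adjacent iff they contacted a common bin."""
    balls = list(contacts)
    if not balls:
        return True
    by_bin = defaultdict(list)
    for ball, bins in contacts.items():
        for b in bins:
            by_bin[b].append(ball)
    seen, stack = {balls[0]}, [balls[0]]
    while stack:  # depth-first search over the ball adjacency
        ball = stack.pop()
        for b in contacts[ball]:
            for other in by_bin[b]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return len(seen) == len(balls)
```

For n = 10^4, roughly √n = 100 balls participate and about n messages are sent in total, and the induced subgraph is connected in virtually every run, matching the reasoning above.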

Chapter 8

Balls-into-Bins Algorithms

"I'm getting too old for this. I wish one day I'd manage to finish earlier than the night of the deadline."
– Thomas Locher, right before submitting our first joint paper, at about 11 pm on a Friday.

In this chapter, which is also based on [68, 69], we will match the lower bound from the previous one. More precisely, we show that (i) a symmetric algorithm can achieve a constant bin load in log∗ n + O(1) time with O(n) messages w.h.p. and (ii) if we drop either of the requirements of symmetry, constant maximal bin load, or O(n) total messages, a constant-time solution exists. Moreover, we briefly present an application of the given techniques to an exemplary routing problem.

8.1 Optimal Symmetric Algorithm

Our basic symmetric Algorithm Ab originates from a very simple idea. Assume that all balls behave identically. Given the constraint that we want to guarantee a message complexity of O(n), a ball cannot afford to send more than constantly many messages in the first round. Hence, let each ball contact one random bin. The resulting distribution is well known; in particular, w.h.p. a fraction of 1 − 1/e ± o(1) of the bins will receive at least one message. Since we strive for small bin loads, suppose each bin chooses one of the balls that contacted it to be placed into it. A non-adaptive algorithm would be forced to place each ball into the single bin it contacted, implying a roughly logarithmic maximal bin load. Being adaptive, we can instead let each ball that could not be successfully


placed try again by approaching a different bin. However, we can do better. As we know that w.h.p. no more than (1 + o(1))n/e balls remain, each such ball may try out two random bins, still guaranteeing that fewer than n requests are sent in the second round. If again each bin accepts one request, the probability of failure will be much smaller. Roughly speaking, we can argue more generally as follows. The probability of a ball not being placed in a round in which it sends k requests is 2^{−Ω(k)}. Using Chernoff's bound, the number of remaining balls thus drops by (almost) this factor, enabling us to increase the number of requests comparably. Overall, the number of requests in round i grows like 2^{Ω(i)} until it becomes e.g. √(log n). By then, each ball will be placed within a constant number of rounds w.h.p.

Formally, initialized with k(1) := 1 and i := 1, Algorithm Ab executes the following loop until termination:

1. Balls contact ⌊k(i)⌋ u.i.r. bins, requesting permission to be placed into them.

2. Each bin accepts one of the requests (if it received any) and notifies the respective ball.

3. Any ball receiving at least one acceptance notification chooses an arbitrary one of the respective bins to be placed into and terminates.

4. Set k(i + 1) := min{k(i) e^{⌊k(i)⌋/5}, √(log n)} and i := i + 1.
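To make the loop concrete, here is a small, hypothetical Python simulation of Ab (the function name and all simplifications are ours): it runs the phases synchronously, samples ⌊k(i)⌋ distinct bins per ball rather than fully independent choices, and uses the natural logarithm in the cap on k(i).

```python
import math
import random
from collections import defaultdict

def simulate_Ab(n, seed=1):
    """Toy simulation of Ab: every unplaced ball contacts floor(k(i)) u.i.r.
    bins, each contacted bin accepts one request, accepted balls are placed,
    and k grows according to step 4. Returns the phase count and bin loads."""
    rng = random.Random(seed)
    unplaced = set(range(n))
    load = [0] * n
    k, phases = 1.0, 0
    while unplaced:
        phases += 1
        requests = defaultdict(list)  # step 1: bin -> requesting balls
        for ball in unplaced:
            for bin_ in rng.sample(range(n), int(k)):
                requests[bin_].append(ball)
        for bin_, balls in requests.items():  # step 2: each bin accepts one
            ball = rng.choice(balls)
            if ball in unplaced:  # step 3: the ball commits to one acceptance
                unplaced.remove(ball)
                load[bin_] += 1
        k = min(k * math.exp(int(k) / 5), math.sqrt(math.log(n)))  # step 4
    return phases, load
```

Running e.g. `phases, load = simulate_Ab(10000)` typically terminates after only a few phases with a small maximal value in `load`, in line with the log* n + O(1) bounds derived below; since a bin accepts at most one ball per phase, the maximal load never exceeds the number of phases.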

We will refer to a single execution of the loop of Ab (or our later algorithms) as a phase. Since in each phase of Ab some messages will reach their destination, the algorithm will eventually terminate. To give strong bounds on its running time, however, we need some helper statements. The first lemma states that a sufficiently large uniformly random subset of the requests will be accepted in step 2 of each phase. Lemma 8.1. Denote by R the set of requests Ab generates in step 1 of a phase. Provided that |R| ∈ [ω(log n), n] and n is sufficiently large, w.h.p. a uniformly random subset of R of size at least |R|/4 is accepted in step 2. Proof. Set λ := |R|/n ≤ 1. Consider the random experiment where |R| = λn balls are thrown u.i.r. into n bins. We make a case differentiation. Assume first that λ ∈ [1/4, 1]. Denote for l ∈ N0 by Bl the random variable counting


the number of bins receiving exactly l balls. According to Corollary 2.16,

  B_1 ≥ λn − 2(B_0 − (1 − λ)n)
      ∈ (2 − λ − 2(1 + o(1))e^{−λ}) n
      = ((2 − λ − 2(1 + o(1))e^{−λ})/λ) |R|

w.h.p. Since λ ≥ 1/4, the o(1)-term is asymptotically negligible. Without that term, the prefactor is minimized at λ = 1, where it is strictly larger than 1/4. On the other hand, if λ < 1/4, we may w.l.o.g. think of the balls as being thrown sequentially. In this case, the number of balls thrown into occupied bins is dominated by the sum of |R| independent Bernoulli variables taking the value 1 with probability 1/4. Since |R| ∈ ω(log n), Corollary 2.13 yields that w.h.p. at most (1/4 + o(1))|R| balls hit non-empty bins. For sufficiently large n, we get that w.h.p. more than (1/2 − o(1))|R| > |R|/4 bins receive exactly one ball.

Finally, consider step 1 of the algorithm. The above considerations show that w.h.p. at least |R|/4 bins receive exactly one request, which they will accept in step 2. Consider such a bin receiving exactly one request. Since each ball sends each element of R it holds with probability 1/n to this bin, the accepted request is drawn uniformly at random from R. Furthermore, as we know that no other copy is sent to the bin, all other messages are sent with conditional probability 1/(n − 1) each to any of the other bins. Repeating this argument inductively for all bins receiving exactly one request, the claim follows.

In order to apply the previous lemma repeatedly, we need to make sure that the total number of requests in each round remains smaller than n. This is connected to the fact that the fraction of non-placed balls drops exponentially in the number of requests per ball w.h.p.

Lemma 8.2. For a phase i ∈ N of Ab, suppose the number of balls that have not been placed yet is bounded by β_i ≤ n/k(i) w.h.p., where β_i ∈ R^+. Then the number of unplaced balls remaining after phase i is bounded by

  max{(1 + o(1)) e^{−k(i)/4} β_i, e^{−√(log n)} n}

w.h.p.

Proof. If β_i ≤ e^{−√(log n)} n or n is constantly bounded, the statement is trivial. Thus, as the β_i remaining balls send k(i)β_i ≤ n requests in phase i, Lemma 8.1 states that w.h.p. a uniformly random subset of size at least


k(i)β_i/4 of the requests is accepted. For each ball, exactly k(i) requests are sent. Hence, if we draw requests one by one, each ball that we have not seen yet has probability at least k(i)/(k(i)β_i) = 1/β_i to occur in the next trial. Thus, the experiment counting the number of placed balls stochastically dominates the random experiment in which each step draws one of the β_i balls u.i.r. with probability 1/β_i each and we count the number of distinct balls drawn. The latter is exactly the balls-into-bins scenario from Lemma 2.15, where (at least) k(i)β_i/4 balls are thrown into β_i bins. For n sufficiently large, we have that

  k(i) ≤ √(log n) ≤ 2(ln n − √(log n))/ln ln n ≤ 2 ln(k(i)β_i)/ln ln n.

Hence, Corollary 2.16 bounds the number of balls that cannot be placed in phase i by (1 + o(1)) e^{−k(i)/4} β_i w.h.p.

Finally, we need to show that the algorithm places a small number of remaining balls quickly.

Lemma 8.3. Suppose that at the beginning of phase i_0 ∈ N of Ab merely e^{−√(log n)} n balls have not been placed yet and k(i_0) = √(log n). Then all balls are placed within O(1) more phases of Ab w.h.p.

Proof. Regardless of all other balls' messages, each request has probability at least 1 − √(log n) e^{−√(log n)} to be accepted. Thus, the probability that a ball cannot be placed is independently bounded by

  (√(log n) e^{−√(log n)})^{k(i_0)} ⊆ n^{−Ω(1)}.

Since k(i) = k(i_0) = √(log n) for all i ≥ i_0, all balls are placed within O(1) phases w.h.p.

Plugging these three lemmas together, we derive our performance bounds on Ab.

Theorem 8.4. Ab solves Problem 6.1, guaranteeing the following properties:

• It terminates after log* n + O(1) rounds w.h.p.

• Each bin in the end contains at most log* n + O(1) balls w.h.p.

• In each round, the total number of messages sent is at most n w.h.p. The total number of messages is in O(n) w.h.p.

• Balls send and receive O(1) messages in expectation and at most O(√(log n)) messages w.h.p.


• Bins send and receive O(1) messages in expectation and at most O(log n/ log log n) messages w.h.p.

Furthermore, the algorithm runs asynchronously in the sense that balls and bins can decide on any request or permission, respectively, immediately, provided that balls' messages contain round counters. According to the previous statements, messages then have a size of O(1) in expectation and O(log log* n) w.h.p.

Proof Sketch (see [68] for technical details). We apply Lemma 8.2 inductively to see that after a constant number of phases, the number of remaining balls starts to drop exponentially in k(i) in each phase i w.h.p. Moreover, the definition of k(i) and the lemma yield that the total number of messages sent in each phase is monotonically decreasing and, for sufficiently large k(i), falls exponentially w.h.p. Lemma 8.3 shows that as soon as k(i) becomes √(log n), the algorithm terminates in a constant number of rounds w.h.p. By basic calculations with log*, one infers that w.h.p. the algorithm terminates within log* n + O(1) phases and thus also rounds, which immediately implies a maximal bin load of log* n + O(1). Together with the above statements about the number of messages sent, this also proves the third and fourth statement as well as the bound on the expected number of messages bins send and receive. The fact that w.h.p. O(n) messages are sent in total to u.i.r. bins implies, by Corollary 2.13, the bound on the maximal number of messages bins send and receive. Since Lemma 8.1 argues only about the bins receiving exactly one request in a given phase, all results also hold in the asynchronous case.

We remark that a more careful analysis would allow for a smaller cap on the maximal number of requests a ball sends in a given round. This can be seen in the proof of Lemma 8.3, where we did not exploit that the number of remaining balls still drops quickly once the arbitrarily chosen threshold of e^{−√(log n)} n is reached, i.e., the probability of a collision falls further. Incurring a larger time complexity of e.g. (1 + o(1)) log* n or O(log* n) would permit reducing this bound even more.

The simple approach that motivated Algorithm Ab is quite flexible, as a number of corollaries will demonstrate. We give only the key arguments of the proofs and refer to [68] for details. Our first observation is that, unsurprisingly, starting with fewer balls leads to earlier termination.

Corollary 8.5. If only m := n/log^{(r)} n balls are to be placed into n bins for some r ∈ N, Ab initialized with k(1) := ⌊log^{(r)} n⌋ terminates within r + O(1) rounds w.h.p.


Proof. This can be viewed as the algorithm being started in a later round; only log* n − log*(log^{(r)} n) + O(1) = r + O(1) more rounds are required for the algorithm to terminate.

More interestingly, this can be used to enforce a constant time complexity at the expense of slightly larger bin loads.

Corollary 8.6. For any r ∈ N, Ab can be modified into an Algorithm Ab(r) that guarantees a maximal bin load of log^{(r)} n/log^{(r+1)} n + r + O(1) w.h.p. and terminates within r + O(1) rounds w.h.p. Its message complexity respects the same bounds as the one of Ab.

Proof Sketch. In order to speed up the process, we rule that in the first phase bins accept up to l := ⌊log^{(r)} n/log^{(r+1)} n⌋ many balls. Computation and Chernoff's bound show that w.h.p. no more than polylog n + 2^{−Ω(l log l)} n balls remain after the first phase. By Corollary 8.5, this implies the statement.

Another appealing side effect of the adaptive increase in the number of requests is that it deals with faults implicitly.

Corollary 8.7. Algorithm Ab can be modified to tolerate independent message loss with constant probability p. The properties from Theorem 8.4 remain untouched, except that balls now send w.h.p. at most O(log n) messages.

Proof Sketch. In step 4 of Ab we set

  k(i + 1) := min{k(i) e^{⌊(1−p)² k(i)⌋/5}, log n}.

Essentially the reasoning remains the same, except that a request or the respective response may get lost with independent probability p each, necessitating a slower increase of k(i). Moreover, even if few balls remain, each request still has constant probability to fail. Therefore, we need to increase the cap on k(i) to log n to ensure quick termination.

The same technique permits enforcing a maximal bin load of two.

Corollary 8.8. We modify Ab into A2b by ruling that any bin having already accepted two balls refuses any further requests in step 2, and in step 4 we set k(i + 1) := min{k(i) e^{⌊k(i)⌋/10}, log n}. Then the statements of Theorem 8.4 remain true, except that balls now send O(log n) instead of O(√(log n)) messages w.h.p. In turn, the maximal bin load of the algorithm becomes two.


Proof Sketch. Trivially, at any time no more than half of the bins may have two balls placed into them. Thus, if balls inform the bins they commit to, always at least half of the bins are capable of accepting a ball. Consequently, the same reasoning as for Corollary 8.7 applies. The observation that neither balls nor bins need to wait prior to reacting to a message implies that our algorithms can also be executed sequentially, placing one ball after another. In particular, we can guarantee a bin load of two efficiently. This corresponds to the simple sequential algorithm that queries for each ball sufficiently many bins to find one of load less than two. Lemma 8.9. An adaptive sequential balls-into-bins algorithm Aseq exists guaranteeing a maximal bin load of two, requiring at most (2+o(1))n random choices and bin queries w.h.p. Proof. The algorithm simply queries u.i.r. bins until one of load less than two is found; then the current ball is placed and the algorithm proceeds with the next. Since at least half of the bins have load less than two at any time, each query has independently a probability of at least 1/2 of being successful. Therefore, it can be deduced from Corollary 2.13 that no more than (2 + o(1))n bin queries are necessary to place all balls w.h.p.
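The proof of Lemma 8.9 translates almost directly into code. The following hypothetical Python sketch of Aseq (the function name is ours) places n balls with maximal bin load two by repeatedly querying u.i.r. bins:

```python
import random

def place_sequentially(n, seed=1):
    """Sketch of the sequential algorithm Aseq (Lemma 8.9): for each ball,
    query u.i.r. bins until one of load less than two is found, then place
    the ball there. Returns the bin loads and the total number of queries."""
    rng = random.Random(seed)
    load = [0] * n
    queries = 0
    for _ in range(n):  # place the n balls one after another
        while True:
            queries += 1
            b = rng.randrange(n)
            if load[b] < 2:  # at least half of all bins qualify at any time
                load[b] += 1
                break
    return load, queries
```

Since each query succeeds with probability at least 1/2, the total number of queries concentrates around a small constant multiple of n, matching the (2 + o(1))n bound of the lemma.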

8.2 Optimal Asymmetric Algorithm

In this section, we will show that asymmetric algorithms can indeed obtain constant bin loads in constant time, at asymptotically optimal communication costs. Note that for asymmetric algorithms, we can w.l.o.g. assume that n is a multiple of some number l ∈ o(n), since we may simply opt for ignoring the negligible n − l⌊n/l⌋ remaining bins. We start by presenting a simple algorithm demonstrating the basic idea of our solution. Given l ∈ O(log n) that is a factor of n, A1(l) is defined as follows.

1. Each ball contacts one bin chosen uniformly at random from the set {il | i ∈ {1, …, n/l}}.

2. Bin il, i ∈ {1, …, n/l}, assigns up to 3l balls to the bins (i − 1)l + 1, …, il, such that each bin gets at most three balls.

3. The remaining balls (and the bins) proceed as if executing the symmetric Algorithm A2b, however, with k initialized to k(1) := 2^{αl} for an appropriately chosen constant α > 0.

Essentially, we create buckets of non-constant size l in order to ensure that the load of these buckets is slightly better balanced than it would be the case


for individual bins. This enables the algorithm to place more than a constant fraction of the balls immediately. Small values of l suffice for this algorithm to terminate quickly.

Lemma 8.10. Algorithm A1(l) solves Problem 6.4 with a maximal bin load of three. It terminates within log* n − log* l + O(1) rounds w.h.p.

Proof. For i ∈ N_0, let Y^i denote the random variable counting the number of bins receiving at least i messages in step 1. From Lemma 2.15 we know that Corollary 2.13 applies to these variables, i.e., |Y^i − E[Y^i]| ∈ O(log n + √(E[Y^i] log n)) w.h.p. Consequently, the number Y^i − Y^{i+1} of bins receiving exactly i messages differs by at most O(log n + √(max{E[Y^i], E[Y^{i+1}]} log n)) from its expectation w.h.p. Moreover, Corollary 2.13 states that these bins receive at most l + O(√(l log n) + log n) ⊂ O(log n) messages w.h.p., i.e., we need to consider values of i ∈ O(log n) only. Thus, the number of balls that are not accepted in the first phase is bounded by

  Σ_{i=3l+1}^{n} (i − 3l)(Y^i − Y^{i+1})
   ∈ Σ_{i=3l+1}^{O(log n)} (i − 3l) E[Y^i − Y^{i+1}] + O(√(n log n))
   ⊆ (n/l) Σ_{i=3l+1}^{O(log n)} (i − 3l) C(n, i) (l/n)^i (1 − l/n)^{n−i} + O(√(n log n))
   ⊆ (n/l) Σ_{i=3l+1}^{O(log n)} (i − 3l) (el/i)^i + O(√(n log n))
   ⊆ (n/l) Σ_{j=1}^{∞} jl (e/3)^{(j+2)l} + O(√(n log n))
   ⊆ O((e/3)^{2l} n) + O(√(n log n))
   ⊆ (e/3)^{(2−o(1))l} n + O(√(n log n))

w.h.p., where in the third step we used the inequality C(n, i) ≤ (en/i)^i for binomial coefficients.

Thus, w.h.p. at most 2^{−Ω(l)} n + O(√(n log n)) balls are not assigned in the first two steps. Hence, according to Corollary 8.5, we can deal with the


remaining balls within log* n − log* l + O(1) rounds by running A2b with k initialized to 2^{αl} for an appropriately chosen α ∈ (2 − o(1)) log(3/e). We conclude that A1(l) will terminate after log* n − log* l + O(1) rounds w.h.p., as claimed.

In particular, if we set l := log^{(r)} n for any r ∈ N, the algorithm terminates within r + O(1) rounds w.h.p. However, the result of Lemma 8.10 is somewhat unsatisfactory with respect to the balls-into-bins problem, since a subset of the bins has to deal with an expected communication load of l + O(1) ∈ ω(1). Hence, we want to modify the algorithm such that this expectation becomes constant. To this end, assume that l ∈ O(log n/ log log n) and that l² is a factor of n. Consider the following algorithm A2(l), which assigns balls as coordinators of intervals of up to l² consecutive bins.

1. With probability 1/l, each ball picks one bin interval I_j := {(j − 1)l + 1, …, jl}, j ∈ {1, …, n/l}, uniformly at random and contacts these bins. The messages of each ball contain a (for each ball fixed) string of ⌈(c + 2) log n⌉ random bits.

2. Each bin that receives one or more messages sends an acknowledgement to the ball whose random string represents the smallest number; if two or more strings are identical, no response is sent.

3. Each ball b that received acknowledgements from a contacted interval I_j queries one u.i.r. chosen bin from each of the intervals I_{j+1}, …, I_{j+l−1} (taking indices modulo n/l) whether it has previously acknowledged a message from another ball; these bins respond accordingly. Ball b becomes the coordinator of I_j and of all consecutive intervals I_{j+1}, …, I_{j+k}, k < l, such that none of these intervals has already responded to another ball in step 2.

The algorithm might miss some bin intervals, but overall most of the bins will be covered.

Lemma 8.11. A2(l) terminates after a constant number of rounds, and w.h.p. all but 2^{−Ω(l)} n bins have a coordinator.
The number of messages sent or received by each ball is constant in expectation and at most O(l). The number of messages sent or received by each bin is constant in expectation and O(log n/ log log n) w.h.p. The total number of messages is in O(n) w.h.p.

Proof. The random strings compared in step 2 are unique w.h.p., since the probability that two given strings are identical is at most 2^{−(c+2) log n} = n^{−(c+2)} and we have C(n, 2) < n² different pairs of balls. Hence, we may w.l.o.g.


assume that no identical strings are received by any bins in step 2 of the algorithm. In this case, if for some j the bins in I_j are not contacted in steps 1 or 3, this means that l consecutive intervals were not contacted by any ball. The probability of this event is bounded by

  (1 − (1/l) · (l/(n/l)))^n = (1 − l/n)^n < e^{−l}.

Hence, in expectation fewer than e^{−l} n/l intervals get no coordinator assigned. The variables indicating whether the I_j have a coordinator are negatively associated, as can be seen as follows. We interpret the first round as throwing n balls u.i.r. into n bins, of which n/l are labeled I_1, …, I_{n/l}. An interval I_j has a coordinator exactly if one of the bins I_{j−l+1}, …, I_j (again, indices modulo n/l) receives a ball. We know from Lemma 2.15 that the indicator variables Y^1_i counting the non-empty bins are negatively associated; moreover, the third step of its proof uses Statement (iii) from Lemma 2.14, which applies to any set of increasing functions. Since maxima of increasing functions are increasing, the indicator variables max{Y^1_{I_{j−l+1}}, Y^1_{I_{j−l+2}}, …, Y^1_{I_j}} are negatively associated as well. Therefore, Corollary 2.13 yields that the number of intervals without a coordinator is upper bounded by O(e^{−l} n/l + log n) w.h.p. Consequently, w.h.p. all but O(e^{−l} n + l log n) ⊆ e^{−Ω(l)} n bins are assigned a coordinator.

Regarding the communication complexity, observe that balls send at most O(l) messages and participate in the communication process with probability 1/l. In expectation, n/l balls contact u.i.r. chosen bin intervals, implying that bins receive in expectation one message in step 1. Similarly, at most l balls pick u.i.r. bins from each I_j to contact in step 3. Since in step 3 at most l − 1 messages can be received by any bin, it only remains to show the bound of O(log n/ log log n) on the number of messages bins receive in step 1. This follows from the previous observation that we can see step 1 as throwing n balls u.i.r. into n bins, where n/l bins represent the I_j; for this setting, the bound follows from Lemma 2.15 and Corollary 2.13. Lastly, we apply Corollary 2.13 to the number of balls choosing to contact bins in step 1 in order to see that O(n) messages are sent in total w.h.p.

Finally, Algorithm A(l) essentially plugs A1(l) and A2(l) together, where l ∈ O(√(log n)) and l² is a factor of n.

1. Run Algorithm A2(l).

2. Each ball contacts one bin, chosen uniformly at random.

3. Each coordinator contacts the bins it has been assigned to by A2(l).


4. The bins respond with the number of balls they received a message from in step 2.
5. The coordinators assign (up to) three of these balls to each of their assigned bins. They inform each bin where the balls it received messages from in step 2 need to be redirected.
6. Each ball contacts the same bin as in step 2. If the bin has a coordinator and the ball has been assigned to a bin, the bin responds accordingly.
7. Any ball receiving a response informs the respective bin that it is placed into it and terminates.
8. The remaining balls (and the bins) proceed as if executing Algorithm A2b, however with k initialized to k(1) := 2αl for an appropriately chosen constant α > 0.

Theorem 8.12. Algorithm A(l) solves Problem 6.4 with a maximal bin load of three. It terminates after log∗ n − log∗ l + O(1) rounds w.h.p. Both balls and bins send and receive a constant number of messages in expectation. Balls send and receive at most O(log n) messages w.h.p., bins O(log n/log log n) many w.h.p. The total number of messages is in O(n) w.h.p.

Proof. Lemma 8.11 states that all but 2^{−Ω(l)} n bins have a coordinator. Steps 2 to 7 of A(l) emulate steps 1 and 2 of Algorithm A1(l) for all balls that contact bins having a coordinator. By Lemma 8.10, w.h.p. all but 2^{−Ω(l)} n of the balls could be assigned if the algorithm were run completely, i.e., with all bins having a coordinator. Since w.h.p. only 2^{−Ω(l)} n bins have no coordinator and bins accept at most three balls, we conclude that w.h.p. after step 7 of A(l) merely 2^{−Ω(l)} n balls have not been placed into bins. Thus, analogously to Lemma 8.10, step 8 will require at most log∗ n − log∗ l + O(1) rounds w.h.p. Since steps 1 to 7 require constant time, the claimed bound on the running time follows. The bounds on message complexity and maximal bin load are direct consequences of Corollary 8.5, Lemma 8.11, the definition of A(l), and the bound of O(√log n) on l.
Thus, choosing l = log(r) n for any r ∈ N, Problem 6.4 can be solved within r + O(1) rounds.
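For reference, log(r) n (the r-fold iterated logarithm) and log∗ n can be computed as follows (a small Python sketch, base-2 logarithms assumed):

```python
import math

def iterated_log(x, r):
    """log^(r) x: apply log2 r times."""
    for _ in range(r):
        x = math.log2(x)
    return x

def log_star(x):
    """log* x: how often log2 must be applied until the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For instance, log(2) 65536 = 4 and log∗ 65536 = 4, illustrating how slowly r + O(1) grows in n for l = log(r) n.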

8.3 Symmetric Solution Using ω(n) Messages

A similar approach is feasible for symmetric algorithms if we permit ω(n) messages in total. Basically, Algorithm A(l) relied on asymmetry to assign coordinators to a vast majority of the bins. Instead, we may settle for coordinating a constant fraction of the bins; in turn, balls will need to send ω(1) messages to find a coordinated bin with probability 1 − o(1). Consider the following Algorithm Aω(l), where l ≤ n/log n is an integer.

1. With probability 1/l, a ball contacts a uniformly random subset of l bins.
2. Each bin receiving at least one message responds to one of these messages, choosing arbitrarily. The respective ball is the coordinator of the bin.

This simple algorithm guarantees that a constant fraction of the bins will be assigned to coordinators of Ω(l) bins.

Lemma 8.13. When executing Aω(l), bins receive O(log n/log log n) messages w.h.p. In total O(n) messages are sent w.h.p. W.h.p., a constant fraction of the bins is assigned to coordinators of Ω(l) bins.

Proof. Corollary 2.13 states that in step 1 w.h.p. Θ(n/l) balls decide to contact bins, i.e., Θ(n) messages are sent. As before, the bound on the number of messages bins receive follows from Lemma 2.15 and Corollary 2.13. Using Lemma 2.15 and Corollary 2.13 again, we infer that w.h.p. a constant fraction of the bins receives at least one message. Thus, Θ(n/l) balls coordinate Θ(n) bins, implying that Θ(n) bins must also be coordinated by balls that are responsible for Ω(l) bins.

Permitting communication exceeding n messages by more than a constant factor, this result can be combined with the technique from Section 8.1 to obtain a constant-time symmetric algorithm.

Corollary 8.14. For l ∈ O(log n), an Algorithm Ac(l) exists that sends O(ln) messages w.h.p. and solves Problem 6.3 with a maximal bin load of O(1) within log∗ n − log∗ l + O(1) rounds w.h.p. Balls send and receive O(l) messages in expectation and O(log n) messages w.h.p.

Proof Sketch.
W.h.p., Algorithm Aω(l) assigns coordinators to a constant fraction of the bins such that these coordinators control l′ ∈ Ω(l) bins. The coordinators inform each of their bins b of the number ℓ(b) of bins they supervise, while any other ball contacts a uniformly random subset of l bins. Such a bin b, if it has a coordinator, responds with the value ℓ(b). Note that


the probability that the maximal value a ball receives is smaller than l′ is at most 2^{−Ω(l)}; Corollary 2.13 therefore states that w.h.p. (1 − 2^{−Ω(l)})n balls contact a bin b with ℓ(b) ≥ l′. Next, these balls contact a bin b from which they received a value ℓ(b) ≥ l′, picking a feasible bin uniformly at random. The respective coordinators assign (at most) constantly many of these balls to each of their bins. By the same reasoning as in Lemma 8.10, we see that (if the constant is sufficiently large) all but 2^{−Ω(l)} n balls can be placed. Afterwards, we again proceed as in Algorithm A2b, with k initialized to 2αl for an appropriate α > 0; analogously to Lemma 8.10 we obtain the claimed bound on the running time. The bounds on the message complexity can be deduced from Chernoff bounds as usual.

Again, choosing l = log(r) n for any r ∈ N, Problem 6.3 can be solved within r + O(1) rounds using O(n log(r) n) messages w.h.p.
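The coordinator-assignment step of Algorithm Aω(l) is easily simulated (a Python sketch with hypothetical parameter values; letting a bin adopt the first contacting ball is our own arbitrary instantiation of "choosing arbitrarily"):

```python
import random

def a_omega(n, l, rng):
    """Each ball participates with probability 1/l and contacts l
    uniformly random bins; every contacted bin adopts one contacting
    ball as its coordinator (here: the first one)."""
    coordinator = [None] * n          # coordinator[b] = coordinating ball of bin b
    for ball in range(n):
        if rng.random() < 1.0 / l:
            for b in rng.sample(range(n), l):
                if coordinator[b] is None:
                    coordinator[b] = ball
    load = {}                         # bins controlled per coordinator
    for c in coordinator:
        if c is not None:
            load[c] = load.get(c, 0) + 1
    # fraction of bins whose coordinator controls at least l/4 bins
    return sum(load[c] for c in load if load[c] >= l / 4) / n

frac = a_omega(10000, 10, random.Random(1))
```

With n = 10000 and l = 10, roughly a 1 − e^{−1} fraction of the bins is contacted at all, and in the simulation a constant fraction indeed ends up with a coordinator controlling Ω(l) bins, matching Lemma 8.13.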

8.4 An Application

We conclude this chapter by briefly presenting an exemplary application of our results. Consider the following routing problem. Assume we have a system of n fully connected nodes, where links have uniform capacity. All nodes have unique identifiers, that is, v ∈ V denotes both the node v and its identifier. For the sake of simplicity, let communication be synchronous and reliable. During each synchronous round, nodes may perform arbitrary local computations, send a (different) message to each other node, and receive messages. Ideally, we would like to fully exploit the outgoing and incoming bandwidth (whichever is more restrictive) of each node with marginal overhead w.h.p. More precisely, we strive to enable nodes to freely divide the messages they can send in each round between all possible destinations in the network. Naturally, this is only possible to the extent dictated by the nodes' capability to receive messages in each round, i.e., the amount of time required can at best be proportional to the maximal number of messages any node must send or receive, divided by n. This leads to the following problem formulation.

Problem 8.15 (Information Distribution Task). Each node v ∈ V is given a (finite) set of messages Sv = {m_v^i | i ∈ Iv} with destinations d(m_v^i) ∈ V, i ∈ Iv. Messages can be distinguished (e.g., by including the sender's identifier and the position in an internal ordering


of the messages of that sender). The goal is to deliver all messages to their destinations, minimizing the total number of rounds. By

Rv := { m_w^i ∈ ⋃_{w∈V} Sw | d(m_w^i) = v }

we denote the set of messages a node v ∈ V shall receive. We abbreviate Ms := max_{v∈V} |Sv| and Mr := max_{v∈V} |Rv|, i.e., the maximal numbers of messages a single node needs to send or receive, respectively. Note that this task is not trivial, as nodes have no information on who wants to send messages to whom, while indirection is crucial to achieve a small time complexity. Abstractly speaking, we have n concurrent balls-into-bins problems: If we can, for each node v ∈ V, distribute the messages destined to it among all other nodes in a constant number of rounds, such that no node holds more than constantly many messages for each destination, all messages can be delivered within constant time. Thus, using the approach from Section 8.2, the following statement can be derived (cf. [68]).

Theorem 8.16. Problem 8.15 can be solved in

O((Ms + Mr)/n)

rounds w.h.p.

As can be seen from this bound, our techniques remain applicable if the number of balls m exceeds the number of bins n. Essentially, this is accounted for by increasing the maximal bin load to O(m/n). Similar results are obtained if one bases a solution on Algorithm Ab, however with an additive term of O(log∗ n) in the time needed to solve Problem 8.15 [68]. Figure 8.1 shows simulation results agreeing with the bounds from our analysis.
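The role of indirection can be illustrated by a toy sequential simulation (our own simplification for illustration, not the algorithm from [68]): spreading the messages for each destination over uniformly random intermediate holders balances the per-destination load before delivery.

```python
import random
from collections import defaultdict

def route(messages, n, rng):
    """messages: list of (sender, destination, payload) triples.
    Phase 1: hand every message to a uniformly random intermediate node,
    spreading the messages destined to any fixed node over many holders.
    Phase 2: the intermediates deliver. Returns the sets of payloads each
    destination received and the maximal number of messages any single
    intermediate held for one destination."""
    held = defaultdict(list)          # (intermediate, destination) -> payloads
    for _, dest, payload in messages:
        held[(rng.randrange(n), dest)].append(payload)
    received = defaultdict(set)
    for (_, dest), payloads in held.items():
        received[dest].update(payloads)
    max_per_pair = max(len(p) for p in held.values())
    return received, max_per_pair
```

Even if all n messages share one destination, after phase 1 each intermediate holds only a balls-into-bins load of them, so phase 2 does not congest any single link pair beyond that load.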


Figure 8.1: Simulations of a solution to Problem 8.15 based on Algorithm Ab. The number of remaining messages is plotted against the number of phases passed (each phase taking two rounds). 512 runs were simulated for 32 (left) and 64 (right) nodes. (Ms + Mr)/n was close to 2 for all runs. In both cases, most of the instances terminated after three phases; for 32 nodes, a single instance required 5 phases.

Part III

Graph Problems in Restricted Families of Graphs

Chapter 9

An Introduction to Graph Problems

“How many papers have you published so far?” – “Three.” – “And how long are your studies supposed to take?” – “Three years.” – “Then you have to write six more papers before you're done!” – My mother's approximation to the number of publications that make up a thesis.

In large parts, theoretical computer science is the study of the complexity of archetypical problems, many of which are naturally formulated on graphs. Traditionally, one primarily examines how many sequential computing steps are required to solve an instance of a problem in the worst case. This is measured in terms of the input size, frequently expressed by the number of nodes n. With the advent of distributed computing, this approach had to be reconsidered. While NP-hard problems remain (supposedly) intractable even if one increases the computing power by a factor of n, the main hurdles for “solvable” graph problems in distributed systems are usually limits on communication and concurrency. In fact, classical distributed symmetry breaking problems like finding a coloring of the nodes with ∆ + 1 colors (see Definition 2.25) or a maximal set of non-adjacent nodes (see Definition 9.6) are completely trivial from the perspective of sequential algorithms. In this part of the thesis, we consider two well-known graph problems, the first being the minimum dominating set problem.

Definition 9.1 (Dominating Sets). Given a graph G = (V, E), a node v ∈ V (set A ⊆ V) covers N_v^+ (N_A^+). The set D ⊆ V is a dominating set (DS) if it covers V. A dominating set of minimal cardinality is a minimum dominating set (MDS).


The problem of finding a minimum dominating set is as difficult to solve as it is easy to state. In fact, finding a minimum dominating set was one of the first tasks known to be NP-hard [37]. Consequently, one is typically satisfied with an approximative solution.

Definition 9.2 (Minimum Dominating Set Approximations). Given f ∈ R+, a DS D is an f MDS approximation if |D| ≤ f|M| for any MDS M. For f : N → R+, an f approximation algorithm for the MDS problem returns for any feasible graph of n nodes a DS that is an f(n) approximation. For randomized algorithms this might happen only with at least a certain probability; this probability bound, however, must not depend on the particular instance.

In general, it is also NP-hard to compute a C log ∆ approximation [95], where C > 0 is some constant. Indeed, unless NP is a subset of the problems that can be solved within n^{O(log log n)} steps, a (1 − o(1)) ln ∆ approximation is intractable in general graphs. While this bound is easily matched by the centralized algorithm that always picks the node covering the most yet uncovered nodes, the distributed case is more involved. Here, the lower bound on the approximation ratio can be asymptotically matched within O(log n) rounds by a randomized algorithm [56], whereas any distributed algorithm achieving a polylogarithmic approximation must take Ω(log ∆) and Ω(√log n) rounds [54]. Therefore, the algorithm from [56] is O(log n/log ∆)-optimal with respect to the running time. We remark, though, that the algorithm makes use of messages whose size respects no non-trivial bound, i.e., nodes might need to learn about the entire graph. For this reason, the authors also propose a variant of their algorithm running in O(log² ∆) rounds with message size O(log ∆), yielding a complexity gap of O(log ∆) in running time. While not tight, these results seem to suggest that we are not far from understanding the distributed complexity of the MDS problem.
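The centralized strategy alluded to above — repeatedly pick the node covering the most yet uncovered nodes — can be sketched in a few lines (a Python illustration with the graph given as an adjacency dict; this is the classical greedy set-cover heuristic, not a distributed algorithm):

```python
def greedy_dominating_set(adj):
    """adj: dict mapping each node to the set of its neighbors.
    Repeatedly add the node whose closed neighborhood covers the most
    still-uncovered nodes."""
    uncovered = set(adj)
    ds = []
    while uncovered:
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered))
        ds.append(v)
        uncovered -= {v} | adj[v]
    return ds
```

On a star, the center is picked immediately; in general the output is within a ln n + O(1) factor of an MDS, matching the hardness threshold up to lower-order terms.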
We believe that this is not the case. The lower bound from [54] is based on graphs that have large girth, yet many edges. Although such graphs exist and show that there is no algorithm solving the problem more efficiently in any graph of n nodes or maximum degree ∆, in a practical setting one will never encounter one of these constructed instances. Thus, we argue that it is reasonable to restrict inputs to graph families which occur in realistic settings. Of course, this approach suffers from the drawback that it is not trivial to find appropriate families of graphs—supposing they even exist—offering both sufficiently efficient solutions and wide practical applicability. We do not claim to have a satisfying answer to this question. Instead, we confine ourselves to extending the knowledge on the complexity of the MDS problem in restricted families of graphs.


To this end, we devise two MDS approximation algorithms for graphs of small arboricity, presented in Chapter 12.

Definition 9.3 (Forest Decomposition and Arboricity). For f ∈ N0, an f forest decomposition of a graph G = (V, E) is a partition of the edge set into f rooted forests. The arboricity A(G) is the minimum number of forests in a forest decomposition of G.

The graph class of (constantly) bounded arboricity is quite general, as any family of graphs excluding a fixed minor has bounded arboricity.

Definition 9.4 (Contractions and Minors). Given a simple graph G, a minor of G can be obtained by any sequence of the following operations.

• Deleting an edge.
• Deleting a node.
• Contracting an edge {v, w}, i.e., replacing v and w by a new node u such that Nu := (Nv ∪ Nw) \ {v, w}.

Note, however, that graphs of bounded arboricity may contain arbitrary minors. In particular, if we take the complete graph K_{√n} of √n nodes and replace its edges by edge-disjoint paths of length two, we obtain a graph of fewer than n nodes and arboricity two that has K_{√n} as a minor. Therefore, demanding bounded arboricity is considerably less restrictive than excluding the existence of certain minors.

Both presented algorithms improve on the results from [56] that apply to general graphs. However, while the lower bound from [54] does not hold for graphs of bounded arboricity, these algorithms still have logarithmic running times. In Chapter 13, we present a different approach that, on planar graphs, achieves an O(1) approximation in a constant number of rounds.

Definition 9.5 (Planarity). A graph G is planar if and only if it can be drawn in the two-dimensional plane such that no two nodes are mapped to the same point and edges intersect at their endpoints only. Equivalently, G is planar if and only if it contains neither K3,3 nor K5 as a minor, where Kk,k, k ∈ N, is the complete bipartite graph with k nodes on each side and Kk is the complete graph of k nodes.
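The subdivided-clique example can be checked mechanically: replacing each edge of K_k by a path of length two yields a graph whose edge count satisfies the sparsity condition m ≤ 2(n − 1) required of arboricity-two graphs (a small Python sketch; think of k as √n):

```python
from itertools import combinations

def subdivided_clique(k):
    """K_k with every edge {u, v} replaced by a path u - x_{uv} - v
    through a fresh subdivision node x_{uv}."""
    edges = []
    next_id = k                      # nodes 0 .. k-1 form the original clique
    for u, v in combinations(range(k), 2):
        edges.append((u, next_id))
        edges.append((next_id, v))
        next_id += 1
    return next_id, edges            # (number of nodes, edge list)

# k + k(k-1)/2 nodes and k(k-1) edges; the check below verifies the
# (necessary) sparsity condition for arboricity two on the whole graph
n_nodes, edge_list = subdivided_clique(5)
```

Note that arboricity two actually requires this sparsity for every subgraph (Nash-Williams); for the subdivision it holds throughout, while K_k itself survives as a minor by contracting one edge of each subdivision path.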
Although our algorithm for planar graphs is not practical due to the use of large messages, we deem this result interesting because it shows that a fast solution exists in a graph family where an O(1) approximation is not immediately evident. In contrast, in trees the set of inner nodes forms a three approximation, for instance, and in graphs of bounded degree ∆ taking all nodes yields a ∆ + 1 approximation.


Our algorithms and the one from [56] share two similarities. On the one hand, they all act greedily in one way or the other. Considering that the sequential algorithm matching the (1 − o(1)) ln n lower bound on the approximation ratio on general graphs is also greedy, this could be anticipated. More interestingly, a main obstacle for all these algorithms is symmetry breaking. While in [56] this is achieved by randomized rounding, our algorithms exploit the additional structure present in graphs of small arboricity and planar graphs, respectively. Nevertheless, one of our algorithms additionally relies on randomized symmetry breaking, employing a standard technique for constructing a so-called maximal independent set.

Definition 9.6 (Independent Sets). Given a graph G = (V, E), a subset of the nodes I ⊆ V is an independent set (IS) if for any v, w ∈ I, v ≠ w, we have that {v, w} ∉ E. An IS is a maximal independent set (MIS) if no nodes can be added without destroying independence, i.e., for all v ∈ V \ I, the set I ∪ {v} is not independent. A maximum independent set (MaxIS) is an IS of maximal size.

Note that any MIS is a DS. Moreover, an MIS can be very different from a MaxIS, as the star graph shows. Computing an MIS sequentially is trivial, whereas in general graphs the best known distributed solutions run in O(log n) randomized rounds [2, 43, 73, 80]. In the case of MIS, the lower bound construction from [54] applies to line graphs and proves this to be optimal up to a factor of O(√log n). Except for graph classes where the problem is much simpler because the number of independent nodes in a neighborhood is small, no considerably faster distributed algorithms are known. In particular, even in a (non-rooted) forest the fastest solution takes O(log n/log log n) rounds, while the strongest lower bound is Ω(log∗ n) [70, 89]. This startling complexity gap motivates studying the MIS problem in forests, which we do in Chapter 10.
Finally, we consider unit disk graphs, where the problems of finding an MIS and approximating an MDS are closely related, as any MIS is a constant-factor MDS approximation.

Definition 9.7 (Unit Disk Graphs). A unit disk graph G(ι) = (V, E) is defined by a mapping ι : V → R², where E := { {v, w} | v, w ∈ V, v ≠ w, ‖ι(v) − ι(w)‖₂ ≤ 1 }.

For this family of graphs, we generalize the lower bound from [70] in Chapter 11 to show that no deterministic algorithm can find a constant-factor MDS approximation in o(log∗ n) rounds; this bound has been matched asymptotically [98]. Unit disk graphs and generalizations thereof have been employed in the study of wireless communication to model interference and communication ranges [53, 76]. The algorithm from [98] is efficient for the much larger class


of graphs of bounded growth, in which the number of independent nodes within a certain distance r is bounded by some function f(r) that is independent of n. Thus, together these bounds classify the deterministic complexity of finding a constant MDS approximation in geometric graph families up to constants. We remark, though, that neither the technique from [98] nor our lower bound applies if one uses the more sophisticated physical model (see e.g. [88] and references therein).
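Definition 9.7 translates directly into code (a small Python sketch; representing the embedding ι as a dict of coordinates is our own choice for illustration):

```python
import math

def unit_disk_graph(points):
    """points: dict v -> (x, y), the embedding iota from Definition 9.7.
    Returns the set of node pairs at Euclidean distance at most 1."""
    edges = set()
    nodes = list(points)
    for i, v in enumerate(nodes):
        for w in nodes[i + 1:]:
            if math.dist(points[v], points[w]) <= 1:
                edges.add(frozenset((v, w)))
    return edges
```

For example, three collinear points at x = 0, 0.5, and 2 induce exactly one edge, between the first two.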

9.1 Model

In this part of the thesis, we make use of a very simple network model. We assume a fault-free distributed system. A simple graph G = (V, E) describes the MDS or MIS problem instance, respectively, as well as the communication infrastructure. In each synchronous round, each node v ∈ V may send a (different) message to each of its neighbors w ∈ Nv, receive all messages from its neighbors, and perform arbitrary finite local computations. Initially, node v knows its neighbors Nv and possibly a unique identifier of size O(log n). For some algorithms a port numbering is sufficient, i.e., the node v has a bijective mapping p(v, ·) : Nv → {1, . . . , |Nv|} at hand. When sending a message, it may specify which neighbor receives which message by means of the respective port numbers. When receiving, it can tell apart which neighbor sent which message, also in terms of its port numbering. At termination, the node must output whether it is part of the DS or MIS, respectively, and these outputs must define a valid solution of the problem with regard to G.

In the context of distributed graph algorithms, this abstract model can be motivated as follows.

• Asynchrony can be dealt with by a synchronizer (cf. [3]).
• Recovery from transient faults can be ensured by making the algorithm self-stabilizing (see Definition 5.27). There is a simple transformation from algorithms obeying the given model to self-stabilizing ones [5, 64].
• Changing topology due to joining and leaving nodes, crash failures, etc. also changes the input, i.e., we need to rerun the algorithm on the new topology.
• With respect to lower bounds, we typically restrict neither the number of messages nor their size. For algorithms, these values should of course be small. This is not enforced by the model, but considered a quality measure like the running time.


• Local computation and memory are typically not critical resources, as the most efficient algorithms usually are not highly complicated (within a small number of rounds, only little information is available that can be processed).

Note that if the algorithm terminates within T rounds, recovery from faults or adaptation to a new topology requires local operations only up to distance at most T from the event. In particular, if T is sublogarithmic and degrees are bounded, we get a non-trivial bound on the size of the subgraph that may affect the outcome of a node's computations. This underlines the importance of both upper and lower bounds on the time complexity of algorithms in this model. For small time complexities, this might even be the most significant impact of such bounds: for applications, the difference between 5 and 10 rounds of communication might be of no concern, but whether a small fraction or the majority of the nodes has to re-execute the algorithm and change its state in the face of, e.g., a single node joining the network could matter a lot.
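The synchronous port-numbering model just described can be mimicked by a few lines of code (an illustrative Python sketch, not part of the thesis; the convention that a neighbor's position in an ordered adjacency list is its port number is our own choice):

```python
def run_round(adj, states, inboxes, step):
    """One synchronous round. adj: node -> ordered neighbor list (the list
    position of a neighbor is its port number). step(v, state, inbox)
    returns (new_state, outbox); inbox and outbox map ports to messages.
    All messages are prepared before any of them is delivered."""
    new_states, outboxes = {}, {}
    for v in adj:
        new_states[v], outboxes[v] = step(v, states[v], inboxes[v])
    new_inboxes = {v: {} for v in adj}
    for v, outbox in outboxes.items():
        for port, msg in outbox.items():
            w = adj[v][port]                       # neighbor behind this port
            new_inboxes[w][adj[w].index(v)] = msg  # delivered on w's port for v
    return new_states, new_inboxes
```

A node never learns its neighbors' identities from the simulator itself — it only sees port-indexed inboxes, exactly as in the port-numbering variant of the model.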

9.2 Overview

In Chapter 10 we propose an algorithm computing an MIS in non-rooted forests. For rooted forests, the problem is known to have a time complexity of Θ(log∗ n) [19, 89]. In contrast, for non-rooted forests no stronger lower bound is known, yet the previously fastest algorithm requires Θ(log n/log log n) rounds [9]. With a running time of O(√(log n log log n)), our result reduces this gap considerably. To be fair, the deterministic algorithm from [9] is efficient for the much more general class of graphs of bounded arboricity. However, it makes use of an f-forest decomposition requiring at least Ω(log n/log f) rounds, as derived in [9] from a coloring lower bound by Linial [70]. Hence, this technique cannot be made faster. Moreover, once the decomposition is obtained, the respective algorithm makes heavy use of the fact that the outdegree (i.e., the number of parents) of each node is small. Similarly, other fast solutions that run in time Θ(log∗ n) on rooted forests [19], graphs of bounded degree [40], or graphs of bounded growth [98] exploit that, in one way or another, the number of neighbors considered by each node can be kept small. By partly related techniques, one can color the graph with ∆ + 1 colors and subsequently, color class by color class, concurrently add all still independent nodes to an IS, until eventually an MIS has been constructed. This results in deterministic algorithms with a running time of O(∆ + log∗ n) [8, 49]. See Table 9.1 for an overview of results on MIS computation. In light of these results and the fact that forests are a very restrictive graph family, maybe the most interesting point about the algorithm presented


Table 9.1: Upper and lower bounds on distributed time complexities of the MIS problem in various graph families.

graph family | running time | deterministic
general [2, 43, 73, 80] | O(log n) | no
general [90] | 2^{O(√log n)} | yes
line graphs [54] | Ω(min{√log n, log ∆}) | no
rings & forests [73, 89] | (log∗ n − O(1))/2 | no
rings & rooted forests [19] | O(log∗ n) | yes
maximum degree ∆ [8, 49] | O(∆ + log∗ n) | yes
bounded growth [98] | O(log∗ n) | yes
bounded arboricity [9] | O(log n/log log n) | yes
forests (Chapter 10) | O(√(log n log log n)) | no
in Chapter 10 is the following: Despite an unbounded number of independent neighbors and the lack of a conducive edge orientation, it can break symmetry in considerably sub-logarithmic time.

In Chapter 11, we turn to studying the problem of approximating an MDS in unit disk graphs. Leveraging Linial's lower bound [70] of (log∗ n − 1)/2 on the number of rounds required to compute a 3-coloring or maximal independent set on the ring, we can show that no deterministic algorithm can compute an f(n) approximation in g(n) rounds if f(n)g(n) ∈ o(log∗ n). Independently, Czygrinow et al. showed by a related, but different, approach that a constant approximation is impossible within o(log∗ n) rounds [25]. On the other hand, unit disk graphs feature bounded growth, i.e., the maximal number of independent nodes in each r-hop neighborhood is polynomially bounded in r:

∀v ∈ V, r ∈ N : max{ |I| : IS I ⊆ N_v^{(r)} } ≤ f(r),

where f is some polynomial. This property can be exploited in order to compute an MIS in O(log∗ n) deterministic rounds [98]. Note that graphs of bounded growth (and thus in particular unit disk graphs) also exhibit the weaker property of bounded independence, i.e., the neighborhood of each node contains only a constantly bounded number of independent nodes. This implies that an MIS (which is always a DS) is an O(1) MDS approximation, as the size of any IS is bounded by the size of an MDS times the respective constant. Therefore, our lower bound is matched asymptotically for the class of


graphs of bounded growth. In contrast, the fastest known MIS algorithms on graphs of bounded independence are just the ones for general graphs, leaving a factor of O(log n/log∗ n) between the strongest upper and lower bounds known so far. Note, however, that algorithms based on computing an MIS cannot run faster than Ω(√log n) rounds, as the lower bound by Kuhn et al. [54] applies to line graphs, which feature bounded independence.

Complementarily, in Chapter 12, we consider graphs of small arboricity. Such graphs can be very different from graphs of small independence. While in the latter case the number of edges may be large (in particular, the complete graph contains no two independent nodes), graphs of constantly bounded arboricity have O(n) edges, but potentially large sets of independent neighbors (the star graph has arboricity one). We remark that it is not difficult to see that demanding both bounded growth and bounded arboricity is equivalent to asking for bounded degree. In the family of graphs of maximum degree ∆ ∈ O(1), simply taking all nodes yields a trivial ∆ + 1 MDS approximation; we already mentioned that finding an MIS here takes Θ(log∗ n) rounds [40, 89].

We devise two algorithms for graphs of small arboricity A. The first employs a forest decomposition to obtain an O(A²) MDS approximation within time O(log n) w.h.p. This algorithm utilizes the fact that not too many nodes may be covered by their children in a given forest decomposition, implying that a good approximation ratio can be maintained if we cover every node that has a parent by one of its parents. Solving this problem approximately can be reduced to computing an arbitrary MIS in some helper graph. Therefore, we get a randomized running time of O(log n) w.h.p., whereas the approximation guarantee of O(A²) is deterministic.
The second algorithm we propose is based on the property that subgraphs of graphs of bounded arboricity are sparse, i.e., a subgraph on n′ nodes contains at most A(n′ − 1) edges. For this reason, we can build on the very simple greedy strategy of simultaneously adding all nodes of locally large degree to the output set, until eventually, after O(log ∆) rounds, all nodes are covered. This straightforward approach yields an O(A log ∆) approximation if one makes sure that the number of nodes covered in each repetition is at least the number of nodes selected. The latter can easily be achieved by requiring that uncovered nodes choose one of their eligible neighbors to enter the set, instead of just electing all possible candidates into the set. Playing with the factor up to which joining nodes are required to have largest degree within their two-neighborhood, the algorithm can be modified to an O(αA logα ∆) approximation within O(logα ∆) time, for any integer α ≥ 2. This second algorithm appeals by its simplicity; unlike the first, it is uniform, deterministic, and merely requires port numbers.
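The core loop of this strategy can be caricatured in code (a sequential toy version under simplifying assumptions of our own: each uncovered node nominates the member of its closed neighborhood covering the most uncovered nodes, without the two-neighborhood degree comparison of the actual algorithm):

```python
def greedy_cover(adj):
    """adj: node -> set of neighbors. In every round, each uncovered node
    nominates the member of its closed neighborhood covering the most
    uncovered nodes; all nominees join the dominating set simultaneously."""
    uncovered = set(adj)
    ds = set()
    while uncovered:
        def gain(u):
            return len(({u} | adj[u]) & uncovered)
        nominees = {max({v} | adj[v], key=gain) for v in uncovered}
        ds |= nominees
        for u in nominees:
            uncovered -= {u} | adj[u]
    return ds
```

Since every uncovered node's nominee covers at least that node, each round the set of uncovered nodes shrinks, and the number of newly covered nodes is at least the number of distinct nominees — the invariant the thesis uses to bound the approximation ratio.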


Recall that the best algorithm with polylogarithmically sized messages for general graphs has running time O(log² ∆) and approximation ratio O(log ∆) [56]. Thus, the algorithms given in Chapter 12 clearly improve on this result for graphs of bounded arboricity. However, although the lower bound from [54] fails to hold for such graphs, our algorithms' running times do not get below the thresholds of Ω(√log n) and Ω(log ∆), respectively. In contrast, we show in Chapter 13 that in planar graphs an O(1) approximation can be computed in O(1) rounds.1 Our algorithm makes use of the facts that planar graphs and their minors are sparse, i.e., contain only O(n) edges, and that in planar graphs cycles separate their interior (with respect to an embedding) from their outside. A drawback of our algorithm for planar graphs is that it relies on messages that in the worst case encode the whole graph, rendering it impractical. For the same reason, the technique by Czygrinow et al. [25] to obtain a 1 + ε approximation in O(log∗ n) rounds is also of theoretical significance only. If message size is required to be (poly)logarithmic in n, to the best of our knowledge the most efficient distributed algorithms are the ones from Chapter 12. More generally, the same is true for any graph family excluding fixed minors. Here, too, distributed 1 + ε approximations are known [23, 24]; however, they have polylogarithmic running times with large exponents and again use large messages. Excluding particular minors is a rich source of graph families, which apart from planar graphs includes, e.g., graphs of bounded genus or treewidth. Again to the best of our knowledge, currently no distributed algorithms tailored to these families of graphs exist. Moreover, as mentioned before, graphs of bounded (or non-constant, but slowly growing) arboricity extend beyond minor-free graph families. Therefore, the algorithms presented in Chapter 12 improve on the best known solutions for a wide range of inputs.
See Table 9.2 for a comparison of distributed MDS approximations.

1 This result was claimed previously, in [61] by us and independently in [25] by others. Sadly, both proofs turned out to be wrong.

Table 9.2: Upper and lower bounds on distributed MDS approximation in various graph families. For upper bounds, message size is stated to be “trivial” if the whole topology might be collected at individual nodes.

graph family | running time | approximation ratio | deterministic | message size
general [56] | O(log n) | O(log ∆) | no | trivial
general [56] | O(log² ∆) | O(log ∆) | no | O(log ∆)
general [54] | Ω(min{√log n, log ∆}) | polylog n | no | arbitrary
bounded independence [80] | O(log n) | O(1) | no | O(1)
bounded independence [90] | 2^{O(√log n)} | O(1) | yes | trivial
bounded growth [98] | O(log∗ n) | O(1) | yes | O(log n)
unit disk (Chapter 11) | g(n) ∈ o(log∗ n) | ∉ o(log∗ n/g(n)) | yes | arbitrary
maximum degree ∆ (trivial) | 0 | ∆ + 1 | yes | N/A
forests (trivial) | 1 | 3 | yes | O(1)
planar (Chapter 13) | 6 | 130 | yes | trivial
planar [25] | O(log∗ n) | 1 + ε | yes | trivial
excluded minor [23, 24] | polylog n | 1 + ε | yes | trivial
arboricity A (Chapter 12) | O(log n) | O(A²) | no | O(log n)
arboricity A (Chapter 12) | O(logα ∆) | O(αA logα ∆) | yes | O(log logα ∆)

Chapter 10

Maximal Independent Sets on Trees

"In fact, distributed MIS computation can be seen as a Drosophila of distributed computing as it prototypically models symmetry breaking [. . . ]"
– Fabian Kuhn, The Price of Locality.

In this chapter, we present a uniform randomized MIS algorithm that has a running time of O(√(log n log log n)) in forests w.h.p. Over each edge, O(log n) bits are transmitted w.h.p. This material appears as [67].

10.1 Algorithm

For the sake of simplicity, we first describe a simplified variant of the algorithm, which (i) is non-uniform, (ii) makes use of uniformly random real numbers, and (iii) contains a generic term Θ(R) in its pseudocode. In Theorem 10.10, we will show how to remove the first two assumptions, and it will turn out that for the uniform algorithm the precise choice of the constants in the term Θ(R) is of no concern (barring constant factors). The algorithm perpetually adds nodes to the independent set I until it is maximal. Whenever a node enters I, its inclusive neighborhood is removed from the graph and the algorithm proceeds on the remaining subgraph of G. The algorithm consists of three main steps, each of which employs a different technique to add nodes to I. It takes a single parameter R, which ideally is small enough to guarantee a small running time of the first two loops of the algorithm, but large enough to guarantee that the residual nodes can be dealt with quickly by the final loop of the algorithm.


Algorithm 10.1: Fast MIS on Trees.
input : R ∈ N
output: MIS I

I := ∅
for i ∈ {1, . . . , Θ(R)} do                        // reduce degrees
    for v ∈ V in parallel do
        rv := u.i.r. number from [0, 1] ⊂ R
        if rv > max_{w ∈ Nv} {rw} then              // join IS (no neighbor can)
            I := I ∪ {v}
            delete Nv+ from G
        end
    end
end
for i ∈ {1, 2} do                                   // remove nodes of small degree
    H := subgraph of G induced by nodes of degree δv ≤ R
    (R + 1)-color H
    for i ∈ {1, . . . , R + 1} do
        for v ∈ V with C(v) = i in parallel do      // colors independent
            I := I ∪ {v}
            delete Nv+ from G
        end
    end
end
while V ≠ ∅ do                                      // clean up
    for v ∈ V in parallel do
        if δv ∈ {0, 1} then                         // remove isolated nodes and leaves
            if ∃w ∈ Nv with δw = 1 then             // adjacent leaves
                rv := u.i.r. number from [0, 1] ⊂ R
                if rv > rw then
                    I := I ∪ {v}
                    delete Nv+ from G
                end
            else
                I := I ∪ {v}
                delete Nv+ from G
            end
        end
    end
end
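The three loops can be exercised with a small centralized simulation. The sketch below is illustrative only: it fixes the constant in Θ(R) to 4 and replaces the distributed (R + 1)-coloring of the second loop by a sequential greedy coloring, both choices of ours.

```python
import random

def mis_on_forest(adj, R):
    """Centralized round-by-round simulation of Algorithm 10.1.
    adj maps each node to its set of neighbors and must describe a forest;
    it is modified in place."""
    active = set(adj)
    mis = set()

    def join(v):
        # v enters the MIS; its closed neighborhood N_v^+ is deleted from G
        mis.add(v)
        for u in list(adj[v]) + [v]:
            for w in adj[u]:
                adj[w].discard(u)
            adj[u].clear()
            active.discard(u)

    # Loop 1: Luby-style phases reducing degrees (constant 4 is arbitrary)
    for _ in range(4 * R):
        r = {v: random.random() for v in active}
        for v in [v for v in active if all(r[v] > r[w] for w in adj[v])]:
            join(v)

    # Loop 2 (run twice): delete all nodes of degree <= R; a sequential
    # greedy coloring stands in for the distributed (R+1)-coloring
    for _ in range(2):
        low = [v for v in active if len(adj[v]) <= R]
        color = {}
        for v in low:
            taken = {color[w] for w in adj[v] if w in color}
            color[v] = next(c for c in range(R + 2) if c not in taken)
        for c in range(R + 2):
            for v in [u for u in low if u in active and color[u] == c]:
                join(v)

    # Loop 3: clean up by repeatedly removing isolated nodes and leaves
    while active:
        r = {v: random.random() for v in active}
        for v in list(active):
            if v not in active or len(adj[v]) > 1:
                continue
            if not adj[v]:               # isolated node: always join
                join(v)
            else:
                w = next(iter(adj[v]))   # v is a leaf with neighbor w
                if len(adj[w]) > 1 or r[v] > r[w]:
                    join(v)
    return mis
```

On a random tree, the returned set is independent (neighborhoods are deleted on every join) and maximal (the loops only terminate once every node is covered).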


We proceed by describing the parts of the algorithm in detail; see Algorithm 10.1 for its pseudocode. In the first part, the following procedure is repeated Θ(R) times. Each active node draws a number from [0, 1] ⊂ R u.i.r. and joins I if its value is larger than the ones of all its neighbors.1 Techniques similar to this one, which is due to Luby [73], have been known for a long time and reduce the number of edges in the graph exponentially w.h.p.; in our case, however, the goal is to reduce the maximum degree in the graph rapidly. Once the maximum degree is small, we cannot guarantee a quick decay w.h.p. anymore; therefore, the employed strategy is changed after Θ(R) rounds. Consequently, the second part of the algorithm deals with small-degree nodes according to a deterministic scheme. We will show that if R is large enough, most nodes will not have more than R neighbors of degree larger than R in their neighborhood, even if not all nodes of degree larger than R could be removed during the first loop. Thus, by removing all nodes of degree at most R, we reduce the degree of most remaining nodes to at most R. To this end, we first (R + 1)-color the subgraph induced by all nodes of degree at most R and then iterate through the colors, concurrently adding all nodes to I that share a color. Executing this subroutine a second time eliminates all nodes that previously had at most R neighbors of degree larger than R. Yet, a small fraction of the nodes may still remain active. To deal with these nodes, we repeat the step of removing all leaves and isolated nodes from the forest until all nodes have terminated. As evident from the description of the algorithm, each iteration of the first or third loop can be implemented within a constant number of synchronous distributed rounds.
The second loop requires O(R) time plus the number of rounds needed to color the respective subgraph; for this problem, deterministic distributed algorithms taking O(R + log∗ n) time are known [8, 49]. In Theorem 10.9 we will show that for some R ∈ O(√(log n log log n)), the third loop of the algorithm will complete in O(R) rounds w.h.p. Thus, for an appropriate value of R, the algorithm computes an MIS within O(√(log n log log n)) rounds.
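The random-value comparisons used by the first and third loop do not require exchanging full reals: as footnote 1 below observes, it suffices to generate and compare bits until the first difference. A minimal sketch of this idea (the helper and its parameters are ours, for illustration):

```python
import random

def compare_by_bits(rng_a, rng_b, max_bits=64):
    """Compare two u.i.r. values from [0, 1), most significant bit first,
    stopping at the first differing bit. The number of bits examined is
    geometrically distributed with expectation 2."""
    for bits in range(1, max_bits + 1):
        a, b = rng_a.getrandbits(1), rng_b.getrandbits(1)
        if a != b:
            return ('a' if a > b else 'b'), bits
    # a tie after max_bits bits occurs only with probability 2^-max_bits
    return 'a', max_bits
```

Running many comparisons shows that, on average, only about two bits need to cross an edge per comparison.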

10.2 Analysis

For the purpose of our analysis, we will w.l.o.g. assume that the graph is a rooted tree. In order to execute the algorithm, the nodes do not need access to this information. The children of v ∈ V are denoted by Cv ⊆ Nv.

1 A random real number from [0, 1] can be interpreted as an infinite string of bits of decreasing significance. As nodes merely need to know which of two values is larger, it is sufficient to generate and compare random bits until the first difference occurs.

The


lion's share of the argumentation will focus on the first loop of Algorithm 10.1. We will call an iteration of this loop a phase. By δv(i) we denote the degree of node v at the beginning of phase i ∈ {1, . . . , Θ(R)} in the subgraph of G induced by the nodes that have not been deleted yet; similarly, Nv(i) and Cv(i) are the sets of neighbors and children of v that are still active at the beginning of phase i, respectively.

We start our analysis with the observation that in any phase, a high-degree node without many high-degree children is likely to be deleted in that phase, independently of the behavior of its parent.

Lemma 10.1. Suppose that at the beginning of phase i it holds for a node v that half of its children have degree at most δv(i)/(16 ln δv(i)). Then, independently of the random number of its parent, v is deleted with probability at least 1 − 5/δv(i) in that phase.

Proof. Observe that the probability that v survives phase i is increasing in the degree δw(i) of any child w ∈ Cv(i) of v. Thus, w.l.o.g., we may assume that all children of v of degree at most δ := δv(i)/(16 ln δv(i)) have exactly that degree. Consider such a child w. With probability 1/δ, its random value rw(i) is larger than all its children's. Denote by X the random variable counting the number of children w ∈ Cv(i) of degree δ satisfying

  ∀u ∈ Cw(i) : rw(i) > ru(i).    (10.1)

It holds that

  E[X] = Σ_{w ∈ Cv(i), δw(i) = δ} 1/δ ≥ δv(i)/(2δ) ≥ 8 ln δv(i).

Since the random choices are independent, applying Corollary 2.13 yields that

  P[X < E[X]/2] < e^{−E[X]/8} ≤ 1/δv(i).

Node v is removed unless the event E occurs that rv(i) > rw(i) for all the children w ∈ Cv(i) of degree δ satisfying (10.1). If E happens, this implies that rv(i) is also greater than all random values of children of such w, i.e., rv(i) is greater than δE[X]/2 ≥ δv(i)/4 other independent random values. Since the event that X ≥ E[X]/2 depends only on the order of the involved random values, we infer that P[E | X ≥ E[X]/2] < 4/δv(i). We conclude that v is deleted with probability at least

  P[X ≥ E[X]/2] · P[Ē | X ≥ E[X]/2] > (1 − 1/δv(i))(1 − 4/δv(i)) > 1 − 5/δv(i)


as claimed. Since we reasoned only about whether children of v join the independent set, this bound is independent of the behavior of v's parent.

Applied inductively, this result implies that in order to maintain a high degree for a considerable number of phases, a node must be the root of a large subtree. This concept is formalized by the following definition.

Definition 10.2 (Delay Trees). A delay tree of depth d ∈ N0 rooted at node v ∈ V is defined recursively as follows. For d = 0, the tree consists of v only. For d > 0, node v satisfies at least one of the following criteria:

(i) At least δv(d)/4 children w ∈ Cv are roots of delay trees of depth d − 1 and have δw(d − 1) ≥ δv(d)/(16 ln δv(d)).
(ii) Node v is the root of a delay tree of depth d − 1 and δv(d − 1) ≥ δv(d)²/(324 ln δv(d)).

In order to bound the number of phases for which a node has a high chance of retaining a large degree, we bound the depth of delay trees rooted at high-degree nodes.

Lemma 10.3. Assume that R ≥ 2√(ln n ln ln n) and δv(d) ≥ e^R. Then for a delay tree of depth d rooted at v it holds that d ∈ O(√(log n/log log n)).

Proof. Assume w.l.o.g. that d > 1. Denote by si(δ), where i ∈ {0, . . . , d − 1} and δ ∈ N, the minimal number of leaves in a delay tree of depth i rooted at some node w satisfying δw(i + 1) = δ. We claim that for any δ and i ≤ ln δ/(2 ln(324 ln δ)), it holds that

  si(δ) ≥ ∏_{j=1}^{i} δ/(324 ln δ)^{j−1},

which we show by induction. This is trivially true for i = 1, hence we need to perform the induction step only. For any i ∈ {2, . . . , ⌊ln δ/(2 ln(324 ln δ))⌋}, the assumption that the claim is true for i − 1 and the recursive definition of


delay trees yield that

  si(δ) ≥ min{ (δ/4) · s_{i−1}(δ/(16 ln δ)), s_{i−1}(δ²/(324 ln δ)) }
        ≥ min{ (δ/4) · ∏_{j=1}^{i−1} (δ/(16 ln δ))/(16 ln(δ/(16 ln δ)))^{j−1}, ∏_{j=1}^{i−1} (δ²/(324 ln δ))/(324 ln(δ/(324 ln δ)))^{j−1} }
        > min{ (δ/4) · ∏_{j=1}^{i−1} δ/(16 ln δ)^{j}, ∏_{j=1}^{i−1} δ²/(324 ln δ)^{j} }
        > δ · ∏_{j=1}^{i−1} δ/(324 ln δ)^{j}
        = ∏_{j=1}^{i} δ/(324 ln δ)^{j−1}.

Thus the induction step succeeds, showing the claim.

Now assume that v is the root of a delay tree of depth d − 1. As δv(d) ≥ e^R, we may insert any i ∈ {1, . . . , min{d − 1, ⌊R/(2 ln 324R)⌋}} into the previous claim. Supposing for contradiction that d − 1 ≥ ⌊R/(2 ln 324R)⌋, it follows that the graph contains at least

  ∏_{j=1}^{⌊R/(2 ln 324R)⌋} e^R/(324R)^{j−1} > ∏_{j=1}^{⌊R/(2 ln 324R)⌋} e^{R/2} ∈ e^{R²/((4+o(1)) ln R)} ⊆ n^{2−o(1)}

nodes. On the other hand, if d − 1 < ⌊R/(2 ln 324R)⌋, we get that the graph contains at least e^{(d−1)R/2} nodes, implying that d ∈ O(√(ln n/ln ln n)) as claimed.

With this statement at hand, we infer that for R ∈ Θ(√(log n log log n)), it is unlikely that a node has degree e^R or larger for R phases.

Lemma 10.4. Suppose that R ≥ 2√(ln n ln ln n). Then, for any node v ∈ V and some number r ∈ O(√(log n/log log n)), we have that δv(r + 1) < e^R with probability at least 1 − 6e^{−R}. This statement holds independently of the behavior of v's parent.

Proof. Assume that δv(r) ≥ e^R. According to Lemma 10.1, node v is removed with probability at least 1 − 5/δv(r) in phase r unless half of its children have at least degree δv(r)/(16 ln δv(r)). Suppose the latter is the case and that w is such a child. Applying Lemma 10.1, we see that in phase r − 1, when we have δw(r − 1) ≥ δw(r) ≥ δv(r)/(16 ln δv(r)), w is removed with probability 1 − 5/δw(r − 1) if it does not


have δw(r − 1)/2 children of degree at least δw(r − 1)/(16 ln δw(r − 1)). Thus, the expected number of such nodes w that do not themselves have many high-degree children in phase r − 1 but survive until phase r is bounded by

  5δv(r − 1)/δw(r − 1) ≤ 80δv(r − 1) ln δv(r)/δv(r).

Since Lemma 10.1 states that the probability bound for a node w ∈ Cv(r − 1) to be removed is independent of v's actions, we can apply Corollary 2.13 in order to see that

  (80 + 1/2)δv(r − 1) ln δv(r)/δv(r) + O(log n)

of these nodes remain active at the beginning of phase r w.h.p. If this number is not smaller than δv(r)/4 ∈ ω(log n), it holds that δv(r − 1) ≥ δv(r)²/(4(80 + 1/2 + o(1)) ln δv(r)) ≥ δv(r)²/(324 ln δv(r)). Otherwise, at least δv(r)/2 − δv(r)/4 = δv(r)/4 children w ∈ Cv(r − 1) have degree δw(r − 1) ≥ δv(r)/(16 ln δv(r)). In both cases, v meets one of the conditions in the recursive definition of a delay tree.

Repeating this reasoning inductively for all r ∈ O(√(log n/log log n)) rounds (where we may choose the constants in the O-term to be arbitrarily large), we construct a delay tree of depth at least r w.h.p. However, Lemma 10.3 states that r must be in O(√(log n/log log n)) provided that δv(r) ≥ e^R. Therefore, for an appropriate choice of constants, we conclude that the event E that both half of the nodes in Cv(r) have degree at least δv(r)/(16 ln δv(r)) and δv(r) ≥ e^R w.h.p. does not occur. If E does not happen, but δv(r) ≥ e^R, Lemma 10.1 gives that v is deleted in phase r with probability at least 1 − 5e^{−R}. Thus, the total probability that v is removed or has sufficiently small degree at the beginning of phase r + 1 is bounded by

  P[Ē] · P[v deleted in phase r | Ē and δv(r) ≥ e^R] > (1 − 1/n^c)(1 − 5/e^R) > 1 − 6e^{−R},

where we used that R < ln n because δv ≤ n − 1. Since all statements used hold independently of v's parent's actions during the course of the algorithm, this concludes the proof.

For convenience, we rephrase the previous lemma in a slightly different way.


Corollary 10.5. Provided that R ≥ 2√(log n log log n), for any node v ∈ V it holds with probability 1 − e^{−ω(R)} that δv(R) < e^R. This bound holds independently of the actions of a constant number of v's neighbors.

Proof. Observe that R ∈ ω(r), where r and R are defined as in Lemma 10.4. The lemma states that after r rounds, v retains δv(r + 1) ≥ e^R with probability at most 6e^{−R}. As the algorithm behaves identically on the remaining subgraph, we can apply the lemma repeatedly, giving that δv(R) < e^R with probability 1 − e^{−ω(R)}. Ignoring a constant number of v's neighbors does not change the asymptotic bounds.

Since we strive for a sublogarithmic value of R, the above probability bound does not ensure that all nodes will have degree smaller than e^R after R phases w.h.p. However, on paths of length at least √(ln n), at least one of the nodes will satisfy this criterion w.h.p. Moreover, nodes of degree smaller than e^R will have left only a few high-degree neighbors, which do not interfere with our forthcoming reasoning.

Lemma 10.6. Assume that R ≥ 2√(log n log log n). Given some path P = (v0, . . . , vk), define for i ∈ {0, . . . , k} that Gi is the connected component of G containing vi after removing the edges from P. If δvi(R) < e^R, denote by Ḡi the connected (sub)component of Gi consisting of nodes w of degree δw(R) < e^R that contains vi. Then, with probability 1 − e^{−ω(√ln n)}, we have that

(i) δvi(R) < e^R and
(ii) nodes in Ḡi have at most √(ln n) neighbors w of degree δw(R) ≥ e^R.

This probability bound holds independently of anything that happens outside of Gi.

Proof. Corollary 10.5 directly yields Statement (i). For the second statement, let u be any node of degree δu(R) < e^R. According to Corollary 10.5, all nodes w ∈ Cu(R) have δw(R) < e^R with independently bounded probability 1 − e^{−ω(R)}. In other words, the random variable counting the number of such nodes having degree δw(R) ≥ e^R is stochastically dominated from above by the sum of δu(R) independent Bernoulli variables attaining the value 1 with probability e^{−ω(R)} ⊂ e^{−ω(√ln n)}. Applying Corollary 2.13, we conclude that w.h.p. no more than √(ln n) of u's neighbors have a degree that is too large. Due to the union bound, both statements are thus true with probability 1 − e^{−ω(√ln n)}.


Having dealt with nodes of degree e^R and larger, we need to show that we can get rid of the remaining nodes sufficiently fast. As a first step, we show a result along the lines of Lemma 10.1, trading in a weaker probability bound for a stronger bound on the degrees of a node's children.

Lemma 10.7. Given a constant β ∈ R+, assume that in phase i ∈ N for a node v ∈ V we have that at least e^{−β}δv(i) of its children have degree at most e^{β}δv(i). Then v is deleted with at least constant probability in that phase, regardless of the random value of its parent.

Proof. As in Lemma 10.1, we may assume w.l.o.g. that all children of v with degree at most δ := e^{β}δv(i) have exactly that degree. For the random variable X counting the number of nodes w ∈ Cv(i) of degree δ that satisfy Condition (10.1) we get that

  E[X] = Σ_{w ∈ Cv(i), δw(i) = δ} 1/δ ≥ e^{−2β}.

Since the random choices are independent, applying Chernoff's bound yields that

  P[X = 0] ≤ e^{−E[X]/2} ≤ e^{−e^{−2β}/2} =: γ.

Provided that X > 0, there is at least one child w ∈ Cv of v that joins the set in phase i unless rv(i) > rw(i). Since rw(i) is already larger than all of its neighbors' random values (except maybe v's), the respective conditional probability P[rv(i) > rw(i) | w satisfies Inequality (10.1)] certainly does not exceed 1/2, i.e.,

  P[X > 0] · P[v is deleted in phase i | X > 0] ≥ (1 − γ)/2.

Since we reasoned exclusively about whether children of v join the independent set, this bound is independent of the behavior of v's parent.

We cannot guarantee that the maximum degree in the subgraph formed by the active nodes drops quickly. However, we can show that for all but a negligible fraction of the nodes this is the case.

Lemma 10.8. Denote by H = (VH, EH) a subgraph of G still present in phase R in which all nodes have degree smaller than e^R and for any node no more than O(√log n) neighbors outside H are still active in phase R. If R ≥ R(n) ∈ O(√(log n log log n)), it holds that all nodes from H are deleted after the second for-loop of the algorithm w.h.p.


Proof. For the sake of simplicity, we first consider the special case that no edges to nodes outside H exist. We claim that for a constant α ∈ N and all i, j ∈ N0 such that i > j and e^{R−j} ≥ 8ec ln n, it holds that

  max_{v∈V} |{ w ∈ Cv(R + αi) | δw(R + αi) > e^{R−j} }| ≤ max{ e^{R−2i+j}, 8c ln n }

w.h.p. For i = 1 we have j = 0, i.e., the statement holds by definition because degrees in H are bounded by e^R. Assume the claim is established for some value of i ≥ 1. Consider a node w ∈ H of degree δw(R + αi) > e^{R−j} ≥ 8ec ln n for some j ≤ i. By the induction hypothesis, the number of children of w having degree larger than e^{R−(j−1)} in phase R + αi (and thus also in subsequent phases) is bounded by max{e^{R−(2i−(j−1))}, 8c ln n} ≤ e^{R−(j+1)} w.h.p., i.e., at least a fraction of 1 − 1/e of w's neighbors has degree at most a factor e larger than δw(R + αi) in phase R + αi. According to Lemma 10.7, this implies that w is removed with constant probability in phase R + αi. Moreover, as long as w has such a high degree, there is at least a constant probability that w is removed in each subsequent phase. This constant probability bound holds independently of previous phases (conditioned on the event that w retains degree larger than e^{R−j}). Furthermore, due to the lemma, it applies to all children w of a node v ∈ H independently. Hence, applying Corollary 2.13, we get that in all phases k ∈ {αi, αi + 1, . . . , α(i + 1) − 1}, the number |{w ∈ Cv(k) | δw(k) > e^{R−j}}| is reduced by a constant factor w.h.p. (unless this number is already smaller than 8c ln n). Consequently, if the constant α is sufficiently large, the induction step succeeds, completing the induction. Recapitulating, after in total O(R) phases, no node in H will have more than O(log n) neighbors of degree larger than O(log n).

The previous argument can be extended to reduce degrees even further. The difficulty arising is that once the expected number of high-degree nodes removed from the respective neighborhoods becomes smaller than Ω(log n), Chernoff's bound no longer guarantees that a constant fraction of high-degree neighbors is deleted in each phase. However, as used before, for critical nodes the applied probability bounds hold in each phase independently of previous rounds.
Thus, instead of choosing α as a constant, we simply increase α with i. Formally, if j0 is the last index such that e^{R−j0} ≥ 8ec ln n, we define

  α(i) := { α           if i ≤ j0,
          { α e^{i−j0}  otherwise.

This way, the factor e loss in the size of expected values (weakening the outcome of Chernoff's bound) is compensated for by increasing the number of considered phases by a factor e (which due to independence appears in the exponent


of the bound) in each step. Hence, within

  Σ_{i=⌈ln √(log n)⌉}^{⌈ln(8ec ln n)⌉} α(i) ∈ O( R + Σ_{i=⌈ln √(log n)⌉}^{R} (ln n)/e^{i} ) = O( R + (ln n)/√(log n) ) = O(R)

phases no node in H will have left more than O(√log n) neighbors of degree larger than O(√log n) w.h.p. Assuming that constants are chosen appropriately, this is the case after the first for-loop of the algorithm. Recall that in the second loop the algorithm removes all nodes of degree at most R in each iteration. Thus, degrees in H are reduced to O(√log n) in the first iteration of the loop, and subsequently all remaining nodes from H will be removed in the second iteration. Hence, indeed all nodes from H are deleted at the end of the second for-loop w.h.p. as claimed.

Finally, recall that no node has more than O(√log n) edges to nodes outside H. Choosing constants properly, these edges contribute only a negligible fraction to the nodes' degrees even once these degrees reach O(√log n). Thus, the asymptotic statement obtained by the above reasoning holds true also if we consider a subgraph H where nodes have O(√log n) edges leaving the subgraph. This concludes the proof.

We can now prove our bound on the running time of Algorithm 10.1.

Theorem 10.9. Assume that G is a forest and the coloring steps of Algorithm 10.1 are performed by a subroutine running for O(R + log∗ n) rounds. Then the algorithm eventually terminates and outputs a maximal independent set. Furthermore, if R ≥ R(n) ∈ O(√(log n log log n)), Algorithm 10.1 terminates within O(R) rounds w.h.p.

Proof. Correctness is obvious because (i) adjacent nodes can never join I concurrently, (ii) the neighborhoods of nodes that enter I are deleted immediately, (iii) no nodes from V \ (∪_{v∈I} Nv+) get deleted, and (iv) the algorithm does not terminate until V = ∅. The algorithm will terminate eventually, as in each iteration of the third loop all leaves and isolated nodes are deleted and any forest contains at least one of the two.

Regarding the running time, assume that R ∈ O(√(log n log log n)) is sufficiently large, root the tree at an arbitrary node v0 ∈ V, and consider any path P = (v0, . . . , vk) of length k ≥ √(ln n). Denote by Gi, i ∈ {0, . . . , k}, the connected component of G containing vi after removing the edges of P and, provided that δvi(R) < e^R, by Ḡi the connected (sub)component of Gi that contains vi and consists of nodes w of degree δw(R) < e^R (as in Lemma 10.6). Then, we have by Lemma 10.6 for each i with a probability that is independently lower bounded by 1 − e^{−ω(√ln n)} that


(i) δvi(R) < e^R and
(ii) nodes in Ḡi have at most √(ln n) neighbors w of degree δw(R) ≥ e^R.

In other words, for each i, with probability independently bounded by 1 − e^{−ω(√ln n)}, Ḡi exists and satisfies the prerequisites of Lemma 10.8, implying that all nodes in Ḡi are deleted by the end of the second for-loop w.h.p. We conclude that independently of all vj ≠ vi, node vi is not deleted until the end of the second loop with probability e^{−ω(√ln n)}. Thus, the probability that no node on P is deleted is at most

  (e^{−ω(√ln n)})^k ⊆ e^{−ω(ln n)} = n^{−ω(1)}.

Hence, when the second loop is completed, w.h.p. no path of length k ≥ √(ln n) starting at v0 exists in the remaining graph. Since v0 was arbitrary, this immediately implies that w.h.p. after the second loop no paths of length k ≥ √(ln n) exist anymore, i.e., the components of the remaining graph have diameter at most √(ln n). Consequently, it will take at most √(ln n) iterations of the third loop until all nodes have been deleted. Summing up the running times for executing the three loops of the algorithm, we get that it terminates within O(R + (R + log∗ n) + √(ln n)) = O(R) rounds w.h.p.

We complete our analysis by deducing a uniform algorithm featuring the claimed bounds on time and bit complexity.

Theorem 10.10. A uniform MIS algorithm exists that terminates on general graphs within O(log n) rounds and on forests within O(√(log n log log n)) rounds, both w.h.p. It can be implemented such that O(log n) bits are sent over each link w.h.p.

Proof. Instead of running Algorithm 10.1 directly, we wrap it into an outer loop trying to guess a good value for R (i.e., R(n) ≤ R ∈ O(R(n)), where R(n) is as in Theorem 10.9). Furthermore, we restrict the number of iterations of the third loop to R, i.e., the algorithm will terminate after O(R) steps, however potentially without producing an MIS.2 Starting e.g. from two, with each call R is doubled. Once R reaches R(n), according to Theorem 10.9 the algorithm outputs an MIS w.h.p. provided that G is a forest.
Otherwise, R continues to grow until it becomes logarithmic in n. At this point, the analysis of Luby's algorithm by Métivier et al. [80] applies to the first loop of our algorithm, showing that it terminates and returns an MIS w.h.p. Hence,

2 There is no need to start all over again; one can build on the IS of previous iterations, although this does not change the asymptotic bounds.


as the running time of each iteration of the outer loop is (essentially) linear in R and R grows exponentially, the overall running time of the algorithm is O(√(log n log log n)) on forests and O(log n) on arbitrary graphs w.h.p.

Regarding the bit complexity, consider the first and third loop of Algorithm 10.1 first. In each iteration, a constant number of bits for state updates (entering the MIS, being deleted without joining the MIS, etc.) needs to be communicated, as well as a random number that has to be compared to each neighbor's random number. However, in most cases exchanging a small number of leading bits is sufficient to break symmetry. Overall, as shown by Métivier et al. [80], this can be accomplished with a bit complexity of O(log n) w.h.p. Essentially, for every round their algorithm generates a random value and transfers to each neighbor only the necessary number of leading bits to compare these numbers. By Corollary 2.13, comparing O(log n) random numbers between neighbors thus requires O(log n) exchanged bits w.h.p., as in expectation each comparison requires examining a constant number of bits. Thus, if nodes do not wait for a phase to complete, but rather continue to exchange random bits for future comparisons in a stream-like fashion, the bit complexity becomes O(log n). However, in order to avoid increasing the (sublogarithmic) time complexity of the algorithm on forests, more caution is required. Observe that in each iteration of the outer loop, we know that Θ(R) many random values need to be compared to execute the respective call of Algorithm 10.1 correctly. Thus, nodes may exchange the leading bits of these Θ(R) many random values concurrently, without risking an increase of the asymptotic bit complexity. Afterwards, for the fraction of the values for which the comparison remains undecided, nodes send the second and the third bit to their neighbors simultaneously. In subsequent rounds, we double the number of sent bits per number repeatedly.
Note that for each single value, this way the number of sent bits is at most doubled, thus the probabilistic upper bound on the total number of transmitted bits increases at most by a factor of two. Moreover, after log log n rounds, log n bits of each single value will be compared in a single round, thus at the latest after log log n + O(1) rounds all comparisons are completed w.h.p. Employing this scheme, the total time complexity of all executions of the first and third loop of Algorithm 10.1 is (in a forest) bounded by

  O( Σ_{i=1}^{⌈log R(n)⌉} (2^i + log log n) ) ⊆ O( R(n) + log R(n) · log log n ) = O( √(log n log log n) )

w.h.p.


It remains to show that the second loop of Algorithm 10.1 does not require the exchange of too many bits. The number of transmitted bits to execute this loop is determined by the number of bits sent by the employed coloring algorithm. Barenboim and Elkin [8] and Kuhn [49] independently provided deterministic coloring algorithms with running time O(R + log∗ n). These algorithms start from an initial coloring with a number of colors that is polynomial in n (typically one assumes identifiers of size O(log n)), which can be obtained by choosing random colors from the range {1, . . . , n^{O(1)}} w.h.p. Exchanging these colors (which also permits verifying that the random choices indeed resulted in a proper coloring) thus costs O(log n) bits.3 However, as the maximum degree of the considered subgraphs is R + 1, which is in O(log n) w.h.p., subsequent rounds of the algorithms deal with colors that are of (poly)logarithmic size in n. As exchanging coloring information is the dominant term contributing to message size in both algorithms, the overall bit complexity of all executions of the second loop of Algorithm 10.1 can be kept as low as O(log n + R(n) log log n) = O(log n).

3 To derive a uniform solution, one again falls back to doubling the size of the bit string of the chosen color until the coloring is locally feasible.

Chapter 11

A Lower Bound on Minimum Dominating Set Approximations in Unit Disk Graphs

"Whoever solves one of these problems gets a bag of Swiss chocolate."
– An incentive Roger offered for devising the lower bound presented in this chapter.

In this chapter, we will show that in unit disk graphs, no deterministic distributed algorithm can compute an f (n) MDS approximation in g(n) rounds for any f , g with f (n)g(n) ∈ o(log∗ n). This bound holds even if message size is unbounded, nodes have unique identifiers, and the nodes know n. This chapter is based on [65].

11.1 Definitions and Preliminary Statements

The lower bound proof will reason about the following highly symmetric graphs.

Definition 11.1 (R_n^k). For k, n ∈ N, we define the k-ring with n nodes R_n^k := (V_n, E_n^k) by

  V_n := {1, . . . , n}
  E_n^k := { {i, j} ⊆ V_n : |(i − j) mod n| ≤ k }.


See Figure 11.1 for an illustration. By R_n := R_n^1 we denote the "simple" ring. Moreover, we will take numbers modulo n when designating nodes on the ring, e.g. we identify 3n + 5 ≡ 5 ∈ V_n.

Obviously, for any k and n this graph is a UDG (see Definition 9.7).

Lemma 11.2. R_n^k can be realized as a UDG.

Proof. For n > 2k + 1, place all nodes equidistantly on a circle of radius 1/(2 sin(kπ/n)). Otherwise, use any circle of radius at most 1/2.

Figure 11.1: R_16^3, realized as a UDG; k is controlled by the scaling.
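The construction from the proof of Lemma 11.2 can be checked numerically. The sketch below (nodes indexed from 0 for convenience; helper names are ours) places the nodes on a circle of radius 1/(2 sin(kπ/n)) and recovers exactly the k-ring edges:

```python
import math

def ring_udg_edges(n, k):
    """Realize R_n^k as a UDG (Lemma 11.2, case n > 2k + 1): n points placed
    equidistantly on a circle of radius 1/(2 sin(k*pi/n)); an edge connects
    points at Euclidean distance at most 1 (small tolerance for rounding)."""
    r = 1.0 / (2.0 * math.sin(k * math.pi / n))
    pts = [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
           for i in range(n)]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(pts[i], pts[j]) <= 1.0 + 1e-9}

def kring_edges(n, k):
    """Expected edge set of R_n^k: circular distance at most k."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if min(j - i, n - (j - i)) <= k}
```

Two nodes at circular distance d have chord length sin(dπ/n)/sin(kπ/n), which is at most 1 exactly when d ≤ k, so both edge sets coincide.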

Our bound will be inferred from a classic result by Linial [70], which was later generalized to randomized algorithms by Naor [89].

Theorem 11.3. There is no deterministic distributed algorithm 3-coloring the ring R_n in fewer than (log∗ n − 1)/2 communication rounds, even if the nodes know n.

Proof. See e.g. [91].


We will use the following notion, which captures the amount of symmetry-breaking information in the output of an algorithm.

Definition 11.4 (σ(n)-Alternating Algorithms). Suppose A is an algorithm operating on R_n which assigns each node i ∈ V_n a binary output b(i) ∈ {0, 1}. We call A σ(n)-alternating if the length ℓ of any monochromatic sequence b(i) = b(i + 1) = . . . = b(i + ℓ) in the output of A is smaller than σ(n).

If a σ(n)-alternating algorithm is given, one can easily obtain a 3-coloring of the ring R_n in O(σ(n)) time.

Lemma 11.5. Given a σ(n)-alternating algorithm A running in O(σ(n)) rounds, a 3-coloring of the ring can be computed in O(σ(n)) rounds.

Proof Sketch. Essentially, nodes simply need to find the closest switch from 0 to 1 (or vice versa) in the output bits. From there, nodes are colored alternatingly, while the third color is used to resolve conflicts where the alternating sequences meet. One has to respect the fact that nodes do not agree on "left" and "right", though. See [65] for details.
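A centralized sketch of the idea behind Lemma 11.5 (ignoring the orientation issue that the actual distributed proof in [65] must handle): color each node by the parity of its distance to the nearest switch, then patch the positions where alternating stretches meet using the third color.

```python
def three_color_ring(b):
    """Given non-constant binary outputs b[0..n-1] on a ring, return a proper
    3-coloring: color by parity of the distance to the nearest switch (a
    position i with b[i] != b[i-1]), then repair run boundaries with color 2."""
    n = len(b)
    assert n >= 3 and len(set(b)) > 1  # a switch must exist
    color = []
    for i in range(n):
        d = 0  # distance (going backwards) to the nearest switch
        while b[(i - d) % n] == b[(i - d - 1) % n]:
            d += 1
        color.append(d % 2)
    for i in range(1, n):              # repair boundaries between stretches
        if color[i] == color[i - 1]:
            color[i] = 2
    if color[0] == color[-1]:          # repair the wrap-around pair
        color[0] = next(c for c in (0, 1, 2)
                        if c not in (color[1], color[-1]))
    return color
```

Since a σ(n)-alternating output has no run longer than σ(n), the distance search (and hence the distributed variant of this procedure) takes O(σ(n)) steps.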

11.2 Proof of the Lower Bound

To establish our lower bound, we construct a σ(n)-alternating algorithm using an MDS approximation algorithm.

Lemma 11.6. Assume a deterministic f(n) approximation algorithm A for the MDS problem on UDGs running in at most g(n) ≥ 1 rounds is given, where f(n)g(n) ∈ o(log∗ n). Then an o(log∗ n)-alternating algorithm A′ requiring o(log∗ n) communication rounds exists.

Proof. Assume w.l.o.g. that identifiers on R_n^k are from {1, . . . , n}. Consider R_n^k and label each node i ∈ V_n with its input l(i), i.e., its identifier. Set σ_k(n) := max{f(n), k} g(n) and define

  L_n^k := { (l_1, . . . , l_{σ_k(n)+2kg(n)}) ∈ {1, . . . , n}^{σ_k(n)+2kg(n)} | l_i ≠ l_j for all i ≠ j, and
             (l(1) = l_1 ∧ . . . ∧ l(σ_k(n) + 2kg(n)) = l_{σ_k(n)+2kg(n)}) ⇒ b(kg(n) + 1) = . . . = b(σ_k(n) + kg(n)) = 1 on R_n^k },

i.e., the set of sequences of distinct identifiers such that σ_k(n) consecutive nodes will take the decision b(v) = 1 when A is executed on R_n^k, where the choices of the leading and trailing kg(n) nodes may also depend on labels not in the considered sequence. As the decision of any node i ∈ V_n depends on the identifiers of nodes in N_i^{kg(n)} only, because it cannot learn from information further away in g(n) rounds of communication on R_n^k, L_n^k is well-defined.

i.e., the set of sequences of identifiers such that σk (n) consecutive nodes will k take the decision b(v) = 1 when A is executed on Rn , where the choices of the leading and trailing kg(n) nodes may also depend on labels not in the considered sequence. As the decision of any node i ∈ Vn depends on (kg(n)) identifiers of nodes in Ni only because it cannot learn from information k further away in g(n) rounds of communication on Rn , Lkn is well-defined.


CHAPTER 11. AN MDS APPROXIMATION LOWER BOUND

We distinguish two cases. The first case assumes that values k₀, n₀ ∈ N exist such that for n ≥ n₀ at most n/2 identifiers can simultaneously participate in disjoint sequences from L_n^{k₀} in a valid labeling of R_n^{k₀} (where no identifier is used twice). Thus, for n′ := max{n₀, 2n}, an injective mapping λ_n : {1, ..., n} → {1, ..., n′} exists such that no element of L_{n′}^{k₀} is completely contained in the image of λ_n. Therefore, we can define A′ to simulate a run of A on R_{n′}^{k₀} where node i ∈ {1, ..., n} is labeled by λ_n(l(i)) and return the computed result. Each simulated round of A will require k₀ communication rounds, thus the running time of A′ is bounded by k₀g(n′) ∈ o(log* n). At most 2k₀ consecutive nodes will compute b(i) = 0, as A determines a DS, and by definition of L_{n′}^{k₀} at most σ_{k₀}(n′) − 1 ∈ O(f(n′)g(n′)) ⊂ o(log*(n′)) = o(log* n) consecutive nodes take the decision b(i) = 1. Hence A′ is o(log* n)-alternating.

In the second case, no pair k₀, n₀ ∈ N as assumed in the first case exists. Thus, for any k ∈ N some n ∈ N exists for which we can construct a labeling of R_n^k with at least n/2 many identifiers from disjoint sequences in L_n^k. We line up these sequences one after another and label the remaining nodes in a way resulting in a valid labeling of R_n^k. Running A on such an instance will yield at least

  nσ_k(n) / (2(σ_k(n) + 2kg(n))) ≥ n/6 ∈ Ω(n)

nodes choosing b(i) = 1. On the other hand, a minimum dominating set of R_n^k has O(n/k) nodes. For k ∈ N, define n_k to be the minimum value of n for which it is possible to construct a labeling of R_n^k with n/2 identifiers from sequences in L_n^k. Thus, we have a lower bound of

  f(n_k) ∈ Ω(k)    (11.1)

on the approximation ratio of A. As the approximation quality f of A is sublinear, we conclude that lim_{k→∞} n_k = ∞. Therefore, a minimum value k(n) exists such that n′ := 2n < n_{k(n)}. Consequently, we can define an injective relabeling function λ_n : {1, ..., n} → {1, ..., n′} such that no element of L_{n′}^{k(n)} lies completely in the image of λ_n. We define A′ to be the algorithm operating on R_n, but returning the result of a simulated run of A on R_{n′}^{k(n)}, where we relabel all nodes i ∈ V_n by λ_n(l(i)). By definition of k(n) we have n_{k(n)−1} ≤ n′. Together with (11.1) this yields

  k(n) = (k(n) − 1) + 1 ∈ O(f(n_{k(n)−1}) + 1) ⊆ O(f(n′)) = O(f(n)),    (11.2)

where the last step exploits the fact that f grows asymptotically sublinearly. Hence we can estimate the running time of A′ by k(n)g(n′) ∈ O(f(n)g(n)), using that g grows asymptotically sublinearly as well.


Since the simulated run of A yields a dominating set, at worst 2k(n) ∈ O(f(n)) ⊆ O(f(n)g(n)) consecutive nodes may compute b(v) = 0. By the definitions of L_n^k and λ_n, at most σ_{k(n)}(n′) − 1 < max{f(n′), k(n)}g(n′) ∈ O(f(n)g(n)) consecutive nodes may take the decision b(i) = 1. Thus A′ is o(log* n)-alternating, as claimed.

This result implies the lower bound, as the assumption that a good approximation ratio is possible leads to the contradiction that the ring could be 3-colored quickly.

Theorem 11.7. No deterministic f(n) MDS approximation on UDGs running in at most g(n) rounds exists such that f(n)g(n) ∈ o(log* n).

Proof. Assuming the contrary, we may w.l.o.g. assume that g(n) ≥ 1 for all n ∈ N. Thus, we can combine Lemma 11.6 and Lemma 11.5 to construct an algorithm that 3-colors the ring in o(log* n) rounds, contradicting Theorem 11.3.

By the same technique, we can show that on the ring, an f(n) MaxIS approximation that takes g(n) rounds with f(n)g(n) ∈ o(log* n) is impossible.

Definition 11.8 (Maximum Independent Set Approximations). Given f ∈ R⁺, an IS I is an f MaxIS approximation if f|I| ≥ |M| for any MaxIS M. For f : N → R⁺, a deterministic f approximation algorithm for the MaxIS problem outputs on any graph of n nodes an IS that is an f(n) approximation.

Corollary 11.9. No deterministic f(n) MaxIS approximation on the ring running in at most g(n) rounds exists such that f(n)g(n) ∈ o(log* n).

Proof. See [65].

We remark that it is an open question whether a randomized algorithm can break this barrier for MDS approximations. For the MaxIS problem, it is known that in planar graphs (and thus in particular on the ring), for any fixed constant ε > 0 a constant-time randomized algorithm can guarantee a (1 + ε) approximation w.h.p. [25].

Chapter 12

Minimum Dominating Set Approximations in Graphs of Bounded Arboricity

“Be greedy!” – An approach common in computer science, and maybe too common in other areas.

This chapter presents two MDS approximation algorithms from [66] devised for graphs of small arboricity A. The first algorithm employs a forest decomposition, achieving a guaranteed approximation ratio of O(A²) within O(log n) rounds w.h.p. The second computes an O(A log ∆) approximation deterministically in O(log ∆) rounds. Both algorithms require small messages only.

12.1 Constant-Factor Approximation

In this section, we present an algorithm that computes a dominating set at most a factor of O(A²) larger than optimum. After presenting the algorithm and its key ideas, we proceed with the formal proof of its properties.

Algorithm

Our first algorithm is based on the following observations. Given an f-forest decomposition and an MDS M, the nodes can be partitioned into two sets. One set contains the nodes which are covered by a parent; the other contains the remaining nodes, which thus are themselves in M or have a child in M. Since each dominating set node can cover at most f parents, the latter set contains in total at most (f + 1)|M| many nodes. Even if all these covered nodes elect all of their parents into the dominating set, we have chosen at most f(f + 1)|M| nodes.

For the first set, we can exploit the fact that each node has at most f parents in a more subtle manner. Covering the nodes in this set by parents only, we need to solve a special case of set cover in which each element is part of at most f sets. Such instances can be approximated well by a simple sequential greedy algorithm: pick any element that is not yet covered and add all sets containing it; repeat this until no element remains. Since in each step we add at least one new set from an optimal solution, we get a factor f approximation. This strategy can be parallelized by computing a maximal independent set in the graph where two nodes are adjacent exactly if they share a parent, as adding the parents of the nodes in an independent set in any order would be a feasible execution of the sequential greedy algorithm.

Putting these two observations together, first all parents of nodes from a maximal independent set in a helper graph are elected into the dominating set. In this helper graph, two nodes are adjacent if they share a parent. Afterwards, the remaining uncovered nodes have no parents; therefore it is uncritical with respect to the approximation ratio to select them all. Denoting for v ∈ V by P(v) the set of parents of v in a given forest decomposition of G, this approach is summarized in Algorithm 12.1.

Algorithm 12.1: Parent Dominating Set
input : f-forest decomposition of G
output: dominating set D
1  H := (V, { {v, w} | v ≠ w, P(v) ∩ P(w) ≠ ∅ })
2  Compute a maximal independent set I with respect to H
3  D := ⋃_{v∈I} P(v)
4  D := D ∪ (V \ N_D^+)
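A centralized sketch may help to make Algorithm 12.1 concrete. The helper-graph construction and the greedy MIS below stand in for their distributed counterparts, and all function and variable names are ours, not the thesis's.

```python
from itertools import combinations

def parent_dominating_set(adj, parents):
    """adj: {v: set of neighbors}; parents: {v: set of parents} taken from an
    f-forest decomposition. Returns a dominating set of the graph."""
    V = set(adj)
    # Line 1: helper graph H -- two nodes are adjacent iff they share a parent.
    h_adj = {v: set() for v in V}
    for v, w in combinations(sorted(V), 2):
        if parents[v] & parents[w]:
            h_adj[v].add(w)
            h_adj[w].add(v)
    # Line 2: greedy maximal independent set in H (stands in for the
    # distributed MIS computation).
    mis, excluded = set(), set()
    for v in sorted(V):
        if v not in excluded:
            mis.add(v)
            excluded |= {v} | h_adj[v]
    # Line 3: elect all parents of MIS nodes.
    D = set().union(*(parents[v] for v in mis))
    # Line 4: add every node not yet covered by D.
    covered = D | {w for d in D for w in adj[d]}
    D |= V - covered
    return D
```

Running it on a small path or star with any valid forest decomposition returns a dominating set, mirroring the guarantee of Theorem 12.2.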

Analysis

We need to bound the number of nodes that join the dominating set because they are elected by children.

Lemma 12.1. In Line 3 of Algorithm 12.1, at most f(f + 2)|M| many nodes enter D, where M denotes an MDS of G.

Proof. Denote by V_C ⊆ V the set of nodes that have a child in M or are themselves in M. We have |V_C| ≤ (f + 1)|M|, since no node has more than f parents. Each such node adds at most f parents to D in Line 3 of the algorithm, i.e., in total at most f(f + 1)|M| many nodes join D because they are elected by children in I ∩ V_C.

Now consider the set of nodes V_P ⊆ V that have at least one parent in M, in particular the nodes in I ∩ V_P that are also in the computed independent set. By the definition of H and the fact that I is an independent set, no node in M can have two children in I. Thus, |I ∩ V_P| ≤ |M|. Since no node has more than f parents, we conclude that at most f|M| many nodes join D because they are elected into the set by a child in I ∩ V_P.

Finally, observe that since M is a dominating set, we have V_C ∪ V_P = V and thus

  |D| ≤ f|I ∩ V_C| + f|I ∩ V_P| ≤ f(f + 1)|M| + f|M| = f(f + 2)|M|,

concluding the proof.

The approximation ratio of the algorithm now follows easily.

Theorem 12.2. Algorithm 12.1 outputs a dominating set D containing at most (f² + 3f + 1)|M| nodes, where M is an optimum solution.

Proof. By Lemma 12.1, at most f(f + 2)|M| nodes enter D in Line 3 of the algorithm. Since I is an MIS in H, all nodes that have a parent are adjacent to at least one node in D after Line 3. Hence, the nodes selected in Line 4 must be covered by a child in M or themselves be in M. As no node has more than f parents, in Line 4 at most (f + 1)|M| many nodes join D. Altogether, at most (f² + 3f + 1)|M| many nodes may end up in D, as claimed.

Employing known algorithms for computing an O(A(G))-forest decomposition and an MIS, we can construct a distributed MDS approximation algorithm.

Corollary 12.3. In any graph G, a factor O(A(G)²) approximation to an MDS can be computed distributedly in O(log n) rounds w.h.p., provided that nodes know a polynomial upper bound on n or a linear upper bound on A(G). In particular, on graphs of bounded arboricity a constant-factor approximation can be obtained in O(log n) rounds w.h.p. This can be accomplished with messages of size O(log n).

Proof. We run Algorithm 12.1 in a distributed fashion. To see that this is possible, observe that (i) nodes need to know only whether a neighbor is a parent or a child, (ii) H can be constructed locally in two rounds, and (iii) a synchronous round in H can be simulated by two rounds in G. Thus, we may simply pick distributed algorithms to compute a forest decomposition of G and an MIS and plug them together to obtain a distributed variant of Algorithm 12.1. For the forest decomposition, we employ the algorithm from [8], yielding a decomposition into O(A(G)) forests in O(log n) rounds; this algorithm is the one that requires the bound on n or A(G), respectively, that is asked for in the preliminaries of the corollary. An MIS can be computed in O(log n) rounds w.h.p. by well-known algorithms [2, 43, 73], or a more recent similar technique [80]. In total, the algorithm requires O(log n) rounds w.h.p., and according to Theorem 12.2 the approximation guarantee is O(A(G)²).

Regarding the message size, the algorithm to compute a forest decomposition requires messages of O(log n) bits. Thus, we need to check that we do not require large messages because we compute an MIS on H. Formulated abstractly, the algorithm from [80] breaks symmetry by making each node still eligible for entering the IS choose a random value in each round and permitting it to join the IS if its value is a local minimum. This concept can for instance be realized by taking O(log n) random bits as the encoding of some number and comparing it to neighbors; the respective values will differ w.h.p. This approach can be emulated using messages of size O(log n) in G: nodes send their random values to all parents in the forest decomposition, which then forward only the smallest values to their children.¹

We remark that the same approach can be used by a centralized algorithm in order to compute an O(A(G)²) approximation within O(A(G)n) steps. A sequential algorithm does not need to incur an overhead of O(log n), as a forest decomposition can be determined in linear time and finding an MIS becomes trivial.

Corollary 12.4. Deterministically, on any graph G an O(A(G)²) MDS approximation can be computed in O(|E| + n) ⊆ O(A(G)n) centralized steps.

Proof. See [66].
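The random-priority symmetry-breaking idea referenced in the proof of Corollary 12.3 can be sketched as follows; this is a centralized simulation in the spirit of [80], with an interface of our own choosing: in each round, every remaining node draws a fresh random value and joins the independent set if it holds a strict local minimum.

```python
import random

def random_priority_mis(adj, seed=0):
    """adj: {v: set of neighbors}. Computes a maximal independent set by
    repeated random priorities: a node joins when it holds a strict local
    minimum among the nodes still participating."""
    rng = random.Random(seed)
    alive = set(adj)
    mis = set()
    while alive:
        prio = {v: rng.random() for v in alive}        # fresh values per round
        winners = {v for v in alive
                   if all(prio[v] < prio[w] for w in adj[v] if w in alive)}
        mis |= winners
        # winners and their neighbors stop participating
        alive -= winners | {w for v in winners for w in adj[v]}
    return mis
```

In every round the node of globally smallest priority wins, so the loop terminates; two adjacent winners are impossible, so the result is independent, and a node only stops participating once it or a neighbor has joined, so the result is maximal.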

12.2 Uniform Deterministic Algorithm

Algorithm 12.1 might be unsatisfactory with regard to several aspects. Its running time is logarithmic in n even if the maximum degree ∆ is small. This cannot be improved upon by any approach that utilizes a forest decomposition, as a lower bound of Ω(log n / log f) is known on the time to compute a forest decomposition into f forests [8]. The algorithm is not uniform, as it necessitates global knowledge of a bound on A(G) or n.

¹ If (an upper bound on) n is not known, one can start with constantly many bits and double the number of used bits in each round in which two nodes pick the same value. This does not slow down the algorithm significantly and bounds the message size by O(log n) w.h.p.


Moreover, the algorithm requires randomization in order to compute an MIS quickly. Considering deterministic algorithms, one might ask how much initial symmetry-breaking information needs to be provided to the nodes. While randomized algorithms may generate unique identifiers of size O(log n) in constant time w.h.p., many deterministic algorithms assume them to be given as input. Milder assumptions are the ability to distinguish neighbors by means of a port numbering and/or an initially given orientation of the edges. In this section, we show that a uniform, deterministic algorithm exists that requires a port numbering only, yet achieves a running time of O(log ∆) and a good approximation ratio. The size of the computed dominating set is bounded linearly in the product of the arboricity A(G) of the graph and the logarithm of the maximum degree ∆.

Algorithm

The basic idea of Algorithm Greedy-by-Degree (Algorithm 12.2) is that it is always feasible to simultaneously choose nodes of high residual degree, i.e., all the nodes that cover up to a constant factor as many nodes as the one covering the most uncovered nodes.

Definition 12.5 (Residual Degree). Given a set D ⊆ V, the residual degree of node v ∈ V with respect to D is δ̄_v := |N_v^+ \ N_D^+|.

This permits obtaining strong approximation guarantees without the structural information provided by knowing A(G) or a forest decomposition; the mere fact that the graph must be “locally sparse” enforces that if many nodes are elected into the set, the dominating set itself must also be large. A difficulty arising from this approach is that nodes are not aware of the current maximum residual degree in the graph. Hence, every node checks whether there is a node in its 2-hop neighborhood having a residual degree larger by a factor of two. If not, the respective nodes may join the dominating set (even if their degree is not large from a global perspective), implying that the maximum residual degree drops by a factor of two within a constant number of rounds.

A second problem occurs once residual degrees become small. In fact, it may happen that a huge number of already covered nodes can each cover the same small set of A(G) − 1 nodes. For this reason, it is mandatory to ensure that no more nodes join the dominating set than actually need to be covered. To this end, nodes that still need to be covered elect one of their neighbors (if any) that are feasible according to the criterion of (locally) large residual degree explained above. This scheme is described in Algorithm 12.2.

Algorithm 12.2: Greedy-by-Degree
output: dominating set D
1   D := ∅
2   while V ≠ N_D^+ do
3     C := ∅                                  // candidate set
4     for v ∈ V in parallel do
5       δ̄_v := |N_v^+ \ N_D^+|                // residual degree
6       ∆_v := max_{w∈N_v^+} {δ̄_w}            // maximum within one hop
7       ∆_v := max_{w∈N_v^+} {∆_w}            // maximum within two hops
8       if ⌈log δ̄_v⌉ ≥ ⌈log ∆_v⌉ then
9         C := C ∪ {v}
10      end
11      if v ∈ N_C^+ \ N_D^+ then
12        choose any w ∈ C ∩ N_v^+            // e.g. smallest port number
13        D := D ∪ {w}                        // uncovered nodes select a candidate
14      end
15    end
16  end

Note that nodes never leave D once they have entered it. Thus, nodes may terminate based on local knowledge only when executing the algorithm, as they can cease executing it as soon as δ̄_v = 0, i.e., their entire inclusive neighborhood is covered by D. Moreover, it can easily be verified that one iteration of the loop can be executed within six rounds by a local algorithm that relies on port numbers only.
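A centralized sketch of Greedy-by-Degree (naming ours; ties are broken by the smallest node identifier in place of the smallest port number) executes one phase per loop iteration:

```python
from math import ceil, log2

def greedy_by_degree(adj):
    """adj: {v: set of neighbors}. Each iteration of the while loop
    corresponds to one phase of Algorithm 12.2."""
    V = set(adj)
    incl = {v: {v} | adj[v] for v in V}              # inclusive neighborhoods
    D, covered = set(), set()
    while covered != V:
        resid = {v: len(incl[v] - covered) for v in V}            # line 5
        max1 = {v: max(resid[w] for w in incl[v]) for v in V}     # line 6
        max2 = {v: max(max1[w] for w in incl[v]) for v in V}      # line 7
        C = {v for v in V if resid[v] > 0
             and ceil(log2(resid[v])) >= ceil(log2(max2[v]))}     # line 8
        for v in V - covered:                                     # line 11
            candidates = C & incl[v]
            if candidates:
                D.add(min(candidates))   # smallest ID replaces port numbers
        covered = set().union(*(incl[d] for d in D))
    return D
```

The node of globally maximum residual degree always qualifies for C, so every phase covers at least one further node and the loop terminates; by the argument of Theorem 12.8, the number of phases is in fact logarithmic in ∆.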

Analysis

In the sequel, when we talk of a phase of Algorithm 12.2, we refer to a complete execution of the while loop. We start by proving that not too many nodes with small residual degrees enter D.

Lemma 12.6. Denote by M an MDS of G. During the execution of Algorithm 12.2, in total at most 16A(G)|M| nodes join D in Line 13 of the algorithm after computing δ̄_v ≤ 8A(G) in Line 5 of the same phase.

Proof. Consider the set S consisting of all nodes v ∈ V that become covered in some phase by some node w ∈ N_v^+ that computes δ̄_w ≤ 8A(G) and joins D. As according to Line 8 nodes join D subject to the condition that residual degrees throughout their 2-hop neighborhoods are less than twice as large as their own, no node m ∈ M can cover more than 16A(G) nodes from S. Hence, |S| ≤ 16A(G)|M|. Because of the rule that a node needs to be elected by a node that still requires coverage in order to enter D, this also bounds the number of nodes that join D in a phase in which they have residual degree at most 8A(G).

Next, we show that in each phase, at most a constant factor more nodes of large residual degree are chosen than are contained in an MDS.

Lemma 12.7. If M is an MDS, in each phase of Algorithm 12.2 at most 16A(G)|M| nodes v ∈ V that compute δ̄_v > 8A(G) in Line 5 join D in Line 13.

Proof. Fix some phase of the algorithm and denote by D′ the set of nodes v ∈ V joining D in Line 13 of this phase after computing δ̄_v > 8A(G). Define V′ to be the set of nodes that had not been covered at the beginning of the phase. For i ∈ {0, ..., ⌈log n⌉}, define

  M_i := {v ∈ M | δ̄_v ∈ (2^{i−1}, 2^i]},
  V_i := {v ∈ V′ | max_{w ∈ N_v^+} δ̄_w ∈ (2^{i−1}, 2^i]},
  D_i := {v ∈ D′ | δ̄_v ∈ (2^{i−1}, 2^i]}.

Note that ⋃_{i=⌈log 8A(G)⌉}^{⌈log n⌉} D_i = D′. Consider any j ∈ {⌈log 8A(G)⌉, ..., ⌈log n⌉}. By definition, nodes in V_j may be covered by nodes from M_i for i ≤ j only. Thus,

  Σ_{i=0}^{j} 2^i |M_i| ≥ |V_j|.

Nodes v ∈ D_j cover at least 2^{j−1} + 1 nodes from the set ⋃_{i ∈ {j,...,⌈log n⌉}} V_i, as by definition they have no neighbors in V_i for i < j. On the other hand, Lines 5 to 8 of the algorithm impose that these nodes must not have any neighbors of residual degree larger than 2^{⌈log δ̄_v⌉} = 2^j, i.e., these nodes cannot be in a set V_i for i > j. Hence, each node v ∈ D_j has at least 2^{j−1} neighbors in V_j. This observation implies that the subgraph induced by D_j ∪ V_j has at least 2^{j−2}|D_j| ≥ 2A(G)|D_j| edges. On the other hand, by definition of the arboricity, this subgraph has fewer than A(G)(|D_j| + |V_j|) edges. It follows that

  |D_j| < A(G)|V_j| / (2^{j−2} − A(G)) ≤ 2^{3−j} A(G)|V_j| ≤ 2^{3−j} A(G) Σ_{i=0}^{j} 2^i |M_i|.


We conclude that

  |D′| = Σ_{j=⌈log 8A(G)⌉}^{⌈log n⌉} |D_j|
       < Σ_{j=⌈log 8A(G)⌉}^{⌈log n⌉} 2^{3−j} A(G) Σ_{i=0}^{j} 2^i |M_i|
       = 8A(G) Σ_{j=⌈log 8A(G)⌉}^{⌈log n⌉} Σ_{i=0}^{j} 2^{i−j} |M_i|
       ≤ 8A(G) Σ_{i=0}^{⌈log n⌉} |M_i| Σ_{j=i}^{∞} 2^{i−j}
       = 16A(G) Σ_{i=0}^{⌈log n⌉} |M_i|
       ≤ 16A(G)|M|,

as claimed.

We can now bound the approximation quality of the algorithm.

Theorem 12.8. Algorithm 12.2 terminates within 6⌈log(∆ + 1)⌉ rounds and outputs a dominating set that is at most a factor 16A(G) log ∆ larger than optimum. The message size can be bounded by O(log log ∆).

Proof. We first examine the running time of the algorithm. Denote by ∆(i) the maximum residual degree after the i-th phase, i.e., ∆(0) = ∆ + 1 (as a node also covers itself). As observed earlier, each phase of Algorithm 12.2 takes six rounds. Because all nodes v computing a value δ̄_v satisfying ⌈log δ̄_v⌉ = ⌈log ∆(i)⌉ join C in phase i and any node in N_C^+ becomes covered, we have ⌈log ∆(i + 1)⌉ ≤ ⌈log ∆(i)⌉ − 1 for all phases i. Since the algorithm terminates at the end of the subsequent phase once ∆(i) ≤ 2, in total at most ⌈log ∆(0)⌉ = ⌈log(∆ + 1)⌉ phases are required.

Having established the bound on the running time of the algorithm, its approximation ratio follows directly² by applying Lemmas 12.6 and 12.7. The bound on the message size follows from the observation that in each phase nodes need to exchange only residual degrees rounded to powers of two and a constant number of binary values.

As is possible for the MDS approximation algorithm for general graphs from [56], we can sacrifice accuracy in order to speed up the computation.

² Note that in the last three phases the maximum degree is at most 8 ≤ 8A(G).


Corollary 12.9. For any integer α ≥ 2, Algorithm 12.2 can be modified such that it has a running time of O(log_α ∆) and approximation ratio O(A(G)α log_α ∆). With this modification, the size of messages becomes O(log log_α ∆).

Proof. We simply change the base of the logarithms in Line 8 of the algorithm, i.e., instead of rounding residual degrees to integer powers of two, we round to integer powers of α. Naturally, this affects the approximation guarantees linearly. In the proof of Lemma 12.7, we just replace the respective powers of two by powers of α as well, yielding a bound of O(A(G)α log_α ∆) on the approximation ratio by the same reasoning as in Theorem 12.8. Similarly, the bound on the message size becomes O(log log_α ∆).

If it were not for the computation of an MIS, we could speed up Algorithm 12.1 in almost the same manner (accepting a forest decomposition into a larger number of forests). However, the constructed helper graph is of bounded independence, but not of bounded arboricity or growth. For this graph class, no distributed algorithm computing an MIS in time o(log n) is currently known.

Finally, we would like to mention that if nodes know A(G) (or a reasonable upper bound), a port numbering is no longer required. In this case, nodes join D without the necessity of being elected by a neighbor, however only if the prerequisite δ̄_v > 8A(G) is satisfied. To complete the dominating set, uncovered nodes may join D independently of δ̄_v once their neighborhood contains no more nodes of residual degree larger than 8A(G). It is not hard to see that with this modification essentially the same analysis as for Algorithm 12.2 applies, both with regard to time complexity and approximation ratio.

Chapter 13

Minimum Dominating Set Approximations in Planar Graphs

“Well, it’s a constant.” – Jukka Suomela on the approximation ratio of the algorithm presented in this chapter.

In this chapter, which is based on [62], we introduce an algorithm computing a constant approximation of a minimum dominating set in planar graphs in constant time.¹ Assuming maximum degree ∆ and identifiers of size O(log n), the algorithm makes use of messages of size O(∆ log n). As planar graphs may have unbounded degree, the algorithm is thus not suitable for practice. Moreover, the constant in the approximation ratio is 130, i.e., there is a large gap to the lower bound of 5 − ε (for any constant ε > 0). Nevertheless, we demonstrate that in planar graphs it is in principle feasible to obtain a constant MDS approximation in a constant number of distributed rounds.

13.1 Algorithm

The key idea of the algorithm is to exploit planarity in two ways. On the one hand, planar graphs have arboricity three, i.e., the number of edges of any subgraph is linear in its number of nodes. What is more, as planarity is preserved under taking minors, the same holds for any minor of the graph. On the other hand, in a planar graph circles are barriers separating parts of the graph from others; any node enclosed in a circle cannot cover nodes on the outside. This is a very strong structural property enforcing that dominating sets are either large or exhibit a simple structure. It will become clear in the analysis how these properties are utilized by the algorithm.

The algorithm consists of two main steps. In the first step, all nodes check whether their neighborhood can be covered by six or fewer other nodes. After learning about their two-hop neighborhood in two rounds, nodes can decide this locally by means of a polynomial-time algorithm.² If this is not the case, they join the (future) dominating set. In the second step, any node that is not yet covered elects a neighbor of maximal residual degree (i.e., one that covers the most uncovered nodes, see Definition 12.5) into the set. Algorithm 13.1 summarizes this scheme.

Algorithm 13.1: MDS Approximation in Planar Graphs
output: DS D of G
1   D := ∅
2   for v ∈ V in parallel do
3     if ∄ A ⊆ N_v^{(2)} \ {v} such that N_v ⊆ N_A^+ and |A| ≤ 6 then
4       D := D ∪ {v}
5     end
6   end
7   for v ∈ V in parallel do
8     δ̄_v := |N_v^+ \ N_D^+|                  // residual degree
9     if v ∈ V \ N_D^+ then
10      ∆_v := max_{w∈N_v^+} {δ̄_w}            // maximum within one hop
11      choose any d(v) ∈ {w ∈ N_v^+ | δ̄_w = ∆_v}
12      D := D ∪ {d(v)}
13    end
14  end

¹ Note that the original paper [61] contained an error and the stated algorithm does not compute a constant MDS approximation. Moreover, the proof of the algorithm from [25] is incomplete.
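A centralized sketch of Algorithm 13.1 (naming ours) implements the six-node cover test by brute force over the two-hop neighborhood, in line with the trivial approach mentioned in the footnote; N_v is taken here as the set of v's neighbors, and ties in the second step are broken by the smallest identifier instead of port numbers.

```python
from itertools import combinations

def planar_mds_approx(adj):
    """adj: {v: set of neighbors}. Returns a dominating set D of the graph."""
    V = set(adj)
    incl = {v: {v} | adj[v] for v in V}                          # N_v^+
    two_hop = {v: set().union(*(incl[w] for w in incl[v])) for v in V}
    D = set()
    # Step 1: v joins D if no six (or fewer) other nodes can cover N_v.
    for v in V:
        cands = sorted(two_hop[v] - {v})
        coverable = any(
            adj[v] <= set().union(*(incl[a] for a in A))
            for k in range(1, 7)
            for A in combinations(cands, k))
        if not coverable:
            D.add(v)
    # Step 2: every uncovered node elects a neighbor of maximum residual
    # degree into D.
    covered = set().union(*(incl[d] for d in D))
    resid = {v: len(incl[v] - covered) for v in V}
    for v in V - covered:
        best = max(resid[w] for w in incl[v])
        D.add(min(w for w in incl[v] if resid[w] == best))
    return D
```

The brute-force test is exponential in the constant 6 only, so each node's local computation remains polynomial, matching the remark in the text.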

13.2 Analysis

As evident from the description of the algorithm, it can be executed in six rounds and, due to the second step, computes a dominating set. Therefore, we need to bound the number of nodes selected in each step in terms of the size of a minimum dominating set M of the planar graph G. For the purpose of our analysis, we fix some MDS M of G. By D_1 and D_2 we denote the sets of nodes that enter D in the first and second step of the algorithm, respectively. Moreover, we denote neighborhoods in a graph H ≠ G by N_v(H), N_A^+(H), etc.

² Trivially, one can try all combinations of six nodes. Note, however, that planarity permits more efficient solutions.

We begin by bounding the number of nodes in D_1 \ M after the first step.

Lemma 13.1. |D_1 \ M| < 3|M|.

Proof. We construct the following subgraph H = (V_H, E_H) of G (see Figure 13.1).

• Set V_H := N_{D_1\M}^+ ∪ M and E_H := ∅.
• Add all edges with an endpoint in D_1 \ M to E_H.
• Add a minimal subset of edges from E to E_H such that V_H = N_M^+(H), i.e., M is a DS in H.

Thus, each node v ∈ V_H \ (D_1 ∪ M) has exactly one neighbor m ∈ M, as we added a minimal number of edges for M to cover V_H. For all such nodes v, we contract the edge {v, m}, where we identify the resulting node with m. In other words, the star subgraph of H induced by N_m^+(H) \ D_1 is collapsed into m. By Lemma 2.24, the resulting minor H̄ = (V_H̄, E_H̄) of G satisfies |E_H̄| < 3|V_H̄|. Due to the same lemma, the subgraph of H̄ induced by D_1 \ M has fewer than 3|D_1 \ M| edges. As the neighborhood of a node from D_1 \ M ⊂ V_H̄ cannot be covered by fewer than seven nodes, the performed edge contractions did not reduce the degree of such a node below seven. Altogether, we get that

  7|D_1 \ M| − 3|D_1 \ M| < Σ_{d∈D_1\M} δ_d(H̄) − |{ {d, d′} ∈ E_H̄ | d, d′ ∈ D_1 \ M }| ≤ |E_H̄| < 3|V_H̄| ≤ 3(|D_1 \ M| + |M|),

which can be rearranged to yield the claimed bound.

To bound the number of nodes |D_2| chosen in the second step of the algorithm, more effort is required. We consider the following subgraph of G.

Definition 13.2. We define H = (V_H, E_H) to be the subgraph of G obtained from the following construction.

• Set V_H := ∅ and E_H := ∅.


Figure 13.1: Part of the subgraph constructed in Lemma 13.1 (a node d ∈ D_1 \ M whose neighborhood N_d requires at least seven covering nodes from M).

• For each node d ∈ D_2 for which this is possible, add one node v ∈ V \ M to V_H such that d = d(v) in Line 11 of the algorithm.
• Add M \ D_1 to V_H and a minimal number of edges to E_H such that N_{M\D_1}^+(H) = V_H, i.e., M \ D_1 covers the nodes added to H so far (this is possible as only nodes from V \ N_{D_1}^+ elect nodes into D_2).
• For each m ∈ M \ D_1, add a minimal number of nodes and edges to H such that there is a set C_m ⊆ V_H \ {m} of minimal size satisfying N_m(H) ⊆ N_{C_m}^+(H), i.e., C_m covers m's neighbors in H. We define C := ⋃_{m∈M\D_1} C_m.
• Remove all nodes v ∈ V_H \ (C ∪ M) for which d(v) ∈ M ∪ C.
• For each m ∈ M \ D_1, remove all edges to C_m.

See Figure 13.2 for an illustration. In order to derive our bound on |D_2|, we consider a special case first.

Lemma 13.3. Assume that for each node m ∈ M \ D_1 it holds that (i) no node m′ ∈ M ∩ C_m covers more than seven nodes in N_m(H) and (ii) no node v ∈ C_m \ M covers more than four nodes in N_m(H). Then |D_2| < 98|M|.

Proof. Denote by A_1 ⊆ V_H \ (M ∪ C) the nodes in V_H that elect others into D_2 and have two neighbors in M, i.e., when we added C to V_H, they became covered by a node in M ∩ C. Analogously, denote by A_2 ⊆ V_H \ (M ∪ C) the set of electing nodes for which the neighbor in C is not in M. Observe that A := A_1 ∪ A_2 = V_H \ (M ∪ C) and A_1 ∩ A_2 = ∅. Moreover, we claim that |A| ≥ |D_2| − 14|M|. To see this, recall that in the first step of the construction of H, we chose for each element of D_2 that is not elected by elements of M only one voting node v, i.e., at least |D_2| − |M| nodes in total. In the second last step of the construction, we remove v if d(v) ∈ {m} ∪ C_m for some m ∈ M \ D_1. As m ∈ M \ D_1, its neighborhood can be covered by six or fewer nodes from V \ {m}. Therefore |C_m| ≤ 6 for all m ∈ M \ D_1 and we remove in total at most 7|M| nodes in the second last step. Finally, in the last step we cut off at most |C| ≤ 6|M| voting nodes from their dominators in M \ D_1. The definition of A explicitly excludes these nodes, hence |A| ≥ |D_2| − 14|M|.

We contract all edges from nodes a ∈ A to the respective nodes m ∈ M \ D_1 covering them that we added in the third step of the construction of H. Denote the resulting minor of G by H̄ = (V_H̄, E_H̄). For every seven nodes in A_1, there must be a pair of nodes m, m′ ∈ M \ D_1 such that m ∈ C_{m′} and vice versa, as by assumption no such pair shares more than seven neighbors. Thus, for every seven nodes in A_1, we have two nodes less in V_H̄ than the upper bound of |V_H̄| ≤ |M| + |C| ≤ 7|M|. By Lemma 2.24, H̄ thus has fewer than

  3|V_H̄| ≤ 3|M ∪ C| ≤ 3|M| + 3(6|M| − 2|A_1|/7) = 21|M| − 6|A_1|/7

edges. On the other hand,

  |E_H̄| ≥ |A_1|/7 + |A_2|/4,

as by assumption each pair of nodes from M may share at most seven neighbors in A_1 and pairs of nodes m ∈ M \ D_1, v ∈ C_m \ M share at most four neighbors. We conclude that

  |A_2| < 84|M| − 4|A_1|

and therefore

  |D_2| ≤ |A_1| + |A_2| + 14|M| < 98|M| − 3|A_1| ≤ 98|M|.

In order to complete our analysis, we need to cope with the case that a node m ∈ M \ D_1 and an element of C_m share many neighbors. In a planar graph, this results in a considerable number of nested circles which separate their interior from their outside. This necessitates that nodes from the optimal solution M are enclosed, which we may use to compensate for the increased number of nodes in A in comparison to the special case from Lemma 13.3.


Figure 13.2: Part of the subgraph H from Definition 13.2 (a node m ∈ M \ D_1 with its neighbors N_m(H) ⊆ V_H \ (M ∪ C) and the covering set C_m with |C_m| ≤ 6).

Lemma 13.4. Suppose the subgraph H from Definition 13.2 violates condition (i) or (ii) from Lemma 13.3. Fix a planar embedding of G and consider either (i) nodes m ∈ M \ D_1 and v ∈ M ∩ C_m with |N_m(H) ∩ N_v(H)| ≥ 8 or (ii) nodes m ∈ M \ D_1 and v ∈ C_m \ M with |N_m(H) ∩ N_v(H)| ≥ 5. Then the outermost circle formed by m, v, and two of their common neighbors in H must enclose some node m′ ∈ M (with respect to the embedding).

Proof. Set Ã := N_m(H) ∩ N_v(H). Consider case (i) first and assume for contradiction that there is no node from M enclosed in the outermost circle. W.l.o.g., we may assume that |Ã| = 8 (otherwise we simply ignore some nodes from Ã). There are four nodes from Ã that are enclosed by two nested circles consisting of v, m, and the four nodes that are the outer nodes from Ã according to the embedding (see Figure 13.3). Recall that by the second last step of the construction of H, nodes a ∈ Ã satisfy d(a) ∉ {m, v} ⊆ M. Therefore, these enclosed nodes elected (distinct) nodes into D_2 that are enclosed by the outermost circle. As the electing nodes a ∈ Ã are connected to m and v, by Line 11 of the algorithm the nodes d(a) they elected must have residual degree δ̄_{d(a)} ≥ max{δ̄_v, δ̄_m}. In other words, they cover at least as many nodes from V \ N_{D_1}^+ as both m and v.

Denote by ℓ the number of enclosed nodes from G that are neither in Ã ⊆ V \ N_{D_1}^+ nor already covered by D_1. We thus have a subgraph S = (V_S, E_S) of G that has |V_S| = ℓ + |Ã| + |{v, m}| = ℓ + 10 nodes and

  |E_S| ≥ |N_v(S)| + |N_m(S)| + 4 max{|N_v(S)|, |N_m(S)|} − 18 ≥ 3(|N_v(S)| + |N_m(S)| − 6)

13.2. ANALYSIS


Figure 13.3: Example of a subgraph considered in the first case of the proof of Lemma 13.4, showing m ∈ M \ D1, v ∈ Cm ∩ M, nodes a ∈ Ã ⊆ V \ M, and their choices d(a) ∈ D2 \ (M ∪ C). While the choices d(a) of the two leftmost and rightmost nodes a ∈ Ã may have large degrees because of nodes outside the outer circle, the choices of the four inner nodes must have many neighbors that are not covered by D1 on or inside the outer circle.

edges, where we subtracted 18 because (i) no edge is required for one of the four nodes to cover itself, (ii) we might have counted each of the (4 choose 2) = 6 edges between pairs of the four considered nodes d(a) ∈ D2 twice, and (iii) we might have counted up to 8 edges between these four nodes and {v, m} twice. As we made sure that Ã ∩ M = ∅ by adding nodes from V \ M only to VH in the second construction step in Definition 13.2, the assumption that no other node from M is enclosed by the outermost circle implies that everything inside is covered by {v, m}. Therefore, it holds that

|Nv(S)| + |Nm(S)| ≥ 2|Ã| + ℓ = ℓ + 16.

However, Lemma 2.24 lower bounds |VS| in terms of |ES|, yielding

3(ℓ + 10) = 3|VS| > |ES| ≥ 3(|Nv(S)| + |Nm(S)| − 6) ≥ 3(ℓ + 10),

a contradiction.

Case (ii) is treated similarly, but it is much simpler. This time, the assumption that no node from M is enclosed by the outermost circle implies that all the nodes inside must be covered by m alone, as M is a DS. Since v and m are connected via the (at least) five nodes in Ã, for the node d(a) ∉ {m, v} elected into D2 by the innermost node a ∈ Ã, it must hold that N⁺d(a) \ N⁺m ⊆ {v} (see Figure 13.4). However, there are at least two nodes in Ã ⊆ V \ N⁺D1 that are not connected to d(a), i.e., we get the contradiction that a would have preferred m over d(a) in Line 11 of the algorithm.

Next, we repeatedly delete nodes from H until eventually the preconditions of Lemma 13.3 are met. Arguing as in the proof of Lemma 13.4, we can


Figure 13.4: Example of a subgraph considered in the second case of the proof of Lemma 13.4, showing m ∈ M \ D1, v ∈ Cm \ M, nodes a ∈ Ã ⊆ V \ M, and d(a) ∈ D2 \ (M ∪ C). Supposing there is no other node m′ ∈ M inside the outer circle, apart from v all neighbors of the node chosen by the innermost node from Ã must also be neighbors of m.

account for deleted nodes by allocating them to enclosed nodes from M ∪ C. Doing this carefully, we can make sure that no node from M ∪ C needs to compensate for more than four deleted nodes.

Corollary 13.5. |D2| < 126|M|.

Proof. Fix an embedding of G and thus of all its subgraphs. We will argue with respect to this embedding only. We use the notation from the proof of Lemma 13.3. Starting from H, we iteratively delete nodes from A until we obtain a subgraph H′ satisfying the prerequisites of the lemma.

Assume that H′ := H violates one of the preconditions of Lemma 13.3. No matter which of the conditions (i) and (ii) from Lemma 13.3 is violated, we choose respective nodes m ∈ M \ D1 and v ∈ Cm satisfying precondition (i) or (ii) of Lemma 13.4 such that the smallest circle formed by m, v, and a1, a2 ∈ Ã := Nv(H′) ∩ Nm(H′) enclosing an element m′ ∈ M has minimal area. We delete the two elements from Ã ⊆ A participating in the circle. Since the area of the circle is minimal, there is no third element from Ã enclosed in the circle. We repeat this process until H′ satisfies the preconditions of Lemma 13.3.

We claim that we can account for deleted nodes in terms of nodes from M ∪ C in a way such that no element of M ∪ C needs to compensate for more than


four deleted nodes. Whenever we delete a pair of nodes, we count a node from M ∪ C enclosed by the respective circle that has not yet been counted twice. We need to show that this is indeed always possible.

To see this, observe that the minimality of the enclosed area of a chosen circle X together with the planarity of G ensures that any subsequent circle X′ either encloses X or has an enclosed area disjoint from that of X. In the latter case, we obviously must find a different node from M ∪ C enclosed in X′ than the one we used when deleting nodes from X. Hence, we need to examine the case when three nested circles X1, X2, and X3 occur in the construction. If the nodes m ∈ M and v ∈ Cm participating in each circle are not always the same, one node from the first such pair becomes enclosed by one of the subsequent circles. Hence, the remaining difficulty is that we could have three such nested circles formed by nodes m ∈ M, v ∈ Cm, and three pairs of nodes from Nm(H) ∩ Nv(H) (see Figure 13.5). Any node chosen by a node a ∉ {m, v} lying on the outermost circle X3 is separated by X1 from the nodes enclosed by X1. Therefore, nodes m′ ∈ M enclosed by X1 can cover only nodes that either are not adjacent to the nodes from D2 considered in Lemma 13.4 (when applied to H′ after X1 and X2 have already been removed) or lie on X1. Since the nodes on X1 are m, v, and two of their shared neighbors in H, we can thus argue analogously to the proof of Lemma 13.4 in order to find a node m″ ∈ M enclosed by X3, but not enclosed by X1.

Altogether, for each element of M ∪ C we remove at most two times two nodes from A, i.e., in total no more than 4|M ∪ C| ≤ 28|M| nodes. To the remaining subgraph H′, we apply Lemma 13.3, yielding |D2| < (28 + 98)|M| = 126|M|.
Having determined the maximum number of nodes that enter the dominating set in each step, it remains to assemble the results and finally state the approximation ratio our algorithm achieves.

Theorem 13.6. |D| < 130|M|.

Proof. Combining Lemma 13.1 and Corollary 13.5, we obtain |D| ≤ |M| + |D1 \ M| + |D2| < (1 + 3 + 126)|M| = 130|M|.
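The planarity bound invoked via Lemma 2.24 in the proofs above is the standard consequence of Euler's formula; as a reminder (our sketch, not the thesis's own statement of Lemma 2.24):

```latex
% For a simple planar graph with |V| >= 3, embedded with f faces:
\[
  |V| - |E| + f = 2, \qquad 2|E| \ge 3f
  \;\Longrightarrow\; |E| \le 3|V| - 6 < 3|V|.
\]
% Each face is bounded by at least three edges, and each edge borders
% at most two faces, which yields the inequality 2|E| >= 3f.
```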


Figure 13.5: Example of a sequence of three nested circles as considered in Corollary 13.5, formed by nodes m ∈ M \ D1, v ∈ Cm, and pairs of nodes from Nm(H) ∩ Nv(H), enclosing a node m′ ∈ M. Each pair of two voting nodes involved in a circle is deleted from H′ after it has been accounted for. Therefore, all neighbors of the two outermost nodes from Nm(H) ∩ Nv(H) are not adjacent to nodes inside the innermost circle.

Chapter 14

Conclusions

“I was confused and uncertain about all the little details of life. But now, while I’m still confused and uncertain, it’s on a much higher plane [. . . ]” – Terry Pratchett, Equal Rites.

In this thesis, we examined several coordination problems arising in distributed systems. We believe all presented findings to be of theoretical interest. For this reason, we would like to conclude our exposition with an educated guess on their practical significance.

In Part I, we considered the problem of synchronizing clocks in a distributed system. Chapter 4 discussed this topic in the context of wireless networks. We were able to derive and analyze a model whose implications could be validated in practice. PulseSync, the presented algorithm tailored to this model, outperforms FTSP, a commonly used synchronization protocol for sensor networks. Considering that there are ongoing efforts to turn the prototypical implementation into an applicable protocol, we hope to see first systems employing our approach in the medium term.

Less clear is the situation for our second algorithm Aµ, introduced in Chapter 5. However, several indicators suggest that a potential implementation could be beneficial. Firstly, the algorithm is very robust: it can tolerate worst-case crash failures, is self-stabilizing, and does not depend on benign behavior of clock drifts or message jitter. Secondly, it has low complexity. While protocols like PulseSync or FTSP compute a regression line from a history of recent clock estimates, Aµ follows simpler rules based on its current estimates of neighbors’ clock values. These estimates can be obtained in an arbitrary manner; upper and lower bounds show that Aµ makes best use of the available information up to a small factor. Thirdly, the synchronization


guarantees of Aµ are deterministic, which for smaller systems and/or large time frames carries the advantage of more reliable logical clocks. With these qualities, Aµ might, for example, help provide distributed clock generation at the hardware level (cf. [36]). Another canonical application area for Aµ is that of highly dynamic ad-hoc networks. In such systems, the impossibility of preserving strong connectivity at all times favors an adaptive algorithm that is robust, yet ensures strong skew bounds whenever the topology is benign.

In Part II of this thesis, we analyzed the distributed balls-into-bins problem. Clearly, the practical merit of our bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms with small maximal bin load and a linear number of messages is negligible. At best, it shows that one should not invest time into the search for a constant-time solution. Similarly, the given algorithms that circumvent the lower bound are of no practical concern, as they are more complex than the simpler symmetric algorithms. In fact, the additional rounds of communication used to ensure a constant running time render these algorithms slower than the symmetric ones for any realistic values¹ of n, unless a considerable overhead in message complexity is accepted.

In contrast, we attribute notable practical relevance to A2b in particular. In comparison to previous distributed balls-into-bins algorithms, A2b is minimalistic in terms of complexity, yet, owing to its adaptivity, achieves a maximal bin load of two in log* n + O(1) rounds. Moreover, the algorithm stands out for its robustness. Since the constant hidden in the additive term O(1) is small (and can be reduced further by a more careful analysis), we estimate A2b to be well-suited for real-world use.

In Part III of our presentation, we looked into the subjects of MDS approximation and MIS computation in various graph families. For applications, it appears questionable whether the MIS algorithm from Chapter 10 is advantageous.
On the one hand, it competes with the O(log n) time solutions on general graphs, which do not require a complicated coloring subroutine. On the other hand, our analysis yields the asymptotically stronger bound on the running time for forests only. Nevertheless, if tuned well, the algorithm might reduce running times for instances that typically occur in practice, as it might constitute an efficient heuristic also for graphs that are not forests.

Like the lower bound from Chapter 7, the log* lower bound from Chapter 11 on the product of running time and approximation ratio of MDS approximation algorithms in unit disk graphs is a primarily theoretic statement. Similarly, the algorithm for planar graphs given in Chapter 13 is of purely theoretic interest, as it employs by far too large messages to be feasible in practice. Fascinating, and probably challenging, open questions emerging from this result are whether a constant-time O(1)-approximation can be achieved with reasonably sized messages and what the precise approximation ratio is that can be achieved by distributed constant-time algorithms.

Finally, from a practical viewpoint, the algorithms from Chapter 12 are more promising. Algorithm Greedy-by-Degree appeals through its low time complexity and message size. The main advantage of Algorithm Parent Dominating Set is that it achieves a constant approximation ratio on graphs of bounded arboricity. Moreover, both algorithms are considerably less complex to implement than the one from [56], making them an expedient alternative in the quite general class of graphs of small arboricity.

¹For log* n to be larger than five, n must be at least 2^65,536.
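To put the footnote's threshold in perspective, the iterated logarithm can be computed directly; a minimal sketch (the helper name logstar and the floor-of-log2 convention are ours, not part of the thesis):

```python
def logstar(n: int) -> int:
    """Iterated logarithm: the number of times floor(log2) must be
    applied to n until the value drops to 1 (0 for n <= 1)."""
    count = 0
    while n > 1:
        n = n.bit_length() - 1  # floor(log2 n); exact for powers of two
        count += 1
    return count

# Even for n = 2^65,536, far beyond any physically realizable system
# size, the iterated logarithm is only five.
print(logstar(2**65536))  # 5
```

This extreme flatness is exactly why the log* lower bound, while asymptotically meaningful, has no bite for realistic values of n.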

Bibliography

[1] M. Adler, S. Chakrabarti, M. Mitzenmacher, and L. Rasmussen. Parallel Randomized Load Balancing. In Proc. 27th Symposium on Theory of Computing (STOC), pages 238–247, 1995.
[2] N. Alon, L. Babai, and A. Itai. A Fast and Simple Randomized Parallel Algorithm for the Maximal Independent Set Problem. Journal of Algorithms, 7(4):567–583, 1986.
[3] B. Awerbuch. Complexity of Network Synchronization. Journal of the ACM, 32(4):804–823, 1985.
[4] B. Awerbuch, Y. Azar, E. F. Grove, M.-Y. Kao, P. Krishnan, and J. S. Vitter. Load Balancing in the Lp Norm. In Proc. 36th Symposium on Foundations of Computer Science (FOCS), pages 383–391, 1995.
[5] B. Awerbuch and M. Sipser. Dynamic Networks are as Fast as Static Networks. In Proc. 29th Symposium on Foundations of Computer Science (FOCS), pages 206–219, 1988.
[6] Y. Azar, A. Z. Broder, A. R. Karlin, and E. Upfal. Balanced Allocations. SIAM Journal on Computing, 29(1):180–200, 1999.
[7] R. Bar-Yehuda, O. Goldreich, and A. Itai. On the Time-Complexity of Broadcast in Multi-hop Radio Networks: An Exponential Gap Between Determinism and Randomization. Journal of Computer and System Sciences, 45(1):104–126, 1992.
[8] L. Barenboim and M. Elkin. Distributed (∆ + 1)-Coloring in Linear (in ∆) Time. In Proc. 41st Symposium on Theory of Computing (STOC), pages 111–120, 2009.
[9] L. Barenboim and M. Elkin. Sublogarithmic Distributed MIS Algorithm for Sparse Graphs using Nash-Williams Decomposition. Distributed Computing, 22(5–6):363–379, 2009.


[10] H. Bast and T. Hagerup. Fast and Reliable Parallel Hashing. In Proc. 3rd Symposium on Parallel Algorithms and Architectures (SPAA), pages 50–61, 1991.
[11] M. Ben-Or, D. Dolev, and E. N. Hoch. Fast Self-Stabilizing Byzantine Tolerant Digital Clock Synchronization. In Proc. 27th Symposium on Principles of Distributed Computing (PODC), pages 385–394, 2008.
[12] I. Ben-Zvi and Y. Moses. Beyond Lamport’s Happened-Before: On the Role of Time Bounds in Synchronous Systems. In Proc. 24th Symposium on Distributed Computing (DISC), pages 421–436, 2010.
[13] P. Berenbrink, T. Friedetzky, L. A. Goldberg, P. W. Goldberg, Z. Hu, and R. Martin. Distributed Selfish Load Balancing. SIAM Journal on Computing, 37(4):1163–1181, 2007.
[14] P. Berenbrink, T. Friedetzky, Z. Hu, and R. Martin. On Weighted Balls-into-Bins Games. Theoretical Computer Science, 409(3):511–520, 2008.
[15] P. Berenbrink, F. Meyer auf der Heide, and K. Schröder. Allocating Weighted Jobs in Parallel. In Proc. 9th Symposium on Parallel Algorithms and Architectures (SPAA), pages 302–310, 1997.
[16] S. Biaz and J. Welch. Closed Form Bounds for Clock Synchronization Under Simple Uncertainty Assumptions. Information Processing Letters, 80(3):151–157, 2001.
[17] I. Chlamtac and S. Kutten. On Broadcasting in Radio Networks – Problem Analysis and Protocol Design. IEEE Transactions on Communications, 33(12):1240–1246, 1985.
[18] I. Chlamtac and O. Weinstein. The Wave Expansion Approach to Broadcasting in Multihop Radio Networks. IEEE Transactions on Communications, 39(3):426–433, 1991.
[19] R. Cole and U. Vishkin. Deterministic Coin Tossing with Applications to Optimal Parallel List Ranking. Information and Control, 70(1):32–53, 1986.
[20] M. Croucher. Supercomputers Vs Mobile Phones. Walking Randomly, http://www.walkingrandomly.com/?p=2684, June 2010.
[21] A. Czumaj and W. Rytter. Broadcasting Algorithms in Radio Networks with Unknown Topology. Journal of Algorithms, 60(2):115–143, 2006.


[22] A. Czumaj and V. Stemann. Randomized Allocation Processes. Random Structures and Algorithms, 18(4):297–331, 2001.
[23] A. Czygrinow and M. Hańćkowiak. Distributed Almost Exact Approximations for Minor-Closed Families. In Proc. 14th European Symposium on Algorithms (ESA), pages 244–255, 2006.
[24] A. Czygrinow and M. Hańćkowiak. Distributed Approximation Algorithms for Weighted Problems in Minor-Closed Families. In Proc. 13th Computing and Combinatorics Conference (COCOON), pages 515–525, 2007.
[25] A. Czygrinow, M. Hańćkowiak, and W. Wawrzyniak. Fast Distributed Approximations in Planar Graphs. In Proc. 22nd Symposium on Distributed Computing (DISC), pages 78–92, 2008.
[26] E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11):643–644, 1974.
[27] D. Dolev and E. N. Hoch. Byzantine Self-Stabilizing Pulse in a Bounded-Delay Model. In Proc. 9th Conference on Stabilization, Safety, and Security of Distributed Systems (SSS), pages 234–252, 2007.
[28] S. Dolev and J. Welch. Self-Stabilizing Clock Synchronization in the Presence of Byzantine Faults. Journal of the ACM, 51:780–799, 2004.
[29] D. Dubhashi and D. Ranjan. Balls and Bins: A Study in Negative Dependence. Random Structures and Algorithms, 13:99–124, 1996.
[30] J. Elson, L. Girod, and D. Estrin. Fine-Grained Network Time Synchronization Using Reference Broadcasts. In Proc. 5th Symposium on Operating Systems Design and Implementation (OSDI), pages 147–163, 2002.
[31] G. Even and M. Medina. Revisiting Randomized Parallel Load Balancing Algorithms. In Proc. 16th Colloquium on Structural Information and Communication Complexity (SIROCCO), pages 209–221, 2009.
[32] G. Even and M. Medina. Parallel Randomized Load Balancing: A Lower Bound for a More General Model. In Proc. 36th Conference on Theory and Practice of Computer Science (SOFSEM), pages 358–369, 2010.
[33] R. Fan and N. Lynch. Gradient Clock Synchronization. In Proc. 23rd Symposium on Principles of Distributed Computing (PODC), pages 320–327, 2004.


[34] F. Fich and E. Ruppert. Hundreds of Impossibility Results for Distributed Computing. Distributed Computing, 16(2–3):121–163, 2003.
[35] M. J. Fischer, N. A. Lynch, and M. Merritt. Easy Impossibility Proofs for Distributed Consensus Problems. In Proc. 4th Symposium on Principles of Distributed Computing (PODC), pages 59–70, 1985.
[36] M. Függer, U. Schmid, G. Fuchs, and G. Kempf. Fault-Tolerant Distributed Clock Generation in VLSI Systems-on-Chip. In Proc. 6th European Dependable Computing Conference (EDCC), pages 87–96, 2006.
[37] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York, 1979.
[38] J. Gil, F. Meyer auf der Heide, and A. Wigderson. The Tree Model for Hashing: Lower and Upper Bounds. SIAM Journal on Computing, 25(5):936–955, 1996.
[39] A. Giridhar and P. Kumar. Distributed Clock Synchronization over Wireless Networks: Algorithms and Analysis. In Proc. 45th Conference on Decision and Control (CDC), pages 4915–4920, 2006.
[40] A. Goldberg, S. Plotkin, and G. Shannon. Parallel Symmetry-Breaking in Sparse Graphs. In Proc. 19th Symposium on Theory of Computing (STOC), pages 315–324, 1987.
[41] G. H. Gonnet. Expected Length of the Longest Probe Sequence in Hash Code Searching. Journal of the ACM, 28(2):289–304, 1981.
[42] T. Hagerup. The Log-Star Revolution. In Proc. 9th Symposium on Theoretical Aspects of Computer Science (STACS), pages 259–278, 1992.
[43] A. Israeli and A. Itai. A Fast and Simple Randomized Parallel Algorithm for Maximal Matching. Information Processing Letters, 22(2):77–80, 1986.
[44] R. M. Karp, M. Luby, and F. Meyer auf der Heide. Efficient PRAM Simulation on a Distributed Memory Machine. Algorithmica, 16:517–542, 1996.
[45] K. Kenthapadi and R. Panigrahy. Balanced Allocation on Graphs. In Proc. 7th Symposium on Discrete Algorithms (SODA), pages 434–443, 2006.


[46] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon. Health Monitoring of Civil Infrastructures Using Wireless Sensor Networks. In Proc. 6th Conference on Information Processing in Sensor Networks (IPSN), pages 254–263, 2007.
[47] R. Kleinberg, G. Piliouras, and E. Tardos. Load Balancing without Regret in the Bulletin Board Model. In Proc. 28th Symposium on Principles of Distributed Computing (PODC), pages 56–62, 2009.
[48] E. Koutsoupias, M. Mavronicolas, and P. G. Spirakis. Approximate Equilibria and Ball Fusion. Theory of Computing Systems, 36(6):683–693, 2003.
[49] F. Kuhn. Weak Graph Coloring: Distributed Algorithms and Applications. In Proc. 21st Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 138–144, 2009.
[50] F. Kuhn, C. Lenzen, T. Locher, and R. Oshman. Optimal Gradient Clock Synchronization in Dynamic Networks. In Proc. 29th Symposium on Principles of Distributed Computing (PODC), pages 430–439, 2010.
[51] F. Kuhn, C. Lenzen, T. Locher, and R. Oshman. Optimal Gradient Clock Synchronization in Dynamic Networks. Computing Research Repository, abs/1005.2894, 2010.
[52] F. Kuhn, T. Locher, and R. Oshman. Gradient Clock Synchronization in Dynamic Networks. In Proc. 21st Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 270–279, 2009.
[53] F. Kuhn, T. Moscibroda, and R. Wattenhofer. Unit Disk Graph Approximation. In Proc. 2nd Workshop on Foundations of Mobile Computing (DIALM-POMC), pages 17–23, 2004.
[54] F. Kuhn, T. Moscibroda, and R. Wattenhofer. Local Computation: Lower and Upper Bounds. Computing Research Repository, abs/1011.5470, 2010.
[55] F. Kuhn and R. Oshman. Gradient Clock Synchronization using Reference Broadcasts. Computing Research Repository, abs/0905.3454, 2009.
[56] F. Kuhn and R. Wattenhofer. Constant-Time Distributed Dominating Set Approximation. Distributed Computing, 17(4):303–310, 2005.
[57] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558–565, 1978.


[58] C. Lenzen, T. Locher, and R. Wattenhofer. Clock Synchronization with Bounded Global and Local Skew. In Proc. 49th Symposium on Foundations of Computer Science (FOCS), pages 509–518, October 2008.
[59] C. Lenzen, T. Locher, and R. Wattenhofer. Tight Bounds for Clock Synchronization. In Proc. 28th Symposium on Principles of Distributed Computing (PODC), pages 46–55, August 2009.
[60] C. Lenzen, T. Locher, and R. Wattenhofer. Tight Bounds for Clock Synchronization. Journal of the ACM, 57(2), 2010.
[61] C. Lenzen, Y. A. Oswald, and R. Wattenhofer. What Can Be Approximated Locally? Case Study: Dominating Sets in Planar Graphs. In Proc. 20th Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 46–54, 2008.
[62] C. Lenzen, Y. A. Pignolet, and R. Wattenhofer. What Can Be Approximated Locally? Case Study: Dominating Sets in Planar Graphs. Technical report, ETH Zurich, 2010. ftp.tik.ee.ethz.ch/pub/publications/TIK-Report-331.pdf.
[63] C. Lenzen, P. Sommer, and R. Wattenhofer. Optimal Clock Synchronization in Networks. In Proc. 7th Conference on Embedded Networked Sensor Systems (SenSys), pages 225–238, 2009.
[64] C. Lenzen, J. Suomela, and R. Wattenhofer. Local Algorithms: Self-Stabilization on Speed. In Proc. 11th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), 2009.
[65] C. Lenzen and R. Wattenhofer. Leveraging Linial’s Locality Limit. In Proc. 22nd Symposium on Distributed Computing (DISC), pages 394–407, 2008.
[66] C. Lenzen and R. Wattenhofer. Minimum Dominating Set Approximation in Graphs of Bounded Arboricity. In Proc. 24th Symposium on Distributed Computing (DISC), pages 510–524, 2010.
[67] C. Lenzen and R. Wattenhofer. MIS on Trees. In Proc. 30th Symposium on Principles of Distributed Computing (PODC), 2011. To appear.
[68] C. Lenzen and R. Wattenhofer. Tight Bounds for Parallel Randomized Load Balancing. Computing Research Repository, abs/1102.5425, 2011.


[69] C. Lenzen and R. Wattenhofer. Tight Bounds for Parallel Randomized Load Balancing. In Proc. 43rd Symposium on Theory of Computing (STOC), 2011. To appear.
[70] N. Linial. Locality in Distributed Graph Algorithms. SIAM Journal on Computing, 21(1):193–201, 1992.
[71] T. Locher. Foundations of Aggregation and Synchronization in Distributed Systems. PhD thesis, ETH Zurich, 2009. Diss. ETH No. 18249.
[72] Z. Lotker, B. Patt-Shamir, and D. Peleg. Distributed MST for Constant Diameter Graphs. Distributed Computing, 18(6), 2006.
[73] M. Luby. A Simple Parallel Algorithm for the Maximal Independent Set Problem. SIAM Journal on Computing, 15(4):1036–1055, 1986.
[74] J. Lundelius and N. Lynch. An Upper and Lower Bound for Clock Synchronization. Information and Control, 62(2/3):190–204, 1984.
[75] N. Lynch. A Hundred Impossibility Proofs for Distributed Computing. In Proc. 8th Symposium on Principles of Distributed Computing (PODC), pages 1–28, 1989.
[76] M. Marathe, H. Breu, H. B. Hunt III, S. S. Ravi, and D. J. Rosenkrantz. Simple Heuristics for Unit Disk Graphs. Journal of Networks, 25:59–68, 1995.
[77] M. Maróti, B. Kusy, G. Simon, and Á. Lédeczi. The Flooding Time Synchronization Protocol. In Proc. 2nd Conference on Embedded Networked Sensor Systems (SenSys), pages 39–49, 2004.
[78] Y. Matias and U. Vishkin. Converting High Probability into Nearly-Constant Time—with Applications to Parallel Hashing. In Proc. 23rd Symposium on Theory of Computing (STOC), pages 307–316, 1991.
[79] L. Meier and L. Thiele. Gradient Clock Synchronization in Sensor Networks. Technical report, ETH Zurich, 2005. TIK Report 219.
[80] Y. Métivier, J. M. Robson, N. Saheb Djahromi, and A. Zemmari. An Optimal Bit Complexity Randomised Distributed MIS Algorithm. In Proc. 16th Colloquium on Structural Information and Communication Complexity (SIROCCO), pages 323–337, 2009.
[81] F. Meyer auf der Heide, C. Scheideler, and V. Stemann. Exploiting Storage Redundancy to Speed up Randomized Shared Memory Simulations. Theoretical Computer Science, 162(2):245–281, 1996.


[82] M. Mitzenmacher. The Power of Two Choices in Randomized Load Balancing. PhD thesis, University of California, Berkeley, 1996.
[83] M. Mitzenmacher. How Useful is Old Information? Technical report, Systems Research Center, Digital Equipment Corporation, 1998.
[84] M. Mitzenmacher. On the Analysis of Randomized Load Balancing Schemes. In Proc. 10th Symposium on Parallel Algorithms and Architectures (SPAA), pages 292–301, 1998.
[85] M. Mitzenmacher, B. Prabhakar, and D. Shah. Load Balancing with Memory. In Proc. 43rd Symposium on Foundations of Computer Science (FOCS), pages 799–808, 2002.
[86] M. Mitzenmacher, A. Richa, and R. Sitaraman. Handbook of Randomized Computing, volume 1, chapter The Power of Two Random Choices: A Survey of the Techniques and Results, pages 255–312. Kluwer Academic Publishers, Dordrecht, 2001.
[87] T. Moscibroda, P. von Rickenbach, and R. Wattenhofer. Analyzing the Energy-Latency Trade-Off During the Deployment of Sensor Networks. In Proc. 25th Conference on Computer Communications (INFOCOM), pages 2492–2504, 2006.
[88] T. Moscibroda and R. Wattenhofer. The Complexity of Connectivity in Wireless Networks. In Proc. 25th Conference on Computer Communications (INFOCOM), pages 25–37, 2006.
[89] M. Naor. A Lower Bound on Probabilistic Algorithms for Distributive Ring Coloring. SIAM Journal on Discrete Mathematics, 4(3):409–412, 1991.
[90] A. Panconesi and A. Srinivasan. On the Complexity of Distributed Network Decomposition. Journal of Algorithms, 20(2):356–374, 1996.
[91] D. Peleg. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, 2000.
[92] D. Peleg and V. Rubinovich. A Near-Tight Lower Bound on the Time Complexity of Distributed MST Construction. In Proc. 40th Symposium on Foundations of Computer Science (FOCS), pages 253–261, 1999.
[93] Y. Peres, K. Talwar, and U. Wieder. The (1 + β)-Choice Process and Weighted Balls-into-Bins. In Proc. 21st Symposium on Discrete Algorithms (SODA), pages 1613–1619, 2010.


[94] R. Raman. The Power of Collision: Randomized Parallel Algorithms for Chaining and Integer Sorting. In Proc. 10th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pages 161–175, 1990.
[95] R. Raz and S. Safra. A Sub-Constant Error-Probability Low-Degree Test, and a Sub-Constant Error-Probability PCP Characterization of NP. In Proc. 29th Symposium on Theory of Computing (STOC), pages 475–484, 1997.
[96] T. Schmid, Z. Charbiwala, Z. Anagnostopoulou, M. B. Srivastava, and P. Dutta. A Case Against Routing-Integrated Time Synchronization. In Proc. 8th Conference on Embedded Networked Sensor Systems (SenSys), pages 267–280, 2010.
[97] T. Schmid, P. Dutta, and M. B. Srivastava. High-Resolution, Low-Power Time Synchronization an Oxymoron no More. In Proc. 9th Conference on Information Processing in Sensor Networks (IPSN), pages 151–161, 2010.
[98] J. Schneider and R. Wattenhofer. A Log-Star Distributed Maximal Independent Set Algorithm for Growth-Bounded Graphs. In Proc. 27th Symposium on Principles of Distributed Computing (PODC), pages 35–44, 2008.
[99] T. K. Srikanth and S. Toueg. Optimal Clock Synchronization. Journal of the ACM, 34(3):626–645, 1987.
[100] V. Stemann. Parallel Balanced Allocations. In Proc. 8th Symposium on Parallel Algorithms and Architectures (SPAA), pages 261–269, 1996.
[101] A. A. Syed and J. Heidemann. Time Synchronization for High Latency Acoustic Networks. In Proc. 25th Conference on Computer Communications (INFOCOM), pages 892–903, 2006.
[102] K. Talwar and U. Wieder. Balanced Allocations: The Weighted Case. In Proc. 39th Symposium on Theory of Computing (STOC), pages 256–265, 2007.
[103] B. Vöcking. How Asymmetry Helps Load Balancing. Journal of the ACM, 50(4):568–589, 2003.
[104] U. Wieder. Balanced Allocations with Heterogenous Bins. In Proc. 19th Symposium on Parallel Algorithms and Architectures (SPAA), pages 188–193, 2007.

Curriculum Vitae

June 12, 1982: Born in Kleve, Germany
1988–2001: Primary and secondary education in Goch, Germany
2001–2002: Civilian service, Goch, Germany
2002–2007: Studies in mathematics, University of Bonn, Germany
October 2007: Diploma in mathematics, University of Bonn, Germany
2007–2010: Ph.D. student, research and teaching assistant, Distributed Computing Group, Prof. Roger Wattenhofer, ETH Zurich, Switzerland
January 2011: Ph.D. degree, Distributed Computing Group, ETH Zurich, Switzerland
Advisor: Professor Roger Wattenhofer
Co-examiners: Professor Danny Dolev, University of Jerusalem, Israel; Professor Berthold Vöcking, University of Aachen, Germany

Publications

The following list comprises the publications² I co-authored during my time with the Distributed Computing Group at ETH Zurich.

1. Tight Bounds for Parallel Randomized Load Balancing. Christoph Lenzen and Roger Wattenhofer. 43rd Symposium on Theory of Computing (STOC). San Jose, California, USA, June 2011.
2. MIS on Trees. Christoph Lenzen and Roger Wattenhofer. 30th Symposium on Principles of Distributed Computing (PODC). San Jose, California, USA, June 2011.
3. Minimum Dominating Set Approximation in Graphs of Bounded Arboricity. Christoph Lenzen and Roger Wattenhofer. 24th Symposium on Distributed Computing (DISC). Cambridge, Massachusetts, USA, September 2010.
4. Optimal Gradient Clock Synchronization in Dynamic Networks. Fabian Kuhn, Christoph Lenzen, Thomas Locher, and Rotem Oshman. 29th Symposium on Principles of Distributed Computing (PODC). Zurich, Switzerland, July 2010.
5. Brief Announcement: Exponential Speed-Up of Local Algorithms using Non-Local Communication. Christoph Lenzen and Roger Wattenhofer. 29th Symposium on Principles of Distributed Computing (PODC). Zurich, Switzerland, July 2010.
6. Tight Bounds for Clock Synchronization. Christoph Lenzen, Thomas Locher, and Roger Wattenhofer. Journal of the ACM, 57(2). January 2010.

²Technical reports accompanying conference papers are not listed separately.

7. Clock Synchronization: Open Problems in Theory and Practice. Christoph Lenzen, Thomas Locher, Philipp Sommer, and Roger Wattenhofer. 36th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM). Špindlerův Mlýn, Czech Republic, January 2010.
8. A Review of PODC 2009. Keren Censor and Christoph Lenzen. Distributed Computing Column 36, SIGACT News 40(4). December 2009.
9. Local Algorithms: Self-Stabilization on Speed. Christoph Lenzen, Jukka Suomela, and Roger Wattenhofer. 11th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). Lyon, France, November 2009.
10. Optimal Clock Synchronization in Networks. Christoph Lenzen, Philipp Sommer, and Roger Wattenhofer. 7th Conference on Embedded Networked Sensor Systems (SenSys). Berkeley, California, USA, November 2009.
11. Tight Bounds for Clock Synchronization. Christoph Lenzen, Thomas Locher, and Roger Wattenhofer. 28th Symposium on Principles of Distributed Computing (PODC). Calgary, Canada, August 2009.
12. Clock Synchronization with Bounded Global and Local Skew. Christoph Lenzen, Thomas Locher, and Roger Wattenhofer. 49th Symposium on Foundations of Computer Science (FOCS). Philadelphia, Pennsylvania, USA, October 2008.
13. Leveraging Linial’s Locality Limit. Christoph Lenzen and Roger Wattenhofer. 22nd Symposium on Distributed Computing (DISC). Arcachon, France, September 2008.
14. What Can Be Approximated Locally? Case Study: Dominating Sets in Planar Graphs. Christoph Lenzen, Yvonne Anne Oswald, and Roger Wattenhofer. 20th Symposium on Parallelism in Algorithms and Architectures (SPAA). Munich, Germany, June 2008.

Series in Distributed Computing
edited by Roger Wattenhofer

Vol. 1: Aaron Zollinger, Networking Unleashed: Routing and Topology Control in Ad Hoc and Sensor Networks. ISBN 3-86628-022-X
Vol. 2: Fabian Kuhn, The Price of Locality: Exploring the Complexity of Distributed Coordination Primitives. ISBN 3-86628-041-6
Vol. 3: Thomas Moscibroda, Locality, Scheduling, and Selfishness: Algorithmic Foundations of Highly Decentralized Networks. ISBN 3-86628-104-8
Vol. 4: Regina O’Dell, Understanding Ad hoc Networks. From Geometry to Mobility. ISBN 3-86628-113-7
Vol. 5: Keno Albrecht, Mastering Spam. A Multifaceted Approach with the Spamato Filter System. ISBN 3-86628-126-9
Vol. 6: Stefan Schmid, Dynamics and Cooperation. Algorithmic Challenges in Peer-to-Peer Computing. ISBN 3-86628-205-2
Vol. 7: Pascal von Rickenbach, Energy-Efficient Data Gathering in Sensor Networks. ISBN 3-86628-228-1
Vol. 8: Thomas Locher, Foundations of Aggregation and Synchronization in Distributed Systems. ISBN 3-86628-254-0
Vol. 9: Yvonne Anne Pignolet, Algorithmic Challenges in Wireless Networks. Interference, Energy and Incentives. ISBN 3-86628-265-6
Vol. 10: Olga Goussevskaia, Computational Complexity and Scheduling Algorithms for Wireless Networks. ISBN 3-86628-279-6
Vol. 11: Roland Flury, Routing on the Geometry of Wireless Ad Hoc Networks. ISBN 3-86628-280-X
Vol. 12: Nicolas Burri, Ultra-Low Power Sensor Networks: Development Tools, Design, and Implementation. ISBN 3-86628-379-2
Vol. 13: Michael Kuhn, Understanding and Organizing User Generated Data. Methods and Applications. ISBN 3-86628-381-4

Hartung-Gorre Verlag, Konstanz − http://www.hartung-gorre.de
