The problem statement

Caution

Under construction.

The idea

The idea behind this problem is: how could a compiler optimally partition code across an HPC distributed system, in a fully automated manner, given complete knowledge of how the entire system interacts?

Definitions

We have to start with a couple of definitions:

The basics

We define $F$ as the set of all the functions in the system.
We define $N$ as the set of all the nodes we want to partition the functionality to.
We define a calling pair as an ordered pair of functions $(f_{1}, f_{2}) \in F \times F$ , where function $f_{1}$ directly calls $f_{2}$ .
We define $CP$ as the set of all calling pairs.
We define a partitioning as a function $p : F \to N$ .
We define a node boundary as a calling pair $(f_{1}, f_{2}) \in CP$ where $p (f_{1}) \neq p (f_{2})$ , given that $p$ is the chosen partitioning.
We define $B (p)$ as the set of all the node boundaries, given the partitioning $p$ .
We define $C n t (f_{1}, f_{2})$ as the average number of times function $f_{1}$ has to call $f_{2}$ .
We define $Ct (f_{1}, f_{2}, p)$ as the cost of calling $f_{2}$ from $f_{1}$ , given partitioning $p$ . We can further expand on this, as follows:

$$Ct (f_{1}, f_{2}, p) = \begin{cases} Cnt (f_{1}, f_{2}) \cdot \left( M_{c} + Trp (l_{c}, p (f_{1}), p (f_{2})) + U_{c} + M_{r} + Trp (l_{r}, p (f_{2}), p (f_{1})) + U_{r} \right), & \text{if } (f_{1}, f_{2}) \in B (p) \\ 0, & \text{otherwise} \end{cases}$$

Here, $M_{c}$ and $U_{c}$ are the marshalling and un-marshalling costs of the RPC call, and $M_{r}$ and $U_{r}$ are those of the response. $T r p (l, n_{1}, n_{2})$ is the transport cost of a marshalled message of length $l$ from node $n_{1}$ to node $n_{2}$ ; it appears twice, once for the call (of length $l_{c}$ ) and once for the response (of length $l_{r}$ ).
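The definitions of $B (p)$ and $Ct$ can be sketched in code. This is only a minimal illustration: the names `boundaries`, `call_cost` and `trp`, the example data, and the length-proportional transport model are all assumptions made for the sake of the example, not part of the problem statement.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # (caller, callee)


def boundaries(cp: Set[Pair], p: Dict[str, str]) -> Set[Pair]:
    """B(p): calling pairs whose caller and callee live on different nodes."""
    return {(f1, f2) for (f1, f2) in cp if p[f1] != p[f2]}


def call_cost(
    pair: Pair,
    p: Dict[str, str],
    cnt: Dict[Pair, float],
    trp: Callable[[float, str, str], float],
    m_c: float, u_c: float, m_r: float, u_r: float,
    l_c: float, l_r: float,
) -> float:
    """Ct for a calling pair: 0 if local, else Cnt times one RPC round trip."""
    f1, f2 = pair
    if p[f1] == p[f2]:
        return 0.0
    round_trip = (
        m_c + trp(l_c, p[f1], p[f2]) + u_c    # call leg
        + m_r + trp(l_r, p[f2], p[f1]) + u_r  # response leg
    )
    return cnt[pair] * round_trip


# Hypothetical example data: three functions on two nodes.
cp = {("a", "b"), ("b", "c")}
p = {"a": "n1", "b": "n1", "c": "n2"}
cnt = {("a", "b"): 10.0, ("b", "c"): 4.0}


def trp(length: float, src: str, dst: str) -> float:
    # Assumed transport model: cost proportional to message length only.
    return 0.5 * length


costs = {
    pair: call_cost(pair, p, cnt, trp,
                    m_c=1.0, u_c=1.0, m_r=1.0, u_r=1.0, l_c=2.0, l_r=4.0)
    for pair in cp
}
print(boundaries(cp, p))
print(costs)
```

With this data only the pair `("b", "c")` crosses a node boundary, so it is the only one with a non-zero cost.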

We define $C (f)$ as the total number of times that function $f$ is called in a given time frame, under a workload as close to real use as possible. We assume that it is possible to run instrumented binaries in a staging environment, route a statistically representative fraction of the real traffic to them, and estimate from those measurements the total number of times each function would be called.
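One simple way to turn sampled counts into an estimate of $C (f)$ is linear extrapolation. The sketch below assumes that call counts scale proportionally with the fraction of traffic routed to the instrumented binaries; that scaling rule, the function name `estimate_total_calls` and the numbers are assumptions, not something fixed by the definition above.

```python
from typing import Dict


def estimate_total_calls(sampled_counts: Dict[str, int],
                         sampled_fraction: float) -> Dict[str, float]:
    """Scale per-function call counts from a traffic sample up to full traffic."""
    if not 0.0 < sampled_fraction <= 1.0:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return {f: count / sampled_fraction for f, count in sampled_counts.items()}


# Hypothetical measurement: 25% of the traffic went to the instrumented binaries.
estimated = estimate_total_calls({"parse": 120, "render": 30}, sampled_fraction=0.25)
print(estimated)
```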

Introducing distributed computing

As of right now, if we were to optimise for the cost function defined above, we would find quite a simple answer: $B (p) = \emptyset$ , that is, every function placed on the same node. This works well for small systems with a light load, where a single node can handle all the functionality of the app, but we are targeting distributed systems, so we will also introduce the following 5 additional concepts:

We define a load criterion as some computing resource required for running a function. This does not include the load incurred by calls made within that function, either to other functions or through any form of recursion. Examples of load criteria include busy CPU time, memory usage, and external storage usage (for example, persistent storage on disks). The list of criteria considered is predefined and static.
We define $K$ as the set of all load criteria.
We define $L_{1, k} (f, n)$ as the average load, with regard to load criterion $k$ , that node $n$ would incur from a single call to function $f$ if we were to put $f$ on it. We assume that $f$ might incur different loads on different nodes, for example because specialised hardware lets heavy computations run faster on a node that has it than on a node that doesn't.
We define $C p_{k} (n)$ as the capacity of node $n$ , that is, the total load, with regards to load criterion $k$ , that it can handle before it gets “overwhelmed,” and performance starts to suffer drastically.
We define $L_{k} (f, n)$ as the total load incurred if we were to put function $f$ on node $n$ , with regards to load criterion $k$ . $L_{1, k}$ is for a single function call, whereas this would be for all calls to the function $f$ , so we expect it to be about $C (f) \cdot L_{1, k} (f, n)$ .
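
As a worked illustration with made-up numbers: suppose $C (f) = 1000$ calls in the time frame, and for some criterion $k$ the per-call load of $f$ is $L_{1, k} (f, n_{1}) = 0.125$ on a node $n_{1}$ with specialised hardware, but $L_{1, k} (f, n_{2}) = 0.5$ on a node $n_{2}$ without it. Then

$$L_{k} (f, n_{1}) \approx 1000 \cdot 0.125 = 125, \qquad L_{k} (f, n_{2}) \approx 1000 \cdot 0.5 = 500,$$

so under these assumed values placing $f$ on $n_{2}$ incurs four times the load that placing it on $n_{1}$ would.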

The actual problem statement

Given all the definitions outlined above, we are finally able to give a formal definition of the problem:

Find a set of nodes $N$ and a partitioning of the functions in the system $p$ such that the following conditions all hold simultaneously:

No unused nodes

Formally,

$\forall n \in N, \exists f \in F, p (f) = n$
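This condition says that $p$ is surjective onto $N$ . A minimal check, using a hypothetical helper `unused_nodes` and made-up data, could look like this:

```python
from typing import Dict, Set


def unused_nodes(nodes: Set[str], p: Dict[str, str]) -> Set[str]:
    """Nodes that host no function; the condition holds iff this set is empty."""
    return nodes - set(p.values())


# Hypothetical example: node "n3" hosts nothing, so the condition is violated.
nodes = {"n1", "n2", "n3"}
p = {"a": "n1", "b": "n1", "c": "n2"}
print(unused_nodes(nodes, p))
```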

No overload

We can roughly describe a no-overload condition as:

$$\forall k \in K, \forall n \in N : \sum_{f \in F \mid p (f) = n} L_{k} (f, n) \leq Cp_{k} (n)$$

In other words, the total load on a node never exceeds the capacity of that node, with regard to every load criterion.
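The no-overload condition can be checked by summing, per node and per criterion, the loads of the functions placed there. The sketch below is one possible way to do that; the name `overloaded`, the dictionary layouts and the numbers are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def overloaded(
    p: Dict[str, str],
    total_load: Dict[Tuple[str, str, str], float],  # (f, n, k) -> L_k(f, n)
    capacity: Dict[Tuple[str, str], float],         # (n, k) -> Cp_k(n)
) -> Set[Tuple[str, str]]:
    """Return the (node, criterion) pairs whose summed load exceeds capacity."""
    used = defaultdict(float)
    for (f, n, k), load in total_load.items():
        if p[f] == n:  # only count f on the node it is actually assigned to
            used[(n, k)] += load
    return {key for key, load in used.items() if load > capacity[key]}


# Hypothetical example with a single criterion, "cpu".
p = {"a": "n1", "b": "n1", "c": "n2"}
total_load = {
    ("a", "n1", "cpu"): 40.0, ("a", "n2", "cpu"): 40.0,
    ("b", "n1", "cpu"): 70.0, ("b", "n2", "cpu"): 70.0,
    ("c", "n1", "cpu"): 30.0, ("c", "n2", "cpu"): 30.0,
}
capacity = {("n1", "cpu"): 100.0, ("n2", "cpu"): 100.0}
print(overloaded(p, total_load, capacity))
```

Here `a` and `b` together put 110 units of load on `n1`, above its assumed capacity of 100, so the partitioning violates the condition.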

For now, we assume that the following holds:

$$L_{k} (f, n) \leq Cp_{k} (n), \quad \forall f \in F, \forall n \in N, \forall k \in K$$

Some functions could overload a single node by themselves if placed on only one, so this assumption doesn't always hold. A later chapter makes some minor adjustments to the partitioning algorithm and introduces load balancing to handle that case, but for now we will stick with the assumption.

Minimize function call cost

Formally, find partitioning $p$ :

$$\min_{p} \left( \sum_{(f_{1}, f_{2}) \in CP} Ct (f_{1}, f_{2}, p) \right)$$
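To make the three conditions concrete, here is a brute-force sketch that tries every partitioning of a tiny instance. It is not a proposed algorithm: it enumerates $|N|^{|F|}$ assignments, and it simplifies the problem by assuming a fixed node set, a single load criterion, and one constant round-trip cost `rpc_cost` for every boundary call. All names and numbers are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def best_partitioning(
    functions: List[str],
    nodes: List[str],
    cp: List[Pair],
    cnt: Dict[Pair, float],
    rpc_cost: float,                      # assumed constant round-trip cost
    load: Dict[Tuple[str, str], float],   # (f, n) -> total load of f on n
    capacity: Dict[str, float],           # n -> capacity of n
) -> Optional[Tuple[float, Dict[str, str]]]:
    """Exhaustively search p : F -> N; returns (cost, p), or None if infeasible."""
    best = None
    for assignment in product(nodes, repeat=len(functions)):
        p = dict(zip(functions, assignment))
        # No unused nodes: every node must host at least one function.
        if set(assignment) != set(nodes):
            continue
        # No overload: summed load per node must stay within capacity.
        used = {n: 0.0 for n in nodes}
        for f in functions:
            used[p[f]] += load[(f, p[f])]
        if any(used[n] > capacity[n] for n in nodes):
            continue
        # Objective: sum of call costs over calling pairs (local calls cost 0).
        cost = sum(cnt[(f1, f2)] * rpc_cost for (f1, f2) in cp if p[f1] != p[f2])
        if best is None or cost < best[0]:
            best = (cost, p)
    return best


functions = ["a", "b", "c"]
nodes = ["n1", "n2"]
cp = [("a", "b"), ("b", "c")]
cnt = {("a", "b"): 10.0, ("b", "c"): 4.0}
load = {(f, n): 40.0 for f in functions for n in nodes}
capacity = {"n1": 100.0, "n2": 100.0}

result = best_partitioning(functions, nodes, cp, cnt, 7.0, load, capacity)
print(result)
```

In this instance the search keeps `a` and `b` together, because the frequently used `a` to `b` call is the most expensive one to split, and places `c` on the other node.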
