# Software Project: Assignment 2

*** Due date: Feb 17 at start of class. Feb 22 (Wed) ****

## Goal

Design, implement and evaluate a traffic engineering (TE) algorithm for routing elephant flows on a leaf-spine datacenter topology.

## Problem description

### Topology

We define a leaf-spine topology with L leaf switches and S spine switches. Each leaf switch is connected to all S spine switches. All links between leaf and spine switches have uniform capacity of with numerical value of 1 unit. In this topology, there are S different routes between any two leaf nodes via one of the S spine switches. For example, leaf nodes D and F have two paths between them through spine switches DAF and DBF respectively.

In some cases, a partial link failure can reduce the capacity of a link. We will consider only one type of failure: a partial link failure that reduces the capacity of a link by half of its original capacity, i.e. link capacity of 0.5.

### Traffic pattern

Our traffic consists only of flows of infinite length. These represent the so-called "elephant flows" in a datacenter that usually need a specialized routing. For traffic engineering, we need to consider only those flows that are across leaf switches.  We consider a random traffic pattern in which the source leaf switch and the destination leaf switch (different from source) for a flow is chosen randomly.

### Flow throughput

The throughput of a flow depends on its routing as well on the routing of other flows in the datacenter. If a flow does not share any link with any other flow, its throughput = 1. In the common case, a link will have some flows for which it is the bottleneck link and other flows for which it isn't the bottleneck link. The throughput of these other flows will be determined by the share they receive on their respective bottleneck links. The throughput of a flow that is bottlenecked on this link is equal to the remaining link capacity (after subtracting the throughput of the other flows) divided by the number of flows bottlenecked on this link. For example, if a link has 4 flows of which 3 flows are bottlenecked on this link, but one of the flows is bottlenecked on some other link and has a throughput of 1/8. The flows that are bottlenecked on this link will each have a throughput of (1 - 1/8)/3 = 7/24.

### TE algorithm objective

The goal of the TE algorithm is to maximize the sum of throughput of all flows.

### Baselines for comparison

We will compare the newly proposed TE scheme against these two baseline routing schemes. (1) Random: choose any one of the S paths for a flow randomly (2) Round-robin: if a leaf switch has n outgoing flows, i-th flow takes the route via (i%S)-th spine switch.

## Experimental procedure

Evaluate three topologies with (L=4, S=8), (L=4, S=8) and (L=8, S=8) respectively. For each topology, consider two scenarios. First, all links have uniform capacity and second, 25% of randomly chosen links have capacity reduced by half. Thus, we will evaluate 3*2 = 6 topologies in total.

In each case, calculate the total throughput for a varying number of flows. The number of flows N is defined as follows: N = f * (L*S), where f = {1/8, 1/4, 1/2, 1, 2, 4, 8}.

To reduce variability in output due to randomness in inputs, take 10 trials of each experiment and report the median value of total throughput.

### Output graph

For each topology, present your result in the form of a graph whose x-axis shows the number of flows and y-axis shows the total throughput. Use different lines to show the output of each traffic engineering algorithm. An illustrative output graph drawn in a spreadsheet program is shown below.

## Submission instructions

Your submission should include the following:

• Brief description of the key ideas in your algorithm. Optionally, you can use diagrams or pseudo-code to help explain your algorithm.
• Present the graph for each topology, and interpret your findings and provide an explanation for the trends you observe.
• Submit your code as a zip file with a brief README. We will review your code, but will not necessarily run the program.

-------

## FAQ

Some further clarifications based on questions from some students.

• Experimental inputs
• A flow is defined only by a pair of leaf nodes, say L1 (src) and L2 (dest).
• Each link has a capacity of 1 unit in both forward and reverse directions.
• For each topology, you need to do seven (7) separate experiments. The number of flows in these experiments is calculated as follows. Suppose (L, S) = (4, 4), then these seven exps will have 2, 4, 8, 16, 32, 64 and 128 flows respectively corresponding to different values of f = {1/8, 1/4, 1/2, 1, 2, 4, 8}.
• Before each trial, you can initialize a random number generator with a different seed so that you get different random inputs in each experiment.
• You are welcome to evaluate other traffic patterns beside random, but you must evaluate random.
• TE algorithm:
• The routing algorithm calculates which spine node the the flow goes through. There may be multiple flows between the same pair of L1 and L2. Each of these flows can possibly be routed through a different spine node.
• You must compare against the two baselines Random and RoundRobin in each experiment.
• Throughput calculation
• The final output of a trial of an experiment is a single number: the total throughput. For example, if you have 4 flows in an experiment with throughputs of 1/8, 1/8, 1/4 and 1/2 respectively, then output is 1.
• The throughput of each flow on both links (from leaf to spine and spine to leaf) is equal and determined by which link is the bottleneck. Implicitly, we are assuming TCP flows where the source (e.g. server) infers network bottleneck and adapts the sending window size.
• To calculate throughput, consider links in the decreasing order of number of flows on a link.
• As the number of flows increases, the median total throughput for a topology cannot decrease. It either increases or remains the same.
• Total throughput can never be more than the sum of capacities of links between leaf and spine nodes. For example, in a (L=4, S=4) network, total throughput <= 16.