Saba EuroSys 2023

Rethinking Datacenter Network Allocation from Application’s Perspective

Posted by Yiran on April 3, 2023

The paper proposes allocating network bandwidth according to each application's sensitivity to bandwidth.

Background and Motivation

Traditional approach: allocate bandwidth according to max-min fairness.
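For reference, max-min fairness can be sketched as a water-filling loop. This is a minimal illustration of the general technique, not code from the paper; the function name `max_min_fair` and the example demands are assumptions made for this note.

```python
def max_min_fair(capacity, demands):
    """Water-filling: repeatedly split the remaining capacity equally
    among the flows whose demand is not yet satisfied."""
    alloc = [0.0] * len(demands)
    active = [i for i, d in enumerate(demands) if d > 0]
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Flows that need no more than the equal share are fully satisfied.
        satisfied = [i for i in active if demands[i] - alloc[i] <= share]
        if not satisfied:
            # Everyone wants more than the equal share: split evenly and stop.
            for i in active:
                alloc[i] += share
            remaining = 0.0
            break
        for i in satisfied:
            remaining -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        active = [i for i in active if i not in satisfied]
    return alloc


# Hypothetical demands on a link of capacity 10 (arbitrary units).
print(max_min_fair(10, [2, 8, 8]))
```

With capacity 10 and demands 2, 8 and 8, the flows receive 2, 4 and 4. The allocation depends only on demands, not on how much each application's completion time would benefit from the bandwidth, which is the gap the paper targets.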

1. Sensitivity to bandwidth in applications:
The authors profile different workloads in isolation and measure the completion time at different percentages of network bandwidth (25% and 75%).

2. Using sensitivity in bandwidth allocation:
The intuition is that by providing more bandwidth to the applications that are most sensitive, their completion time can be reduced without disadvantaging applications with low sensitivity to network bandwidth.

3. Why does the bandwidth sensitivity arise?
Fundamentally, communication phases make up a different fraction of the execution in different applications.

4. Sensitivity-aware bandwidth allocation:
Therefore, in this paper, we argue that application sensitivity should be the primary factor driving network bandwidth allocation rather than traditional network-level, application-agnostic metrics.
Challenges:
(1) Sensitivity Differentiation: An application-aware solution requires a robust approach to capture the application’s sensitivity to network bandwidth.
(2) Dynamism: At the datacenter scale, a multitude of applications will share the network, with new applications arriving and others terminating or migrating over time. A bandwidth allocation mechanism must be able to handle such dynamism in a timely and resource-effective manner.
(3) Practicality: To facilitate adoption and maximize generality, a bandwidth allocation scheme should not require changes to deployed hardware and/or network protocols.

Saba

Offline Profiler

Sensitivity model: polynomial regression. The accuracy of the sensitivity model generated by the profiler depends on the degree of the polynomial and on how much the runtime settings differ from the settings at profile time.
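A polynomial sensitivity model of this kind can be sketched as a least-squares fit of slowdown against the fraction of bandwidth available. The sketch below uses only the standard library (in practice `numpy.polyfit` would be the usual choice). The function names and the profile points are made up for illustration and are not taken from the paper.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, with V the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict_slowdown(coeffs, bw_fraction):
    """Evaluate the fitted model at a given bandwidth fraction."""
    return sum(c * bw_fraction ** i for i, c in enumerate(coeffs))


# Made-up profile: (bandwidth fraction, slowdown relative to full bandwidth).
bw = [0.25, 0.5, 0.75, 1.0]
slowdown = [2.125, 1.5, 1.125, 1.0]
model = fit_polynomial(bw, slowdown, degree=2)
print(predict_slowdown(model, 0.6))
```

A higher degree fits the profiled points more closely but can behave poorly between them, which is one way to read the note's remark that accuracy depends on the polynomial degree.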

Controller

  1. Bandwidth Calculation: The controller combines the path information of Saba-compliant flows passing through the switches with the profiling results in the sensitivity table, and determines the percentage of bandwidth to allocate to each application's flows at each switch output port so as to minimize the total slowdown across applications.
    Note: the bandwidth calculation for applications on a given output port is independent of other switches
  2. Bandwidth Enforcement: uses Priority Levels (PLs) and per-port switch queues scheduled with the weighted fair queuing (WFQ) policy
  3. Mapping Applications to Queues
    (1) Application to priority level mapping: groups applications according to their bandwidth sensitivity using the K-means clustering algorithm
    (2) Priority level to queue mapping: PLs are clustered onto queues. Saba uses a fast hierarchical clustering scheme that preserves all possible PL clusterings in a hierarchy; at runtime, Saba finds the best clustering in the hierarchy for each switch output port and uses it for bandwidth allocation
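The controller steps above can be sketched roughly as follows. This is an illustration under assumptions, not Saba's implementation: each application's sensitivity is reduced to a single made-up scalar so that a 1-D K-means suffices, and the per-port bandwidth split uses a simple greedy search as a stand-in for whatever solver the paper uses. All names (`kmeans_1d`, `allocate_port`, `quadratic_slowdown`) are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D K-means with a deterministic, spread-out initialisation."""
    srt = sorted(values)
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Recompute centroids; keep the old one if a cluster is empty.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def allocate_port(models, step=0.01):
    """Greedy split of one port's bandwidth: hand out `step`-sized slices,
    each to the model whose predicted slowdown drops the most."""
    shares = [step] * len(models)  # small floor so that nobody is starved
    for _ in range(round(1 / step) - len(models)):
        gains = [m(s) - m(s + step) for m, s in zip(models, shares)]
        best = max(range(len(models)), key=gains.__getitem__)
        shares[best] += step
    return shares


def quadratic_slowdown(a):
    """Hypothetical slowdown model: 1 at full bandwidth, growing as it shrinks."""
    return lambda x: 1 + a * (1 - x) ** 2


# Made-up scalar sensitivities for six applications, grouped into three PLs.
sensitivity = [1.1, 1.2, 3.9, 4.0, 2.5, 2.6]
labels, centroids = kmeans_1d(sensitivity, k=3)
print(labels)

# Two hypothetical PLs sharing one output port.
shares = allocate_port([quadratic_slowdown(4.0), quadratic_slowdown(0.5)])
print(shares)
```

The resulting shares could serve as WFQ weights for the queues at that port. Because the calculation only needs the models of the applications present at one port, it can run independently for each port, in line with the note in item 1.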

Saba Library

Evaluation

Thinking

References

Saba: Rethinking Datacenter Network Allocation from Application’s Perspective