Yiran's Blog

Saba EuroSys 2023

Rethinking Datacenter Network Allocation from Application’s Perspective

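Proposes allocating network bandwidth according to each application's sensitivity to bandwidth. Background and Motivation. Traditional approach: allocate bandwidth by max-min fairness. 1. Sensitivity to bandwidth in applications: profile different workloads in isolation and compute the completion time for di...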

Posted by Yiran on April 3, 2023

Paper collection of HotNets 2022

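1. Understanding Host Interconnect Congestion. Reveals the cause of host congestion in Google's production clusters: high-bandwidth access links turn the host interconnect (the NIC-to-CPU data path) into a bottleneck. Host congestion turns out to be a result of imperfect interaction (and resource imbalanc...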

Posted by Yiran on December 2, 2022

Aequitas SIGCOMM 2022

Admission Control for Performance-Critical RPCs in Datacenters

Distributed admission control that keeps the network latency of performance-sensitive RPCs within their SLOs. Background and Motivation. Three classes of RPCs: (1) Performance-critical (PC) RPCs have tail latency SLOs. Sometimes they are associated with real-time interactive appli...

Posted by Yiran on September 13, 2022

PLB SIGCOMM 2022

Congestion Signals are Simple and Effective for Network Load Balancing

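Google's network load-balancing scheme (a design deployed at large scale in production, building on the idea of FlowBender). Motivation. Hot spots in practice: obliviously spreading traffic? Randomly spreading flowlets is an approach Cisco proposed in 2017: the switch randomly scatters the flowlets that arise naturally in the network. In practice this does not balance traffic any better; it only changes which links are overloaded and which are underloaded. PLB. PLB enhances F...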

Posted by Yiran on September 10, 2022

ABM SIGCOMM 2022

Active Buffer Management in Datacenters

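How should the two key policies for datacenter switch buffering, Buffer Management and AQM, work together? Motivation. Switch model: shared buffer, output-queued, with priority queues. Buffer Management: decides the maximum buffer each queue may occupy, i.e. how the buffer is allocated in space, for example Complete Sharing...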

Posted by Yiran on August 29, 2022

PowerTCP NSDI 2022

Pushing the Performance Limits of Datacenter Networks

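Motivation. Two kinds of congestion control (CC): current-based (reacting to variations) and voltage-based (reacting to absolute values). The figure above vividly illustrates the inherent shortcoming of each kind. In its Motivation section, the paper unifies existing CC schemes into a single abstract model and, with the help of some control theory, derives two concrete takeaways: While voltage...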

Posted by Yiran on May 3, 2022

SchedulePolicy NSDI 2022

Efficient Scheduling Policies for Microsecond-Scale Tasks

An in-depth study of how load-balancing and core-allocation policies affect latency and CPU efficiency for microsecond-scale application requests. What load-balancing and core-allocation policies yield the best combination of latency (median and tail) and CPU efficiency for microseco...

Posted by Yiran on April 7, 2022

Persephone SOSP 2021

Optimizing Tail-Latency for Heavy-Tailed Datacenter Workloads with Perséphone

A kernel-bypass OS scheduler designed to minimize tail latency for applications executing at microsecond-scale and exhibiting wide service time distributions. The core idea is Dynamic Application-aware Reserv...

Posted by Yiran on March 13, 2022

Hoplite SIGCOMM 2021

Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

The first work to provide efficient collective communication support for task-based distributed systems. Background. Task-based distributed systems: asynchronous and dynamic computation: a caller can dynamically invoke a task 𝐴, which im...

Posted by Yiran on November 20, 2021

Robust Congestion Control CoNEXT 2020

RoCC: Robust Congestion Control for RDMA

A robust, switch-based congestion control scheme proposed by Cisco. Core idea. Design. Core: a fair rate calculator at the switch. An important advantage of this controller is that it can find the fair rate without needing to know the output ra...

Posted by Yiran on July 2, 2021
