Yiran Blog

New post every day (with probability 0.03).

Saba EuroSys 2023

Rethinking Datacenter Network Allocation from Application’s Perspective

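Proposes allocating network bandwidth according to each application's sensitivity to bandwidth. Background and Motivation. Traditional approach: allocate bandwidth by max-min fairness. 1. Sensitivity to bandwidth in applications: Profiling different workloads in isolation and computing the completion time for di...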
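For readers unfamiliar with the max-min fair baseline mentioned in this excerpt, here is a minimal sketch of progressive filling on a single shared link. It illustrates the traditional approach only, not Saba's sensitivity-aware allocation; the function name `max_min_fair` and the single-link setting are assumptions made for illustration.

```python
def max_min_fair(capacity, demands):
    """Max-min fair shares of one link (hypothetical helper, single-link assumption)."""
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)  # equal split of what is left
        i = active[0]
        if demands[i] <= share:
            # The smallest demand fits inside an equal share: satisfy it fully
            # and redistribute the leftover among the remaining flows.
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.pop(0)
        else:
            # Every remaining flow wants more than the equal share: cap them all.
            for j in active:
                alloc[j] = share
            break
    return alloc
```

With a capacity of 10 and demands of 2, 2.6, 4, and 5, the two small flows are fully satisfied and the two large flows split the remainder equally, regardless of how much each flow's completion time actually depends on bandwidth.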

Paper collection of HotNets 2022

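1. Understanding Host Interconnect Congestion. Reveals the cause of host congestion in Google's production clusters: high-bandwidth access links turn the host interconnect (the NIC-to-CPU data path) into a bottleneck. Host congestion turns out to be a result of imperfect interaction (and resource imbalanc...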

Aequitas SIGCOMM 2022

Admission Control for Performance-Critical RPCs in Datacenters

Distributed admission control that ensures the network latency of performance-critical RPCs meets their SLOs. Background and Motivation. Three classes of RPCs: (1) Performance-critical (PC) RPCs have tail latency SLOs. Sometimes they are associated with real-time interactive appli...

PLB SIGCOMM 2022

Congestion Signals are Simple and Effective for Network Load Balancing

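Google's network load-balancing scheme (built on the FlowBender idea and actually deployed at large scale). Motivation. Hot spots in practice: Obliviously spreading traffic? Flowlet switching is an idea Cisco proposed in 2017: switches randomly spray the flowlets that naturally appear in the network. In practice this does not balance traffic any better; it only changes which links are overloaded and which are underloaded. PLB. PLB enhances F...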

ABM SIGCOMM 2022

Active Buffer Management in Datacenters

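How should the two key policies in datacenter switch buffer management, Buffer Management and AQM, work together? Motivation. Switch model: shared buffer, output-queued, with priority queues. Buffer Management: decides the maximum buffer each queue may occupy, i.e., the spatial allocation of the buffer, e.g., Complete Sharing (...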

PowerTCP NSDI 2022

Pushing the Performance Limits of Datacenter Networks

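Motivation. Two kinds of congestion control (CC): current-based (react to variations) and voltage-based (react to absolute values). The figure above vividly illustrates the inherent shortcomings of each kind. In its Motivation section, the paper unifies existing CC schemes into a single abstract model and, with the help of some control theory, arrives at two concrete takeaways: While voltage...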

SchedulePolicy NSDI 2022

Efficient Scheduling Policies for Microsecond-Scale Tasks

An in-depth study of how load-balancing and core-allocation policies affect latency and CPU efficiency for microsecond-scale application requests. What load-balancing and core-allocation policies yield the best combination of latency (median and tail) and CPU efficiency for microseco...

Persephone SOSP 2021

Optimizing Tail-Latency for Heavy-Tailed Datacenter Workloads with Perséphone

A kernel-bypass OS scheduler designed to minimize tail latency for applications executing at microsecond-scale and exhibiting wide service time distributions. The core idea is Dynamic Application-aware Reserv...

Hoplite SIGCOMM 2021

Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

The first work to provide efficient collective communication support for task-based distributed systems. Background. Task-based distributed systems: asynchronous and dynamic computation: a caller can dynamically invoke a task 𝐴, which im...

Robust Congestion Control CoNEXT 2020

RoCC: Robust Congestion Control for RDMA

A robust, switch-based congestion control scheme proposed by Cisco. Core idea. Design. Core: a fair rate calculator at the switch. An important advantage of this controller is that it can find the fair rate without needing to know the output ra...