Yiran Blog

New post every day (with probability 0.03).

Blackbox Prediction NSDI 2021

On the Use of ML for Blackbox System Performance Prediction

Does ML make prediction simpler (i.e., allowing us to treat systems as blackboxes) and general (i.e., across a range of applications and use-cases)? The answer is NO. Core idea: an experimental study of machine learning for blackbox system performance prediction ...

OnRamp NSDI 2021

Breaking the Transience-Equilibrium Nexus: A New Approach to Datacenter Packet Transport

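A congestion control algorithm for cloud datacenters: it factors datacenter congestion control into two separate control loops, one each for transience and equilibrium. Core idea: decouple transience from equilibrium (which I read as transient congestion versus persistent/steady-state congestion): traditional C...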

Annulus SIGCOMM 2020

A Dual Congestion Control Loop for Datacenter and WAN Traffic Aggregates

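Targets the performance problems that arise when WAN traffic and datacenter traffic share a bottleneck inside the datacenter. Core idea: two control loops, whose main purpose is to let WAN traffic learn of bandwidth changes promptly and adjust its rate: nearby datacenter switches (e.g. ToRs) are configured to send direct feedback on congestion; with the help of Q...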

Swift SIGCOMM 2020

Delay is Simple and Effective for Congestion Control in the Datacenter

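Swift is a datacenter congestion control protocol proposed by Google. It builds on the ideas of Timely and uses delay as the congestion signal. Core idea: distinguish fabric congestion from endpoint congestion through fine-grained delay measurement. Endpoint delay: remote-queuing (echoed in the ACK) + local NIC Rx delay ...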
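The delay split mentioned in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AckSample` type and its field names are hypothetical, the endpoint sum uses only the two terms visible in the excerpt, and treating fabric delay as the remainder of the RTT is my assumption rather than something the excerpt states.

```python
from dataclasses import dataclass


@dataclass
class AckSample:
    """One delay measurement taken when an ACK arrives (hypothetical type)."""
    rtt_us: float             # end-to-end round-trip time seen by the sender
    remote_queuing_us: float  # remote-queuing delay echoed in the ACK
    local_nic_rx_us: float    # delay spent in the local NIC receive path


def endpoint_delay(sample: AckSample) -> float:
    # Sum of the two endpoint terms named in the excerpt.
    return sample.remote_queuing_us + sample.local_nic_rx_us


def fabric_delay(sample: AckSample) -> float:
    # Assumption: whatever part of the RTT is not endpoint delay is
    # attributed to the fabric. Clamp at zero to absorb measurement noise.
    return max(0.0, sample.rtt_us - endpoint_delay(sample))


sample = AckSample(rtt_us=100.0, remote_queuing_us=20.0, local_nic_rx_us=5.0)
print(endpoint_delay(sample), fabric_delay(sample))
```

Keeping the two delays separate lets a sender react differently to a congested host and a congested network path.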

ADS SIGCOMM CCR 2019

Datacenter Congestion Control: Identifying what is essential and making it practical

Core idea - two questions: What factors (i.e., which particular design decisions) are the most essential to achieving good performance? Can we deploy such designs easily? Ke...

Aeolus SIGCOMM 2020

A Building Block for Proactive Transport in Datacenters

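Aeolus targets a problem that the proactive congestion control protocols popular in datacenters in recent years have generally not solved well: unscheduled packets sent directly in the first RTT cause packet loss and uncontrolled latency. The idea behind proactive congestion control: request and allocation, i.e. allocate bandwidth explicitly so that congestion is avoided before it occurs. Ways to implement proactive congestion control: centralized controller (Fastpass), switch-based (TFC, PDQ), receiver-based (ExpressPass, pHo...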

HPCC SIGCOMM 2019

HPCC: High Precision Congestion Control

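HPCC is a new congestion control protocol from Alibaba for high-speed RDMA networks. It uses the detailed information provided by INT for precise rate control, and its advantages are fast convergence and a near-zero queue. Motivation: the paper argues that transport in today's high-speed networks commonly suffers from three problems: slow convergence; a standing queue that is always present and increases latency; and CC parameters that are hard to tune, so operators always have to trade off stability against utiliz...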

PCN NSDI 2020

Re-architecting Congestion Management in Lossless Ethernet

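This paper was published at NSDI 2020. I spent quite a long time reading it closely and understanding it thoroughly (several months on and off, haha). My notes also include some of my own interpretation. The paper re-architects congestion control for today's lossless Ethernet, points out fundamental problems in the two core modules of the current congestion management architecture (congestion detection and rate adjustment), and proposes the PCN congestion control protocol. Experimental observations: by constructing a classic scenario, the authors present experimental observations ...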

QJUMP NSDI 2015

Queues Don’t Matter When You Can JUMP Them!

Core idea: combine rate limiting with a priority value to guarantee latency. Motivation: solve the network interference problem in datacenter networks: congestion from throughput-intensive applications causes queueing that delays traffic from late...
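To make the pairing of rate limiting with a priority value concrete, here is a minimal Python sketch. It is not the QJUMP implementation: the `Level` class, its fields, and the per-epoch byte budget are hypothetical names chosen for illustration, and the convention that a larger priority number is served first is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """A traffic class that pairs a priority with a per-epoch send budget."""
    priority: int      # assumed convention: larger value is served first
    budget_bytes: int  # bytes a host may send per epoch at this level
    epoch_s: float     # epoch length in seconds
    _epoch_index: int = -1
    _used: int = 0

    def try_send(self, now_s: float, size: int) -> bool:
        # Reset the budget whenever a new epoch begins.
        index = int(now_s // self.epoch_s)
        if index != self._epoch_index:
            self._epoch_index = index
            self._used = 0
        # Refuse the packet if it would exceed this epoch's budget.
        if self._used + size > self.budget_bytes:
            return False
        self._used += size
        return True


# A high-priority level with a tight budget: low latency, low throughput.
urgent = Level(priority=7, budget_bytes=1500, epoch_s=0.001)
print(urgent.try_send(0.0, 1000))     # fits in the first epoch's budget
print(urgent.try_send(0.0002, 1000))  # same epoch, budget exhausted
print(urgent.try_send(0.0011, 1000))  # next epoch, budget refilled
```

The trade-off this illustrates: the tighter the budget at a given priority, the less queueing that level can cause for itself, which is what makes a latency bound plausible.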

NDP SIGCOMM 2017

Re-architecting datacenter networks and stacks

Very intuitive animation (YouTube). Architectural points: End-to-end Service Demands, Transport Protocol, Switch Service Model. References: Re-architecting datacenter netwo...