Technical Program

9:15 Welcome
9:30

Keynote

Eno Thereska - Amazon, UK

Operationalising Machine Learning at Scale

Deploying and maintaining machine learning (ML) in production requires a great deal of manual work and a large team of scientists and engineers. This talk first describes today's pain points in operationalising distributed ML pipelines. We then make the case that there is a principled way to automate away most of the pain through a framework that, at least at a high level, looks like a distributed operating system. We present early thoughts on the I/O, storage and compute model for such an OS, as well as ways to test and debug it.

Bio: Eno Thereska is a Principal Engineer at Amazon. Previously he was a Member of Technical Staff at Confluent working on Apache Kafka, and before that he was a Senior Researcher at Microsoft Research. He has broad interests in computer systems. At Microsoft he received two Ship-It Awards for contributions to its data centers. He has several academic publications in top conferences in the fields of storage systems and operating systems, including FAST, OSDI, SOSP, SIGMETRICS and CHI. He served as technical co-chair of the USENIX Conference on File and Storage Technologies (FAST '14). He is the founding co-chair of the HotStorage series of workshops. Eno is a recipient of the 2014 IEEE William R. Bennett Prize, the IEEE INFOCOM 2011 Best Paper award, and the USENIX FAST Best Paper and Best Student Paper awards in 2005 and 2004 respectively. He was an Honorary Research Fellow at Imperial College London from 2015 to 2017. He received his PhD from CMU.

10:30 Coffee break
10:45 Session 1: Large-Scale Production Systems
 

Marcos Aguilera - VMware Research, US

Remote Memory for Non-Masochists
Remote memory is an old idea that has recently re-emerged in the age of fast networks, and it is now increasingly compelling. However, for remote memory to be successful, it needs to provide a better interface. The current interface based on RDMA is complex, error-prone, and clunky, limiting its adoption to experts and masochists. In this talk, we describe a number of alternatives that we are exploring, which are simpler, more portable, and conceptually richer.
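To make the contrast concrete, the sketch below shows the kind of simpler interface the talk argues for: callers work with named allocations and ordinary read/write calls rather than raw RDMA verbs, registered memory regions and queue pairs. All names here are illustrative stand-ins, not an actual VMware API, and the "remote" memory is simulated in-process.

```python
# Hypothetical, simplified remote-memory interface (illustrative only).
# A real implementation would back this with RDMA or a fast transport;
# here the store is local so the example is self-contained.

class RemoteMemory:
    """Toy in-process stand-in for a remote memory server."""
    def __init__(self):
        self._store = {}

    def alloc(self, name, size):
        # Allocate a named region of `size` zeroed bytes.
        self._store[name] = bytearray(size)

    def write(self, name, offset, data):
        buf = self._store[name]
        buf[offset:offset + len(data)] = data

    def read(self, name, offset, length):
        return bytes(self._store[name][offset:offset + length])

rm = RemoteMemory()
rm.alloc("page", 64)
rm.write("page", 0, b"hello")
print(rm.read("page", 0, 5))  # b'hello'
```

The point of such an interface is that the error-prone details (connection setup, memory registration, completion handling) live behind the abstraction rather than in every application.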

 

Adrien Conrath - Facebook, UK

LogDevice: A Distributed Data Store for Logs
The log is a very powerful abstraction that can be leveraged for a variety of workloads at Facebook's scale. I will present Facebook's work to build a record-oriented data store for logs that is highly available and durable while maintaining a repeatable order on those records.

 

Dushyanth Narayanan - Microsoft Research Cambridge, UK

Opacity and Performance in a Distributed Transactional System
Transactions can simplify distributed applications by hiding the complexities of handling data distribution, concurrency, and failures from the application developer. Ideally the developer would see the abstraction of a single large machine that runs transactions sequentially and never fails. As even the best abstractions are unlikely to be used if they perform poorly or reduce availability, the system must provide this abstraction with high performance and availability. Existing distributed transactional designs either weaken this abstraction or are not designed for the best performance within a data centre.
In our previous work we built FaRM, which provided strict serializability with transparent fault tolerance, high availability and performance. However, FaRM lacked opacity: eventually-aborted transactions could see an inconsistent view of the database during execution, which would be detected only when the transaction attempted to commit. As FaRM moved to production, this lack of opacity, to our surprise, became a significant issue for developers. Adding opacity without compromising the consistency, availability, or performance of FaRM raised some interesting challenges, and I will describe how we solved them.
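The opacity problem described above can be shown in a few lines. In the sketch below (illustrative names, not FaRM's actual API), a transaction optimistically reads two objects while a concurrent writer updates both in between; commit-time validation catches the conflict, but the transaction has meanwhile executed over a view that violates the application invariant, which is exactly the situation opacity rules out.

```python
# Optimistic reads with per-object versions. Without opacity, a running
# transaction can observe a mix of old and new versions; validation at
# commit detects this, but only after the transaction has already
# executed over the inconsistent snapshot.

class Obj:
    def __init__(self, value):
        self.value = value
        self.version = 0

def tx_read(obj, readset):
    # Record the version seen so it can be validated at commit.
    readset.append((obj, obj.version))
    return obj.value

def tx_validate(readset):
    # Commit-time check: abort if any object changed since it was read.
    return all(obj.version == v for obj, v in readset)

x, y = Obj(10), Obj(-10)    # application invariant: x.value + y.value == 0
readset = []
a = tx_read(x, readset)
# A concurrent transaction updates both objects between our two reads.
x.value, y.value = 7, -7
x.version += 1; y.version += 1
b = tx_read(y, readset)

assert a + b != 0                # inconsistent view: invariant appears broken
assert not tx_validate(readset)  # validation detects it and aborts the commit
```

For a pure in-memory computation an abort is merely wasted work, but code that acts on the inconsistent view before commit (dereferencing a pointer read from it, say) can crash or misbehave, which is why the lack of opacity hurt developers in practice.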

12:15 Lunch
14:00 Session 2: Replication and Consensus
 

Manos Kapritsos - University of Michigan, Ann Arbor, US

Replication Beyond the Client-Server Model
Interactions between services are an integral part of today’s computing. Services are typically not standalone, but rather interact with other services as part of a larger system (e.g. online stores interacting with credit card services, swarms of microservices talking to each other, etc.). Despite this trend, our replication protocols have remained strangely attached to the old client-server model, where clients issue requests to a standalone server. In this talk, I will show you that our existing replication protocols run into both performance and correctness problems when deployed in an interactive environment and I will propose some solutions for enabling consistent and efficient replication despite service interactions.

 

Heidi Howard - University of Cambridge, UK

Revising the Theory of Distributed Consensus: Going Beyond Multi-Paxos
The ability to reach consensus between hosts, whether for addressing, primary election, locking, or coordination, is a fundamental necessity of modern distributed systems. The Paxos algorithm is at the heart of how we achieve distributed consensus today and as such has been the subject of extensive research to extend and optimise the algorithm for practical distributed systems. In the talk, we revisit the underlying theory behind Paxos, weakening its original requirements and generalising the algorithm. We demonstrate that Paxos is, in fact, a single point on a broad spectrum of approaches to consensus and conclude by arguing that other points on this spectrum offer a much-improved foundation for constructing scalable, resilient and high performance distributed systems.
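One concrete weakening in this spirit (as in Flexible Paxos) is that safety does not require majorities everywhere: it suffices that every phase-1 (leader election) quorum intersects every phase-2 (replication) quorum. The small checker below, with illustrative names of my own, makes that condition explicit over enumerated quorum systems.

```python
# Check the quorum-intersection condition that underlies Paxos safety:
# every phase-1 quorum must share at least one node with every phase-2
# quorum. Classic Paxos uses majorities for both phases, but that is
# just one point on the spectrum.

from itertools import combinations

def majorities(nodes):
    k = len(nodes) // 2 + 1
    return [set(c) for c in combinations(sorted(nodes), k)]

def safe(q1_quorums, q2_quorums):
    # True iff every phase-1 quorum intersects every phase-2 quorum.
    return all(q1 & q2 for q1 in q1_quorums for q2 in q2_quorums)

nodes = {1, 2, 3, 4}
# Classic Paxos: majorities in both phases.
print(safe(majorities(nodes), majorities(nodes)))   # True
# Flexible choice: large phase-1 quorums permit small phase-2 quorums.
q1 = [set(c) for c in combinations(sorted(nodes), 3)]
q2 = [set(c) for c in combinations(sorted(nodes), 2)]
print(safe(q1, q2))   # True, since 3 + 2 > 4
print(safe(q2, q2))   # False: two disjoint pairs exist when 2 + 2 = 4
```

Shrinking the phase-2 quorum is attractive in practice because phase 2 runs on every command, while phase 1 runs only on leader change.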

15:00 Coffee break
15:30 Session 3: Data Analytics and Synchronization
 

Pramod Bhatotia - University of Edinburgh, UK

Approximate Stream Analytics Systems
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, by choosing the sample size, approximate computing can make a systematic trade-off between output accuracy and computation efficiency. Unfortunately, the state-of-the-art systems for approximate computing primarily target batch analytics, where the input data remains unchanged during the course of computation. Thus, they are not well suited for stream analytics. To overcome this limitation, we are working on building approximate computing systems targeting stream analytics.
In this talk, I will first present an overview of our research work on approximate stream analytics systems. Thereafter, I will present the detailed design and implementation of two systems for approximate stream analytics: (a) StreamApprox: a stream analytics system for approximate computing, and (b) PrivApprox: a privacy-preserving stream analytics system leveraging approximate computing.
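The core idea can be sketched in a few lines: answer an aggregate query over a bounded random sample of the stream rather than over every record, so the sample size bounds the work. This is a minimal illustration using plain reservoir sampling, not the StreamApprox implementation (which uses stratified sampling over multiple sub-streams).

```python
# Approximate aggregation over a stream via reservoir sampling: keep a
# uniform random sample of fixed size k, then answer the query from the
# sample. Larger k means more work but a tighter estimate.

import random

def reservoir_sample(stream, k, rng):
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Each seen item replaces a reservoir slot with probability k/(i+1),
            # which keeps the sample uniform over everything seen so far.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

rng = random.Random(42)
stream = range(1_000_000)          # exact mean = 499999.5
sample = reservoir_sample(stream, 1000, rng)
approx = sum(sample) / len(sample)
print(approx)   # close to the exact mean, computed from 0.1% of the data
```

For a uniform sample the standard error of the mean shrinks as 1/sqrt(k), which is the accuracy-versus-efficiency dial the abstract refers to.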

 

Paris Carbone - KTH, Sweden

Asynchronous Epoch Commits for Fast and Consistent Stateful Streaming with Apache Flink
Guarantees for scalable stream processing come under many misleading names today: exactly-once processing, at-least-once, end-to-end fault tolerance, etc. In this talk, we will instead present a rigorous overview of epoch-based stream processing, a clear, underlying consistent processing model employed by Apache Flink. Epoch-based stream processing relies on the notion of epoch cuts, a restricted type of Chandy and Lamport's consistent cut. We will further examine different approaches for achieving epoch cuts and their performance implications, showcasing the benefits of the asynchronous epoch snapshots employed by Apache Flink. Finally, we will summarize how Flink's epoch commit protocol coordinates operations with locally embedded and externally persisted state systems (e.g., Kafka, HDFS, Pravega) in practice to offer an externally consistent view of the state built by its applications.
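A toy single-operator illustration of epoch cuts: a marker injected into the stream divides it into epochs, and the operator snapshots its state exactly when the marker arrives, so each snapshot reflects all records of earlier epochs and none of later ones. This is a conceptual sketch only; Flink's protocol additionally makes the snapshot asynchronous and handles markers arriving on multiple input channels.

```python
# Epoch-based snapshots in one operator: records update running state,
# and an epoch marker triggers a copy of the state at that cut. Each
# snapshot is a consistent point to restart from after a failure.

EPOCH_MARKER = object()

def run(stream):
    state = {"count": 0, "sum": 0}
    snapshots = []
    for record in stream:
        if record is EPOCH_MARKER:
            snapshots.append(dict(state))   # state exactly at the epoch cut
        else:
            state["count"] += 1
            state["sum"] += record
    return state, snapshots

state, snaps = run([1, 2, EPOCH_MARKER, 3, 4, EPOCH_MARKER, 5])
print(snaps)   # [{'count': 2, 'sum': 3}, {'count': 4, 'sum': 10}]
print(state)   # {'count': 5, 'sum': 15}
```

On recovery, replaying the stream from the last marker on top of the matching snapshot reproduces the state as if the failure never happened, which is what the "exactly-once" label is really describing.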

 

Martin Kleppmann - University of Cambridge, UK

Automerge: Replicated Data Structures for Peer-to-Peer Collaboration
This talk introduces Automerge, a JavaScript library for data synchronisation between mobile devices such as laptop computers and smartphones. It allows users to read and modify data even while their device is offline, and it automatically merges changes made concurrently on different devices. Unlike most existing data synchronisation systems, Automerge does not require data to be sent via a centralised server, but rather allows local and peer-to-peer networks to be used. We show how this project spans the gamut from the theory of Conflict-free Replicated Data Types (CRDTs) and formal verification, all the way to implementing collaborative applications that use these data structures.
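The convergence property at the heart of this approach can be shown with the simplest CRDT, a grow-only counter: each device increments its own entry, and merge takes the per-device maximum. Because merge is commutative, associative and idempotent, replicas converge no matter the order in which they sync. This is a conceptual sketch in Python, not Automerge itself, which provides much richer JSON-like documents in JavaScript.

```python
# Grow-only counter CRDT: state is a map from device id to that device's
# local increment count. Merging two states takes the elementwise max,
# so applying merges in any order yields the same result.

def increment(counter, device, n=1):
    c = dict(counter)
    c[device] = c.get(device, 0) + n
    return c

def merge(a, b):
    return {d: max(a.get(d, 0), b.get(d, 0)) for d in a.keys() | b.keys()}

def value(counter):
    return sum(counter.values())

laptop = increment({}, "laptop")                      # edit made offline
phone = increment(increment({}, "phone"), "phone")    # two edits elsewhere

# Sync in either order, directly between peers; both sides converge.
assert merge(laptop, phone) == merge(phone, laptop)
print(value(merge(laptop, phone)))   # 3
```

Because correctness depends only on the merge function and not on where merging happens, no centralised server is needed, which is what enables the local and peer-to-peer synchronisation the abstract describes.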