Keynotes and invited industry talks
LADIS 2014 will feature keynote addresses and invited industry talks by the following speakers:
|Spanner: Google's Globally-Distributed Database|
|Brian Cooper (Google)|
|Deferring the Effect of Transactions|
|Johannes Gehrke (Microsoft / Cornell University)|
|Invited Industry Talk I|
|Big Data, Big Mistakes, Big Learnings|
|Lars Albertsson (Spotify)|
|Invited Industry Talk II|
|Using Automated Load Testing to Diagnose Performance Bottlenecks at Scale|
|Dmitri Perelman (Facebook)|
|Invited Industry Talk III|
|From Paper to Product: Putting CRDTs into Riak 2.0|
|Russel Brown (Basho)|
|Invited Industry Talk IV|
|Supporting Large Scale Systems with the salesforce.com Multi Tenant Platform|
|Peter Chittum (salesforce.com)|
|Invited Industry Talk V|
|Gearing for Exabyte Storage with Hadoop Distributed Filesystem|
|Edward Bortnikov (Yahoo! Research)|
Title: Spanner: Google's Globally-Distributed Database
Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It provides strong transactional semantics, consistent replication, and high performance reads and writes for a variety of Google's applications. Spanner scales to large data sizes, workloads, and geographical distributions by relying on a synchronized time abstraction (called TrueTime) and consensus-based replication (via Paxos). I'll discuss the distributed aspects of Spanner and some of the challenges in making them work well. I'll also discuss some open challenges that we still see in building scalable distributed storage systems.
Title: Deferring the Effect of Transactions
ACID Transactions have for decades provided the gold standard in strong consistency. I will describe applications and models for deferring the effect of transactions on the state of the system while maintaining one-copy serializability. In our first model of a quantum database, we can commit transactions while deferring assignments of values in these transactions to optimize the allocation of resources. In our second model, we permit parts of a replicated or distributed database system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. Our fully automated approach uses program analysis to extract semantic information about permissible levels of inconsistency; it generates treaties between sites that allow sites to operate independently until treaties are violated.
Johannes Gehrke is a Distinguished Engineer at Microsoft and the Tisch University Professor in the Department of Computer Science at Cornell University. At Microsoft, he works on Delve and Big Data and Data Science in Office 365. Johannes received an NSF Career Award, a Sloan Fellowship, a Humboldt Research Award, the 2011 IEEE Computer Society Technical Achievement Award, and the 2011 Blavatnik Award from the New York Academy of Sciences. He co-authored the undergraduate textbook Database Management Systems (McGrawHill (2002), currently in its third edition), used at universities all over the world. Johannes was co-Chair of SIGKDD 2004, VLDB 2007, and ICDE 2012, and he is co-chair of SOCC 2014 and ICDE 2015.
Invited industrial talks
Title: Big Data, Big Mistakes, Big Learnings
Spotify has grown quickly since the launch in 2008, and now has 40 million active users streaming music. We installed our first Hadoop cluster in 2009, and now have more than 100 employees touching the data flow on a daily basis. Failing fast and learning from mistakes are core components of the Spotify culture, and our path to scalable data processing has been riddled with failures and learnings. I will go through a few things we did not get right at first, what we learnt in the process, and how we have addressed the issues that came up.
Title: Using Automated Load Testing to Diagnose Performance Bottlenecks at Scale
Facebook's infrastructure serves millions requests per second, providing reliable personalized experience to more than a billion people from all over the world. It is comprised of hundreds of distributed, interconnected internal services which rapidly evolve and change in a decentralized manner. This infrastructure is deployed amongst many geographically distributed datacenters.
This presents non-trivial capacity management challenges: we need a reliable tool for understanding capacity bottlenecks and analyzing performance of individual services so that we can appropriately allocate resources amongst these services. Understanding the capacity of each of these datacenters is critical to guarantee optimal user experience during planned and unplanned datacenter outages.
To meet these challenges, we developed Keanu, a family of automated continuous load testing tools running at different levels of infrastructure hierarchy. Keanu provides essential information for optimizing resource allocation, it identifies capacity regressions and is used as an A/B testing framework for improving performance of individual services. In this talk I’ll describe and motivate the approach for continuous load testing of large scale systems. I’ll also present the common patterns that cause problems in our infrastructure and talk about the ways load testing is essential to unveil these issues.
Title: From Paper to Product: Putting CRDTs into Riak 2.0
Riak is an open source, Erlang implementation of a Dynamo-like Key/Value store, written and support by Basho, and deployed in production by many customers. Riak is eventually consistent, and favours Availability/Low Latency over Consistency. One of the consequences of this is the possibility of "concurrency anomalies" that manifest as conflicting writes, what Basho has always called "sibling values".
The first time a software engineer using Riak as the datastore for their applications encounters multiple possible values for a single key, they're surprised. Then they learn they need to tell the database the single correct value, by writing a "merge function" that returns a single, deterministic, meaningful value. Google and others have described this ad hoc process as "time consuming, and error prone." Basho's customers report expending significant engineering effort writing merge functions.
When we learned of CRDTs from a 2011 INRIA tech report, they appeared to offer a possible solution to this usability problem. This talk describes the experience of collaborating with academia to try and add CRDTs to Riak.
Title: Supporting Large Scale Systems with the salesforce.com Multi Tenant Platform
In 1999 salesforce.com was a single server in a closet in San Francisco. In 2009, we were processing 13 billion transactions per quarter. Today we handle 13 billion transactions in a little over a week. In this presentation you will learn about the multi-tenant architecture that has made this growth possible and find out how this has influenced important choices in the growth of the platform. The talk will end with a brief demonstration of how this architecture enables our customer and partner developers to quickly create custom applications and functionality.
Title: Gearing for Exabyte Storage with Hadoop Distributed Filesystem
The expectations from scaling Hadoop grids are continuously growing. Systems managing exabytes of storage are envisioned close. However, this growth is challenged by the scalability limits of Hadoop’s namenode - the distributed filesystem's metadata service. We present the recent efforts by the Hadoop community to re-architect the namenode for the next-generation scalability requirements. We focus on (1) splitting the namenode service into distributed filesystem and block management tiers, (2) scaling the new services in a workload-optimized way, and (3) re-designing the concurrency control for high performance. Implementing these changes while guaranteeing consistent and high-speed access to all storage metadata is a nontrivial task. We present the algorithmic principles behind the design, and demonstrate performance results.