Integrating MPC in Big Data Workflows

Speaker/Bio

Nikolaj Volgushev is a second-year PhD student working in systems security.

Abstract

Secure multi-party computation (MPC) allows multiple parties to perform a joint computation without disclosing their private inputs. Many real-world joint computation use cases, however, involve data analyses on very large data sets and are implemented by non-experts. Moreover, the collaborating parties (e.g., several companies) often have different data analytics stacks deployed internally. These restrictions hamper the real-world usability of MPC. To address these challenges, we combine existing MPC frameworks with data-parallel analytics frameworks by extending the Musketeer big data workflow manager [3]. Musketeer automatically generates code both for the sensitive parts of a workflow, which are executed in MPC, and for the remainder of the computation, which runs on scalable, widely deployed analytics systems. In a prototype use case, we compute the Herfindahl-Hirschman Index (HHI), an index of market concentration used in antitrust regulation, on an aggregate 156 GB of taxi trip data from five transportation companies. Our implementation computes the HHI in about 20 minutes using a combination of Hadoop and VIFF [18], whereas even "mixed mode" MPC with VIFF alone would have taken many hours. Finally, we discuss future research questions that we seek to address using our approach.
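For reference, the HHI of a market with n firms is the sum of the squared market shares, HHI = s_1^2 + s_2^2 + ... + s_n^2, where s_i is firm i's fraction of the market total. With shares as fractions the index lies between 1/n and 1; antitrust practice usually expresses shares in percentage points, which scales the result to a maximum of 10,000. The minimal Python sketch below computes the index in plaintext from per-company totals. It assumes each company first reduces its own trip records to a single integer total (for example a trip count or revenue in cents; which quantity the prototype uses is not stated here), which is the kind of local, data-parallel step a Hadoop job could perform. In the setting described above, the final combination of these private totals is what would run under MPC; the function name `hhi` and the sample figures are hypothetical and illustrate only the arithmetic, not the secure protocol.

```python
from fractions import Fraction


def hhi(totals):
    """Herfindahl-Hirschman Index from per-company integer totals.

    Hypothetical helper: `totals` maps a company name to its market
    quantity (e.g. trip count). Shares are exact fractions of the market
    total, so the result lies in [1/n, 1]; multiply by 10,000 for the
    percentage-point convention used in antitrust regulation.
    """
    market = sum(totals.values())
    if market <= 0:
        raise ValueError("market total must be positive")
    # Sum of squared market shares, computed exactly with Fraction.
    return sum(Fraction(v, market) ** 2 for v in totals.values())


# Hypothetical example: three companies with 50, 30 and 20 trips.
index = hhi({"A": 50, "B": 30, "C": 20})
print(index, float(index) * 10_000)  # 19/50 3800.0
```

Using exact fractions avoids floating-point rounding in the sketch; a real MPC implementation would instead work over a finite field or fixed-point encoding, which this example does not model.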