In this paper, we investigate and characterize the behavior of “big” and “fast” data analysis frameworks in multi-tenant, shared settings in which computing resources (CPU and memory) are limited. Such settings and frameworks are frequently employed in both public and private cloud deployments. Resource constraints stem from physical limitations (private clouds) and from what the user is willing to pay (public clouds). Because of these constraints, users increasingly attempt to maximize resource utilization and sharing in these settings.
To understand how popular analytics frameworks behave and interfere with each other under such constraints, we investigate the use of Mesos to provide fair resource sharing for resource-constrained private cloud systems. We empirically evaluate such systems using multi-tenant Hadoop, Spark, and Storm workloads. Our results show that in constrained environments, there is significant performance interference that manifests in multiple ways. First, Mesos is unable to achieve fair resource sharing for many configurations. Second, application performance across competing frameworks depends on the order of Mesos resource offers and is highly variable. Finally, we find that resource allocation among tenants that employ coarse-grained and fine-grained framework scheduling can lead to a form of deadlock for fine-grained frameworks and to underutilization of system resources.