Architecture and Overview

Crystal design

Crystal seeks to efficiently handle workload heterogeneity and applications with evolving requirements in shared object storage. To achieve this, Crystal separates high-level policies from the mechanisms that implement them at the data plane, avoiding policies hard-coded into the system itself. It does so through three abstractions: the filter, the inspection trigger, and the controller, in addition to policies.

  • Filter. A filter is a piece of code that a system administrator can inject into the data plane to perform custom computations on incoming object requests. In Crystal, this concept is broad enough to include computations on object contents (e.g., compression, encryption), data management like caching or pre-fetching, and even resource management such as bandwidth differentiation. A key feature of filters is that the instrumented system is oblivious to their execution and needs no modification to its implementation code to support them.
  • Inspection trigger. This abstraction represents information accrued from the system to automate the execution of filters. There are two types of information sources. The first type corresponds to real-time metrics obtained from the running workloads, such as the number of GET operations per second on a data container or the IO bandwidth allocated to a tenant. As with filters, a fundamental feature of workload metrics is that they can be deployed at runtime. The second type of source is the metadata of the objects themselves. Such metadata is typically associated with read and write requests and includes properties like the size or type of objects.
  • Controller. In Crystal, a controller represents an algorithm that manages the behavior of the data plane based on monitoring metrics. A controller may contain a simple rule to automate the execution of a filter, or a complex algorithm requiring global visibility of the cluster to control a filter’s execution under multi-tenancy. Crystal builds a logically centralized control plane formed by supervised and distributed controllers. This allows an administrator to easily deploy new controllers on the fly that cope with the requirements of new applications.
  • Policy. Policies must be extensible so that the system can truly satisfy evolving requirements; that is, the structure of policies must facilitate the incorporation of new filters, triggers and controllers. To succinctly express policies, Crystal adopts a structure similar to that of the popular IFTTT (If-This-Then-That) service, which allows users to express small rule-based programs, called “recipes”, using triggers and actions. For example:

    TRIGGER: compressibility of an object is > 50%
    ACTION: compress
    RECIPE: IF compressibility is > 50% THEN compress

    An IFTTT-like language reflects the extensibility capabilities of the SDS system: at the data plane, triggers and actions are translated into our inspection triggers and filters, respectively; at the control plane, a policy is a “recipe” that guides the behavior of control algorithms. This apparently simple policy structure can express different policy types. On the one hand, the figure below shows storage automation policies that enforce a filter either statically or dynamically based on simple rules; for instance, P1 enforces compression and encryption on document objects of tenant T1, whereas P2 applies data caching to small objects of container C1 when the number of GETs/second exceeds 5. On the other hand, such policies can also express objectives to be achieved by controllers requiring global visibility and coordination capabilities over the data plane; for example, P3 tells a controller to provide at least 30 MBps of aggregated GET bandwidth to tenant T2 under a multi-tenant workload.
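To make the recipe structure concrete, here is a minimal Python sketch of how trigger/action recipes could be evaluated against an incoming request. The `Recipe` class, the context field names, and the thresholds are illustrative assumptions for this sketch, not Crystal's actual DSL or internals:

```python
# Illustrative sketch: recipes as trigger/action pairs evaluated against
# request metadata (object size, type) and workload metrics (GETs/sec).
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Recipe:
    name: str
    trigger: Callable[[Dict], bool]  # inspection-trigger predicate
    actions: List[str]               # filter names to enforce

# P1: compress and encrypt document objects of tenant T1 (static rule).
p1 = Recipe("P1",
            lambda ctx: ctx["tenant"] == "T1" and ctx["object_type"] == "DOCS",
            ["COMPRESSION", "ENCRYPTION"])

# P2: cache small objects of container C1 when GETs/sec exceeds 5 (dynamic rule).
p2 = Recipe("P2",
            lambda ctx: ctx["container"] == "C1"
            and ctx["object_size"] < 1024 * 1024 and ctx["gets_sec"] > 5,
            ["CACHING"])

def filters_for(ctx: Dict, recipes: List[Recipe]) -> List[str]:
    """Return the filter pipeline a given request should traverse."""
    out = []
    for r in recipes:
        if r.trigger(ctx):
            out.extend(r.actions)
    return out

ctx = {"tenant": "T1", "object_type": "DOCS", "container": "C1",
       "object_size": 512 * 1024, "gets_sec": 9}
print(filters_for(ctx, [p1, p2]))  # → ['COMPRESSION', 'ENCRYPTION', 'CACHING']
```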



Control Plane

Crystal provides administrators with a system-agnostic Domain-Specific Language (DSL) to define SDS services via high-level policies. The DSL “vocabulary” can be extended at runtime with new filters and inspection triggers. The control plane includes an API to compile policies and to manage the life-cycle and metadata of controllers, filters and metrics. Moreover, the control plane is built upon a distributed model: although logically centralized, the controller is, in practice, split into a set of autonomous micro-services, each running a separate control algorithm. Other micro-services, called workload metric processes, close the control loop by exposing monitoring information from the data plane to controllers. The control loop itself is also extensible, given that both controllers and workload metric processes can be deployed at runtime.
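The control loop described above can be sketched as follows, with a trivial in-memory bus standing in for Crystal's messaging service. `Bus`, `RuleController`, and the metric name are illustrative assumptions, not Crystal's actual classes:

```python
# Sketch of the control loop: a controller micro-service subscribes to a
# workload metric and enforces a filter when its rule's condition is met.
from collections import defaultdict

class Bus:
    """In-memory stand-in for the messaging service."""
    def __init__(self):
        self.subs = defaultdict(list)
    def subscribe(self, topic, cb):
        self.subs[topic].append(cb)
    def publish(self, topic, value):
        for cb in self.subs[topic]:
            cb(value)

class RuleController:
    """Autonomous micro-service running one simple control algorithm."""
    def __init__(self, bus, metric, threshold, enforce):
        self.threshold, self.enforce = threshold, enforce
        bus.subscribe(metric, self.on_metric)
    def on_metric(self, value):
        if value > self.threshold:
            self.enforce()

enforced = []
bus = Bus()
# Controller for a P2-style rule: enforce caching when GETS_SEC > 5.
RuleController(bus, "GETS_SEC", 5, lambda: enforced.append("CACHING"))
bus.publish("GETS_SEC", 3)  # below threshold: no action
bus.publish("GETS_SEC", 9)  # above threshold: filter enforced
print(enforced)             # → ['CACHING']
```

Because controllers only interact with the bus, new ones can be attached at runtime without touching the rest of the system, which mirrors the extensibility claim above.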

Data Plane

Crystal’s data plane has two core extension points: inspection triggers and filters. First, a developer can deploy new workload metrics at the data plane to feed distributed controllers with new runtime information about the system. The metrics framework runs the code of metrics and publishes monitoring events to the messaging service. Second, data plane programmability and extensibility are delivered through the filter framework, which intercepts object flows in a transparent manner and runs computations on them. A developer integrating a new filter only needs to contribute the logic; the deployment and execution of the filter are managed by Crystal.
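As a sketch of what such an interception filter might look like, the following compression filter wraps an object stream during a PUT. The chunk-iterator interface is an assumption made for illustration, not Crystal's actual filter API:

```python
# Sketch of a streaming interception filter: compress an object's byte
# stream on the fly, chunk by chunk, without buffering the whole object.
import zlib

def compression_filter(chunks):
    """Yield a zlib-compressed version of an iterable of byte chunks."""
    comp = zlib.compressobj()
    for chunk in chunks:
        data = comp.compress(chunk)
        if data:               # compressobj may buffer small inputs
            yield data
    yield comp.flush()         # emit any remaining buffered data

original = [b"hello " * 100, b"world " * 100]
compressed = b"".join(compression_filter(iter(original)))
assert zlib.decompress(compressed) == b"".join(original)
print(len(b"".join(original)), "->", len(compressed))
```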


Lifecycle Overview

Here we give an overview of Crystal's lifecycle to portray how the various components of the architecture shown in the figure interact with one another. Since the interaction flow depends on the type of policy or recipe, we describe it through three different policies.

Policy P1. The basic interaction flow can be explained through this simple storage policy, which instruments the data plane to compress and then encrypt the objects written by tenant T1. The flow is as follows. At the control plane, the administrator submits P1 to the Crystal controller API, which calls the DSL compilation service. The DSL compiler receives P1 and interacts with the filter framework API. Through this API, the administrator sets up the relationship between the objects of tenant T1 and the two interception filters (compression + encryption) in the metadata store. Upon a new PUT object request by tenant T1, the filter framework fetches from the metadata store the interception filters to run on that request.
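P1's flow can be sketched as a compile step that writes a filter pipeline into the metadata store, plus a lookup on each PUT. The metadata-store layout and the function names here are hypothetical illustrations of that flow:

```python
# Sketch of P1: the DSL compiler registers a (tenant, object type) -> filter
# pipeline mapping; the filter framework consults it on every PUT request.
metadata_store = {}

def compile_policy(tenant, object_type, filters):
    """Compile step: persist the ordered filter pipeline for matching requests."""
    metadata_store[(tenant, object_type)] = filters

def on_put(tenant, object_type):
    """Filter framework: fetch the interception filters to run on this PUT."""
    return metadata_store.get((tenant, object_type), [])

compile_policy("T1", "DOCS", ["COMPRESSION", "ENCRYPTION"])
print(on_put("T1", "DOCS"))  # → ['COMPRESSION', 'ENCRYPTION']
print(on_put("T2", "DOCS"))  # → []
```

Note the ordering matters: compression must run before encryption, since encrypted data is effectively incompressible.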

Policy P2. This policy is similar to P1 but includes a workload metric in the trigger clause of the policy, namely the number of GET requests per second. This policy therefore adds a new ingredient to the interaction flow: the management of dynamic policies. Very succinctly, there are two main differences. First, the DSL compiler now generates a distributed controller that subscribes to a metric: GETS_SEC. Second, it instruments the CACHING interception filter to run only if the trigger condition is met. As shown in Fig. 1, the distributed controller related to P2 makes an API call to the filter framework to program the system to enforce caching on container C1 as soon as the trigger condition is met.
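A minimal sketch of P2's dynamic enforcement, assuming a filter framework API with set/unset calls keyed by container; all names here are illustrative assumptions:

```python
# Sketch of P2: a controller toggles the CACHING filter on container C1
# based on a stream of GETS_SEC metric samples.
active_filters = {}  # stands in for filter state in the metadata store

def set_filter(container, name):
    active_filters.setdefault(container, []).append(name)

def unset_filter(container, name):
    if name in active_filters.get(container, []):
        active_filters[container].remove(name)

def p2_controller(gets_sec):
    """Enable caching on C1 only while the trigger condition holds."""
    if gets_sec > 5 and "CACHING" not in active_filters.get("C1", []):
        set_filter("C1", "CACHING")
    elif gets_sec <= 5:
        unset_filter("C1", "CACHING")

for rate in [2, 8, 8, 3]:  # stream of GETS_SEC samples
    p2_controller(rate)
print(active_filters)      # → {'C1': []} (caching was enabled, then revoked)
```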

Policy P3. With this policy, we finally describe the interaction flow that the system follows to respond dynamically to the current resource usage of a workload. This policy provisions a certain share of IO bandwidth to tenant T2 in a multi-tenant setting. In Crystal, this is an example of an SLO policy that controls resource usage in the system to offer QoS guarantees. A global objective like this requires a distributed controller with global visibility of the system. In Fig. 1, the distributed controller related to P3 performs this task. When the DSL compiler receives P3, it calls the bandwidth differentiation API to set the SLO for tenant T2. The distributed controller then retrieves the SLOs from the metadata store and disseminates the bandwidth assignments to the data plane storage nodes. These assignments are consumed by the bandwidth differentiation filter, which is in charge of throttling the streams of object requests as well as closing the loop by measuring bandwidth usage in real time.
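One plausible sketch of the dissemination step, under the simplifying assumption that the controller splits the tenant's aggregate SLO evenly across storage nodes (a real controller could weight shares by observed per-node demand); all names are illustrative:

```python
# Sketch of P3: compute per-node bandwidth assignments for a tenant's SLO
# and apply them at the bandwidth differentiation filter.
def disseminate(slo_mbps, nodes):
    """Split the aggregate SLO evenly across the storage nodes."""
    share = slo_mbps / len(nodes)
    return {node: share for node in nodes}

def throttle(requested_mbps, assigned_mbps):
    """Bandwidth differentiation filter: cap a stream at its assignment."""
    return min(requested_mbps, assigned_mbps)

# P3: at least 30 MBps of aggregated GET bandwidth for tenant T2.
assignments = disseminate(30, ["node1", "node2", "node3"])
print(assignments)                         # → {'node1': 10.0, 'node2': 10.0, 'node3': 10.0}
print(throttle(25, assignments["node1"]))  # → 10.0
```

The measured bandwidth that the filter reports back is what lets the controller rebalance these assignments over time, closing the control loop described above.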