
Clustered Clients


Client Cluster - Design and Architecture - High Level

High Availability and Load Balancing

High Availability and Load Balancing on the clients is achieved by having partitioned clients participate in a cluster, with each client handling a partition (part of the data set). The partitions are dynamically assigned via Helix. Client instances group themselves into a cluster based on the cluster name provided in the configuration.

The partitions are dynamically assigned to the clients. Partition reassignment when instances join or leave the cluster is automatic, and the load (number of partitions assigned) is uniform across instances. The partitions are numbered 0..N-1, and each client uses its assigned partitions to instantiate connections to the relay stream and consume events whose primary key belongs to those partitions.
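For illustration, a minimal sketch of the kind of configuration involved is shown below. The property names (databus.client.clientCluster.*) and values are assumptions made for this example, not necessarily the exact keys the client library uses.

```java
import java.util.Properties;

// Illustrative sketch only: the property names below are assumptions for the
// example, not the exact configuration keys of the Databus client library.
public class ClusterClientConfigSketch
{
    public static Properties buildClusterConfig()
    {
        Properties props = new Properties();
        // All instances that should share the work use the same cluster name;
        // this is the "combining agent" described above.
        props.setProperty("databus.client.clientCluster.clusterName", "exampleCluster");
        // ZooKeeper ensemble used by Helix for partition assignment and by the
        // client for storing per-partition checkpoints.
        props.setProperty("databus.client.clientCluster.zkAddr", "zk-host1:2181,zk-host2:2181");
        // Total number of partitions, i.e. N in the MOD partition function below.
        props.setProperty("databus.client.clientCluster.numPartitions", "10");
        return props;
    }
}
```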

The default partition function used to filter the events is MOD. Any other partition function can be plugged in. The filtering itself is performed at the Relay. The MOD partition function is based on the primary key; the algorithm is as follows (a sketch of this logic appears after the list).

  • If the primary key is of type number, then the partition is key mod N.
  • Else, if the key can be converted to a number, then the partition is key mod N.
  • Else, the partition is the hash code of the key mod N.
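A minimal sketch of this MOD logic is shown below, assuming N total partitions. The class and method names are illustrative only; the actual partition function in the library may differ in detail.

```java
// Sketch of the MOD partition function described above; names are illustrative,
// not the actual library API.
public class ModPartitionFunctionSketch
{
    private final int numPartitions; // N: total number of partitions (0..N-1)

    public ModPartitionFunctionSketch(int numPartitions)
    {
        this.numPartitions = numPartitions;
    }

    /** Maps a primary key to a partition id in the range [0, N). */
    public int partitionFor(Object key)
    {
        if (key instanceof Number)
        {
            // Numeric key: partition = key mod N
            return (int) Math.floorMod(((Number) key).longValue(), (long) numPartitions);
        }
        try
        {
            // Key convertible to a number: partition = numeric value mod N
            long numericKey = Long.parseLong(key.toString());
            return (int) Math.floorMod(numericKey, (long) numPartitions);
        }
        catch (NumberFormatException e)
        {
            // Otherwise: partition = hash code of the key mod N
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

    public static void main(String[] args)
    {
        ModPartitionFunctionSketch f = new ModPartitionFunctionSketch(10);
        System.out.println(f.partitionFor(1234L));       // 4
        System.out.println(f.partitionFor("5678"));      // 8
        System.out.println(f.partitionFor("member-42")); // hash-code based
    }
}
```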

Helix is used for automatic partition assignment; refer to http://helix.apache.org/ for further details on Helix. The checkpoint for each partition is stored in ZooKeeper.

Automatic Partition Assignment using Helix

When a client joins or leaves a cluster, Helix performs automatic partition reassignment. The partitions are reassigned, and the clients get callbacks so they can act on the change (a sketch of such a callback handler appears after the example below).

Example: there are 10 partitions in total and only Client-1 is part of the cluster, so it is assigned all 10 partitions (0-9).

  • When another client (Client-2) joins the cluster, Helix performs a reassignment: partitions 5-9 remain assigned to Client-1 and partitions 0-4 are assigned to Client-2.
  • Client-1 gets a callback to shut down 5 partitions (0-4). Client-1 then does a clean shutdown of those partitions: the Relay Puller and Relay Dispatcher threads are stopped, and so are the consumers.
  • Client-2 gets a notification to enable 5 partitions (0-4). Client-2 fetches the latest committed checkpoint for partitions 0-4 from ZooKeeper, then connects to the Relay and starts processing these partitions from the fetched checkpoint.
  • If Client-1 or Client-2 goes down or leaves the cluster, reassignment is done again and all partitions are assigned to the surviving client.
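The callback flow in the example above can be sketched as follows. The interface and method names are hypothetical stand-ins for the callbacks a clustered client receives when Helix reassigns partitions; they are not the actual library API.

```java
// Hypothetical listener sketch: the interface and method names are illustrative
// stand-ins for the partition-reassignment callbacks described above, not the
// actual library API.
public interface PartitionAssignmentListener
{
    /**
     * Called when a partition is assigned to this instance. A typical handler
     * fetches the latest committed checkpoint for the partition from ZooKeeper,
     * connects to the Relay, and starts the puller and dispatcher threads and
     * the consumers for that partition from the fetched checkpoint.
     */
    void onPartitionAdded(int partitionId);

    /**
     * Called when a partition is reassigned away from this instance. A typical
     * handler cleanly shuts down the Relay Puller and Relay Dispatcher threads
     * and the consumers for that partition.
     */
    void onPartitionDropped(int partitionId);
}
```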

When all clients are active, the partitions are assigned equally across the clients; the load is thus balanced and each partition is processed in parallel. When a client goes down, its partitions are reassigned to the remaining clients. High availability is thus achieved.
