HyperLogLog is an algorithm that addresses the count-distinct problem. To do this it approximates the numbers of items in a set. Determining the exact cardinality of a set requires memory according to the cardinality of the set. Because it estimates the cardinality by probability, the HyperLogLog algorithm can run with more reasonable memory requirements.

HyperLogLog in Redis

Open source Redis implements HyperLogLog (HLL) as a native data-structure. It supports adding elements (PFADD) to an HLL, counting elements (PFCOUNT) of HLLs, and merging of (PFMERGE) HLLs.

Here is an example of a simple write case:

TimeReplica 1Replica 2
t1PFADD hll x
t2— sync —
t3PFADD hll y
t4— sync —
t5PFCOUNT hll –> 2PFCOUNT hll –> 2

Here is an example of a concurrent add case:

TimeReplica 1Replica 2
t1PFADD hll xPFADD hll y
t2PFCOUNT hll –> 1PFCOUNT hll –> 1
t3— sync —
t4PFCOUNT hll –> 2PFCOUNT hll –> 2

The DEL-wins approach

Other collections in the Redis-CRDT implementation use the observed remove method to resolve conflicts. The CRDT-HLL uses the DEL-wins method. If a DEL request is received at the same time as any other request (ADD/MERGE/EXPIRE) on the HLL-key the replicas consistently converge to delete key. In the observed remove method used by other collections (sets, lists, sorted-sets and hashes), only the replica that received the DEL request removes the elements, but elements added concurrently in other replicas exist in the consistently converged collection. We chose to use the DEL-wins method for the CRDT-HLL to maintain the original time and space complexity of the HLL in open source Redis.

Here is an example of a DEL-wins case:

HLL|Set
|
TimeReplica 1Replica 2|TimeReplica 1Replica 2
|
t1PFADD h e1|t1SADD s e1
t2— sync —|t2— sync —
t3PFCOUNT h –> 1PFCOUNT h –> 1|t3SCARD s –> 1SCARD s –> 1
t4PFADD h e2Del h|t4SADD s e2Del S
t5PFCOUNT h –> 2PFCOUNT h –> 0|t5SCARD s –> 2SCARD s –> 0
t6— sync —|t6— sync —
t7PFCOUNT h –> 0PFCOUNT h –> 0|t7SCARD s –> 1SCARD s –> 1
t8Exists h –> 0Exists h –> 0|t8Exists s –> 1Exists s –> 1
|t9SMEMBERS s –> {e2}SMEMBERS s –> {e2}

HLL in Active-Active databases versus HLL in Open Source Redis

In Active-Active databases, we implemented HLL within the CRDT on the basis of the Redis implementation with a few exceptions:

  • Redis keeps the HLL data structure as an encoded string object such that you can potentially run any string request can on a key that contains an HLL. In CRDT, only get and set are supported for HLL.
  • In CRDT, if you do SET on a key that contains a value encoded as an HLL, then the value will remain an HLL. If the value is not encoded as HLL, then it will be a register.