More

hit9 · on April 14, 2016

Reasons to create it:

- Most redis client implementations don't support redis cluster.

- It's hard to provide redis client libraries for multiple languages without breaking compatibilities.

- twemproxy requires restarting to add/remove redis nodes.

- twemproxy is single-threaded.

Hope you like it!

hit9 · on March 28, 2016

And the github project is https://github.com/hit9/htree

hit9 · on March 28, 2016

The page dosen't say words like "faster than a hash table".

The hash table has time complexity O(1) on get/set operations. And this tree has a constant-level time complexity, and should be a bit slower than hash table.

Besides the redundancy of the golang array auto-expanding, this tree is more space-efficient than a hash table, If you implement it in C without the auto-expanding feature, the space utilization can be better.

ot · on March 28, 2016

What do you mean by constant-level time complexity? That the number of levels is constant? That's just because you bound the total number of keys to 2^32. With that constraint, anything is constant-time, even linear search.

This is basically a trie built on the chinese-reminder decompositions of the keys. I'm not an expert in Go, but I'm sure you can use fixed-size arrays, and thus implement any open-addressing hash table with no space overhead.

hit9 · on March 28, 2016

Hmm.. The 2 * 3 ... 29 can be larger, if we want a larger tree.

Comment 1: O(2 * 3 ... 29) is indeed constant level, but we don't like it. The put/get/delete operations on this tree are all constant level, but at most Sum(lg2~lg29) checks.

Comment 2: Yes, we can use fixed-size arrays and build an open-addressing hash table with no space overhead (load factor is 1), this makes 100% table space utilization, but any insertions would cause table to resize, which is an O(N) copy (where N is the table size). However, insertions on a tree node can be done locally, without moving the whole tree.

ot · on March 28, 2016

> Hmm.. The 2 * 3 ... 29 can be larger, if we want a larger tree.

I think you need a refresher on asymptotic analysis :)

hit9 · on March 28, 2016

Sorry but I have typo in the documentation, the " Goroutine Safety." is the headline of following paragraph.

HTree dose not guarantee the goroutine safety:

> "Lock granularity depends on the use case."

The docs page updated.

hit9 · on March 27, 2016

This tree is implemented as the record position informations indexing container for a disk-based storage engine on the bitcask model..

hit9 · on March 27, 2016

And the memory are expensive in that case..

hit9 · on March 27, 2016

hit9 · on March 27, 2016

In golang, array is auto-expanding so the children array allocates more space than its length, which depends on the golang internal implementation.

> "Child nodes are stored in an array orderly, and checked by binary-search but not indexed by remainders, array indexing will result redundancy entries, this is for less memory usage, with a bit performance loss. "

I just cannot find some way (such as linked list?) to use the exact space with at-least binary-search time complexity.

jbapple · on March 27, 2016

What is the actual space overhead of this structure, and how did you measure it?

hit9 · on March 27, 2016

Advantage: constant level time complexity with better space utilization.

The binary tree with the binary-search searching strategy has a time complexity O(logN), which is higher than htree's.

This htree is mainly for memory bounded cases.

hit9 · on March 27, 2016

For this case, conflicts can be solved via the traditional separate chaining strategy, to open a list on the Item.

This htree was originally implemented as the indexing container to store key-value poisition informations for the bitcask storage engine. And string keys are not stored in the htree to save more memory. Collisions of two string keys are distinguished by checking the disk.

hit9 · on March 27, 2016

And the origin string keys are stored in the htree item only if there is collision. So the string key distinguishing checks are at most once.

hit9 · on Jan 29, 2016

Hi, skyline is a great work, here is my views:

1. Skyline has many built-in detection algorithms. Banshee only uses the 3-sigma rule.

2. Skyline's algorithms are pure and simple, without considerations of burr points, scatters (like counters).

3. Banshee comes with alerting rules management panel. Skyline's webapp is a simple analyzation results viewer.

4. Banshee uses the anomaly-factor trending (via weighted moving average) and skyline doesn't. The trending follows the analyzed score (which is to describe how anomalous the current metric is), thus, alerts only happen if the metric is anomalous enough (i.e. with a very big value) or if anomalous time is long enough.

5. Banshee has the management for projects, users, rules.

6. Banshee uses statsd as data source, skyline uses carbon.

7. Skyline uses redis as storage. Banshee has no external storage server dependencies. Metrics are stored on disk via leveldb, and rules are stored on disk via sqlite.

8. Banshee is young and skyline is no longer actively maintained...

Ayaz · on Jan 29, 2016

Thanks for the break-down. It looks interesting. At work, we have been testing Skyline for a number of different metrics coming in from a large distributed network. I'm interested in seeing how Banshee performs in there.