Suppose your clustering keys are
k1 t1, k2 t2, ..., kn tn
where ki is the ith key name and ti is the ith key type. Data is then stored in lexicographic order, with each dimension compared using the comparator for its type.
So (a1, a2, ..., an) < (b1, b2, ..., bn) if a1 < b1 using the t1 comparator, or a1 = b1 and a2 < b2 using the t2 comparator, or a1 = b1 and a2 = b2 and a3 < b3 using the t3 comparator, and so on.
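The comparison rule above can be sketched in a few lines of Python. This is only a model of the ordering, not Cassandra's implementation; `default_cmp` and `lex_less` are hypothetical names, and the per-type comparators are assumed to behave like Python's natural ordering.

```python
def default_cmp(a, b):
    # Stand-in for a per-type comparator: negative, zero, or positive.
    return (a > b) - (a < b)

def lex_less(a, b, comparators):
    # Compare dimension by dimension; the first unequal dimension decides.
    for ai, bi, cmp in zip(a, b, comparators):
        c = cmp(ai, bi)
        if c != 0:
            return c < 0
    return False  # all dimensions equal, so a is not less than b

comparators = [default_cmp, default_cmp, default_cmp]
print(lex_less(("a", 2, 5), ("b", 1, 0), comparators))  # True: decided by k1
print(lex_less(("a", 1, 9), ("a", 2, 0), comparators))  # True: k1 ties, k2 decides
print(lex_less(("a", 2, 5), ("a", 2, 5), comparators))  # False: equal tuples
```

Python's built-in tuple comparison follows the same rule, so `("a", 2, 5) < ("b", 1, 0)` gives the same answer when every dimension uses its natural order.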
This means that it is efficient to find all rows with a certain k1 = a, since those rows are stored next to each other. But it is inefficient to find all rows with ki = x for i > 1, because matching rows can be scattered across the partition. In fact, such a query isn't allowed: the only clustering key constraints that are allowed specify zero or more clustering keys, starting from the first, with none missing.
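To see why a prefix constraint is cheap and a constraint on a later key is not, here is a rough Python model of one partition as a sorted list of (k1, k2, k3) tuples. The helper names and the sample values are made up for illustration, and this says nothing about how Cassandra lays out data internally.

```python
from bisect import bisect_left

# One partition, kept sorted by (k1, k2, k3); k3 is an hour, for brevity.
rows = sorted([("a", 1, 14), ("b", 1, 13), ("a", 2, 13), ("b", 1, 14)])

def rows_with_k1(rows, k1):
    # Binary-search to the first row with this k1, then read a contiguous run.
    lo = bisect_left(rows, (k1,))
    hi = lo
    while hi < len(rows) and rows[hi][0] == k1:
        hi += 1
    return rows[lo:hi]

def rows_with_k2(rows, k2):
    # No prefix to search on: every row has to be examined.
    return [r for r in rows if r[1] == k2]

print(rows_with_k1(rows, "a"))  # [('a', 1, 14), ('a', 2, 13)]
print(rows_with_k2(rows, 1))    # [('a', 1, 14), ('b', 1, 13), ('b', 1, 14)]
```

The rows with k2 = 1 sit at positions 0, 2 and 3 of the sorted list, so they cannot be read as one contiguous slice.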
For example, consider the schema
create table clustering (
x text,
k1 text,
k2 int,
k3 timestamp,
y text,
primary key (x, k1, k2, k3)
);
If you did the following inserts:
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 1, '2013-09-10 14:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 2, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 14:00+0000', '1');
then they are stored in this order on disk (the order that select * from clustering where x = 'x' returns):
x | k1 | k2 | k3 | y
---+----+----+--------------------------+---
x | a | 1 | 2013-09-10 14:00:00+0000 | 1
x | a | 2 | 2013-09-10 13:00:00+0000 | 1
x | b | 1 | 2013-09-10 13:00:00+0000 | 1
x | b | 1 | 2013-09-10 14:00:00+0000 | 1
k1 ordering dominates, then k2, then k3.
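As a quick check of that order, the four inserted rows can be sorted as plain Python tuples. This is a sketch that assumes text, int and timestamp values compare like Python's str, int and datetime; the `ts` helper is hypothetical and only builds the timestamps used in the inserts.

```python
from datetime import datetime, timezone

def ts(hour):
    # Timestamp on 2013-09-10 at the given hour, UTC.
    return datetime(2013, 9, 10, hour, 0, tzinfo=timezone.utc)

# (k1, k2, k3) in insertion order, for the single partition x = 'x'.
inserted = [
    ("a", 1, ts(14)),
    ("b", 1, ts(13)),
    ("a", 2, ts(13)),
    ("b", 1, ts(14)),
]

stored = sorted(inserted)  # tuple comparison is lexicographic
for k1, k2, k3 in stored:
    print(k1, k2, k3.strftime("%Y-%m-%d %H:%M:%S%z"))
```

The sorted rows come out as (a, 1, 14:00), (a, 2, 13:00), (b, 1, 13:00), (b, 1, 14:00), matching the table: k3 only breaks the tie between the two rows that share k1 = b and k2 = 1.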
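The partition key (the first component of the primary key, x in this example) determines which node the data lives on, while the clustering keys determine the storage order: rows are sorted by clustering key 1, then clustering key 2, then clustering key 3. So in the example above:
select * from clustering where x='x' and k1='a' is easy to answer, but for select * from clustering where x='x' and k2=1, every value of k1 would have to be read first and then filtered for k2=1, so the storage order gives no benefit.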