Difference between partition key, composite key and clustering key in Cassandra?

This article is translated from: Difference between partition key, composite key and clustering key in Cassandra?

I have been reading articles around the net to understand the differences between the following key types, but they are hard for me to grasp. Examples would definitely help.

primary key
partition key
composite key
clustering key

Reply #1

Reference: https://stackoom.com/question/1ggY8/Cassandra中分區鍵-複合鍵和聚類鍵之間的區別


Reply #2

There is a lot of confusion around this; I will try to make it as simple as possible.

The primary key is a general concept: it indicates one or more columns used to retrieve data from a table.

The primary key may be SIMPLE and even declared inline:

  create table stackoverflow_simple (
      key text PRIMARY KEY,
      data text
  );

This means that it consists of a single column.

But the primary key can also be COMPOSITE (aka COMPOUND), made up of multiple columns.

  create table stackoverflow_composite (
      key_part_one text,
      key_part_two int,
      data text,
      PRIMARY KEY(key_part_one, key_part_two)
  );

With a COMPOSITE primary key, the "first part" of the key is called the PARTITION KEY (in this example, key_part_one) and the second part of the key is the CLUSTERING KEY (in this example, key_part_two).

Please note that both the partition key and the clustering key can be made up of multiple columns. Here's how:

  create table stackoverflow_multiple (
      k_part_one text,
      k_part_two int,
      k_clust_one text,
      k_clust_two int,
      k_clust_three uuid,
      data text,
      PRIMARY KEY((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)
  );

Behind these names ...

  • The Partition Key is responsible for data distribution across your nodes.
  • The Clustering Key is responsible for data sorting within the partition.
  • The Primary Key is equivalent to the Partition Key in a single-field-key table (i.e. Simple).
  • The Composite/Compound Key is just any multiple-column key.
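
The two roles listed above can be sketched with a small Python toy model. This is only an illustration under stated assumptions: the node names, the MD5-modulo placement and the helper functions are hypothetical and are not how Cassandra's partitioner or storage engine is implemented. It shows that the whole partition key decides where a row goes, while the clustering key decides the order of rows inside that partition.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key):
    """Map a partition key (a tuple of column values) to a node.

    Toy stand-in for a partitioner: hash the whole key, then take it modulo
    the node count. This is an assumption for illustration only.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# partition key -> {clustering key -> data}
partitions = defaultdict(dict)


def insert(partition_key, clustering_key, data):
    """Store a row under its partition, keyed by its clustering key."""
    partitions[partition_key][clustering_key] = data


def read_partition(partition_key):
    """Return one partition's rows ordered by clustering key."""
    rows = partitions.get(partition_key, {})
    return [(ck, rows[ck]) for ck in sorted(rows)]


insert(("ronaldo",), (10,), "ex-football player")
insert(("ronaldo",), (9,), "football player")

print(node_for(("ronaldo",)))        # the same key always maps to the same node
print(read_partition(("ronaldo",)))  # rows come back sorted by clustering key
```

With a composite partition key, the tuple passed to node_for would simply hold more than one value, for example ("some text", 42).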

Further usage information: DATASTAX DOCUMENTATION


Small usage and content examples

SIMPLE KEY:

  insert into stackoverflow_simple (key, data) VALUES ('han', 'solo');
  select * from stackoverflow_simple where key='han';

table content

   key | data
  -----+------
   han | solo

A COMPOSITE/COMPOUND KEY can retrieve "wide rows" (i.e. you can query by just the partition key, even if you have clustering keys defined):

  insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player');
  insert into stackoverflow_composite (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player');
  select * from stackoverflow_composite where key_part_one = 'ronaldo';

table content

   key_part_one | key_part_two | data
  --------------+--------------+--------------------
   ronaldo      |            9 | football player
   ronaldo      |           10 | ex-football player

But you can also query with all key columns (both partition and clustering) ...

 select * from stackoverflow_composite where key_part_one = 'ronaldo' and key_part_two = 10; 

query output

   key_part_one | key_part_two | data
  --------------+--------------+--------------------
   ronaldo      |           10 | ex-football player

Important note: the partition key is the minimum specifier needed to perform a query using a where clause. If you have a composite partition key, like the following

e.g. PRIMARY KEY((col1, col2), col10, col4)

you can perform a query only by passing at least both col1 and col2, the two columns that define the partition key. The "general" rule for querying is that you have to pass at least all partition key columns, then you can optionally add each clustering key in the order in which they are declared.

So the valid queries are (excluding secondary indexes):

  • col1 and col2
  • col1 and col2 and col10
  • col1 and col2 and col10 and col4

Invalid:

  • col1 and col2 and col4
  • anything that does not contain both col1 and col2
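
The rule above can be written down as a small check. The following Python sketch is a hypothetical helper (is_valid_where is not part of any Cassandra driver), and it assumes equality restrictions only, no secondary indexes and no filtering: every partition key column must be present, and the clustering columns used must form a prefix of the declared clustering order.

```python
def is_valid_where(partition_cols, clustering_cols, where_cols):
    """Return True if where_cols can be queried under the rule described above.

    partition_cols and clustering_cols are lists in declaration order;
    where_cols is the collection of columns restricted in the where clause.
    """
    where = set(where_cols)
    # 1. All partition key columns are mandatory.
    if not set(partition_cols) <= where:
        return False
    # 2. Anything else must be a clustering column ...
    rest = where - set(partition_cols)
    if not rest <= set(clustering_cols):
        return False
    # 3. ... and the clustering columns used must be a prefix of the declared order.
    return set(clustering_cols[:len(rest)]) == rest


partition = ["col1", "col2"]
clustering = ["col10", "col4"]

print(is_valid_where(partition, clustering, ["col1", "col2"]))           # True
print(is_valid_where(partition, clustering, ["col1", "col2", "col10"]))  # True
print(is_valid_where(partition, clustering, ["col1", "col2", "col4"]))   # False
print(is_valid_where(partition, clustering, ["col1"]))                   # False
```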

Hope this helps.


Reply #3

In Cassandra, the difference between primary key, partition key, composite key and clustering key always causes some confusion, so I am going to explain them below and relate them to each other. We use CQL (Cassandra Query Language) for Cassandra database access.

Note: this answer is as per the updated version of Cassandra.

Primary Key:

In Cassandra there are two different ways to declare a primary key.

CREATE TABLE Cass (
    id int PRIMARY KEY,
    name text 
);

Create Table Cass (
   id int,
   name text,
   PRIMARY KEY(id) 
);

In CQL, the order in which columns are defined for the PRIMARY KEY matters. The first column of the key is called the partition key. It has the property that all rows sharing the same partition key (even across tables, in fact) are stored on the same physical node. Also, insertions, updates and deletions on rows sharing the same partition key for a given table are performed atomically and in isolation. Note that it is possible to have a composite partition key, i.e. a partition key formed of multiple columns, using an extra set of parentheses to define which columns form the partition key.

Partitioning and Clustering

The PRIMARY KEY definition is made up of two parts: the partition key and the clustering columns. The first part maps to the storage engine row key, while the second is used to group columns in a row.

CREATE TABLE device_check (
  device_id   int,
  checked_at  timestamp,
  is_power    boolean,
  is_locked   boolean,
  PRIMARY KEY (device_id, checked_at)
);

Here device_id is the partition key and checked_at is the clustering key.

We can have multiple clustering keys as well as multiple partition key columns, depending on the declaration.


Reply #4

Adding a summary answer, as the accepted one is quite long. The terms "row" and "column" are used in the context of CQL, not how Cassandra is actually implemented.

  • A primary key uniquely identifies a row.
  • A composite key is a key formed from multiple columns.
  • A partition key is the primary lookup to find a set of rows, i.e. a partition.
  • A clustering key is the part of the primary key that isn't the partition key (and defines the ordering within a partition).

Examples:

  • PRIMARY KEY (a): The partition key is a.
  • PRIMARY KEY (a, b): The partition key is a, the clustering key is b.
  • PRIMARY KEY ((a, b)): The composite partition key is (a, b).
  • PRIMARY KEY (a, b, c): The partition key is a, the composite clustering key is (b, c).
  • PRIMARY KEY ((a, b), c): The composite partition key is (a, b), the clustering key is c.
  • PRIMARY KEY ((a, b), c, d): The composite partition key is (a, b), the composite clustering key is (c, d).
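
The reading rule behind these examples can be sketched in a few lines of Python. The helper below is hypothetical and deliberately naive: it only handles plain column names as in the examples above (no quoted identifiers), and it takes the text found between the outer parentheses of PRIMARY KEY ( ... ). A leading parenthesised group is the composite partition key; otherwise the first column is the partition key, and everything after it is the clustering key.

```python
def split_primary_key(definition):
    """Split the inside of PRIMARY KEY ( ... ) into (partition_key, clustering_key)."""
    inner = definition.strip()
    if inner.startswith("("):
        # Extra parentheses: everything inside them is the composite partition key.
        close = inner.index(")")
        partition = [col.strip() for col in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # No extra parentheses: only the first column is the partition key.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition, clustering


for text in ["a", "a, b", "(a, b)", "a, b, c", "(a, b), c", "(a, b), c, d"]:
    print(text, "->", split_primary_key(text))
```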

Reply #5

In database design, a compound key is a superkey that is not minimal.

A composite key is a set that contains a compound key and at least one attribute that is not a superkey.

Given table: EMPLOYEES {employee_id, firstname, surname}

Possible superkeys are:

{employee_id}
{employee_id, firstname}
{employee_id, surname}
{employee_id, firstname, surname}

{employee_id} is the only minimal superkey, which also makes it the only candidate key, given that {firstname} and {surname} do not guarantee uniqueness. Since a primary key is defined as a chosen candidate key, and only one candidate key exists in this example, {employee_id} is the minimal superkey, the only candidate key, and the only possible primary key.

The exhaustive list of compound keys is:

{employee_id, firstname}
{employee_id, surname}
{employee_id, firstname, surname}

The only composite key is {employee_id, firstname, surname}, since that key contains a compound key ({employee_id, firstname}) and an attribute that is not a superkey ({surname}).
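
The lists in this reply can be reproduced mechanically. The Python sketch below follows this reply's own definitions and its stated assumption that only employee_id guarantees uniqueness; the function names are hypothetical. Note that the enumeration also yields {employee_id, surname} as a superkey, which is consistent with it appearing in the compound key list.

```python
from itertools import combinations

ATTRIBUTES = ("employee_id", "firstname", "surname")
UNIQUE = {"employee_id"}  # assumption from the example: only employee_id is unique


def superkeys():
    """Every attribute set that contains the unique attribute is a superkey."""
    found = []
    for size in range(1, len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, size):
            if UNIQUE <= set(combo):
                found.append(frozenset(combo))
    return found


def candidate_keys(keys):
    """Minimal superkeys: those with no other superkey as a proper subset."""
    return [key for key in keys if not any(other < key for other in keys)]


all_keys = superkeys()
minimal = candidate_keys(all_keys)
compound = [key for key in all_keys if key not in minimal]  # non-minimal superkeys

print([sorted(key) for key in minimal])
print([sorted(key) for key in compound])
```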


Reply #6

Primary Key: composed of the partition key(s) [and optional clustering keys (or columns)].
Partition Key: the hash value of the partition key is used to determine the specific node in a cluster that stores the data.
Clustering Key: used to sort the data within each partition (on the responsible node and its replicas).

Compound Primary Key: as said above, the clustering keys are optional in a primary key. If they aren't mentioned, it's a simple primary key. If clustering keys are mentioned, it's a compound primary key.

Composite Partition Key: using just one column as a partition key might result in wide row issues (depending on the use case and data modeling). Hence the partition key is sometimes specified as a combination of more than one column.

Regarding the confusion about which one is mandatory and which one can be skipped in a query, it helps to imagine Cassandra as a giant HashMap. In a HashMap, you can't retrieve the values without the key.
Here, the partition keys play the role of that key, so each query needs to have them specified. Without them, Cassandra won't know which node to search.
The clustering keys (columns, which are optional) help to further narrow your query after Cassandra finds the specific node (and its replicas) responsible for that specific partition key.
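
The HashMap analogy can be made concrete with a plain Python dict. This is only a sketch of the analogy, not of Cassandra's storage: the store contents reuse the ronaldo rows from an earlier reply, and lookup is a hypothetical helper. The dict access needs the full partition key, and the optional clustering bounds then narrow the result inside that one partition.

```python
from bisect import bisect_left, bisect_right

# partition key -> list of (clustering key, data), kept sorted by clustering key
store = {
    ("ronaldo",): [((9,), "football player"), ((10,), "ex-football player")],
}


def lookup(partition_key, low=None, high=None):
    """Fetch one partition, optionally narrowed to clustering keys in [low, high]."""
    rows = store[partition_key]  # the "HashMap" step: a KeyError if the key is unknown
    keys = [clustering_key for clustering_key, _ in rows]
    start = 0 if low is None else bisect_left(keys, low)
    end = len(rows) if high is None else bisect_right(keys, high)
    return rows[start:end]


print(lookup(("ronaldo",)))                         # whole partition
print(lookup(("ronaldo",), low=(10,), high=(10,)))  # narrowed by clustering key
```

There is deliberately no way to call lookup with a clustering key alone, which mirrors the point that the partition key cannot be skipped.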
