What are the best practices for using a GUID as a primary key, specifically regarding performance?

I have an application that uses a GUID as the primary key in almost all tables, and I have read that there are performance issues when using a GUID as the primary key. Honestly, I haven't seen any problem, but I'm about to start a new application and I still want to use GUIDs as the primary keys, though I was thinking of using a composite primary key (the GUID and maybe another field).

I'm using GUIDs because they are nice and easy to manage when you have different environments such as "production", "test" and "dev" databases, and also for migrating data between databases.

I will use Entity Framework 4.3 and I want to assign the Guid in the application code, before inserting it into the database (i.e. I don't want to let SQL generate the Guid).

What is the best practice for creating GUID-based primary keys, in order to avoid the supposed performance hits associated with this approach?


#1

Reference: https://stackoom.com/question/o5d6/使用GUID作爲主鍵的最佳實踐是什麼-特別是在性能方面


#2

The link below says it better than I could and helped in my decision making. I usually opt for an int as a primary key unless I have a specific need not to, and I also let SQL Server auto-generate and maintain this field unless I have some specific reason not to. In reality, performance concerns need to be assessed for your specific app. There are many factors at play here, including but not limited to expected database size, proper indexing and efficient querying. Although people may disagree, I think in many scenarios you will not notice a difference with either option, and you should choose what is more appropriate for your app and what lets you develop more easily, quickly and effectively (if you never complete the app, what difference does the rest make :).

https://web.archive.org/web/20120812080710/http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

PS: I'm not sure why you would use a composite PK or what benefit you believe that would give you.


#3

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) is a physical, storage-related thing, and here a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times, a GUID as the clustering key isn't optimal: because of its randomness, it leads to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential, and thus suffers from the same problems as the random GUID, just a bit less prominently.
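If you want to see how much of this applies to your own data, one way to look is the sys.dm_db_index_physical_stats DMV. This is only a sketch: dbo.MyTable is a placeholder name, and the fragmentation you see will depend on your insert pattern, fill factor and how recently the indexes were rebuilt.

```sql
-- Sketch: report fragmentation for every index on one table.
-- dbo.MyTable is a placeholder; substitute a table in your database.
SELECT  i.name                          AS index_name,
        i.type_desc                     AS index_type,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                    -- current database
            OBJECT_ID(N'dbo.MyTable'),  -- table to inspect
            NULL,                       -- all indexes
            NULL,                       -- all partitions
            'LIMITED'                   -- cheapest scan mode
        ) AS ps
JOIN    sys.indexes AS i
          ON  i.object_id = ps.object_id
          AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running it before and after a batch of inserts, once with a GUID clustering key and once with an INT IDENTITY one, should make the difference visible on a table of realistic size.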

Then there's another issue to consider: the clustering key on a table is added to each and every entry in each and every non-clustered index on that table as well - so you really want to make sure it's as small as possible. Typically, an INT with its 2+ billion values should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as primary and clustering key:

  • Base table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: roughly 26.7 MB vs. 106.8 MB - and that's just on a single table!
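The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that only the key bytes are counted (4 bytes for INT, 16 for UNIQUEIDENTIFIER), that 1 MB means 1024 * 1024 bytes, and it ignores row and page overhead, so treat it as a lower bound rather than a real size estimate. It needs SQL Server 2008 or later for the inline DECLARE and VALUES syntax.

```sql
-- Sketch: key bytes only, no row/page overhead.
DECLARE @rows       BIGINT = 1000000;  -- row count from the example
DECLARE @nc_indexes INT    = 6;        -- nonclustered indexes from the example
DECLARE @mb         FLOAT  = 1048576;  -- bytes per MB (1024 * 1024)

SELECT  k.key_type,
        ROUND(@rows * k.key_bytes / @mb, 2)                     AS base_table_mb,
        ROUND(@rows * k.key_bytes * @nc_indexes / @mb, 2)       AS nonclustered_mb,
        ROUND(@rows * k.key_bytes * (1 + @nc_indexes) / @mb, 2) AS total_mb
FROM    (VALUES ('INT', 4), ('UNIQUEIDENTIFIER', 16)) AS k(key_type, key_bytes);
```

With these inputs the totals work out to about 26.7 MB for INT and 106.8 MB for the GUID, a factor of four, which is simply the ratio of the key sizes.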

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

PS: of course, if you're dealing with just a few hundred or a few thousand rows, most of these arguments won't have much of an impact on you. However, once you get into the tens or hundreds of thousands of rows, or you start counting in millions, these points become crucial to understand.

Update: if you want to have your PKGUID column as your primary key (but not your clustering key), and another column MyINT (INT IDENTITY) as your clustering key - use this:

CREATE TABLE dbo.MyTable
(PKGUID UNIQUEIDENTIFIER NOT NULL,
 MyINT INT IDENTITY(1,1) NOT NULL
 -- .... add more columns as needed ......
);

-- The primary key stays on the GUID, but as a NONCLUSTERED index
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID);

-- The clustered index goes on the small, ever-increasing INT column
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT);

Basically: you just have to explicitly tell the PRIMARY KEY constraint that it's NONCLUSTERED (otherwise it's created as your clustered index, by default) - and then you create a second index that's defined as CLUSTERED.
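To confirm that the table ended up the way you intended, you can list its indexes from the sys.indexes catalog view. This is a small sketch against the dbo.MyTable example above; the expectation is one NONCLUSTERED row flagged as the primary key and one CLUSTERED row on the INT column.

```sql
-- Sketch: show which index is clustered and which one backs the primary key.
SELECT  i.name        AS index_name,
        i.type_desc   AS index_type,   -- CLUSTERED or NONCLUSTERED
        i.is_primary_key,
        i.is_unique
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
```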

This will work - and it's a valid option if you have an existing system that needs to be "re-engineered" for performance. For a new system, if you start from scratch and you're not in a replication scenario, I'd always pick ID INT IDENTITY(1,1) as my clustered primary key - much more efficient than anything else!
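For the start-from-scratch case, a minimal sketch could look like this. The table name dbo.Product and the Name column are made up for illustration; only the ID INT IDENTITY(1,1) clustered primary key comes from the recommendation above.

```sql
-- Sketch: new table with an INT IDENTITY clustered primary key.
-- dbo.Product and Name are hypothetical.
CREATE TABLE dbo.Product
(
    ID   INT IDENTITY(1,1) NOT NULL,
    Name NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ID)
);
```

Spelling out CLUSTERED is optional here, since that is the default for a primary key, but it makes the intent obvious to whoever reads the script later.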


#4

If you use a GUID as the primary key and create a clustered index on it, then I suggest giving it a default value of NEWSEQUENTIALID().
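A minimal sketch of that suggestion, with made-up table and column names (dbo.Orders, OrderID, OrderDate). Note that NEWSEQUENTIALID() can only be used in a DEFAULT constraint, and that the default only applies when the insert does not supply a value, so an application that assigns its own Guid bypasses it.

```sql
-- Sketch: clustered GUID primary key with a sequential default.
-- dbo.Orders and its columns are hypothetical.
CREATE TABLE dbo.Orders
(
    OrderID   UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Orders_OrderID DEFAULT (NEWSEQUENTIALID()),
    OrderDate DATETIME NOT NULL
        CONSTRAINT DF_Orders_OrderDate DEFAULT (GETUTCDATE()),
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

-- Let the server fill in both defaults, then look at the generated keys.
INSERT INTO dbo.Orders DEFAULT VALUES;
INSERT INTO dbo.Orders DEFAULT VALUES;

SELECT OrderID, OrderDate
FROM   dbo.Orders
ORDER BY OrderID;
```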


#5

I've been using GUIDs as PKs since 2005. In this distributed database world, it is absolutely the best way to merge distributed data. You can fire-and-forget merge tables without all the worry of ints matching across joined tables. GUID joins can be copied without any worry.

This is my setup for using GUIDs:

  1. PK = GUID. GUIDs are indexed similarly to strings, so tables with very high row counts (over 50 million records) may need table partitioning or other performance techniques. SQL Server is getting extremely efficient, so performance concerns are less and less applicable.

  2. The PK Guid is a NON-clustered index. Never put the clustered index on a GUID unless it is a NewSequentialID. But even then, a server reboot will cause major breaks in ordering.

  3. Add ClusterID Int to every table. This is your CLUSTERED index... the one that orders your table.

  4. Joining on ClusterIDs (int) is more efficient, but I work with tables of 20-30 million records, so joining on GUIDs doesn't visibly affect performance. If you want maximum performance, use the ClusterID concept as your primary key and join on ClusterID.

Here is my Email table...

CREATE TABLE [Core].[Email] (
    [EmailID]      UNIQUEIDENTIFIER CONSTRAINT [DF_Email_EmailID] DEFAULT (newsequentialid()) NOT NULL,
    [EmailAddress] NVARCHAR (50)    CONSTRAINT [DF_Email_EmailAddress] DEFAULT ('') NOT NULL,
    [CreatedDate]  DATETIME         CONSTRAINT [DF_Email_CreatedDate] DEFAULT (getutcdate()) NOT NULL,
    [ClusterID]    INT              NOT NULL IDENTITY,
    CONSTRAINT [PK_Email] PRIMARY KEY NONCLUSTERED ([EmailID] ASC)
);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_Email_ClusterID] ON [Core].[Email] ([ClusterID])
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_Email_EmailAddress] ON [Core].[Email] ([EmailAddress] ASC)
GO

#6

I am currently developing a web application with EF Core and here is the pattern I use:

All my classes (tables) have an int PK and FK. I have an additional column of type Guid (generated by the C# constructor) with a non-clustered index on it.

All the joins between tables within EF are managed through the int keys, while all access from outside (controllers) is done with the Guids.

This solution keeps the int keys out of the URLs while keeping the model tidy and fast.
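At the database level, that pattern might look roughly like the sketch below. All names (dbo.Customer, dbo.CustomerOrder, PublicID, Total) are invented for illustration, and making the Guid index UNIQUE is my assumption; the description above only says "non-clustered". The sample Guid value is arbitrary.

```sql
-- Sketch: int keys for storage and joins, Guid only as the public identifier.
-- All object names are hypothetical.
CREATE TABLE dbo.Customer
(
    CustomerID INT IDENTITY(1,1) NOT NULL,
    PublicID   UNIQUEIDENTIFIER  NOT NULL,  -- assigned by the application
    Name       NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
);

CREATE UNIQUE NONCLUSTERED INDEX IX_Customer_PublicID
    ON dbo.Customer (PublicID);

CREATE TABLE dbo.CustomerOrder
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT               NOT NULL,
    Total      DECIMAL(10,2)     NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (OrderID),
    CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
);

-- A request arrives with the Guid from the URL; the join itself uses the int key.
DECLARE @publicId UNIQUEIDENTIFIER = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

SELECT  o.OrderID, o.Total
FROM    dbo.Customer      AS c
JOIN    dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID
WHERE   c.PublicID = @publicId;
```

The Guid is looked up once through its non-clustered index, and everything after that runs on 4-byte int keys.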
