What are the best practices for using a GUID as a primary key, specifically regarding performance?

I have an application that uses a GUID as the primary key in almost all tables, and I have read that there are performance issues when using a GUID as the primary key. Honestly, I haven't seen any problems, but I'm about to start a new application and I still want to use GUIDs as the primary keys. I was thinking of using a composite primary key (the GUID and maybe another field).

I'm using GUIDs because they are nice and easy to manage when you have different environments such as "production", "test" and "dev" databases, and also for migrating data between databases.

I will use Entity Framework 4.3 and I want to assign the Guid in the application code, before inserting it into the database (i.e. I don't want to let SQL Server generate the Guid).

What is the best practice for creating GUID-based primary keys, in order to avoid the supposed performance hits associated with this approach?


Reply #1

Reference: https://stackoom.com/question/o5d6/使用GUID作为主键的最佳实践是什么-特别是在性能方面


Reply #2

This link says it better than I could and helped in my decision making. I usually opt for an int as a primary key unless I have a specific need not to, and I also let SQL Server auto-generate and maintain this field unless I have some specific reason not to. In reality, performance concerns need to be evaluated for your specific app. There are many factors at play here, including but not limited to expected database size, proper indexing, and efficient querying. Although people may disagree, I think in many scenarios you will not notice a difference with either option, and you should choose what is more appropriate for your app and what allows you to develop more easily, quickly, and effectively. (If you never complete the app, what difference does the rest make? :)

https://web.archive.org/web/20120812080710/http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

P.S. I'm not sure why you would use a composite PK or what benefit you believe that would give you.


Reply #3

GUIDs may seem to be a natural choice for your primary key, and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't have to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times, a GUID as the clustering key isn't optimal: due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
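
One way to see whether fragmentation is actually a problem on a given table is to ask SQL Server directly. The query below is only a sketch: it assumes a table named dbo.MyTable (the name used in the example further down in this answer) and uses the sys.dm_db_index_physical_stats function in its cheapest 'LIMITED' scan mode.

```sql
-- Sketch: list fragmentation per index for one table.
-- dbo.MyTable is an assumed name; replace it with your own table.
SELECT  i.name                           AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                     -- current database
            OBJECT_ID(N'dbo.MyTable'),   -- table to inspect
            NULL,                        -- all indexes
            NULL,                        -- all partitions
            'LIMITED') AS ips            -- fastest scan mode
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Comparing the numbers for a table clustered on a random GUID against one clustered on an INT IDENTITY column, after a similar volume of inserts, should make the difference visible on your own data.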

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus suffers from the same problems as a random GUID, just a bit less prominently.

Then there's another issue to consider: the clustering key on a table is added to each and every entry in each and every nonclustered index on that table as well, so you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT (4 bytes) vs. GUID (16 bytes) as primary and clustering key:

  • Base table with 1'000'000 rows (3.81 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: roughly 27 MB vs. 107 MB - and that's just on a single table!
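
The per-line figures in the quick calculation come from multiplying the row count by the key width (4 bytes for INT, 16 bytes for UNIQUEIDENTIFIER) and converting to megabytes. As a rough sketch, the same arithmetic can be run in T-SQL (SQL Server 2008 or later). It counts key bytes only and ignores row and page overhead, so it is an estimate rather than a measurement.

```sql
-- Sketch: key-only storage estimate, ignoring row and page overhead.
DECLARE @rows       BIGINT = 1000000;  -- row count from the example
DECLARE @nc_indexes INT    = 6;        -- number of nonclustered indexes

SELECT  k.key_type,
        -- the clustering key is stored once per row in the base table
        CAST(@rows * k.key_bytes / 1048576.0 AS DECIMAL(10,2))               AS base_table_mb,
        -- and once more per row in every nonclustered index
        CAST(@rows * k.key_bytes * @nc_indexes / 1048576.0 AS DECIMAL(10,2)) AS nc_indexes_mb
FROM    (VALUES ('INT', 4), ('UNIQUEIDENTIFIER', 16)) AS k(key_type, key_bytes);
```

Changing @rows or @nc_indexes shows how quickly the gap grows on larger or more heavily indexed tables.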

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! 再想一想-金伯利·特里普(Kimberly Tripp)的优秀著作-读它,再读一次,消化它! It's the SQL Server indexing gospel, really. 确实,这是SQL Server索引的福音。

PS: of course, if you're dealing with just a few hundred or a few thousand rows, most of these arguments won't really have much of an impact on you. However, if you get into the tens or hundreds of thousands of rows, or you start counting in millions, then those points become crucial to understand.

Update: if you want to have your PKGUID column as your primary key (but not your clustering key), and another column MyINT (INT IDENTITY) as your clustering key, use this:

CREATE TABLE dbo.MyTable
(
    PKGUID UNIQUEIDENTIFIER NOT NULL,
    MyINT  INT IDENTITY(1,1) NOT NULL
    -- ... add more columns as needed ...
);

-- Primary key on the GUID, explicitly NONCLUSTERED
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID);

-- Clustered index on the narrow, ever-increasing INT column
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT);

Basically: you just have to explicitly tell the PRIMARY KEY constraint that it's NONCLUSTERED (otherwise it's created as your clustered index by default), and then you create a second index that's defined as CLUSTERED.

This will work, and it's a valid option if you have an existing system that needs to be "re-engineered" for performance. For a new system, if you start from scratch and you're not in a replication scenario, then I'd always pick ID INT IDENTITY(1,1) as my clustered primary key - much more efficient than anything else!
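
For the "re-engineering" case, one possible sequence is sketched below. It rests on assumptions that are not stated in the answer: the table dbo.MyTable already exists with a clustered primary key PK_MyTable on PKGUID, it does not yet have a MyINT column, and no foreign keys reference the primary key (they would have to be dropped and re-created around these steps). Try it on a copy first, since every step rebuilds the table or its indexes.

```sql
-- Sketch: move an existing table from a clustered GUID PK
-- to a clustered INT IDENTITY key plus a nonclustered GUID PK.

-- 1. Drop the clustered primary key; the table becomes a heap.
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
GO

-- 2. Add the narrow, ever-increasing column; existing rows get values automatically.
ALTER TABLE dbo.MyTable ADD MyINT INT IDENTITY(1,1) NOT NULL;
GO

-- 3. Cluster the table on the new column.
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT);
GO

-- 4. Re-create the primary key on the GUID, this time NONCLUSTERED.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (PKGUID);
GO
```

Creating the clustered index before the nonclustered primary key avoids building the nonclustered index twice, because nonclustered indexes are rebuilt when a heap is converted to a clustered table.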


Reply #4

If you use a GUID as the primary key and create a clustered index on it, then I suggest giving it a default value of NEWSEQUENTIALID().
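
A minimal sketch of that suggestion follows. The table and constraint names are made up for illustration. Note that NEWSEQUENTIALID() can only be used in a DEFAULT constraint, so the value is generated by SQL Server rather than in application code, which is not what the question asks for.

```sql
-- Sketch: clustered GUID primary key filled by NEWSEQUENTIALID().
-- dbo.MySequentialTable and the constraint names are hypothetical.
CREATE TABLE dbo.MySequentialTable
(
    ID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_MySequentialTable_ID DEFAULT NEWSEQUENTIALID(),
    CONSTRAINT PK_MySequentialTable PRIMARY KEY CLUSTERED (ID)
);
GO

-- Insert a few rows and let the default generate the keys.
INSERT INTO dbo.MySequentialTable DEFAULT VALUES;
INSERT INTO dbo.MySequentialTable DEFAULT VALUES;
INSERT INTO dbo.MySequentialTable DEFAULT VALUES;

-- The generated values increase from row to row (until the server restarts).
SELECT ID FROM dbo.MySequentialTable ORDER BY ID;
GO
```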


Reply #5

I've been using GUIDs as PKs since 2005. In this distributed database world, it is absolutely the best way to merge distributed data. You can fire and forget merge tables without all the worry of ints matching across joined tables. GUID joins can be copied without any worry.

This is my setup for using GUIDs:

  1. PK = GUID. GUIDs are indexed similarly to strings, so tables with many rows (over 50 million records) may need table partitioning or other performance techniques. SQL Server is getting extremely efficient, so performance concerns are less and less applicable.

  2. The PK Guid is a NONCLUSTERED index. Never put a clustered index on a GUID unless it is generated by NewSequentialID. But even then, a server reboot will cause major breaks in ordering.

  3. Add ClusterID Int to every table. This is your CLUSTERED index, which orders your table.

  4. Joining on ClusterIDs (int) is more efficient, but I work with tables of 20-30 million records, so joining on GUIDs doesn't visibly affect performance. If you want max performance, use the ClusterID concept as your primary key and join on ClusterID.

Here is my Email table...

CREATE TABLE [Core].[Email] (
    [EmailID]      UNIQUEIDENTIFIER CONSTRAINT [DF_Email_EmailID] DEFAULT (newsequentialid()) NOT NULL,
    [EmailAddress] NVARCHAR (50)    CONSTRAINT [DF_Email_EmailAddress] DEFAULT ('') NOT NULL,
    [CreatedDate]  DATETIME         CONSTRAINT [DF_Email_CreatedDate] DEFAULT (getutcdate()) NOT NULL,
    [ClusterID]    INT              NOT NULL IDENTITY,
    CONSTRAINT [PK_Email] PRIMARY KEY NONCLUSTERED ([EmailID] ASC)
);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_Email_ClusterID] ON [Core].[Email] ([ClusterID])
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_Email_EmailAddress] ON [Core].[Email] ([EmailAddress] ASC)
GO

Reply #6

I am currently developing a web application with EF Core and here is the pattern I use:

All my classes (tables) have an int PK and FK. I have an additional column of type Guid (generated by the C# constructor) with a nonclustered index on it.

All the joins between tables within EF are managed through the int keys, while all access from outside (controllers) is done with the Guids.

This solution avoids exposing the int keys in URLs but keeps the model tidy and fast.
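
At the database level, this pattern could look roughly like the sketch below. The table, column, and index names are invented for illustration, and making the Guid index UNIQUE is an assumption (the post only says "non clustered"). NEWID() stands in here for a Guid that the application would normally generate.

```sql
-- Sketch: int keys for joins, a Guid column for external lookups.
-- All object names are hypothetical.
CREATE TABLE dbo.Customer
(
    CustomerId INT IDENTITY(1,1) NOT NULL,
    PublicId   UNIQUEIDENTIFIER  NOT NULL,   -- assigned by the application
    Name       NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
);

CREATE TABLE dbo.CustomerOrder
(
    CustomerOrderId INT IDENTITY(1,1) NOT NULL,
    CustomerId      INT NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY REFERENCES dbo.Customer (CustomerId),
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (CustomerOrderId)
);

-- Nonclustered index so lookups by the public Guid stay cheap.
CREATE UNIQUE NONCLUSTERED INDEX IX_Customer_PublicId
    ON dbo.Customer (PublicId);
GO

-- A request arrives with the Guid; the join itself runs on the int keys.
DECLARE @publicId UNIQUEIDENTIFIER = NEWID();  -- stand-in for an application-generated Guid

INSERT INTO dbo.Customer (PublicId, Name) VALUES (@publicId, N'Example');

SELECT  c.CustomerId, c.Name, o.CustomerOrderId
FROM    dbo.Customer AS c
LEFT JOIN dbo.CustomerOrder AS o
        ON o.CustomerId = c.CustomerId
WHERE   c.PublicId = @publicId;
GO
```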
