Why use the INCLUDE clause when creating an index?

Question

While studying for the 70-433 exam I noticed you can create a covering index in one of the following two ways.

CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
-- OR --

CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
The INCLUDE clause is new to me. Why would you use it and what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?

 

Answer 1

Use INCLUDE for a column that does not appear in the WHERE/JOIN/GROUP BY/ORDER BY clauses, but only in the column list of the SELECT clause.

The INCLUDE clause adds the data at the lowest (leaf) level only, rather than throughout the index tree. This keeps the upper levels of the index smaller, because included columns are not part of the tree.

INCLUDE columns are not key columns in the index, so they are not ordered. This means they are not really useful for predicates, sorting, etc., as mentioned above. However, they may be useful when a residual predicate has to be checked against the few rows found through the key column(s).
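
As a rough illustration of that guideline, here is a minimal sketch using the question's MyTable. The query shape and the assumption that Col1 is an integer column are mine, not from the original post: Col1 is searched on, so it is the key; Col2 and Col3 are only returned, so they are included.

```sql
-- Col1 is the only key column; Col2 and Col3 are stored at the leaf level only.
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3);

-- Assumed query shape: Col1 is filtered on (assumed to be an int column),
-- while Col2 and Col3 appear only in the SELECT list.
SELECT Col2, Col3
FROM MyTable
WHERE Col1 = 42;
```

If Col2 or Col3 also showed up in the WHERE or ORDER BY clause, making them key columns would be the option to weigh instead.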

Another MSDN article with a worked example

 

Answer 2

You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.

Imagine you need to query for an employee's ID, department ID, and last name.

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5
If you happen to have a non-clustered index on (EmployeeID, DepartmentID), then once you find the employees for a given department, you have to do a "bookmark lookup" to get the actual full employee record, just to get the LastName column. That can get pretty expensive in terms of performance if you find a lot of employees.

If you had included LastName in your index:

CREATE NONCLUSTERED INDEX NC_EmpDep
  ON Employee (EmployeeID, DepartmentID)
  INCLUDE (LastName)
then all the information you need is available in the leaf level of the non-clustered index. Just by reading the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary, which saves a lot of time.

Obviously, you cannot include every column in every non-clustered index, but if you have frequently used queries that are missing just one or two columns to be "covered", it can be very helpful to INCLUDE those in a suitable non-clustered index.
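
One caveat worth sketching: with EmployeeID as the leading key column, a filter on DepartmentID alone would generally be answered by scanning the index rather than seeking into it. A hypothetical variant that lets the same query seek puts DepartmentID first in the key and includes the columns that are only returned. The index name NC_Dep_Covering is made up for illustration.

```sql
-- Hypothetical variant: DepartmentID leads the key so WHERE DepartmentID = 5 can seek.
-- EmployeeID and LastName are only returned, so they are included rather than keyed.
CREATE NONCLUSTERED INDEX NC_Dep_Covering
  ON Employee (DepartmentID)
  INCLUDE (EmployeeID, LastName);

-- The query from the answer, unchanged; every column it needs is in the index leaf.
SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5;
```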

 

Answer 3

This discussion is missing an important point: the question is not whether the non-key columns are better placed in the index as key columns or as included columns.

The question is how expensive it is to use the INCLUDE mechanism for columns that the index itself does not really need (typically columns that are not part of WHERE clauses but often appear in SELECT lists). So your dilemma is always:

1. Use an index on id1, id2 ... idN alone, or
2. Use an index on id1, id2 ... idN plus INCLUDE col1, col2 ... colN
where id1, id2 ... idN are columns often used in restrictions, and col1, col2 ... colN are columns often selected but typically not used in restrictions.

(Making all of these columns part of the index key is always a poor choice unless they are also used in restrictions, because the index is more expensive to maintain: it must be updated and re-sorted even when the "keys" have not changed.)
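
The two options, and the all-keys alternative the answer advises against, might look like this. It is a sketch only: the table dbo.OptionDemo, its int columns, and the index names are hypothetical, with id1, id2 standing in for id1 ... idN and col1, col2 for col1 ... colN.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE dbo.OptionDemo (
    id1  int NOT NULL,
    id2  int NOT NULL,
    col1 int NULL,
    col2 int NULL
);

-- Option 1: key columns only.
CREATE NONCLUSTERED INDEX IX_OptionDemo_Keys
  ON dbo.OptionDemo (id1, id2);

-- Option 2: same key, plus the frequently selected columns carried at the leaf level.
CREATE NONCLUSTERED INDEX IX_OptionDemo_Include
  ON dbo.OptionDemo (id1, id2)
  INCLUDE (col1, col2);

-- The alternative argued against above: every column is part of the sorted key.
CREATE NONCLUSTERED INDEX IX_OptionDemo_AllKeys
  ON dbo.OptionDemo (id1, id2, col1, col2);
```

In practice you would pick one of these rather than create all three on the same table.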

So use option 1 or 2?

Answer: If your table is rarely updated (mostly inserted into or deleted from), then it is relatively inexpensive to use the INCLUDE mechanism for some "hot columns" (often used in selects but not often used in restrictions). Inserts and deletes require the index to be updated and sorted anyway, so storing a few extra columns while already updating the index adds little overhead. The cost is the extra memory and CPU used to store redundant information in the index.

If the columns you are considering as included columns are often updated (without the index key columns being updated), or if there are so many of them that the index becomes close to a copy of your table, I'd suggest option 1. Also, if adding certain included columns turns out to make no difference to performance, you might want to skip them. Verify that they are useful!

The average number of rows per distinct combination of key values (id1, id2 ... idN) can matter as well.

Note that if an included column is used in a restriction, then as long as the index can be used at all (based on the restrictions against the index key columns), SQL Server matches that column's restriction against the values stored in the index leaf nodes instead of going the expensive way through the table itself.
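
One possible way to do that verification is to compare logical reads for a representative query before and after adding the included columns, and to check how often the index is read versus maintained. In this sketch dbo.SomeTable and its columns are placeholders for your own table; SET STATISTICS IO and sys.dm_db_index_usage_stats are standard SQL Server features.

```sql
-- Report logical reads per table for the statements that follow.
SET STATISTICS IO ON;

-- Placeholder query: run it once without and once with the included columns
-- and compare the "logical reads" figures in the Messages output.
SELECT id1, id2, col1, col2
FROM dbo.SomeTable
WHERE id1 = 1 AND id2 = 2;

SET STATISTICS IO OFF;

-- How often each index on the table has been read (seeks, scans, lookups)
-- versus written (updates) since the usage statistics were last reset.
SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND s.object_id   = OBJECT_ID(N'dbo.SomeTable');
```

An index with many user_updates and few reads is a candidate for trimming its included columns.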
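
A small sketch of that last point, using a hypothetical table dbo.ResidualDemo and made-up names: the key columns drive the seek, and the restriction on the included column can then be checked against the leaf rows of the same index.

```sql
-- Hypothetical table and index used only for this illustration.
CREATE TABLE dbo.ResidualDemo (
    id1  int NOT NULL,
    id2  int NOT NULL,
    col1 int NULL,
    col2 int NULL
);

CREATE NONCLUSTERED INDEX IX_ResidualDemo
  ON dbo.ResidualDemo (id1, id2)
  INCLUDE (col1, col2);

SELECT id1, id2, col1, col2
FROM dbo.ResidualDemo
WHERE id1 = 1 AND id2 = 2   -- restrictions on key columns: usable for an index seek
  AND col1 > 100;           -- restriction on an included column: checked on leaf rows
```

Because col1 is stored in the leaf level, the query needs nothing from the base table; the actual plan chosen still depends on the optimizer and the data.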

 

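What does "included columns" mean for a SQL Server database index?

An index definition has an item called "included columns". What does it do?

Suppose a table has columns A, B, and C; column A is indexed, and the index has no included columns.

After a record is found through the index on A, there is a mapping step (a lookup) to fetch the corresponding B and C values.

That lookup for B and C takes time.

Now suppose B and C are added as included columns.

Then no lookup is needed: B and C sit right beside A and are returned directly.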

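Advantages

As described above: one less lookup, so it is faster. In some scenarios, an efficiency gain of 80% is easily achieved.

Disadvantages

The index gets bigger, of course: the index pages used to store only A, and now they have to store A, B, and C.

What is the difference between included columns and adding B and C to the index key?

There is a difference. If B and C are not part of the key, they merely sit beside A and follow wherever A is sorted. If B and C are part of the key, they take part in the sort order too, so changes to their values affect the entry's position in the index pages.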
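
The difference can be sketched with the A, B, C table from the description above. The table name dbo.T, the int column types, and the index names are assumptions made for illustration.

```sql
-- Hypothetical table matching the A, B, C description.
CREATE TABLE dbo.T (
    A int NOT NULL,
    B int NOT NULL,
    C int NOT NULL
);

-- B and C as included columns: stored beside A at the leaf level, not sorted.
CREATE NONCLUSTERED INDEX IX_T_A_Include
  ON dbo.T (A)
  INCLUDE (B, C);

-- B and C as key columns: entries are sorted by (A, B, C),
-- so changing B or C can move an entry within the index.
CREATE NONCLUSTERED INDEX IX_T_ABC
  ON dbo.T (A, B, C);

-- Either index covers this query.
SELECT A, B, C FROM dbo.T WHERE A = 1;

-- Only the (A, B, C) key order already matches this ORDER BY;
-- with the INCLUDE version the rows for A = 1 are not ordered by B.
SELECT A, B, C FROM dbo.T WHERE A = 1 ORDER BY B;
```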

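How to set included columns

In the visual designer, there is an "Included columns" field near the bottom; separate multiple columns with commas.

The syntax uses INCLUDE: INCLUDE([Field1],[Field2])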

 
