Hi,
Thanks for the reply. One of the article
(http://www.sql-server-performance.c...red_indexes.asp) was the exact
one that confuses me.
It mentions...
"As a rule of thumb, every table should have a clustered index. Generally,
but not always, the clustered index should be on a column that monotonically
increases--such as an identity column, or some other column where the value
is increasing--and is unique. In many cases, the primary key is the ideal
column for a clustered index."
Which in my understanding basically says... Primary key should use clustered
index.
But in another paragraph...
"The primary key you select for your table should not always be a clustered
index. If you create the primary key and don't specify otherwise, this is th
e
default. Only make the primary key a clustered index if you will be regularl
y
performing range queries on the primary key or need your results sorted by
the primary key. [6.5, 7.0, 2000] Updated 3-5-2004"
They sound contradictory to me.
And on (http://www.sql-server-performance.c...red_indexes.asp), it
states
"If a column in a table is not at least 95% unique, then most likely the
query optimizer will not use a non-clustered index based on that column.
Because of this, don't add non-clustered indexes to columns that aren't at
least 95% unique. For example, a column with "yes" or "no" as the data won't
be at least 95% unique. [6.5, 7.0, 2000] Updated 3-4-2004"
Which sounds more logical to me.
1) Do you have general rule of thumb on this matter:
- clustered or non-clustered index for primary key?
- clustered or non-clustered index for foreign key?
2) Also does my understanding of the functions of clustered and
non-clustered index correct?
Could u please kindly advise. TQ again.
"Uri Dimant" wrote:
> Hi
> This is very,very big and important issue.
> Start with
> http://www.sql-server-performance.c...red_indexes.asp
> http://www.sql-server-performance.c...red_indexes.asp
> http://www.sql-server-performance.c...ing_indexes.asp
> "HardKhor" <HardKhor@.discussions.microsoft.com> wrote in message
> news:BABFB6D4-09F7-482B-89F9-DF10DFCF1242@.microsoft.com...
> the
> to
> clustered
> So
> column.
>
>1) Do you have general rule of thumb on this matter:
- clustered or non-clustered index for primary key?
- clustered or non-clustered index for foreign key?
It depends on your tables, data, and business requirements. In my opinion CI
is more efficient on a range queries as BETWEEN and greater,lower that
returns a set of data.NCI is more effcient to retrieve a few rows or single
row from the table.
You have to choose what column to be defined as primary key and on what
column create CI.
Yes, it is considered a good practice to create a NCI on foreign key to
improve JOIN performance.
2) Also does my understanding of the functions of clustered and
non-clustered index correct?
> Which in my understanding basically says... Primary key should use
clustered
> index.
Up to you.
"HardKhor" <HardKhor@.discussions.microsoft.com> wrote in message
news:05A5C898-E9AD-4B0F-9F43-E0E0BA75A9EA@.microsoft.com...
> Hi,
> Thanks for the reply. One of the article
> (http://www.sql-server-performance.c...red_indexes.asp) was the
exact
> one that confuses me.
> It mentions...
> "As a rule of thumb, every table should have a clustered index. Generally,
> but not always, the clustered index should be on a column that
monotonically
> increases--such as an identity column, or some other column where the
value
> is increasing--and is unique. In many cases, the primary key is the ideal
> column for a clustered index."
> Which in my understanding basically says... Primary key should use
clustered
> index.
> But in another paragraph...
> "The primary key you select for your table should not always be a
clustered
> index. If you create the primary key and don't specify otherwise, this is
the
> default. Only make the primary key a clustered index if you will be
regularly
> performing range queries on the primary key or need your results sorted by
> the primary key. [6.5, 7.0, 2000] Updated 3-5-2004"
> They sound contradictory to me.
> And on (http://www.sql-server-performance.c...red_indexes.asp),
it
> states
> "If a column in a table is not at least 95% unique, then most likely the
> query optimizer will not use a non-clustered index based on that column.
> Because of this, don't add non-clustered indexes to columns that aren't at
> least 95% unique. For example, a column with "yes" or "no" as the data
won't
> be at least 95% unique. [6.5, 7.0, 2000] Updated 3-4-2004"
> Which sounds more logical to me.
> 1) Do you have general rule of thumb on this matter:
> - clustered or non-clustered index for primary key?
> - clustered or non-clustered index for foreign key?
> 2) Also does my understanding of the functions of clustered and
> non-clustered index correct?
> Could u please kindly advise. TQ again.
> "Uri Dimant" wrote:
>
a
However
or
value
index
directly
I
clustered
index.
key,
child
clustered|||Watch out for rule-of-thumbs!
You have to weigh in both retrieval and inserting of data. The part about cr
eating cl ix on a column
which is monotonically increasing is all about inserting data. This is so th
at all data goes to the
end of the table and you don't get page split in your data (remember that cl
ix = data). But it
doesn't address how you search data. Fine, cluster on your identify PK, but
perhaps it is more
important to support range queries on a datetime or lastname column? Also, r
emember that page splits
can be handled using index defragmentation, and perhaps the table isn't that
insert intensitive in
the first place?
I don't know where they get the 95% unique number. It is all about selectivi
ty. Understand how SQL
Server can use a clustered index and also non-clustered indexes (navigate th
e index then bookmark
lookups). Also remember that the optimizer is cost based.
Based on above, you should be able to make informed decisions of what to clu
ster on.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"HardKhor" <HardKhor@.discussions.microsoft.com> wrote in message
news:05A5C898-E9AD-4B0F-9F43-E0E0BA75A9EA@.microsoft.com...
> Hi,
> Thanks for the reply. One of the article
> (http://www.sql-server-performance.c...red_indexes.asp) was the exac
t
> one that confuses me.
> It mentions...
> "As a rule of thumb, every table should have a clustered index. Generally,
> but not always, the clustered index should be on a column that monotonical
ly
> increases--such as an identity column, or some other column where the valu
e
> is increasing--and is unique. In many cases, the primary key is the ideal
> column for a clustered index."
> Which in my understanding basically says... Primary key should use cluster
ed
> index.
> But in another paragraph...
> "The primary key you select for your table should not always be a clustere
d
> index. If you create the primary key and don't specify otherwise, this is
the
> default. Only make the primary key a clustered index if you will be regula
rly
> performing range queries on the primary key or need your results sorted by
> the primary key. [6.5, 7.0, 2000] Updated 3-5-2004"
> They sound contradictory to me.
> And on (http://www.sql-server-performance.c...red_indexes.asp), i
t
> states
> "If a column in a table is not at least 95% unique, then most likely the
> query optimizer will not use a non-clustered index based on that column.
> Because of this, don't add non-clustered indexes to columns that aren't at
> least 95% unique. For example, a column with "yes" or "no" as the data won
't
> be at least 95% unique. [6.5, 7.0, 2000] Updated 3-4-2004"
> Which sounds more logical to me.
> 1) Do you have general rule of thumb on this matter:
> - clustered or non-clustered index for primary key?
> - clustered or non-clustered index for foreign key?
> 2) Also does my understanding of the functions of clustered and
> non-clustered index correct?
> Could u please kindly advise. TQ again.
> "Uri Dimant" wrote:
>|||On Wed, 20 Jul 2005 11:57:06 +0300, "Uri Dimant" <urid@.iscar.co.il> wrote:
in <uqvI#jQjFHA.4000@.TK2MSFTNGP12.phx.gbl>
>Yes, it is considered a good practice to create a NCI on foreign key to
>improve JOIN performance.
Is it better, worse, or no difference if the foreign key is also a CI on its
own
PK?
Consider the following:
CREATE TABLE ShowTime.dbo.Countries (
CountryID TINYINT NOT NULL CONSTRAINT PK_Countries PRIMARY KEY,
CountryCode CHAR(3) NOT NULL,
Country VARCHAR(45) NOT NULL)
CREATE TABLE ShowTime.dbo.States (
StateID TINYINT NOT NULL CONSTRAINT PK_States PRIMARY KEY,
CountryID TINYINT NOT NULL CONSTRAINT FK_States_Country FOREIGN KEY
(CountryID) REFERENCES ShowTime.dbo.Countries(CountryID),
State CHAR(2) NOT NULL)
I'm interpreting your remark to mean in the absence of an index on the forei
gn
key it would be considered to be good practice to create a NCI?
Stefan Berglund|||Tibor Karaszi wrote:
> Watch out for rule-of-thumbs!
> You have to weigh in both retrieval and inserting of data. The part
> about creating cl ix on a column which is monotonically increasing is
> all about inserting data. This is so that all data goes to the end of
> the table and you don't get page split in your data (remember that cl
> ix = data). But it doesn't address how you search data. Fine, cluster
> on your identify PK, but perhaps it is more important to support
> range queries on a datetime or lastname column? Also, remember that
> page splits can be handled using index defragmentation, and perhaps
> the table isn't that insert intensitive in the first place?
> I don't know where they get the 95% unique number. It is all about
> selectivity. Understand how SQL Server can use a clustered index and
> also non-clustered indexes (navigate the index then bookmark
> lookups). Also remember that the optimizer is cost based.
> Based on above, you should be able to make informed decisions of what
> to cluster on.
I would add that clustered index keys are stored internally by SQL
Server as unique values and the clustered index key values are the
pointers in all non-clustered indexes. This has a couple implications:
1- Your clustered index key should be small if your table contains
non-clustered indexes. If you cluster on a large key (say 100 bytes),
then you've added 100 bytes to each key value in every non-clustered
index.
2- If you cluster on a non-unique column, SQL Server will make it
unique internally. While this is transparent to database users, it
further increases the size of the index and all non-clustered indexes.
David Gugick
Quest Software
www.imceda.com
www.quest.com|||"Tibor Karaszi" wrote:
> Watch out for rule-of-thumbs!
>
1) TQ for ur reply.
Yes I understand that is no one-size-fits-all approach to this. Which is why
I am trying to understand from a conceptual point of view, difference betwee
n
clustered and non-clustered index and their application, before I can make
informed decisions on how they fit into my application.
However I don't believe anyone has given a clear answer on whether my
understanding on them is correct. I.e. essentially, a clustered index is a
sorted group of rows under a common index value (1 index value = M rows).
Whereas a non-clustered index is, basically 1 index value = 1 row. Could u
pls kindly advise, essentially, is my understanding correct?
2) My database has plenty of parent-and-child tables, e.g. Invoice and
Invoice Item. So the child table [InvoiceItem] has a foreign key
[InvoiceCode] referring back to the primary key [Code] of [Invoice] table
(1-M relationship).
So when query the database to populate my biz reports, a very common
requirement is to join the parent and child tables along their keys.
Also in a typical requirement, I will need join the results from the
parent-child join, with other tables, such as [Item] which contains details
on the [InvoiceItem], E.g. [Item].[Code] = [InvoiceItem].[ItemCode].
Therefore there is a lot of 1-M joins.
So based on my understanding of clustered and non-clustered indexes and how
SQL Server uses them to search for data, at least in the case of parent-chil
d
relationship, it makes more sense to set a clustered index at child's foreig
n
key column, i.e. [InvoiceItem].[InvoiceCode], and non-clustered index at the
parent's primary key column, i.e. [Invoice].[Code]. Do u agree?
> You have to weigh in both retrieval and inserting of data. The part about
creating cl ix on a column
> which is monotonically increasing is all about inserting data. This is so
that all data goes to the
> end of the table and you don't get page split in your data (remember that
cl ix = data). But it
> doesn't address how you search data. Fine, cluster on your identify PK, bu
t perhaps it is more
> important to support range queries on a datetime or lastname column?
3) Based on ur explaination, the majority of range queries that I do is
along the date fields of various transaction tables, e.g. finding the
invoices for this mth, last mth, last year, 10 years ago, etc. So it is also
POSSIBLE to consider [TransactionDate] as a candidate for cluster index? Do u
agree?
> I don't know where they get the 95% unique number. It is all about selecti
vity. Understand how SQL
> Server can use a clustered index and also non-clustered indexes (navigate
the index then bookmark
> lookups). Also remember that the optimizer is cost based.
3) My understanding of selectivity means how fast data can be uniquely
identified. So it implies a preference towards a field with 95% unqiue
values. Hence the 95% unique number thingy. Is that true?
> Based on above, you should be able to make informed decisions of what to c
luster on.
> --
> Tibor Karaszi, SQL Server MVP
> http://www.karaszi.com/sqlserver/default.asp
> http://www.solidqualitylearning.com/
> Blog: http://solidqualitylearning.com/blogs/tibor/
>
> "HardKhor" <HardKhor@.discussions.microsoft.com> wrote in message
> news:05A5C898-E9AD-4B0F-9F43-E0E0BA75A9EA@.microsoft.com...
>|||Inline...
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"HardKhor" <HardKhor@.discussions.microsoft.com> wrote in message
news:D5D39394-6183-46E5-B6FA-40B5957CB882@.microsoft.com...
> "Tibor Karaszi" wrote:
>
> 1) TQ for ur reply.
You're welcome :-)
> Yes I understand that is no one-size-fits-all approach to this. Which is w
hy
> I am trying to understand from a conceptual point of view, difference betw
een
> clustered and non-clustered index and their application, before I can make
> informed decisions on how they fit into my application.
> However I don't believe anyone has given a clear answer on whether my
> understanding on them is correct. I.e. essentially, a clustered index is a
> sorted group of rows under a common index value (1 index value = M rows).
> Whereas a non-clustered index is, basically 1 index value = 1 row. Could u
> pls kindly advise, essentially, is my understanding correct?
SQL Server does not store the value once and the number of occurances as a n
umber, if that is what
you mean. Bot cl and nc indexes are b-trees (there are good drawings on this
in Books Online).
For a cl ix, the leaf level *is the data*. I.e., it is sorted (the linked li
st) according to the
index key and the pages contains all the columns.
An nc ix leaf level only contains the index key and a "row locator"/bookmark
. This is used to
navigate to the corresponding page and row number on that pag.
>
> 2) My database has plenty of parent-and-child tables, e.g. Invoice and
> Invoice Item. So the child table [InvoiceItem] has a foreign key
> [InvoiceCode] referring back to the primary key [Code] of [Invoice] table
> (1-M relationship).
> So when query the database to populate my biz reports, a very common
> requirement is to join the parent and child tables along their keys.
> Also in a typical requirement, I will need join the results from the
> parent-child join, with other tables, such as [Item] which contains details
> on the [InvoiceItem], E.g. [Item].[Code] = [InvoiceItem].[ItemCode].
> Therefore there is a lot of 1-M joins.
> So based on my understanding of clustered and non-clustered indexes and ho
w
> SQL Server uses them to search for data, at least in the case of parent-ch
ild
> relationship, it makes more sense to set a clustered index at child's fore
ign
> key column, i.e. [InvoiceItem].[InvoiceCode], and non-clustered index at the
> parent's primary key column, i.e. [Invoice].[Code]. Do u agree?
Yes, cl on fk can often ba a good candidate. You still have to weigh in othe
r candidates as well as
consider fragmentation aspects.
>
> 3) Based on ur explaination, the majority of range queries that I do is
> along the date fields of various transaction tables, e.g. finding the
> invoices for this mth, last mth, last year, 10 years ago, etc. So it is al
so
> POSSIBLE to consider [TransactionDate] as a candidate for cluster index? Do u
> agree?
Yes.
>
> 3) My understanding of selectivity means how fast data can be uniquely
> identified. So it implies a preference towards a field with 95% unqiue
> values. Hence the 95% unique number thingy. Is that true?
Selectivity is quite simply for a certain table and a certain restriction (c
ondition in the WHERE
clause), how many percent of the total number of rows. 1000 rows in table, c
ondition returns 100
rows = 10% selectivity.
Density is another term we use. Here we *do not* involve a query or search c
ondition. We only look
at the data. Say you have a column named gender. Say we only have two possib
le values (male and
female). Say that both are represented in the table. We now have a density o
f 1/2 = 0.5. Or say that
we do business with 20 other companies. These 10 companies are all represent
ed in the table. For the
"company column", we have a density of 0.05 (1/20). When we calculatye densi
ty, we do not care about
the number of rows in the table.
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment