Hi,
Can somebody explain me the difference between primary key
and unique key and in which scenarios we use them. please
explain with a example.
Regards
ChipThe primary distinction is that Primary key column(s) must be non-nullable
while Unique constraints allow a NULL value. Also, there can be only one
primary key in a table while there can be more than one Unique constraints
defined for a table.
--
Anith|||Hi,
Primary key :
- The key which will not allow any NULL Values.
- Will Create a Clustered Index by default.
Unique Key
- The Key will allow 1 Null Value
- Will Create a non clustered index by default
Since Primary key will not allow NULLS values it can be used in scenario to
preserve more data integrity
Thanks
Hari
MCDBA
"Chip" <anonymous@.discussions.microsoft.com> wrote in message
news:12c4c01c3f6a9$9ac762d0$a401280a@.phx.gbl...
> Hi,
> Can somebody explain me the difference between primary key
> and unique key and in which scenarios we use them. please
> explain with a example.
> Regards
> Chipsql
Showing posts with label unique. Show all posts
Showing posts with label unique. Show all posts
Tuesday, March 27, 2012
Difference Between primary key and unique key
Hi,
Can somebody explain me the difference between primary key
and unique key and in which scenarios we use them. please
explain with a example.
Regards
ChipThe primary distinction is that Primary key column(s) must be non-nullable
while Unique constraints allow a NULL value. Also, there can be only one
primary key in a table while there can be more than one Unique constraints
defined for a table.
Anith|||Hi,
Primary key :
- The key which will not allow any NULL Values.
- Will Create a Clustered Index by default.
Unique Key
- The Key will allow 1 Null Value
- Will Create a non clustered index by default
Since Primary key will not allow NULLS values it can be used in scenario to
preserve more data integrity
Thanks
Hari
MCDBA
"Chip" <anonymous@.discussions.microsoft.com> wrote in message
news:12c4c01c3f6a9$9ac762d0$a401280a@.phx
.gbl...
> Hi,
> Can somebody explain me the difference between primary key
> and unique key and in which scenarios we use them. please
> explain with a example.
> Regards
> Chip
Can somebody explain me the difference between primary key
and unique key and in which scenarios we use them. please
explain with a example.
Regards
ChipThe primary distinction is that Primary key column(s) must be non-nullable
while Unique constraints allow a NULL value. Also, there can be only one
primary key in a table while there can be more than one Unique constraints
defined for a table.
Anith|||Hi,
Primary key :
- The key which will not allow any NULL Values.
- Will Create a Clustered Index by default.
Unique Key
- The Key will allow 1 Null Value
- Will Create a non clustered index by default
Since Primary key will not allow NULLS values it can be used in scenario to
preserve more data integrity
Thanks
Hari
MCDBA
"Chip" <anonymous@.discussions.microsoft.com> wrote in message
news:12c4c01c3f6a9$9ac762d0$a401280a@.phx
.gbl...
> Hi,
> Can somebody explain me the difference between primary key
> and unique key and in which scenarios we use them. please
> explain with a example.
> Regards
> Chip
Sunday, March 25, 2012
Difference between creating a Unique Constraint and a Unique Index
Hi,
I've looked at the online documentation and encountered this piece of
text
"If uniqueness must be enforced to ensure data integrity, create a
UNIQUE or PRIMARY KEY constraint on the column rather than a unique
index"
I'm baffled; I understand a unique contraint, and personally do not
care how it is physically enforced (using an index to speed up the
process) as long as it is enforced. However what is the deal with
unique indexes? The piece of documentation above seems to suggest that
a unique index will not enforce uniqueness.
The reason I ask is that I am working against a legacy system where
unique indexes are used for each table. Each table, besides a PK, has
a column that stores a GUID. Personally I would use a Unique
Contraint, but they have used a unique index on this column. What
would be the advantages and disadvantages of using a unique index vs a
unique constraint.'
regards
Paul SjoerdsmaYes, a unique index enforces uniqueness. A unique index is created in the
background when you specify a PK or Unique constraint.
The difference is more semantic, but also tactical... Placing the constraint
indicates a design requirement. When you create the index directly, how can
you know IF the index was created for performance, or if the index is
intended to enforce some design constraint... The fact is that you can
not... So use PK and Uniques to enforce design requirements...Indexes are
created for 2 main reasons, to enforce integrity and speed up queries.The
indexes used to speed up queries( or other statements) may need to change
over time as business priorities change, but we would still need to maintain
the indexes created to enforce data integrity.
Indexes created as a by-product of PK or unique constraints can NOT be
dropped using DROP index, so you are protected from inadvertently dropping
an index which is used for data integrity. FOr this you must use alter table
drop constraint, alter table add constraint... Any index that goes away
using DROP index, therefore, would have been created for performance, and is
a candidate for re-assessment as needs change
Additionally FKs must be placed on columns with PKs or Unique
constraints.(from BOL). HOwever the actual code allows you to put an FK on
any column which has a unique index ( although I beleive this should NOT be
allowed)..
Hope this helps/
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Paul Sjoerdsma" <paul@.pasoftware.nl> wrote in message
news:2j8ko0pdll18amito60gcqne5g58en6otj@.
4ax.com...
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||In addition to the other respondents reply, implicit in your question is
really two seperate questions:
What is the difference between the use of PK and Unique Constraints and
Unique Indexes? The was answered by the other respondent. Except, so you
know, there is no such thing as a PK index, only a constraint, which is also
enforced through a Unique Index.
Second, you made mention of PK and GUID, which I am assuming you implied
what is typically used some sort of INT IDENTITY. Both of these are unique
by design, one locally to the table, the other across all tables, databases,
and time. However, these are mearly mechanics. They do not garauntee
uniqueness of the data, which is the sole purpose of KEYS, Primary or
Alternate. Moreover, PKs and Unique Constraints are not limited to IDENTITY
nor GUID. For example:
EmployeeID LastName FirstName
1 Smith John
2 Jones Tom
3 Marshall Eric
4 Smith John
Although EmployeeID is unique, because it is IDENTITY and is so by
construction, for this table, the DATA is NOT unique: John Smith is repeated
.
Therefore, this table is NOT even a table by relational design--it is not in
1NF (First Normal Form).
The real key for this entity is the Logical Name attribute, which is
composite upon FirstName and LastName. Since that is pretty much all of the
data, it is also the Primary Key candidate. However, if this table were
JOINed to some other not only would you have to populate both attributes to
that table, you would have to populate string types (data storage
inefficient) and would have to include both attributes in any Foreign Key or
JOIN declaration, also highly inefficient.
So, oftentimes, one creates the IDENTITY or GUID attributes to simplify
these requirements. What happens, TOO OFTEN, is the PK is defined on
EmployeeID and no Unique Constraint is ever defined on the Candidate Key
combination of First and Last Names, which is the true key of this table, bu
t
should be and MUST be for this implementation to be in at least 1NF.
Moreover, while I am on it, I see far too often that such an IDENTITY or
GUID PK is also defined as the Clustered Index, if there is one at all. Tha
t
is a lousy choice. Not only should every table have a KEY (Primary or
Otherwise)--this is a logical or design requirement--every table should also
have a WELL CHOSEN Clustered Index or Key defined--this is a physical
recommendation.
A well chosen Clustered Index or Key will affect the construction of
statistics and other indexes on the table. It should be nearly unique, if
not a unique key. It should not change often. And, it should be considered
for RANGE type query requirements. In our example, not only do we need to
create a UNIQUE CONSTRAINT for the Name attribute, the Last Name or the Last
Name + First Name combination should be considered for the Clustered Index.
I hope I have not gone to Academic, or Theoretical--some might say, Cultish.
Nevertheless, these point go far too unnoticed by the mass of database
designers and users and needs to be said more often.
Sincerely,
Anthony Thomas
"Paul Sjoerdsma" wrote:
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||Anthony and Wayne,
Thanks for your answers. Although I would consider myself to be
experienced in database design (with the necessary theoretical
background) I still do not grok the full idea behind the concept of a
unique index; even after your elaborate explanations.
To me it seems to have raised it's head out of some murky swamp. There
are nicer places to visit, so a Unique Key constraint is what it will
be.
However if someone could give me a clear example where I would use a
unique index instead of a unique key I would be very happy,
thanks,
regards
Paul
On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
<AnthonyThomas@.discussions.microsoft.com> wrote:
>In addition to the other respondents reply, implicit in your question is
>really two seperate questions:
>|||> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
Very good question. IMO, there aren't any (*). Some say that you should use
an index directly when
the uniqueness isn't a property of the data model per se, but IMO, I have a
hard time finding such
an example. If I want to enforce uniqueness, I surely want to expose this at
the logical level
(constraint) as well, right?
(*) Here's an exception. There are at least one attribute you can define for
a unique index which
isn't available through a unique constraint. If this index attribute is impo
rtant for you, define an
index directly and not though a unique constraint. In SQL Server 2005, this
will be available
through a constraint...
Be prepared that different people have different viewpoints about this. It i
s a bit like the age-old
discussion "natural keys vs. surrogate keys".
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Paul Sjoerdsma" <paul.nospam.@.pasoftware.nl> wrote in message
news:h9dmo0dce287fe5gdjoo0ro4mvgqrsggb5@.
4ax.com...
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
>
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
>
> thanks,
> regards
> Paul
>
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
>
>|||I have two real life examples where I would use a unique index as
opposed to a unique constraint.
1. When you have a relation table that contains the keys of two
referenced tables. For example the table StudentCourses which lists the
courses for each student. It might be defined as:
CREATE TABLE StudentCourses
(StudentID int not null REFERENCES Students
,CourseID int not null REFERENCES Courses
,CONSTRAINT PK_StudentCourses PRIMARY KEY (StudentID, CourseID)
)
In this case there already is a unique constraint on (StudentID,
CourseID). For performance purposes I would add a unique index on
(CourseID, StudentID). This would benefit query like:
SELECT Course, COUNT(*) AS StudentCount
FROM Courses C1
INNER JOIN StudentCourses CS1
ON CS1.CourseID = C1.CourseID
GROUP BY Course
2. When a (compound) covering index is needed, and one of its columns is
by itself unique. When I have the choice between a nonunique and a
unique index, I choose a unique index. But the columns of the covering
index do not constitute a new or different "unique relation", so a
unique constraint would be inappropriate.
The rule I use for unique constraints is that you should be able to
remove all user created unique indexes, and the data model would still
be consistent and enforced.
Hope this helps,
Gert-Jan
Paul Sjoerdsma wrote:[vbcol=seagreen]
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
> thanks,
> regards
> Paul
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
>
I've looked at the online documentation and encountered this piece of
text
"If uniqueness must be enforced to ensure data integrity, create a
UNIQUE or PRIMARY KEY constraint on the column rather than a unique
index"
I'm baffled; I understand a unique contraint, and personally do not
care how it is physically enforced (using an index to speed up the
process) as long as it is enforced. However what is the deal with
unique indexes? The piece of documentation above seems to suggest that
a unique index will not enforce uniqueness.
The reason I ask is that I am working against a legacy system where
unique indexes are used for each table. Each table, besides a PK, has
a column that stores a GUID. Personally I would use a Unique
Contraint, but they have used a unique index on this column. What
would be the advantages and disadvantages of using a unique index vs a
unique constraint.'
regards
Paul SjoerdsmaYes, a unique index enforces uniqueness. A unique index is created in the
background when you specify a PK or Unique constraint.
The difference is more semantic, but also tactical... Placing the constraint
indicates a design requirement. When you create the index directly, how can
you know IF the index was created for performance, or if the index is
intended to enforce some design constraint... The fact is that you can
not... So use PK and Uniques to enforce design requirements...Indexes are
created for 2 main reasons, to enforce integrity and speed up queries.The
indexes used to speed up queries( or other statements) may need to change
over time as business priorities change, but we would still need to maintain
the indexes created to enforce data integrity.
Indexes created as a by-product of PK or unique constraints can NOT be
dropped using DROP index, so you are protected from inadvertently dropping
an index which is used for data integrity. FOr this you must use alter table
drop constraint, alter table add constraint... Any index that goes away
using DROP index, therefore, would have been created for performance, and is
a candidate for re-assessment as needs change
Additionally FKs must be placed on columns with PKs or Unique
constraints.(from BOL). HOwever the actual code allows you to put an FK on
any column which has a unique index ( although I beleive this should NOT be
allowed)..
Hope this helps/
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Paul Sjoerdsma" <paul@.pasoftware.nl> wrote in message
news:2j8ko0pdll18amito60gcqne5g58en6otj@.
4ax.com...
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||In addition to the other respondents reply, implicit in your question is
really two seperate questions:
What is the difference between the use of PK and Unique Constraints and
Unique Indexes? The was answered by the other respondent. Except, so you
know, there is no such thing as a PK index, only a constraint, which is also
enforced through a Unique Index.
Second, you made mention of PK and GUID, which I am assuming you implied
what is typically used some sort of INT IDENTITY. Both of these are unique
by design, one locally to the table, the other across all tables, databases,
and time. However, these are mearly mechanics. They do not garauntee
uniqueness of the data, which is the sole purpose of KEYS, Primary or
Alternate. Moreover, PKs and Unique Constraints are not limited to IDENTITY
nor GUID. For example:
EmployeeID LastName FirstName
1 Smith John
2 Jones Tom
3 Marshall Eric
4 Smith John
Although EmployeeID is unique, because it is IDENTITY and is so by
construction, for this table, the DATA is NOT unique: John Smith is repeated
.
Therefore, this table is NOT even a table by relational design--it is not in
1NF (First Normal Form).
The real key for this entity is the Logical Name attribute, which is
composite upon FirstName and LastName. Since that is pretty much all of the
data, it is also the Primary Key candidate. However, if this table were
JOINed to some other not only would you have to populate both attributes to
that table, you would have to populate string types (data storage
inefficient) and would have to include both attributes in any Foreign Key or
JOIN declaration, also highly inefficient.
So, oftentimes, one creates the IDENTITY or GUID attributes to simplify
these requirements. What happens, TOO OFTEN, is the PK is defined on
EmployeeID and no Unique Constraint is ever defined on the Candidate Key
combination of First and Last Names, which is the true key of this table, bu
t
should be and MUST be for this implementation to be in at least 1NF.
Moreover, while I am on it, I see far too often that such an IDENTITY or
GUID PK is also defined as the Clustered Index, if there is one at all. Tha
t
is a lousy choice. Not only should every table have a KEY (Primary or
Otherwise)--this is a logical or design requirement--every table should also
have a WELL CHOSEN Clustered Index or Key defined--this is a physical
recommendation.
A well chosen Clustered Index or Key will affect the construction of
statistics and other indexes on the table. It should be nearly unique, if
not a unique key. It should not change often. And, it should be considered
for RANGE type query requirements. In our example, not only do we need to
create a UNIQUE CONSTRAINT for the Name attribute, the Last Name or the Last
Name + First Name combination should be considered for the Clustered Index.
I hope I have not gone to Academic, or Theoretical--some might say, Cultish.
Nevertheless, these point go far too unnoticed by the mass of database
designers and users and needs to be said more often.
Sincerely,
Anthony Thomas
"Paul Sjoerdsma" wrote:
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||Anthony and Wayne,
Thanks for your answers. Although I would consider myself to be
experienced in database design (with the necessary theoretical
background) I still do not grok the full idea behind the concept of a
unique index; even after your elaborate explanations.
To me it seems to have raised it's head out of some murky swamp. There
are nicer places to visit, so a Unique Key constraint is what it will
be.
However if someone could give me a clear example where I would use a
unique index instead of a unique key I would be very happy,
thanks,
regards
Paul
On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
<AnthonyThomas@.discussions.microsoft.com> wrote:
>In addition to the other respondents reply, implicit in your question is
>really two seperate questions:
>|||> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
Very good question. IMO, there aren't any (*). Some say that you should use
an index directly when
the uniqueness isn't a property of the data model per se, but IMO, I have a
hard time finding such
an example. If I want to enforce uniqueness, I surely want to expose this at
the logical level
(constraint) as well, right?
(*) Here's an exception. There are at least one attribute you can define for
a unique index which
isn't available through a unique constraint. If this index attribute is impo
rtant for you, define an
index directly and not though a unique constraint. In SQL Server 2005, this
will be available
through a constraint...
Be prepared that different people have different viewpoints about this. It i
s a bit like the age-old
discussion "natural keys vs. surrogate keys".
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Paul Sjoerdsma" <paul.nospam.@.pasoftware.nl> wrote in message
news:h9dmo0dce287fe5gdjoo0ro4mvgqrsggb5@.
4ax.com...
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
>
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
>
> thanks,
> regards
> Paul
>
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
>
>|||I have two real life examples where I would use a unique index as
opposed to a unique constraint.
1. When you have a relation table that contains the keys of two
referenced tables. For example the table StudentCourses which lists the
courses for each student. It might be defined as:
CREATE TABLE StudentCourses
(StudentID int not null REFERENCES Students
,CourseID int not null REFERENCES Courses
,CONSTRAINT PK_StudentCourses PRIMARY KEY (StudentID, CourseID)
)
In this case there already is a unique constraint on (StudentID,
CourseID). For performance purposes I would add a unique index on
(CourseID, StudentID). This would benefit query like:
SELECT Course, COUNT(*) AS StudentCount
FROM Courses C1
INNER JOIN StudentCourses CS1
ON CS1.CourseID = C1.CourseID
GROUP BY Course
2. When a (compound) covering index is needed, and one of its columns is
by itself unique. When I have the choice between a nonunique and a
unique index, I choose a unique index. But the columns of the covering
index do not constitute a new or different "unique relation", so a
unique constraint would be inappropriate.
The rule I use for unique constraints is that you should be able to
remove all user created unique indexes, and the data model would still
be consistent and enforced.
Hope this helps,
Gert-Jan
Paul Sjoerdsma wrote:[vbcol=seagreen]
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
> thanks,
> regards
> Paul
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
>
Labels:
constraint,
create,
creating,
database,
documentation,
encountered,
enforced,
ensure,
index,
integrity,
microsoft,
mysql,
oftext,
online,
oracle,
piece,
server,
sql,
unique,
uniqueness
Difference between creating a Unique Constraint and a Unique Index
Hi,
I've looked at the online documentation and encountered this piece of
text
"If uniqueness must be enforced to ensure data integrity, create a
UNIQUE or PRIMARY KEY constraint on the column rather than a unique
index"
I'm baffled; I understand a unique contraint, and personally do not
care how it is physically enforced (using an index to speed up the
process) as long as it is enforced. However what is the deal with
unique indexes? The piece of documentation above seems to suggest that
a unique index will not enforce uniqueness.
The reason I ask is that I am working against a legacy system where
unique indexes are used for each table. Each table, besides a PK, has
a column that stores a GUID. Personally I would use a Unique
Contraint, but they have used a unique index on this column. What
would be the advantages and disadvantages of using a unique index vs a
unique constraint.'
regards
Paul SjoerdsmaYes, a unique index enforces uniqueness. A unique index is created in the
background when you specify a PK or Unique constraint.
The difference is more semantic, but also tactical... Placing the constraint
indicates a design requirement. When you create the index directly, how can
you know IF the index was created for performance, or if the index is
intended to enforce some design constraint... The fact is that you can
not... So use PK and Uniques to enforce design requirements...Indexes are
created for 2 main reasons, to enforce integrity and speed up queries.The
indexes used to speed up queries( or other statements) may need to change
over time as business priorities change, but we would still need to maintain
the indexes created to enforce data integrity.
Indexes created as a by-product of PK or unique constraints can NOT be
dropped using DROP index, so you are protected from inadvertently dropping
an index which is used for data integrity. FOr this you must use alter table
drop constraint, alter table add constraint... Any index that goes away
using DROP index, therefore, would have been created for performance, and is
a candidate for re-assessment as needs change
Additionally FKs must be placed on columns with PKs or Unique
constraints.(from BOL). HOwever the actual code allows you to put an FK on
any column which has a unique index ( although I beleive this should NOT be
allowed)..
Hope this helps/
--
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Paul Sjoerdsma" <paul@.pasoftware.nl> wrote in message
news:2j8ko0pdll18amito60gcqne5g58en6otj@.4ax.com...
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||In addition to the other respondents reply, implicit in your question is
really two seperate questions:
What is the difference between the use of PK and Unique Constraints and
Unique Indexes? The was answered by the other respondent. Except, so you
know, there is no such thing as a PK index, only a constraint, which is also
enforced through a Unique Index.
Second, you made mention of PK and GUID, which I am assuming you implied
what is typically used some sort of INT IDENTITY. Both of these are unique
by design, one locally to the table, the other across all tables, databases,
and time. However, these are mearly mechanics. They do not garauntee
uniqueness of the data, which is the sole purpose of KEYS, Primary or
Alternate. Moreover, PKs and Unique Constraints are not limited to IDENTITY
nor GUID. For example:
EmployeeID LastName FirstName
1 Smith John
2 Jones Tom
3 Marshall Eric
4 Smith John
Although EmployeeID is unique, because it is IDENTITY and is so by
construction, for this table, the DATA is NOT unique: John Smith is repeated.
Therefore, this table is NOT even a table by relational design--it is not in
1NF (First Normal Form).
The real key for this entity is the Logical Name attribute, which is
composite upon FirstName and LastName. Since that is pretty much all of the
data, it is also the Primary Key candidate. However, if this table were
JOINed to some other not only would you have to populate both attributes to
that table, you would have to populate string types (data storage
inefficient) and would have to include both attributes in any Foreign Key or
JOIN declaration, also highly inefficient.
So, oftentimes, one creates the IDENTITY or GUID attributes to simplify
these requirements. What happens, TOO OFTEN, is the PK is defined on
EmployeeID and no Unique Constraint is ever defined on the Candidate Key
combination of First and Last Names, which is the true key of this table, but
should be and MUST be for this implementation to be in at least 1NF.
Moreover, while I am on it, I see far too often that such an IDENTITY or
GUID PK is also defined as the Clustered Index, if there is one at all. That
is a lousy choice. Not only should every table have a KEY (Primary or
Otherwise)--this is a logical or design requirement--every table should also
have a WELL CHOSEN Clustered Index or Key defined--this is a physical
recommendation.
A well chosen Clustered Index or Key will affect the construction of
statistics and other indexes on the table. It should be nearly unique, if
not a unique key. It should not change often. And, it should be considered
for RANGE type query requirements. In our example, not only do we need to
create a UNIQUE CONSTRAINT for the Name attribute, the Last Name or the Last
Name + First Name combination should be considered for the Clustered Index.
I hope I have not gone to Academic, or Theoretical--some might say, Cultish.
Nevertheless, these point go far too unnoticed by the mass of database
designers and users and needs to be said more often.
Sincerely,
Anthony Thomas
"Paul Sjoerdsma" wrote:
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||Anthony and Wayne,
Thanks for your answers. Although I would consider myself to be
experienced in database design (with the necessary theoretical
background) I still do not grok the full idea behind the concept of a
unique index; even after your elaborate explanations.
To me it seems to have raised it's head out of some murky swamp. There
are nicer places to visit, so a Unique Key constraint is what it will
be.
However if someone could give me a clear example where I would use a
unique index instead of a unique key I would be very happy,
thanks,
regards
Paul
On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
<AnthonyThomas@.discussions.microsoft.com> wrote:
>In addition to the other respondents reply, implicit in your question is
>really two seperate questions:
>|||> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
Very good question. IMO, there aren't any (*). Some say that you should use an index directly when
the uniqueness isn't a property of the data model per se, but IMO, I have a hard time finding such
an example. If I want to enforce uniqueness, I surely want to expose this at the logical level
(constraint) as well, right?
(*) Here's an exception. There are at least one attribute you can define for a unique index which
isn't available through a unique constraint. If this index attribute is important for you, define an
index directly and not though a unique constraint. In SQL Server 2005, this will be available
through a constraint...
Be prepared that different people have different viewpoints about this. It is a bit like the age-old
discussion "natural keys vs. surrogate keys".
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Paul Sjoerdsma" <paul.nospam.@.pasoftware.nl> wrote in message
news:h9dmo0dce287fe5gdjoo0ro4mvgqrsggb5@.4ax.com...
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
>
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
>
> thanks,
> regards
> Paul
>
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
> >In addition to the other respondents reply, implicit in your question is
> >really two seperate questions:
> >
>|||I have two real life examples where I would use a unique index as
opposed to a unique constraint.
1. When you have a relation table that contains the keys of two
referenced tables. For example the table StudentCourses which lists the
courses for each student. It might be defined as:
CREATE TABLE StudentCourses
(StudentID int not null REFERENCES Students
,CourseID int not null REFERENCES Courses
,CONSTRAINT PK_StudentCourses PRIMARY KEY (StudentID, CourseID)
)
In this case there already is a unique constraint on (StudentID,
CourseID). For performance purposes I would add a unique index on
(CourseID, StudentID). This would benefit query like:
SELECT Course, COUNT(*) AS StudentCount
FROM Courses C1
INNER JOIN StudentCourses CS1
ON CS1.CourseID = C1.CourseID
GROUP BY Course
2. When a (compound) covering index is needed, and one of its columns is
by itself unique. When I have the choice between a nonunique and a
unique index, I choose a unique index. But the columns of the covering
index do not constitute a new or different "unique relation", so a
unique constraint would be inappropriate.
The rule I use for unique constraints is that you should be able to
remove all user created unique indexes, and the data model would still
be consistent and enforced.
Hope this helps,
Gert-Jan
Paul Sjoerdsma wrote:
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
> thanks,
> regards
> Paul
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
> >In addition to the other respondents reply, implicit in your question is
> >really two seperate questions:
> >
I've looked at the online documentation and encountered this piece of
text
"If uniqueness must be enforced to ensure data integrity, create a
UNIQUE or PRIMARY KEY constraint on the column rather than a unique
index"
I'm baffled; I understand a unique contraint, and personally do not
care how it is physically enforced (using an index to speed up the
process) as long as it is enforced. However what is the deal with
unique indexes? The piece of documentation above seems to suggest that
a unique index will not enforce uniqueness.
The reason I ask is that I am working against a legacy system where
unique indexes are used for each table. Each table, besides a PK, has
a column that stores a GUID. Personally I would use a Unique
Contraint, but they have used a unique index on this column. What
would be the advantages and disadvantages of using a unique index vs a
unique constraint.'
regards
Paul SjoerdsmaYes, a unique index enforces uniqueness. A unique index is created in the
background when you specify a PK or Unique constraint.
The difference is more semantic, but also tactical... Placing the constraint
indicates a design requirement. When you create the index directly, how can
you know IF the index was created for performance, or if the index is
intended to enforce some design constraint... The fact is that you can
not... So use PK and Uniques to enforce design requirements...Indexes are
created for 2 main reasons, to enforce integrity and speed up queries.The
indexes used to speed up queries( or other statements) may need to change
over time as business priorities change, but we would still need to maintain
the indexes created to enforce data integrity.
Indexes created as a by-product of PK or unique constraints can NOT be
dropped using DROP index, so you are protected from inadvertently dropping
an index which is used for data integrity. FOr this you must use alter table
drop constraint, alter table add constraint... Any index that goes away
using DROP index, therefore, would have been created for performance, and is
a candidate for re-assessment as needs change
Additionally FKs must be placed on columns with PKs or Unique
constraints.(from BOL). HOwever the actual code allows you to put an FK on
any column which has a unique index ( although I beleive this should NOT be
allowed)..
Hope this helps/
--
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Paul Sjoerdsma" <paul@.pasoftware.nl> wrote in message
news:2j8ko0pdll18amito60gcqne5g58en6otj@.4ax.com...
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||In addition to the other respondents reply, implicit in your question is
really two seperate questions:
What is the difference between the use of PK and Unique Constraints and
Unique Indexes? The was answered by the other respondent. Except, so you
know, there is no such thing as a PK index, only a constraint, which is also
enforced through a Unique Index.
Second, you made mention of PK and GUID, which I am assuming you implied
what is typically used some sort of INT IDENTITY. Both of these are unique
by design, one locally to the table, the other across all tables, databases,
and time. However, these are mearly mechanics. They do not garauntee
uniqueness of the data, which is the sole purpose of KEYS, Primary or
Alternate. Moreover, PKs and Unique Constraints are not limited to IDENTITY
nor GUID. For example:
EmployeeID LastName FirstName
1 Smith John
2 Jones Tom
3 Marshall Eric
4 Smith John
Although EmployeeID is unique, because it is IDENTITY and is so by
construction, for this table, the DATA is NOT unique: John Smith is repeated.
Therefore, this table is NOT even a table by relational design--it is not in
1NF (First Normal Form).
The real key for this entity is the Logical Name attribute, which is
composite upon FirstName and LastName. Since that is pretty much all of the
data, it is also the Primary Key candidate. However, if this table were
JOINed to some other not only would you have to populate both attributes to
that table, you would have to populate string types (data storage
inefficient) and would have to include both attributes in any Foreign Key or
JOIN declaration, also highly inefficient.
So, oftentimes, one creates the IDENTITY or GUID attributes to simplify
these requirements. What happens, TOO OFTEN, is the PK is defined on
EmployeeID and no Unique Constraint is ever defined on the Candidate Key
combination of First and Last Names, which is the true key of this table, but
should be and MUST be for this implementation to be in at least 1NF.
Moreover, while I am on it, I see far too often that such an IDENTITY or
GUID PK is also defined as the Clustered Index, if there is one at all. That
is a lousy choice. Not only should every table have a KEY (Primary or
Otherwise)--this is a logical or design requirement--every table should also
have a WELL CHOSEN Clustered Index or Key defined--this is a physical
recommendation.
A well chosen Clustered Index or Key will affect the construction of
statistics and other indexes on the table. It should be nearly unique, if
not a unique key. It should not change often. And, it should be considered
for RANGE type query requirements. In our example, not only do we need to
create a UNIQUE CONSTRAINT for the Name attribute, the Last Name or the Last
Name + First Name combination should be considered for the Clustered Index.
I hope I have not gone to Academic, or Theoretical--some might say, Cultish.
Nevertheless, these point go far too unnoticed by the mass of database
designers and users and needs to be said more often.
Sincerely,
Anthony Thomas
"Paul Sjoerdsma" wrote:
> Hi,
> I've looked at the online documentation and encountered this piece of
> text
> "If uniqueness must be enforced to ensure data integrity, create a
> UNIQUE or PRIMARY KEY constraint on the column rather than a unique
> index"
> I'm baffled; I understand a unique contraint, and personally do not
> care how it is physically enforced (using an index to speed up the
> process) as long as it is enforced. However what is the deal with
> unique indexes? The piece of documentation above seems to suggest that
> a unique index will not enforce uniqueness.
> The reason I ask is that I am working against a legacy system where
> unique indexes are used for each table. Each table, besides a PK, has
> a column that stores a GUID. Personally I would use a Unique
> Contraint, but they have used a unique index on this column. What
> would be the advantages and disadvantages of using a unique index vs a
> unique constraint.'
>
> regards
> Paul Sjoerdsma
>|||Anthony and Wayne,
Thanks for your answers. Although I would consider myself to be
experienced in database design (with the necessary theoretical
background) I still do not grok the full idea behind the concept of a
unique index; even after your elaborate explanations.
To me it seems to have raised it's head out of some murky swamp. There
are nicer places to visit, so a Unique Key constraint is what it will
be.
However if someone could give me a clear example where I would use a
unique index instead of a unique key I would be very happy,
thanks,
regards
Paul
On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
<AnthonyThomas@.discussions.microsoft.com> wrote:
>In addition to the other respondents reply, implicit in your question is
>really two seperate questions:
>|||> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
Very good question. IMO, there aren't any (*). Some say that you should use an index directly when
the uniqueness isn't a property of the data model per se, but IMO, I have a hard time finding such
an example. If I want to enforce uniqueness, I surely want to expose this at the logical level
(constraint) as well, right?
(*) Here's an exception. There are at least one attribute you can define for a unique index which
isn't available through a unique constraint. If this index attribute is important for you, define an
index directly and not though a unique constraint. In SQL Server 2005, this will be available
through a constraint...
Be prepared that different people have different viewpoints about this. It is a bit like the age-old
discussion "natural keys vs. surrogate keys".
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Paul Sjoerdsma" <paul.nospam.@.pasoftware.nl> wrote in message
news:h9dmo0dce287fe5gdjoo0ro4mvgqrsggb5@.4ax.com...
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
>
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
>
> thanks,
> regards
> Paul
>
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
> >In addition to the other respondents reply, implicit in your question is
> >really two seperate questions:
> >
>|||I have two real life examples where I would use a unique index as
opposed to a unique constraint.
1. When you have a relation table that contains the keys of two
referenced tables. For example the table StudentCourses which lists the
courses for each student. It might be defined as:
CREATE TABLE StudentCourses
(StudentID int not null REFERENCES Students
,CourseID int not null REFERENCES Courses
,CONSTRAINT PK_StudentCourses PRIMARY KEY (StudentID, CourseID)
)
In this case there already is a unique constraint on (StudentID,
CourseID). For performance purposes I would add a unique index on
(CourseID, StudentID). This would benefit query like:
SELECT Course, COUNT(*) AS StudentCount
FROM Courses C1
INNER JOIN StudentCourses CS1
ON CS1.CourseID = C1.CourseID
GROUP BY Course
2. When a (compound) covering index is needed, and one of its columns is
by itself unique. When I have the choice between a nonunique and a
unique index, I choose a unique index. But the columns of the covering
index do not constitute a new or different "unique relation", so a
unique constraint would be inappropriate.
The rule I use for unique constraints is that you should be able to
remove all user created unique indexes, and the data model would still
be consistent and enforced.
Hope this helps,
Gert-Jan
Paul Sjoerdsma wrote:
> Anthony and Wayne,
> Thanks for your answers. Although I would consider myself to be
> experienced in database design (with the necessary theoretical
> background) I still do not grok the full idea behind the concept of a
> unique index; even after your elaborate explanations.
> To me it seems to have raised it's head out of some murky swamp. There
> are nicer places to visit, so a Unique Key constraint is what it will
> be.
> However if someone could give me a clear example where I would use a
> unique index instead of a unique key I would be very happy,
> thanks,
> regards
> Paul
> On Thu, 4 Nov 2004 05:34:01 -0800, "AnthonyThomas"
> <AnthonyThomas@.discussions.microsoft.com> wrote:
> >In addition to the other respondents reply, implicit in your question is
> >really two seperate questions:
> >
Labels:
constraint,
create,
creating,
database,
documentation,
encountered,
enforced,
ensure,
index,
integrity,
microsoft,
mysql,
online,
oracle,
piece,
server,
sql,
text,
unique,
uniqueness
Tuesday, February 14, 2012
Determining unique children
Hi,
I need to be able to identify if a parent has the same child names as another parent.
Have a look at my data (in pic)
Can you think of a SELECT that will count for each parent, the number of other parents including themselves with all the same children's names?
So you would get something like:
Bob | 2
Bill | 1
Glen | 2
thanks,
david.That's some interesting homework!
CREATE TABLE #patp (
parent VARCHAR(20)
, child VARCHAR(20)
)
INSERT INTO #patp (parent, child)
SELECT 'Bob', 'Jen' UNION
SELECT 'Bob', 'Jill' UNION
SELECT 'Bob', 'Mike' UNION
SELECT 'Bill', 'David' UNION
SELECT 'Bill', 'Steve' UNION
SELECT 'Glen', 'Jen' UNION
SELECT 'Glen', 'Jill' UNION
SELECT 'Glen', 'Mike'
SELECT DISTINCT a.parent
, (SELECT Count(*)
FROM (SELECT b.parent AS p1, c.parent AS p2
FROM #patp AS b
FULL JOIN #patp AS c
ON (c.child = b.child)
WHERE c.parent = a.parent
GROUP BY b.parent, c.parent
HAVING Count(*) = Count(b.parent)
AND Count(*) = Count(c.parent)) AS z)
FROM #patp AS a
DROP TABLE #patp
-PatP|||OK now, David...tell us why that works,now.
and show your work!!! ;)|||C'mon Pat. This is not the first time this guy has posted his homework for someone else to do. Reviewing his code for errors is one thing, but writing code for him is helping him cheat.|||let's hope he turns it in using the #patp table name, at least ;)
...and not even a "thank you, you can have half of my class credits"|||but writing code for him is helping him cheat.Only if he's incredibly gutsy and not very bright. I think he's smarter than that.
...and not even a "thank you, you can have half of my class credits"Why only half? If he retypes it for presentation, I'd give him 20%, but that's about all.
As TallCowboy0614 pointed out earlier, the real fun will come when David gets to explain that code for the class!
-PatP|||As TallCowboy0614 pointed out earlier, the real fun will come when David gets to explain that code for the class!
I'd be hard pressed....
working on it|||thanks for the code Pat!
I'm not a student, and that's not a homework assignment. (Been a while since then). I'm actually going to be using it (obviously re-worked to my own data) as an intergrity check for data Im pulling from a .xls file manually entered by a business user.
I was going to end up creating a function, but I knew someone here would have the guru-ishness to do it with a straight SELECT...obviously advantageous.
For simplicity I provided an example that I suppose looked like a homework assignment...LOL...the real scenario just would have been really annoying and difficult to explain. (and would have been one of those posts that people skip over)
as for explaining the code...cripes...i thought my SQL was pretty good. Lots more to learn. Although, I will be able to re-use what you provided Pat.
cheers.
I need to be able to identify if a parent has the same child names as another parent.
Have a look at my data (in pic)
Can you think of a SELECT that will count for each parent, the number of other parents including themselves with all the same children's names?
So you would get something like:
Bob | 2
Bill | 1
Glen | 2
thanks,
david.That's some interesting homework!
CREATE TABLE #patp (
parent VARCHAR(20)
, child VARCHAR(20)
)
INSERT INTO #patp (parent, child)
SELECT 'Bob', 'Jen' UNION
SELECT 'Bob', 'Jill' UNION
SELECT 'Bob', 'Mike' UNION
SELECT 'Bill', 'David' UNION
SELECT 'Bill', 'Steve' UNION
SELECT 'Glen', 'Jen' UNION
SELECT 'Glen', 'Jill' UNION
SELECT 'Glen', 'Mike'
SELECT DISTINCT a.parent
, (SELECT Count(*)
FROM (SELECT b.parent AS p1, c.parent AS p2
FROM #patp AS b
FULL JOIN #patp AS c
ON (c.child = b.child)
WHERE c.parent = a.parent
GROUP BY b.parent, c.parent
HAVING Count(*) = Count(b.parent)
AND Count(*) = Count(c.parent)) AS z)
FROM #patp AS a
DROP TABLE #patp
-PatP|||OK now, David...tell us why that works,now.
and show your work!!! ;)|||C'mon Pat. This is not the first time this guy has posted his homework for someone else to do. Reviewing his code for errors is one thing, but writing code for him is helping him cheat.|||let's hope he turns it in using the #patp table name, at least ;)
...and not even a "thank you, you can have half of my class credits"|||but writing code for him is helping him cheat.Only if he's incredibly gutsy and not very bright. I think he's smarter than that.
...and not even a "thank you, you can have half of my class credits"Why only half? If he retypes it for presentation, I'd give him 20%, but that's about all.
As TallCowboy0614 pointed out earlier, the real fun will come when David gets to explain that code for the class!
-PatP|||As TallCowboy0614 pointed out earlier, the real fun will come when David gets to explain that code for the class!
I'd be hard pressed....
working on it|||thanks for the code Pat!
I'm not a student, and that's not a homework assignment. (Been a while since then). I'm actually going to be using it (obviously re-worked to my own data) as an intergrity check for data Im pulling from a .xls file manually entered by a business user.
I was going to end up creating a function, but I knew someone here would have the guru-ishness to do it with a straight SELECT...obviously advantageous.
For simplicity I provided an example that I suppose looked like a homework assignment...LOL...the real scenario just would have been really annoying and difficult to explain. (and would have been one of those posts that people skip over)
as for explaining the code...cripes...i thought my SQL was pretty good. Lots more to learn. Although, I will be able to re-use what you provided Pat.
cheers.
Determining single-column candidate keys
This script will tell you all of the unique columns (except BLOBs) in
all of the tables in a database.
I made a post last night with some VBA code for exporting data from
Access. I've been given a 2 GB Access database with scores of tables,
some of them having hundreds of columns, millions of rows, and no
constraints. I also know very little about the underlying data. After
getting the data into SQL Server, I wanted to learn more about the
nature of the data. The first thing I wanted to find was which columns
could be single-column candidate keys.
One of the issues I ran into is that my generated SQL often exceeded
4000 characters, so I needed to come up with a way of executing more
than 4000 characters. Also, y'all'll notice that I'm using cursors
quite a bit here. These cursors only iterate through a few rows. The
heavy stuff is pretty efficient SQL.
-Alan
SET NOCOUNT ON
CREATE TABLE #UniqueColumns
(
TableName sysname
, ColumnName sysname
, IsUnique CHAR(1)
, PRIMARY KEY (TableName, ColumnName)
)
DECLARE @.TableName sysname
, @.ColumnName sysname
DECLARE cTables CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
CREATE TABLE #SQLFragments
(
RowNumber INT
IDENTITY
, SQL NVARCHAR(4000)
, Type VARCHAR(6)
NOT NULL
CHECK (Type IN ('SELECT', 'UPDATE'))
)
OPEN cTables
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cTables INTO @.TableName
IF @.@.FETCH_STATUS <> 0 BREAK
DECLARE @.SQL NVARCHAR(4000)
, @.SQLUpdateUnique NVARCHAR(4000)
SET @.SQL = N'SELECT '
SET @.SQLUpdateUnique = N''
DECLARE c CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.columns
WHERE TABLE_NAME = @.TableName
AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
ORDER BY ORDINAL_POSITION
OPEN c
WHILE 1 = 1 BEGIN
FETCH NEXT FROM c INTO @.ColumnName
IF @.@.FETCH_STATUS <> 0 BREAK
INSERT #UniqueColumns (TableName, ColumnName)
SELECT @.TableName, @.ColumnName
SET @.SQL = @.SQL + '
CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
ELSE ''N'' END ' + @.ColumnName + ','
SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
@.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
'
IF LEN(@.SQL) > 3000 BEGIN
INSERT #SQLFragments (SQL, Type)
SELECT @.SQL, 'SELECT'
SET @.SQL = ''
END
IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
INSERT #SQLFragments (SQL, Type)
SELECT @.SQLUpdateUnique, 'UPDATE'
SET @.SQLUpdateUnique = ''
END
END
CLOSE c
DEALLOCATE c
SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
@.TableName
INSERT #SQLFragments (SQL, Type)
SELECT @.SQL, 'SELECT'
INSERT #SQLFragments (SQL, Type)
SELECT @.SQLUpdateUnique, 'UPDATE'
DECLARE @.FragmentCount INT
SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
'SELECT'
DECLARE @.SQLEXEC NVARCHAR(4000)
, @.Fragment NVARCHAR(4000)
, @.RowNumber VARCHAR(10)
SET @.SQL = N''
SET @.SQLEXEC = N'EXEC ('
DECLARE cFragments CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT SQL, RowNumber
FROM #SQLFragments
WHERE Type = 'SELECT'
ORDER BY RowNumber
OPEN cFragments
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
IF @.@.FETCH_STATUS <> 0 BREAK
SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
= ' + @.RowNumber + N'
'
SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
END
CLOSE cFragments
DEALLOCATE cFragments
SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
SET @.SQL = @.SQL + N' ' + @.SQLEXEC
EXEC sp_executesql @.SQL
SET @.SQL = N''
SET @.SQLEXEC = N'EXEC ('
DECLARE cFragments CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT SQL, RowNumber
FROM #SQLFragments
WHERE Type = 'UPDATE'
ORDER BY RowNumber
OPEN cFragments
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
IF @.@.FETCH_STATUS <> 0 BREAK
SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
= ' + @.RowNumber + N'
'
SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
END
CLOSE cFragments
DEALLOCATE cFragments
SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
EXEC sp_executesql @.SQL
DELETE #SQLFragments
DROP TABLE ##Temp
END
CLOSE cTables
DEALLOCATE cTables
DELETE #UniqueColumns WHERE IsUnique = 'N'
SELECT * FROM #UniqueColumns
DROP TABLE #UniqueColumns
DROP TABLE #SQLFragmentsYou can analyze the data and make assumptions about keys and relationships,
but can the source of the data also provide a data dictionary?
"Alan Samet" <alansamet@.gmail.com> wrote in message
news:1141922084.124478.54740@.j52g2000cwj.googlegroups.com...
> This script will tell you all of the unique columns (except BLOBs) in
> all of the tables in a database.
> I made a post last night with some VBA code for exporting data from
> Access. I've been given a 2 GB Access database with scores of tables,
> some of them having hundreds of columns, millions of rows, and no
> constraints. I also know very little about the underlying data. After
> getting the data into SQL Server, I wanted to learn more about the
> nature of the data. The first thing I wanted to find was which columns
> could be single-column candidate keys.
> One of the issues I ran into is that my generated SQL often exceeded
> 4000 characters, so I needed to come up with a way of executing more
> than 4000 characters. Also, y'all'll notice that I'm using cursors
> quite a bit here. These cursors only iterate through a few rows. The
> heavy stuff is pretty efficient SQL.
> -Alan
> SET NOCOUNT ON
> CREATE TABLE #UniqueColumns
> (
> TableName sysname
> , ColumnName sysname
> , IsUnique CHAR(1)
> , PRIMARY KEY (TableName, ColumnName)
> )
> DECLARE @.TableName sysname
> , @.ColumnName sysname
> DECLARE cTables CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT TABLE_NAME
> FROM INFORMATION_SCHEMA.TABLES
> CREATE TABLE #SQLFragments
> (
> RowNumber INT
> IDENTITY
> , SQL NVARCHAR(4000)
> , Type VARCHAR(6)
> NOT NULL
> CHECK (Type IN ('SELECT', 'UPDATE'))
> )
> OPEN cTables
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cTables INTO @.TableName
> IF @.@.FETCH_STATUS <> 0 BREAK
> DECLARE @.SQL NVARCHAR(4000)
> , @.SQLUpdateUnique NVARCHAR(4000)
> SET @.SQL = N'SELECT '
> SET @.SQLUpdateUnique = N''
> DECLARE c CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT COLUMN_NAME
> FROM INFORMATION_SCHEMA.columns
> WHERE TABLE_NAME = @.TableName
> AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
> ORDER BY ORDINAL_POSITION
> OPEN c
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM c INTO @.ColumnName
> IF @.@.FETCH_STATUS <> 0 BREAK
> INSERT #UniqueColumns (TableName, ColumnName)
> SELECT @.TableName, @.ColumnName
> SET @.SQL = @.SQL + '
> CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
> ELSE ''N'' END ' + @.ColumnName + ','
> SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
> IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
> @.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
> '
> IF LEN(@.SQL) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> SET @.SQL = ''
> END
> IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> SET @.SQLUpdateUnique = ''
> END
> END
> CLOSE c
> DEALLOCATE c
> SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
> @.TableName
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> DECLARE @.FragmentCount INT
> SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
> 'SELECT'
> DECLARE @.SQLEXEC NVARCHAR(4000)
> , @.Fragment NVARCHAR(4000)
> , @.RowNumber VARCHAR(10)
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'SELECT'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
> SET @.SQL = @.SQL + N' ' + @.SQLEXEC
> EXEC sp_executesql @.SQL
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'UPDATE'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
> EXEC sp_executesql @.SQL
> DELETE #SQLFragments
> DROP TABLE ##Temp
> END
> CLOSE cTables
> DEALLOCATE cTables
> DELETE #UniqueColumns WHERE IsUnique = 'N'
> SELECT * FROM #UniqueColumns
> DROP TABLE #UniqueColumns
> DROP TABLE #SQLFragments
>|||I can't get a data dictionary. This data is only in intermediate form
anyways. I'm writing my own database for the front-end application, but
I need some of the data that's in this database. A lot of it's not
normalized, et cetera. Once I know exactly what I need for my
application, the database I write will be properly constrained. The
purpose of the script was to reduce the amount of time it takes me to
understand the underlying data.
-Alan|||Alan,
I believe it's possible to execute longer strings by combining them and
using EXEC instead of sp_executesql eg:
EXEC( @.select + ' ' + @.from )
However, that should probably telling you there might be another way to do
it. How about capturing your information first?
ie have two tables, one to represent the tables in your Access database and
one to represent the columns ( or attributes Joe )? Something like:
Table: access_tables
at_id
table_name
rows
Table: access_fields
af_id
at_id
field_name
unique values
rows
Run one script which captures the data required, then you can run nice fast
SELECTs on real data:
SELECT *
FROM access_tables t
INNER JOIN access_fields f ON t.at_id = f.at_id
WHERE t.rows = f.unique_values
It's sometimes frowned upon to gather metadata like this in your database
but I see what you're trying to do.
Have fun!
Damien
"Alan Samet" wrote:
> This script will tell you all of the unique columns (except BLOBs) in
> all of the tables in a database.
> I made a post last night with some VBA code for exporting data from
> Access. I've been given a 2 GB Access database with scores of tables,
> some of them having hundreds of columns, millions of rows, and no
> constraints. I also know very little about the underlying data. After
> getting the data into SQL Server, I wanted to learn more about the
> nature of the data. The first thing I wanted to find was which columns
> could be single-column candidate keys.
> One of the issues I ran into is that my generated SQL often exceeded
> 4000 characters, so I needed to come up with a way of executing more
> than 4000 characters. Also, y'all'll notice that I'm using cursors
> quite a bit here. These cursors only iterate through a few rows. The
> heavy stuff is pretty efficient SQL.
> -Alan
> SET NOCOUNT ON
> CREATE TABLE #UniqueColumns
> (
> TableName sysname
> , ColumnName sysname
> , IsUnique CHAR(1)
> , PRIMARY KEY (TableName, ColumnName)
> )
> DECLARE @.TableName sysname
> , @.ColumnName sysname
> DECLARE cTables CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT TABLE_NAME
> FROM INFORMATION_SCHEMA.TABLES
> CREATE TABLE #SQLFragments
> (
> RowNumber INT
> IDENTITY
> , SQL NVARCHAR(4000)
> , Type VARCHAR(6)
> NOT NULL
> CHECK (Type IN ('SELECT', 'UPDATE'))
> )
> OPEN cTables
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cTables INTO @.TableName
> IF @.@.FETCH_STATUS <> 0 BREAK
> DECLARE @.SQL NVARCHAR(4000)
> , @.SQLUpdateUnique NVARCHAR(4000)
> SET @.SQL = N'SELECT '
> SET @.SQLUpdateUnique = N''
> DECLARE c CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT COLUMN_NAME
> FROM INFORMATION_SCHEMA.columns
> WHERE TABLE_NAME = @.TableName
> AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
> ORDER BY ORDINAL_POSITION
> OPEN c
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM c INTO @.ColumnName
> IF @.@.FETCH_STATUS <> 0 BREAK
> INSERT #UniqueColumns (TableName, ColumnName)
> SELECT @.TableName, @.ColumnName
> SET @.SQL = @.SQL + '
> CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
> ELSE ''N'' END ' + @.ColumnName + ','
> SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
> IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
> @.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
> '
> IF LEN(@.SQL) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> SET @.SQL = ''
> END
> IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> SET @.SQLUpdateUnique = ''
> END
> END
> CLOSE c
> DEALLOCATE c
> SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
> @.TableName
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> DECLARE @.FragmentCount INT
> SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
> 'SELECT'
> DECLARE @.SQLEXEC NVARCHAR(4000)
> , @.Fragment NVARCHAR(4000)
> , @.RowNumber VARCHAR(10)
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'SELECT'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
> SET @.SQL = @.SQL + N' ' + @.SQLEXEC
> EXEC sp_executesql @.SQL
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'UPDATE'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
> EXEC sp_executesql @.SQL
> DELETE #SQLFragments
> DROP TABLE ##Temp
> END
> CLOSE cTables
> DEALLOCATE cTables
> DELETE #UniqueColumns WHERE IsUnique = 'N'
> SELECT * FROM #UniqueColumns
> DROP TABLE #UniqueColumns
> DROP TABLE #SQLFragments
>|||>> .. and one to represent the columns ( or attributes Joe )? <<
LOL! Eitgher, but not "fields", please !
We can avoid the metadata with a little work. Pull out the table and
column names then use a text editor to generate queries with this
template
/*
DROP TABLE Foobar;
CREATE TABLE Foobar
(known_key CHAR(2) NOT NULL PRIMARY KEY,
actual_key CHAR(2) NOT NULL,
non_key CHAR(2) NOT NULL,
non_key_null CHAR(2));
INSERT INTO Foobar VALUES ('K1', 'a', 'x', 'a');
INSERT INTO Foobar VALUES ('K2', 'b', 'x', NULL);
INSERT INTO Foobar VALUES ('K3', 'c', 'x', 'b');
INSERT INTO Foobar VALUES ('K4', 'd', 'y', 'c');
INSERT INTO Foobar VALUES ('K5', 'e', 'y', 'd');
INSERT INTO Foobar VALUES ('K6', 'f', 'y', 'e');
*/
SELECT 'Foobar' AS table_name,
CASE WHEN COUNT(DISTINCT known_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS known_key,
CASE WHEN COUNT(DISTINCT actual_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS actual_key,
CASE WHEN COUNT(DISTINCT non_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS non_key,
CASE WHEN COUNT(DISTINCT non_key_null) = COUNT(*)
THEN 'Y' ELSE 'N' END AS non_key_null
FROM Foobar
These should run faster than a cursor. NULLs and dups will cause
COUNT(*) and COUNT(DISTINCT col) to be unequal. COUNT(*) should be
computed only once, and the optimizer should catch NOT NULL UNIQUE
constraints to short-cut the equality test.
No need for loops.|||Joe, that's almost exactly what I did. The cursors were used for two
things:
building dynamic SQL statements for the select case when count(...)
statements and the update statement to update my result table. I
prioritized execution efficiency of the queries that determined
uniqueness over query generation.
In more detail, the technique I used is as follows.
Here is your table w/ unknown constraints:
CREATE TABLE Foobar
(
Column0 DATATYPE
, Column1 DATATYPE
..
, Column 499 DATATYPE
)
First, I created a table of all table names and column names with a
column to indicate whether the value was unique.
CREATE TABLE #UniqueColumns
(
TableName sysname
, ColumnName sysname
, IsUnique CHAR(1)
, PRIMARY KEY (TableName, ColumnName)
)
Next, for each table I determined which columns were unique. Since I'm
dealing with lots of records, I wanted this to be the most efficient.
I'd say this query did the trick.
SELECT CASE WHEN COUNT(DISTINCT Column0) = COUNT(*) THEN 'Y' ELSE 'N'
END Column0
, CASE WHEN COUNT(DISTINCT Column1) = COUNT(*) THEN 'Y' ELSE 'N' END
Column1
, CASE WHEN COUNT(DISTINCT Column2) = COUNT(*) THEN 'Y' ELSE 'N' END
Column2
... et cetera
INTO ##Temp
FROM TableName
I then updated my table with table and column names:
UPDATE #UniqueColumns SET IsUnique = Column0 FROM ##Temp WHERE
TableName = 'TableName' AND ColumnName = 'Column0'
UPDATE #UniqueColumns SET IsUnique = Column1 FROM ##Temp WHERE
TableName = 'TableName' AND ColumnName = 'Column1'
The only issue here was when my SQL exceeded 4000 characters. So, I
went ahead and created a temporary table to store the fragments. This
is one of the few cases where the IDENTITY keyword is actually useful.
The latter part of the query would take and create a script that looked
like the following and execute it:
DECLARE @.SQL1 NVARCHAR(4000)
SELECT @.SQL1 = SQL FROM #SQLFragments WHERE RowNumber = 1
DECLARE @.SQL2 NVARCHAR(4000)
SELECT @.SQL2 = SQL FROM #SQLFragments WHERE RowNumber = 2
EXEC (@.SQL1 + @.SQL2)
et cetera. This was my dynamic workaround for the 4000 character
limitation. In reality, I doubt anything would gone over 12,000, so I
could've probably gotten away with just a couple variables, but this
was way more fun.
-Alan
--CELKO-- wrote:
> LOL! Eitgher, but not "fields", please !
>
> We can avoid the metadata with a little work. Pull out the table and
> column names then use a text editor to generate queries with this
> template
> /*
> DROP TABLE Foobar;
> CREATE TABLE Foobar
> (known_key CHAR(2) NOT NULL PRIMARY KEY,
> actual_key CHAR(2) NOT NULL,
> non_key CHAR(2) NOT NULL,
> non_key_null CHAR(2));
> INSERT INTO Foobar VALUES ('K1', 'a', 'x', 'a');
> INSERT INTO Foobar VALUES ('K2', 'b', 'x', NULL);
> INSERT INTO Foobar VALUES ('K3', 'c', 'x', 'b');
> INSERT INTO Foobar VALUES ('K4', 'd', 'y', 'c');
> INSERT INTO Foobar VALUES ('K5', 'e', 'y', 'd');
> INSERT INTO Foobar VALUES ('K6', 'f', 'y', 'e');
> */
> SELECT 'Foobar' AS table_name,
> CASE WHEN COUNT(DISTINCT known_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS known_key,
> CASE WHEN COUNT(DISTINCT actual_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS actual_key,
> CASE WHEN COUNT(DISTINCT non_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS non_key,
> CASE WHEN COUNT(DISTINCT non_key_null) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS non_key_null
> FROM Foobar
> These should run faster than a cursor. NULLs and dups will cause
> COUNT(*) and COUNT(DISTINCT col) to be unequal. COUNT(*) should be
> computed only once, and the optimizer should catch NOT NULL UNIQUE
> constraints to short-cut the equality test.
> No need for loops.|||Alan,
Neat stuff!
My simpleminded approach to the same sort of problem follows. It
doesn't use the owner or schema, as it should, and doesn't filter on
datatypes, as it should, but the general idea is clear enough I think.
The first column of the result tells you the table name and the number
of rows, and I scan visually for the candidate keys. On the other
hand, it gives me a complete picture of how many values per column,
and that can be invaluable to sorting out mysterious data.
Roy Harvey
Beacon Falls, CT
SELECT CASE WHEN ORDINAL_POSITION = 1
THEN 'SELECT COUNT(*) as ' + TABLE_NAME + ',' +
char(13) + char(10)
ELSE ''
END +
' COUNT(distinct ' + COLUMN_NAME + ') as ' + COLUMN_NAME
+
CASE WHEN ORDINAL_POSITION <
(select max(ORDINAL_POSITION)
from INFORMATION_SCHEMA.COLUMNS as L
where C.TABLE_NAME = L.TABLE_NAME)
THEN ','
ELSE char(13) + char(10) +
' FROM ' + TABLE_NAME
END
from INFORMATION_SCHEMA.COLUMNS as C
order by TABLE_NAME, ORDINAL_POSITION
all of the tables in a database.
I made a post last night with some VBA code for exporting data from
Access. I've been given a 2 GB Access database with scores of tables,
some of them having hundreds of columns, millions of rows, and no
constraints. I also know very little about the underlying data. After
getting the data into SQL Server, I wanted to learn more about the
nature of the data. The first thing I wanted to find was which columns
could be single-column candidate keys.
One of the issues I ran into is that my generated SQL often exceeded
4000 characters, so I needed to come up with a way of executing more
than 4000 characters. Also, y'all'll notice that I'm using cursors
quite a bit here. These cursors only iterate through a few rows. The
heavy stuff is pretty efficient SQL.
-Alan
SET NOCOUNT ON
CREATE TABLE #UniqueColumns
(
TableName sysname
, ColumnName sysname
, IsUnique CHAR(1)
, PRIMARY KEY (TableName, ColumnName)
)
DECLARE @.TableName sysname
, @.ColumnName sysname
DECLARE cTables CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
CREATE TABLE #SQLFragments
(
RowNumber INT
IDENTITY
, SQL NVARCHAR(4000)
, Type VARCHAR(6)
NOT NULL
CHECK (Type IN ('SELECT', 'UPDATE'))
)
OPEN cTables
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cTables INTO @.TableName
IF @.@.FETCH_STATUS <> 0 BREAK
DECLARE @.SQL NVARCHAR(4000)
, @.SQLUpdateUnique NVARCHAR(4000)
SET @.SQL = N'SELECT '
SET @.SQLUpdateUnique = N''
DECLARE c CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.columns
WHERE TABLE_NAME = @.TableName
AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
ORDER BY ORDINAL_POSITION
OPEN c
WHILE 1 = 1 BEGIN
FETCH NEXT FROM c INTO @.ColumnName
IF @.@.FETCH_STATUS <> 0 BREAK
INSERT #UniqueColumns (TableName, ColumnName)
SELECT @.TableName, @.ColumnName
SET @.SQL = @.SQL + '
CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
ELSE ''N'' END ' + @.ColumnName + ','
SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
@.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
'
IF LEN(@.SQL) > 3000 BEGIN
INSERT #SQLFragments (SQL, Type)
SELECT @.SQL, 'SELECT'
SET @.SQL = ''
END
IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
INSERT #SQLFragments (SQL, Type)
SELECT @.SQLUpdateUnique, 'UPDATE'
SET @.SQLUpdateUnique = ''
END
END
CLOSE c
DEALLOCATE c
SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
@.TableName
INSERT #SQLFragments (SQL, Type)
SELECT @.SQL, 'SELECT'
INSERT #SQLFragments (SQL, Type)
SELECT @.SQLUpdateUnique, 'UPDATE'
DECLARE @.FragmentCount INT
SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
'SELECT'
DECLARE @.SQLEXEC NVARCHAR(4000)
, @.Fragment NVARCHAR(4000)
, @.RowNumber VARCHAR(10)
SET @.SQL = N''
SET @.SQLEXEC = N'EXEC ('
DECLARE cFragments CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT SQL, RowNumber
FROM #SQLFragments
WHERE Type = 'SELECT'
ORDER BY RowNumber
OPEN cFragments
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
IF @.@.FETCH_STATUS <> 0 BREAK
SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
= ' + @.RowNumber + N'
'
SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
END
CLOSE cFragments
DEALLOCATE cFragments
SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
SET @.SQL = @.SQL + N' ' + @.SQLEXEC
EXEC sp_executesql @.SQL
SET @.SQL = N''
SET @.SQLEXEC = N'EXEC ('
DECLARE cFragments CURSOR
LOCAL
FAST_FORWARD
FOR
SELECT SQL, RowNumber
FROM #SQLFragments
WHERE Type = 'UPDATE'
ORDER BY RowNumber
OPEN cFragments
WHILE 1 = 1 BEGIN
FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
IF @.@.FETCH_STATUS <> 0 BREAK
SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
= ' + @.RowNumber + N'
'
SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
END
CLOSE cFragments
DEALLOCATE cFragments
SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
EXEC sp_executesql @.SQL
DELETE #SQLFragments
DROP TABLE ##Temp
END
CLOSE cTables
DEALLOCATE cTables
DELETE #UniqueColumns WHERE IsUnique = 'N'
SELECT * FROM #UniqueColumns
DROP TABLE #UniqueColumns
DROP TABLE #SQLFragmentsYou can analyze the data and make assumptions about keys and relationships,
but can the source of the data also provide a data dictionary?
"Alan Samet" <alansamet@.gmail.com> wrote in message
news:1141922084.124478.54740@.j52g2000cwj.googlegroups.com...
> This script will tell you all of the unique columns (except BLOBs) in
> all of the tables in a database.
> I made a post last night with some VBA code for exporting data from
> Access. I've been given a 2 GB Access database with scores of tables,
> some of them having hundreds of columns, millions of rows, and no
> constraints. I also know very little about the underlying data. After
> getting the data into SQL Server, I wanted to learn more about the
> nature of the data. The first thing I wanted to find was which columns
> could be single-column candidate keys.
> One of the issues I ran into is that my generated SQL often exceeded
> 4000 characters, so I needed to come up with a way of executing more
> than 4000 characters. Also, y'all'll notice that I'm using cursors
> quite a bit here. These cursors only iterate through a few rows. The
> heavy stuff is pretty efficient SQL.
> -Alan
> SET NOCOUNT ON
> CREATE TABLE #UniqueColumns
> (
> TableName sysname
> , ColumnName sysname
> , IsUnique CHAR(1)
> , PRIMARY KEY (TableName, ColumnName)
> )
> DECLARE @.TableName sysname
> , @.ColumnName sysname
> DECLARE cTables CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT TABLE_NAME
> FROM INFORMATION_SCHEMA.TABLES
> CREATE TABLE #SQLFragments
> (
> RowNumber INT
> IDENTITY
> , SQL NVARCHAR(4000)
> , Type VARCHAR(6)
> NOT NULL
> CHECK (Type IN ('SELECT', 'UPDATE'))
> )
> OPEN cTables
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cTables INTO @.TableName
> IF @.@.FETCH_STATUS <> 0 BREAK
> DECLARE @.SQL NVARCHAR(4000)
> , @.SQLUpdateUnique NVARCHAR(4000)
> SET @.SQL = N'SELECT '
> SET @.SQLUpdateUnique = N''
> DECLARE c CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT COLUMN_NAME
> FROM INFORMATION_SCHEMA.columns
> WHERE TABLE_NAME = @.TableName
> AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
> ORDER BY ORDINAL_POSITION
> OPEN c
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM c INTO @.ColumnName
> IF @.@.FETCH_STATUS <> 0 BREAK
> INSERT #UniqueColumns (TableName, ColumnName)
> SELECT @.TableName, @.ColumnName
> SET @.SQL = @.SQL + '
> CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
> ELSE ''N'' END ' + @.ColumnName + ','
> SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
> IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
> @.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
> '
> IF LEN(@.SQL) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> SET @.SQL = ''
> END
> IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> SET @.SQLUpdateUnique = ''
> END
> END
> CLOSE c
> DEALLOCATE c
> SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
> @.TableName
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> DECLARE @.FragmentCount INT
> SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
> 'SELECT'
> DECLARE @.SQLEXEC NVARCHAR(4000)
> , @.Fragment NVARCHAR(4000)
> , @.RowNumber VARCHAR(10)
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'SELECT'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
> SET @.SQL = @.SQL + N' ' + @.SQLEXEC
> EXEC sp_executesql @.SQL
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'UPDATE'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
> EXEC sp_executesql @.SQL
> DELETE #SQLFragments
> DROP TABLE ##Temp
> END
> CLOSE cTables
> DEALLOCATE cTables
> DELETE #UniqueColumns WHERE IsUnique = 'N'
> SELECT * FROM #UniqueColumns
> DROP TABLE #UniqueColumns
> DROP TABLE #SQLFragments
>|||I can't get a data dictionary. This data is only in intermediate form
anyways. I'm writing my own database for the front-end application, but
I need some of the data that's in this database. A lot of it's not
normalized, et cetera. Once I know exactly what I need for my
application, the database I write will be properly constrained. The
purpose of the script was to reduce the amount of time it takes me to
understand the underlying data.
-Alan|||Alan,
I believe it's possible to execute longer strings by combining them and
using EXEC instead of sp_executesql eg:
EXEC( @.select + ' ' + @.from )
However, that should probably telling you there might be another way to do
it. How about capturing your information first?
ie have two tables, one to represent the tables in your Access database and
one to represent the columns ( or attributes Joe )? Something like:
Table: access_tables
at_id
table_name
rows
Table: access_fields
af_id
at_id
field_name
unique values
rows
Run one script which captures the data required, then you can run nice fast
SELECTs on real data:
SELECT *
FROM access_tables t
INNER JOIN access_fields f ON t.at_id = f.at_id
WHERE t.rows = f.unique_values
It's sometimes frowned upon to gather metadata like this in your database
but I see what you're trying to do.
Have fun!
Damien
"Alan Samet" wrote:
> This script will tell you all of the unique columns (except BLOBs) in
> all of the tables in a database.
> I made a post last night with some VBA code for exporting data from
> Access. I've been given a 2 GB Access database with scores of tables,
> some of them having hundreds of columns, millions of rows, and no
> constraints. I also know very little about the underlying data. After
> getting the data into SQL Server, I wanted to learn more about the
> nature of the data. The first thing I wanted to find was which columns
> could be single-column candidate keys.
> One of the issues I ran into is that my generated SQL often exceeded
> 4000 characters, so I needed to come up with a way of executing more
> than 4000 characters. Also, y'all'll notice that I'm using cursors
> quite a bit here. These cursors only iterate through a few rows. The
> heavy stuff is pretty efficient SQL.
> -Alan
> SET NOCOUNT ON
> CREATE TABLE #UniqueColumns
> (
> TableName sysname
> , ColumnName sysname
> , IsUnique CHAR(1)
> , PRIMARY KEY (TableName, ColumnName)
> )
> DECLARE @.TableName sysname
> , @.ColumnName sysname
> DECLARE cTables CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT TABLE_NAME
> FROM INFORMATION_SCHEMA.TABLES
> CREATE TABLE #SQLFragments
> (
> RowNumber INT
> IDENTITY
> , SQL NVARCHAR(4000)
> , Type VARCHAR(6)
> NOT NULL
> CHECK (Type IN ('SELECT', 'UPDATE'))
> )
> OPEN cTables
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cTables INTO @.TableName
> IF @.@.FETCH_STATUS <> 0 BREAK
> DECLARE @.SQL NVARCHAR(4000)
> , @.SQLUpdateUnique NVARCHAR(4000)
> SET @.SQL = N'SELECT '
> SET @.SQLUpdateUnique = N''
> DECLARE c CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT COLUMN_NAME
> FROM INFORMATION_SCHEMA.columns
> WHERE TABLE_NAME = @.TableName
> AND DATA_TYPE NOT IN ('text', 'ntext', 'image')
> ORDER BY ORDINAL_POSITION
> OPEN c
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM c INTO @.ColumnName
> IF @.@.FETCH_STATUS <> 0 BREAK
> INSERT #UniqueColumns (TableName, ColumnName)
> SELECT @.TableName, @.ColumnName
> SET @.SQL = @.SQL + '
> CASE WHEN COUNT (DISTINCT ' + @.ColumnName + ') = COUNT(*) THEN ''Y''
> ELSE ''N'' END ' + @.ColumnName + ','
> SET @.SQLUpdateUnique = @.SQLUpdateUnique + 'UPDATE #UniqueColumns SET
> IsUnique = ' + @.ColumnName + ' FROM ##Temp WHERE TableName = ''' +
> @.TableName + ''' AND ColumnName = ''' + @.ColumnName + '''
> '
> IF LEN(@.SQL) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> SET @.SQL = ''
> END
> IF LEN(@.SQLUpdateUnique) > 3000 BEGIN
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> SET @.SQLUpdateUnique = ''
> END
> END
> CLOSE c
> DEALLOCATE c
> SET @.SQL = LEFT(@.SQL, LEN(@.SQL) - 1) + ' INTO ##Temp FROM ' +
> @.TableName
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQL, 'SELECT'
> INSERT #SQLFragments (SQL, Type)
> SELECT @.SQLUpdateUnique, 'UPDATE'
> DECLARE @.FragmentCount INT
> SELECT @.FragmentCount = COUNT(*) FROM #SQLFragments WHERE Type =
> 'SELECT'
> DECLARE @.SQLEXEC NVARCHAR(4000)
> , @.Fragment NVARCHAR(4000)
> , @.RowNumber VARCHAR(10)
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'SELECT'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQLEXEC = LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + ')'
> SET @.SQL = @.SQL + N' ' + @.SQLEXEC
> EXEC sp_executesql @.SQL
> SET @.SQL = N''
> SET @.SQLEXEC = N'EXEC ('
> DECLARE cFragments CURSOR
> LOCAL
> FAST_FORWARD
> FOR
> SELECT SQL, RowNumber
> FROM #SQLFragments
> WHERE Type = 'UPDATE'
> ORDER BY RowNumber
> OPEN cFragments
> WHILE 1 = 1 BEGIN
> FETCH NEXT FROM cFragments INTO @.Fragment, @.RowNumber
> IF @.@.FETCH_STATUS <> 0 BREAK
> SET @.SQL = @.SQL + N'DECLARE @.SQL' + @.RowNumber + N' NVARCHAR(4000)
> SELECT @.SQL' + @.RowNumber + N' = SQL FROM #SQLFragments WHERE RowNumber
> = ' + @.RowNumber + N'
> '
> SET @.SQLEXEC = @.SQLEXEC + N' @.SQL' + @.RowNumber + N' +'
> END
> CLOSE cFragments
> DEALLOCATE cFragments
> SET @.SQL = @.SQL + N' ' + LEFT(@.SQLEXEC, LEN(@.SQLEXEC) - 1) + N')'
> EXEC sp_executesql @.SQL
> DELETE #SQLFragments
> DROP TABLE ##Temp
> END
> CLOSE cTables
> DEALLOCATE cTables
> DELETE #UniqueColumns WHERE IsUnique = 'N'
> SELECT * FROM #UniqueColumns
> DROP TABLE #UniqueColumns
> DROP TABLE #SQLFragments
>|||>> .. and one to represent the columns ( or attributes Joe )? <<
LOL! Eitgher, but not "fields", please !
We can avoid the metadata with a little work. Pull out the table and
column names then use a text editor to generate queries with this
template
/*
DROP TABLE Foobar;
CREATE TABLE Foobar
(known_key CHAR(2) NOT NULL PRIMARY KEY,
actual_key CHAR(2) NOT NULL,
non_key CHAR(2) NOT NULL,
non_key_null CHAR(2));
INSERT INTO Foobar VALUES ('K1', 'a', 'x', 'a');
INSERT INTO Foobar VALUES ('K2', 'b', 'x', NULL);
INSERT INTO Foobar VALUES ('K3', 'c', 'x', 'b');
INSERT INTO Foobar VALUES ('K4', 'd', 'y', 'c');
INSERT INTO Foobar VALUES ('K5', 'e', 'y', 'd');
INSERT INTO Foobar VALUES ('K6', 'f', 'y', 'e');
*/
SELECT 'Foobar' AS table_name,
CASE WHEN COUNT(DISTINCT known_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS known_key,
CASE WHEN COUNT(DISTINCT actual_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS actual_key,
CASE WHEN COUNT(DISTINCT non_key) = COUNT(*)
THEN 'Y' ELSE 'N' END AS non_key,
CASE WHEN COUNT(DISTINCT non_key_null) = COUNT(*)
THEN 'Y' ELSE 'N' END AS non_key_null
FROM Foobar
These should run faster than a cursor. NULLs and dups will cause
COUNT(*) and COUNT(DISTINCT col) to be unequal. COUNT(*) should be
computed only once, and the optimizer should catch NOT NULL UNIQUE
constraints to short-cut the equality test.
No need for loops.|||Joe, that's almost exactly what I did. The cursors were used for two
things:
building dynamic SQL statements for the select case when count(...)
statements and the update statement to update my result table. I
prioritized execution efficiency of the queries that determined
uniqueness over query generation.
In more detail, the technique I used is as follows.
Here is your table w/ unknown constraints:
CREATE TABLE Foobar
(
Column0 DATATYPE
, Column1 DATATYPE
..
, Column 499 DATATYPE
)
First, I created a table of all table names and column names with a
column to indicate whether the value was unique.
CREATE TABLE #UniqueColumns
(
TableName sysname
, ColumnName sysname
, IsUnique CHAR(1)
, PRIMARY KEY (TableName, ColumnName)
)
Next, for each table I determined which columns were unique. Since I'm
dealing with lots of records, I wanted this to be the most efficient.
I'd say this query did the trick.
SELECT CASE WHEN COUNT(DISTINCT Column0) = COUNT(*) THEN 'Y' ELSE 'N'
END Column0
, CASE WHEN COUNT(DISTINCT Column1) = COUNT(*) THEN 'Y' ELSE 'N' END
Column1
, CASE WHEN COUNT(DISTINCT Column2) = COUNT(*) THEN 'Y' ELSE 'N' END
Column2
... et cetera
INTO ##Temp
FROM TableName
I then updated my table with table and column names:
UPDATE #UniqueColumns SET IsUnique = Column0 FROM ##Temp WHERE
TableName = 'TableName' AND ColumnName = 'Column0'
UPDATE #UniqueColumns SET IsUnique = Column1 FROM ##Temp WHERE
TableName = 'TableName' AND ColumnName = 'Column1'
The only issue here was when my SQL exceeded 4000 characters. So, I
went ahead and created a temporary table to store the fragments. This
is one of the few cases where the IDENTITY keyword is actually useful.
The latter part of the query would take and create a script that looked
like the following and execute it:
DECLARE @.SQL1 NVARCHAR(4000)
SELECT @.SQL1 = SQL FROM #SQLFragments WHERE RowNumber = 1
DECLARE @.SQL2 NVARCHAR(4000)
SELECT @.SQL2 = SQL FROM #SQLFragments WHERE RowNumber = 2
EXEC (@.SQL1 + @.SQL2)
et cetera. This was my dynamic workaround for the 4000 character
limitation. In reality, I doubt anything would gone over 12,000, so I
could've probably gotten away with just a couple variables, but this
was way more fun.
-Alan
--CELKO-- wrote:
> LOL! Eitgher, but not "fields", please !
>
> We can avoid the metadata with a little work. Pull out the table and
> column names then use a text editor to generate queries with this
> template
> /*
> DROP TABLE Foobar;
> CREATE TABLE Foobar
> (known_key CHAR(2) NOT NULL PRIMARY KEY,
> actual_key CHAR(2) NOT NULL,
> non_key CHAR(2) NOT NULL,
> non_key_null CHAR(2));
> INSERT INTO Foobar VALUES ('K1', 'a', 'x', 'a');
> INSERT INTO Foobar VALUES ('K2', 'b', 'x', NULL);
> INSERT INTO Foobar VALUES ('K3', 'c', 'x', 'b');
> INSERT INTO Foobar VALUES ('K4', 'd', 'y', 'c');
> INSERT INTO Foobar VALUES ('K5', 'e', 'y', 'd');
> INSERT INTO Foobar VALUES ('K6', 'f', 'y', 'e');
> */
> SELECT 'Foobar' AS table_name,
> CASE WHEN COUNT(DISTINCT known_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS known_key,
> CASE WHEN COUNT(DISTINCT actual_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS actual_key,
> CASE WHEN COUNT(DISTINCT non_key) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS non_key,
> CASE WHEN COUNT(DISTINCT non_key_null) = COUNT(*)
> THEN 'Y' ELSE 'N' END AS non_key_null
> FROM Foobar
> These should run faster than a cursor. NULLs and dups will cause
> COUNT(*) and COUNT(DISTINCT col) to be unequal. COUNT(*) should be
> computed only once, and the optimizer should catch NOT NULL UNIQUE
> constraints to short-cut the equality test.
> No need for loops.|||Alan,
Neat stuff!
My simpleminded approach to the same sort of problem follows. It
doesn't use the owner or schema, as it should, and doesn't filter on
datatypes, as it should, but the general idea is clear enough I think.
The first column of the result tells you the table name and the number
of rows, and I scan visually for the candidate keys. On the other
hand, it gives me a complete picture of how many values per column,
and that can be invaluable to sorting out mysterious data.
Roy Harvey
Beacon Falls, CT
SELECT CASE WHEN ORDINAL_POSITION = 1
THEN 'SELECT COUNT(*) as ' + TABLE_NAME + ',' +
char(13) + char(10)
ELSE ''
END +
' COUNT(distinct ' + COLUMN_NAME + ') as ' + COLUMN_NAME
+
CASE WHEN ORDINAL_POSITION <
(select max(ORDINAL_POSITION)
from INFORMATION_SCHEMA.COLUMNS as L
where C.TABLE_NAME = L.TABLE_NAME)
THEN ','
ELSE char(13) + char(10) +
' FROM ' + TABLE_NAME
END
from INFORMATION_SCHEMA.COLUMNS as C
order by TABLE_NAME, ORDINAL_POSITION
Subscribe to:
Posts (Atom)