Composite primary key on many columns - Ask TOM (2024)

Several problems with that approach

mathguy, June 13, 2023 - 1:54 pm UTC

The hash of NULL is non-NULL, so that approach would allow all column values (in the supposed PK) to be NULL without throwing an error.

Then: simple concatenation, without some separator that is guaranteed to be impossible in the column values, will prevent the insertion of validly distinct keys. For example: ('a', 'b', 'cde') is distinct from ('ab', 'cd', 'e') - but after concatenation they become the same and are flagged as duplicates.

Even if we use a separator like & - so that ('a', 'b', 'cde') becomes 'a&b&cde' - we still have the same problem, as long as & is possible in the actual column values. If we do use a separator, it should be a control character like chr(31), the "unit separator" control character.
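The collision is easy to reproduce outside the database. The following is a minimal Python sketch, not the OP's actual process: the `key_hash` helper and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib

UNIT_SEP = chr(31)  # ASCII "unit separator" control character


def key_hash(values, sep=""):
    """Hypothetical helper: lower-case each value, join with sep, hash the result."""
    joined = sep.join(v.lower() for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Plain concatenation: both rows collapse to 'abcde', so the hashes match.
collide_plain = key_hash(("a", "b", "cde")) == key_hash(("ab", "cd", "e"))

# '&' as separator still collides once '&' can occur inside a value:
# both rows become 'a&b&c'.
collide_amp = key_hash(("a&b", "c"), "&") == key_hash(("a", "b&c"), "&")

# chr(31) keeps the same two rows apart, provided it never occurs in the data.
collide_us = key_hash(("a&b", "c"), UNIT_SEP) == key_hash(("a", "b&c"), UNIT_SEP)

print(collide_plain, collide_amp, collide_us)  # True True False
```

The separator only moves the problem: it is safe exactly as long as the chosen character cannot appear in any key column.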

Also, it's hard to see how concatenation, with or without separator, followed by hash, can enforce that every value in a key column is non-NULL - even if there may be some way to reject a row where all key values are NULL.

One possible solution is to create 8 virtual columns, on lower(c1), lower(c2) etc., and set the primary key on these columns. This avoids the collision concern too.
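As a rough illustration of why a column-by-column key avoids the concatenation problem, here is a sketch using SQLite through Python's `sqlite3` module. It is a stand-in, not Oracle syntax: SQLite has no direct equivalent used here for Oracle virtual columns, so a unique index on `lower(...)` expressions plays that role, and the table name `feed` with three columns instead of eight is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed (c1 TEXT NOT NULL, c2 TEXT NOT NULL, c3 TEXT NOT NULL)"
)
# Uniqueness is enforced on the lower-cased value of each column separately.
conn.execute(
    "CREATE UNIQUE INDEX feed_ci_uk ON feed (lower(c1), lower(c2), lower(c3))"
)


def try_insert(row):
    """Return True if the row is accepted, False if it violates the unique index."""
    try:
        conn.execute("INSERT INTO feed VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("a", "b", "cde")),  # accepted
    try_insert(("ab", "cd", "e")),  # accepted: columns are compared one by one
    try_insert(("A", "B", "CDE")),  # rejected: case-insensitive duplicate of row 1
]
print(results)  # [True, True, False]
```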

June 14, 2023 - 12:22 pm UTC

You can define the individual columns to be NOT NULL - no need to do anything fancy in the hashing process.

Good point on distinct rows having the same concatenated string; though I suspect that, like hash collisions, this is highly unlikely in practice. The OP will have to assess whether this is an issue for them and come up with a suitable separator.

I suspect they're actively trying to avoid an 8 column primary key; this could be to keep the index small and/or make foreign keys more practical.

John, June 13, 2023 - 9:07 pm UTC

Maybe I'm missing something, but what is wrong with a multi-column unique key in combination with an identity column as primary key?

June 14, 2023 - 12:35 pm UTC

"the table is refreshed fully every time from the feed"

So the process is DELETE/TRUNCATE + INSERT => new identity column values every load. Which means rekeying all child rows, etc.

It's possible the process could be rewritten as a merge to avoid this problem. Whether that's practical is something the OP would have to clarify.
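The rekeying problem can be sketched with SQLite through Python's `sqlite3` module. This is an analogy under stated assumptions, not the OP's load process: `AUTOINCREMENT` stands in for an Oracle identity column, `INSERT OR IGNORE` stands in for a MERGE that only inserts new keys, and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feed (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        c1 TEXT NOT NULL,
        c2 TEXT NOT NULL,
        UNIQUE (c1, c2)                        -- the natural key
    )
    """
)


def key_id(c1, c2):
    """Look up the surrogate id currently assigned to a natural key."""
    row = conn.execute(
        "SELECT id FROM feed WHERE c1 = ? AND c2 = ?", (c1, c2)
    ).fetchone()
    return row[0]


batch = [("a", "b"), ("c", "d")]

conn.executemany("INSERT INTO feed (c1, c2) VALUES (?, ?)", batch)
id_first_load = key_id("a", "b")

# Full refresh: delete everything, then insert the same feed again.
conn.execute("DELETE FROM feed")
conn.executemany("INSERT INTO feed (c1, c2) VALUES (?, ?)", batch)
id_after_reload = key_id("a", "b")

# Merge-style load: existing keys are left alone, only new keys are inserted.
conn.executemany(
    "INSERT OR IGNORE INTO feed (c1, c2) VALUES (?, ?)", batch + [("e", "f")]
)
id_after_merge = key_id("a", "b")

print(id_first_load != id_after_reload, id_after_reload == id_after_merge)
```

After the full refresh the same natural key has a different surrogate id, so any child rows would need rekeying; the merge-style load leaves the id unchanged.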

A reader, June 14, 2023 - 5:01 am UTC

Thanks Chris for a quick response. I think I can take it from here.

Another approach - set collation to a case-insensitive one

mathguy, June 14, 2023 - 5:29 am UTC

If your Oracle version is at least 12.2 and the MAX_STRING_SIZE is set to EXTENDED (a requirement I never understood), then there is another possible solution: set the collation to a case-insensitive variant, such as BINARY_CI, either for the entire table (with the DEFAULT COLLATION clause) or for the 8 columns individually (with the COLLATE clause) if the other columns of the table should not be affected. Then the PK can be set for the 8 columns and the effect is exactly what you need.

This will have other effects too: case-insensitive comparison to values in these columns will be performed in WHERE clauses and join conditions, foreign keys pointing to your table will be checked using case-insensitive comparison, etc. You need to decide if these other effects are desired or not; in most cases they should be, if your PK values should be viewed as case-insensitive.

Also, if your Oracle version is at least 12.2 but your MAX_STRING_SIZE is STANDARD, you can change that to EXTENDED but that is irreversible, and it has several consequences, not all of which are harmless. But if your MAX_STRING_SIZE is already set to EXTENDED for other reasons, this is definitely something you may want to consider.
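The general behaviour of a case-insensitive column collation can be sketched with SQLite through Python's `sqlite3` module. This is only an analogy: SQLite's built-in `NOCASE` collation (ASCII only) stands in for Oracle's BINARY_CI, none of the Oracle clauses mentioned above are shown, and the table layout with two key columns instead of eight is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feed (
        c1 TEXT COLLATE NOCASE NOT NULL,
        c2 TEXT COLLATE NOCASE NOT NULL,
        payload TEXT,
        PRIMARY KEY (c1, c2)
    )
    """
)
conn.execute("INSERT INTO feed VALUES ('Abc', 'X', 'first')")

# The primary key compares values with the columns' collation,
# so a key differing only in case is a duplicate.
try:
    conn.execute("INSERT INTO feed VALUES ('ABC', 'x', 'second')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Side effect: ordinary WHERE comparisons on the column are case-insensitive too.
match = conn.execute("SELECT payload FROM feed WHERE c1 = 'aBC'").fetchone()

print(duplicate_rejected, match)  # True ('first',)
```

The second query shows the side effect described above: every comparison against the column, not only the key check, becomes case-insensitive.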

June 14, 2023 - 12:32 pm UTC

"If your Oracle version is at least 12.2 and the MAX_STRING_SIZE is set to EXTENDED (a requirement I never understood)"

This is because the collation key for text of length N could be much longer than N. Collation keys are RAW values which - with non-extended data types - have a limit of 2,000 bytes. That means you can't (safely) collate strings of 2,000 bytes or more, nor some shorter than that.

Allowing longer VARCHAR2 & RAW values gives much more leeway, though doesn't resolve the problem entirely.

https://blogs.oracle.com/sql/post/how-to-do-case-insensitive-and-accent-insensitive-search-in-oracle-database#long-strings

Unique key and Identity column

Saša Petković, June 14, 2023 - 11:22 am UTC

I can only agree with John, who suggested a unique key on all 8 columns and one identity column acting as PK; to me, by far the best solution.

June 14, 2023 - 12:36 pm UTC

Perhaps - we have very limited information about the overall process. Building a hash of the columns could be the most practical solution available.

Reasons not to hash a concatenation of PK columns

mathguy, June 17, 2023 - 11:29 pm UTC

There are several reasons to dislike the approach of concatenating PK columns after applying LOWER and then hashing the result and using the hash as PK, in addition to the (remote) possibility of collisions - caused either by concatenation or by hashing. It's not clear that the OP has a "space" or "key size" concern, but even if he does, it may not be as serious as the downside of the concat + hash solution.

If we need a child table with a foreign key pointing to this table, we will need to concatenate and hash the FK columns and point the hash to the PK. Perhaps that isn't too much work, but it doesn't feel natural.

In any case, this creates a problem that alone should disqualify this approach, if the PK is indeed meant as "PK" (as in, needed for referential integrity). Namely, in a composite PK/FK arrangement, a FK tuple should be allowed (per SQL standard and Oracle definition) if the FK has one or more NULL values, and the remaining values match at least one PK tuple. For example, (101, 3000) as PK and (101, NULL) in the FK is allowed. There is no way to hack the "concatenate + hash" approach to force it to behave in this way.

Another issue: Suppose a query has a condition like ... where lower(COL1) = 'abc'. With the BINARY_CI collation on the column, or with a virtual column on lower(COL1) (as part of a PK on virtual columns), the runtime can reference the index that supports the PK. There is no way to do that with the PK defined as the hash of a concatenation.

One can probably think of other inconveniences of the concat + hash approach. If this business of case-insensitive collation is really that important to the OP, perhaps they should just bite the bullet and change MAX_STRING_SIZE (if it isn't EXTENDED already).

June 19, 2023 - 2:25 pm UTC

"we will need to concatenate and hash the FK columns and point the hash to the PK"

You mean duplicate the 8 columns in any child tables too? Why? The hash is now a surrogate key, you only need to reference that in the child tables.

"For example, (101, 3000) as PK and (101, NULL) in the FK is allowed."

I think that's a weak argument. In my experience explaining this, most people are both surprised this is possible and actively want to avoid this behaviour. Particularly when they realize it means you can insert values that don't exist in the parent at all!

I've yet to see a real-world case where you want this. Typically with multi-column FKs you want a check constraint to ensure either they're all null or none are.
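The behaviour with partly NULL foreign keys, and the all-or-nothing check constraint, can be sketched with SQLite through Python's `sqlite3` module. SQLite is used here only as a convenient stand-in that also skips the foreign key check when any child key column is NULL; the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE parent (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
conn.execute(
    """
    CREATE TABLE child (
        a INTEGER,
        b INTEGER,
        FOREIGN KEY (a, b) REFERENCES parent (a, b)
    )
    """
)
conn.execute(
    """
    CREATE TABLE child_checked (
        a INTEGER,
        b INTEGER,
        FOREIGN KEY (a, b) REFERENCES parent (a, b),
        -- either every FK column is NULL or none is
        CHECK ((a IS NULL AND b IS NULL) OR (a IS NOT NULL AND b IS NOT NULL))
    )
    """
)
conn.execute("INSERT INTO parent VALUES (101, 3000)")


def accepted(table, row):
    """Return True if the insert succeeds, False on a constraint violation."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted("child", (101, 3000)),          # True: matches the parent row
    accepted("child", (101, None)),          # True: a NULL disables the FK check
    accepted("child", (999, None)),          # True: 999 is not in the parent at all
    accepted("child", (999, 3000)),          # False: fully populated, no parent
    accepted("child_checked", (101, None)),  # False: rejected by the CHECK
    accepted("child_checked", (None, None)), # True: all NULL is allowed
]
print(results)  # [True, True, True, False, False, True]
```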

"the runtime can reference the index that supports the PK"

That's only sure to help if the leading column of the PK is in the join/where clause. Query on any other subset of the columns and you'll likely need to create indexes to match the predicates.

Granted if you have these queries you'll likely need to do this anyway whichever approach you take. The point is it's not a given that the 8-column constraint solves all indexing needs.

To be clear: using column-based collation and defining the 8-column PK would be my preferred approach.

Building a single-column hash of the column values and making this the PK does have advantages though. The big one is that the hash could be much smaller than the text in the input columns. If any of the columns store "long" text, the hash could be notably faster for data access - particularly if there are child tables.

Whether this happens in practice will depend on the data.
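A quick size comparison shows why the data matters. This Python sketch assumes SHA-256 and a chr(31) separator purely for illustration; the hash function actually used in the thread is not restated here, and `sizes` is a hypothetical helper.

```python
import hashlib


def sizes(key_values):
    """Return (bytes in the joined lower-cased key, bytes in its SHA-256 digest)."""
    joined = chr(31).join(v.lower() for v in key_values).encode("utf-8")
    return len(joined), len(hashlib.sha256(joined).digest())


long_key = sizes(("x" * 400, "y" * 400, "z" * 400))
short_key = sizes(("a", "b"))

print(long_key)   # (1202, 32): the digest is far smaller than the key text
print(short_key)  # (3, 32): the digest is larger than the key text
```

The digest has a fixed length, so it only saves space when the combined key text is longer than the digest.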
