Is that query okay? I tracked down the behavior of the REPLACE clause, which you should be aware of

2016.02.23

Web system development

table of contents

0.1 How do you update the data?
0.2 Let's see it in action
0.3 The REPLACE clause is not just a replacement

1 If you want to consult with a development professional

Hello. I'm Mandai, the wildcard expert on the development team.
This time, I'd like to take a closer look at the detailed behavior of the REPLACE clause, which you might be using without thinking about it.

How do you update the data?

At our company, we primarily use MySQL as our RDBMS for system development.
Typically, when updating tables with unique constraints...

SELECT and check for duplicates, then branch to INSERT or UPDATE
If you try to INSERT and find a duplicate, catch the exception and proceed to UPDATE processing

You may be forced to choose between the above two options, but in fact (and not really) there is also an update process using the REPLACE clause

If you execute this REPLACE clause thinking that it simply replaces data as it literally says, it may result in data inconsistencies and lead to unexpected situations

Let's look at a simple example to see how it works

Let's see it in action

First, create a table like this:

mysql> CREATE TABLE `users` (`id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, `name` VARCHAR(16) NOT NULL UNIQUE, `level` INT UNSIGNED NOT NULL DEFAULT 1); Query OK, 0 rows affected (0.02 sec) mysql> DESC `users`; +-------+-----+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(16) | NO | UNI | NULL | | | level | int(10) unsigned | NO | | 1 | | +-------+------------------+------+-----+---------+----------------+ 3 rows in set (0.00 sec)

The `name` field is set as the unique key.
Next, we will insert some simple dummy data.

This time, I tried generating dummy data using generatedata.com

I think the atmosphere will be something like this:

mysql> INSERT INTO `users` (`name`,`level`) VALUES ("Pamela",8),("Dexter",4),("Ava",1),("Kelly",8),("Leila",2),("Quin",8),("Salvador",8),("Orli",2),("Ignacia",9),("Brennan",8); Query OK, 10 rows affected (0.00 sec) Records: 10 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM `users`; | Quin | 8 | | 7 | Salvador | 8 | | 8 | Orli | 2 | | 9 | Ignacia | 9 | | 10 | Brennan | 8 | +----+----------+-------+ 10 rows in set (0.00 sec)

Now, let's try inserting a record with a duplicate name

mysql> INSERT INTO `users` (`name`,`level`) VALUES ("Pamela",1); ERROR 1062 (23000): Duplicate entry 'Pamela' for key 'name' mysql> SELECT * FROM `users`; +----+----------+-------+ | id | name | level | +----+----------+-------+ | 1 | Pamela | 8 | | 2 | Dexter | 4 | | | 3 | Ava | 1 | | 4 | Kelly | 8 | | 5 | Leila | 2 | | 6 | Quin | 8 | | 7 | Salvador | 8 | | 8 | Orli | 10 rows in set (0.00 sec)

An error occurred because it was duplicated.
That's exactly what I expected.

Normally, you would use the UPDATE clause here to update the record

mysql> UPDATE `users` SET `level`=10 WHERE `name` = "Pamela"; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from users; +----+----------+-------+ | id | name | level | +----+----------+-------+ | 1 | Pamela | 10 | | 2 | Dexter | 4 | | 3 | Ava | 1 | | 4 | Kelly | 8 | | 5 | Leila | 2 | | 6 | Quin | in set (0.00 sec)

Now let's try updating a record using the REPLACE clause

mysql> REPLACE INTO `users` (`name`,`level`) VALUES ("Pamela",1); Query OK, 2 rows affected (0.00 sec) mysql> SELECT * FROM `users`; 3 | Ava | 1 | | 4 | Kelly | 8 | | 5 | Leila | 2 | | 6 | Quin | 8 | | 7 | Salvador | 8 | | 8 | Orli | 2 | | 9 | Ignacia | 9 | sec)

Have you realized at this point that something terrifying has happened?
That's right, for some reason, the primary key of the record with name = "Pamela" has been changed!
Let's take a closer look at this fact, which is as scary as a Junji Inagawa ghost story.

The REPLACE clause is not just a replacement

This fact was made clear in the experiment mentioned earlier, but why did this happen?
Incidentally, the reason why ID 11 is missing is because an INSERT clause was executed, and this is not the content of this ghost story.

In short, the reason for this is

The REPLACE clause "INSERTs a record with the same data after a DELETE."

That's because it behaves like that.
It's scary
, isn't it? It's scary that the primary key can change on its own.
If you've been using it a lot, thinking it's a convenient query that automatically determines whether it's an INSERT or UPDATE, you've
just avoided stepping on a landmine, but before you know it, you might be in a sea of landmines.

However, this phenomenon occurs when

Has an auto-incremented primary key
At the same time, there is a field that is a unique key
When a unique key is duplicated

is limited to [specific cases].
Some might say this is because of the use of surrogate keys, but I think the correct explanation is that it's because the implementation wasn't done with full knowledge of how the REPLACE clause works.

There is a similar syntax called "INSERT ... ON DUPLICATE KEY UPDATE," but it
is not safe to use with statement-based replication or tables with composite primary (unique) keys,
so its use is limited.
Also, many framework O/R mappers do not support it by default (because it is a MySQL dialect and not standard SQL), so if you use it, you will not be able to benefit from the O/R mapper.
That being said, it is said that the ON DUPLICATE KEY UPDATE syntax can process data much faster than SELECT → INSERT (UPDATE), so
if data update processes are taking a long time in a system using MySQL, it may be worth trying.

MySQL 5.6 Japanese documentation
MySQL 5.7 documentation (English)
(The content of the 5.7 English documentation is largely the same)

Also, simply using INSERT might lead to the unfortunate situation of running out of primary keys because it unnecessarily increments the auto-increment . To be on the safe side, it seems that SELECT → INSERT (UPDATE) is the more reliable approach.

Wow,generatedata.comis really convenient.

If you want to consult with a development professional

At Beyond, we combine our extensive track record, technology, and know-how in system development with OSS technology and cloud technologies such as AWS to provide contracted development of web systems with reliable quality and excellent cost performance

We also handle server-side/back-end development and proprietary API collaboration development, making full use of our technology and know-how in building and operating web system/application infrastructure for large-scale, high-load games, applications, and digital content

If you have any problems with your development project, please visit the following website

● Web system development
● Server-side development (API / DB)

If you found this article helpful,please give it a "Like"!

The person who wrote this article

About the author

Yoichi Bandai

My main job is developing web APIs for social games, but thankfully I'm also given the opportunity to work on various other tasks, including marketing.
My image rights within Beyond are treated as CC0.