PostgreSQL replication settings from a MySQL user perspective

Hello.
I'm Mandai, in charge of Wild on the development team.

This article summarizes the settings required for using synchronous replication with PostgreSQL 9.4.
Our company normally uses MySQL, but using PostgreSQL
was a bit confusing, even though it is the same RDBMS.

It worked surprisingly well and I was very pleased, so
I would like to summarize my experience with the hope that everyone will give it a try.

This time the process is a bit long and complicated, so I will show you the menu

PostgreSQL replication (master DB side work)

  1. Create a replication user
  2. Addition to pg_hba.conf
  3. Editing postgresql.conf
  4. Restarting the master DB

Work on the slave DB side

  1. Copy the master with pg_basebackup
  2. Edit postgresql.conf (slave side)
  3. Creating recovery.conf
  4. Editing recovery.conf
  5. Restarting the slave DB

Verifying Replication

I would like to proceed with the above steps

 

PostgreSQL replication (master DB side work)


, we worked under the truly wild circumstances
of setting up replication for a database that was already in full operation As an aside, situations like this are actually more exciting.

As a general task

  • Complete the master DB configuration and restart
  • Copy the data from the master DB to the slave DB
  • Enter the settings on the slave DB side and restart
  • I think it would be something like this

    Assuming that the master DB in the local network has been assigned the IP address 192.168.1.1 and the slave DB has been assigned the IP address 192.168.1.2

     

    Create a replication user

    First, let's start with something simple: create a user called repl_user for replication.
    After logging in to the master DB, create the user with the following query.

    CREATE USER repl_user REPLICATION PASSWORD '[password]';

     

    Addition to pg_hba.conf

    Next, edit pg_hba.conf.
    Here, you will need to enable the slave DB to work using the repl_user user you created earlier.

    host replication repl_user 192.168.1.2/32 md5


     Change the IP address to that of the slave.
    If you are creating multiple slave DBs, you will need to adjust the IP mask accordingly, but
    if you are creating only one, /32 should be fine.

     

    Editing postgresql.conf

    Next, we will move on to editing postgresql.conf.
    Here, we will add and change the settings related to replication.
    All necessary setting items are commented out, so you can enable them by uncommenting them.
    This is more user-friendly than MySQL.

    The settings required for replication are as follows:

     

    wal_level

    This item sets how much information to include in the wal file

    There are three wal levels: minimal, archive, and hot_standby.
    The default is minimal, which is fine for normal operation.

    To set up replication, you need a level of archive or higher, so in this case we will set it to hot_standby

    wal_level = hot_standby

     

    synchronous_commit

    This setting determines whether to wait for the wal file to be written to completion before returning success to the client after the transaction has been executed

    The default is on, which returns a response to the client after the wal file is written

    If you set it to off, transaction data will be lost if writing to the wal file fails, and the status reported to the client will differ.
    However, in terms of data, the data in the transaction will simply be lost, and no inconsistency will occur.

    This means that there is a possibility that a success message is returned to the client, but the message is not actually a success

    This improves the response speed to the client, but the only problem seems to be that it will occasionally (how occasionally depends on the degree to which the transaction fails) return a lie (or is that the only problem?)

    This time we will leave it on

    synchronous_commit = on

     

    archive_mode

    By turning on archive mode, you can store wal files in an area other than the normal wal storage area

    If a replication delay occurs, the slave side will track unapplied wal files in order.
    However, the life cycle of wal files in the default storage area is surprisingly short, so they cannot withstand long delays.
    If a delay occurs that is so severe that it cannot be recovered, the data will have to be re-entered, which will be time-consuming to operate.

    To solve this, we copy the wal files outside the default storage area, which allows us to handle long replication delays regardless of their lifecycle

    The copied wal files will not continue to accumulate (be careful as they may remain if a delay occurs), so there is no need to rm them separately in a batch.
    This time, we will leave it on.

    archive_mode = on

     

    archive_command

    If archive_mode = on, set the command to save the wal file. Usually, this is done with the cp command

    There are several replacement strings that can be used in the command to indicate the source wal file, so we will use them

    • %f ・・・ file name of the wal file
    • %p ・・・ Absolute path of the target wal file
    archive_command = '[Console command to save wal files]'

     

    max_wal_senders

    Set the number of slave DBs to connect.
    It is common to set this to the number of slave DBs + 1 for connections via the pg_basebackup command.

    max_wal_senders = 2 # number of slaves + 1

     

    synchronous_standby_names

    List the DB names to connect as slaves.
    You cannot connect to slaves with names that are not listed here.

    synchronous_standby_names = "db_slave1" # If you have multiple slaves, add DB names separated by commas

     

     

    Restarting the master DB

    This completes the configuration on the master DB side.
    It is possible to configure the slave side as well (the reason for this will be explained later), but this would make postgresql.conf difficult to understand, so this time we will configure it one by one.

    At this point, restart PostgreSQL for the changes to take effect

    As an aside, when I looked up the PostgreSQL startup command, I found that there are many different ways to start it, but it seems that some methods don't work due to differences in installation methods.
    The command I always use to start it is as follows:

    # Via initd /etc/init.d/postgresql-9.4 [start|stop|restart] # Via pg_ctl su postgres -c "/usr/pgsql-9.4/bin/pg_ctl [start|stop|restart] -D /var/lib/pgsql/9.4/data -m fast"

     
    Both require you to run as root, but they worked for me with PostgreSQL installed with yum


    you need to use pg_ctl if you want to start two PostgreSQL processes

     

    Work on the slave DB side

    We will now proceed with the slave side configuration.
    We will proceed under the assumption that PostgreSQL is installed on the slave DB server.
    We will also proceed assuming there are no version differences.

     

    Copy the master with pg_basebackup

    First, copy the entire master data set using pg_basebackup.
    You can copy the $PGDATA location to a location other than the default, but in that case, if you start the database via initd, you will need to change the $PGDATA path in the /etc/init.d/postgresql script.
    In this example, we will copy the data to the default location, /var/lib/pgsql/9.4/data.
    Rename the existing data directory to back it up.

    The pg_basebackup command to copy the data from the master is as follows:

    pg_basebackup -h 192.168.1.1 -U repl_user -D /var/lib/pgsql/9.4/data -P --xlog


     Depending on the size of the database, it will take a while for the copy to complete.
    If you go into the copied data directory, you will notice that pg_hba.conf and postgresql.conf have also been copied.

    As mentioned in the section about editing postgresql.conf on the master DB side, the slave side settings can also be configured on the master side because the master side's postgresql.conf is copied.
    A DB running as a master DB skips the settings required to run as a slave DB, and a DB running as a slave DB skips the settings required to run as a master DB, so I think it's a good idea.

     

    Edit postgresql.conf (slave side)

    Configure postgresql.conf for the slave

     

    hot_standby

    This is a setting item for running PostgreSQL in hot standby mode.
    A server running in hot standby mode cannot perform any update processes and becomes a read-only database.

    However, this item alone does not enable replication; the replication function will only be enabled when hot_standby is on and recovery.conf, which will be created later, exists

    hot_standby = on

     

     

    Creating recovery.conf

    There is no need to create this file from scratch; there is a sample, so you can copy it and modify it.
    The sample recovery.conf is located under /usr/pgsql-9.4/share, so copy it to /var/lib/pgsql/9.4/data.

    cp /usr/pgsql-9.4/share/recovery.conf.sample /var/lib/pgsql/9.4/data/recovery.conf

     

     

    Editing recovery.conf

    There are two places to edit in recovery.conf.
    Once this is done, the replication settings are complete!

     

    standby_mode

    By turning on standby_mode, even after the last wal file has been received and processing has been completed, the system will continue to operate while waiting for the next wal file

    standby_mode = on

     

    primary_conninfo

    Enter the information to connect to the master DB.
    Pay particular attention to whether the application_name part is included in the names listed in synchronous_standby_names in the master DB's postgresql.conf.

    Other than that, it's just as I read it

    primary_conninfo = 'host=192.168.1.1 port=5432 user=repl_user password=[password] application_name=db_slave1'

     

     

    From restarting the slave DB to checking replication

    Once the above settings are complete, you are ready to restart the slave DB.
    The restart command is the same as for the master DB.

    Once the restart is complete, replication should be taking place, so check the progress of replication from the master DB side

    Log in to the master DB and run the following query:

    select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 12345 usesysid | 12345 usename | repl_user application_name | db_slave1 client_addr | 192.168.1.2 client_hostname | client_port | 43210 backend_start | 2016-06-12 17:01:07.923879+09 backend_xmin | state | streaming sent_location | 27/B4EED5F8 write_location | 27/B4EED5F8 flush_location | 27/B4EED5F8 replay_location | 27/B4EED5C0 sync_priority | 0 sync_state | async

     
    If the state item is streaming, it means that replication is currently being performed

    The status of data synchronization can be seen in sent_location, write_location, flush_location, and replay_location

    • sent_location is the address that indicates where the WAL files have been sent from the master DB to the slave DB
    • write_location is the address that indicates how far the data has been written in the slave buffer area
    • flush_location is the address that indicates how far the data has been written to the slave's disk area
    • replay_location is the address that indicates how much data has been imported into the slave DB

    The four addresses increase with the data, so you can see the progress at a glance

    In the above example, the addresses up to sent_location, write_location, and flush_location are the same, so this indicates that the state up to this point is up to date.
    Although replay_location is slightly behind, it is currently being executed in asynchronous mode, so it is not noticeable as long as it is not significantly out of sync.

     
    That's it.

    If you found this article helpful , please give it a like!
    0
    Loading...
    0 votes, average: 0.00 / 10
    1,214
    X facebook Hatena Bookmark pocket

    The person who wrote this article

    About the author

    Yoichi Bandai

    My main job is developing web APIs for social games, but I'm also fortunate to be able to do a lot of other work, including marketing.
    Furthermore, my portrait rights in Beyond are treated as CC0 by him.