Database migration: Concepts and principles (Part 1)  |  Cloud Architecture Center  |  Google Cloud (2023)

Last reviewed 2022-10-28 UTC

This document introduces concepts, principles, terminology, and architecture of near-zero downtime database migration for cloud architects who are migrating databases to Google Cloud from on-premises or other cloud environments.

This document is part 1 of two parts. Part 2 discusses setting up and executing the migration process, including failure scenarios.

Database migration is the process of migrating data from one or more source databases to one or more target databases by using a database migration service. When a migration is finished, the dataset in the source databases resides fully, though possibly restructured, in the target databases. Clients that accessed the source databases are then switched over to the target databases, and the source databases are turned down.

The following diagram illustrates this database migration process.

[Diagram: the database migration process]

This document describes database migration from an architectural standpoint:

  • The services and technologies involved in database migration.
  • The differences between homogeneous and heterogeneous database migration.
  • The tradeoffs and selection of a migration downtime tolerance.
  • A setup architecture that supports a fallback if unforeseen errors occur during a migration.

This document does not describe how you set up a particular database migration technology. Rather, it introduces database migration in fundamental, conceptual, and principle terms.

The following diagram shows a generic database migration architecture.

[Diagram: generic database migration architecture]

A database migration service runs within Google Cloud and accesses both source and target databases. Two variants are represented: (a) shows the migration from a source database in an on-premises data center or a remote cloud to a managed database like Cloud Spanner; (b) shows a migration to a database on Compute Engine.

Even though the target databases are different in type (managed and unmanaged) and setup, the database migration architecture and configuration are the same for both cases.

Terminology

The most important data migration terms for these documents are defined as follows:

source database: A database that contains data to be migrated to one or more target databases.

target database: A database that receives data migrated from one or more source databases.

database migration: A migration of data from source databases to target databases with the goal of turning down the source database systems after the migration completes. The entire dataset, or a subset, is migrated.

homogeneous migration: A migration from source databases to target databases where the source and target databases are of the same database management system from the same provider.

heterogeneous migration: A migration from source databases to target databases where the source and target databases are of different database management systems from different providers.

database migration system: A software system or service that connects to source databases and target databases and performs data migrations from source to target databases.

data migration process: A configured or implemented process executed by the database migration system to transfer data from source to target databases, possibly transforming the data during the transfer.

database replication: A continuous transfer of data from source databases to target databases without the goal of turning down the source databases. Database replication (sometimes called database streaming) is an ongoing process.

Classification of database migrations

There are different types of database migrations that belong to different classes. This section describes the criteria that define those classes.

Replication versus migration

In a database migration, you move data from source databases to target databases. After the data is completely migrated, you delete source databases and redirect client access to the target databases. Sometimes you keep the source databases as a fallback measure if you encounter unforeseen issues with the target databases. However, after the target databases are reliably operating, you eventually delete the source databases.

With database replication, in contrast, you continuously transfer data from the source databases to the target databases without deleting the source databases. Sometimes database replication is referred to as database streaming. While there is a defined starting time, there is typically no defined completion time. The replication might be stopped or become a migration.

This document discusses only database migration.

Partial versus complete migration

Database migration is understood to be a complete and consistent transfer of data. You define the initial dataset to be transferred as either a complete database or a partial database (a subset of the data in a database) plus every change committed on the source database system thereafter.

Heterogeneous migration versus homogeneous migration

A homogeneous database migration is a migration between the source and target databases of the same database technology, for example, migrating from a MySQL database to a MySQL database, or from an Oracle® database to an Oracle database. Homogeneous migrations also include migrations from a self-hosted database system such as PostgreSQL to a managed version of it such as Cloud SQL (a PostgreSQL variant).

In a homogeneous database migration, the schemas for the source and target databases are likely identical. If the schemas are different, the data from the source databases must be transformed during migration.

Heterogeneous database migration is a migration between source and target databases of different database technologies, for example, from an Oracle database to Spanner. Heterogeneous database migration can be between the same data models (for example, from relational to relational), or between different data models (for example, from relational to key-value).

Migrating between different database technologies doesn't necessarily involve different data models. For example, Oracle, MySQL, PostgreSQL, and Spanner all support the relational data model. However, multi-model databases like Oracle, MySQL, or PostgreSQL support different data models. Data stored as JSON documents in a multi-model database can be migrated to MongoDB with little or no transformation necessary, as the data model is the same in the source and the target database.

Although the distinction between homogeneous and heterogeneous migration is based on database technologies, an alternative categorization is based on database models involved. For example, a migration from an Oracle database to Spanner is homogeneous when both use the relational data model; a migration is heterogeneous if, for example, data stored as JSON objects in Oracle is migrated to a relational model in Spanner.

Categorizing migrations by data model more accurately expresses the complexity and effort required to migrate the data than basing the categorization on the database system involved. However, because the commonly used categorization in the industry is based on the database systems involved, the remaining sections are based on that distinction.

Migration downtime: zero versus minimal versus significant

After you successfully migrate a dataset from the source to the target database, you then switch client access over to the target database and delete the source database.

Switching clients from the source databases to the target databases involves several processes:

  • To continue processing, clients must close existing connections to the source databases and create new connections to the target databases. Ideally, closing connections is graceful, meaning that you don't unnecessarily roll back ongoing transactions.
  • After closing connections on the source databases, you must migrate remaining changes from the source databases to the target databases (called draining) to ensure that all changes are captured.
  • You might need to test target databases to ensure that these databases are functional and that clients are functional and operate within their defined service level objectives (SLOs).

In a migration, achieving truly zero downtime for clients is impossible; there are times when clients cannot process requests. However, you can minimize the duration that clients are unable to process requests in several ways (near-zero downtime):

  • You can start your test clients in read-only mode against the target databases long before you switch the clients over. With this approach, testing is concurrent with the migration.
  • You can configure the amount of data being migrated (that is, in flight between the source and target databases) to be as small as possible when the switch over period approaches. This step reduces the time for draining because there are fewer differences between the source databases and the target databases.
  • If new clients operating on the target databases can be started concurrently with existing clients operating on the source databases, you can shorten the switch over time because the new clients are ready to execute as soon as all data is drained.

While it's unrealistic to achieve zero downtime during a switch over, you can minimize the downtime by starting activities concurrently with the ongoing data migration when possible.

In some database migration scenarios, significant downtime is acceptable. Typically, this allowance is a result of business requirements. In such cases, you can simplify your approach. For example, with a homogeneous database migration, you might not require data modification; export/import or backup/restore are well-suited approaches. With heterogeneous migrations, the database migration system does not have to deal with updates of source database systems during the migration.

However, you need to establish that the acceptable downtime is long enough for the database migration and follow-up testing to occur. If this downtime cannot be clearly established or is unacceptably long, you need to plan a migration that involves minimal downtime.

Database migration cardinality

In many situations, database migration takes place between a single source database and a single target database. In such situations, the cardinality is 1:1 (direct mapping). That is, a source database is migrated without changes to a target database.

A direct mapping, however, is not the only possibility. Other cardinalities include the following:

  • Consolidation (n:1). In a consolidation, you migrate data from several source databases to a smaller number of target databases (or even one target). You might use this approach to simplify database management or employ a target database that can scale.
  • Distribution (1:n). In a distribution, you migrate data from one source database to several target databases. For example, you might use this approach when you need to migrate a large centralized database containing regional data to several regional target databases.
  • Re-distribution (n:m). In a re-distribution, you migrate data from several source databases to several target databases. You might use this approach when you have sharded source databases with shards of very different sizes. The re-distribution evenly distributes the sharded data over several target databases that represent the shards.

Database migration provides an opportunity to redesign and implement your database architecture in addition to merely migrating data.

Migration consistency

The expectation is that a database migration is consistent. In the context of migration, consistent means the following:

  • Complete. All data that is specified to be migrated is actually migrated. The specified data could be all data in a source database or a subset of the data.
  • Duplicate free. Each piece of data is migrated once, and only once. No duplicate data is introduced into the target database.
  • Ordered. The data changes in the source database are applied to the target database in the same order as the changes occurred in the source database. This aspect is essential to ensure data consistency.
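
The three properties can be checked mechanically when each change carries a position in the source order. The following Python sketch assumes a hypothetical per-change sequence number that increases in source commit order; the function name and its inputs are illustrative and do not belong to any particular migration product.

```python
def check_consistency(source_seqs, applied_seqs):
    """Check completeness, duplicate freedom, and ordering.

    source_seqs:  sequence numbers of all changes selected for migration
                  (hypothetical numbering, increasing in source commit order).
    applied_seqs: sequence numbers in the order they were applied to the target.
    """
    # Complete: every selected change shows up in the target.
    complete = set(source_seqs) <= set(applied_seqs)
    # Duplicate free: no change was applied more than once.
    duplicate_free = len(applied_seqs) == len(set(applied_seqs))
    # Ordered: changes were applied in non-decreasing source order.
    ordered = all(a <= b for a, b in zip(applied_seqs, applied_seqs[1:]))
    return {"complete": complete, "duplicate_free": duplicate_free, "ordered": ordered}
```

For example, applying changes 1, 3, 2 would be reported as complete and duplicate free but not ordered.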

An alternative way to describe migration consistency is that after a migration completes, the data state between the source and the target databases is equivalent. For example, in a homogeneous migration that involves the direct mapping of a relational database, the same tables and rows must exist in the source and the target databases.

This alternative way of describing migration consistency is important because not all data migrations are based on sequentially applying transactions in the source database to the target database. For example, you might back up the source database and use the backup to restore the source database content into the target database (when significant downtime is possible).

Active-passive versus active-active migration

An important distinction is whether the source and target databases are both open to modifying query processing. In an active-passive database migration, the source databases can be modified during the migration, while the target databases allow only read-only access.

An active-active migration supports clients writing into both the source as well as the target databases during the migration. In this type of migration, conflicts can occur. For instance, if the same data item in the source and target database is modified so as to conflict with each other semantically, you might need to run conflict resolution rules to resolve the conflict.

In an active-active migration, you must be able to resolve all data conflicts by using conflict resolution rules. If you cannot, you might experience data inconsistency.
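
As an illustration of what a conflict resolution rule can look like, the following Python sketch implements a last-writer-wins rule. This rule is only one possible choice and is an assumption of this example, not something this document prescribes; the `Version` type and its fields are hypothetical. A timestamp-based rule picks a winner deterministically, but it does not by itself guarantee that the result is semantically correct for the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    updated_at: float  # hypothetical modification time, in seconds
    origin: str        # "source" or "target"


def resolve(source_version, target_version):
    """Last-writer-wins: the more recent modification is kept.

    Ties go to the source version so that the outcome is deterministic.
    """
    if target_version.updated_at > source_version.updated_at:
        return target_version
    return source_version
```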

Database migration architecture

A database migration architecture describes the various components required for executing a database migration. This section introduces a generic deployment architecture and treats the database migration system as a separate component. It also discusses the features of a database management system that support data migration as well as non-functional properties that are important for many use cases.

Deployment architecture

A database migration can occur between source and target databases located in any environment, like on-premises or different clouds. Each source and target database can be in a different environment; it is not necessary that all are collocated in the same environment.

The following diagram shows an example of a deployment architecture involving several environments.

[Diagram: example deployment architecture involving several environments]

DB1 and DB2 are two source databases, and DB3 and Spanner are the target databases. Two clouds and two on-premises data centers are involved in this database migration. The arrows represent the invocation relationships: the database migration service invokes interfaces of all source and target databases.

A special case not discussed here is the migration of data from a database into the same database. This special case uses the database migration system for data transformation only, not for migrating data between different systems across different environments.

Fundamentally, there are three approaches to database migration, which this section discusses:

  • Using a database migration system
  • Using database management system replication functionality
  • Using custom database migration functionality

Database migration system

The database migration system is at the core of database migration. The system executes the actual data extraction from the source databases, transports the data to the target databases, and optionally modifies the data during transit. This section discusses the basic database migration system functionality in general. Examples of database migration systems include Striim, tcVision, and Cloud Data Fusion.

Data migration process

The core technical building block of a database migration system is the data migration process. The data migration process is specified by a developer and defines the source databases from which data is extracted, the target databases into which data is migrated, and any data modification logic applied to the data during the migration.

You can specify one or more data migration processes and execute them sequentially or concurrently depending on the needs of the migration. For example, if you migrate independent databases, the corresponding data migration processes can run in parallel.

Data extraction and insertion

You can detect changes (insertions, updates, deletions) in a database system in two ways: database-supported change data capture (CDC) based on a transaction log, and differential querying of data itself using the query interface of a database management system.

CDC based on a transaction log

Database-supported CDC is based on database management features that are separate from the query interface. One approach is based on transaction logs (for example, the binary log in MySQL). A transaction log contains the changes made to data in the correct order. The transaction log is continuously read, and so every change can be observed. For database migration, this logging is extremely useful, as CDC ensures that each change is visible and is subsequently migrated to the target database without loss and in the correct order.

CDC is the preferred approach for capturing changes in a database management system. CDC is built into the database itself and has the least load impact on the system.
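
The following Python sketch shows the idea of applying an ordered change log to a target. It does not read a real transaction log: the entry layout `(position, operation, key, value)` and the dictionary standing in for the target database are assumptions made for illustration only. Real logs, such as the MySQL binary log, have their own formats and are read through database-specific tooling.

```python
def apply_log(log, target, start_after=0):
    """Apply log entries with a position greater than start_after, in log order.

    log:    iterable of (position, op, key, value) tuples (hypothetical format).
    target: dict standing in for the target database.
    Returns the position of the last entry applied.
    """
    last = start_after
    # Sorting is defensive; a transaction log is already in commit order.
    for position, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if position <= start_after:
            continue  # already migrated earlier
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = value
        last = position
    return last
```

Because every entry is applied exactly once and in position order, an insert followed by an update and a delete of the same item leaves the target in the same state as the source.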

Differential querying

If no database management system feature exists that supports observing all changes in the correct order, you can use differential querying as an alternative. In this approach, each data item in a database gets an additional attribute that contains a timestamp or a sequence number. Every time the data item is changed, the change timestamp is updated or the sequence number is increased. A polling algorithm reads all data items changed since the last time it executed or since the last sequence number it used. Once the polling algorithm determines the changes, it records the current time or sequence number into its internal state and then passes on the changes to the target database.

While this approach works without problems for inserts and updates, you need to carefully design deletes because a delete removes a data item from the database. After the data is deleted, it is impossible for the poller to detect that a deletion occurred. You implement a deletion by using an additional status field (a logical delete flag) that indicates the data is deleted. Alternatively, deleted data items can be collected into one or more tables, and the poller accesses those tables to determine if deletion occurred.

For variants on differential querying, see Change data capture.

Differential querying is the least preferred approach because it involves schema and functionality changes. Querying the database also adds a query load that does not relate to executing client logic.

Adapter and agent

The database migration system requires access to the source and target database systems. Adapters are the abstraction that encapsulates the access functionality. In the simplest form, an adapter can be a JDBC driver for inserting data into a target database that supports JDBC. In a more complex case, an adapter runs in the environment of the target (sometimes called an agent), accessing a built-in database interface like log files. In an even more complex case, an adapter or agent interfaces with yet another software system, which in turn accesses the database. For example, an agent accesses Oracle GoldenGate, and that in turn accesses an Oracle database.

The adapter or agent that accesses a source database implements the CDC interface or the differential querying interface, depending on the design of the database system. In both cases, the adapter or agent provides changes to the database migration system, and the database migration system is unaware if the changes were captured by CDC or differential querying.

Data modification

In some use cases, data is migrated from source databases to target databases unmodified. These straight-through migrations are typically homogeneous.

Many use cases, however, require data to be modified during the migration process. Typically, modification is required when there are differences in schema, differences in data values, or opportunities to clean up data while it is in transition.

The following sections discuss several types of modifications that can be required in a data migration: data transformation, data enrichment or correlation, and data reduction or filtering.

Data transformation

Data transformation transforms some or all data values from the source database. Some examples include the following:

  • Data type transformation. Sometimes data types between the source and target databases are not equivalent. In these cases, data type transformation casts the source value into the target value based on type transformation rules. For example, a timestamp type from the source might be transformed into a string in the target.
  • Data structure transformation. Data structure transformation modifies the structure in the same database model or between different database models. For example, in a relational system, one source table might be split into two target tables, or several source tables might be denormalized into one target table by using a join. A 1:n relationship in the source database might be transformed into a parent/child relationship in Spanner. Documents from a source document database system might be decomposed into a set of relational rows in a target system.
  • Data value transformation. Data value transformation is separate from data type transformation. Data value transformation changes the value without changing the data type. For example, a local time zone is converted to Coordinated Universal Time (UTC). Or a short zip code (five digits) represented as a string is converted to a long zip code (five digits followed by a dash followed by 4 digits, also known as ZIP+4).
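
The three kinds of transformation can be sketched in a few lines of Python. The function names and the row layout (`id`, `name`, `zip`) are hypothetical and chosen only to mirror the examples in the list above.

```python
from datetime import datetime, timedelta, timezone


def timestamp_to_string(ts):
    """Data type transformation: a timestamp in the source, a string in the target."""
    return ts.isoformat()


def to_utc(ts):
    """Data value transformation: the type stays a timestamp, the value moves to UTC."""
    return ts.astimezone(timezone.utc)


def split_customer_row(row):
    """Data structure transformation: one source row becomes two target rows."""
    customer = {"id": row["id"], "name": row["name"]}
    address = {"customer_id": row["id"], "zip": row["zip"]}
    return customer, address


# A noon timestamp in a UTC-7 zone, normalized to UTC and then cast to a string.
local = datetime(2022, 10, 28, 12, 0, tzinfo=timezone(timedelta(hours=-7)))
print(timestamp_to_string(to_utc(local)))  # 2022-10-28T19:00:00+00:00
```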

Data enrichment and correlation

Data transformation is applied on the existing data without reference to additional, related reference data. With data enrichment, additional data is queried to enrich source data before it's stored in the target database.

  • Data correlation. It is possible to correlate source data. For example, you can combine data from two tables in two source databases. In one target database, for instance, you might relate a customer to all open, fulfilled, and canceled orders whereby the customer data and the order data originate from two different source databases.
  • Data enrichment. Data enrichment adds reference data. For example, you might enrich records that only contain a zip code by adding the city name corresponding to the zip code. A reference table containing zip codes and the corresponding city names is a static dataset accessed for this use case. Reference data can be dynamic as well. For example, you might use a list of all known customers as reference data.
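
As a rough illustration, the following Python sketch enriches a record from a static reference table and correlates customers with orders that come from a second source. The reference table contents, the record fields, and the function names are assumptions of this example.

```python
# Hypothetical static reference data: zip code -> city name.
ZIP_TO_CITY = {"94043": "Mountain View", "10001": "New York"}


def enrich_with_city(record, reference=ZIP_TO_CITY):
    """Data enrichment: add the city name that corresponds to the zip code."""
    enriched = dict(record)
    enriched["city"] = reference.get(record["zip"])  # None if the zip code is unknown
    return enriched


def correlate(customers, orders):
    """Data correlation: attach orders from one source to customers from another."""
    by_customer = {}
    for order in orders:
        by_customer.setdefault(order["customer_id"], []).append(order)
    return [dict(c, orders=by_customer.get(c["id"], [])) for c in customers]
```

Orders whose customer is not part of the migrated customer set are simply not attached to anything in this sketch; a real migration would need an explicit rule for such orphans.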

Data reduction and filtering

Another type of data transformation is reducing or filtering the source data before migrating it to a target database.

  • Data reduction. Data reduction removes attributes from a data item. For example, if a zip code is present in a data item, the corresponding city name might not be required and is dropped, because it can be recalculated or because it is not needed anymore. Sometimes this information is kept for historical reasons to record the name of the city as entered by the user, even if the city name changes in time.
  • Data filtering. Data filtering removes a data item altogether. For example, all canceled orders might be removed and not transferred to the target database.
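
The difference between the two operations is easy to see in code: reduction drops attributes, filtering drops whole items. The following Python sketch uses hypothetical field names (`city`, `status`) that mirror the examples above.

```python
def reduce_item(item, drop=("city",)):
    """Data reduction: remove selected attributes from a data item."""
    return {key: value for key, value in item.items() if key not in drop}


def keep_item(order):
    """Data filtering rule: canceled orders are not migrated."""
    return order.get("status") != "canceled"


def reduce_and_filter(orders):
    """Filter out unwanted items, then reduce the items that remain."""
    return [reduce_item(order) for order in orders if keep_item(order)]
```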

Data combination or recombination

If data is migrated from different source databases to different target databases, it can be necessary to combine data differently between source and target databases.

Suppose that customers and orders are stored in two different source databases. One source database contains all orders, and a second source database contains all customers. After migration, customers and their orders are stored in a 1:n relationship within a single target database schema. The data is not stored in a single target database, however, but in several target databases, where each contains a partition of the data. Each target database represents a region and contains all customers and their orders located in that region.

Target database addressing

Unless there is only one target database, each data item that is migrated needs to be sent to the correct target database. A couple of approaches to addressing the target database include the following:

  • Schema-based addressing. Schema-based addressing determines the target database based on the schema. For example, all data items of a customer collection or all rows of a customer table are migrated to the same target database storing customer information, even though this information was distributed in several source databases.
  • Content-based routing. Content-based routing (using a content-based router, for example) determines the target database based on data values. For example, all customers located in the Latin America region are migrated to a specific target database that represents that region.

You can use both types of addressing at the same time in a database migration. Regardless of the addressing type used, the target database must have the correct schema in place so that data items are stored.
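
The following Python sketch combines the two addressing types in one routing function. The routing tables, database names, and item fields (`table`, `region`) are hypothetical; giving content-based routing precedence over schema-based addressing is a design choice of this example, not a rule from this document.

```python
# Hypothetical routing tables; the database names are placeholders.
SCHEMA_ROUTES = {"customer": "customer_db", "order": "order_db"}
REGION_ROUTES = {"LATAM": "latam_db", "EMEA": "emea_db"}


def route(item):
    """Pick a target database for a data item.

    Content-based routing applies when the item carries a known region;
    otherwise the item is addressed by its schema (table name).
    """
    region = item.get("region")
    if region in REGION_ROUTES:
        return REGION_ROUTES[region]
    return SCHEMA_ROUTES[item["table"]]
```

An item whose table has no schema route raises a `KeyError` here, which surfaces a missing routing rule instead of silently dropping the item.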

Persistence of in-transit data

Database migration systems, or the environments on which they run, can fail during a migration, and in-transit data can be lost. When failures occur, you need to restart the database migration system and ensure that the data stored in the source database is consistently and completely migrated to the target databases.

As part of the recovery, the database migration system needs to identify the last successfully migrated data item to determine where to begin extracting from the source databases. To resume at the point of failure, the system needs to keep an internal state on the migration progress.

You can maintain state in several ways:

  • You can store all extracted data items within the database migration system before any database modification, and then remove the data item once its modified version is successfully stored in the target database. This approach ensures that the database migration system can exactly determine what is extracted and stored.
  • You can maintain a list of references to the data items in transit. One possibility is to store the primary keys or other unique identifiers of each data item together with a status attribute. After a failure, this state is the basis for recovering the system consistently.
  • You can query the source and target databases after a failure to determine the difference between the source and target database systems. The next data item to be extracted is determined based on the difference.

Other approaches to maintaining state can depend on the specific source databases. For example, a database migration system can keep track of which transaction log entries are fetched from the source database and which are inserted into the target database. If a failure occurs, the migration can be restarted from the last successfully inserted entry.
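
A minimal Python sketch of the log-position approach follows. It persists the position of the last inserted entry in a small JSON checkpoint file and resumes from there after a failure. The file layout, the entry format `(position, key, value)`, the `fail_at` parameter used to simulate a crash, and the dictionary standing in for the target database are all assumptions of this example.

```python
import json
import os


def load_checkpoint(path):
    """Return the last persisted log position, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["last_position"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, position):
    """Persist the position; write to a temp file first, then rename over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_position": position}, f)
    os.replace(tmp, path)


def migrate(entries, target, checkpoint_path, fail_at=None):
    """Apply entries after the checkpoint; fail_at simulates a crash for testing."""
    last = load_checkpoint(checkpoint_path)
    for position, key, value in entries:
        if position <= last:
            continue  # inserted before the failure
        if position == fail_at:
            raise RuntimeError("simulated crash")
        target[key] = value  # idempotent upsert into the target
        save_checkpoint(checkpoint_path, position)
    return load_checkpoint(checkpoint_path)
```

If the process stops after the target write but before the checkpoint is saved, the entry is applied again on restart. The sketch tolerates this because the write is an idempotent upsert; a non-idempotent write would need the target write and the checkpoint update to be committed together.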

Persistence of in-transit data is also important for other reasons than errors or failures. For example, it might not be possible to query data from the source database to determine its state. If, for instance, the source database contained a queue, the messages in that queue might have been removed at some point.

Yet another use case for persistence of in-transit data is large window processing of the data. During data modification, data items can be transformed independently of each other. However, sometimes the data modification depends on several data items (for example, numbering the data items processed per day, starting at zero every day).

A final use case for persistence of in-transit data is to provide repeatability of the data during data modification when the database system cannot access the source databases again. For example, you might need to re-execute the data modifications with different modification rules and then verify and compare the results with the initial data modifications. This approach might be necessary if you need to track any inconsistencies in the target database because of an incorrect data modification.

Completeness and consistency verification

You need to verify that your database migration is complete and consistent. This check ensures that each data item is migrated only once, and that the datasets in the source and target databases are identical and that the migration is complete.

Depending on the data modification rules, it is possible that a data item is extracted but not inserted into a target database. For this reason, directly comparing the source and target databases is not a solid approach for verifying completeness and consistency. However, if the database migration system tracks the items that are filtered out, you can then compare the source and target databases along with the filtered items.
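
The comparison that accounts for filtered items can be sketched as a set computation over item keys. In the following Python example, the function name and the three input collections are hypothetical, and only keys are compared; verifying values (for example, with per-row checksums) would be an additional step.

```python
from collections import Counter


def verify_migration(source_keys, target_keys, filtered_keys):
    """Compare source and target by key, accounting for filtered-out items."""
    expected = set(source_keys) - set(filtered_keys)
    counts = Counter(target_keys)
    migrated = set(counts)
    return {
        # Expected in the target but not found there.
        "missing": sorted(expected - migrated),
        # Found in the target although not expected (for example, a filtered item).
        "unexpected": sorted(migrated - expected),
        # Found in the target more than once.
        "duplicated": sorted(key for key, n in counts.items() if n > 1),
    }
```

A migration passes this check when all three lists are empty.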

Replication functionality of the database management system

A special use case in a homogeneous migration is where the target database is a copy of the source database. Specifically, the schemas in the source and target databases are the same, the data values are the same, and each source database is a direct mapping (1:1) to a target database.

In this case, you can use functionality within the database management system to replicate one database to another. Replication only creates an exact copy; it does not perform data modification. Examples are MySQL replication, PostgreSQL replication (see also pglogical), or Microsoft SQL Server replication.

However, if data modification is required, or you have any cardinality other than a direct mapping, a database migration system's functionality is needed to address such a use case.

Custom database migration functionality

Some reasons for building database migration functionality instead of using a database migration system or database management system functionality include the following:

  • You need full control over every detail.
  • You want to reuse functionality.
  • You want to reduce costs or simplify your technological footprint.

Building blocks for building migration functionality include the following:

  • Export/import. If downtime is not a factor, you can use database export and database import to migrate data in homogeneous database migrations. Export/import, however, requires that you quiesce the source database to prevent updates before you export the data. Otherwise, changes might not be captured in the export, and the target database will not be an exact copy of the source database.
  • Backup/restore. Like in the case of export/import, backup/restore incurs downtime because you need to quiesce the source database so that the backup contains all data and the latest changes. The downtime continues until the restore is completed successfully on the target database.
  • Differential querying. If changing the database schema is an option, you can extend the schema so that database changes can be queried at the query interface. An additional timestamp attribute is added, indicating the time of the last change. An additional delete flag can be added, indicating if the data item is deleted or not (logical delete). With these two changes, a poller executing in a regular interval can query all changes since its last execution. The changes are applied to the target database. Additional approaches are discussed in Change data capture.

These are only a few of the possible options to build a custom database migration. Although a custom solution provides the most flexibility and control over implementation, it also requires constant maintenance to address bugs, scalability limitations, and other issues that might arise during a database migration.

Additional considerations of database migration

The following sections briefly discuss non-functional aspects that are important in the context of database migration. These aspects include error handling, scalability, high availability, and disaster recovery.

Error handling

Failures during database migration must not cause data loss or the processing of database changes out of order. Data integrity must be preserved regardless of what caused the failure (such as a bug in the system, a network interruption, a VM crash, or a zone failure).

A data loss occurs when a migration system retrieves the data from the source databases and does not store it in the target databases because of some error. When data is lost, the target databases do not match the source databases and are thus inconsistent and incomplete. The completeness and consistency verification functionality flags this state (see Completeness and consistency verification).
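As a rough illustration of how such a verification might flag lost data, the following sketch compares a row count and a content digest per table. It assumes a homogeneous migration where source and target tables have the same columns; the function names are hypothetical, and SQLite stands in for the real databases.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, digest) for a table, reading rows in key order."""
    digest = hashlib.sha256()
    count = 0
    # Table and column names are interpolated into the SQL text,
    # so they must come from trusted configuration, not user input.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def tables_match(source, target, table, key_column):
    """True if both databases hold the same rows for the given table."""
    return table_fingerprint(source, table, key_column) == table_fingerprint(
        target, table, key_column
    )
```

A full scan like this is only practical for small tables or while the source is quiesced; for large or changing datasets, a real verification would more likely work on key ranges or samples.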

Scalability

In a database migration, time-to-migrate is an important metric. In a zero downtime migration (in the sense of minimal downtime), the migration of the data occurs while the source databases continue to change. To migrate in a reasonable timeframe, the rate of data transfer must be significantly faster than the rate of updates of the source database systems, especially when the source database system is large. The higher the transfer rate, the faster the database migration can be completed.
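The relationship between transfer rate and update rate can be made concrete with a back-of-the-envelope estimate. The sketch below uses a deliberately simple model that is an assumption of this example, not a formula from this document: both rates are constant and expressed in the same unit, and every change adds the same amount of data to transfer as it adds to the source.

```python
def estimate_catch_up_seconds(initial_bytes, transfer_rate, change_rate):
    """Estimate the seconds until the target has caught up with the source.

    initial_bytes: data volume in the source when the migration starts.
    transfer_rate: bytes per second the migration system can move.
    change_rate:   bytes per second of new changes in the source.
    """
    if transfer_rate <= change_rate:
        # The backlog never shrinks, so the migration never catches up.
        return float("inf")
    # The backlog shrinks at the difference between the two rates.
    return initial_bytes / (transfer_rate - change_rate)
```

Under this model, moving 1000 units at 100 units per second takes 10 seconds against a quiesced source, but 20 seconds when the source changes at 50 units per second, and it never finishes once the update rate reaches the transfer rate.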

When the source database systems are quiesced and are not being modified, the migration might be faster because there are no changes to incorporate. In a homogeneous database migration, the time-to-migrate might be quite fast because you can use backup/restore or export/import functionality, and the transfer of files scales.

High availability and disaster recovery

In general, source and target databases are configured for high availability. A primary database has a corresponding read replica that is promoted to be the primary database when a failure occurs.

When a zone fails, the source or target databases fail over to a different zone to be continuously available. If a zone failure occurs during a database migration, the migration system itself is impacted because several of the source or target databases it accesses become inaccessible. The migration system must reconnect to the newly promoted primary databases that are running after a failure. Once the database migration system is reconnected, it must recover the migration itself to ensure the completeness and consistency of the data in the target databases. The migration system must determine the last consistent transfer to establish where to resume.

If the database migration system itself fails (for example, the zone it runs in becomes inaccessible), then it must be recovered. One recovery approach is a cold restart. In this approach, the database migration system is installed in an operational zone and restarted. The biggest issue to address is that the migration system must be able to determine the last consistent data transfer before the failure and continue from that point to ensure data completeness and consistency in the target databases.
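One way to make the last consistent data transfer recoverable after a cold restart is to record a checkpoint in the target database in the same transaction as the changes it covers. The following sketch illustrates that idea under stated assumptions: the `items` and `migration_checkpoint` tables, the integer change positions, and the in-memory change log are hypothetical, and SQLite stands in for the target database.

```python
import sqlite3


def init_target(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value TEXT)"
    )
    # Single-row table that stores the position of the last applied change.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migration_checkpoint ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), position INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO migration_checkpoint (id, position) VALUES (1, 0)"
    )
    conn.commit()


def last_position(conn):
    row = conn.execute(
        "SELECT position FROM migration_checkpoint WHERE id = 1"
    ).fetchone()
    return row[0]


def apply_batch(conn, batch):
    """Apply (position, id, value) changes and the checkpoint atomically."""
    if not batch:
        return
    with conn:  # commits on success, rolls back if any statement fails
        for position, item_id, value in batch:
            conn.execute(
                "INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)",
                (item_id, value),
            )
        conn.execute(
            "UPDATE migration_checkpoint SET position = ? WHERE id = 1",
            (batch[-1][0],),
        )


def resume(conn, change_log, batch_size=2):
    """Continue from the recorded checkpoint; return the final position."""
    start = last_position(conn)
    pending = [change for change in change_log if change[0] > start]
    for i in range(0, len(pending), batch_size):
        apply_batch(conn, pending[i:i + batch_size])
    return last_position(conn)
```

Because the data and the checkpoint commit together, a restarted migration system that calls `resume` neither skips changes nor has to guess what was applied before the failure.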

If the database migration system is enabled for high availability, it can fail over and continue processing afterwards. If limited downtime of the database migration system is important, you need to select a database and implement high availability.

In terms of recovering the database migration, disaster recovery is very similar to high availability. Instead of reconnecting to newly promoted primary databases in a different zone, the database migration system must reconnect to databases in a different region (a failover region). The same holds true for the database migration system itself. If the region where the database migration system runs becomes inaccessible, the database migration system must fail over to a different region and continue from the last consistent data transfer.

Pitfalls

Several pitfalls can cause inconsistent data in the target databases. Some common ones to avoid are the following:

  • Order violation. If scalability of the migration system is achieved by scaling out, then several data transfer processes are running concurrently (in parallel). Changes in a source database system are ordered according to committed transactions. If changes are picked up from the transaction log, the order must be maintained throughout the migration. Parallel data transfer can change the order because of varying speed between the underlying processes. It is necessary to ensure that the data is inserted into the target databases in the same order as it is received from the source databases.
  • Consistency violation. With differential queries, the source databases have additional data attributes that contain, for example, commit timestamps. The target databases will not have commit timestamps because the commit timestamps are only put in place to establish change management in the source databases. Inserts into the target databases must be timestamp consistent, which means that all changes with the same timestamp must be in the same insert, update, or upsert transaction. Otherwise, the target database might temporarily be in an inconsistent state if some changes are inserted and others with the same timestamp are not. This temporary inconsistent state does not matter if the target databases are not accessed for processing. However, if they are used for testing, consistency is paramount. Another aspect is the creation of the timestamp values in the source database and how they relate to the commit time of the transaction in which they are set. Because of transaction commit dependencies, a transaction with an earlier timestamp might become visible after a transaction with a later timestamp. If the differential query is executed between the two transactions, it won't see the transaction with the earlier timestamp, resulting in an inconsistency on the target database.
  • Missing or duplicate data. When a failover occurs, a careful recovery is required if some data is not replicated between the primary and the failover replica. For example, a source database fails over and not all data is replicated to the failover replica. At the same time, the data is already migrated to the target database before the failure. After failover, the newly promoted primary database is behind the target database in terms of data changes (called flashback). A migration system needs to recognize this situation and recover from it in such a way that the target database and the source database get back into a consistent state.
  • Local transactions. To have the source and target database receive the same changes, a common approach is to have clients write to both the source and target databases instead of using a data migration system. This approach has several pitfalls. One pitfall is that two database writes are two separate transactions; you might encounter a failure after the first finishes and before the second finishes. This scenario causes inconsistent data from which you must recover. Also, there are several clients in general, and they are not coordinated. The clients do not know the source database transaction commit order and therefore cannot write to the target databases implementing that transaction order. The clients might change the order, which can lead to data inconsistency. Unless all access goes through coordinated clients, and all clients ensure the target transaction order, this approach can lead to an inconsistent state with the target database.
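The first two pitfalls can be illustrated with a small sketch that restores commit order and groups changes that share a timestamp, so that each group can be written in a single target transaction. The tuple layout `(commit_ts, key, value)` and the function name are assumptions made for this example. The sketch also assumes that all changes up to the highest timestamp in the input have already arrived from the parallel transfer processes.

```python
from itertools import groupby
from operator import itemgetter


def timestamp_batches(changes):
    """Yield (commit_ts, changes) groups in ascending commit order.

    changes: iterable of (commit_ts, key, value) tuples, possibly out of
    order because they were collected by parallel transfer processes.
    """
    # sorted() is stable, so changes that share a timestamp keep
    # their arrival order within the group.
    ordered = sorted(changes, key=itemgetter(0))
    for commit_ts, group in groupby(ordered, key=itemgetter(0)):
        yield commit_ts, list(group)
```

For example, changes that arrive as timestamps 2, 1, 2 are emitted as one group for timestamp 1 followed by one group holding both changes for timestamp 2. Applying each group in its own transaction keeps the target timestamp consistent, and applying the groups in the yielded order preserves the source commit order.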

In general, there are other pitfalls to watch out for. The best way to find problems that might lead to data inconsistency is to do a complete failure analysis that iterates through all possible failure scenarios. If concurrency is implemented in the database migration system, all possible data migration process execution orders must be examined to ensure that data consistency is preserved. If high availability or disaster recovery (or both) is implemented, all possible failure combinations must be examined.

What's next

  • Read Database migrations: Concepts and principles (Part 2).
  • Read about database migration in the following documents:
    • Migrating from PostgreSQL to Spanner
    • Migrating from an Oracle® OLTP system to Spanner
    • Migrating a MySQL cluster to Compute Engine using HAProxy
  • See Database migration for more database migration guides.
  • Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.
