MySQL is the most popular open source database in the world, and for good reason. It’s powerful, flexible, and extremely reliable. Tens of thousands of companies use MySQL to power their web-based applications and services every day.

But when it comes to data analytics, it’s a different story. MySQL is quickly bogged down by even the smallest analytical queries, putting your entire application at risk of crashing. As one FlyData customer said to us, “I have nightmares about our MySQL production database going down.” That’s why so many companies turn to Amazon Redshift to complement MySQL. Redshift is built to handle petabytes of data and deliver analytics in a fraction of the time. When you pair MySQL and Redshift, you eliminate the risk of crashing your production database while running analytical queries.

Because of MySQL’s inherent weaknesses in analytics, many companies that use MySQL to power their web applications replicate their data to Redshift. There are four ways to accomplish this. The simplest is to export your entire MySQL dataset and import it into Redshift, although this is also the least efficient method. Import and export involves three steps: export, transform, and import.

To start, export your data using MySQL’s mysqldump command; a typical invocation is sketched below. The output of this command is a SQL file containing the statements needed to recreate your data. You cannot run that SQL on Redshift as is; you have to transform it into a format suitable for Redshift import.

For the best upload performance, convert the dump into TSV (tab-separated values) files, in which the values of each row are separated by a tab character (\t), and upload those files to Amazon S3. The Redshift COPY command can then batch load the files from S3 into a Redshift table.

You may also have to convert data values to be Redshift compatible, because MySQL and Redshift support different column and data types. For example, the DATE value ‘0000-00-00’ is valid in MySQL but will throw an error in Redshift; you have to convert it into an acceptable Redshift value, such as ‘0001-01-01’.

After you’ve transformed your data, the last step is to import it from S3 into Redshift by running the COPY command. Sketches of all three steps follow.
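First, a minimal sketch of the export step; the host, user, database name, table name, and output file here are hypothetical placeholders:

    mysqldump -h your-mysql-host -u your_user -p your_database users --result-file=dump.sql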
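Next, the transform step. For illustration, assume a hypothetical users table with id, first_name, last_name, and age columns. A row in the dump and its TSV equivalent might look like this:

    -- a row in dump.sql
    INSERT INTO `users` (`id`, `first_name`, `last_name`, `age`) VALUES (1923, 'John', 'Smith', 34);

    -- the same row as TSV, with fields separated by \t
    1923	John	Smith	34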
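Finally, a sketch of the import step, assuming the TSV files were uploaded under a hypothetical S3 prefix and substituting your own AWS credentials:

    COPY users
    FROM 's3://your-bucket/mysql-export/users_'
    CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your-secret-key>'
    DELIMITER '\t';

COPY loads every file matching the prefix, so splitting the export into multiple TSV files lets Redshift ingest them in parallel.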
Although import and export is the simplest way to replicate to Redshift, it is not ideal for frequent updates. For example, it takes roughly 30 minutes to export 18 GB of data from MySQL over a 100 Mbps network, and another 30 minutes to import that data into Redshift. That assumes zero connection issues during import or export; a dropped connection forces you to start the process over.

A more efficient method of replicating MySQL to Redshift is incremental SELECT and COPY. If a full import and export is too slow for your needs, this might be your answer. Incremental SELECT and COPY only copies the records that have changed since the last update, which takes considerably less time and bandwidth than moving your entire dataset and enables you to sync MySQL and Redshift much more frequently.
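As a minimal sketch, assuming each replicated table carries an updated_at timestamp that is set on every insert and update (the column name is a common convention, not a requirement), the incremental step selects only the rows that changed since the last successful sync:

    -- hypothetical example: fetch only rows modified since the last sync
    SELECT * FROM users
    WHERE updated_at >= '2023-11-01 10:00:00';

The result set is then converted to TSV, uploaded to S3, and loaded with the same COPY command as in the full export method.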