Sunday, April 3, 2022

What to consider when migrating data warehouse to Amazon Redshift


Customers are migrating data warehouses to Amazon Redshift because it's fast, scalable, and cost-effective. However, data warehouse migration projects can be complex and challenging. In this post, I help you understand the common drivers of data warehouse migration, migration strategies, and what tools and services are available to assist with your migration project.

Let's first discuss the big data landscape, the meaning of a modern data architecture, and what you need to consider for your data warehouse migration project when building a modern data architecture.

Business opportunities

Data is changing the way we work, live, and play. All of this behavior change and the move to the cloud have resulted in a data explosion over the past 20 years. The proliferation of the Internet of Things and smartphones has accelerated the amount of data generated every day. Business models have shifted, and so have the needs of the people running these businesses. We have moved from talking about terabytes of data just a few years ago to petabytes and exabytes of data today. By putting data to work efficiently and building deep business insights from the data collected, businesses in different industries and of various sizes can achieve a wide range of business outcomes. These can be broadly categorized into the following core business outcomes:

  • Improving operational efficiency – By making sense of the data collected from various operational processes, businesses can improve customer experience, increase production efficiency, and increase sales and marketing agility
  • Making more informed decisions – By developing more meaningful insights that bring together a full picture of data across an organization, businesses can make more informed decisions
  • Accelerating innovation – Combining internal and external data sources enables a variety of AI and machine learning (ML) use cases that help businesses automate processes and unlock business opportunities that were either impossible or too difficult to pursue before

Business challenges

Exponential data growth has also brought business challenges.

First of all, businesses need to access all data across the organization, and that data may be distributed in silos. It comes from a variety of sources, in a wide range of data types, and in large volume and velocity. Some data may be stored as structured data in relational databases. Other data may be stored as semi-structured data in object stores, such as media files and the clickstream data that is constantly streaming from mobile devices.

Secondly, to build insights from data, businesses need to dive deep into the data by conducting analytics. These analytics activities often involve dozens or hundreds of data analysts who need to access the system simultaneously. Having a performant system that scales to meet the query demand is often a challenge. It gets more complex when businesses need to share the analyzed data with their customers.

Last but not least, businesses need a cost-effective solution to address data silos, performance, scalability, security, and compliance challenges. Being able to visualize and predict cost is necessary for a business to measure the cost-effectiveness of its solution.

To solve these challenges, businesses need a future-proof modern data architecture and a robust, efficient analytics system.

Modern data architecture

A modern data architecture enables organizations to store any amount of data in open formats, break down disconnected data silos, empower users to run analytics or ML using their preferred tool or technique, and manage who has access to specific pieces of data with the right security and data governance controls.

The AWS data lake architecture is a modern data architecture that enables you to store data in a data lake and use a ring of purpose-built data services around the lake, as shown in the following figure. This allows you to make decisions with speed and agility, at scale, and cost-effectively. For more details, refer to Modern Data Architecture on AWS.

Modern data warehouse

Amazon Redshift is a fully managed, scalable, modern data warehouse that accelerates time to insights with fast, easy, and secure analytics at scale. With Amazon Redshift, you can analyze all your data and get performance at any scale with low and predictable costs.

Amazon Redshift offers the following benefits:

  • Analyze all your data – With Amazon Redshift, you can easily analyze all your data across your data warehouse and data lake with consistent security and governance policies. We call this the modern data architecture. With Amazon Redshift Spectrum, you can query data in your data lake with no need for loading or other data preparation. And with data lake export, you can save the results of an Amazon Redshift query back into the lake. This means you can take advantage of real-time analytics and ML/AI use cases without re-architecture, because Amazon Redshift is fully integrated with your data lake. With new capabilities like data sharing, you can easily share data across Amazon Redshift clusters both internally and externally, so everyone has a live and consistent view of the data. Amazon Redshift ML makes it easy to do more with your data: you can create, train, and deploy ML models using familiar SQL commands directly in Amazon Redshift data warehouses.
  • Fast performance at any scale – Amazon Redshift is a self-tuning and self-learning system that allows you to get the best performance for your workloads without the undifferentiated heavy lifting of tuning your data warehouse with tasks such as defining sort keys and distribution keys, and new capabilities like materialized views, auto-refresh, and auto-query rewrite. Amazon Redshift scales to deliver consistently fast results from gigabytes to petabytes of data, and from a few users to thousands. As your user base scales to thousands of concurrent users, the concurrency scaling capability automatically deploys the necessary compute resources to manage the additional load. Amazon Redshift RA3 instances with managed storage separate compute and storage, so you can scale each independently and only pay for the storage you need. AQUA (Advanced Query Accelerator) for Amazon Redshift is a new distributed and hardware-accelerated cache that automatically boosts certain types of queries.
  • Easy analytics for everyone – Amazon Redshift is a fully managed data warehouse that abstracts away the burden of detailed infrastructure management or performance optimization. You can focus on getting to insights, rather than performing maintenance tasks like provisioning infrastructure, creating backups, setting up the layout of data, and other tasks. You can operate on data in open formats, use familiar SQL commands, and take advantage of query visualizations available through the new Query Editor v2. You can also access data from any application through a secure Data API without configuring software drivers or managing database connections. Amazon Redshift is compatible with business intelligence (BI) tools, opening up the power and integration of Amazon Redshift to business users who operate from within the BI tool.

A modern data architecture built on a data lake, combined with a modern data warehouse in Amazon Redshift, helps businesses of all sizes address big data challenges, make sense of large amounts of data, and drive business outcomes. You can start the journey of building a modern data architecture by migrating your data warehouse to Amazon Redshift.

Migration considerations

Data warehouse migration presents a challenge in terms of project complexity and poses a risk in terms of resources, time, and cost. To reduce the complexity of data warehouse migration, it's essential to choose the right migration strategy based on your existing data warehouse landscape and the amount of transformation required to migrate to Amazon Redshift. The following are the key factors that can influence your migration strategy decision:

  • Size – The total size of the source data warehouse to be migrated is determined by the objects, tables, and databases that are included in the migration. A good understanding of the data sources and data domains required for moving to Amazon Redshift leads to optimal sizing of the migration project.
  • Data transfer – Data warehouse migration involves data transfer between the source data warehouse servers and AWS. You can either transfer data over a network interconnection between the source location and AWS, such as AWS Direct Connect, or transfer data offline using tools or services such as the AWS Snow Family.
  • Data change rate – How often do data updates or changes occur in your data warehouse? Your existing data warehouse's data change rate determines the update intervals required to keep the source data warehouse and the target Amazon Redshift in sync. A source data warehouse with a high data change rate requires the service cutover from the source to Amazon Redshift to complete within an update interval, which leads to a shorter migration cutover window.
  • Data transformation – Moving your existing data warehouse to Amazon Redshift is a heterogeneous migration involving data transformation such as data mapping and schema change. The complexity of data transformation determines the processing time required for an iteration of migration.
  • Migration and ETL tools – The selection of migration and extract, transform, and load (ETL) tools can impact the migration project. For example, the effort required for deployment and setup of these tools can vary. We take a closer look at AWS tools and services shortly.
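The size and data transfer factors often come down to simple arithmetic: how long would the data take to cross the available network link? The following minimal Python sketch estimates transfer time from total size, link bandwidth, and an assumed effective utilization. The function names, the 80% utilization default, and the "go offline if it exceeds the time budget" rule are illustrative assumptions, not AWS guidance; "network" stands for an option such as Direct Connect and "offline" for the AWS Snow Family.

```python
def transfer_days(size_tb: float, bandwidth_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move size_tb terabytes over a bandwidth_gbps link.

    Assumptions (illustrative only): decimal units (1 TB = 10**12 bytes,
    1 Gbps = 10**9 bits/s) and a constant effective link utilization.
    """
    if bandwidth_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("bandwidth must be positive and utilization in (0, 1]")
    bits = size_tb * 10**12 * 8
    bits_per_second = bandwidth_gbps * 10**9 * utilization
    return bits / bits_per_second / 86_400  # 86,400 seconds per day


def suggest_transfer_mode(size_tb: float, bandwidth_gbps: float, max_days: float,
                          utilization: float = 0.8) -> tuple[str, float]:
    """Hypothetical rule of thumb: go offline if the network estimate exceeds the time budget."""
    days = transfer_days(size_tb, bandwidth_gbps, utilization)
    mode = "network" if days <= max_days else "offline"
    return mode, round(days, 1)


# 100 TB over a 1 Gbps link at 80% utilization, with a one-week budget
print(suggest_transfer_mode(100, 1, max_days=7))  # ('offline', 11.6)
```

In practice the budget also has to leave room for the data change rate: the longer the bulk transfer takes, the more changes accumulate that must be replicated before cutover.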

After you have factored in all these considerations, you can pick a migration strategy option for your Amazon Redshift migration project.

Migration strategies

You can choose from three migration strategies: one-step migration, two-step migration, or wave-based migration.

One-step migration is a good option for databases that don't require continuous operation, such as continuous replication to keep ongoing data changes in sync between the source and destination. You can extract existing databases as comma-separated value (CSV) files, or in a columnar format like Parquet, then use AWS Snow Family services such as AWS Snowball to deliver datasets to Amazon Simple Storage Service (Amazon S3) for loading into Amazon Redshift. You then test the destination Amazon Redshift database for data consistency with the source. After all validations have passed, the database is switched over to AWS.
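The consistency test at the end of a one-step migration often compares row counts plus a content fingerprint per table. The following minimal Python sketch computes an order-independent fingerprint for a set of rows, for example rows fetched from the source and from Amazon Redshift with any database driver (the fetching is left out). The hashing scheme (SHA-256 per row, summed modulo 2**256) and the helper names are illustrative choices, not a feature of AWS SCT or Amazon Redshift.

```python
import hashlib
from typing import Iterable, Sequence

_NULL = "\x00NULL"  # marker so that NULL and the empty string hash differently
_SEP = "\x1f"       # unit separator between column values


def row_digest(row: Sequence) -> int:
    """SHA-256 of one row, with values rendered as strings."""
    text = _SEP.join(_NULL if v is None else str(v) for v in row)
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def table_fingerprint(rows: Iterable[Sequence]) -> tuple[int, int]:
    """Return (row_count, checksum), where the checksum ignores row order.

    Digests are summed modulo 2**256, so duplicated rows still change the
    result (an XOR combination would let pairs of duplicates cancel out).
    """
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_digest(row)) % (1 << 256)
    return count, total


def tables_match(source_rows: Iterable[Sequence], target_rows: Iterable[Sequence]) -> bool:
    """True when both sides have the same row count and the same checksum."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

This only works if both sides render values identically (decimal scale, timestamp precision, and padded CHAR columns are common sources of false mismatches), so normalize values before hashing. For large tables, it is usually more practical to fingerprint per partition rather than stream every row.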

Two-step migration is commonly used for databases of any size that require continuous operation, such as continuous replication. During the migration, the source databases have ongoing data changes, and continuous replication keeps data changes in sync between the source and Amazon Redshift. The two-step migration strategy breaks down as follows:

  • Initial data migration – The data is extracted from the source database, preferably during non-peak usage to minimize the impact. The data is then migrated to Amazon Redshift by following the one-step migration approach described previously.
  • Changed data migration – Data that changed in the source database after the initial data migration is propagated to the destination before switchover. This step synchronizes the source and destination databases. After all the changed data is migrated, you can validate the data in the destination database and perform the necessary tests. If all tests pass, you then switch over to the Amazon Redshift data warehouse.

Wave-based migration is suitable for large-scale data warehouse migration projects. The principle of wave-based migration is to divide a complex migration project into multiple logical and systematic waves as a precaution. This strategy can significantly reduce complexity and risk. You start from a workload that covers a good number of data sources and subject areas with medium complexity, then add more data sources and subject areas in each subsequent wave. With this strategy, you run both the source data warehouse and Amazon Redshift production environments in parallel for a certain amount of time before you can fully retire the source data warehouse. See Develop an application migration methodology to modernize your data warehouse with Amazon Redshift for details on how to identify and group data sources and analytics applications to migrate from the source data warehouse to Amazon Redshift using the wave-based migration approach.
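One way to make the wave principle concrete is to score each subject area and assign areas to waves, starting with medium-complexity work. The following Python sketch is a hypothetical planner: the `SubjectArea` type, the 1-to-5 complexity score, and the ordering rule (closest to medium first, ties broken by lower score, then name) are assumptions for illustration and are not part of the methodology referenced above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectArea:
    name: str
    complexity: int  # hypothetical score from 1 (simple) to 5 (complex)


def plan_waves(areas: list[SubjectArea], per_wave: int, medium: int = 3) -> list[list[str]]:
    """Group subject areas into waves of at most per_wave areas.

    Ordering rule (an assumption for illustration): areas whose score is
    closest to `medium` go first; ties break on lower score, then name.
    """
    if per_wave < 1:
        raise ValueError("per_wave must be at least 1")
    ordered = sorted(areas, key=lambda a: (abs(a.complexity - medium), a.complexity, a.name))
    names = [a.name for a in ordered]
    return [names[i:i + per_wave] for i in range(0, len(names), per_wave)]


areas = [SubjectArea("sales", 3), SubjectArea("finance", 5), SubjectArea("hr", 1),
         SubjectArea("marketing", 2), SubjectArea("supply_chain", 4), SubjectArea("inventory", 3)]
print(plan_waves(areas, per_wave=2))
# [['inventory', 'sales'], ['marketing', 'supply_chain'], ['hr', 'finance']]
```

A real wave plan also weighs dependencies between subject areas and the analytics applications that consume them; a single score is only a starting point.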

To guide your migration strategy decision, refer to the following table, which maps each consideration factor to the preferred migration strategies.

| Consideration | One-step migration | Two-step migration | Wave-based migration |
|---|---|---|---|
| Number of subject areas in migration scope | Small | Medium to large | Medium to large |
| Data transfer volume | Small to large | Small to large | Small to large |
| Data change rate during migration | None | Minimal to frequent | Minimal to frequent |
| Data transformation complexity | Any | Any | Any |
| Migration change window for switching from source to target | Hours | Seconds | Seconds |
| Migration project duration | Weeks | Weeks to months | Months |
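The table can also be encoded as data, which is handy for a scripted first pass over many candidate databases. The following Python sketch is a simplified, hypothetical reading of the table: it filters on the subject-area scope and data change rate rows, reports the cutover window and duration rows, and treats each "X to Y" range as its two endpoint labels. The dictionary names and matching logic are assumptions, not a replacement for the judgment described above.

```python
# Two input rows of the strategy table, with labels taken from the table.
SCOPE = {
    "one-step": {"small"},
    "two-step": {"medium", "large"},
    "wave-based": {"medium", "large"},
}
CHANGE_RATE = {
    "one-step": {"none"},
    "two-step": {"minimal", "frequent"},
    "wave-based": {"minimal", "frequent"},
}
# Rows that describe outcomes rather than inputs.
CUTOVER_WINDOW = {"one-step": "hours", "two-step": "seconds", "wave-based": "seconds"}
DURATION = {"one-step": "weeks", "two-step": "weeks to months", "wave-based": "months"}


def candidate_strategies(scope: str, change_rate: str) -> list[dict]:
    """Return the strategies whose table entries match both inputs."""
    return [
        {"strategy": s, "cutover_window": CUTOVER_WINDOW[s], "duration": DURATION[s]}
        for s in SCOPE
        if scope in SCOPE[s] and change_rate in CHANGE_RATE[s]
    ]


print(candidate_strategies("large", "frequent"))
```

An empty result, for example a small scope with frequent changes, means the table offers no direct match and the case needs a closer look.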

Migration process

In this section, we review the three high-level steps of the migration process. The two-step migration strategy and the wave-based migration strategy involve all three migration steps. However, the wave-based migration strategy includes a number of iterations. Because only databases that don't require continuous operations are good fits for one-step migration, only Steps 1 and 2 of the migration process are required for that strategy.

Step 1: Convert schema and subject area

In this step, you make the source data warehouse schema compatible with the Amazon Redshift schema by converting it with schema conversion tools such as AWS Schema Conversion Tool (AWS SCT) and other tools from AWS partners. In some situations, you may also need custom code to conduct complex schema conversions. We dive deeper into AWS SCT and migration best practices in a later section.

Step 2: Initial data extraction and load

In this step, you complete the initial data extraction and load the source data into Amazon Redshift for the first time. You can use AWS SCT data extractors to extract data from the source data warehouse and load it to Amazon S3 if your data size and data transfer requirements allow you to transfer data over the interconnected network. Alternatively, if there are limitations such as limited network capacity, you can load data to Snowball, and from there the data gets loaded to Amazon S3. When the data from the source data warehouse is available in Amazon S3, it's loaded to Amazon Redshift. In situations where the source data warehouse's native tools do a better data unload and load job than AWS SCT data extractors, you may choose to use the native tools to complete this step.
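Once extracted files are in Amazon S3, loading them into Amazon Redshift is typically done with the `COPY` command. The following Python sketch only builds the SQL text; the table name, bucket, prefix, and IAM role ARN in the example are placeholders, and you would run the statement with your SQL client or another connection of your choice. `IGNOREHEADER` is added only for CSV, because Parquet files carry no header row. The helper name `build_copy` is hypothetical.

```python
def build_copy(table: str, s3_uri: str, iam_role_arn: str, fmt: str = "parquet",
               csv_header_rows: int = 1) -> str:
    """Build an Amazon Redshift COPY statement for files staged in Amazon S3.

    The table name is inserted as-is, so pass only trusted identifiers.
    """
    fmt = fmt.lower()
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    if "'" in s3_uri or "'" in iam_role_arn:
        raise ValueError("single quotes are not allowed in literals")
    parts = [f"COPY {table}", f"FROM '{s3_uri}'", f"IAM_ROLE '{iam_role_arn}'"]
    if fmt == "parquet":
        parts.append("FORMAT AS PARQUET")
    elif fmt == "csv":
        parts.append("FORMAT AS CSV")
        if csv_header_rows:
            parts.append(f"IGNOREHEADER {csv_header_rows}")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return "\n".join(parts) + ";"


print(build_copy("sales.orders", "s3://example-bucket/orders/",
                 "arn:aws:iam::123456789012:role/ExampleRedshiftRole"))
```

Pointing `FROM` at a prefix rather than a single file lets Amazon Redshift load the files under that prefix in parallel, which is one reason extractors usually write many smaller files.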

Step 3: Delta and incremental load

In this step, you use AWS SCT, and sometimes the source data warehouse's native tools, to capture and load delta or incremental changes from sources to Amazon Redshift. This is often referred to as change data capture (CDC). CDC is a process that captures changes made in a database and ensures that those changes are replicated to a destination such as a data warehouse.
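To illustrate what applying CDC output involves, the following Python sketch collapses an ordered stream of change records (insert, update, delete) to the latest change per primary key, then applies that batch to an in-memory copy of a target table by deleting every touched key and re-inserting the surviving rows. This mirrors the common Amazon Redshift pattern of loading changes into a staging table and applying them with DELETE and INSERT. The `Change` record shape is hypothetical; tools such as AWS DMS use their own change formats.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Change:
    """Hypothetical change record: op is 'I' (insert), 'U' (update), or 'D' (delete)."""
    seq: int                    # source commit order
    op: str
    key: Any                    # primary key value
    row: Optional[dict] = None  # full row image for I/U, None for D


def apply_batch(target: dict, changes: list[Change]) -> dict:
    """Return a new table state after applying a batch of changes in commit order."""
    latest: dict = {}
    for ch in sorted(changes, key=lambda c: c.seq):
        if ch.op not in ("I", "U", "D"):
            raise ValueError(f"unknown op {ch.op!r}")
        latest[ch.key] = ch  # later changes to the same key win
    result = dict(target)
    for key in latest:  # step 1: delete every key touched by the batch
        result.pop(key, None)
    for key, ch in latest.items():  # step 2: re-insert the latest image unless deleted
        if ch.op != "D":
            result[key] = dict(ch.row)
    return result


target = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 7}}
changes = [Change(3, "D", 2), Change(1, "U", 1, {"id": 1, "qty": 6}),
           Change(2, "I", 3, {"id": 3, "qty": 1}), Change(4, "U", 3, {"id": 3, "qty": 2})]
print(apply_batch(target, changes))
# {1: {'id': 1, 'qty': 6}, 3: {'id': 3, 'qty': 2}}
```

Applying net changes in batches, rather than replaying every row change, tends to suit a columnar warehouse better, because each batch becomes a small number of set-based statements.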

You should now have enough information to start developing a migration plan for your data warehouse. In the following section, I dive deeper into the AWS services that can help you migrate your data warehouse to Amazon Redshift, and the best practices for using these services to accelerate the successful delivery of your data warehouse migration project.

Data warehouse migration services

Data warehouse migration involves a set of services and tools to support the migration process. You begin by creating a database migration assessment report and then converting the source data schema to be compatible with Amazon Redshift by using AWS SCT. To move data, you can use the AWS SCT data extraction tool, which integrates with AWS Database Migration Service (AWS DMS) to create and manage AWS DMS tasks and orchestrate data migration.

To transfer source data over the interconnected network between the source and AWS, you can use AWS Storage Gateway, Amazon Kinesis Data Firehose, Direct Connect, AWS Transfer Family services, Amazon S3 Transfer Acceleration, and AWS DataSync. For data warehouse migrations involving a large volume of data, or if there are constraints on the interconnected network capacity, you can transfer data using the AWS Snow Family of services. With this approach, you copy the data to the device, ship it back to AWS, and have the data copied to Amazon Redshift via Amazon S3.

AWS SCT is an essential service for accelerating your data warehouse migration to Amazon Redshift. Let's dive deeper into it.

Migrating using AWS SCT

AWS SCT automates much of the process of converting your data warehouse schema to an Amazon Redshift database schema. Because the source and target database engines can have many different features and capabilities, AWS SCT attempts to create an equivalent schema in your target database wherever possible. If no direct conversion is possible, AWS SCT creates a database migration assessment report to help you convert your schema. The database migration assessment report provides important information about the conversion of the schema from your source database to your target database. The report summarizes all the schema conversion tasks and details the action items for schema objects that can't be converted to the DB engine of your target database. The report also includes estimates of the amount of effort it will take to write the equivalent code in your target database for objects that can't be converted automatically.

Storage optimization is the heart of a data warehouse conversion. When you use your Amazon Redshift database as a source and a test Amazon Redshift database as the target, AWS SCT recommends sort keys and distribution keys to optimize your database.
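To show where recommended sort keys and distribution keys end up, the following Python sketch renders Amazon Redshift `CREATE TABLE` DDL with `DISTSTYLE KEY`, a `DISTKEY`, and a `COMPOUND SORTKEY`. The table and column names are hypothetical, and the key choices themselves would come from your own analysis or AWS SCT's recommendations; the sketch only checks that the key columns exist and renders the statement. The usual rationale is that a distribution key on a frequently joined column places matching rows on the same slice, and a sort key on commonly filtered columns lets Amazon Redshift skip blocks during scans.

```python
def build_ddl(table: str, columns: list[tuple[str, str]], distkey: str,
              sortkeys: list[str]) -> str:
    """Render CREATE TABLE DDL with a KEY distribution style and a compound sort key."""
    names = {name for name, _ in columns}
    if not sortkeys:
        raise ValueError("at least one sort key column is required")
    unknown = ({distkey} | set(sortkeys)) - names
    if unknown:
        raise ValueError(f"unknown key columns: {sorted(unknown)}")
    column_lines = ",\n".join(f"    {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        "DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"COMPOUND SORTKEY ({', '.join(sortkeys)});"
    )


print(build_ddl(
    "sales.orders",
    [("order_id", "BIGINT"), ("customer_id", "BIGINT"), ("order_date", "DATE")],
    distkey="customer_id",
    sortkeys=["order_date", "order_id"],
))
```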

With AWS SCT, you can convert the following data warehouse schemas to Amazon Redshift:

  • Amazon Redshift
  • Azure Synapse Analytics (version 10)
  • Greenplum Database (version 4.3 and later)
  • Microsoft SQL Server (version 2008 and later)
  • Netezza (version 7.0.3 and later)
  • Oracle (version 10.2 and later)
  • Snowflake (version 3)
  • Teradata (version 13 and later)
  • Vertica (version 7.2 and later)

At AWS, we continue to release new features and enhancements to improve our products. For the latest supported conversions, visit the AWS SCT User Guide.

Migrating data using AWS SCT data extraction tool

You can use an AWS SCT data extraction tool to extract data from your on-premises data warehouse and migrate it to Amazon Redshift. The agent extracts your data and uploads it to either Amazon S3 or, for large-scale migrations, an AWS Snow Family service. You can then use AWS SCT to copy the data to Amazon Redshift. Amazon S3 is a storage and retrieval service. To store an object in Amazon S3, you upload the file you want to store to an S3 bucket. When you upload a file, you can set permissions on the object and also on any metadata.

In large-scale migrations involving data upload to an AWS Snow Family service, you can use wizard-based workflows in AWS SCT to automate the process in which the data extraction tool orchestrates AWS DMS to perform the actual migration.

Considerations for Amazon Redshift migration tools

To improve and accelerate data warehouse migration to Amazon Redshift, consider the following tips and best practices. This list isn't exhaustive. Make sure you have a good understanding of your data warehouse profile and determine which best practices you can use for your migration project.

  • Use AWS SCT to create a migration assessment report and scope the migration effort.
  • Automate migration with AWS SCT where possible. Experience from our customers shows that AWS SCT can automatically create the majority of DDL and SQL scripts.
  • When automated schema conversion isn't possible, use custom scripting for the code conversion.
  • Install AWS SCT data extractor agents as close as possible to the data source to improve data migration performance and reliability.
  • To improve data migration performance, properly size the Amazon Elastic Compute Cloud (Amazon EC2) instances or equivalent virtual machines that the data extractor agents are installed on.
  • Configure multiple data extractor agents to run multiple tasks in parallel to improve data migration performance by maximizing the usage of the allocated network bandwidth.
  • Adjust the AWS SCT memory configuration to improve schema conversion performance.
  • Use Amazon S3 to store large objects such as images, PDFs, and other binary data from your existing data warehouse.
  • To migrate large tables, use virtual partitioning and create subtasks to improve data migration performance.
  • Understand the use cases of AWS services such as Direct Connect, the AWS Transfer Family, and the AWS Snow Family. Select the right service or tool to meet your data migration requirements.
  • Understand AWS service quotas and make informed migration design decisions.
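The virtual partitioning and parallel-agent tips share one idea: split a large table into independent slices and extract them concurrently. The following Python sketch splits an integer key range into contiguous, non-overlapping `WHERE` predicates and processes each slice on a thread pool. It is a conceptual illustration, not how AWS SCT implements virtual partitions; `extract_slice` is a hypothetical placeholder that only returns the query a real extraction task might run.

```python
from concurrent.futures import ThreadPoolExecutor


def range_partitions(column: str, lo: int, hi: int, parts: int) -> list[str]:
    """Split the inclusive range [lo, hi] of an integer column into at most `parts` predicates."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    total = hi - lo + 1
    parts = min(parts, total)
    size, extra = divmod(total, parts)
    predicates, start = [], lo
    for i in range(parts):
        end = start + size - 1 + (1 if i < extra else 0)  # spread the remainder over the first slices
        predicates.append(f"{column} BETWEEN {start} AND {end}")
        start = end + 1
    return predicates


def extract_slice(table: str, predicate: str) -> str:
    # Hypothetical placeholder: a real task would run this query and upload the result.
    return f"SELECT * FROM {table} WHERE {predicate}"


def extract_in_parallel(table: str, column: str, lo: int, hi: int,
                        parts: int, workers: int = 4) -> list[str]:
    """Run one extraction task per slice; results come back in slice order."""
    predicates = range_partitions(column, lo, hi, parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: extract_slice(table, p), predicates))


for query in extract_in_parallel("sales.orders", "order_id", 1, 10, parts=3):
    print(query)
```

Equal-width ranges assume keys are spread evenly; with skewed keys, slices can differ greatly in row count, so boundaries based on actual value distribution tend to balance the work better.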

Summary

Data is growing in volume and complexity faster than ever. However, only a fraction of this valuable asset is available for analysis. Traditional on-premises data warehouses have rigid architectures that don't scale for modern big data analytics use cases. These traditional data warehouses are expensive to set up and operate, and require large upfront investments in both software and hardware.

In this post, we discussed Amazon Redshift as a fully managed, scalable, modern data warehouse that can help you analyze all your data and achieve performance at any scale with low and predictable cost. To migrate your data warehouse to Amazon Redshift, you need to consider a range of factors, such as the total size of the data warehouse, the data change rate, and data transformation complexity, before picking a suitable migration strategy and process to reduce the complexity and cost of your data warehouse migration project. With AWS services such as AWS SCT and AWS DMS, and by adopting the tips and best practices for these services, you can automate migration tasks, scale migration, accelerate the delivery of your data warehouse migration project, and delight your customers.


About the Author

Lewis Tang is a Senior Solutions Architect at Amazon Web Services based in Sydney, Australia. Lewis provides partners with guidance on a broad range of AWS services and helps partners accelerate AWS practice growth.
