TLDR
Data integration is synchronizing information between 1 system (e.g.
Salesforce) and another (e.g.
Snowflake)
Outline
What is it?
Why do you need it?
How do you implement it?
Who is it for?
Future and next steps
What is it?
Data integration is the process of loading data from an external source and then exporting it to an internal destination. This process can also be reversed (e.g. reverse ETL).
Scenario
You regularly collect gold from scattered kingdoms that have pledged loyalty to your throne. You collect their gold payments through
Stripe.
Source: Giphy
Your treasury (aka finance team) wants to better understand which demographics of kingdoms pay well and on time.
Why do you need it?
Use cases for integrating external data with internal systems include (but aren’t limited to):
Combine data from multiple sources to build comprehensive data models for business use cases
Analyze data from external SaaS tools
Combine data for personalization
Forecasting
Example use cases for loading data from your internal systems (e.g.
PostgreSQL) and then exporting it to an external system (e.g.
HubSpot) include (but aren’t limited to):
Getting internal application data into marketing tools (e.g.
Mailchimp,
Google Ads, etc.) for outreach campaigns, ads, etc.
Syncing user data with CRMs (e.g.
Salesforce)
Exporting internal data to task management (e.g.
Airtable) software for ops
Scenario
You’re having a big tournament to celebrate your next heir to the throne. You put out lots of ad posters all over the land. Many knights sign up to compete in your tournament.
Source: Giphy
You build a data integration between all the ad service providers (e.g.
Google Ads,
Facebook Ads, etc.), ingesting data from those APIs and storing it in your
BigQuerydata warehouse.
Your team uses the ad campaign performance data, knight sign ups, tournament results , and spending from spectators to calculate the ROI of ad campaigns. These models will help your team better promote the next tournament more efficiently.
How do you implement it?
There are 3 ways you can accomplish data integration:
Source: Giphy
Some benefits of using SAAS or open-source software is that you get some out-of-the-box features such as (not a comprehensive list):
When a 3rd party API gets updated, maintainers will help update it
When there is a duplicate record, you can automatically ignore it or update an existing record
Automatically track progress and status of synchronizations
Retry failed synchronizations
And more…
Who is it for?
Typically, data engineering owns data integrations. Data engineers can choose how it’s implemented (e.g. buy SAAS or implement open-source software).
Marketing, sales, operations, etc. can help influence which 3rd party sources need data integration or which 3rd party APIs need data synced to it from internal data warehouses.
Depending on which sources are required by specific business use cases, different solutions are required.
Source: Giphy
Companies use SAAS or a self-hosted open-source software to integrate data from common sources. When an uncommon source doesn’t exist out-of-the-box from a SAAS provider, teams typically write the code themselves to handle the synchronization.
In addition, when a team needs to export data from their internal systems to 3rd party APIs (e.g. Salesforce) and has very customized needs, they also typically write custom code to handle their specific use case.
Future and next steps
Once data is integrated into a common destination, you can begin to combine it with data from other sources, analyze it, then predict on it.
In order to accomplish that, you’ll first need to clean the data, wrangle it, and transform the data all through a data pipeline. Managing this process in a testable, repeatable, and observable way is critical.
Source: Giphy
In this data integration series, upcoming content will include:
- Singer Spec
: the data engineering community standard for writing data integrations
How to write your own data integration
How to build an end-to-end data pipeline to sync data
Start building for free
No need for a credit card to get started.
Trying out Mage to build ranking models won’t cost a cent.
No need for a credit card to get started. Trying out Mage to build ranking models won’t cost a cent.