Data Integration: Google BigQuery with Mage
Shashank Mishra
Data Engineering
TLDR
This article outlines the integration between Mage and Google BigQuery, a serverless data warehousing service. We'll discuss the integration process, its benefits, and how it aids businesses in making data-driven decisions.
Outline
Introduction to Mage
Overview of Google BigQuery
Step by step process to integrate Google BigQuery with Mage
Conclusion
Introduction to Mage
In an age where data is the new oil, efficient and reliable data management tools are essential.
Mage is a platform committed to simplifying data integration and analytics. Designed for seamless data transformation and loading, Mage is transforming how businesses approach data management. Here are its key features:
Automated Data Pipelines: Mage automates data extraction, transformation, and loading (ETL) processes. It can extract data from multiple sources, transform it into a desirable format, and load it into a data warehouse.
Data Connectors: Mage offers various data connectors to widely-used data sources like Shopify, Facebook Ads, Google Ads, Google Analytics, etc. This makes it easier to import data from these platforms.
Easy Integration: Mage provides easy integration with popular data warehouses including Google BigQuery, Amazon Redshift, and Snowflake.
Pre-built SQL Models: Mage comes with pre-built SQL models for popular e-commerce platforms like Shopify and WooCommerce. These models simplify the process of data analysis.
Incremental Loading: Mage supports incremental loading, which means only new or updated data is loaded into the data warehouse. This saves storage space and improves efficiency.
Data Transformations: Mage performs automatic data transformations, converting raw data into a more usable format. This process makes the data ready for analysis and reporting.
Scheduled Refresh: Data refreshes can be scheduled in Mage, ensuring that the data in the warehouse is always up-to-date.
Data Security: Mage places a high emphasis on data security, ensuring data privacy and compliance with GDPR and other data protection regulations.
Overview of Google BigQuery
Google BigQuery is a highly scalable, serverless data warehouse offered by Google as part of its Google Cloud Platform (GCP). It is designed to streamline and simplify the processing of big data.
Serverless Architecture:
BigQuery operates on a serverless model, so users don't need to manage any servers or infrastructure and can focus more on analysis and less on maintenance. It allows you to query massive datasets in seconds and get insights in real time, without worrying about resource provisioning.
Real-Time Analytics:
BigQuery is engineered for real-time analytics. It allows users to analyze real-time data streams instantly. With its ability to run SQL queries on petabytes of data, it delivers speedy results on real-time data analytics, enabling businesses to make timely decisions.
Google BigQuery, with its serverless architecture and real-time analytics, serves as a robust platform to handle, analyze, and draw insights from massive datasets with ease.
Step by step process to integrate Google BigQuery with Mage
Before we begin, we'll need to create a service account key. Please read Google Cloud's documentation on how to create one.
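As a sketch, the key can also be created from the command line with the gcloud CLI. The service account name, project ID, and role below are illustrative placeholders; grant only the BigQuery roles your pipelines actually need.

```shell
# Create a service account for Mage (name is a placeholder)
gcloud iam service-accounts create mage-bigquery \
    --display-name="Mage BigQuery access"

# Grant it BigQuery access on the project (narrow the role if possible)
gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:mage-bigquery@your-project-id.iam.gserviceaccount.com" \
    --role="roles/bigquery.admin"

# Download a JSON key file for the account
gcloud iam service-accounts keys create key.json \
    --iam-account=mage-bigquery@your-project-id.iam.gserviceaccount.com
```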
Once that's done, follow these steps:
Create a new pipeline or open an existing pipeline.
Expand the left side of the screen to view the file browser.
Scroll down and click on a file named io_config.yaml.
Enter the following keys and values under the key named default (we can have multiple profiles; add them under whichever is relevant for us). Note: we only need to add the keys under GOOGLE_SERVICE_ACC_KEY or the value for the key GOOGLE_SERVICE_ACC_KEY_FILEPATH (both are not simultaneously required).
```yaml
version: 0.1.1
default:
  GOOGLE_SERVICE_ACC_KEY:
    type: service_account
    project_id: project-id
    private_key_id: key-id
    private_key: "-----BEGIN PRIVATE KEY-----\nyour_private_key\n-----END PRIVATE KEY-----\n"
    client_email: your_service_account_email
    auth_uri: "https://accounts.google.com/o/oauth2/auth"
    token_uri: "https://accounts.google.com/o/oauth2/token"
    auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
    client_x509_cert_url: "https://www.googleapis.com/robot/v1/metadata/x509/your_service_account_email"
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/your/service/account/key.json"
```
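Since the two credential keys are mutually exclusive, a quick sanity check can catch a misconfigured profile before running a pipeline. This is a minimal sketch (the function name is illustrative, not part of Mage's API), treating the loaded profile as a plain dict:

```python
def pick_bigquery_auth(profile: dict) -> str:
    """Return which BigQuery auth method a profile uses.

    Mirrors the io_config.yaml rule: set either the inline key
    or the key filepath, but not both.
    """
    has_inline = bool(profile.get('GOOGLE_SERVICE_ACC_KEY'))
    has_filepath = bool(profile.get('GOOGLE_SERVICE_ACC_KEY_FILEPATH'))
    if has_inline and has_filepath:
        raise ValueError('Set only one of GOOGLE_SERVICE_ACC_KEY '
                         'and GOOGLE_SERVICE_ACC_KEY_FILEPATH')
    if has_inline:
        return 'inline_key'
    if has_filepath:
        return 'key_filepath'
    raise ValueError('No BigQuery credentials configured')


# Example: a profile that points at a key file on disk.
profile = {'GOOGLE_SERVICE_ACC_KEY_FILEPATH': '/path/to/your/service/account/key.json'}
print(pick_bigquery_auth(profile))  # key_filepath
```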
Using SQL block
Create a new pipeline or open an existing pipeline.
Add a data loader, transformer, or data exporter block.
Select SQL.
Under the Data provider dropdown, select BigQuery.
Under the Profile dropdown, select default (or the profile we added credentials under).
Next to the Database label, enter the database name we want this block to save data to.
Next to the Save to schema label, enter the schema name we want this block to save data to.
Under the Write policy dropdown, select Replace or Append (please see the SQL blocks guide for more information on write policies).
Enter this test query: SELECT 1
Run the block.
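Conceptually, the two write policies differ only in what happens to rows already in the target table: Replace drops them, Append keeps them. A minimal sketch of that behavior, with plain Python lists standing in for BigQuery tables (the function is illustrative, not Mage's API):

```python
def write_rows(table: list, new_rows: list, policy: str) -> list:
    """Apply a SQL-block-style write policy to an existing table.

    'replace' discards the existing rows; 'append' keeps them
    and adds the new rows after.
    """
    if policy == 'replace':
        return list(new_rows)
    if policy == 'append':
        return table + list(new_rows)
    raise ValueError(f'Unknown write policy: {policy}')


existing = [{'id': 1}, {'id': 2}]
print(write_rows(existing, [{'id': 3}], 'replace'))  # [{'id': 3}]
print(write_rows(existing, [{'id': 3}], 'append'))   # [{'id': 1}, {'id': 2}, {'id': 3}]
```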
Using Python block
Create a new pipeline or open an existing pipeline.
Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
Select Generic (no template).
Enter this code snippet (note: change the config_profile from default if we have a different profile):
```python
from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_big_query(**kwargs) -> DataFrame:
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    return BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).load(query)
```
Run the block.
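The same configuration works in reverse for a data exporter block. The sketch below mirrors Mage's BigQuery exporter template; the table ID is a placeholder to replace with your own project, dataset, and table, and the if_exists policy can be 'replace' or 'append'. It must run inside a Mage project, since it reads io_config.yaml from the repo path.

```python
from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_big_query(df: DataFrame, **kwargs) -> None:
    # Placeholder: fully qualified BigQuery table ID
    table_id = 'your-project.your_dataset.your_table_name'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        table_id,
        if_exists='replace',  # resolution policy if the table already exists
    )
```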
Conclusion
Integrating Mage with Google BigQuery provides your team with a potent combination of automated data pipeline management and robust data warehousing. This partnership not only simplifies data extraction, transformation, and loading but also provides a seamless pathway for data analysis and insight generation. As we've demonstrated in this step-by-step guide, the integration process is straightforward, making it an accessible option for businesses of all sizes. By leveraging this integration, you can unlock the full potential of your data, streamline operations, and drive data-informed decisions.