There is no doubt that virtual data warehousing (the “cloud”) is becoming increasingly important and is the future of how enterprises use and store their big data. And so it becomes all the more important to judiciously decide on a data warehousing solution that aligns with your business needs and helps you get the crucial business insights in a timely and cost efficient manner. Looking at how the enterprise data is growing in volume and complexity, sometimes a hybrid solution works better and so it’s important to objectively look at the benefits of all the cloud warehousing options that are available and work on a custom solution that works based on the organization’s goals. This article explains two of the most popular and common platforms enterprises use today, with use cases to use one or both based on your needs—BigQuery (Google Cloud Platform – GCP) and Snowflake.
Thanks to working with many global organizations, many times the InfoTrust team gets asked about the feasibility of working with more than one cloud platform. Questions asked like:
- Does it provide cost and query execution efficiencies to use more than one cloud platform (hybrid cloud) with Snowflake being our primary cloud platform? If yes, what are the possible use cases?
- If Snowflake is our organization’s primary cloud platform, which cloud platform can it run on? Which one will help us get the most synergies between the two?
- What are the benefits of one cloud platform over another to run the Snowflake instance (GCP vs AWS vs Azure)?
- Can I run my Snowflake instance on GCP and also use BigQuery? If yes, what would be the possible use cases? Why GCP and not AWS or Azure?
If your organization is also looking to find answers to some of these questions, read on! I’m hoping you will gain some clarity and direction.
About Google BigQuery
Released in May 2010, BigQuery is Google’s fully managed, serverless data warehouse that enables scalable analysis. BigQuery employs columnar storage for fast data scanning and a tree architecture for dispatching queries and aggregating results across huge computer clusters. With Google BigQuery you can focus on what is important to get the most value from your data. Infrastructure and scalability are handled automatically, and you only pay for the use of the service. With extreme performance and a standard SQL language, you can explore your data in completely new ways without thinking about investing in hardware or licenses. This provides unique flexibility and facilitates less time spent from data to valuable insights.
Launched in October 2014, Snowflake is a fully managed SaaS (software as a service) that provides a single platform for data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data. Snowflake includes out-of-the-box features like separation of storage and compute, scalable compute, data sharing, data cloning, and third-party tools support in order to handle the demanding needs of growing enterprises.
If You Have Snowflake as Your Enterprise Data Warehouse …
Snowflake is a great data warehouse choice for any enterprise because of the multitude of benefits that it offers. From easy migration to data security, it has all the features that a global enterprise will look for to manage their big data analytics.
Apart from the above, some of the other noteworthy benefits of Snowflake are as follows:
- Extreme scalability: Users can create a practically unlimited number of virtual warehouses, each one running its own workload against the data in its database. What’s more, users can quickly and easily resize the number of nodes in each cluster for optimal performance. Snowflake can also improve connectivity and optimize database storage.
- Cloud-based: Snowflake is a cloud data warehouse, meaning you don’t need to invest in expensive hardware to store and analyze data.
- Automatic performance tuning: Snowflake includes built-in automatic query performance optimization via a query optimization engine, without the need for users to manually tweak the settings.
- Strong data security: Snowflake comes with a wide variety of industry-leading data security features, including IP allow and block lists, multi-factor authentication, and automatic 256-bit AES encryption. Snowflake is said to be compliant with data security standards such as GDPR, HIPAA, PCI DSS, SOC 1, and SOC 2.
- User access controls: Now you can control who accesses your sensitive data and prevent insiders from viewing confidential information.
But is that all? Maybe not. Since we looked into the benefits of Snowflake, it’s only fair to look at the features where Snowflake misses by a bit and raises a wider discussion of data durability, integrity, and overall data ecosystem design.
One such topic of discussion is where to run the Snowflake instance? As we know, we need a baseline cloud platform to run the Snowflake instance on, whether it be Amazon (AWS), Google (GCP), or Microsoft (Azure) public clouds. This is the decision that an enterprise has to make.
So What Makes Google Cloud Platform a Great Candidate for Snowflake Integration?
Snowflake on GCP provides users with the best of both worlds. All of the benefits of these robust platforms combined are bound to give you unmatched data processing and analyzing power. Some of the notable GCP features are:
Understanding the Architectural and Processing Handshake between BigQuery and Snowflake
- Google Cloud Platform, like Snowflake, separates storage from data processing; both of these can find higher performance synergies compared to other available public cloud options.
- Snowflake supports structured and semi-structured data whereas GCP (BigQuery) also supports unstructured data in columnar format (NESTED data). Some may look at this as a disadvantage or a need of data transformation before ETL considering Analytics and Ad Tech data at times has NESTED fields, but if we really understand the data warehousing concepts and align it with the modern data tracking concept of “data minimization and tracking with purpose”, this will be looked at as an opportunity for an organization to decide what data lives in which platform hence leading to better analytical capabilities.
- Snowflake has broader support for use cases beyond traditional reporting and dashboards. Its decoupled storage and compute architecture enables you to isolate different workloads to meet SLAs, and it also supports high user concurrency. But Snowflake is not the best with interactive or ad hoc query performance because of inefficient data access along with a lack of extensive indexing and query optimization. This is where BigQuery comes in handy. You can isolate workloads by assigning each workload to different reserved slots. Unlike Snowflake, Redshift, or Athena, BigQuery also supports low latency streaming.
Other Significant Benefits that GCP (BigQuery) provides to the Overall Data Ecosystem while Working in Parallel with Snowflake
- GCP (BigQuery) being a Google product has seamless integration capabilities with other Google SaaS platforms like DV360, SA360, Campaign Manager, and Google Ads. Most of the enterprise organizations run their media campaigns on these platforms and having a seamless integration is one of the major requirements to get media data integrated with Analytics data. This opens up a huge opportunity for analyzing their customer behavior and optimizing marketing strategies accordingly.
Even if most of the enterprise data lives in Snowflake that runs on Google Cloud, due to the ease of integration and with predictive analytics and machine learning capabilities available in BigQuery, the first-party data analysis, audience building, and subsequent R-ETL to Ad Tech platforms for activation can happen via GCP by using out-of-the-box APIs or data transfer services. This will definitely help build synergies between the two data warehouses and provide selective utility depending on their advantages.
- As we know that the machine learning and predictive analytics capabilities is one of the biggest advantages one can get by using BigQuery for data integration and analysis, it is advised to use BigQuery to run data modeling use cases, even if Snowflake is an enterprise primary data warehouse.
- If your organization’s go-to analytics platform is Google Analytics and if the requirement is to integrate Analytics and media data, this can be achieved rather seamlessly in BigQuery without changing the baseline data architecture. Snowflake would require analytics data to be UNNESTED (flattened) before we could actually import the data in the platform to maintain the event sequence. Whereas if the integration and analysis is performed in BigQuery, we can still work with the NESTED data structure. This helps reduce a step in data cleansing and also helps maintain data more efficiently in the data warehouse.
- When it comes to query run-time optimizations and cost efficiency, Snowflake and BigQuery in tandem provide significant benefits. If your enterprise deals with large data sets but with a spiky workload, it would be ideal to choose BigQuery to optimize cost whereas if you are an enterprise and you have a steadier usage pattern when it comes to queries and data, you can opt for Snowflake as you get a chance to cram more queries in the same compute hours, it would be cost-effective to use Snowflake in this case.
Snowflake and BigQuery both offer decoupled storage and compute resources that can scale to meet all of your mission-critical workload requirements which makes the case stronger to run the Snowflake instance on GCP rather than any other cloud providers in the market. It empowers the organizations to not only utilize the GCP-Snowflake in-sync benefits, but also enables use cases which are better run in BigQuery.
To summarize when to use which data warehouse:
Use BigQuery when:
- You have very large data, but inconsistent spikes in workload (i.e. you run lots of queries occasionally, with high idle time). BigQuery will probably be cheaper and easier for you with its query-based pricing.
- You have many use cases that require Ad Tech (GMP) integrations with GCP coupled with data import. Since both the GMP and GCP suites come from the same parent company, there is a high probability of achieving integration synergies.
- You are planning to do some advance analysis that will require BigQuery’s Machine Learning capabilities.
Use Snowflake when:
- You have a steadier usage pattern when it comes to queries and data. It may be more cost effective to go with Snowflake, since you’ll be able to cram more queries into your compute hours.
- You are dealing with structured/process datasets. Though Snowflake supports semi-structured and unstructured data formats, BigQuery will be an easier pick to work on these data formats (specially when it is Analytics and Ad Tech data from GMP suite).