Snowflake vs. BigQuery: What’s the Difference?

Estimated Reading Time: 9 minutes
April 24, 2023
Snowflake vs. BigQuery: What’s the Difference?

In this article, we will articulate the differences between Snowflake and BigQuery. The goal is not to say which platform is better, but rather to highlight their differences and expose what tasks they might be suited for individually. Both platforms offer a lot of benefits, and they are more unique than they are similar.

Before we dive in, let’s review some important definitions:

  • Data Lake – A place for all unstructured data. In layman’s terms, it’s where we put all of our data regardless of form or function. We dump everything into the lake; then we can decide what to do with it later.
  • Data Warehouse – A place for processed, structured data. In layman’s terms, this is where we put data that has been cleaned and processed for a purpose. These tables are clearly defined with a clear defined reason for existing.
  • Space/Memory – Databases storing data that has to be physically somewhere. Cloud storage is stored just like data is stored on the harddrive on your computer—and just like your harddrive on your laptop, that storage is limited.
  • Processing/Processor – In order to operate a database, we need to process the data. Just like reading data off your own laptop’s harddrive, you need a processor to read data. When we build databases we have to be sure there are enough processors to support the load of reading the data.

Now let’s explore what exactly BigQuery and Snowflake are.

Infrastructure

  • BigQuery is a serverless database. Serverless is a funny term, because nothing can really be “serverless” in that servers are computers and we need computers to run our software. What serverless means is it is not run by any single computer. It’s not one hard-drive, or not one processor, but instead a whole suite of computers whose job is to keep BigQuery running as a platform, like a SaaS product. When a non-technical person reads serverless what you should hear is scaling without the need for IT to deal with it. Ultimately this translates to less upkeep, no need to provision more space, or add more processors for better computing time of queries. It’s a fully managed scaling system. We give it data, and it does its job, that simple.
  • Snowflake is a traditional data service. Unlike BigQuery’s serverless structure, we need to tell Snowflake how much data we need to store and how many processors our teams need to do their job. If we run out of room or need more processing power, someone will have to configure the system to solve those issues. While this is more work, it more closely resembles the environment your IT team would be used to if they have been managing databases in the past. Oftentimes this adds to an easier transition of cloud data ownership.

Cost

  • Snowflake is priced as a traditional data service. Snowflake charges based on the amount of space provisioned and the amount of processors you have checked out at a given time. You can expect your data to continue to cost a flat fee month to month and as it grows you can expect that data to steadily continue.
  • BigQuery is priced uniquely. BigQuery pricing can be a bit complicated to explain. You are charged for two things when storing data in BigQuery. You pay a flat fee per TB stored per month, and you pay a fee for each TB of data processed. These numbers are very low, like fractions of cents per TB processed and approximately $5 per TB stored. Another caveat is if you do not change a table it will actually become cheaper to store over time. 
  • Clever BigQuery partitioning. This means if you architect your tables correctly you can partition them daily so that they never change and therefore they always get cheaper and cheaper to store over time. Partitioning is when we store a fraction of a certain table by itself (for example, our users table). Each day we separate a new users table and store it by itself. We can query these separate tables as one big table and treat them as a single database, but we can also query them individually if we want to see users on a specific day. What this does is keep us from changing the daily tables so they become cheaper to store over time; it limits how much data we process (cost saving) when we only look at tables in a date range, and limits damage if something were to go wrong with a single daily table.

Processing

  • Snowflake processing is limited to how many processors you’re using. Snowflake as a platform is very efficient, and at large scale can even edge out BigQuery on performance. This larger scale though means you are renting a lot of processors, which can be expensive. However, Snowflake is the better platform if you are ever going to make any of this data and its queries public facing, such as in an application. Unlike BigQuery, Snowflake can get sub-second response times which you need for using data on the frontend. This means if you need to process data very quickly you have the option to add enough processors to process and respond to any systems built on top of Snowflake.
  • BigQuery is a serverless platform, but with some limits. As we have covered, BigQuery is serverless. This means no matter how large the query, we don’t ever have to think about processors, but remember you do pay per TB processed. For low-mid sized queries you can expect very high performance (1-3 second response time). However, if the queries get very large you can wait as long as 30 seconds on a query response. This kind of wait is fine if you’re just doing some analysis, but is unacceptable if we are providing data to an analysis app that needs quicker user response time. The only limitation of BigQuery’s serverless environment is how many queries can be running at the same time. By default, only 50 concurrent queries can be running, but this number can be raised.

Unique Platform Value Props

  • BigQuery has built-in modeling capabilities. The value of this is massive. In a normal modeling pipeline you need to build data, train a model, store the model in a format to be used later, and then apply the model over time. This whole pipeline has code and systems, and all that can be skipped by just training with one line in BigQuery.
  • BigQuery tables can be visualized in Data Studio with one-button click.
  • BigQuery has a suite of built-in ETL tools for pulling data from GMP platforms and other uniform systems, which are free to use.
  • BigQuery now is able to sit on top of other cloud platform data systems such as Amazon’s S3.
  • Snowflake is on most major cloud providers.
  • Snowpark is a platform feature that allows data scientists and engineers to manage their code in Snowflake for development against the datasets.

The Positives

Okay, now that we are familiar with the platforms let’s talk about what these two platforms are good at. While it’s possible to use both as a data warehouse and a data lake, I think the two are more uniquely fit for one of each.

We have outlined that BigQuery is very good at storing lots of data very cheaply without needing to manage overflow because we can’t run out of space. This sounds like a perfect place to dump our data into a data lake. Since BigQuery has some built-in ETL and no platform cost, the barrier to entry to begin sending all of our data to BigQuery is quite low. On top of this, we know BigQuery cost per query, so if we were to take our data lake and process it into a data warehouse we would only be paying for those queries one time.

On the other hand, Snowflake is a bit more rigid in definition, is more expensive at scale, and performs better with query response time. This makes perfect sense for a data warehouse where we will be actively modeling, visualizing, and interacting with a data farm more often. Rather than throwing all of our data into this structure which would be taxing on our data engineering team, we can use BigQuery as the lake and process data into Snowflake to serve as our warehouse. This keeps volumes low where volume is expensive, and processing low where processing is expensive.

Conclusion

If you are using either BigQuery or Snowflake as both a data lake and warehouse right now you are not wrong. Either of these tools is perfectly suitable for both jobs and there are benefits and drawbacks for all scenarios. However, using the two in tandem often leads to the lowest cost and highest benefits. Often I hear this as BigQuery vs. Snowflake, but it’s important in tech to understand what the software is doing, where it’s used, and how it can benefit the current stack. I don’t see these platforms as competing as much as I see them as compliments of one and other. When approaching cloud platforms, it should always be under the lens of, ‘how do I get the most functionality, for the least cost, and the least amount of support needed?’ By being clever about our cloud choices we can make sure we are setting the business up for long-term success.

Do you have cloud data warehouse needs?

Our team of data analyst experts is here to help whenever you need us!

Author

  • Tyler Blatt

    Tyler Blatt is the Manager of Emerging Solution Strategy at InfoTrust. Using his 10 years of experience in martech, he assists clients in constructing reliable frameworks to navigate the ever-changing technical environment resulting from privacy and technology advancements in the fields of marketing and analytics. In his free time, he can be found wrestling with his new baby girl, streaming on Twitch, or trying to improve his golf game.

    View all posts
Last Updated: April 24, 2023

Get Your Assessment

Thank you! We will be in touch with your results soon.
{{ field.placeholder }}
{{ option.name }}

Talk To Us

Talk To Us

Receive Book Updates

Fill out this form to receive email announcements about Crawl, Walk, Run: Advancing Analytics Maturity with Google Marketing Platform. This includes pre-sale dates, official publishing dates, and more.

Search InfoTrust

Leave Us A Review

Leave a review and let us know how we’re doing. Only actual clients, please.