Have you ever wondered how big companies keep track of all their data and then actually use it to make decisions? Imagine owning a library of thousands of books with no shelves or labels, and needing to know how many stories have a protagonist named "Matthew." That would be very time-consuming to answer, right? Large companies face a similar challenge with their data. That's where a data warehouse and analytical software come in! You can read more about how data improves outcomes for growing businesses here. But before you get started, you need to know: what is the true cost of building a modern data infrastructure in-house?
The Parts of a Modern Data Infrastructure
A complete data infrastructure covers the full journey of data from its raw state to the insight that eventually drives business impact. This includes the transformation pipeline that moves data from its source to the target data warehouse, the data warehouse itself, which houses all the data and lets users query it for analysis, and finally the visualization software that automates recurring reports via dashboards.
What Makes Up the Cost of a Data Warehouse?
Each of the three components outlined above usually lives in different software. Additionally, they require a team of data architects and engineers to tie all the pieces together and provide ongoing maintenance and support, which makes the data team a fourth cost component. The cost of each depends significantly on which software you choose and how much data is involved. A typical all-in cost for a mid-market company ranges from $25k to $500k per year for the software and related resources, plus an additional $400k to $800k per year for the salaries and benefits of a professional data team to support and maintain the solution.
That cost may seem high at first, and it is. Let's break it down to get a truer picture of the cost for a company like yours.
1) Transformation Pipeline:
Transforming raw data from its original source (e.g., accounting software) into cleaned data in the target data warehouse happens through a transformation pipeline, often referred to as ETL (Extract, Transform, Load).
Data sources can include business software (e.g., QuickBooks or other specialized accounting software), databases (e.g., PostgreSQL or MySQL), Google Sheets, CSV uploads and even unstructured text data (PDFs, Word documents, Notion pages, etc.). See here for an ever-growing list of data sources that Go Fig supports.
Go Fig offers a No-Code Workflow builder with plug-and-play functions to clean and transform data into a format suitable for analysis and reporting. Datasets that are not clean can be dangerous: their errors lead to misguided decisions that create risk for the business. According to Gartner, such bad data cost companies an average of $15 million in lost revenue in 2017, significantly more than the cost of building a robust data infrastructure.
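To make the extract, transform and load steps concrete, here is a minimal sketch in Python. It is not how Go Fig or any particular ETL product works internally. The invoice data, the column names and the `invoices` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real source would be a file, database, or API.
RAW_CSV = """invoice_id,customer,amount
1001, Acme Co ,250.00
1002,acme co,
1003,Bolt Tires,1200.50
1003,Bolt Tires,1200.50
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, drop rows with no amount, remove duplicate invoices."""
    cleaned, seen = [], set()
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        invoice_id = int(row["invoice_id"])
        if invoice_id in seen:
            continue  # skip duplicate invoice ids
        seen.add(invoice_id)
        cleaned.append((invoice_id, row["customer"].strip().title(), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT * FROM invoices ORDER BY invoice_id").fetchall()
print(rows)
```

Of the four raw rows, only two survive: the row with a missing amount and the duplicate invoice are dropped before anything reaches the target table.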
All-in Cost: from $5k to over $50k per year, depending on the plan you choose and the volume of jobs. Many ETL solutions offer a variable pricing model that starts with a free or low introductory rate but scales up rapidly as usage grows. This can leave you locked in at elevated prices that jump unpredictably in any month with particularly heavy usage.
To keep costs lower and predictable month to month, you may want to consider an ETL solution with a fixed pricing model that keeps rates consistent and only increases when you choose a higher-tiered plan.
Go Fig, for example, charges a fixed monthly or annual rate for unlimited workload on any of its tiered plans; the rate only increases if you exceed the plan's storage limit. You know exactly how much the plan will cost, and you will have months to plan for a price increase, if one comes at all.
2) Data Warehouse:
A data warehouse is like a giant, super-organized library for a company's information. It takes transformed data from lots of places, like websites, sales records, and customer lists, and stores it all in one location. This is also where people in the company go to access data for creating reports and analyzing company performance.
Companies can store data on their own servers (called on-premises) or in an external cloud-based solution. The right choice between on-premises and cloud depends wholly on the company. On-premises storage requires physical space, an upfront investment of time and resources, and ongoing maintenance, but it also gives the company full control of its data.
All-in Cost:
- Cloud Storage: $400 per terabyte per year. The average midmarket company has about 40-80 terabytes of data storage, which would total $16k-$32k per year.
- On-Premises Storage: Can cost up to $30k for initial purchase and installation and up to $10k per year afterwards for maintenance.
Starting out, a company will likely find a cloud-based data warehouse more cost-efficient. Once its data infrastructure matures, it could migrate to its own servers to obtain further cost savings.
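The trade-off between the two options can be sketched with the figures above. This rough calculation assumes the upper-bound on-premises numbers ($30k upfront, $10k per year) and assumes that on-premises cost does not grow with data volume, which is a simplification. The function names are illustrative only.

```python
CLOUD_PER_TB_YEAR = 400      # cloud storage cost per terabyte per year
ONPREM_UPFRONT = 30_000      # upper-bound purchase and installation cost
ONPREM_MAINTENANCE = 10_000  # upper-bound maintenance cost per year

def cloud_cost(terabytes, years):
    """Cumulative cloud storage cost after a number of years."""
    return terabytes * CLOUD_PER_TB_YEAR * years

def onprem_cost(years):
    """Cumulative on-premises cost (assumed independent of data volume)."""
    return ONPREM_UPFRONT + ONPREM_MAINTENANCE * years

def break_even_year(terabytes, horizon=20):
    """First year in which on-premises is no more expensive than cloud, else None."""
    for year in range(1, horizon + 1):
        if onprem_cost(year) <= cloud_cost(terabytes, year):
            return year
    return None

for tb in (2.5, 40, 80):
    print(f"{tb} TB: break-even year = {break_even_year(tb)}")
```

Under these assumptions, a company with 2.5 TB never breaks even within 20 years, one with 40 TB breaks even in year 5, and one with 80 TB in year 2. That is consistent with starting in the cloud and reconsidering as data volume grows.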
Data warehousing solutions offer a platform for analysts to write SQL queries that pull the data they need for specific analyses, such as explaining the main drivers of recent sales trends or investigating an opportunity to improve marketing outcomes. This work has typically been reserved for professional data analysts with a strong understanding of relational databases and technical experience writing code to get accurate data. With the advent of LLMs and AI, it has become possible for others to query accurate data to answer such questions as well.
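For readers who have not seen one, here is the kind of query an analyst might write to look at sales drivers. It is only an illustration: the `sales` table and its rows are hypothetical sample data, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
conn.execute("CREATE TABLE sales (month TEXT, channel TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows
        ("2024-01", "online", 12000.0),
        ("2024-01", "retail", 8000.0),
        ("2024-02", "online", 15000.0),
        ("2024-02", "retail", 7000.0),
    ],
)

# Total revenue per sales channel, largest first
query = """
    SELECT channel, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY channel
    ORDER BY total_revenue DESC
"""
revenue_by_channel = conn.execute(query).fetchall()
for channel, total in revenue_by_channel:
    print(f"{channel}: ${total:,.0f}")
```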
Go Fig is an example of a company that is leveraging LLMs like ChatGPT to equip C-Suite leaders and frontline employees alike with this analytics capability. Our proprietary Harvest-1 foundational model is built to understand the intent of each individual user and translate requests into a simple, understandable No-Code Workflow and Fig that can be validated or modified further.
3) Dashboard and Data Visualization Software
Completing the data lifecycle are dashboards: visual representations of data prepared in a format that people can easily digest, so they can make informed decisions that drive the business forward. The data visualization software that prepares these dashboards is often referred to as Business Intelligence, or BI, tools.
Without dashboarding software, data that lives in a data warehouse can be pretty meaningless, so this software is a critical piece of the puzzle for getting value from your investment in data. Analysis that depends on a person manually querying data and updating a static Excel spreadsheet, valuable as it may be, is slow and time-consuming. With dashboards, any report or data manipulation you would perform in Excel can be available in a dashboard and refreshed every time the underlying data warehouse is updated. Imagine saving the 2 hours your intern spends every Monday manually updating your weekly sales report, and having that same report updated every day of the week.
Beyond visualization, more advanced BI tools offer features to proactively monitor data and send alerts. For example, Go Fig can send an alert when sales by 12pm on a particular day are below the acceptable threshold, signaling that there could be a severe issue with the sales team that requires your immediate attention. Creating an alert is simple: determine thresholds for a pre-defined metric, and we'll check it every time new data is pulled in!
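The logic behind a threshold alert like this can be sketched in a few lines. This is a generic illustration, not Go Fig's implementation. The `sales_alert` function, its parameters and the example numbers are all hypothetical.

```python
from datetime import datetime, time

def sales_alert(sales_so_far, threshold, now, cutoff=time(12, 0)):
    """Return an alert message if sales are below the threshold at or after the cutoff."""
    if now.time() >= cutoff and sales_so_far < threshold:
        return (
            f"ALERT: sales of ${sales_so_far:,.0f} are below "
            f"the ${threshold:,.0f} threshold"
        )
    return None  # no alert: too early in the day, or sales are on track

# Checked each time new data is pulled in; here, at noon with $4,200 in sales
message = sales_alert(4200, 5000, datetime(2024, 3, 4, 12, 0))
print(message)
```

The check returns nothing before the cutoff time or when sales meet the threshold, so an alert only fires when both conditions are met.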
All-in Cost: the average business intelligence solution costs $3k per year, but more advanced solutions can cost upwards of $10k per year.
4) Data Team
As highlighted above, the cost of bad data is higher than the cost to build a robust data infrastructure. Building a team of qualified and experienced data and software professionals is critical to accomplish this goal. There are four main roles that are typically required to set up and maintain such a data infrastructure:
- Information Systems Manager oversees a company's technology needs and manages the data team
- Backend Developer constructs and maintains the data pipelines and backend systems
- Database Architect designs the structure of datasets and sets up ETL processes to make clean data accessible to the team
- Data Analyst ties in business context with the stored data to identify insights and propose recommendations to improve business outcomes
All-in Cost: the average salary for each of these roles exceeds $100k per year. Midmarket companies will require at least one person in each role, for a minimum of $400k per year that can reach up to $800k.
That being said, some of the more advanced software solutions can simplify the setup and maintenance of data infrastructure. Go Fig, for example, manages data storage for you, offers a simple, drag-and-drop ETL solution with a user-friendly visual interface, AI-powered analytics and dashboarding, as well as managed service add-ons. Choosing Go Fig as your all-in-one solution could potentially save you a lot on both software and staffing.
Examples of Companies and Their Data Warehouse Costs
Let's look at some hypothetical companies to see how much a data warehouse might cost them.
Example 1: Road Runner USA
- Industry: Custom Tire Sales
- Number of Employees: 50
- Total Revenue: $5 million per year
- Amount of Data Storage: 2.5 terabytes
Costs:
- ETL Software: A fixed pricing plan with a moderate volume of jobs at $9k per year
- Data Warehouse: 2.5 TB x $400 per TB = $1k per year
- Visualization Software: Middle-of-the-road solution at $3k per year
Software Cost: $9k (ETL) + $1k (data warehouse) + $3k (Visualization) = $13k per year
Total Cost: $13k per year
Estimated cost using Go Fig's Self-Service Premium plan: $4,500 per year (savings of $8,500!)
Example 2: Accounting Done Right
- Industry: Professional B2B Accounting Firm
- Number of Employees: 200
- Total Revenue: $50 million per year
- Amount of Data Storage: 50 terabytes
Costs:
- ETL Software: A variable pricing plan with a high volume of jobs at $25k per year
- Data Warehouse: 50 TB x $400 per TB = $20k per year
- Visualization Software: Advanced solution at $15k per year
Software Cost: $25k (ETL) + $20k (data warehouse) + $15k (Visualization) = $60k per year
Staffing Cost: four full-time data professionals at a combined $500k per year
Total Cost: $560k per year
Estimated cost using Go Fig's Enterprise plan with fractional data services: $285k per year (savings of $275k!)
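The arithmetic in both examples can be reproduced with a small helper, which also makes it easy to plug in your own numbers. This is a sketch only. The `annual_cost` function is hypothetical, the $400 per terabyte rate is the cloud figure quoted earlier, and Example 2 uses the 50 TB from its data warehouse cost line.

```python
CLOUD_PER_TB_YEAR = 400  # cloud storage cost per terabyte per year

def annual_cost(etl, terabytes, visualization, staffing=0):
    """Break down yearly cost into warehouse, software, and total."""
    warehouse = terabytes * CLOUD_PER_TB_YEAR
    software = etl + warehouse + visualization
    return {"warehouse": warehouse, "software": software, "total": software + staffing}

# Example 1: Road Runner USA (no dedicated data team)
road_runner = annual_cost(etl=9_000, terabytes=2.5, visualization=3_000)
# Example 2: Accounting Done Right (four data professionals)
accounting = annual_cost(etl=25_000, terabytes=50, visualization=15_000, staffing=500_000)

# Savings against the Go Fig plan prices quoted in the examples
road_runner_savings = road_runner["total"] - 4_500
accounting_savings = accounting["total"] - 285_000

print(road_runner, road_runner_savings)
print(accounting, accounting_savings)
```

The totals come out to $13k and $560k per year, with savings of $8,500 and $275k against the quoted Go Fig plans.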
Why Do Companies Invest in Data Warehouses?
As we can see above, the all-in cost of building a robust data solution can be steep, starting around $10k per year for storage and software alone and easily exceeding $500k for larger companies that need to hire a professional data team for more complex solutions. Companies choose to invest in data solutions, however, because the return is significantly higher than the cost:
- Make Better Decisions: By organizing data, companies can see what's selling well or what needs improvement.
- Save Time: Finding information quickly means employees can work more efficiently.
- Stay Competitive: Companies that understand their data can better set long-term strategies to be leaders in their industry.
- Lower Risk: Understanding what's going on in the business allows leaders to quickly respond to gaps that are costing short-term losses and creating long-term risk.
Why You Should Choose an All-in-One Solution
Data infrastructure does not need to be as complex as it once was. Go Fig allows companies to bring their data warehouse, ETL and visualization solutions together in one place. Go Fig is a powerful platform that is simple to use, yet offers customizations you cannot find in other platforms. It is tailored to you and your company, so you don't need to be overwhelmed by clutter.
It turns out that by making it extremely simple for C-Suite leaders to centralize and access their own data, you can significantly reduce the cost of building and maintaining your own data infrastructure. Go figure! Schedule a demo today to see how Go Fig can work for your unique business needs and objectives.