The modern data stack has matured rapidly over the last five years. What once required a team of six engineers and a year of integration work can now be assembled in weeks using cloud-native, off-the-shelf components. But more options also mean more ways to overcomplicate things: we routinely audit data stacks where 70% of the tooling is unused, unmaintained, or actively causing problems.
This guide walks through how we approach data stack design in 2026: which layers actually matter, when to add complexity, and the mistakes we see most often when companies build in-house.
The four layers that actually matter
Strip away the hype and almost every modern data stack reduces to the same four layers: ingestion, storage, transformation, and presentation. Pick one solid tool per layer before adding anything exotic, and you'll be ahead of 80% of the market.
1. Ingestion
This is how data gets from source systems into your warehouse. For most SaaS sources (Stripe, HubSpot, Salesforce, Google Ads), managed connectors from Fivetran or Airbyte will save you months. For niche sources or when cost becomes an issue at scale, lightweight Python jobs orchestrated by Airflow or Dagster work well.
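To make the "lightweight Python job" pattern concrete, here is a minimal sketch: pull records page by page from some paginated source and land them as newline-delimited JSON, a format every major warehouse can bulk-load. The function names and the injected `fetch_page` callable are our illustration, not any particular connector's API; in production this would run as an Airflow or Dagster task.

```python
import json
from typing import Callable, Iterator


def extract_pages(fetch_page: Callable[[int], list[dict]]) -> Iterator[dict]:
    """Pull records page by page until the source returns an empty page."""
    page = 1
    while True:
        records = fetch_page(page)
        if not records:
            break
        yield from records
        page += 1


def load_jsonl(records: Iterator[dict], path: str) -> None:
    """Land records as newline-delimited JSON for a warehouse COPY/LOAD step."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Because the fetch function is injected, the extraction logic is trivially testable without hitting a live API, which is most of what you want from a hand-rolled connector.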
2. Storage
Snowflake, BigQuery, and Redshift dominate. Choose primarily based on your existing cloud (BigQuery on GCP, Redshift on AWS, Snowflake anywhere), then secondarily based on workload patterns. Avoid lakehouses unless you have a clear reason — the operational complexity rarely pays off for sub-petabyte workloads.
3. Transformation
dbt is the de facto standard. It brings software engineering practices — version control, testing, documentation, modularity — to SQL. There is no good reason to build raw SQL pipelines without it in 2026.
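dbt's tests themselves are declared in YAML and run as SQL, but the discipline they enforce is easy to show. Purely as an illustration of what dbt's built-in `not_null` and `unique` checks assert (this is our plain-Python sketch, not dbt's implementation):

```python
def check_not_null(rows: list[dict], column: str) -> bool:
    """Pass when no row has a NULL (None) in the column -- what dbt's not_null test asserts."""
    return all(row.get(column) is not None for row in rows)


def check_unique(rows: list[dict], column: str) -> bool:
    """Pass when every value in the column appears exactly once -- what dbt's unique test asserts."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))
```

The point is that every model ships with machine-checked assumptions, so a broken upstream feed fails loudly in CI instead of silently corrupting a dashboard.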
4. Presentation
Looker, Power BI, and Tableau cover most enterprise needs. For smaller teams, Metabase and Lightdash offer 80% of the value at a fraction of the cost. Pick the one your business users will actually adopt — the most powerful BI tool is the one people log into.
When to add complexity
Streaming, reverse ETL, semantic layers, data catalogs, observability platforms — all valuable, none essential on day one. Add them when a concrete business problem demands them, not because a vendor told you to or because you saw a conference talk.
- Add streaming when batch latency genuinely costs money (fraud detection, real-time personalization).
- Add reverse ETL when you need to push enriched data back to operational tools (CRM, ad platforms).
- Add a semantic layer when you have multiple BI tools or your metric definitions keep drifting.
- Add a catalog when more than 20 people regularly query the warehouse.
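The four rules above can be sketched as a checklist function. The flag names and thresholds are ours (the 20-user cutoff comes from the list above; everything else is a judgment call, not a standard):

```python
def tools_to_consider(
    batch_latency_costs_money: bool,
    needs_data_in_operational_tools: bool,
    bi_tool_count: int,
    metric_definitions_drift: bool,
    regular_warehouse_users: int,
) -> list[str]:
    """Map concrete business signals to the add-on layers they justify."""
    tools = []
    if batch_latency_costs_money:
        tools.append("streaming")
    if needs_data_in_operational_tools:
        tools.append("reverse ETL")
    if bi_tool_count > 1 or metric_definitions_drift:
        tools.append("semantic layer")
    if regular_warehouse_users > 20:
        tools.append("data catalog")
    return tools
```

For a typical seed-stage company, every flag is false and the function returns an empty list, which is exactly the point.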
"The best data stack is the smallest one that answers your top 10 business questions reliably."
Common pitfalls we see in audits
Three mistakes recur across nearly every stack we audit. First, choosing a warehouse based on vendor marketing rather than actual workload patterns. Second, modeling data before agreeing on KPIs, which guarantees rework once executives change their minds. Third, skipping documentation until it's too late, leaving the next analyst to reverse-engineer years of business logic.
Each of these mistakes costs months of rework and erodes business trust in the data team. Avoiding them is mostly a discipline problem, not a technology problem.
Where to start in 2026
If you're starting fresh, our default recommendation is: Fivetran or Airbyte for ingestion, Snowflake or BigQuery for storage, dbt for transformation, and Metabase or Looker for presentation. This stack handles companies from seed-stage through Series C without breaking a sweat, and the talent pool to maintain it is large and growing.
Keep it simple, document as you go, and resist the urge to add tools until you've measurably outgrown the basics.
Want this applied to your business?
Book a free 30-minute consultation and we'll discuss how these ideas map to your data and goals.
Book a Free Consultation