Marketing teams do not lack data. They lack shape, timing, and trust. The most profitable campaigns we have ever managed were not the ones with the flashiest creatives or the largest budgets. They were the ones where the data showed up clean, on time, and tied back to the customer and the dollar. That is the essence of data engineering for marketers at (un)Common Logic. It is not a tool stack flex or a one time report build. It is an operating discipline that turns messy platform exhaust into decisions you can take at 9 a.m. and measure by 3 p.m.
What marketers actually need from data
Most teams ask for dashboards. What they need are decisions. Decisions live on timelines that vary widely. A brand team wants weekly pacing against a quarterly plan. A search specialist wants to know by lunchtime if a keyword is cannibalizing margin. A CFO wants to see the shape of payback over six months. The data has to be engineered to fit those timelines; otherwise everyone is working uphill.
At (un)Common Logic, we plan the data around the questions, not the other way around. Here are a few we anchor to:
- Which audiences and channels drive profitable incremental conversions, not just attributed ones?
- Where do we have diminishing returns right now, within the day and across the quarter?
- What steps in the funnel are failing, and are those failures due to media, site experience, or sales follow up?
- How confident are we in the data feeding these answers, and what happens to the answer if the data is off by 5 percent?
We find that once answers to these questions are embedded in a stable data workflow, everything else begins to self correct. Budgets move faster. Testing gains statistical power. Creative gets sharper.
Agency reality, warts and all
Working across dozens of clients, you see the same patterns. Pixels get turned off by a tag manager publish. UTM parameters are inconsistently cased, which fractures campaigns into dozens of fake variants. A CRM lead status changes names mid quarter after a sales ops cleanup, and suddenly lead to sale conversion rates look like they fell off a cliff. None of these failures are unique, and yet any one of them can punch a hole in a P&L.
Data engineering for marketing inside an agency like (un)Common Logic has to absorb these shocks. It has to assume systems will change names and IDs without warning, that cookies will expire faster than you planned, and that the most important dataset is usually the one no one prioritized for access. So we design for change. We prefer schemas over free form fields, versioned rules over ad hoc fixes, and a healthy suspicion of any number that looks too clean on the first pass.
From ad platform clicks to CFO truth
Everyone loves to diagram a pipeline. The reality is always messier, but the spine is consistent:
- Collection. We use managed connectors where it helps with speed and maintenance, and we write custom pulls where platforms are fragile or fast changing. If a client relies on a niche call tracking system, we are not waiting for a connector roadmap to catch up. We will build a small, testable ingestion job that pulls what matters and nothing more.
- Storage. Centralized warehouses win for long term cost and governance. BigQuery and Snowflake are our typical landing zones. We size them based on query patterns, and we encourage clients to prune raw ingestion after 12 to 18 months unless compliance dictates otherwise.
- Modeling. This is the heart. We reshape raw log tables into human scale models with business definitions, not platform definitions. For example, “qualified lead” becomes a modeled state that flows consistently from CRM to paid media, with a lock tight definition controlled in a single transformation.
- Activation. Data is not done at the dashboard. Winning teams push it back into platforms. Propensity scores, product availability, or audience suppressions belong inside the ad platforms, the email service provider, and the call center cadence tooling.
The best test that a model works is whether the media buyer can act on it in the same hour they read it. That requires latency targets that are realistic and tailored. For search bidding and rapid creative testing, we aim for end to end latency under 15 minutes. For daily pacing and LTV recalculations, overnight is more than enough. For executive views, weekly rollups reduce noise and make the story clearer.
Identity is a strategy choice, not a toggle
Identity resolution drives attribution quality and the ability to suppress waste. But it also drives risk if you get it wrong. We separate identity into three layers.
First, consented customer identity inside owned systems. CRM, commerce, and support tools sit here. This is where email addresses and phone numbers live. The resolution work is deterministic, based on keys you control, and you can hold it to a high standard.
Second, site and app identity. You will work with cookies, device IDs, and server side tracking. This is probabilistic more often than not. We focus on event integrity, consistent event names, and a small set of durable IDs that survive platform shifts. Server side tagging can help, but only if it respects consent.

Third, media identity. Google, Meta, and retail media networks all operate their own graphs. Your job is not to knit them into a mythical single person view. Your job is to connect their identifiers back to your modeled funnel states, so that you can optimize spend across them. That means mapping metadata like campaign, ad group, and creative to a canonical taxonomy, then keeping those mappings current as people change naming conventions in the platforms.
A common mistake is to chase universal identity and stall the program. We aim for useful identity. If we can link 60 to 70 percent of on site events to a durable session or user key and 90 percent of back office revenue to a customer key, we can make high quality, budget moving decisions.
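The "useful identity" bar can be checked with a few lines of code rather than a quarter of identity work. A minimal sketch, with illustrative key names (`user_key`, `customer_key`) and made-up rows:

```python
# Sketch: measuring "useful identity" coverage rather than chasing a
# universal person graph. Keys and rows below are illustrative.

def match_rate(rows, key):
    """Share of rows carrying a non-null durable identifier."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(key) is not None) / len(rows)

site_events = [{"user_key": "u1"}, {"user_key": "u2"}, {"user_key": None},
               {"user_key": "u1"}, {"user_key": None}]
revenue_rows = [{"customer_key": "c1"}, {"customer_key": "c2"},
                {"customer_key": "c3"}, {"customer_key": None}]

event_coverage = match_rate(site_events, "user_key")        # 0.6
revenue_coverage = match_rate(revenue_rows, "customer_key")  # 0.75

# Gate from the text: roughly 60 to 70 percent event linkage and 90
# percent revenue linkage is enough identity to move budget.
ready = event_coverage >= 0.6 and revenue_coverage >= 0.9
```

In this toy data the revenue side falls short of the 90 percent bar, so the gate stays closed and the identity work continues before the decisions do.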
Attribution, incrementality, and the temptation to overfit
Attribution models are like diets. The one you follow consistently is better than the perfect one you abandon. We run three tracks in parallel.
Track one: platform attribution for intra platform optimization. Let Google Ads use its view of touchpoints to set bids within Google. This drives day to day tactics. We monitor it but rarely fight it for small moves.
Track two: modeled attribution at the warehouse level. Here we create channel and campaign level credit using a few canonical options, with definitions that survive quarter to quarter. For many clients, a time decay variant plus position based credit, evaluated side by side, gives enough signal to choose between investments. The key is not which algorithm you pick but that you fix the business rules around things like direct traffic and brand search, then apply them consistently.
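The two canonical credit rules are small enough to write out. A minimal sketch; the seven day half life and the 40/40/20 position split are common illustrative defaults, not fixed (un)Common Logic parameters:

```python
# Time decay and position based credit over an ordered list of touchpoints.
# Half life and the 40/40/20 split are illustrative defaults.

def time_decay_credit(days_before_conversion, half_life_days=7.0):
    """Weight each touch by 0.5 ** (age / half_life), then normalize to 1."""
    weights = [0.5 ** (d / half_life_days) for d in days_before_conversion]
    total = sum(weights)
    return [w / total for w in weights]

def position_based_credit(n_touches, first=0.4, last=0.4):
    """First and last touch get 40% each; middle touches split the remaining 20%."""
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        return [0.5, 0.5]
    middle = (1.0 - first - last) / (n_touches - 2)
    return [first] + [middle] * (n_touches - 2) + [last]

# Four touches: 10, 5, 2, and 0 days before conversion.
print(time_decay_credit([10, 5, 2, 0]))
print(position_based_credit(4))  # roughly [0.4, 0.1, 0.1, 0.4]
```

Running both rules side by side over the same touchpoint table, as described above, is what lets you compare investments without arguing about the algorithm.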
Track three: incrementality tests. Holdouts, geo splits, or auction time experiments answer the question attribution cannot. Did this spend create net new conversions or just rearrange credit? We build infrastructure that makes these tests easy to run and measure. Labels in the platforms, prebuilt variance calculators, and clean ways to tag audiences or geos reduce friction. We do not run these every week, but we run them often enough to re anchor the model when the market shifts.
An edge case worth noting is products with long sales cycles. If time to revenue is 90 days, daily budget decisions can drift. We mitigate with leading indicators that correlate with future revenue, then back test frequently. Conversion to qualified opportunity might show a 0.7 correlation with revenue in the first three months. That is enough to move spend while we wait for the slower signal to confirm.
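The back test itself is nothing exotic: a correlation between the leading indicator and revenue observed a quarter later. A minimal sketch with made-up weekly numbers; the 0.7 threshold mirrors the example above:

```python
# Back testing a leading indicator against lagged revenue.
# Weekly qualified opportunity counts vs. revenue ~90 days later.
# All numbers are illustrative.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

weekly_qual_opps = [12, 15, 9, 20, 17, 11, 22, 14]
revenue_90d_later = [60, 80, 50, 95, 85, 58, 101, 70]  # in thousands

r = pearson(weekly_qual_opps, revenue_90d_later)
# Move budget on the leading indicator only while the back test holds.
trustworthy = r >= 0.7
```

Re-running this check frequently, as the text suggests, is what catches the moment the proxy decouples from the revenue it was standing in for.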
Modeling that marketers can read without a decoder ring
We build simple, predictable layers. The jargon is less important than the idea that analysts and buyers know where to find things, and that measures do not change under their feet. A typical core includes:
- A calendar table with fiscal periods, holidays, and campaign phases. You would be surprised how often a Black Friday sale breaks a report because the calendar was naive.
- A channel taxonomy with business friendly names and strict mapping rules. If “Paid Social” becomes “Meta” in a platform update, our taxonomy catches and maps it before it pollutes the model.
- A funnel table that starts at the first touch we can trust and ends at revenue recognized, with states like site visit, engaged session, lead, opportunity, customer, and repeat purchase. Each state has a timestamp, a source, and a confidence score if the upstream data is probabilistic.
- A spend and impression fact table with harmonized currency, time zones, and platform metadata. Here we standardize cost to a single currency, map time to the brand’s operating time zone, and pin any audience or creative tags that will shape optimization later.
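The "strict mapping rules" idea is easiest to see in code. A minimal sketch of a channel taxonomy that refuses to pass through names it does not know; the table contents are illustrative, not our production mapping:

```python
# A strict channel taxonomy: platform names map to stable business names,
# and anything unmapped fails loudly instead of polluting the model.
# Entries are illustrative.

CHANNEL_TAXONOMY = {
    "paid social": "Paid Social",
    "meta": "Paid Social",        # platform renamed; business name is stable
    "facebook ads": "Paid Social",
    "google ads": "Paid Search",
    "bing ads": "Paid Search",
}

def map_channel(platform_name: str) -> str:
    key = platform_name.strip().lower()
    if key not in CHANNEL_TAXONOMY:
        raise ValueError(f"Unmapped channel source: {platform_name!r}")
    return CHANNEL_TAXONOMY[key]

print(map_channel("Meta"))        # Paid Social
print(map_channel("Google Ads"))  # Paid Search
```

The design choice is the raise: a new platform name stops the build and gets a deliberate mapping entry, rather than silently becoming a new channel in next Monday's report.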
Marketers get nervous when schemas stretch to dozens of wide tables with cryptic names. We prefer a small number of opinionated models with clear documentation and lineage. If a buyer can open a single spend table and a single funnel table, then answer 80 percent of their weekly questions, we have done the job.
Quality, observability, and the cost of bad joins
The fastest way to lose credibility with a CFO is to present numbers that bounce. Observability is not an add on, it is part of the build. We track four categories.
Freshness. Data has a target arrival time. If Google Ads has not landed by 8 a.m., the morning pacing report auto flags it. We do not rely on Slack alarms alone. Dashboards display data currency directly on the page, which prevents stale decisions.
Completeness. Rows and columns should show expected ranges. If a platform reports spend every day, a zero on a weekday is suspicious. We keep expected row counts and null tolerances per source, and we flag when they slip.
Validity. Business rules enforce sanity. Cost must be non negative. Clicks cannot exceed impressions. Dates do not live in the future. These are simple tests that catch complex failures.
Consistency. Measures across tables should reconcile. Channel level spend should equal the sum of campaign level spend within a small tolerance. Revenue in the warehouse should match finance rollups at month end, accounting for timing differences.
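The validity and consistency rules above are deliberately simple tests. A minimal sketch of row level checks, with illustrative field names and a pinned reference date:

```python
# Simple validity and consistency checks of the kind described above.
# Field names and the reference date are illustrative.
from datetime import date

def validity_errors(row, today=date(2024, 6, 1)):
    errors = []
    if row["cost"] < 0:
        errors.append("cost must be non negative")
    if row["clicks"] > row["impressions"]:
        errors.append("clicks cannot exceed impressions")
    if row["date"] > today:
        errors.append("dates do not live in the future")
    return errors

def spend_reconciles(channel_spend, campaign_rows, tolerance=0.01):
    """Channel level spend should equal summed campaign spend within tolerance."""
    return abs(channel_spend - sum(campaign_rows)) <= tolerance * channel_spend

bad_row = {"cost": -5.0, "clicks": 120, "impressions": 100,
           "date": date(2025, 1, 1)}
print(validity_errors(bad_row))  # all three rules fire
print(spend_reconciles(1000.0, [400.0, 399.5, 200.0]))  # True within 1%
```

Simple tests, complex failures caught: any nonempty error list or failed reconciliation blocks the refresh before the number reaches a decision maker.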
The cost of bad joins is not academic. We saw a client’s cost per qualified lead spike by 40 percent after a CRM admin introduced new lead sources that overlapped with old ones. The join keys still worked, but the funnel state logic now double counted and mismatched. The fix was not heroic. We introduced a controlled mapping table for lead sources, versioned it in the model, and set a test that fails the build if a new source appears without a mapping entry. The spike disappeared, and the root cause was documented for the next admin.
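The fix from that story fits in a dozen lines. A minimal sketch of a controlled mapping table plus the build blocking test; the source names are illustrative:

```python
# A versioned lead source mapping and a test that fails the build when
# the CRM reports a source the mapping does not know. Entries are
# illustrative.

LEAD_SOURCE_MAPPING = {
    "Web Form": "inbound",
    "Webform": "inbound",        # legacy spelling mapped, not double counted
    "Outbound Call": "outbound",
    "Partner Referral": "partner",
}

def assert_all_sources_mapped(crm_sources):
    unmapped = sorted(set(crm_sources) - set(LEAD_SOURCE_MAPPING))
    if unmapped:
        # Stops the refresh before a double count reaches the dashboard.
        raise AssertionError(f"New lead sources need mapping entries: {unmapped}")

assert_all_sources_mapped(["Web Form", "Outbound Call"])  # passes silently
# assert_all_sources_mapped(["Web Form", "Inbound Chat"]) would raise.
```

The next CRM admin who adds a source sees a failed build with the exact missing name, not a 40 percent spike in cost per qualified lead.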
Orchestration and SLAs that match campaign tempo
Data pipelines should be predictable, but marketing teams need elasticity. Product launches and seasonal surges intensify data needs and shorten patience. We tune orchestration to the campaign.
For daily, routine ingestion we use managed schedulers so the team spends time on modeling, not on cron archaeology. For heavier workflows, like identity stitching or MMM refreshes, we run orchestrators that can parallelize and retry without babysitting. The SLA is as important as the result. If a model refresh fails at 2 a.m., the on call path is clear, and a degraded but useful subset of the dashboard still loads by 8 a.m. The media buyer does not need the perfect view to pause a wasteful ad set. They need a reliable view to avoid waiting another 24 hours.
We also align warehouse compute to the calendar. During major promotions, we temporarily raise slots or warehouses to handle peak modeling and reporting without latency jitters, then scale back after the window closes. Clients appreciate a line item that goes up during money making weeks and down after, rather than a permanently overprovisioned bill.
Privacy, consent, and the pragmatics of governance
Compliance is not a blocker when it is built in early. We segment data based on sensitivity, minimize the spread of identifiers, and maintain clear dictionaries for anything that touches PII. Consent states follow the event, not just the session. If a user revokes consent, suppression propagates. We store hashed identifiers where feasible, with salting that aligns to the activation need. Legal teams tend to respond well when they see that structure. Marketers gain speed because fewer approvals are required on each new test.
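Hashing identifiers is a one-function job; managing the salt is the real work. A minimal sketch, with the salt hardcoded purely for illustration; in practice it belongs in a secrets store, and note that some ad platform match uploads specify their own unsalted hashing of normalized values, which is why the text ties salting to the activation need:

```python
# Salted hashing of identifiers before they leave owned systems.
# The salt constant is illustrative; use a secrets store in practice.
import hashlib

SALT = "rotate-me-per-activation-need"  # illustrative only

def hash_identifier(raw: str, salt: str = SALT) -> str:
    """Normalize then hash, so the same email always yields the same token."""
    normalized = raw.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()

token_a = hash_identifier("Jane.Doe@example.com")
token_b = hash_identifier("  jane.doe@example.com ")
print(token_a == token_b)  # True: normalization makes tokens joinable
```

Normalizing before hashing is what keeps the tokens joinable across systems; a stray capital letter or trailing space would otherwise produce a completely different hash.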
A practical note on regionality. When campaigns expand to the EU or Canada, the simplest path is to keep collection, storage, and processing for those users region scoped, then move only the aggregates across regions. Trying to retrofit global tables later usually costs more time and introduces more risk.
Tooling that respects trade offs
Marketers do not need a monolithic stack. They need tools that do their job and play well together. At (un)Common Logic, we lean on a few patterns.
Managed connectors are a gift for speed. We use them when they are stable and priced fairly against expected volume. If a source is noisy or the client is small, the cost may not pencil out. A simple scripted pull with alerts can be the right choice for a period.
Transformations belong in code, version controlled, and testable. SQL with templating through tools like dbt keeps logic exposed and easy to review. We write tests for schema, primary keys, and accepted values. Business logic lives in models, not in dashboard filters where it can fork silently.
Reverse ETL is worth it when activation moves the needle. Shipping a churn score into paid social audiences or suppressing recent buyers from prospecting campaigns often saves more than the tooling costs in the first month. We watch sync failure rates carefully. A 2 percent failure to update an audience can wreck a carefully designed incrementality test.
Warehouses come down to usage patterns. BigQuery is forgiving for spiky, ad hoc analysis and large scans. Snowflake shines when you need consistent performance and clear isolation across workloads. Both play well with columnar storage and have native features to manage cost. The key is to structure tables for the most common queries, partition sensibly, and document the boundaries so power users do not trip into the expensive path.
Budgets, value, and proof that data work pays for itself
The CFO does not care how pretty the schema is. They care that improved decisions outpace the cost of the data team. We measure return in three ways.
Waste reduced. Duplicate reach and audience overlap shrink when identity and activation are sound. For a retail client spending mid seven figures monthly, suppressing recent buyers from prospecting saved 6 to 8 percent of spend with no drop in net new customer volume. The change took two weeks to build and paid back instantly.
Revenue gained. Better allocation toward profitable segments or geographies moves topline. In B2B, joining call transcription keywords to CRM outcomes let us pause lead gen keywords that sounded relevant but rarely converted to opportunities. The cost per qualified opportunity improved by 18 percent over six weeks, and sales accepted leads went up because quality increased.
Time returned. Analysts and buyers spend less time reconciling numbers and more time testing. When we centralized taxonomy management for a portfolio of thirteen brands, report build time dropped from hours to minutes for weekly meetings. Over a quarter, that reclaimed time funds more creative tests and geo splits, which often uncover 10 to 20 percent efficiency pockets.
Costs are transparent. We forecast warehouse, connectors, and orchestration based on expected data volume and query patterns, then show the client when scale triggers a plan change. When volume surges during a campaign, the uptick is expected, not a surprise.
Two brief stories from the field
A subscription ecommerce brand came to us with stalled growth. Paid search was profitable on paper but cash flow felt tight. Their CRM tracked cancellations manually, so revenue in platforms did not reflect churn until months later. We built a cancel event stream from support tickets and payment processor events into the warehouse, then modeled lifetime value by cohort with a two week refresh. Within a month, we found that one non brand keyword cluster drove signups with a 30 percent higher 90 day churn rate. Pivoting budget from that cluster to a creative focused paid social audience cut net churn and raised 90 day contribution margin by roughly 12 percent.
A B2B SaaS firm with a nine month sales cycle relied on leads and MQLs to steer media. Sales complained about quality, marketing claimed rising volume, and finance could not reconcile either side. We created a disciplined funnel table with a single definition of qualified opportunity and stitched in sales stage transitions. We migrated weekly reporting to show opportunity creation and movement, not just leads. Along the way, we discovered that a small change in a marketing automation rule had quietly cut email nurtures for a third of leads. Fixing that rule improved opportunity creation from email nurtures by 40 percent over two months. More importantly, the team stopped arguing about numbers and started debating which campaigns were raising early stage opportunity velocity. That changed the tone of budget meetings.
How we start an engagement without boiling the ocean
The first 30 to 60 days are about speed to trust. We do not try to solve every future use case. We pick the work that moves budgets and morale right away.
- Clarify the business questions that drive spend shifts, then tie each to a data source and a freshness target.
- Stand up a minimal warehouse with raw spends, a clean channel taxonomy, and a funnel table that reaches at least to qualified lead or first purchase.
- Add observability that blocks broken updates from flowing into dashboards, even if that means a partial view for a day.
- Document rules in the model itself. If brand search is excluded from prospecting, the code says so where the measure is created.
- Build one activation loop that proves value, such as a simple audience suppression or a geographic reallocation based on modeled incrementality.
Once this foundation is in place, the team can add sophistication without destabilizing the base. MMM, propensity scoring, and creative level analysis layer on cleanly when the spine is strong.
What to watch as the landscape shifts
Privacy regulation will keep evolving, and platforms will keep closing their gardens. Two choices help future proof the work. First, invest in event integrity and consent. Precise, well named events survive tool changes. Second, keep business definitions in your models, not embedded in vendor workflows. When you control the logic that defines a qualified lead or a retained customer, you can swap tools without changing the meaning of your metrics.
Measurement mix will balance. Attribution will never be perfect, but well run holdouts and MMM that is refreshed with disciplined priors will anchor spend decisions. Expect MMM cycles that are lighter weight and closer to the day to day, not once a year monoliths.
Creative data will matter more. Text and image variants, hooks, and offers need structured capture if you want to learn across campaigns. We attach creative metadata at ingest, so that a question like “Which lead offer lifted paid social conversion rate for high LTV cohorts last quarter?” takes minutes, not a day of spelunking.
Why (un)Common Logic does it this way
We work at the intersection of media and measurement, so we feel the pain of broken data immediately. That has taught us a few hard earned habits. We favor small, reliable pieces over sprawling architectures. We stay close to the buyers and the questions that move spend. We model definitions so they are clear and durable, even when platforms change names or sunset features. We build tests and observability into the pipeline, so the data that reaches decision makers is sturdy.
Most of all, we believe the point of data engineering for marketers is not to be fancy. It is to let smart people move money with confidence. When a search lead can pause a losing ad set before lunch because the numbers updated cleanly at 9:15, when a strategist can shift budget toward a cohort that will still be a customer in six months, when a CFO sees a clear link from spend to contribution margin, the system is doing its job.
That is the bar we hold ourselves to at (un)Common Logic, and it is the standard that turns fragmented platform data into a competitive advantage.
