Three factors cause the sum of individual line item reach to differ from aggregate reach:
- Reach goes through a deduplication process when aggregated
- Example: a user downloads one episode containing both Ad 1 and Ad 2, or downloads multiple episodes with Ad 1 in Episode 1 and Ad 2 in Episode 2.
- Each of those ads "reached" one household, so each counts as one reach for its line item in the table view.
- In the aggregated reach number for all line items in a campaign, Ad 1 and Ad 2 both "reached" the same household, so this counts as only 1 household reached for the campaign; the reach from the two ads is not summed.
- Summary
- The table view showing individual line items in a campaign displays distinct households reached per ad/line item
- Aggregated reach is the total number of distinct households reached across all ads in a campaign
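The deduplication effect described above can be illustrated with a minimal Python sketch. The household IDs, the `downloads` list, and both function names are hypothetical and exist only for illustration; they are not the actual reporting pipeline.

```python
# Hypothetical download log: (household_id, line_item) pairs.
# All IDs and names here are made up for illustration.
downloads = [
    ("hh_1", "Ad 1"),
    ("hh_1", "Ad 2"),  # the same household also received Ad 2
    ("hh_2", "Ad 1"),
    ("hh_3", "Ad 2"),
]


def reach_per_line_item(rows):
    """Distinct households reached by each line item (table view)."""
    households = {}
    for household, line_item in rows:
        households.setdefault(line_item, set()).add(household)
    return {item: len(ids) for item, ids in households.items()}


def campaign_reach(rows):
    """Distinct households reached across all line items (aggregate view)."""
    return len({household for household, _ in rows})


per_item = reach_per_line_item(downloads)
total = campaign_reach(downloads)

print(per_item)                # {'Ad 1': 2, 'Ad 2': 2}
print(sum(per_item.values()))  # 4
print(total)                   # 3
```

Household `hh_1` is counted once for each line item but only once for the campaign, so the line item sum (4) exceeds the aggregate reach (3).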
- There are different data sources for aggregated vs. line item breakdown
- Aggregated data is pulled from BigQuery, while the line item breakdown is pulled/calculated from a combination of Spanner, BigQuery, and Postgres
- BigQuery is populated once per day and Spanner is populated hourly, so the data available in BigQuery can lag behind the data in Spanner
- Reach for a line item is an approximate calculation using impressions and frequency because we don't calculate reach on an hourly basis like we do for impressions.
- Total Impressions and Total Reach (over the campaign) for a line item are used to generate a frequency value
- Frequency = Impressions / Reach
- This frequency value is then used to estimate reach for a time interval: the impressions count over that interval is divided by the frequency
- Reach = Impressions / Frequency
- The frequency for a particular line item will differ from the aggregate frequency of all line items across a campaign
- Summary
- The calculation therefore does not produce consistent values across levels of granularity (overall campaign vs. a single line item). The sum of line item reach can be lower or higher than overall campaign reach, depending on how far the impressions in the selected time frame deviate from the average.
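The approximate reach calculation can be sketched in Python as follows. All numbers, the `line_items` and `campaign_totals` values, and the function names are hypothetical; the sketch only applies the two formulas above (Frequency = Impressions / Reach, then Reach = Impressions / Frequency) and is not the production calculation.

```python
def frequency(total_impressions, total_reach):
    """Frequency = Impressions / Reach, using campaign-lifetime totals."""
    return total_impressions / total_reach


def estimated_reach(interval_impressions, freq):
    """Reach = Impressions / Frequency, for one time interval."""
    return interval_impressions / freq


# Hypothetical lifetime totals as (impressions, reach).
line_items = {
    "Ad 1": (1000, 500),  # frequency 2.0
    "Ad 2": (3000, 600),  # frequency 5.0
}
# Campaign reach is deduplicated, so it is below 500 + 600; frequency 4.0.
campaign_totals = (4000, 1000)


def compare(interval_impressions):
    """Return (sum of line item estimates, campaign estimate) for an interval."""
    line_item_sum = sum(
        estimated_reach(interval_impressions[name], frequency(*totals))
        for name, totals in line_items.items()
    )
    campaign_estimate = estimated_reach(
        sum(interval_impressions.values()), frequency(*campaign_totals)
    )
    return line_item_sum, campaign_estimate


# Interval where the low-frequency line item delivers more impressions.
print(compare({"Ad 1": 400, "Ad 2": 300}))  # (260.0, 175.0)
# Interval where only the high-frequency line item delivers.
print(compare({"Ad 1": 0, "Ad 2": 400}))    # (80.0, 100.0)
```

With these assumed numbers, the line item sum is higher than the campaign estimate in the first interval and lower in the second, because each level of granularity divides by a different frequency.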