(Source: Data Strategy)

If you could use every piece of data you had available to run customer analytics, would you do it? Or would aggregating variables avoid the risk of drowning in data? David Reed finds out.
When you decide to cook a curry, do you buy all the individual spices, roast, grind and blend them together? Or do you opt for a jar of proprietary paste that has already done this for you?
Much depends on how hungry you are, how much time is available and your level of cooking skill, of course. When it comes to working with data to get some customer insight, pretty much the same set of options is available.
You can keep data at a fully granular level, churning every record and field through a query string to ensure you get a totally accurate result. That does require a lot of resource: data storage and access, processing time and, especially, work to condition the raw data.
Alternatively, you can create an analytical data set in which raw data has been pre-configured. This might involve aggregation of variables into averages, for example, or derived variables based on the range of values found in a field, such as an indicator of frequency.
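As a rough illustration of what such an analytical data set might look like, the sketch below rolls raw transactions up into an average and a derived frequency indicator per customer. The sample records, the field names and the thresholds in `frequency_band` are all assumptions made for the example, not anything prescribed by the practitioners quoted here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transactions: (customer_id, amount spent)
transactions = [
    ("c1", 20.0), ("c1", 40.0), ("c1", 30.0),
    ("c2", 15.0),
    ("c3", 50.0), ("c3", 70.0),
]


def frequency_band(purchase_count):
    """Derived variable based on the range of values in a field.

    The thresholds are illustrative assumptions only.
    """
    if purchase_count >= 3:
        return "high"
    if purchase_count == 2:
        return "medium"
    return "low"


def build_analytical_set(rows):
    """Aggregate raw rows into one pre-configured record per customer."""
    amounts = defaultdict(list)
    for customer_id, amount in rows:
        amounts[customer_id].append(amount)
    return {
        customer_id: {
            "avg_spend": mean(values),
            "purchases": len(values),
            "frequency": frequency_band(len(values)),
        }
        for customer_id, values in amounts.items()
    }


analytical_set = build_analytical_set(transactions)
```

Queries then run against one short record per customer rather than every transaction, which is the trade being described: less to process, at the cost of detail.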
To the data warehousing and business intelligence community, the idea of creating a "data paste" like this is often anathema. But is it not a quicker, cheaper and easier way to start interrogating large data sets and, as a result, more in line with current business needs and resources?
"Derived fields and aggregated values are not a replacement for the true information," says Richard Higginbotham, marketing manager at CDMS. "However, for the needs of marketing campaign management, granular data is too much."
When planning a campaign, what marketers need is an indication of a customer's value, rather than a string of data on their every purchase, for example. Flags showing which products or categories a customer has bought in will be more useful for targeting or exclusion than having to process every product or transaction code in the data.
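The flags described above could be derived along the following lines. This is a minimal sketch: the product codes, the `CATEGORY_OF` mapping and the helper names are hypothetical, chosen only to show how a set of category flags can drive targeting and exclusion without touching each transaction code at selection time.

```python
# Hypothetical granular purchases: (customer_id, product_code)
purchases = [
    ("c1", "P100"), ("c1", "B200"),
    ("c2", "P101"),
    ("c3", "B200"),
]

# Hypothetical mapping from product code to marketing category
CATEGORY_OF = {
    "P100": "mobile_phone",
    "P101": "mobile_phone",
    "B200": "broadband",
}


def category_flags(rows, category_of):
    """Return, per customer, the set of categories they have bought in."""
    flags = {}
    for customer_id, product_code in rows:
        flags.setdefault(customer_id, set()).add(category_of[product_code])
    return flags


def select_targets(flags, include, exclude):
    """Customers flagged for `include` who are not flagged for `exclude`."""
    return sorted(
        customer_id
        for customer_id, categories in flags.items()
        if include in categories and exclude not in categories
    )


flags = category_flags(purchases, CATEGORY_OF)
# Target phone buyers who do not yet hold broadband
targets = select_targets(flags, include="mobile_phone", exclude="broadband")
```

Because the mapping is passed in rather than hard-coded into the flags, the same granular rows can be re-flagged if a category definition changes.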
He notes that the purpose of creating a single customer view is precisely to support the view needed by its business sponsor, rather than to replicate the enterprise data warehouse. Equally, the SCV does need to offer the flexibility to change, update or delete these aggregations.
"You may want to change the definition of a category in the future. For example, if you had mobile phones as a single product some years ago, you might want to filter it now by capabilities like camera, email, etc. If you have still got the granular data, you can go back to it and redesign your view," says Higginbotham.
Aggregations will almost certainly have to be updated as market conditions or product definitions change. A key aspect of managing this approach is also to ensure that old clusters get deleted, because running a query string that is no longer required just slows down analytical processing.
So does running an aggregation at a higher frequency than is necessary. If the grouping of data is only needed for the end of month financial report, it does not need to be included in every overnight analysis batch.
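One way to picture this is a small registry that records how often each aggregation is actually needed, so the overnight batch only runs what is due. The job names, the two schedule labels and the `jobs_due` helper below are assumptions for the sake of the sketch; a real batch scheduler would have its own conventions.

```python
from datetime import date, timedelta

# Hypothetical registry: aggregation name -> how often it is needed
AGGREGATIONS = {
    "customer_value_flags": "nightly",
    "finance_month_end_rollup": "month_end",
}


def is_month_end(run_date):
    """True when the following day falls in a different month."""
    return (run_date + timedelta(days=1)).month != run_date.month


def jobs_due(run_date, registry=AGGREGATIONS):
    """List the aggregations that should run in this batch."""
    due = []
    for name, schedule in registry.items():
        if schedule == "nightly":
            due.append(name)
        elif schedule == "month_end" and is_month_end(run_date):
            due.append(name)
    return due
```

Retiring an aggregation that is no longer required then amounts to removing its entry from the registry, so it stops consuming processing time in any batch.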
To create the flags and averages in a single customer view built for marketing purposes, the organisation will still need to retain the raw data. For this reason, data warehousing practitioners do not see the two views as antithetical.
"Aggregation isn't a forbidden term here," notes Umpom Tantipech, customer management programme director, EMEA, at Teradata. "Our view is that clients should be able to load data once and use it many times to achieve the best total cost of ownership.