Fifty percent (50%) of retailers reported that their data volume doubles every five years and grows at an average of 30% each year, according to Edgell Knowledge Network’s report, “State of the Industry Report 2014: 2nd Annual Stores Benchmark – Future of the Store: Re-Imagining Stores as Hubs of Omni-Channel Customer Engagement.” As existing structured and new unstructured data sources swirl inside retailers’ data warehouses, savvy companies are learning to use robust, emerging analytical tools to combine both types of information, gain a holistic view of their customers, and make more relevant business decisions.
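Compound growth links the two figures cited above. Assuming data grows at a steady annual rate r, volume after t years and the resulting doubling time follow the standard formulas below. A steady 30% annual rate would double data volume in under three years, while doubling every five years corresponds to roughly 15% a year, so the two figures do not describe the same steady rate.

```latex
\begin{align*}
V_t &= V_0\,(1 + r)^t, \qquad T_{\text{double}} = \frac{\ln 2}{\ln(1 + r)} \\
r = 0.30 &\;\Rightarrow\; T_{\text{double}} = \frac{\ln 2}{\ln 1.30} \approx \frac{0.693}{0.262} \approx 2.6 \text{ years} \\
T_{\text{double}} = 5 &\;\Rightarrow\; r = 2^{1/5} - 1 \approx 0.149 \;(\approx 15\% \text{ per year})
\end{align*}
```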

     As the retail industry finds its footing in an omni-channel world, operations are becoming more digitally focused, especially from a customer-facing point of view. From traditional point-of-service solutions to newer customer touch points, the amount of incoming data is reaching all-time highs. While increasing data volume is not a new phenomenon, the type of data entering enterprises is.

     Traditional structured data is alive and well, and it continues to flow regularly into retailer repositories. It typically consists of “fixed records” stored in row-and-column files, such as transaction data originating in POS systems and financial information flowing through ERP (enterprise resource planning) systems and similar solutions. Its fields often hold numeric, alphabetic, traceable customer-specific, and even currency-related information.

     However, the adoption of digital touch points has upped the ante by creating a new source of information: unstructured data. Generated from loosely defined, text-based sources, unstructured data is typically more text-heavy than its structured counterpart. While it can contain dates, numbers, and other elements found in structured data, unstructured data is often irregular and ambiguous, which makes it difficult to store in fielded formats.
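To make the contrast between structured and unstructured data concrete, here is a minimal Python sketch under stated assumptions: a hypothetical `PosTransaction` record stands in for a structured POS row, and a simple regular-expression parser, `extract_fields`, tries to recover a date and a price from free-text customer reviews. The field names, sample text, and patterns are illustrative assumptions, not any retailer's actual schema; real-world text would need far more robust handling.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PosTransaction:
    """Structured data (hypothetical schema): every record has the same named, typed fields."""
    transaction_id: str
    store_id: int
    sku: str
    sale_date: date
    amount: Decimal


@dataclass(frozen=True)
class ExtractedFields:
    """What a best-effort parser could recover from free text (fields may be missing)."""
    mentioned_date: Optional[date]
    mentioned_price: Optional[Decimal]


# Illustrative patterns only: ISO-style dates (YYYY-MM-DD) and dollar prices.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_fields(text: str) -> ExtractedFields:
    """Pull a date and a price out of unstructured text, if present and valid."""
    mentioned_date = None
    date_match = DATE_PATTERN.search(text)
    if date_match:
        try:
            mentioned_date = date(*map(int, date_match.groups()))
        except ValueError:  # e.g. "2014-13-45" looks like a date but is not one
            mentioned_date = None
    price_match = PRICE_PATTERN.search(text)
    mentioned_price = Decimal(price_match.group(1)) if price_match else None
    return ExtractedFields(mentioned_date, mentioned_price)


# A POS row arrives already structured...
sale = PosTransaction("T1001", 42, "SKU-123", date(2014, 3, 1), Decimal("59.99"))
# ...while customer reviews must be parsed and may not contain the fields at all.
review = "Bought the blue jacket on 2014-03-01 for $59.99, but the zipper broke."
vague = "Got it sometime last spring, pretty cheap."
print(extract_fields(review))
print(extract_fields(vague))
```

The structured record guarantees that every field exists with the right type, whereas the parser has to return `None` whenever the text omits a field or contains a malformed one, which is the irregularity that makes unstructured data hard to store in fielded formats.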

     While unstructured data most commonly consists of digital information generated by Web sites, social media, and electronic customer touch points, including kiosks and mobile devices, it can also arise from more basic sources. For example, Excel files may be structured, but when that information arrives as PDF files from multiple, disparate wholesale partners and is then manually entered into retailer systems, it becomes highly unstructured. Worse, this manual handling introduces errors, making the information hard to authenticate, identify, or convert into formally structured data.

     Taming and processing this information takes work and, unsurprisingly, remains a constant struggle. In fact, it is not uncommon for companies to collect this disparate data and store it in a “data lake.” Unlike centralized, purpose-built databases, data lakes are designed for analyzing disparate sources of data in their native formats. Unfortunately, data lakes often lack consistency and governed metadata, and they can end up as little more than collections of information silos.

     However, ignoring this information forces companies to operate half-blind. Because understanding this untapped information is critical to making fully informed business decisions, innovative companies are finding ways to merge structured and unstructured information to gain a full view of their customer base. Enter Hadoop, an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. By applying this framework to these often disparate, voluminous data sources, organizations can extract actionable insights, even when the information spans structured and unstructured files.
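Hadoop's original processing model is MapReduce: a map step turns each input record into key-value pairs, the framework groups those pairs by key (the "shuffle"), and a reduce step aggregates each group. The following is a minimal, single-process Python imitation of that model, not Hadoop's actual Java API. The SKU codes, sample rows and posts, and the naive whitespace-based mention matching in `map_post` are hypothetical, chosen only to show structured POS sales and unstructured social posts being combined on a shared product key.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical inputs: structured POS rows (sku, amount) and unstructured social posts.
pos_rows = [
    ("SKU-1", Decimal("20.00")),
    ("SKU-2", Decimal("35.50")),
    ("SKU-1", Decimal("20.00")),
]
social_posts = [
    "Loving my new SKU-1 scarf!",
    "SKU-2 ran small, returning it.",
    "Anyone else seen SKU-1 in the window display?",
]
KNOWN_SKUS = {"SKU-1", "SKU-2"}

Pair = Tuple[str, Tuple[str, object]]


def map_pos(row: Tuple[str, Decimal]) -> Iterator[Pair]:
    """Map step for structured data: emit the sale amount keyed by SKU."""
    sku, amount = row
    yield sku, ("revenue", amount)


def map_post(text: str) -> Iterator[Pair]:
    """Map step for unstructured data: naively emit one mention per known SKU token."""
    for token in text.split():
        token = token.strip(".,!?")
        if token in KNOWN_SKUS:
            yield token, ("mentions", 1)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[str, object]]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[Tuple[str, object]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sku(values: List[Tuple[str, object]]) -> Dict[str, object]:
    """Reduce step: total revenue and count mentions for one SKU."""
    totals: Dict[str, object] = {"revenue": Decimal("0"), "mentions": 0}
    for metric, amount in values:
        totals[metric] += amount
    return totals


def run_job(rows, posts) -> Dict[str, Dict[str, object]]:
    mapped = [pair for row in rows for pair in map_pos(row)]
    mapped += [pair for post in posts for pair in map_post(post)]
    return {sku: reduce_sku(vals) for sku, vals in sorted(shuffle(mapped).items())}


for sku, totals in run_job(pos_rows, social_posts).items():
    print(sku, totals)
```

In a real Hadoop job the map and reduce functions run in parallel across a cluster and the framework performs the shuffle itself; keeping the three phases as separate functions here mirrors that division of labor.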

     Macy’s knows this value firsthand: its data analysts compare voluminous traditional and digital data stored in Hadoop with data residing in the company’s historical database. The result: an enterprise view of products, marketing, and merchandising across its omni-channel experience.

     Similarly, Alice + Olivia is combining structured POS data with PDF and Excel files from its ERP system to gain a complete view of its omni-channel customers, sales and market position. 

     Rather than drown in rising data waters, retailers need to find ways to wade through information and merge structured and unstructured sources. By applying robust, innovative analytical tools, retailers are finding the metaphorical flashlight they need to see through otherwise murky, data-filled waters and make more insightful business decisions.

Deena M. Amato-McCoy