Crowd Powered Data Enrichment Combining crowdsourcing and automation to drive smarter business processes from unstructured data


The flood of data available in the digital age represents a key opportunity for companies to improve customer experiences, make faster business decisions or leverage knowledge assets. But only a trickle of the data is structured enough to feed directly into the enterprise data supply chain.

The rest of the data is floating in forms that make it difficult to use—unstructured text from social feeds and online reviews, as well as photos, audio clips and videos. Literally millions of text strings are available on the web from which companies could, in theory, extract valuable cues about their customers. Ditto for millions of photos, which often contain context and insight that is difficult for automated systems to unlock.

For example, imagine a clothing retailer has customers who have granted access to photos on their social media feeds. Insight from these photos could be used to analyze fashion trends or provide customers with recommendations for clothing that they might wish to buy. But analysis and recommendation algorithms need structured attributes of clothing to work with, not unstructured photographs.

This is the condition in which companies often find themselves today. Digital channels are delivering a torrent of unstructured data that contains the potential to unlock better, smarter business operations. The key is to transform the data into a structured format.

This typically means cleansing and enriching the data by removing duplicates and other errors, while adding structured meta-data, which are descriptive tags that can be used by other processing systems. For instance, adding meta-data to a photo makes it sortable and more useful for further human or automated processing. Tagging a knowledge asset makes it easier to route, protect and retrieve. Categorizing unstructured text with attributes such as positive or negative comments about a service makes it suitable for analytics. But what is the best way to do this cleansing and enrichment?
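The cleansing and enrichment steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields and the keyword rule in `tag_sentiment` are hypothetical stand-ins for whatever judgment a crowd worker or an algorithm would actually supply.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    tags: dict = field(default_factory=dict)  # structured meta-data


def cleanse(raw_texts):
    """Drop empty strings and duplicates (ignoring case and spacing), keeping first-seen order."""
    seen = set()
    records = []
    for text in raw_texts:
        key = " ".join(text.lower().split())
        if key and key not in seen:
            seen.add(key)
            records.append(Record(text=text.strip()))
    return records


def tag_sentiment(record):
    """Hypothetical stand-in for a human judgment: a crude keyword rule."""
    lowered = record.text.lower()
    if any(word in lowered for word in ("love", "great")):
        record.tags["sentiment"] = "positive"
    elif any(word in lowered for word in ("late", "broken")):
        record.tags["sentiment"] = "negative"
    else:
        record.tags["sentiment"] = "unknown"
    return record


reviews = ["Love the new jacket!", "love the new  jacket!", "Delivery was late.", ""]
enriched = [tag_sentiment(r) for r in cleanse(reviews)]
```

Once each record carries a `sentiment` tag, downstream analytics can filter or count on that attribute instead of parsing free text.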

The Unstructured Data Flood


Improve Customer Experiences

Recommendation engines are popular tools for enticing customers to buy more products; however, they are based largely on analyzing an individual customer’s previous activity. In order to deliver highly personalized product recommendations, businesses must get to know their customers at a much finer-grain level. Analyzing a customer’s social media posts, online reviews of branded products or services, video clips or other posted material can reveal specific preferences or tendencies and more subtle cues, such as moods.

For instance, a customer who regularly mentions loving to travel and posts videos of recent trips will be more predisposed to offers for airline discounts to new destinations. The same customer might appreciate a coupon for new travel gear if she posted that she lost her luggage on the last trip. In this case, crowdsourced data enrichment would be used to extract pertinent attributes from the data and give it structure for the analysis.

Make Faster Business Decisions

Many businesses need to ingest and process data assets from multiple sources in near real time in order to provide correct listings for websites or make better business decisions. Crowdsourcing is a viable way to distribute the effort of cleansing and enriching the data so that it is accurate, free of duplicates, assigned to the correct listing/reference and in the correct order.

Some data must also be acted upon within a certain time constraint. Crowdsourcing can help organizations make human judgments about what to prioritize. For example, the FDA collects reports from consumers who have experienced adverse reactions to certain drugs. The government agency uses crowdsourcing to handle the large volume of incoming reports to identify the critical cases and report them to healthcare practitioners, pharmacists and drug manufacturers in a timely manner.[1]

Leverage Knowledge Assets

Manufacturers, service firms and other large companies generate thousands of documents (proposals, white papers, manuals, product and technology descriptions, incident reports and more), all of which are stored in enterprise content management systems. Businesses can use crowdsourcing to classify and annotate these unstructured documents in order to make them easier to search and reuse, route to the right people or protect with automated security safeguards. Humans can also confirm whether a knowledge asset has been correctly categorized by an automated algorithm.

Innovations with Unstructured Data

Companies can use crowdsourced data enrichment to add attributes to unstructured data in a number of ways. For example, three areas where Accenture Technology Labs sees potential are to:


With the maturation of machine learning and other advanced analytics techniques, it is natural to assume that automated analytics—for instance, image recognition or text understanding technology—would be the best way to enrich raw data with structured meta-data. And, in many cases, automation can help.

However, while automation is desirable for cost reasons, many tasks that are easy for people to do are still beyond the capabilities of algorithmic analysis. Extracting insights from highly variable data requires common sense, language processing skills, higher-order cognitive reasoning or subjective human judgment. For example, most humans quickly know the difference between a prom photo and a wedding photo. They can usually detect sarcasm or humor in a text message. And they can provide an opinion on whether a man looks bad or good in a new suit.

Even if it is technically feasible to create algorithms to handle all of these things, they would be expensive to develop and time-consuming to adapt when requirements changed even slightly. At the same time, using a dedicated team to manually process data is expensive and difficult to sustain. The work can be tedious and, in many contexts, the need comes in bursts, which calls for a capability available on demand rather than a dedicated team.

In response to this need, businesses are starting to turn to paid crowdsourcing platforms such as Amazon Mechanical Turk, CrowdFlower and Crowdsource to enrich a variety of data sets with meta-data. Think of these large, micro-task labor platforms as eBay-like markets for paid human labor rather than for products.

How do these crowdsourcing platforms work? A company’s data enrichment system sends data to a platform via a web service or API, similar to how the system would send data to any automated data manipulation service. Rather than attempting to process the data automatically, which may not be possible, the platform posts small data enrichment tasks to its web-based labor markets for workers to access. Each post may include a web form for entering results and the price offered, such as a few cents for each task.
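The request and response flow described above can be sketched as follows. `InMemoryPlatform` and its method names are hypothetical and do not correspond to any vendor's actual API; the sketch only shows the shape of the exchange: the enrichment system posts a task with a question and a price, a worker later answers it through the form, and the system collects the result.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: int
    data: str                     # item to enrich, e.g. a photo reference or text snippet
    question: str                 # what the worker is asked on the web form
    price_cents: int              # price offered for completing the task
    answer: Optional[str] = None


class InMemoryPlatform:
    """Hypothetical stand-in for a paid crowdsourcing platform's web service."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.open_tasks = {}
        self.completed = {}

    def post_task(self, data, question, price_cents):
        """Called by the company's enrichment system; returns a task id."""
        task = Task(next(self._ids), data, question, price_cents)
        self.open_tasks[task.task_id] = task
        return task.task_id

    def submit_answer(self, task_id, answer):
        """Called when a worker fills in the task's web form."""
        task = self.open_tasks.pop(task_id)
        task.answer = answer
        self.completed[task_id] = task

    def fetch_result(self, task_id):
        """Return the worker's answer, or None while the task is still open."""
        task = self.completed.get(task_id)
        return task.answer if task else None


platform = InMemoryPlatform()
tid = platform.post_task("photo_001.jpg", "What kind of sweater is the individual wearing?", 3)
pending = platform.fetch_result(tid)      # None: no worker has answered yet
platform.submit_answer(tid, "V Neck")     # simulated worker response
result = platform.fetch_result(tid)
```

Because the answer arrives only after a person has done the work, the calling system has to treat results as asynchronous, which is why the sketch separates posting from fetching.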

By having many workers process small chunks of data, these platforms enable human-in-the-loop computing to extend processing beyond what automation alone can handle. The paid crowdsourcing platforms also employ a number of techniques to ensure quality results, including restricting jobs to workers who have earned a strong reputation in the market, spot-checking with test questions, or redundantly giving the same task to multiple workers to ensure consistent answers.
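Two of the quality techniques mentioned above, spot-checking with test questions and redundant assignment, are easy to illustrate. The following is a minimal sketch; the thresholds (`min_accuracy`, `min_agreement`) and the worker data are assumptions chosen for the example, not values any platform prescribes.

```python
from collections import Counter


def passes_gold(worker_answers, gold, min_accuracy=0.8):
    """Spot-check: did the worker answer enough of the test (gold) questions correctly?"""
    checked = [q for q in gold if q in worker_answers]
    if not checked:
        return False
    correct = sum(worker_answers[q] == gold[q] for q in checked)
    return correct / len(checked) >= min_accuracy


def majority_answer(answers, min_agreement=0.5):
    """Redundancy: accept the most common answer only if more than min_agreement of workers gave it."""
    if not answers:
        return None
    label, count = Counter(answers).most_common(1)[0]
    return label if count / len(answers) > min_agreement else None


gold = {"g1": "crew", "g2": "plaid"}          # test questions with known answers
workers = {
    "w1": {"g1": "crew", "g2": "plaid", "t1": "V Neck"},
    "w2": {"g1": "crew", "g2": "plaid", "t1": "V Neck"},
    "w3": {"g1": "V Neck", "g2": "argyle", "t1": "crew"},   # fails the spot-check
    "w4": {"g1": "crew", "g2": "plaid", "t1": "crew"},
}

trusted = [w for w, answers in workers.items() if passes_gold(answers, gold)]
consensus = majority_answer([workers[w]["t1"] for w in trusted])
```

Here worker `w3` is excluded by the test questions, and the remaining three answers to task `t1` are reduced to a single consensus value.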

Some organizations are also beginning to mobilize crowdsourced labor through mechanisms that do not involve cash payments, or what is called the “free crowd.” This free micro-task labor channel provides incentives through alternative means, such as incorporating the tasks into an interactive mobile game that produces useful meta-data as a byproduct of the play. The ESP game, for instance, is an image tagging game played between two people.[2] Each player types in attributes of the picture they are viewing, such as a chair or a water bottle, until one of the attributes matches and is saved as meta-data. The next time the game players are shown the same picture, they must type in a different attribute. By using a game format, companies can tap into a crowd of people whose preferences, tendencies and interests are aligned with a particular type of task.
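The matching rule of such a tagging game can be sketched as a small function. This is an illustration of the mechanism as described above, not the actual ESP game implementation; the turn-by-turn interleaving and the `taboo` list of already saved attributes are assumptions made for the sketch.

```python
from itertools import zip_longest


def play_round(guesses_a, guesses_b, taboo=()):
    """Return the first attribute that both players have typed, or None if they never agree.

    Guesses are processed turn by turn in the order typed. Attributes in `taboo`
    (already saved for this picture in an earlier round) are ignored.
    """
    taboo_set = {word.lower() for word in taboo}
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        for word, own, other in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if word is None:
                continue
            word = word.strip().lower()
            if word in taboo_set:
                continue
            own.add(word)
            if word in other:
                return word          # agreement: save this attribute as meta-data
    return None


first_tag = play_round(["table", "chair"], ["bottle", "chair"])
second_tag = play_round(["chair", "water bottle"], ["water bottle", "desk"], taboo=[first_tag])
```

Passing the previously saved attribute as `taboo` forces the second round to produce a different tag for the same picture.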

Another example approach involves embedding the work into a useful function, such as user authentication. The reCAPTCHA application asks web users to type in one word for authentication and a second word, which is actually an image of archived text, to confirm the word for data enrichment.[3] In this way, the New York Times cleaned up the digitized version of its archives.[4]

Adding Structured Meta-Data with Human-in-the-Loop Computing


Patterns for Combining Crowdsourcing with Automation

To combine crowd work and automation systematically and effectively, companies must first understand the various patterns and then determine which one works best for their specific tasks. In our studies, the Labs identified three key patterns (see Figure 1):

FIGURE 1: Patterns for crowdsourcing combined with automation. Copyright © 2016 Accenture. All rights reserved.

• PATTERN 1: Pipelining. A sequence of Automated Analysis and Crowd Operation stages (#1, #2 and so on).

• PATTERN 2: Find Answer/Check Answer. Automated Analytics produces a proposed answer; Crowd Review either rejects it or passes it on as an approved answer.

• PATTERN 3: Crowd Trains System. The Crowd turns unlabeled training data into labeled training data for an Automated Classifier.


PATTERN 1: Crowd Plus Automation Pipeline

In this approach, a workflow is parceled out in phases of work, with some going to the crowd and others to automated algorithms, in a sequence that makes best use of the strengths of each. For example, a retailer could use humans to assign attributes to a sweater (crew or V-neck, argyle or plaid, casual or dressy and so on) according to predefined questions. The resulting enriched and structured data would be available to enable searching, sorting or further processing. The business could feed the attributes into a search engine to complete an automated analysis and find similar clothing items to create a look that goes with the sweater. The crowd would then judge which sweater is best for an individual person.
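A pipeline of this kind can be modeled as a list of stages applied in order, some standing for crowd work and some for automation. The sketch below is a simplified illustration: the crowd stages are simulated with fixed answers, and the catalog, item ids and vote counts are invented for the example.

```python
from functools import partial

# Hypothetical product catalog used by the automated search stage.
CATALOG = [
    {"id": "trousers_07", "style": "dressy"},
    {"id": "jeans_02", "style": "casual"},
    {"id": "loafers_11", "style": "dressy"},
]


def crowd_tag_attributes(item):
    """Crowd stage (simulated): workers answer predefined questions about the sweater."""
    simulated_answers = {"neck": "V-neck", "pattern": "argyle", "style": "dressy"}
    return {**item, "attributes": simulated_answers}


def automated_find_similar(item, catalog):
    """Automated stage: attribute-based search for items that share the sweater's style."""
    style = item["attributes"]["style"]
    candidates = [p["id"] for p in catalog if p["style"] == style]
    return {**item, "candidates": candidates}


def crowd_pick_best(item):
    """Crowd stage (simulated): workers vote on which candidate suits the person best."""
    simulated_votes = {"trousers_07": 1, "loafers_11": 4}
    best = max(item["candidates"], key=lambda c: simulated_votes.get(c, 0))
    return {**item, "recommended": best}


def run_pipeline(item, stages):
    """Pass the item through each stage in sequence."""
    for stage in stages:
        item = stage(item)
    return item


stages = [crowd_tag_attributes, partial(automated_find_similar, catalog=CATALOG), crowd_pick_best]
result = run_pipeline({"id": "sweater_01"}, stages)
```

Keeping every stage behind the same one-argument interface makes it straightforward to reorder stages or to swap a crowd stage for an automated one later.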

PATTERN 2: Crowd Verifies the Automated System

Many tasks are time-consuming to perform, but once performed, quick to verify. In these cases, businesses could use automated analytics to perform a task and propose an answer, which could then be reviewed by the crowd and approved, or rejected and returned for additional processing. For example, real estate management companies must catalog and store thousands of contracts. A business could automate the process of finding the clause in every contract that governs how much lead time a landlord needs to give a tenant before terminating an apartment lease. The crowd would then be used to check the automated results and confirm whether the identified clause is correct.
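The find answer/check answer loop might look like the sketch below. The regular expression is a deliberately naive stand-in for real contract analytics, and `crowd_review` and `fallback` are hypothetical callables representing the crowd's verdict and whatever additional processing a rejected answer is sent back for.

```python
import re

# Naive rule: a sentence that mentions "<number> day(s)" followed later by "notice".
NOTICE_PATTERN = re.compile(r"[^.]*\b\d+\s+days?\b[^.]*\bnotice\b[^.]*\.", re.IGNORECASE)


def propose_clause(contract_text):
    """Automated step: propose the first sentence that looks like a notice-period clause."""
    match = NOTICE_PATTERN.search(contract_text)
    return match.group(0).strip() if match else None


def find_and_check(contract_text, crowd_review, fallback):
    """Propose an answer automatically, then let the crowd approve or reject it."""
    proposal = propose_clause(contract_text)
    if proposal is not None and crowd_review(proposal):
        return {"clause": proposal, "status": "approved"}
    # Rejected (or nothing found): hand the contract back for additional processing.
    return {"clause": fallback(contract_text), "status": "reprocessed"}


contract = (
    "The tenant shall pay rent monthly. "
    "The landlord must give the tenant 60 days written notice before terminating the lease. "
    "Pets are not allowed."
)

approved = find_and_check(contract, crowd_review=lambda clause: True, fallback=lambda text: None)
rejected = find_and_check(contract, crowd_review=lambda clause: False, fallback=lambda text: None)
```

The crowd only ever sees one short proposed clause per contract, which is what makes verification much cheaper than having people read each contract in full.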

PATTERN 3: Crowd Trains the Automated System

Automated analytics approaches often involve machine learning algorithms that need to be “trained” with a large set of sample data. By using the crowd to produce the training data for an automated algorithm, and then using the automated algorithm to process production data, a company can reduce the time, effort and cost needed to establish an automated processing capability. For example, an auto insurance provider could send photos of damaged automobiles and ask the crowd to assign the correct attributes, such as whether a car was rear-ended or a rock hit the windshield. The results would be used to train the computer vision algorithm.
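To make the training pattern concrete, the sketch below aggregates crowd labels by majority vote and fits a nearest-centroid classifier on them. Both choices are assumptions made for brevity; a real computer vision system would use a far richer model, and the two-number feature vectors here are invented placeholders for features extracted from photos.

```python
from collections import Counter, defaultdict


def aggregate_labels(crowd_labels):
    """Majority vote per item over the labels that several workers assigned."""
    return {item: Counter(labels).most_common(1)[0][0] for item, labels in crowd_labels.items()}


def train_centroids(features, labels):
    """'Training': average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for item, label in labels.items():
        grouped[label].append(features[item])
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids, vector):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vector))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Placeholder features per photo: [rear damage score, windshield damage score].
features = {"p1": [0.9, 0.1], "p2": [0.8, 0.2], "p3": [0.1, 0.9], "p4": [0.2, 0.7]}
crowd_labels = {
    "p1": ["rear-ended", "rear-ended", "windshield"],
    "p2": ["rear-ended", "rear-ended", "rear-ended"],
    "p3": ["windshield", "windshield", "rear-ended"],
    "p4": ["windshield", "windshield", "windshield"],
}

labels = aggregate_labels(crowd_labels)
centroids = train_centroids(features, labels)
prediction = classify(centroids, [0.7, 0.3])   # a new, unlabeled production photo
```

After training, new photos are classified without any further crowd involvement, which is where the cost saving in this pattern comes from.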


Doing so can provide speed and the lowest possible cost, along with flexibility and the nuances that still require human judgment. Linking crowdsourcing with automation epitomizes the “Workforce Reimagined” trend described in the Accenture Technology Vision 2015, in which companies will increasingly use a collaborative model of humans and machines to achieve more than either could do alone.

Our unique approach to data enrichment (see Figure 2) involves:

• AUTOMATION to handle rote, repeatable tasks; adapt existing analytics solutions to alleviate human processing; train existing algorithms with human-generated data sets; and perform automated search.

• PAID CROWD vendors to extend micro-tasking beyond rote, repetitive tasks to new types of work, such as handling data enrichment tasks that algorithms cannot compute or producing training data sets to refine algorithms.

• FREE CROWD to tap into new sources of labor, coalesce dispersed crowd expertise, gain specialized expertise, cultivate and grow.

As shown in the sidebar on patterns (see Figure 1), the Labs has identified some effective patterns for combining crowdsourcing with automation. The use of free, game-powered crowd work is a new concept, and we are still working to determine the best pattern for combining free crowd with paid crowd and automation. One possible approach is to use the paid crowd to seed pre-determined correct answers for the free crowd, and then to use the free crowd to generate the required judgments at scale. It will require more real-world experience with the paradigm to develop mature patterns for workflows that include all three elements.
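The seeding idea mentioned above could be prototyped along the following lines. This is one speculative reading of that approach, not an established pattern: agreement among paid workers defines the pre-determined correct answers, and free-crowd players who do well on those items are trusted for items that have no known answer. The thresholds, player names and image ids are invented for the example.

```python
from collections import Counter


def seed_gold(paid_answers, min_votes=2):
    """Items on which at least min_votes paid workers agree become pre-determined answers."""
    gold = {}
    for item, answers in paid_answers.items():
        label, count = Counter(answers).most_common(1)[0]
        if count >= min_votes:
            gold[item] = label
    return gold


def player_accuracy(player_answers, gold):
    """Share of seeded items that a free-crowd player answered correctly."""
    scored = [item for item in player_answers if item in gold]
    if not scored:
        return 0.0
    return sum(player_answers[item] == gold[item] for item in scored) / len(scored)


def scale_up(free_answers, gold, min_accuracy=0.75):
    """Collect judgments on unseeded items from players who did well on the seeded ones."""
    judgments = {}
    for player, answers in free_answers.items():
        if player_accuracy(answers, gold) >= min_accuracy:
            for item, label in answers.items():
                if item not in gold:
                    judgments.setdefault(item, []).append(label)
    return judgments


paid_answers = {"img1": ["solid", "solid"], "img2": ["striped", "striped"], "img3": ["solid", "striped"]}
free_answers = {
    "ana": {"img1": "solid", "img2": "striped", "img9": "striped"},
    "ben": {"img1": "striped", "img2": "striped", "img9": "solid"},
}

gold = seed_gold(paid_answers)
judgments = scale_up(free_answers, gold)
```

In this toy run only `ana` clears the accuracy bar, so only her judgment on the unseeded image `img9` is kept.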

Linking Automation to Paid and Free Crowd

At Accenture Technology Labs, we envision a future in which digital businesses maximize the value of their unstructured data by adding meta-data through the use of crowdsourced labor cleverly combined with automation.

FIGURE 2: Accenture Technology Labs’ data enrichment approach. A reimagined approach to data enrichment that combines paid crowd, automation and free crowd.


In order to get started, companies should take inventory of new data sources available from various digital channels. The next step is to examine and analyze these sources to determine which kinds of unstructured data could be converted into useful data and ingested into the data supply chain to help achieve business objectives. Companies should then run experiments to determine what patterns combining crowdsourcing and automation are most effective. Once this is decided, the final phase is to design and develop a data enrichment production system that will allocate workflows to the crowd.

The ability to run experiments is greatly enhanced with a platform that allows companies to set up new workflows, and apply and execute various patterns of crowdsourcing for data enrichment. Accenture Technology Labs has created a crowd-powered data enrichment prototype platform that facilitates this process through iterative experimentation and systematic fine-tuning of crowd workflow configurations, price per task and other variables (see Figure 3 for a screenshot of the functionality).

Our platform serves as an environment to experiment with sourcing work to the crowd for ideal cost, quality and turnaround time across a variety of crowdsourcing platforms and managed services. For example, we have tested different workflows that combine crowdsourced data with automation; different price points for optimization; different quality assurance mechanisms, such as redundancies, gold standard questions or a combination of the two; and different engagement mechanisms for crowd workers, including monetary bonuses, task sourcing layouts and methodologies.
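An experiment sweep of this kind can be sketched as a grid over configuration variables. The sketch below does not describe the Labs' platform; it only shows the general shape of such a search. `simulated_measure` is a toy function whose formula is invented for illustration (it even ignores any effect of price on quality or turnaround); in practice it would be replaced by the measured outcome of a real pilot run.

```python
from itertools import product


def build_configs(prices_cents, redundancies, qa_modes):
    """Enumerate every combination of price, redundancy and quality assurance mode."""
    return [
        {"price_cents": p, "redundancy": r, "qa": q}
        for p, r, q in product(prices_cents, redundancies, qa_modes)
    ]


def pick_cheapest(configs, measure, min_quality_pct):
    """Return (config, cost, quality) for the cheapest configuration that meets the quality bar.

    `measure(config)` must return (quality_pct, cost_per_item_cents).
    """
    best = None
    for config in configs:
        quality, cost = measure(config)
        if quality >= min_quality_pct and (best is None or cost < best[1]):
            best = (config, cost, quality)
    return best


def simulated_measure(config):
    """Toy stand-in for a pilot run; these numbers are NOT real measurements."""
    bonus = 5 if config["qa"] == "gold+redundancy" else 0
    quality_pct = min(70 + 5 * config["redundancy"] + bonus, 100)
    cost_cents = config["price_cents"] * config["redundancy"]
    return quality_pct, cost_cents


configs = build_configs([2, 5], [1, 3], ["redundancy", "gold+redundancy"])
best_config, best_cost, best_quality = pick_cheapest(configs, simulated_measure, min_quality_pct=85)
```

Turnaround time could be added as a third measured value and a second constraint in the same way.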

Putting the Crowdsourcing Data Enrichment Pieces Together

FIGURE 3: Accenture data enrichment prototype platform allows for testing of various crowd configurations and price points.


FIGURE 4: Retail use case: SHOP THE LOOK. The figure shows three stages: raw data with the attributes required, enriched data that is searchable and sortable, and attribute-based search.

Raw data, attributes required. Questions for the Crowd (predefined structured response questions):

• What is the individual’s gender?
• What is the individual’s style of dress?
• What kind of outerwear is the individual wearing?
• What kind of sweater is the individual wearing?
• What is the pattern of the individual’s sweater?
• What kind of shirt is the individual wearing?
• What is the pattern of the individual’s shirt?
• What kind of bottoms is the individual wearing?
• What kind of shoes does the individual have on?
• Do the shoes have laces?

Add an Additional Question:

• What emotion is the individual expressing?

Enriched data, Analysis Completed (example 1):

• Gender: Female
• Style of Dress: Dressy Casual
• Outerwear: Trenchcoat
• Sweater / Sweatshirt: V Neck
• Sweater / Sweatshirt Pattern: Solid
• Shirt: Blouse
• Shirt Pattern: Solid
• Clothing-Bottom: Pants
• Footwear / Shoes: Sandals
• Laces: No Laces
• addQuestionResponse: She seems happy.

Enriched data, Analysis Completed (example 2):

• Gender: Female
• Style of Dress: Dressy Casual
• Outerwear: None
• Sweater / Sweatshirt: None
• Sweater / Sweatshirt Pattern: None
• Shirt: None
• Shirt Pattern: None
• Clothing-Bottom: Dress
• Footwear / Shoes: High Heels
• Laces: No Laces
• addQuestionResponse: Happy.

Searchable/sortable view, Sort By: Shirt Pattern (Solid, Striped, None).

The Labs used a version of this crowd-powered data enrichment platform for a “Shop the Look” use case (see Figure 4). The approach illustrates 1) the process of starting with raw data and using the crowd to respond to questions and assign attributes; 2) completing analysis to generate enriched data, which can then be searched and sorted; and 3) conducting an attribute-based search to complete the analysis and recommend products to recreate the look.
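Once the crowd has assigned attributes, searching, sorting and product matching reduce to simple operations on structured records. The sketch below reuses attribute names and values from the Figure 4 examples; the photo ids, the `CATALOG` entries and the SKU codes are hypothetical, and this is not the Labs' implementation.

```python
enriched_photos = [
    {"id": "photo_1", "Gender": "Female", "Style of Dress": "Dressy Casual",
     "Outerwear": "Trenchcoat", "Shirt": "Blouse", "Shirt Pattern": "Solid",
     "Clothing-Bottom": "Pants", "Footwear / Shoes": "Sandals"},
    {"id": "photo_2", "Gender": "Female", "Style of Dress": "Dressy Casual",
     "Outerwear": "None", "Shirt": "None", "Shirt Pattern": "None",
     "Clothing-Bottom": "Dress", "Footwear / Shoes": "High Heels"},
]

# Hypothetical product catalog keyed by the same attribute names.
CATALOG = [
    {"sku": "SKU-1", "category": "Outerwear", "type": "Trenchcoat"},
    {"sku": "SKU-2", "category": "Footwear / Shoes", "type": "High Heels"},
    {"sku": "SKU-3", "category": "Shirt", "type": "Blouse"},
]


def search(records, criteria):
    """Attribute-based search: keep records that match every attribute in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def sort_by(records, attribute):
    """Sort records alphabetically by one attribute value."""
    return sorted(records, key=lambda r: r.get(attribute, ""))


def shop_the_look(photo, catalog):
    """Recommend catalog products whose type matches the photo's value for that category."""
    return [p["sku"] for p in catalog if photo.get(p["category"]) == p["type"]]


with_dress = search(enriched_photos, {"Clothing-Bottom": "Dress"})
by_pattern = [r["id"] for r in sort_by(enriched_photos, "Shirt Pattern")]
look = shop_the_look(enriched_photos[0], CATALOG)
```

None of these operations would be possible on the raw photographs; they depend entirely on the structured attributes added during enrichment.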

The Labs also developed an apparel attribute tagging game in which the crowd contributor races against the clock, providing judgments related to a particular image to compete for a reputation or prize (see Figure 5). Possible variations of this game include a multiplayer game in which players compete against each other. To access a free crowd to play the game, a retailer would want to reach millions of people who follow style and fashion apparel by posting the game on platforms such as Instagram. These people would have the appropriate context, background and domain expertise to provide useful meta-data.

FIGURE 5: “Shop the Look” game.


With the explosion of new sources of data, cleansing and enrichment are growing increasingly important, and automatic analysis alone still handles only a part of that challenge. By combining paid and free crowdsourced data enrichment with automation, and using the Labs’ prototype platform to bring human labor into the processing loop, companies can channel the many potential streams of unstructured data into a continuously flowing river of useful data. As this area matures, companies will want to integrate crowdsourcing seamlessly with other business systems, and work to combine mastery of the crowd with cognitive computing techniques.

Crowd Powered Data Enrichment Conclusion

As Accenture discusses crowdsourcing and data enrichment with a broad range of clients, from clothing retailers and grocers to enterprises with large internal knowledge bases to manage, we are hearing a similar story in many different contexts.


Authors

About Accenture

Accenture is a leading global professional services company, providing a broad range of services and solutions in strategy, consulting, digital, technology and operations. Combining unmatched experience and specialized skills across more than 40 industries and all business functions, and underpinned by the world’s largest delivery network, Accenture works at the intersection of business and technology to help clients improve their performance and create sustainable value for their stakeholders. With approximately 373,000 people serving clients in more than 120 countries, Accenture drives innovation to improve the way the world works and lives. Visit us at www.accenture.com.

Notes

1 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4387295/; http://mobihealthnews.com/44366/fda-taps-patientslikeme-to-test-the-waters-of-social-media-adverse-event-reporting/

2 https://en.wikipedia.org/wiki/ESP_game

3 https://www.google.com/recaptcha/intro/index.html

4 http://www.nytimes.com/2011/03/29/science/29recaptcha.html?_r=0

About Accenture Technology Labs

Accenture Technology R&D is the dedicated research and development organization within Accenture that includes the Technology Vision group, Accenture Open Innovation and Accenture Technology Labs.

For more than 20 years, Accenture Technology R&D has helped Accenture and its clients convert technology innovation into business results. Our R&D organization explores new and emerging technologies to create a vision of how technology will shape the future and shape the next wave of cutting-edge business solutions.

It is physically located across six locations: Beijing, China; Sophia Antipolis, France; Bangalore, India; Dublin, Ireland; Silicon Valley and Arlington, Virginia in the United States.

16-1270U/9-10897

This document makes descriptive reference to trademarks that may be owned by others. The use of such trademarks herein is not an assertion of ownership of such trademarks by Accenture and is not intended to represent or imply the existence of an association between Accenture and the lawful owners of such trademarks.

Copyright © 2016 Accenture All rights reserved.

Accenture, its logo, and High Performance Delivered are trademarks of Accenture.

Alex Kass

Manish Mehta