Challenges in Intelligence Analysis Under Data Overload

Emily S. Patterson, PhD

Research Scientist; Associate Director, Converging Perspectives on Data (CPoD)


Page 1: Challenges in Intelligence Analysis Under Data Overload (csel.eng.ohio-state.edu/.../4_Institutes/2005/Patterson2005_CPoDKickoff-Slides.pdf)


Page 2:

Intelligence Analysis: Avoiding Surprise

August 7, 1998 Bombings of US Embassies in Africa (Nairobi and Dar es Salaam)

224 killed, including 12 US personnel

Page 3:

Data Overload Definition

•  Data overload in inferential analysis*:

–  A condition where a domain practitioner, supported by artifacts and other human agents, finds it extremely challenging to focus in on, assemble, and synthesize the significant subset of data for the problem context into a coherent assessment of a situation, where the subset of data is a small portion of a vast data field

*See Woods, Patterson, and Roth, 2002 for alternative definitions

Page 4:

Interdisciplinary (CSE + Design) Approach

•  Identify vulnerabilities that can lead to failures to meet work demands

•  Reduce vulnerabilities with innovations that:

–  Cope with context sensitivity in interpreting the meaning of data

–  Are robust to brittleness in machine processing

•  Focus on leverage points:

1.  Process vulnerability (either now or in the future) that has high consequences for failure

2.  “New” technological or organizational capability

3.  Confidence in predicting impact on performance

•  Well-developed “design” and “science” research base, experience in other worlds…

Page 5:

CSEL “Study base” on Intel Analysis

Studies:

•  10 expert (13 yrs) NASIC analysts on Ariane 501

•  2 junior (not cleared) NASIC analysts on Ariane 501

•  6 expert NASIC analysts critiquing junior analyst on Ariane 501

•  6-day observations of army captains (in training) doing collaborative counter-terrorism (Germans in 1940s)

•  4-day observations of army lieutenants (in training) doing modern-day Stability and Support Operations (SASO)

•  Interviews of ~50 army intelligence analysts

•  2 novice, 3 expert NSA analysts critiquing junior NASIC analyst on Ariane 501

•  A lot not in Intel Analysis

Page 6:

Community “knowledge base” (very incomplete – help welcome)

•  Bamford’s “The Puzzle Palace: A Report on America’s Most Secret Agency”

•  Richards Heuer’s Psychology of Intelligence Analysis (1999)

•  Klein Assoc studies of “profilers” (Klein, 2001; Hutchins, 2003; Pirolli et al., 2004)

•  Department of Defense’s “Novel Intelligence in Massive Data / Glass Box”

•  Anthropologically-based needs analysis (Johnston, 2005)

•  Laboratory experiments (Cheikes and Taylor, 2003; Cheikes, Brown, Lehner and Adelman, 2004)

•  Website: http://www.tkb.org has great data on terrorism

•  Friends of the Intelligence Community (FOIC - Brian Moon)

Page 7:

Convergent broadening / narrowing model of decision-making for intelligence analysis

[Figure: stages Down Collect, Conflict and Corroboration, and Hypothesis Exploration, with broadening checks]

Page 8:

Down Collect

•  10 NASIC experts, 1 novice doing Ariane 501:

–  Refine until manageable (22 – 419 documents)

–  Open based on dates and titles (4 – 29 documents)

–  Rely heavily on a small number (1-4 documents)

•  NSA expert: Start with key terms…something jumps out at me and I follow that route…“I know it when I find it”…Always look for dates - current means less than 2 years…do anything to reduce the number of hits…Wean by year…I go through 2-3 filter and sort processes (hi-level, sorting, choosing what to use)…story? Outliers? Conflicts?

•  NSA expert: 57 hits, that’s nothing…lead information, jot down tidbits for later digging, anything that can be pulled on, names, companies, software, buildings…biggest problem is getting the right search terminology to get what you want…‘gold nuggets’ concept…I build a house, get started and fill in…write down search terms on paper, but do most of the analysis in my head

•  NSA expert: These search terms are generic, going to get lots of the same stuff…query is too broad…add in ‘economic impact’ or ‘political’…looking for consensus on what happened…take little trails off the main path to investigate subtopics
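The narrowing tactics above (refine until the hit list is manageable, filter on dates, “wean by year”, add terms when a query is too broad) can be sketched in Python. This is a minimal illustration under stated assumptions, not a tool used in the studies: the `Doc` record, the `refine` and `wean_by_year` helpers, the one-year step, and the sample titles and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Doc:
    # Hypothetical hit-list entry: only a title and a date, as analysts saw them.
    title: str
    published: date


def years_between(earlier: date, later: date) -> float:
    """Approximate age in years between two dates."""
    return (later - earlier).days / 365.25


def refine(docs: List[Doc], *terms: str) -> List[Doc]:
    """Narrow a query by requiring every term to appear in the title (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [d for d in docs if all(t in d.title.lower() for t in lowered)]


def wean_by_year(docs: List[Doc], as_of: date, max_hits: int,
                 start_years: int = 10) -> Tuple[int, List[Doc]]:
    """Shrink a recency window one year at a time until at most max_hits remain
    (or the window reaches one year); return the window and the hits, newest first."""
    window = start_years
    kept = [d for d in docs if years_between(d.published, as_of) < window]
    while len(kept) > max_hits and window > 1:
        window -= 1
        kept = [d for d in docs if years_between(d.published, as_of) < window]
    return window, sorted(kept, key=lambda d: d.published, reverse=True)


if __name__ == "__main__":
    # Hypothetical hit list; titles and dates are invented for illustration.
    hits = [
        Doc("Ariane 501 launch failure", date(1996, 6, 5)),
        Doc("Ariane 501 software review", date(1996, 7, 20)),
        Doc("Ariane 5 economic impact", date(1995, 3, 1)),
        Doc("Ariane 4 launch history", date(1990, 1, 1)),
    ]
    print([d.title for d in refine(hits, "ariane", "501")])
    window, recent = wean_by_year(hits, as_of=date(1996, 8, 1), max_hits=2)
    print(window, [d.title for d in recent])
```

Sorting newest first mirrors “open based on dates and titles”; a real system would also need to surface the content cues (source, first lines) that analysts say titles and dates alone do not provide.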

Page 9:

Down Collect

•  NASIC critiquing interviews

–  Look for one or two articles that specifically talk about the incident, get a feel, then go back and search.

–  Use a broad query to pull in lots of things

–  Need to check intelligence sources, not just open source

–  Refine search if you know there should be lots of information out there

–  Only going to get 10% of the data if you're lucky - frequently only 1%

–  Use a broad query if unsure of what is being looked for

–  Documents that have more detailed information might be more valuable

–  Commercial translations miss subtleties - use translators to get all of the little connotations for critical data

–  Documents before an event can give you background that might not come up in later documents

Page 10:

Down Collect (Document Selection)

•  NSA expert: I do quick glancing, I don’t read whole documents right away; names, technology, places, too many nuances for an automated system…

•  NSA expert:

–  scandals and dirt and stolen intellectual property are always important to find…Event recognition from newspapers…

–  there are usually 4-5 I’m quoting large chunks from; requirements for me:

•  doc uses certain set of phrases

•  provides a good succinct history of matter at hand

•  good lay-translation

•  written 2-6 weeks after the event in question

–  historical familiarity – favorite source…sometimes serendipity…good analysis

–  “editorial” style docs more useful for tidbits and pointers, someone’s opinion, some fact but lower weighting…sometimes useful to capture the debate

•  NSA novice: Q: If you could see more than date and title, what would you want? Set up like MS Outlook preview, 1st three lines…want to see source of document too. I go straight down the list 1, 2, 3.

Page 11:

Low and High Profit Documents (Indistinguishable with Current Interface)

Page 12:

Outcome Comparison: Did Not Rely on High Profits vs. Did

* Significant difference using Wilcoxon-Mann-Whitney non-parametric test
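The comparison between analysts who did and did not rely on high-profit documents is reported with the Wilcoxon-Mann-Whitney test. As a reminder of what that test computes, here is a minimal Python sketch of the U statistic with tie-averaged ranks and a normal-approximation two-sided p-value. The function names and sample scores are hypothetical, and none of the study's data are reproduced. With groups as small as those in these studies, an exact test (for example via SciPy's `scipy.stats.mannwhitneyu`) is preferable to the approximation used here.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) using the normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                 # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2          # U = R1 - n1(n1 + 1)/2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Hypothetical outcome scores for the two groups of analysts.
    did_not_rely = [2, 3, 3, 4]
    relied = [5, 6, 6, 7]
    u, p = mann_whitney_u(did_not_rely, relied)
    print(f"U = {u}, approximate two-sided p = {p:.3f}")
```

Because the test works on ranks rather than raw scores, it does not assume normally distributed outcomes, which suits small samples of expert performance ratings.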

Page 13:

“Narrowing” Search Tactics

Page 14:

Conflict and Corroboration

•  10 NASIC experts, 2 novices doing Ariane 501:

–  Trust “key” documents

–  Mixed on whether they explicitly search for conflicting assessments (high level only, not on details)

–  Mixed on whether they reference sources (considered unprofessional to put multiple explanations in analysis document)

–  (When noticed) break ties based on “quality” attributes:

•  Language: technical expertise, translation, biased interpretation, “facts” vs. implications, past vs. future, consensus vs. multiple interpretations, uncertainty

•  Source: reason for deception, access to privileged information, “trustworthiness” of source

Page 15:

Conflict and Corroboration

•  Individual strategies vary - little electronic support

–  Search more documents to break ties

–  Check if multiple reports were from the same press release

–  Look to see whether corrections were made later

–  Look for how things are generally done to see if different

–  Use highlighter pens for all documents on that topic

–  Print out documents and highlight one topic per color from independent sources

–  Highlight “loose ends” phrases in Word with colored font

–  Highlight when data comes from the same original source

–  Track reference information for discrepant information

–  Ask expert in the area

–  Sort printed documents into categories (one paragraph each) and then review for differences of opinion

–  Pick “best” source and cut/paste

Page 16:

Conflict and Corroboration

•  NSA interviews

–  NSA novice: I compare it across multiple sources…gather as much data as possible, and rely on my mentor for feedback/guidance.

–  NSA expert: dealing with contraindicating facts…have to dig hard to deconflict…the weight I put into the source affects how I deconflict facts…aware of ultimate source (creeping validity) problem – in regards to multiple instances of essentially the same data…hard to tell when a cited source has been updated…

–  NSA expert: never take one source’s word…corroborate; want unaffected contributions – sigint, imageint, etc.; go from different data sources…reports can look like multiple sources of information, if there’s no serial number…go outside my document frequently to verify things, and will say I spoke to experts in X shop and they agree that Y is the case…not just citing reports, but saying talked to actual person or office

–  NSA novice: have electronic notes of who said what, reference the document, build all together in one big notes page…write down different POVs
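The “ultimate source (creeping validity)” problem, and the practice of checking whether multiple reports came from the same press release, amount to counting corroboration by original source rather than by report. A minimal Python sketch follows, assuming each report carries a hypothetical `derived_from` link to the report or release it repeats. As the expert notes, such provenance is often missing in practice (for example, when there is no serial number), so this only helps when the links are known. The `Report` type, helper names, and sample IDs are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Report:
    report_id: str
    claim: str
    # Hypothetical provenance link: the report or release this one repeats, if known.
    derived_from: Optional[str] = None


def ultimate_source(report_id: str, by_id: Dict[str, Report]) -> str:
    """Follow derived_from links back to the earliest known origin.

    A link pointing outside the collection (e.g. a press release never retrieved)
    is treated as the origin. The visited set guarantees termination on cycles.
    """
    seen = set()
    current = report_id
    while current in by_id and current not in seen:
        parent = by_id[current].derived_from
        if parent is None:
            break
        seen.add(current)
        current = parent
    return current


def independent_support(reports: List[Report], claim: str) -> int:
    """Count distinct ultimate sources behind a claim instead of raw report counts."""
    by_id = {r.report_id: r for r in reports}
    origins = {ultimate_source(r.report_id, by_id) for r in reports if r.claim == claim}
    return len(origins)


if __name__ == "__main__":
    # Hypothetical reports: three of the four "software fault" reports trace to one release.
    reports = [
        Report("r1", "software fault", derived_from="press-release-7"),
        Report("r2", "software fault", derived_from="press-release-7"),
        Report("r3", "software fault", derived_from="r1"),
        Report("r4", "software fault"),
        Report("r5", "sabotage"),
    ]
    raw = sum(r.claim == "software fault" for r in reports)
    print(raw, independent_support(reports, "software fault"))
```

In this toy example four reports collapse to two independent origins, which is the gap between apparent and actual corroboration that the analysts track by hand with highlighters and notes.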

Page 17:

Conflict and Corroboration

•  NASIC critiquing interviews

–  Talk to other analysts to discuss the problem

–  Where information comes from is very important; it loses validity if 2nd or 3rd hand information

–  It's necessary to corroborate information, might not use if only in one source.

–  Be aware of directed sources, where they only put in what they want you to believe

–  Need multiple sources to confirm data

–  Talk to other people to get their take

–  Reports six months or so after an event (depending on the event) probably have more accurate information than those immediately around the event

–  Take open source with a grain of salt - might be on soap box, misled themselves, intentionally misleading audience

–  Human sources have to have direct knowledge for credibility

Page 18:

Reasons for Inaccurate Statements

1.  Relying on default assumptions

2.  Repeating inaccurate information

3.  Missing updates that overturn analyses

Page 19:

Note-Taking (Hypothesis Exploration)

•  NASIC typical process: cut and paste “snippets” to Word, mixed on saving what document it is from, near the end organize notes into briefing topics (based on question)

•  NSA novice: use copy&paste, and add comments to that

•  NSA expert: I print everything possibly interesting; sorting process on big stack of documents, separate classified from unclassified…lots of footnotes, “cutting and pasting”, open Word, retype selection by hand; when sorting documents, always have a misc. pile (for backbone docs, single instances, tidbits, and info to revisit)…everything gets printed, then I can intermingle classified and unclassified info…highlighted info means “will be included”…use “starring” for multiple-fit areas

•  NSA expert: group anything of interest in a long Word document, sometimes start new notes documents to have versioning points in my reasoning process

•  NSA novice: put important bits at the top of my notes page; start a new document based on topic or new person to track

Page 20:

“Building a House” (Hypothesis Exploration)

•  NSA expert:

–  Look at everything involving satellite failures – is there a pattern?

–  Get context of problem, then start new notes document for information specifically related to Ariane…

–  Better to organize as you go, but gets unwieldy after things get big

–  names, facilities, software (unique identifiers), then do another search on uniques – novice should have looked up “interspace 592”

–  He stopped looking for other reasons after he found software failure – incomplete analysis…doesn’t differentiate enough between ‘lost’ and ‘blown up’ – would restart my search at this point; once you realize something, prove that it wasn’t the case (differential diagnosis)…could have a tool to remind people to both try to prove something right AND prove it wrong

•  NSA expert: novices are not as aware of different perspectives; briefing to General who says “My boys aren’t telling me that” normally only happens once before you realize the importance of referencing
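The expert's idea of a tool that reminds analysts to try to prove a conclusion wrong as well as right resembles Analysis of Competing Hypotheses (ACH), the technique described in Heuer's Psychology of Intelligence Analysis from the community knowledge base. ACH is swapped in here as one concrete illustration; the talk does not prescribe it. The Python sketch below uses hypothetical hypothesis labels, evidence IDs, and a +1/0/-1 rating scheme. It ranks hypotheses by how much evidence is inconsistent with them (ACH's emphasis on disconfirmation) and flags any hypothesis that no evidence has been checked against.

```python
from typing import Dict, List

# Hypothetical rating scheme: +1 consistent, -1 inconsistent, 0 neutral or not yet checked.
Ratings = Dict[str, Dict[str, int]]  # evidence ID -> {hypothesis -> rating}


def inconsistency_counts(matrix: Ratings, hypotheses: List[str]) -> Dict[str, int]:
    """Count the evidence items rated inconsistent with each hypothesis."""
    counts = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for h in hypotheses:
            if ratings.get(h, 0) < 0:
                counts[h] += 1
    return counts


def untested(matrix: Ratings, hypotheses: List[str]) -> List[str]:
    """Hypotheses that no evidence item rates as consistent or inconsistent."""
    return [h for h in hypotheses if all(r.get(h, 0) == 0 for r in matrix.values())]


def rank_hypotheses(matrix: Ratings, hypotheses: List[str]) -> List[str]:
    """Order hypotheses from least to most disconfirming evidence (ties keep input order)."""
    counts = inconsistency_counts(matrix, hypotheses)
    return sorted(hypotheses, key=lambda h: counts[h])


if __name__ == "__main__":
    hypotheses = ["software failure", "hardware failure", "sabotage"]
    matrix: Ratings = {
        "E1": {"software failure": 1, "hardware failure": -1},
        "E2": {"software failure": 1, "hardware failure": 1},
        "E3": {"hardware failure": -1},
    }
    print(rank_hypotheses(matrix, hypotheses))
    print("never tested:", untested(matrix, hypotheses))
```

Here “sabotage” ties with “software failure” on disconfirming evidence only because nothing has been checked against it. That is the incomplete-analysis pattern the expert describes, and the `untested` list makes the gap visible instead of letting the analyst stop at the first hypothesis that fits.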

Page 21:

Briefing (Hypothesis Exploration)

•  NSA expert:

–  Want a timeline of how it is developed (if my job is to fix this)…what else does this affect, what are the recommendations?; want details up front – who what where when why…where the breakdown happened…what can I do about it…convince me we ruled out everything else; a fixation on one singular problem can make you miss other co-existing contributions

–  He’s just answering question, not doing analysis…analysis is going beyond the question - what other things are associated with software failure? don’t collate – analyze…make sense of it and make own assessment…we have gaps in the intelligence community, things we don’t know…people don’t like to say “I don’t know”…when in actuality it will generate new requirements for asset tasking; analysis and collection are pretty linked, have to go through some steps though to task assets

Page 22:

Briefing (Hypothesis Exploration)

•  NSA expert:

–  When I send products out, from big report from another agency, senior management wants a 3 sentence summary; 3 sentences with a detailed attachment.

–  Normal audience types are: management, techies, novices

–  Key findings up front, then go back and fill in my process to getting them

–  Timeframe for products varies:

•  no set time (until critical mass, trend)

•  specific question from boss (less than 1 week)

•  mostly 1-2 days (8-10 hrs actual time)

Page 23:

Predicted Reasoning to Explain Data (Abductive Inference)

Page 24:

Observed Reasoning (“Second order” Abductive Inference)

Page 25:

High-level Summary of Vulnerabilities

Under data overload, with a short deadline and a baseline electronic environment, and particularly for tasks outside their areas of expertise, intel analysts are vulnerable to:

•  Missing information

–  unsophisticated search strategies

–  missed high profit documents

–  sequential search – read/cut&paste – organize – brief

•  Making inaccurate statements

–  relying on default assumptions

–  repeating inaccurate information (from key documents for skeleton and other documents for “gaps”)

–  missing updates that overturn analyses

•  Premature closure

–  uncertain when to stop

–  confidence hard to gauge (highly variable)