Reflexive creativity in the digital age
The data scientist identity
Seminar of the PARI Chair (Séminaire de la chaire PARI)
Philipp Brandt, Sciences Po
October 16, 2019
Plan
• Data science
Unconventional work with data
• Data science in history
New approaches in quantitative fields
• Data science of data science
Experimenting with novel skill combinations
• Professional identity from reflexive creativity
New boundaries around old ideas and familiar problems
Data science
Rachel Schutt, a senior research scientist at Johnson Research Labs,…: “a hybrid computer scientist software engineer statistician. … The best tend to be really curious people, thinkers who ask good questions and are O.K. dealing with unstructured situations and trying to find structure in them.”
“Data Scientist” in U.S. newspapers; Source: Factiva
Professional identity
Professions and Expertise
Professional identities are plentiful.
How have data experts constructed a novel one?
Reflexive creativity!
History
• 2012: “The sexiest job of the 21st century” (Davenport and Patil)
• 2005: “Data science as an academic discipline” (Smith)
• 2001: “Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics” (Cleveland)
• 1998: “What is Data Science?” (Hayashi in Data Science, Classification, and Related Methods)
History
Scientific impurity
[Figure: timeline (Time axis, 1996 to 2012); labels: Statistics, Data science, Practical discourse, Social sciences, Classification Association, CODATA]
Data science of data science
Data science jobs
You will handle data exploration, hypothesis creation (from both business and product goals), testing algorithms, scaling to large data-sets and validating results. We have a broad set of technologies with which the Senior Data Scientist will work: Hadoop/HDFS; Shark/Spark; NoSQL databases, and numerous charting, graphing and analysis applications such as: Gephi, Google Charts, etc. [...] We’d like to see good coding skills covering some procedural as well as statistical or data oriented languages. (Such as: Java, Scala, Python as well as R, SQL, etc.). Good communication skills and an awareness of how to communicate data effectively is a must. This individual must be comfortable working in newly forming ambiguous areas where learning and adaptability are key skills. Required Education: MS or higher in the field of Statistics or Computer Science or Applied Mathematics.
Data structure
Job title | Time | N (in 1,000)
Attorney | 2013 | 2.9
Data scientist | 2013 | 1.8
Software engineer | 2013 | 8.4
Random | 2010-2013 | ~40
Source: LinkedIn API; US jobs
Analytical pipeline
[Diagram: job postings labeled “Random” or “Data scientist”; skills (skill 1, skill 2, …) are extracted from the posting text (“Blah bla skill 1 bla skill 2 bla …”) and coded as binary indicators in a table with columns S1, S2, …, Title (rows such as 1, 0, …, 1 and 0, 1, …, 0).]
$\Pr(y_i = 1) = \operatorname{logit}^{-1}(X_i \beta)$
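The pipeline on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the talk: the skill vocabulary, the toy postings, and the function names (`encode`, `fit_logit`, `predict`) are all hypothetical, and the fitting routine (plain gradient ascent with a small L2 penalty, applied to the intercept as well) is just one simple way to estimate a logit model of the form Pr(y_i = 1) = logit^-1(X_i β).

```python
import math

# Hypothetical skill vocabulary and toy postings (not the LinkedIn data).
SKILLS = ["python", "hadoop", "statistics", "litigation", "contracts"]

POSTINGS = [
    ("data scientist", "We need Python, Hadoop and statistics experience."),
    ("data scientist", "Strong statistics and Python skills required."),
    ("data scientist", "Hadoop pipelines; statistics a plus."),
    ("random", "Experience with litigation and contracts."),
    ("random", "Draft contracts; some Python scripting."),
    ("random", "Litigation support role."),
]


def encode(text):
    """Code a posting as binary skill indicators S1, S2, ... (1 = skill mentioned)."""
    lowered = text.lower()
    return [1 if skill in lowered else 0 for skill in SKILLS]


def inv_logit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, xi):
    """Pr(y_i = 1) for one indicator row; beta[0] is the intercept."""
    return inv_logit(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))


def fit_logit(X, y, steps=2000, lr=0.5, l2=0.01):
    """Estimate beta by gradient ascent on the penalized log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * (g / n - l2 * b) for b, g in zip(beta, grad)]
    return beta


# Title indicator: 1 for "data scientist" postings, 0 for the random sample.
X = [encode(text) for _, text in POSTINGS]
y = [1 if title == "data scientist" else 0 for title, _ in POSTINGS]
beta = fit_logit(X, y)

# Score an unseen (made-up) posting.
print(predict(beta, encode("Statistics and Hadoop wanted")))
```

Skills whose coefficients are clearly positive are the ones that set data scientist postings apart from the random sample in this toy setup; with real data one would use a vetted statistics library rather than the hand-rolled fit shown here.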
Performance
Professional recognition
Sociological puzzle
• Employers associate data scientists with a distinct skill set
• But: this skill set has formed independently of the data scientist label
Professional identity from reflexive creativity
Concepts
• Structures
Language, capitalism, states (Sewell 1992)
Technological objects (Bechky 2003; Bowker 2005)
Professional rhetorics (Fine 1996; Suddaby and Greenwood 2005)
• Relations
Commit, mediate, select (White 2008)
Economic advice (Vedres and Stark 2010; Burt 1987)
Professional conflicts & expertise networks (Abbott 1988; Eyal 2013)
• Practice
Narrative identity construction (Somers 1994)
Creativity and the self (Dewey 1929; Mead 1934; Joas 1996)
Provisional selves of professionals (Ibarra 1999)
Reflexive creativity
• “Creative action [builds on] pre-reflective aspirations towards which the reflection on the concretization of values is oriented.” (Joas 1996:163)
“just one term … to get those annoying research scientists to do some real work …”
–Jeff Hammerbacher, data scientist
Qualitative design
• Field observations between 2012 and 2015
• “Meet-ups” and public events in New York City
• Open Statistical Programming, Data Driven NYC, New York Data Science, MongoDB meet-up
• 70+ events, 100+ speakers
“By the way, you really, really, really don’t want me to optimize clicks. … if you ever want to have a great click-through campaign, all you need to do is to show the ad on the flashlight app. … It’s a whole bunch of people fumbling in the dark. They will click on it eventually.”
“So ultimately what you need is something I call alternative histories, or counterfactual …”
–Claudia, advertising
“for any of you [in the audience] who are hiring data scientists …, it is really important that you get clear on which one of these buckets you predominantly fit in.”
–Jake, data science institute
“… part of the value your graduates could add would be looking at the big picture and saying look, here is where I could add value …”
–Attendee
“And the best way is actually … just to come up with very specific examples, …”
–Jake, data science institute
“… we talk to the people in the company who are in charge of the business systems about what data that they have and what data we can have access to. They often speak in terms of cubes and enterprise data warehouses. And so we wanna get to the raw logs.”
“And often get initial pushback because they want to know ‘why do you want to deal with the messy logs if we have these nice, clean data warehouses.’”
–Rachel, media company
Different boundaries
“just one term … to get those annoying research scientists to do some real work …”
“This new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, …”
The data scientist identity
• Reflexive creativity:
Data scientists reconcile their initial ideas with challenges to their self-presentations
• Boundaries and relations:
Technical ideas and common imagery
Clients, scientists and peers
• Shaping data science:
Professions, academics, intellectuals, intelligentsia
Data “hackers” vs. data do-gooders
References
• Abbott, Andrew. 1988. The system of professions: An essay on the division of expert labor. University of Chicago Press.
• Bearman, Peter S, and Katherine Stovel. 2000. "Becoming a Nazi: A model for narrative networks." Poetics 27 (2-3): 69-90.
• Bechky, B. A. 2003. "Object lessons: Workplace artifacts as representations of occupational jurisdiction." American Journal of Sociology 109 (3): 720-752.
• Bowker, Geoffrey C. 2005. Memory practices in the sciences. Vol. 205. Cambridge, MA: MIT Press.
• Burt, Ronald S. 1987. "Social Contagion and Innovation: Cohesion versus Structural Equivalence." American Journal of Sociology 92 (6): 1287-1335.
• Collins, H. M., and Robert Evans. 2007. Rethinking expertise. Chicago: University of Chicago Press.
• Dewey, John. 1929. The quest for certainty: a study of the relation of knowledge and action. Gifford lectures, 1929. New York: Minton, Balch.
• Eyal, Gil. 2013. "For a Sociology of Expertise: The Social Origins of the Autism Epidemic." American Journal of Sociology 118 (4): 863-907.
• Ibarra, Herminia. 1999. "Provisional selves: Experimenting with image and identity in professional adaptation." Administrative Science Quarterly 44 (4): 764-791.
• Joas, Hans. 1996. The Creativity of Action. Chicago: University of Chicago Press.
• Marres, Noortje. 2017. Digital sociology: The reinvention of social research. John Wiley & Sons.
• Mead, George Herbert. 1934. Mind, self and society. Chicago: University of Chicago Press.
• Salganik, Matthew J. 2017. Bit by bit: social research in the digital age. Princeton University Press.
• Sewell, William H., Jr. 1992. "A Theory of Structure: Duality, Agency, and Transformation." American Journal of Sociology 98 (1): 1-29.
• Somers, Margaret R. 1994. "The narrative constitution of identity: A relational and network approach." Theory and Society 23 (5): 605-649.
• Suddaby, Roy, and Royston Greenwood. 2005. "Rhetorical strategies of legitimacy." Administrative Science Quarterly 50 (1): 35-67.
• White, Harrison C. 2008. Identity and control: How social formations emerge. 2nd ed. Princeton University Press.
Higher education
• Statistics, applied mathematics, mathematics with a statistics focus
• Master of Science in Analytics at Louisiana State University’s E.J. Ourso College of Business
• Master of Professional Studies in applied statistics, data science option (Cornell University)
• M.S./Ph.D. in Data Science (New York University, Purdue University)
Quantitative design
Job title | N (in 1,000)
Attorney | 2.9
Data scientist | 1.8
Financial advisor | 1.8
Risk analyst | 1.2
Software engineer | 8.4
Random | ~40
Professional recognition
[Figure panels: Data science, Financial advisors, Law]
Broader pattern
Name | Setting | Story: Motivation | Story: Proposition | Story: Presentation
Claudia | Advertising | Advertising | counterfactuals | flashlight app
Aaron | Education | Education | validity | guys next door
Jake | Human rights | Violations | accuracy | We’re not in Jordan
John | Digital design | Schedule | polytopes | lunch breaks
Riley | Accommodation | Efficiency | data repository | democratizes data
Rachel | Mass media | Business | raw logs | chaos, tactics
Row-group labels (vertical, in the margin): Consulting; Contracts
Name | Setting | Story: Motivation | Story: Proposition | Story: Presentation
John | Photo sharing | Technique | R. A. Fisher | titans and blessing
Tristan | Analytics platform | Probabilistic programming | Bayes, von Neumann | google it, authority
Hilary | URL management | OSEMN | Chris Wiggins | Columbia University
Adam | Blogging platform | Programming system | John D. Cook | programming philosopher
David | Video captioning | Deep learning | Krizhevsky | competition, Stanford
Yann | Social networking | Deep learning | Scientists | hesitant academics
Hannah | Tech research | Social problems | Social scientists | astronomers
Hilary | URL management | Arab spring | researchers | experts
Vaclav | Online dating | Study | PNAS | top-rated journal
Name | Setting | Story: Motivation | Story: Proposition | Story: Presentation
Michael | Digital marketing | MapReduce | Iterative work | Claudia
Jeremy | Data science platform | MapReduce | Pragmatism | Claudia
Riley | Accommodation | Responsibility | Fueled by data | Square
John | Social network | Technique | No hand-tuning | John Langford
Rachel | Mass media | Skills | Coding | no scientist or managers
Michael | Data science training | Skills | Recursion | coding reviews
Riley | Accommodation | Hiring | Practice | Test day
Jake | Data science institute | Hiring | Uncertainty | Examples
Row-group labels (vertical, in the margin): Imaginary; Real
Name | Setting | Story: Motivation | Story: Proposition | Story: Presentation
Michael | Digital marketing | MapReduce | Iterative work | Claudia
Jeremy | Data science platform | MapReduce | Pragmatism | Claudia
Riley | Accommodation | Responsibility | Fueled by data | Square
John | Social network | Technique | No hand-tuning | John Langford
Rachel | Mass media | Skills | Coding | no scientist or managers
Michael | Data science training | Skills | Recursion | coding reviews
Riley | Accommodation | Hiring | Intelligence | IQ tests
Jake | Data science institute | Hiring | Methods | Buckets
Row-group labels (vertical, in the margin): Imaginary; Real