Data and ethics Training

LSNTAP Data Ethics when Designing Civil Justice Interventions, May 19th, 2016

Transcript of Data and ethics Training

Page 1: Data and ethics Training

LSNTAP

Data Ethics when Designing Civil Justice Interventions

May 19th, 2016

Page 2: Data and ethics Training

Using GoToWebinar

• Calling with phone? Select Telephone and enter your audio PIN if you haven’t already.

• Calling through computer? If you’re using a microphone and headset or speakers (VoIP), please select Mic & Speakers.

• Have questions? Yes, please! Help us make this as relevant to you as possible. We’ll reserve the last 10 minutes for questions, but feel free to add any questions in the GoToWebinar question box.

• Is this being recorded? Yes. LSNTAP will distribute the information after the training.

Page 3: Data and ethics Training

Make sure you get our infographic after the training!

Page 4: Data and ethics Training

Speakers

Solon Barocas,

Research Associate, Center for Information Technology Policy at Princeton University

Ali Lange,

Policy Analyst, Center for Democracy and Technology's Consumer Privacy Project

Wilneida Negron,

Digital Officer, Florida Justice Technology Center; Fellow at Data & Society Research Institute

Page 5: Data and ethics Training

Agenda

Introduction: Data Ethics When Designing Civil Justice Interventions (Wilneida)

Topic 1: How Machines Learn to Discriminate (Solon)

Topic 2: Digital Decision-Making (Ali)

Questions?

Page 6: Data and ethics Training

What’s Big Data?

Structured, semi-structured, and unstructured data from traditional and digital sources, inside and outside your organization, that enable ongoing discovery and analysis.

Page 7: Data and ethics Training

Big data has the power to improve lives, and often does. 

Page 8: Data and ethics Training

But absent a human touch, its single-minded efficiency can lead to troubling patterns that can isolate groups already at society’s margins.

Page 9: Data and ethics Training

Research has found that big data analytics can:

• Discover useful regularities in a dataset that are just preexisting patterns of exclusion and inequality.

• Inherit the prejudice and biases of prior decision-makers.

Page 10: Data and ethics Training

We and all systems we produce are biased.

Page 11: Data and ethics Training

What Now?

Page 12: Data and ethics Training

Develop a plan

We have a personal responsibility to ourselves and our clients to address the ethical, security, and privacy challenges that arise when working with data.

Page 13: Data and ethics Training

Step 1: Know your data

The Federal Trade Commission recommends asking:

• Quality: Have you accounted for biases at both the collection and analytics stages of big data’s life cycle?

• Accuracy: Is your data representative? If not, take steps to address under- or over-representation (see the sketch below).

• Usability: Do you have the proper staffing to undertake data analytics?
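To make the accuracy question concrete, one simple check is to compare each group’s share of your dataset with an external reference such as census figures for your service area. Below is a minimal sketch in Python; the column name, group labels, and reference shares are hypothetical, not from the training.

```python
import pandas as pd

# Hypothetical reference shares for the service area (e.g., from census data).
REFERENCE_SHARES = {"group_a": 0.45, "group_b": 0.35, "group_c": 0.20}

def representation_gaps(df: pd.DataFrame, column: str = "demographic") -> pd.Series:
    """Difference between each group's share of the data and its reference
    share. Negative values flag under-representation."""
    observed = df[column].value_counts(normalize=True)
    reference = pd.Series(REFERENCE_SHARES)
    # Groups absent from the data show up as -reference (fully missing).
    return (observed - reference).fillna(-reference)

# Example: an intake dataset where group_c is under-represented.
intake = pd.DataFrame(
    {"demographic": ["group_a"] * 50 + ["group_b"] * 40 + ["group_c"] * 10}
)
print(representation_gaps(intake).round(2))  # group_c: -0.10
```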

Page 14: Data and ethics Training

Step 2: Examine data sensitivities for the communities you hope to serve

• Outline reasons for collecting personal, community, or demographic identifiable information;

• Identify communities that could be adversely affected, and how;

• Could your data reinforce existing disparities in terms of ethnicity, identity, gender, race, class, sexuality, disability, language, religion, size, citizenship status, geography, etc.?

Page 15: Data and ethics Training

Step 3: Know the consumer protection laws applicable to big data practices

• Fair Credit Reporting Act

• Equal opportunity laws:
  • Equal Credit Opportunity Act (“ECOA”)
  • Title VII of the Civil Rights Act of 1964
  • Americans with Disabilities Act
  • Age Discrimination in Employment Act
  • Fair Housing Act
  • Genetic Information Nondiscrimination Act

• Federal Trade Commission Act

• State and local laws

Page 16: Data and ethics Training

Step 4: Include and empower your clients!

• Evaluate the data literacy of your clients;

• Identify low-hanging-fruit ways to educate your clients;

• Be transparent and let them know how their data is being used.

Page 17: Data and ethics Training

Wormhole into the future

With increasing use of predictive analytics, triage algorithms, justice portals, expert systems, and document assembly, will the civil justice community soon need:

• A multi-disciplinary Data Ethics committee?

• Institutional Review Boards?

• Responsible Data Program Managers?

Page 18: Data and ethics Training

How Machines Learn to Discriminate

Solon Barocas
Center for Information Technology Policy

Princeton University

Page 19: Data and ethics Training

Discrimination Law: Two Doctrines

Disparate Treatment
• Formal
• Intentional

Disparate Impact
• Unjustified
• Avoidable

“Protected Class”

Page 20: Data and ethics Training
Page 21: Data and ethics Training

Uncounted, Unaccounted, Discounted

• The quality and representativeness of records might vary in ways that correlate with class membership:
  • less involved in the formal economy and its data-generating activities
  • unequal access to and less fluency in the technology necessary to engage online
  • more likely to avoid contact with specific institutions
  • less profitable customers or less important constituents, and therefore less interesting as targets of observation

• Convenience sample
  • Data gathered for routine business or government purposes tend to lack the rigor of social scientific data collection

• Analysts may not have any alternative or independent mechanism for determining the composition of the population

Page 22: Data and ethics Training
Page 23: Data and ethics Training

Dealing with Tainted Examples

• Training data serve as ground truth
  • These would seem like well-performing models according to standard evaluation methods (illustrated in the sketch after this list)

• What the objective assessment should have been
  • Accepted and rejected candidates may not differ only in terms of protected characteristics

• How someone would have performed under different, non-discriminatory circumstances
  • The difficulty in dealing with counterfactuals and correcting for past injustices
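To illustrate the first point: when past decisions serve as the labels, a model can look excellent under standard evaluation while faithfully reproducing the historical bias. A minimal simulation, where the groups, numbers, and penalty are all synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)        # synthetic protected attribute
skill = rng.normal(0, 1, n)          # true qualification, identical across groups

# Tainted ground truth: past decision-makers penalized group 1.
hired = (skill - 1.0 * group + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([skill, group])
model = LogisticRegression()

# Standard evaluation looks fine because it scores against the biased labels.
print("CV accuracy vs. tainted labels:",
      cross_val_score(model, X, hired, cv=5).mean().round(2))

# Yet the model reproduces the disparity between equally skilled groups.
preds = model.fit(X, hired).predict(X)
print("Selection rate, group 0:", preds[group == 0].mean().round(2))
print("Selection rate, group 1:", preds[group == 1].mean().round(2))
```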

Page 24: Data and ethics Training

Settling on a Selection of Features

• Does the feature set provide sufficient information to carve up the population in a way that reveals relevant variations within each apparent sub-group?
  • Unintentional redlining

• In other words: how does the error rate vary across the population? (See the sketch after this list.)
  • Discrimination can be an artifact of statistical reasoning rather than prejudice on the part of decision-makers or bias in the composition of the dataset

• Does the difficulty or cost involved in obtaining the information necessary to bring accuracy rates into closer parity justify subjecting certain populations to worse assessment?
  • Parity = Fair?
  • Accurate = Fair?
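One way to operationalize “how does the error rate vary across the population?” is to slice evaluation results by subgroup rather than reporting a single aggregate number. A minimal sketch; the column names are hypothetical:

```python
import pandas as pd

def error_rate_by_group(results: pd.DataFrame) -> pd.Series:
    """Per-group error rate. A large spread means some sub-groups receive a
    systematically worse assessment even when overall accuracy looks good."""
    errors = results["prediction"] != results["actual"]
    return errors.groupby(results["group"]).mean()

results = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B"],
    "prediction": [1, 0, 1, 1, 1, 0],
    "actual":     [1, 0, 1, 0, 0, 0],
})
print(error_rate_by_group(results))  # A: 0.00, B: 0.67
```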

Page 25: Data and ethics Training

Granularity of the Data (High / Low) vs. Effects on historically disadvantaged communities (Benefit / Harm)

High granularity / Benefit:
• Discovering attractive customers and candidates in populations previously dismissed out of hand (financial inclusion)
• Evidence-based and formalized decision-making

High granularity / Harm:
• Less favorable treatment in the marketplace: finding specific customers not worth servicing (e.g., “firing the customer”)
• Individualization of risk

Low granularity / Benefit:
• Equal treatment in the marketplace: a common level of service and uniform price
• Socialization of risk

Low granularity / Harm:
• Underserving large swaths of the market (redlining)
• Informal decision heuristics plagued by prejudice and implicit bias

Page 26: Data and ethics Training
Page 27: Data and ethics Training

Dealing with “Redundant Encodings”

• In many instances, making accurate determinations will mean considering factors that are somehow correlated with legally proscribed features (a rough diagnostic is sketched after this list)

• There is no obvious way to determine how correlated a relevant attribute or set of attributes must be with proscribed features to be worrisome

• Nor is there a self-evident way to determine when an attribute or set of attributes is sufficiently relevant to justify its consideration, despite the fact that it is highly correlated with these features
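One rough diagnostic for redundant encoding (my suggestion, not something prescribed on the slide) is to test how well the supposedly neutral features predict the protected attribute itself. A sketch assuming a pandas DataFrame with hypothetical column names and a binary protected attribute:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def encoding_strength(df: pd.DataFrame, features: list, protected: str) -> float:
    """Cross-validated ROC AUC for predicting the protected attribute from the
    candidate features. Near 0.5 means little redundancy; near 1.0 means the
    features jointly encode the protected attribute."""
    scores = cross_val_score(
        LogisticRegression(max_iter=1000),
        df[features], df[protected],
        cv=5, scoring="roc_auc",
    )
    return float(scores.mean())

# Hypothetical usage: do zip code and income jointly encode race?
# df = pd.read_csv("intake.csv")
# print(encoding_strength(df, ["zip_code", "income"], "race"))
```

How much redundancy is too much is exactly the open question the slide raises; the score only makes the trade-off visible.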

Page 28: Data and ethics Training

Let’s Not Forsake Formalization

• These moments of translation are opportunities to debate the very nature of the problem, and to be creative in parsing it

• The process of formalization can make explicit the beliefs, values, and goals that motivate a project

Page 29: Data and ethics Training

Solon Barocas and Andrew Selbst, “Big Data’s Disparate Impact,” California Law Review, Vol. 104, 2016

Solon Barocas
Center for Information Technology Policy
Princeton University
[email protected]

Page 30: Data and ethics Training

Digital Decisions: Advocacy Perspective on the Risks and Benefits of Data-Driven Automated Decision Making

May 19, 2016

Page 31: Data and ethics Training

The Center for Democracy & Technology

About CDT

The Center for Democracy & Technology is a nonpartisan, nonprofit technology policy advocacy organization. The internet empowers, emboldens, and equalizes people around the world. We are dedicated to protecting civil liberties and human rights online.

CDT is known for:
● Convening industry representatives, researchers, government officials, and civil rights advocates
● Bringing academic rigor to advocacy work
● Policy recommendations informed by technical as well as legal expertise

Page 32: Data and ethics Training

Project Summary

The Digital Decisions Project

Sophisticated statistical analysis is a pillar of decision-making in the 21st century, including in employment, lending, and policing. Automated systems also mediate our access to information and community through search results and social media. These technologies are pivotal to day-to-day life, but the processes that govern them are not transparent.

CDT is working with stakeholders to develop guidance that ensures the rights of individuals, encourages innovation, and creates design incentives that promote responsible use of automated decision-making technology.

Page 33: Data and ethics Training

Background

Automated Decision-Making Systems
● Are present in all sectors
● Have a varying degree of importance or impact on individuals
● What is unique about data-driven discrimination?
  ○ The speed and extent of the technology increase the potential for its obscurity to frustrate and disenfranchise people

Civil rights and privacy advocates have expressed concern that this erodes accountability and fairness.

Page 34: Data and ethics Training

Background - Disparate Impact and Nature of Harms

Some harms are more immediate for individuals, and others are cumulative and may be more visible when looking at the impact on a group or society at large.

● Insult to dignity
● Discrimination on the basis of a protected class
● Exacerbating and/or perpetuating historic inequality
● Error that disproportionately impacts a particular group

Page 35: Data and ethics Training

Background -- Seeking Solutions

“Civil Rights Principles for the Era of Big Data”

● Stop High-Tech Profiling
● Ensure Fairness in Automated Decisions
● Preserve Constitutional Principles
● Enhance Individual Control of Personal Information
● Protect People from Inaccurate Data

Signatories: American Civil Liberties Union, Asian Americans Advancing Justice, Center for Media Justice, ColorOfChange, Common Cause, Free Press, The Leadership Conference on Civil and Human Rights, NAACP, National Council of La Raza, National Hispanic Media Coalition, National Urban League, NOW Foundation, New America Foundation’s Open Technology Institute, and Public Knowledge.

Page 36: Data and ethics Training

Background

● Stakes are high for consumers (individuals)
● Diversity of contexts makes hard-and-fast rules difficult to conceive or apply
● Look to existing examples for guidance
● Translate the Civil Rights Principles for the Era of Big Data into actionable steps for private companies

Page 37: Data and ethics Training

Digital Decisions--Phases of Automation

Page 38: Data and ethics Training

Digital Decisions--Phases of Automation

Design: Identify Inputs
● What is the source of your data?
● Was the data collected first-hand by humans?
● Did they have any perspective or incentive structure that may have influenced the collection of this data?
● If the data was collected directly from users, did they have an equal opportunity to provide data inputs in a machine-readable format? (There is a higher likelihood of error in a handwritten form than in a typed submission.)
● How can you clean the data to ensure that historic or collection bias does not influence your results for this purpose?
● Is the data representative of the relevant population? Is any population missing or underrepresented? If so, can you find additional data to make your data set more robust?
● Are there any fields or features that should be explicitly prohibited from inclusion at the outset of your design process? For example, are race, gender, and other sensitive characteristics automatically excluded from inputs, or are there times when they are acceptable? (A sketch of one such exclusion check follows this list.)
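As one concrete answer to the last question, a team can maintain an explicit blocklist of prohibited fields and strip them, visibly rather than silently, before any modeling. A minimal sketch with hypothetical field names:

```python
import pandas as pd

# Hypothetical blocklist agreed on at the design stage.
PROHIBITED_FIELDS = {"race", "gender", "religion", "national_origin"}

def drop_prohibited(df: pd.DataFrame) -> pd.DataFrame:
    """Remove blocklisted columns and report what was dropped, so exclusions
    are deliberate and auditable rather than silent."""
    present = sorted(PROHIBITED_FIELDS & set(df.columns))
    if present:
        print(f"Dropping prohibited fields: {present}")
    return df.drop(columns=present)

# intake = pd.read_csv("intake.csv")
# features = drop_prohibited(intake)
```

Note this only removes the explicit fields; the proxy questions on the next slide are about the correlated features that remain.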

Page 39: Data and ethics Training

Digital Decisions--Phases of Automation

Build: Model Construction
● Do your rules rely on generalizations or cultural assumptions rather than causal relationships? (Not sure? Ask yourself if you would feel comfortable if the public saw your stated correlations.)
● Can you use pseudonymization techniques that avoid the needless scoring/targeting of non-suspicious individuals? (One common technique is sketched after this list.)
● Have the tools you are using from libraries been tested for bias? Is there an audited or trustworthy source for the necessary tools?
● Are any of these criteria proxies for race, gender, or other sensitive characteristics? For example, zip code + 4 is often strongly correlated with racial identity.
● How much control of the statistical process is required to prevent your model from relying on proxies for protected classes?
● Are non-deterministic outcomes acceptable given the rules and considerations around transparency and “explainability” that may be applicable?
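On pseudonymization, one common technique is to replace direct identifiers with keyed (salted) hashes: records can still be linked consistently without exposing who they describe. A minimal sketch; the salt handling here is illustrative only and should live in a secrets manager in practice:

```python
import hashlib
import hmac

SECRET_SALT = b"load-me-from-a-secrets-manager"  # illustrative only

def pseudonymize(identifier: str) -> str:
    """Map a direct identifier to a stable, opaque token. The same input
    always yields the same token (so joins still work), but the token
    cannot be reversed without the salt."""
    return hmac.new(SECRET_SALT, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

print(pseudonymize("client-12345"))  # stable token, no identity exposed
```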

Page 40: Data and ethics Training

Digital Decisions--Phases of Automation

Test:
● What is the acceptable error rate before going to market?
● Is the error rate evenly distributed across all demographics?
● Identify reasons for correlations: what factors are predominant in determining outcomes?
● Are unintended factors or variables correlated with race or other sensitive characteristics?
● Have you specifically tested your process on representative samples from a variety of racial, economic, and other diverse backgrounds for disparate outcomes?
● Are model outputs and algorithmic transactions being sufficiently logged to enable appropriate diagnostics in the event of a data-subject or regulatory challenge? (A logging sketch follows this list.)
● Is there a process in place for periodic assessments/reviews to ensure that, for dynamic models especially, the modeling algorithm, features, and data inputs continue to reflect the evolving realities of the marketplace?
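On the logging question, one simple pattern is to write an audit record for every prediction, capturing inputs, output, model version, and timestamp, so an individual decision can be reconstructed later. A minimal sketch; the field names and version tag are assumptions, not CDT’s specification:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("model_audit")

MODEL_VERSION = "2016-05-19-r1"  # hypothetical version tag

def predict_with_audit(model, features: dict) -> float:
    """Score one case and log everything needed to reconstruct the decision
    if a data subject or regulator later challenges it. Assumes the dict's
    key order matches the feature order the model was trained on."""
    score = float(model.predict([list(features.values())])[0])
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        "inputs": features,
        "output": score,
    }))
    return score
```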

Page 41: Data and ethics Training

Digital Decisions--Phases of Automation

Implement:
● What is the impact on individuals of a false positive / false negative?
● Is there a way for users to report that they feel they may have been treated unfairly (in order to capture big-picture trends that may reveal discrimination problems)?
● Don’t make claims about the power of the results that are bigger than what the process represents.
● Is there a method for human review of model outcomes to minimize false positives? (A routing sketch follows this list.)
● Where does a human being sit in the analysis process?
● Does a person make a final determination as to an outcome that might negatively affect an individual?
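For the human-review questions, one way to keep a person in the loop is to act automatically only on confident, non-adverse outcomes and route everything else to a reviewer. A minimal sketch; the threshold and names are illustrative:

```python
AUTO_THRESHOLD = 0.90   # illustrative confidence bar
REVIEW_QUEUE = []       # stand-in for a real review workflow

def route_decision(case_id: str, score: float, adverse: bool) -> str:
    """Send any adverse outcome, and any low-confidence score, to a human
    for the final determination instead of acting on it automatically."""
    if adverse or score < AUTO_THRESHOLD:
        REVIEW_QUEUE.append({"case": case_id, "score": score})
        return "human_review"
    return "auto_approve"

print(route_decision("case-001", 0.95, adverse=False))  # auto_approve
print(route_decision("case-002", 0.95, adverse=True))   # human_review
```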

Page 42: Data and ethics Training

Digital Decisions--Phases of Automation

Evaluate and Refine
● Does the outcome provide contextual information that helps a user understand how the result was reached, or is it a more opaque output (such as a numerical score)?
● Should there be fail-safes in place to ensure that potential systematic bias that may not be otherwise detected does not have an endlessly compounding effect on consumers? (A simple gate is sketched after this list.)
● How does the result of your process feed back into the equation?
● Process any new data or altered logic model with the same inquiry as the original content.
● Is there a person responsible for ensuring that all relevant parts of the institution are involved in creating this process? For example, someone who checks with relevant internal legal and policy teams, as well as external stakeholders, when applicable.
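For the fail-safe question, one lightweight guard is to re-run the same per-group error checks after every retraining cycle and refuse to ship when the spread between groups grows beyond a tolerance, so feedback loops cannot quietly compound bias. A minimal sketch; the tolerance is hypothetical:

```python
MAX_GROUP_GAP = 0.05  # hypothetical tolerance for the error-rate spread

def disparity_gate(error_rates: dict) -> bool:
    """Return True only if the gap between the best- and worst-served
    groups stays within tolerance. Run this on every retrain."""
    gap = max(error_rates.values()) - min(error_rates.values())
    return gap <= MAX_GROUP_GAP

print(disparity_gate({"A": 0.04, "B": 0.06}))  # True  (gap 0.02)
print(disparity_gate({"A": 0.04, "B": 0.15}))  # False (gap 0.11)
```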

Page 43: Data and ethics Training

Digital Decisions--How can you use this?

Page 44: Data and ethics Training