Transcript of Traceable AI - fedvte.usalearning.gov
Traceable AI
Table of Contents
Notices
Traceable
Traceable
AI transparency
Data transparency
Datasheets for Datasets (markdown)
Datasheets for Datasets
AI auditability for Cyber
Explainable AI (XAI) – DARPA
Explainable AI (XAI) Program
Traceable – Review
Notices
Copyright 2020 Carnegie Mellon University.
This material is based upon work funded and supported by the Department of Homeland Security under Contract No. FA8702-15-D-0002 with Carnegie Mellon University
for the operation of the Software Engineering Institute, a federally funded research and development center sponsored by the United States Department of Defense.
The views, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision,
unless designated by other documentation.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS.
CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT
LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL.
CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use
and distribution.
Internal use:* Permission to reproduce this material and to prepare derivative works from this material for internal use is granted, provided the copyright and “No Warranty”
statements are included with all reproductions and derivative works.
External use:* This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission.
Permission is required for any other external and/or commercial use. Requests for permission should be directed to the Software Engineering Institute at
* These restrictions do not apply to U.S. government entities.
DM20-0577
**119 Instructor: Let's talk about
Traceable
CISA | Cybersecurity and Infrastructure Security Agency
**061 traceable AI.
Traceable
AI capabilities will be developed and deployed such that relevant personnel
possess an appropriate understanding of the technology, development
processes, and operational methods applicable to AI capabilities, including with
transparent and auditable methodologies, data sources, and design
procedure and documentation.
**062 AI capabilities will be
developed and deployed such that
relevant personnel possess an
appropriate understanding of the
technology, development processes,
and operational methods applicable
to AI capabilities, including with
transparent and auditable
methodologies, data sources and
design procedures and
documentation. I realize that is quite
a bit to cover in one slide, and I'll be
getting into the details of this.
AI transparency
Not “unknowable”
Impart confidence and trust to personnel
Technology and methodologies are explained appropriately
Access provided to details
Rationale for decisions and recommendations is provided
**063 So AI transparency is about
understanding how the system
works. These AI systems are often
described as not knowable, that they
are unknowable, and that is not what
we want these systems to be. We
want these systems to be
understandable, to be knowable.
Another term that is used is black
box. I'm not going to be using that
term today because it is based on
racist imagery, but that is another
term that you'll commonly hear, and
people use this as a way of talking
about how complex these systems
are. But they don't have to be that
complex. They can be designed to
be knowable.
We want to impart confidence and
trust to personnel who are using
these systems, and the way to do
that is by giving them more
transparency into the system, how
the system is working, what the
system is doing.
Technology and methodologies need
to be explained appropriately and
access needs to be provided to those
details, and again, this is as appropriate for the individual, their permissions, and their access, and those types of considerations need to be taken into account.
The rationale also needs to be
provided for decisions and
recommendations that the system is
making so that people really
understand what is going on. This is
the transparency aspect that will help
to make really good systems.
Data transparency
Understand data
Provenance
Creator’s motivation, composition and collection
Transparency improves with use of:
Datasheets for Datasets*
Model Cards for ML systems
*Datasheets for Datasets. Working Paper by Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer
Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford https://arxiv.org/abs/1803.09010
**064 Data transparency is also
important. This is about making sure
that people understand the data and
the information about the data. So
what is the provenance of the data?
What are the creator's motivations,
composition, and how did they even
collect the data? These types of
pieces of information can improve
trust of the system and help people
to understand the parameters of the
system and can help them also to
understand the limitations of the
system.
Transparency will improve with the
use of a variety of different support
pieces that are being developed by
researchers right now. So datasheets
for datasets, and I will talk about that
in more detail today, as well as
model cards for ML systems. Both
can provide methods of describing
the information that is being
presented, describing the data, so
that people have a clear idea of what that information is, how it was collected, how they can use it, and what purpose it was created for.
Datasheets for Datasets (markdown)
“markdown-datasheet-for-datasets” Josh Meyer.
GitHub: https://github.com/JRMeyer/markdown-datasheet-for-datasets/blob/master/DATASHEET.md
**065 And with datasheets for
datasets, it is basically a set of
questions that you answer as you're
thinking about your data, and this is
an example of what a datasheet
might look like. This markdown was
created by Josh Meyer, based on the
initial paper about datasheets for
datasets, and you can see some of
the questions.
For what purpose was this dataset
created? Just really being very
explicit about what this data is and
why it is in existence.
Datasheets for Datasets
Transparency and clarity
Motivation
Composition
Collection
Preprocessing / Cleaning / Labeling
Uses
Distribution
Maintenance
**066 And there are a variety of
different sets of questions in
datasheets for datasets. They
include motivation, composition,
collection, preprocessing, cleaning
and labeling, uses, distribution and
maintenance, and all of these have a
set of questions with them that will
help your team to better describe
that data and this will then provide
more transparency to the people
using and accessing that data.
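The datasheet sections above could be captured in a lightweight, machine-readable template. The following is a minimal sketch, assuming a simple dataclass representation; the field contents and the rendering helper are illustrative, not the paper's full question list:

```python
from dataclasses import dataclass

# Minimal sketch of a datasheet record. Section names follow the
# categories above (motivation, composition, collection, preprocessing,
# uses, distribution, maintenance); all example text is hypothetical.
@dataclass
class Datasheet:
    motivation: str      # For what purpose was the dataset created?
    composition: str     # What do the instances represent?
    collection: str      # How was the data acquired?
    preprocessing: str   # Cleaning/labeling steps applied
    uses: str            # Intended (and discouraged) uses
    distribution: str    # How is the dataset shared? Under what terms?
    maintenance: str     # Who maintains it? How are errata handled?

    def to_markdown(self) -> str:
        """Render the datasheet as a simple markdown document."""
        return "\n".join(
            f"## {name.title()}\n{text}\n"
            for name, text in vars(self).items()
        )

sheet = Datasheet(
    motivation="Created to train a phishing-email classifier.",
    composition="Each instance is one email with a spam/ham label.",
    collection="Collected from opt-in corporate mail archives, 2018-2019.",
    preprocessing="Headers stripped; addresses pseudonymized.",
    uses="Email-filtering research; not suitable for user profiling.",
    distribution="Internal use only, per data-sharing agreement.",
    maintenance="Data team reviews reported labeling errors quarterly.",
)
print(sheet.to_markdown())
```

Answering each section explicitly, even at this level of brevity, gives downstream users the provenance and purpose information the slide describes.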
AI auditability for Cyber
Probe with hypothetical cases
Checks for bias, brittleness or potential distribution shift
Access to history of system operation
Logs
User access*
Records of user and purpose*
Mitigate harms of off-label use
Reinforce principle of Responsibility
*Consider ethical principles when determining what data needs to be collected.
**067 AI needs to be auditable for cyber to
be helpful for individuals who are
using this technology. We need to
be able to probe the system with
hypothetical cases. We need to be
able to check for bias and brittleness
and potential distribution shift within
the data. We need to be able to
access the history of the system's
operation, and we need to keep logs,
in many cases, and those logs we
need to be careful with because there
can be ethical considerations that
need to be addressed.
So, for example, with the user's
access. How much information do
we really need to collect about the
user's access? With records of users
and purposes, again, how much do
we need to collect to be safe and to
be auditable, and how much would
be more than is necessary? We need
to also still be protecting our users.
We need to mitigate the harms of
off-label use. If someone is taking
data that we've created and using it
for a different purpose or taking the
entire AI system and using it for a
different purpose, how can we reduce the harms that could potentially come from that use?
And then we need to reinforce the
principles of responsibility that we've
already talked about.
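The auditability points above can be sketched as a small decision log. This is a minimal sketch, not a prescribed design: the function names, fields, and scoring rule are hypothetical, and per the ethical note on the slide, it records only a user ID and stated purpose rather than full user details:

```python
import json
import logging

# Minimal sketch of an auditable AI decision. Each query appends an
# entry recording who asked, why, the input, and the decision, so the
# history of system operation can be reviewed later.
logger = logging.getLogger("ai_audit")
logging.basicConfig(level=logging.INFO)

audit_trail = []  # in practice: an append-only store, not a list

def audited_decision(user_id: str, purpose: str, features: dict) -> str:
    # Placeholder model: flag activity as suspicious above a threshold.
    score = 0.9 if features.get("failed_logins", 0) > 5 else 0.1
    decision = "suspicious" if score > 0.5 else "benign"
    entry = {
        "user": user_id,      # who queried the system (minimal identifier)
        "purpose": purpose,   # why (supports off-label-use review)
        "input": features,    # enables replay with hypothetical cases
        "score": score,
        "decision": decision,
    }
    audit_trail.append(entry)
    logger.info(json.dumps(entry))
    return decision

audited_decision("analyst-7", "incident triage", {"failed_logins": 9})
```

Because inputs are logged, the system can be probed with hypothetical cases and checked for brittleness or distribution shift after the fact.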
Explainable AI (XAI) – DARPA
**068 Explainable AI is an effort by
DARPA to really dig in deeper into
this problem and make systems that
are transparent and provide the
answers to these types of questions.
Why did you do that? Why not something else? When did you fail? What is going on with the system? Really understanding the basics of the system helps to engender trust in the users and is a helpful way to make a system that is understandable and explainable to the end user.
Explainable AI (XAI) Program
Aims to create a suite of ML techniques that:
Produce more explainable models, while maintaining a high level of
learning performance (prediction accuracy); and
Enable human users to understand, appropriately trust, and effectively
manage the emerging generation of artificially intelligent partners.
Users can
Interpret what the AI system did
Understand AI system’s limitations
**069 And this program with DARPA
aims to create a suite of machine
learning technologies that produce
more explainable models while
maintaining a high level of learning
performance, so it will not reduce the
prediction accuracy. We need to,
well, DARPA needs to enable human
users to understand, appropriately
trust, and effectively manage the
emerging generation of artificially
intelligent partners. These are the
goals of this project that DARPA is
doing. They want users to be able to
interpret what the AI system did and
understand the AI system's
limitations, and again, this is by
design. They're designing these
systems to do that work and
encouraging others to do the same.
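One simple way a system can be explainable by design is to return a rationale with every prediction, so a user can ask "why did you do that?" and see the system's limitations. This is a minimal sketch under illustrative assumptions; the rules, thresholds, and field names are invented for the example and are not part of DARPA's XAI program:

```python
# Minimal sketch of an interpretable rule-based classifier that reports
# which rule fired alongside each prediction. All rules are illustrative.
RULES = [
    ("failed_logins", lambda v: v > 5, "more than 5 failed logins"),
    ("bytes_out", lambda v: v > 1_000_000, "over 1 MB sent outbound"),
]

def predict_with_rationale(features: dict) -> tuple[str, list[str]]:
    # Collect the human-readable description of every rule that matched.
    reasons = [desc for key, test, desc in RULES
               if test(features.get(key, 0))]
    label = "alert" if reasons else "normal"
    if not reasons:
        # Surfacing limitations is part of explainability: the user should
        # know that patterns outside these rules will be missed.
        reasons = ["no rule matched (novel patterns outside these "
                   "rules will not be detected)"]
    return label, reasons

label, why = predict_with_rationale({"failed_logins": 9, "bytes_out": 10})
print(label, why)
```

A user seeing `alert` together with "more than 5 failed logins" can both interpret what the system did and understand where its coverage ends, which is exactly the pair of goals the program describes.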
Traceable – Review
Relevant personnel possess an appropriate understanding
of the technology, development processes, and operational methods…
Transparency for Cyber
Data Transparency
Auditability for Cyber
Explainable AI – DARPA
**070 So to review, with traceable
technology, relevant personnel
possess an appropriate
understanding of the technology,
development processes and
operational methodologies, and this
is something that we can do. Making systems transparent is a lot of work, but the benefit is that we get the trust of the
users, our personnel, and people who
are using the system, accessing the
data, feel that the system is
transparent, and therefore they are
more likely to trust the system
appropriately.
If it's auditable, we can actually track
what has happened and how it has
been used, and we want to make
systems that are explainable and
understandable, and we can use
DARPA's guidance as one way of
improving those systems.