University of Music and Performing Arts Graz
Institute of Electronic Music and Acoustics
The Kinect sensor as human-machine-interface in audio-visual art projects
Matthias Kronlachner (Student), IOhannes m zmölnig
[email protected] [email protected]
Introduction
For several years now, the entertainment and gaming industry
has been providing versatile and inexpensive human interface
devices that can also be used for artistic applications.
Since November 2010 a sensor called Kinect(TM) for Microsoft's
Xbox 360 has been available. This input device combines a color
camera and a 4-channel microphone array and provides - as an
industry first - a depth image camera at an affordable price.
Soon after the release of the Xbox-only device, programmers from
all over the world provided solutions for accessing its data from
a multitude of operating systems running on ordinary computers.
Operating-system-independent ways to access and interpret Kinect
data streams within Pure Data/Gem(a) are presented, together with
overviews of case studies for interactive audio and visual art
projects.
All developed software is Open Source and works under Linux,
Mac OS X and Windows.
(a) Pure Data is a visual programming environment used for computer music and interactive
multimedia applications. Gem stands for Graphics Environment for Multimedia and extends
Pure Data with realtime OpenGL-based visualizations.
Kinect streams in Pure Data/Gem using pix_freenect, pix_depth2rgba, pix_threshold_depth

[Pd patch: a pix_freenect 0 1 1 object delivers separate RGB and depth streams to gemhead chains via pix_separator; the raw depth stream is mapped onto a color gradient with pix_depth2rgba (e.g. pix_depth2rgba 7000, pix_depth2rgba 1220), background subtraction is done with pix_threshold_depth (thresholds 300 and 1200), and all streams are rendered via pix_texture onto rectangles.]

Fig. 1: RGB stream, raw depth stream, color gradient mapping, background subtraction
Representation of depth data
A solution for handling depth data using the RGBA or YUV
colorspace is proposed.
For RGBA output, the 16 bit depth data is split into its upper
and lower eight bits, which are stored in the red (R) and
green (G) channels. The blue (B) channel carries additional
per-pixel information; for example, if a user is present at a
specific pixel, that user-id is set.
R: upper 8 bits (msb)  G: lower 8 bits (lsb)  B: 0 or user-id (OpenNI)  A: 255

Alternatively, the 11 bit/16 bit depth values can be packed into YUV422 (2 bytes per pixel).
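The RGBA packing described above can be sketched as follows (a minimal illustration; the function names are hypothetical, not part of the Pd externals): the 16-bit depth value is split into its upper and lower 8 bits for R and G, B carries an optional user-id, and A is fixed at 255.

```python
import numpy as np

def pack_depth_rgba(depth, user_ids=None):
    """Pack a uint16 depth image into an RGBA uint8 image."""
    depth = depth.astype(np.uint16)
    rgba = np.zeros(depth.shape + (4,), dtype=np.uint8)
    rgba[..., 0] = (depth >> 8) & 0xFF   # R: upper 8 bits
    rgba[..., 1] = depth & 0xFF          # G: lower 8 bits
    if user_ids is not None:
        rgba[..., 2] = user_ids          # B: user-id (0 = no user)
    rgba[..., 3] = 255                   # A: fully opaque
    return rgba

def unpack_depth(rgba):
    """Recover the 16-bit depth values from the R and G channels."""
    return (rgba[..., 0].astype(np.uint16) << 8) | rgba[..., 1]
```

The packing is lossless: unpacking R and G restores the original 16-bit distance values exactly.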
For development and visualization purposes pix_depth2rgba is
used to map distance values onto a color gradient (Fig. 2).
Fig. 2: Gradient for displaying depth data - near to far
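Conceptually, such a gradient mapping interpolates each distance value between a set of color stops from near to far. A minimal sketch (the stop colors and parameter names are assumptions for illustration, not the exact gradient of pix_depth2rgba):

```python
def depth_to_color(d, d_min=0, d_max=7000,
                   stops=((255, 0, 0), (0, 255, 0), (0, 0, 255))):
    """Map a distance d (mm) to an (r, g, b) tuple along the stops."""
    # Normalize distance into [0, 1] and clamp out-of-range values.
    t = min(max((d - d_min) / float(d_max - d_min), 0.0), 1.0)
    pos = t * (len(stops) - 1)            # fractional stop index
    i = min(int(pos), len(stops) - 2)
    frac = pos - i
    a, b = stops[i], stops[i + 1]
    # Linear interpolation between the two neighbouring stops.
    return tuple(round(a[k] + frac * (b[k] - a[k])) for k in range(3))
```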
Various methods using CPU or GPU processing power can be
used to filter out certain regions of the depth image and use them
for tracking or as a projection stencil.
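A CPU-side depth thresholding in the spirit of pix_threshold_depth can be sketched like this (a hedged stand-in, not the external's implementation): only pixels whose distance lies within a near/far range, e.g. 300-1200 mm as in the patch of Fig. 1, are kept, yielding a mask usable as a projection stencil.

```python
import numpy as np

def threshold_depth(depth, near=300, far=1200):
    """Return a uint8 mask: 255 where near <= depth <= far, else 0."""
    mask = (depth >= near) & (depth <= far)
    return mask.astype(np.uint8) * 255
```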
Skeleton tracking with pix_openni
pix_openni gives access to higher-level functions such as multiple-user,
hand and skeleton tracking (Fig. 3). The output rate depends on the
framerate of the depth sensor, approx. 30 Hz.
[Pd patch: pix_openni 0 1 1 1 0 feeds two gemhead chains, one drawing the camera images (pd draw-images) and one drawing the skeleton (pd DRAW-SKELETON) from the tracking and info output.]

Fig. 3: Displaying skeleton retrieved by pix_openni
vertimas - übersetzen
The piece übersetzen - vertimas for dancer, sound and projection uses the
Kinect sensor to translate body movements on stage into sound, turning the
dancer's body into a hyperinstrument. Additionally, the depth video stream
is used to extract the outline of the dancer and project it back onto her body
in realtime (Fig. 4).
To this end, a data-flow filtering and analysis library has been developed,
providing quickly adjustable methods for translating tracking data into sound
or visual control data (Fig. 5).
Fig. 4: stage setup vertimas - übersetzen
[Pd patch (Fig. 5): motion/rcv_2joints user1 r_foot torso 1 /skeleton/joint retrieves the r_foot and torso joints; motion/comp-subtract2 computes the movement relative to the body; motion/median-3 applies a 3-point median filter and filters zeros; motion/change-3d takes the difference to the last coordinates; motion/comp-3d-length computes the vector length; motion/gate_thresh 1 1 truncates small values; motion/attack-release 100 1000 generates an AR envelope that is range-fitted (pd fit-range) and sent to /sampler/15/player/15/gain and /sampler/15/player/15/speed.]

Fig. 5: translating skeleton data to control data
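The analysis chain of Fig. 5 can be sketched with plain-Python stand-ins for the motion/* Pd abstractions (names and parameter meanings here are assumptions for illustration): joint position relative to the body, median filtering, frame-to-frame difference, vector length, threshold gate, and an attack/release envelope.

```python
import math

def relative_joint(foot, torso):
    """Movement relative to the body: foot position minus torso."""
    return tuple(f - t for f, t in zip(foot, torso))

def median3(history):
    """3-point median filter over the last three scalar values."""
    return sorted(history[-3:])[1] if len(history) >= 3 else history[-1]

def delta_length(prev, cur):
    """Length of the difference vector between two frames."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))

def gate(value, thresh):
    """Truncate values below the threshold to zero."""
    return value if value >= thresh else 0.0

class AttackRelease:
    """One-pole attack/release smoother. The Pd object takes times in
    milliseconds; the mapping to coefficients here is an assumption."""
    def __init__(self, attack=0.5, release=0.05):
        self.attack, self.release, self.env = attack, release, 0.0
    def step(self, x):
        coef = self.attack if x > self.env else self.release
        self.env += coef * (x - self.env)
        return self.env
```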
IEM Computermusic Ensemble
ICE is a group of electronic musicians, each playing with a notebook and
individual controllers. The goal is to play contemporary music adapted or
written for computer music ensembles.
In March 2012 a network concert between Graz and Vilnius took place. One
Kinect sensor was used to track the skeletons of two musicians. The tracking
data allowed each musician to play their virtual instrument without handheld
controllers. Additionally, the Kinect video stream showing the stage in Vilnius
was sent to Graz and projected onto a canvas for the remote audience (Fig. 6).
[Diagram: in Vilnius, two musicians play with one Kinect; RGB+depth go to a Kinect and video server, tracking data is sent via OSC to an instrument/audio server feeding 4 speakers (1st-order Ambisonics) and a screen; audio, video and tracking data travel over the Internet (GÉANT, ACOnet, LITNET) to/from the audience and stage in Graz, with an audience camera sending video back.]

Fig. 6: ICE stage setup Vilnius
Head pose estimation
Based on a paper and software by Gabriele Fanelli [2], the external
pix_head_pose_estimation has been developed, which takes the depth map
of the Kinect as input and estimates the Euler angles and positions of multiple
detected heads in the depth map. The head tracking can be used to rotate an
Ambisonics sound field for monitoring 3D sound environments via headphones
(Fig. 7).
[Signal flow: Kinect -> head pose estimation -> rotation of the Ambisonics input -> Ambisonics binaural decoder -> headphones (L/R).]

Fig. 7: rotating sound field according to head movements
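For first-order B-format (W, X, Y, Z), a rotation about the vertical axis leaves W and Z untouched and rotates the X/Y pair. A minimal sketch of such a yaw rotation (the sign convention, counterclockwise-positive yaw, is an assumption, not taken from the poster):

```python
import math

def rotate_bformat_yaw(w, x, y, z, yaw):
    """Rotate a first-order B-format sample by `yaw` radians about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (w,              # omnidirectional component is invariant
            c * x - s * y,  # X'
            s * x + c * y,  # Y'
            z)              # height component is invariant under yaw
```

To compensate a measured head yaw of theta, the field is typically rotated by -theta so that the virtual scene stays fixed relative to the room while the head moves.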
References
[1] M. Kronlachner, The Kinect distance sensor as human-machine-interface in audio-visual art projects, project report, Institute of Electronic Music and Acoustics, Graz, 2012.
[2] G. Fanelli, T. Weise, J. Gall, and L. Van Gool, Real Time Head Pose Estimation from Consumer Depth Cameras, DAGM 2011.
Examples and source code available from the homepage:
www.matthiaskronlachner.com