Dmk audioviz


Posted: 22-Jun-2015


Transcript of Dmk audioviz

1. WEAPONS GRADE AUDIO VISUALIZATION (and other pattern stunts)
Dan Kaminsky, Director of Penetration Testing, IOActive Inc.

2. Introduction

3. Alas
Very easy to do the pretty. Very hard to do the useful.
Two major categories of audio visualization:
Direct rendering of intensity or spectrum.
Pretty morphing shapes that... uh... kinda guide the imagery. But not really.
Can we do better?

4. Another Approach: Dotplots

5. Useful for Various Domains
Started in genomics: genes are the worst protocol the world has ever seen.
I've been using it for analyzing all sorts of things: code, books, law, video.
Originally saw a paper applying it to audio.
Could I make a WinAMP plugin that does this?

6. What Exactly Are We Doing?
Jonathan Helfman's "Dotplot Patterns: A Literal Look at Pattern Languages" offers an introduction.
Instead of words like "to", "be", "not", etc., we use chunks of data from arbitrary files.
Instead of demanding perfect equality, we measure how similar the chunks are.
If most of the bytes are in most of the same places, it's pretty similar; if most are different, it's pretty dissimilar.

7. Demo: LudiVu

8. Intro to LudiVu
A realtime spectral analyzer that compares what you're listening to now with what you've been listening to over the last few seconds.
A really simple similarity metric:
Split the available spectrum into three bands: bass (red), midrange (green), treble (blue).
Take the difference between the source band and the destination band. If very similar, add to the similarity; else, subtract.

9. Infrastructure
Built on top of AVS, the WinAMP Advanced Visualization Studio.
AVS is OpenCV, hacker style: lots and lots of image manipulation algorithms, stackable and hackable.
Very easy to alter the framebuffer after the fact.

10. What Are We Getting?
Two independent outputs:
One: when the same signal repeats, we can see it as a line (visual autocorrelation).
Two: even if signals do not repeat, the shapes they form create a sort of visual hash. It might be possible to do larger-scale mapping, because the viz is the same.
The thought: this roughly feels like seeing as we hear. Temporal in audio = spatial in visual.

11. Two Primary Modes: Visual Hashes vs. Similarity
Sequences are overlaid: vertical white lines on top of the RGB layout.
Adding motion blur highlights similarity (white lines) at the expense of visual hashes.
Actual mechanism: leave n% of the old image around.

12. Chemical Brothers

13. So Why "Weapons Grade"?
This was just supposed to be a toy.
Then my friend suggested: "Why don't you try running this on audio CAPTCHAs?"
Sequences of numbers spoken over noise. There are only ten numbers.
Can I see the repeated numbers I hear? Yup.

14. Repeated Digit

15. What About Other Domains? (Nine Inch Nails, "Closer")

16. More Video Analysis: Cibo Matto / Michel Gondry's Palindromatic "Sugar Water"

17. We've figured out what some of these patterns mean... but code...

18. ...But some code just comes out strange.

19. Dotplots for Security / Code Analysis?
A) Format Identification
1) Do different files appear different, and does the appearance reflect the existence of internal structure?
2) Do different instances of the same file format appear similar?
3) Does one format embedded in another make itself apparent?
B) Fuzzer Guidance
1) Can we locate the actual byte offsets where one section ends and another begins?
2) Can we visualize and compare fuzzer operations via dotplots?

20. Format Identification
1) Do different files appear different, and does the appearance reflect the existence of internal structure?
2) Do different instances of the same file format appear similar?
3) Does one format embedded in another make itself apparent?

21. Java Class Files

22. .NET Assemblies

23. CNN's Home Page

24. SMBTorture Traffic (Packets; Note That Stop/Start Is Visible)

25. Kernel32.dll

26. Chromosome 22 (This is, after all, a genomics hack)

27. The Legend of Zelda

28. Format Identification
1) Do different files appear different, and does the appearance reflect the existence of internal structure?
Answer: Yes. They do.
2) Do different instances of the same file format appear similar?
3) Does one format embedded in another make itself apparent?
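The LudiVu metric from slide 8 and the motion-blur mechanism from slide 11 can be sketched together in Python. This is a minimal illustration under stated assumptions, not the AVS plugin's code: it assumes each audio frame's spectrum is already available as a list of magnitudes, splits it into three equal-width bands, and introduces hypothetical parameters `threshold` (relative tolerance for "very similar"), `step` (how much to add or subtract) and `keep` (the fraction of the old image left around). Red, green and blue carry bass, midrange and treble, as on slide 8.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def band_energies(spectrum: Sequence[float]) -> Tuple[float, float, float]:
    """Sum one spectrum frame into bass, midrange and treble.

    Equal-width thirds are an assumption; the slides only say "three bands".
    """
    third = len(spectrum) // 3
    bass = sum(spectrum[:third])
    mid = sum(spectrum[third:2 * third])
    treble = sum(spectrum[2 * third:])
    return bass, mid, treble


def similar(a: float, b: float, threshold: float) -> bool:
    """Hypothetical "very similar" test: difference within a relative tolerance."""
    return abs(a - b) <= threshold * max(abs(a), abs(b), 1e-12)


def update_frame(prev: Optional[Frame],
                 bands: Sequence[Tuple[float, float, float]],
                 threshold: float = 0.1,
                 step: float = 64,
                 keep: float = 0.5) -> Frame:
    """Compare every remembered frame i with every frame j, one colour channel per band.

    Each channel keeps `keep` of its previous value (the "leave n% of the old
    image around" persistence), then gains `step` if the two bands are similar
    and loses `step` otherwise, clamped to 0..255. `prev` must be None or have
    the same n x n shape as the new frame.
    """
    n = len(bands)
    frame: Frame = []
    for i in range(n):
        row = []
        for j in range(n):
            pixel = []
            for c in range(3):  # 0 = bass/red, 1 = mid/green, 2 = treble/blue
                old = prev[i][j][c] if prev is not None else 0
                value = old * keep
                value += step if similar(bands[i][c], bands[j][c], threshold) else -step
                pixel.append(int(max(0, min(255, value))))
            row.append(tuple(pixel))
        frame.append(row)
    return frame


# Three toy spectra: the first two match, the third has extra bass.
spectra = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [9, 9, 1, 1, 1, 1]]
bands = [band_energies(s) for s in spectra]
first = update_frame(None, bands)
second = update_frame(first, bands)
print(first[0][1], first[0][2])    # (64, 64, 64) (0, 64, 64)
print(second[0][1], second[0][2])  # (96, 96, 96) (0, 96, 96)
```

With keep = 0.5 and step = 64, a cell that keeps matching converges to step / (1 - keep) = 128. Raising keep leaves more of the old image around, so repeats build into brighter, longer-lived lines, while a cell where all three bands agree brightens in every channel at once and so drifts toward white.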
29. Books from Project Gutenberg: Consistent
Despite English's low information content, the lack of even mildly related strings causes little self-similarity across symbol clusters.

30. US Code: Moderately Consistent
Legalese is a massively structured dialect. Symbols appear in very distinct patterns that are more reminiscent of machine code than of text.

31. HTML: Consistent
HTML regularly repeats smaller symbols (tags) and larger symbol clusters (via template engines). This shows up visually as a tightly repeating pattern.

32. Java Class Files (Compared): Mildly Consistent
Binary code (be it bytecode or x86) tends to be very structured. Still, we depend on both the content and the compiler to generate distinct patterns.

33. x86: Consistent (In Sections)
x86 tends not to be handwritten; as such, complex instructions are emitted in a highly structured form.

34. Exception?
A 64-kilobyte graphical demonstration, run through a packer. Compression removes patterns.

35. NES Games
6502 assembly tends to show consistent patterns, but...

36. ...Mario Games Look Rather Different.
1) Output is highly dependent on the compiler.
2) Output is highly dependent upon the actual content.
File formats are merely shells for actual content. You are analyzing the content; the format is just syntactic sugar.

37. Format Identification
1) Do different files appear different, and does the appearance reflect the existence of internal structure?
Answer: Yes. They do.
2) Do different instances of the same file format appear similar?
Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
3) Does one format embedded in another make itself apparent?

38. File Formats Contain Multiple Subformats
Another look at Kernel32.DLL: these are all different parts of Kernel32.
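To make the byte-level comparison from slide 6 concrete, here is a minimal Python sketch of a self-dotplot. It assumes non-overlapping 32-byte windows (the window size quoted on slide 43) and scores a pair of windows by the fraction of positions holding the same byte. The 0.5 "similar enough" cutoff, the `off_diagonal_density` summary and the synthetic input are illustrative assumptions, not the talk's tool. Running it on repetitive text and on its zlib-compressed form is one way to check slide 34's point that compression removes patterns.

```python
import zlib
from typing import List


def windows(data: bytes, size: int = 32) -> List[bytes]:
    """Cut data into non-overlapping windows; a trailing partial window is dropped."""
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length windows hold the same byte."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dotplot(data: bytes, size: int = 32) -> List[List[float]]:
    """Self-dotplot: cell [y][x] scores window y against window x (the diagonal is 1.0)."""
    chunks = windows(data, size)
    return [[similarity(a, b) for b in chunks] for a in chunks]


def off_diagonal_density(plot: List[List[float]], cutoff: float = 0.5) -> float:
    """Share of off-diagonal cells at or above `cutoff` (a hypothetical summary statistic)."""
    n = len(plot)
    if n < 2:
        return 0.0
    hits = sum(1 for y in range(n) for x in range(n) if x != y and plot[y][x] >= cutoff)
    return hits / (n * (n - 1))


# Synthetic, highly repetitive "text" standing in for a real document (an assumption).
text = "".join(f"line {i}: the quick brown fox jumps over the lazy dog\n" for i in range(200)).encode()
raw_score = off_diagonal_density(dotplot(text))
packed_score = off_diagonal_density(dotplot(zlib.compress(text, 9)))
print(f"raw: {raw_score:.4f}  compressed: {packed_score:.4f}")
```

On this input the raw score should be clearly above zero: lines 100 to 199 are all 54 bytes long, so every 27th window (27 * 32 = 864 = 16 * 54) starts at the same position within a line and differs only in the digits. The compressed stream offers no such byte-aligned repetition, so its score is expected to be much lower, in line with slide 34.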
39. Quickly Browsing Large Files: Tilt-Shift View
Instead of measuring absolute Y against absolute X, make X relative: advance through the file going down, and look back a number of bytes going right.

40. Complain All You Want. Hex Still Sucks.

41. Format Identification
1) Do different files appear different, and does the appearance reflect the existence of internal structure?
Answer: Yes. They do.
2) Do different instances of the same file format appear similar?
Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
3) Does one format embedded in another make itself apparent?
Answer: Yes. Multiple, distinct sections are clearly visible in a way that hex cannot show.

42. Fuzzer Guidance
1) Can we locate the actual byte offsets where one section ends and another begins?
Why would we want to? Fuzzers break parsers. A format has many subformats, so a parser has many subparsers. To a rough approximation, fuzzing a single subformat lets you stress a single subparser. So once we split a file up, we can selectively attack one subparser at a time.
2) Can we visualize and compare fuzzer operations via dotplots?

43. Simple Math
We select an interesting blob from kernel32.dll. The blob is at pixel offset 507x507 and is a square around 570 pixels wide. The window size on the viz was 32 bytes.
507 * 32 = 16224: the interesting section starts 16224 bytes into the file.
570 * 32 = 18240: the interesting section is 18240 bytes long.

44. What's the Actual Data?
dd if=kernel32.dll bs=1 skip=16100 | hexdump -C | more

45. Using Hardcorr as a first knife to locate interesting-to-fuzz regions

46. Fuzzer Guidance
1) Can we locate the actual byte offsets where one section ends and another begins?
Answer: Yes. We can quickly route from the image to the byte offset through basic arithmetic.
2) Can we visualize and compare fuzzer operations via dotplots?
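The tilt-shift view from slide 39 and the arithmetic on slides 43 and 44 fit in a few lines of Python. In this sketch, row y is the 32-byte window starting at byte y * 32, and column k - 1 compares it with the window starting k bytes earlier, so a structure that repeats every p bytes should show up as a bright vertical line at lag p. The function names, the `max_lag` parameter and measuring the lag in single bytes are assumptions about how such a view could work, not a description of Hardcorr or the talk's tool.

```python
from typing import List, Tuple


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length windows hold the same byte."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def tilt_shift(data: bytes, size: int = 32, max_lag: int = 64) -> List[List[float]]:
    """Row y: window starting at y * size. Column k - 1: compare with the window k bytes earlier.

    Cells whose look-back would start before byte 0 are left at 0.0.
    """
    rows = []
    for y in range(len(data) // size):
        start = y * size
        current = data[start:start + size]
        row = []
        for k in range(1, max_lag + 1):
            back = start - k
            row.append(similarity(current, data[back:back + size]) if back >= 0 else 0.0)
        rows.append(row)
    return rows


def pixel_to_bytes(pixel_offset: int, pixel_width: int, size: int = 32) -> Tuple[int, int]:
    """Slide 43's arithmetic: one dotplot pixel covers one window of `size` bytes."""
    return pixel_offset * size, pixel_width * size


def extract(data: bytes, start: int, length: int) -> bytes:
    """In-memory equivalent of carving the region out with dd before hexdumping it."""
    return data[start:start + length]


start, length = pixel_to_bytes(507, 570)
print(start, length)  # 16224 18240
```

Because the square's edges are read off the image by eye, the computed offsets are approximate; the dd command on slide 44 starts at 16100, slightly before 16224.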
47. Differentials
A major use of dotplots in bioinformatics is to compare one genome against another.
Autocorrelation: compare A to A.
Cross-correlation: compare A to B.
Most files are sufficiently dissimilar that little interesting structure shows up. Notable exception: different versions of the same binary.

48. Visual Bindiff!

49. MSVCR70.DLL vs. MSVCR71.DLL

50. Fuzzers: Very Broken Patchers
Mangle.C: single-bit differences
CFG9000: large-scale reordering

51. Fuzzer Guidance
1) Can we locate the actual byte offsets where one section ends and another begins?
Answer: Yes. We can quickly route from the image to the byte offset through basic arithmetic.
2) Can we visualize and compare fuzzer operations via dotplots?
Answer: Yes. Visual diffing effectively shows differences between files, including differences introduced by various flavors of fuzzers.

52. Other Structural Analysis Mechanisms
"Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable."
Rooter: A Methodology for the Typical Unification of Access Points and Redundancy

53. That Was BS.
That also got accepted into a con. Automatically generated from a context-free grammar.
I've been working too hard all these years. "Be quiet, or I will replace you with a very small shell script."
This talk is a bit of a remix. Patterns and symbols have been interesting me as of late. Automatic determination of both is difficult, interesting, and unsolved. Integration into human symbolic systems promises particularly interesting results.
So we're going to explore a bit.

54. Language Is Cool
Language: A p