Rake: Semantics Assisted Network-based Tracing Framework

download Rake: Semantics Assisted Network-based Tracing Framework

If you can't read please download the document

description

Rake: Semantics Assisted Network-based Tracing Framework. Yao Zhao (Bell Labs), Yinzhi Cao, Yan Chen, Ming Zhang (MSR) and Anup Goyal (Yahoo! Inc.) Presenter: Yinzhi Cao Lab for Internet and Security Technology (LIST) Northwestern Univ. - PowerPoint PPT Presentation

Transcript of Rake: Semantics Assisted Network-based Tracing Framework

  • Rake: Semantics Assisted Network-based Tracing FrameworkYao Zhao (Bell Labs), Yinzhi Cao, Yan Chen, Ming Zhang (MSR) and Anup Goyal (Yahoo! Inc.)Presenter: Yinzhi CaoLab for Internet and Security Technology (LIST) Northwestern Univ.

  • **Rake: Semantic Assisted Large Distributed System DiagnosisMotivationRelated WorkRakeEvaluationConclusions

  • **MotivationLarge distributed systems involve hundreds or thousands of nodesE.g. search system, CDN Host-based monitoring cannot infer the performance or detect bugsHard to translate OS-level info (such as CPU load) into application performanceApplication log may not be enoughTask-based approach is adopted in many diagnosis systemsWAP5, Magpie, Sherlock

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    Team Title

    NameTitle

    Company Name

    Company NameDepartment Name

  • **Example of Message Linking in Search SystemURLURLSearch keywordURLSearch keywordDoc ID

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    Team Title

    NameTitle

    Company Name

    Company NameDepartment Name

  • **Task-based Approaches The Critical Problem Message LinkingLink the messages in a task together into a path or treeBlack-box approachesDo not need to instrument the application or to understand its internal structure or semanticsTime correlation to link messagesProject 5, WAP5, SherlockWhite-box approachesExtracts application-level data and requires instrumenting the application and possibly understanding the application's source codesInsert a unique ID into messages in a taskX-Trace, Pinpoint

  • **Problems of White-box and Black-BoxWhite-boxInvasive due to source code modificationBlack-boxRely on time CorrelationAccuracy affected by cross traffic

    0

    1

    2

    3

    4

    0

    1

    2

    3

    4

    5

  • **RakeKey ObservationsGenerally no unique ID linking the messages associated with the same requestExist polymorphic IDs in different stages of the requestSemantic AssistedUse the semantics of the system to identify polymorphic IDs and link messages

  • *Architecture of Rake

  • **Message Linking ExampleURLURLSearch keywordURLSearch keywordDoc ID

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    NameTitle

    Team Title

    NameTitle

    Company Name

    Company NameDepartment Name

  • **Necessary SemanticsIntra-node linkingThe system semanticsInter-node linkingThe protocol semanticsNodePQRS

  • **Intra-Node LinkingFollow_IDs: The IDs will be in the triggered messages by this messageOne message may have multiple Follow_IDs for triggering multiple messagesLink_ID: The ID of the current messageMatch with Follow_ID previously seen**PQRSLink_IDFollow_ID=Query_ID=Response_ID

  • **Inter-Node LinkingQuery_IDs: The IDs will be in the response messages to this messageThe communication is in Query/Response style, e.g. RPC call and DNS query/response.Response_ID: The ID of the current message to match Query_ID previously seenBy default requires the query and response to use the same socket**PQRSLink_IDFollow_ID=Query_ID=Response_ID

  • Example of Rake Language (IRC)

    TCP 6667 Regular expression PRIVMSG\s+(.*) Same as Link ID No Return ID

  • **Complicated SemanticsThe process of generating IDs may be complicatedXML or regular expression is not good at complex computationsSo let user provide own functionsUser provide share/dynamic librariesSpecify the functions for IDs in XMLImplementation using Libtool to load user defined function in runtime

  • **Example for DNS

    UDP 53 udp[10] & 128 == 0 User Function dns.so Link_ID Link_ID Link_ID ..Extract the queried host

  • **Accuracy AnalysisOne-to-one ID TransformingExamplesIn search, URL -> Keywords -> Canonical formatIn CoralCDN, URL -> Sha1 hash valueIdeally no error if requests are distinctRequest ambiguousnessSearch keywordsMicrosoft search dataLess than 1% messages with duplication in 1sWeb URLTwo real http traces Less than 1% messages with duplication in 1sChat messagesNo duplication with timestamps

  • **Potential ApplicationsSearchVerified by a Microsoft guyCDNCoralCDN is studied and evaluatedChat SystemIRC is testedDistributed File SystemHadoop DFS is tested

  • **EvaluationApplicationCoralCDN HadoopExperimentEmploy PlanetLab hosts as web clientsRetrieve URLs from real traces with different frequencyMetricsLinking accuracy (false positive, false negative)Diagnosis abilityCompared ApproachWAP5

  • **CoralCDN Semantics

  • **Message Linking AccuracyUse Log-Based Approach to Evaluate WAP5 and Rake Linking in CoralCDN

  • **Diagnosis AbilityControlled ExperimentsInject junk CPU-intensive processesCalculated the packet processing time using WAP5 and RakeObviously Rake can identify the slow machine, while WAP5 fails.

  • *Semantics of Hadoop Get operation

  • *Abused IPC Call in HadoopIt is a problem that we found in Hadoop source code.Four getFileInfos are used here, while only one is enough.

    Text

  • *Running time of Hadoop steps

  • **DiscussionImplementation ExperienceHow hard for user to provide semanticsCoralCDN 1 week source code studyDNS a couple of hours Hadoop DFS 1 week source code study

  • **Conclusions of RakeFeasibilityRake works for many popular applications in different categoriesEasinessRake allows user to write semantics via XMLNecessary semantics are easy to obtained given our experience AccuracyMuch more accurate than black-box approaches and probably matches white-box approaches

  • **Q & A?Thanks!

  • *Backup

  • **Utilize Semantics in RakeImplement Different Rakes for Different Application is time consumingLesson learnt for implementing two versions of Rake for CoralCDN and IRCDesign Rake to take general semanticsA unified infrastructure Provide simple language for user to supply semantics

  • **Questions on SemanticsWhat Are the Necessary Semantics?In worst case, re-implement the applicationHow Does Rake Use the Semantics?Nave design is to implement Rake for each application with specific application semanticsHow Efficient Is the Rake with SemanticsCan message linking to accurate?Whats the computational complexity of Rake?

  • **Related WorkInvasivenessApplication Knowledge

    Non-InvasiveInvasiveNetwork Sniffing Interpo-sitionApp or OS LogsSource code modificationBlack-boxProject 5, SherlockWAP5FootprintGrey-boxRakeMagpieWhite-boxX-Trace, Pinpoint

  • *Semantics of Hadoop Grep operation

    *Yao Zhao, Yinzhi Cao, Anup Goyal, Yan Chen and Ming Zhang

    Whether we need to list all authors and affiliations

    ********************CURRY, T. W. Profiling and tracing dynamic library usage via interposition. In Proc. Summer USENIX 94 (June 1994), pp. 267278.

    *****