Assignment on

Paper 1 : Research Methodology

“HOW TO PREPARE THE RESEARCH REPORT”

M. Phil. (Computer Science)

Submitted to: Dr. Prateek Sharma

Submitted by: Mr. Chetan Nagar

THE GLOBAL OPEN UNIVERSITY, NAGALAND

A. Preliminary section

1. Title page: Some basic considerations

The title page usually includes:

o The name of the topic
o The name of the author
o The relationship of the report to a course or degree requirement
o The name of the institution where the report is submitted
o The date and place of the presentation

Any research work starts with a working title that will almost certainly change before the research is completed and reported. It is wise, therefore, to keep thinking about an effective title that can finally be adopted, and to keep notes of alternative titles and ideas as you proceed in preparing and writing the research report. The title should catch the readers’ attention while informing them about the main thesis of the study; first impressions are strong and can attract attention. The title should be concise and give a precise indication of what is to come, and it should not claim more than what the study actually delivers. The title should be typed in capital letters, single spaced and centered between the left and right margins of the page.

2. Acknowledgement (if any)

An acknowledgement page is included if the writer has received unusual assistance in the conduct of the study; in it the author gives credit for external support received. The acknowledgement also expresses gratitude for the use of copyrighted or otherwise restricted materials. A doctoral candidate may choose to dedicate the dissertation to a person or persons who have had a significant impact on the work.

3. Table of contents:

A good table of contents serves an important purpose in providing an outline of the content of the report. The relationship between principal and minor divisions is indicated by capitalizing chapter numbers and titles, with subheadings in lower-case letters except for capitalized principal words.

B. Main body of the report

1. Introduction

As in the proposal, the introduction presents the problem addressed by the research and gives sufficient background information to allow readers to understand the results of the study.

It is written in such a way that readers will know the current status of research conclusions on the topic, the theoretical implications associated with the results of previous research on the subject, and the hypothetical resolution of the issues to be tested by the research described.

As in the proposal, the introduction should describe the nature and purpose of the study, present the guiding research questions, and explain the significance of and justification for conducting the study. Terms likely to be used throughout the paper should be defined in this section.

A statement of objectives and the research hypothesis are also included.

2. Review of Related literature

A literature review must be organized in relation to the research topic you are developing. In the process you should synthesize results into a summary of what is and is not known, identify areas of controversy in the literature, and formulate questions that need further research.

3. Materials and Methods (Methodology)

The methodology section is used to describe what the researcher did and how the study was conducted. One important purpose is to enable others to repeat the experiment and verify the results if they wish to. In doing so, you should summarize the procedures followed in each stage of your work. This section should build on the description of methods outlined in the proposal, and you should label subsections similarly to those in the proposal. It may include a subsection describing the participants or subjects, another describing the testing or measurement procedures undertaken with the participants, and a section describing the limitations of the methodology. All of this is written in the past tense or past perfect tense.

This section should present the following:

1. Procedures used and kind of design
2. Sources of data
3. Methods of gathering data
4. Description of data-gathering instruments used

4. Analysis of data/Results

This section summarizes the data collected and details the statistical treatment of that data.

Present your results in a logical sequence using only observations pertinent to your stated objectives.

After a brief statement of the main results or findings of the study, the data are reported in sufficient detail to justify the conclusions.

Tables and illustrations may be used to report data when these methods are seen to present the data more clearly and economically.

Do not replicate observations in your tables. Give only means and measures of variability.

Use tables to present exact values and figures to show trends and relationships.

All tables and illustrations should be mentioned in the text, with appropriate titles or captions and enough explanations to make them readily identifiable.

Avoid repetition of numerical data from the tables and figures in the text.

5. Discussion

This section should reflect the implications of the study. Here the researcher evaluates the data and interprets the findings in the context of the research questions or hypothesis. He is guided by questions like the following.

What do my results mean and what are their implications? You should interpret your results clearly, concisely and logically. For each objective, describe how your results relate to meeting that objective.

Here the major results are summarized, evaluated, and interpreted with respect to the original research questions and hypotheses, and related to previous work.

Theoretical and practical consequences of the results and the validity of conclusions may appropriately be discussed in this section.

The limitations of the study and suggestions for future work may also be included.

Emphasize new results and suggest new lines of work or further research.

6. Conclusions and Recommendations

In this section you should briefly describe what you did, the main results, and recommendations for further research or applicability. State the implications, that is, what the findings of the research imply, and offer suggestions.

7. References

At the end of your report you need to list all the sources cited in the text. Details regarding citations and references are given in part four.

BIBLIOGRAPHIC CITATIONS

Introduction

The principle of fairness and the role of personal recognition within the reward system of science account for the emphasis given to the proper allocation of credit. In the standard scientific paper, credit is explicitly acknowledged in three places: in the list of authors, in the acknowledgments of contributions from others, and in the list of references or citations. Conflicts over proper attribution can arise in any of these places. Citations serve many purposes in a scientific paper. They acknowledge the work of other scientists, direct the reader toward additional sources of information, acknowledge conflicts with other results, and provide support for the views expressed in the paper. More broadly, citations place a paper within its scientific context, relating it to the present state of scientific knowledge.

Failure to cite the work of others can give rise to more than just hard feelings. Citations are part of the reward system of science. They are connected to funding decisions and to the future careers of researchers. More generally, the misallocation of credit undermines the incentive system for publication. In addition, scientists who routinely fail to cite the work of others may find themselves excluded from the fellowship of their peers. This consideration is particularly important in one of the more intangible aspects of a scientific career: that of building a reputation. Published papers document a person's approach to science, which is why it is important that they be clear, verifiable, and honest. In addition, a researcher who is open, helpful, and full of ideas becomes known to colleagues and will benefit much more than someone who is secretive or uncooperative.

Features of citations

(a) Footnoting

Footnotes are very useful devices because they serve a number of purposes

They enable you to substantiate your presentation by citing other authorities

They also enable you to present explanatory statements that would otherwise interfere with the logic of your text

Traditionally, footnote citations are placed at the bottom of the page

They are separated from the text by a short horizontal line extending from the text margin.

(b) Abbreviations

o You may use abbreviations in bibliographic and footnote citations if you want to conserve space. Examples: bk., bks. = book, books.

(c) Bibliography (Reference/Literature Cited)

Points to consider in preparing the references:

o The reference list at the end of the paper should list all works cited in the paper, and all items listed as references must have been cited in the text.

o Special attention should be given to ensure appropriate citations of less common sources, such as unpublished manuscripts.

o There are many ways of presenting the bibliography, but be accurate and consistent in the way you list entries

o Follow the guidelines required by the particular journal, proceedings, etc.; each has its own citation style.

o Citing a source without having read/seen the original can lead to embarrassment and loss of credibility if the secondary source from which you gained the information is in error.

o Again, the APA Manual can provide guidance for ensuring accuracy in these details.

o General rule: Author(s). Year of Publication. Title of Work. Publication data.

(i) In-text references (citations): References are citations of other works such as books, journal articles, or private communications. References in the text are treated somewhat differently from references in the complete list at the end of a paper.

Use the author-date format to cite references in text. For example: as Smith (1990) points out,

For two-author citations, spell out both authors on all occurrences.

For multiple-author citations (up to five authors) name all authors the first time, then use et al., so the first time it is Smith, Jones, Pearson and Sherwin (1990), but the second time it is Smith et al., with a period after “al” but no underlining.

For six or more authors, use et al. the first time and give the full citation in references.

Include the page reference after the year, outside quotes but inside the comma, for example: The author stated, “The effect disappeared within minutes” (Lopez, 1993, p. 311), but she did not say which effect. Another example would be: Lopez found that “the effect disappeared within minutes” (p. 311). Notice also that the sentence is capitalized only if presented after a comma, as a complete sentence.

If two or more multiple-author references shorten to the same “et al.” form, making them ambiguous, give as many author names as necessary to make them distinct before et al. For example: (Smith, Jones, et al., 1991) to distinguish it from (Smith, Burke, et al., 1991).

Join names in a multiple-author citation with and (in text) or an ampersand (&) in reference lists and parenthetical comments. For example: As Smith and Sarason (1990) point out, the same argument was made in an earlier study (Smith & Sarason, 1990).

If a group is readily identified by its initials, spell it out only the first time. For example, “As reported in a government study (National Institute of Mental Health [NIMH], 1991), blah blah...” and thereafter, “The previously cited study (NIMH, 1991) found that...”

If the author is unknown or unspecified, use the first few words of the reference list entry (usually the title), for example: (“Study Finds,” 1992).

If citing multiple works by the same author at the same time, arrange dates in order. In general, use letters after years to distinguish multiple publications by the same author in the same year. For example: Several studies (Johnson, 1988, 1990a, 1990b, 1995 in press-a, 1995 in press-b) showed the same thing.

For old works cite the translation or the original and modern copyright dates if both are known, for example: (Aristotle, trans. 1931) or (James, 1890/1983).

Always give page numbers for quotations, for example: (Cheek & Buss, 1981, p. 332) or (Shimamura, 1989, chap. 3, p. 5).

For e-mail and other “unrecoverable data” use personal communication, for example: (V.-G. Nguyen, personal communication, September 28, 1993). These do not appear in the reference list.

Abbreviating within a reference

Here are approved abbreviations for use in a reference list:

chap. for chapter
ed. for edition
rev. ed. for revised edition
2nd ed. for second edition
Ed. for Edited by
(Eds.) for multiple editors
Trans. for Translated by
p. for page number, with a space after the period
pp. for page numbers in encyclopaedia entries, multi-page newspaper articles, and chapters or articles in edited books, but not in journal or magazine article citations, where numbers alone should be used (see examples of reference formats)
Vol. for Volume
vols. for volumes
No. for Number
Pt. for Part
Suppl. for Supplement
Tech. Rep. for Technical Report

Quotations: When a direct quotation is used, always include the author, year, and page number as part of the citation.

A. A quotation of fewer than 40 words should be enclosed in double quotation marks and should be incorporated into the formal structure of the sentence. Consider the following example:

Patients receiving prayer had “less congestive heart failure, required less diuretic and antibiotic therapy, had fewer episodes of pneumonia, had fewer cardiac arrests, and were less frequently intubated and ventilated” (Byrd, 1988, p. 829).

B. A lengthier quotation of 40 or more words should appear (without quotation marks) apart from the surrounding text, in block format, with each line indented five spaces from the left margin.

(ii) Lists of References

General Rule:

Pagination: The References section begins on a new page.

Format: The reference list is organized alphabetically by the surnames of first authors. Most reference entries have three components:

Authors: Authors are listed in the same order as specified in the source, using surnames and initials. Commas separate all authors. When there are seven or more authors, list the first six and then use “et al.” for the remaining authors. If no author is identified, the title of the document begins the reference. The first author is always listed surname first, followed by initials; the remaining authors may be listed either initials first or surname first, as long as the style is consistent.

Year of Publication: Given in parentheses following the authors, with a period after the closing parenthesis, or without parentheses following the authors, with a period after the year. If no publication date is identified, use “n.d.” (with or without parentheses) following the authors.

Source Reference: Includes the title, journal, volume, and pages (for a journal article) or the title, edition, city of publication, and publisher (for a book). [Note: Italicize titles of books, titles of periodicals, and periodical volume numbers.]
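As a rough illustration of the general rule above, the following sketch assembles a reference string from its components; the author names, title and journal in the example are placeholders invented purely for illustration, and a real reference list should of course follow the style guide of the target journal.

def format_reference(authors, year, title, source):
    # Assemble a reference following the general rule above:
    # Author(s). Year of Publication. Title of Work. Publication data.
    if not authors:
        author_part = ""  # no author identified: the title begins the reference
    elif len(authors) >= 7:
        author_part = ", ".join(authors[:6]) + ", et al. "  # seven or more authors
    else:
        author_part = ", ".join(authors) + " "
    year_part = "(" + str(year) + "). " if year else "(n.d.). "
    return author_part + year_part + title + ". " + source + "."

# Placeholder entry, used only to show the ordering of the components.
print(format_reference(["Doe, J.", "Roe, A."], 2005,
                       "A study of report writing", "Journal of Examples, 12, 34-56"))
# Doe, J., Roe, A. (2005). A study of report writing. Journal of Examples, 12, 34-56.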

Assignment on

Paper 2 : Recent Advances in Computer Science

“APPLICATION VIRTUALIZATION

And

DESKTOP & SERVER VIRTUALIZATION”

M. Phil. (Computer Science)

Submitted to: Dr. Prateek Sharma

Submitted by: Mr. Chetan Nagar

THE GLOBAL OPEN UNIVERSITY, NAGALAND

Definition

Application virtualization is an umbrella term that describes software technologies that improve portability, manageability and compatibility of applications by encapsulating them from the underlying operating system on which they are executed. A fully virtualized application is not installed in the traditional sense[1], although it is still executed as if it is. The application is fooled at runtime into believing that it is directly interfacing with the original operating system and all the resources managed by it, when in reality it is not. Application virtualization differs from platform virtualization in that in the latter case, the whole operating system is virtualized rather than only specific applications.

Description

Limited application virtualization is used in modern operating systems such as Microsoft Windows and Linux. For example, IniFileMappings were introduced with Windows NT to virtualize (into the Registry) the legacy INI files of applications originally written for Windows 3.1.[2] Similarly, Windows Vista implements limited file and Registry virtualization so that legacy applications that try to save user data in a system location that was writeable in older versions of Windows, but is now only writeable by highly privileged system software, can work on the new Windows system without requiring the program to run with higher-level security privileges (which would carry security risks).

Full application virtualization requires a virtualization layer. This layer must be installed on a machine to intercept all file and Registry operations of virtualized applications and transparently redirect these operations into a virtualized location. The application performing the file operations never knows that it's not accessing the physical resource it believes it is. In this way, applications with many dependent files and settings can be made portable by redirecting all their input/output to a single physical file, and traditionally incompatible applications can be executed side-by-side. Examples of this technology for the Windows platform are Ceedo, InstallFree, Citrix XenApp, Novell ZENworks Application Virtualization, Endeavors Technologies Application Jukebox, Microsoft Application Virtualization, Software Virtualization Solution, and VMware ThinApp.
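As a rough illustration of the redirection idea only (not the actual mechanism of any of the products named above, which hook file and Registry APIs inside the operating system), the following sketch maps a requested system path into a hypothetical per-application sandbox:

import ntpath

# Hypothetical sandbox root into which all writes of the virtualized app are redirected.
SANDBOX_ROOT = r"C:\VirtualApps\MyApp\Sandbox"

def redirect_path(requested_path):
    # Map a path the application asks for to its virtualized location.
    # A real virtualization layer intercepts file and Registry calls inside the OS;
    # this sketch only shows the path-mapping idea.
    drive, tail = ntpath.splitdrive(requested_path)
    return ntpath.join(SANDBOX_ROOT, drive.rstrip(":"), tail.lstrip("\\"))

# The application believes it is writing to a system location...
print(redirect_path(r"C:\Program Files\MyApp\settings.ini"))
# ...but the layer stores the file inside the per-application sandbox:
# C:\VirtualApps\MyApp\Sandbox\C\Program Files\MyApp\settings.ini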

A common misconception is that a runtime environment is application virtualization. However, a runtime layer is required for an application to be able to execute, while a virtualization layer is not.

Technologies

Technology categories that fall under application virtualization include:

Application Streaming. The application is delivered in a package that may include a subset of OS files and configuration settings. Running the package requires the installation of a lightweight client application. Packages are usually delivered over a protocol such as HTTP or RTSP. Application virtualization is commonly paired with application streaming to deliver applications on demand.

Desktop Virtualization / Virtual Desktop Infrastructure (VDI). The application is hosted in a VM or blade PC that also includes the operating system (OS). These solutions include a management infrastructure for automating the creation of virtual desktops and providing access control to the target virtual desktop. VDI solutions can usually fill the gaps where application streaming falls short.

Benefits of application virtualization

Allows applications to run in environments that do not suit the native application (e.g. Wine allows Microsoft Windows applications to run on Linux).

May protect the operating system and other applications from poorly written or buggy code.

Uses fewer resources than a separate virtual machine.

Run applications that are not written correctly, for example applications that try to store user data in a read-only system-owned location.

Run incompatible applications side-by-side, at the same time and with minimal regression testing against one another.

Maintain a standard configuration in the underlying operating system across multiple computers in an organization, regardless of the applications being used, thereby keeping costs down.

Implement the security principle of least privilege by removing the requirement for end-users to have Administrator privileges in order to run poorly written applications.

Simplified operating system migrations.

Accelerated application deployment, through on-demand application streaming.

Improved security, by isolating applications from the operating system.

Enterprises can easily track license usage. Application usage history can then be used to save on license costs.

Fast application provisioning to the desktop based upon the user's roaming profile.

Allows applications to be copied to portable media and then imported to client computers without needing to install them.

Limitations of application virtualization

Not all software can be virtualized. Some examples include applications that require a device driver and 16-bit applications that need to run in shared memory space.

Some types of software, such as anti-virus packages and applications that require heavy OS integration (such as Windowblinds or StyleXP), are difficult to virtualize.

Only file and Registry-level compatibility issues between legacy applications and newer operating systems can be addressed by application virtualization. For example, applications that don't manage the heap correctly will not execute on Windows Vista because they still allocate memory in the same way, regardless of whether they are virtualized or not. For this reason, specialist application compatibility fixes ("SHIMs") may still be needed, even if the application is virtualized.

DESKTOP VIRTUALIZATION

Desktop virtualization is the concept of separating a personal computer desktop environment from the physical machine through a client-server computing model. The resulting "virtualized" desktop is stored on a remote central server, instead of on the local storage of a remote client; thus, when users work from their remote desktop client, all of the programs, applications, processes, and data used are kept and run centrally, allowing users to access their desktops on any capable device, such as a traditional personal computer, notebook computer, smartphone, or thin client.

Virtual desktop infrastructure (VDI) is the server computing model enabling desktop virtualization, encompassing the hardware and software systems required to support the virtualized environment.

Technical definition

Desktop virtualization is encapsulating and delivering either access to an entire information system environment, or the environment itself, to a remote client device. The client device may be based upon an entirely different hardware architecture than that used by the projected desktop environment, and may also be based upon an entirely different operating system.

The desktop virtualization model allows the use of virtual machines to let multiple network subscribers maintain individualized desktops on a single, centrally located computer or server. The central machine may be at a residence, business, or data center. Users may be geographically scattered, but all may be connected to the central machine by a local area network, wide area network, or via the public Internet.

Uses

A simple use for desktop virtualization is remote administration, where the controlling computer works almost the same as on a duplicate desktop, except that the actions of the controlling computer may be almost unnoticeable on the remote computer's display. This differs from simple remote desktop software in that several people can use the same computer at once without disturbing each other's work. This could be useful for several administrators doing different tasks on the same server. It can also be used for using hardware attached to the controlled computer without disturbing a person who may already be using the computer.

However, a major use is for spreading the resources of one machine to several users. In some cases it is cheaper to buy one large computer or server and several thin clients or dumb terminals than to purchase a complete computer for each workstation. The controlling thin-client computers only need to be powerful enough to run the remote-controlling software, so they can be very simple and cheap machines. Users of such a "thin client" or "dumb terminal" may not even know that their software is actually running on another computer. If one already has enough computers but they are not powerful enough, only one new computer may be needed, and the old ones can be used as thin clients.

Advantages and disadvantages

The shared resources model inherent in desktop virtualization offers advantages over the traditional model, in which every computer operates as a completely self-contained unit with its own operating system, peripherals, and application programs. Overall hardware expenses may be reduced as resources can be shared and allocated to users on an as-needed basis. The integrity of user information is improved because all data can be maintained and backed up in the data center. Other potential advantages include:

Simpler provisioning of new desktops

Reduced downtime in the event of server or client hardware failures

Lower cost of new application deployment

Desktop image management capabilities

Longer refresh cycle for client desktop infrastructure

Secure remote access to the enterprise desktop environment

Limitations of desktop virtualization include:

Potential security risks if the network is not properly managed

Some loss of user autonomy and privacy

Challenges in setting up and maintaining drivers for printers and other peripherals

Difficulty in running certain complex applications such as multimedia

Increased downtime in the event of network failures

Complexity and high costs of VDI deployment and management

Hosted virtual desktops

Hosted virtual desktops are desktop virtualization services provided through an outsourced, hosted subscription model. Hosted virtual desktop services generally include a managed desktop client operating system configuration. Security may be physical, through a local storage area network, or virtual through data center policies. Transferring information technology infrastructure to an outsourced model shifts accounting for the associated costs from capital expenses to operating expenses.

According to a report by Gartner, hosted services accounted for more than 500,000 desktop units as of March 2009, but will grow to 49 million desktop units by 2013, and may make up as much as 40% of the worldwide professional personal computer market by revenue.

SERVER VIRTUALIZATION

Server virtualization is the masking of server resources, including the number and identity of individual physical servers, processors, and operating systems, from server users. The server administrator uses a software application to divide one physical server into multiple isolated virtual environments. The virtual environments are sometimes called virtual private servers, but they are also known as guests, instances, containers or emulations.

There are three popular approaches to server virtualization: the virtual machine model, the paravirtual machine model, and virtualization at the operating system (OS) layer.

Virtual machines are based on the host/guest paradigm. Each guest runs on a virtual imitation of the hardware layer. This approach allows the guest operating system to run without modifications. It also allows the administrator to create guests that use different operating systems. The guest has no knowledge of the host's operating system because it is not aware that it's not running on real hardware. It does, however, require real computing resources from the host -- so it uses a hypervisor to coordinate instructions to the CPU. The hypervisor is called a virtual machine monitor (VMM). It validates all the guest-issued CPU instructions and manages any executed code that requires additional privileges. VMware and Microsoft Virtual Server both use the virtual machine model.

The paravirtual machine (PVM) model is also based on the host/guest paradigm -- and it uses a virtual machine monitor too. In the paravirtual machine model, however, the VMM actually modifies the guest operating system's code. This modification is called porting. Porting supports the VMM so that it can utilize privileged system calls sparingly. Like virtual machines, paravirtual machines are capable of running multiple operating systems. Xen and UML both use the paravirtual machine model.

Virtualization at the OS level works a little differently. It isn't based on the host/guest paradigm. In the OS level model, the host runs a single OS kernel as its core and exports operating system functionality to each of the guests. Guests must use the same operating system as the host, although different distributions of the same system are allowed. This distributed architecture eliminates system calls between layers, which reduces CPU usage overhead. It also requires that each partition remain strictly isolated from its neighbors so that a failure or security breach in one partition isn't able to affect any of the other partitions. In this model, common binaries and libraries on the same physical machine can be shared, allowing an OS level virtual server to host thousands of guests at the same time. Virtuozzo and Solaris Zones both use OS-level virtualization.

Server virtualization can be viewed as part of an overall virtualization trend in enterprise IT that includes storage virtualization, network virtualization, and workload management. This trend is one component in the development of autonomic computing, in which the server environment will be able to manage itself based on perceived activity. Server virtualization can be used to eliminate server sprawl, to make more efficient use of server resources, to improve server availability, to assist in disaster recovery, testing and development, and to centralize server administration.

Assignment on

Paper 3 : Computer Vision

“VIDEO FINGERPRINTING”

M. Phil. (Computer Science)

Submitted to: Dr. Prateek Sharma

Submitted by: Mr. Chetan Nagar

THE GLOBAL OPEN UNIVERSITY, NAGALAND

Definition:

Video fingerprinting is a technique in which software identifies, extracts and then compresses characteristic components of a video, enabling that video to be uniquely identified by its resultant “fingerprint”. Video fingerprinting is a new and emerging technology that has proven itself to be significantly more effective at identifying and comparing digital video data than either of its predecessors, hash value comparisons and digital watermarking.

Video fingerprinting analysis may be based on any number of visual video features including, but not limited to, key frame analysis, color and motion changes during a video sequence.

Limitations of hash value comparisons

Normally, digital data are compared based on hash values that are directly derived from the digital components of a file. However, such methods are incomplete as they can only determine absolute equality or non-equality of video data files or parts. More often than not, differences in a video codec and digital processing artifacts may cause small differences in the digital components without changing the video perceptually. Thus, when employing hash methods, a comparison for absolute equality may fail even when two video segments are perceptually identical. Moreover, hash value comparisons are also of little value when one wishes to identify video segments that are similar (but not identical) to a given reference clip. The limitations of the equality / inequality dichotomy inherent to hash value techniques render “similar searching” impossible.
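The contrast can be illustrated with a small sketch: a bit-exact hash changes completely after a harmless re-encoding, while even a crude perceptual signature (here simply the mean brightness per frame, standing in for real fingerprint features) stays essentially the same. The frame values below are toy data invented only for this illustration.

import hashlib

def file_hash(data):
    # Bit-exact hash: any change in the encoded bytes changes the digest completely.
    return hashlib.sha256(data).hexdigest()

def crude_signature(frames):
    # Toy perceptual signature: mean brightness per frame (real systems use
    # key-frame, colour and motion features instead).
    return [round(sum(frame) / len(frame)) for frame in frames]

# Two toy "videos": the second is the same content after a lossy re-encoding,
# so the bytes differ slightly but the content is perceptually identical.
original = [[10, 12, 11], [200, 198, 199], [90, 91, 89]]
reencoded = [[11, 12, 10], [199, 199, 198], [90, 90, 90]]

print(file_hash(bytes(sum(original, []))) == file_hash(bytes(sum(reencoded, []))))  # False
print(crude_signature(original), crude_signature(reencoded))  # near-identical signatures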

Digital video fingerprinting also makes it possible to recognise video excerpts (shorter videos or mashups, for instance) and videos with a different resolution from the original (smaller or larger), as well as videos that have been modified slightly (blurring, rotation, acceleration or deceleration, cropping, insertion of new elements) and videos whose audio track has been modified.

Principles behind video fingerprinting technology

Video fingerprinting methods extract several unique features of a digital video that can be stored as a fingerprint of the video content. The evaluation and identification of video content is then performed by comparing the extracted video fingerprints. For digital video data, both audio and video fingerprints can be extracted, each having individual significance for different application areas.

The creation of a video fingerprint involves the use of specialized software that decodes the video data and then applies several feature extraction algorithms. Video fingerprints are highly compressed when compared to the original source file and can therefore be easily stored in databases for later comparison. They may be seen as an extreme form of lossy compression and cannot be used to reconstruct the original video content.
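A minimal sketch of this pipeline follows, using an intentionally simple brightness histogram as a stand-in for real fingerprint features; it shows the two essential steps, namely extracting a compact fingerprint and comparing fingerprints by distance rather than equality.

def fingerprint(frames, bins=4):
    # Toy fingerprint: a coarse brightness histogram per frame. Real extractors use
    # key-frame, colour and motion features, but the shape of the pipeline is the
    # same: video -> compact feature vector that can be stored in a database.
    fp = []
    for frame in frames:
        hist = [0] * bins
        for pixel in frame:
            hist[min(pixel * bins // 256, bins - 1)] += 1
        fp.extend(hist)
    return fp

def distance(fp_a, fp_b):
    # Fingerprints are compared by distance, not equality; a small distance means a match.
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))

reference = fingerprint([[10, 12, 240, 250], [100, 110, 120, 130]])
candidate = fingerprint([[11, 12, 239, 251], [101, 109, 121, 129]])  # slightly re-encoded copy
print(distance(reference, candidate))  # small value -> same content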

Video fingerprinting should not be confused with digital watermarking which relies on inserting identifying features into the content itself, and therefore changing the nature of the content. Watermarks must be inserted at the source in order to identify content and may be changed or removed at a later time by anyone. Video fingerprinting, however, can identify any content regardless of whether it has been previously manipulated.

Considering the huge number of videos currently available and the rapid rise in available videos thanks to the development of user-generated content (UGC) sites, video fingerprinting technologies face a huge scalability challenge.

Video fingerprinting applications

Video Fingerprinting is of interest in the Digital Rights Management (DRM) arena, particularly regarding the distribution of unauthorized content on the Internet. Video Fingerprinting systems enable content providers (e.g. film studios) or distributors (e.g. UGC sites) to determine the presence of unauthorized content within a database and to subsequently remove it. Moreover, video fingerprinting may be used for broadcast monitoring (e.g. advertisement monitoring, News monitoring) and general Media monitoring. For broadcast monitoring solutions in particular, there is high demand because content providers and content owners wish to detect when and where their video content appears on TV.

Video fingerprinting is also used by authorities to track the distribution of illegal content such as happy-slapping, terrorist and child-abuse-related videos. Another use is for companies to track the leak of confidential recordings or videos, or for celebrities to track the distribution on the Internet of unauthorized videos (for instance videos of themselves taken by amateurs using a camcorder or a mobile phone).

Fingerprinting visual content is similar to audio fingerprinting but uses a different technology. From a content provider's point of view, audio fingerprinting is not as reliable as visual fingerprinting. In most cases, audio tracks can be changed or manipulated with relative ease. For example, consider online "mash-ups". Most "mash-ups" consist of unauthorized content that is compiled together and set to a unique audio track. Since the audio track is different from the original version, the copyrighted material in the mash-ups would go undetected using only audio fingerprinting techniques.

This discrepancy has real applications in the global online community in terms of film distribution. Films shown in countries other than their country of origin are often dubbed into other languages. This change in audio renders the films virtually unrecognizable by audio fingerprinting technologies unless a copy of all known versions has been previously fingerprinted. Employing video fingerprinting, however, enables the content owner to fingerprint just once and have each subsequent version remain recognizable.

In computer vision and image processing, the concept of a feature is used to denote a piece of information which is relevant for solving the computational task related to a certain application. More specifically, features can refer to

the result of a general neighborhood operation (feature extractor or feature detector) applied to the image,

specific structures in the image itself, ranging from simple structures such as points or edges to more complex structures such as objects.

Other examples of features are related to motion in image sequences, to shapes defined in terms of curves or boundaries between different image regions, or to properties of such a region.

The feature concept is very general and the choice of features in a particular computer vision system may be highly dependent on the specific problem at hand.

Introduction

When features are defined in terms of local neighborhood operations applied to an image, a procedure commonly referred to as feature extraction, one can distinguish between feature detection approaches that produce local decisions as to whether there is a feature of a given type at a given image point or not, and those that produce non-binary data as a result. The distinction becomes relevant when the resulting detected features are relatively sparse. Although local decisions are made, the output from a feature detection step does not need to be a binary image. The result is often represented in terms of sets of (connected or unconnected) coordinates of the image points where features have been detected, sometimes with subpixel accuracy.

When feature extraction is done without local decision making, the result is often referred to as a feature image. Consequently, a feature image can be seen as an image in the sense that it is a function of the same spatial (or temporal) variables as the original image, but where the pixel values hold information about image features instead of intensity or color. This means that a feature image can be processed in a similar way to an ordinary image generated by an image sensor. Feature images are also often computed as an integrated step in algorithms for feature detection.
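The distinction can be made concrete with a small sketch (the toy image and threshold are chosen only for illustration): a gradient-magnitude feature image is defined at every pixel position, while feature detection applies a local decision and returns only a sparse set of coordinates.

# A tiny grayscale "image" with a vertical edge between columns 1 and 2.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

# Feature image: horizontal gradient magnitude at every pixel position (non-binary output).
feature_image = [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in image]

# Feature detection: a local decision per point, giving a sparse set of edge coordinates.
THRESHOLD = 50  # value chosen only for this toy example
edge_points = [(y, x) for y, row in enumerate(feature_image)
               for x, value in enumerate(row) if value > THRESHOLD]

print(feature_image)  # [[0, 190, 0], [0, 190, 0], [0, 190, 0]] -- same grid as the image
print(edge_points)    # [(0, 1), (1, 1), (2, 1)] -- sparse coordinates of detected features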

Feature representation

A specific image feature, defined in terms of a specific structure in the image data, can often be represented in different ways. For example, an edge can be represented as a boolean variable in each image point that describes whether an edge is present at that point. Alternatively, we can instead use a representation which provides a certainty measure instead of a boolean statement of the edge's existence and combine this with information about the orientation of the edge. Similarly, the color of a specific region can either be represented in terms of the average color (three scalars) or a color histogram (three functions).

When a computer vision system or computer vision algorithm is designed, the choice of feature representation can be a critical issue. In some cases, a higher level of detail in the description of a feature may be necessary for solving the problem, but this comes at the cost of having to deal with more data and more demanding processing. Below, some of the factors which are relevant for choosing a suitable representation are discussed. In this discussion, an instance of a feature representation is referred to as a (feature) descriptor.

Certainty or confidence

Two examples of image features are local edge orientation and local velocity in an image sequence. In the case of orientation, the value of this feature may be more or less undefined if more than one edge is present in the corresponding neighborhood. Local velocity is undefined if the corresponding image region does not contain any spatial variation. As a consequence of this observation, it may be relevant to use a feature representation which includes a measure of certainty or confidence related to the statement about the feature value. Otherwise, it is a typical situation that the same descriptor is used to represent feature values of low certainty and feature values close to zero, with a resulting ambiguity in the interpretation of this descriptor. Depending on the application, such an ambiguity may or may not be acceptable.

In particular, if a feature image will be used in subsequent processing, it may be a good idea to employ a feature representation which includes information about certainty or confidence. This enables a new feature descriptor to be computed from several descriptors, for example computed at the same image point but at different scales, or from different but neighboring points, in terms of a weighted average where the weights are derived from the corresponding certainties. In the simplest case, the corresponding computation can be implemented as a low-pass filtering of the feature image. The resulting feature image will, in general, be more stable to noise.
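A minimal sketch of such a certainty-weighted combination is given below; the descriptor values and certainties are invented for illustration, and a scalar stands in for whatever descriptor is actually used in a given system.

def combine(descriptors, certainties):
    # Certainty-weighted average of neighboring descriptors: low-confidence values
    # contribute little, so the result behaves like a confidence-aware low-pass filter.
    total_weight = sum(certainties)
    if total_weight == 0:
        return 0.0, 0.0  # nothing reliable: value undefined, certainty zero
    value = sum(d * c for d, c in zip(descriptors, certainties)) / total_weight
    return value, total_weight / len(certainties)

# Three neighboring velocity estimates; the middle one comes from a flat region
# (no spatial variation), so its certainty is near zero and it barely contributes.
values = [2.1, 9.9, 1.9]
certainties = [0.9, 0.05, 0.8]
print(combine(values, certainties))  # value close to 2.0, together with a combined certainty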

Averageability

In addition to having certainty measures included in the representation, the representation of the corresponding feature values may itself be suitable for an averaging operation or not. Most feature representations can be averaged in practice, but only in certain cases can the resulting descriptor be given a correct interpretation in terms of a feature value. Such representations are referred to as averageable.

For example, if the orientation of an edge is represented in terms of an angle, this representation must have a discontinuity where the angle wraps from its maximal value to its minimal value. Consequently, it can happen that two similar orientations are represented by angles which have a mean that does not lie close to either of the original angles and, hence, this representation is not averageable. There are other representations of edge orientation, such as the structure tensor, which are averageable.
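The difference can be seen in a short sketch: averaging two nearly identical edge orientations that straddle the wrap-around point gives a meaningless result, while mapping each orientation to a double-angle vector (a simple averageable representation closely related to the structure tensor), averaging there, and mapping back recovers the expected orientation. The angle values are arbitrary examples.

import math

# Two nearly identical edge orientations (in degrees) on either side of the
# wrap-around point; edge orientation is periodic with period 180.
a, b = 1.0, 179.0
print((a + b) / 2)  # 90.0 -- naive averaging of angles gives a meaningless orientation

def to_vec(angle_deg):
    # Map an orientation to a double-angle unit vector (an averageable representation
    # closely related to the structure tensor).
    rad = math.radians(2 * angle_deg)
    return (math.cos(rad), math.sin(rad))

def from_vec(v):
    return math.degrees(math.atan2(v[1], v[0]) / 2) % 180

avg = ((to_vec(a)[0] + to_vec(b)[0]) / 2, (to_vec(a)[1] + to_vec(b)[1]) / 2)
print(from_vec(avg))  # approximately 0 (equivalently 180) -- the expected average orientation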

Another example relates to motion, where in some cases only the normal velocity relative to some edge can be extracted. If two such features have been extracted and they can be assumed to refer to the same true velocity, this velocity is not given as the average of the normal velocity vectors. Hence, normal velocity vectors are not averageable. Instead, there are other representations of motion, using matrices or tensors, that give the true velocity in terms of an averaging operation on the normal velocity descriptors.

Feature vectors and feature spaces

In some applications it is not sufficient to extract only one type of feature to obtain the relevant information from the image data. Instead two or more different features are extracted, resulting in two or more feature descriptors at each image point. A common practice is to organize the information provided by all these descriptors as the elements of one single vector, commonly referred to as a feature vector. The set of all possible feature vectors constitute a feature space.

A common example of feature vectors appears when each image point is to be classified as belonging to a specific class. Assuming that each image point has a corresponding feature vector based on a suitable set of features, such that each class is well separated in the corresponding feature space, the classification of each image point can be done using a standard classification method.
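A minimal sketch of this idea follows, using a nearest-centroid classifier over two-element feature vectors; the class names, feature values and choice of classifier are all assumptions made purely for illustration.

import math

# Hypothetical per-pixel feature vectors, e.g. [average brightness, local edge strength].
training = {
    "sky": [[0.90, 0.10], [0.85, 0.05], [0.95, 0.08]],
    "ground": [[0.20, 0.60], [0.25, 0.55], [0.15, 0.65]],
}

# One centroid per class in the feature space.
centroids = {label: [sum(col) / len(vectors) for col in zip(*vectors)]
             for label, vectors in training.items()}

def classify(feature_vector):
    # Assign an image point to the class whose centroid is nearest in feature space.
    return min(centroids, key=lambda label: math.dist(feature_vector, centroids[label]))

print(classify([0.88, 0.07]))  # sky
print(classify([0.18, 0.62]))  # ground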

Another, related example occurs when neural network based processing is applied to images. The input data fed to the neural network is often given in terms of a feature vector from each image point, where the vector is constructed from several different features extracted from the image data. During a learning phase, the network can itself find which combinations of different features are useful for solving the problem at hand.