Automatically Spotting Cross-language Relations

Posted on 10-May-2015


An algorithm (with code on GitHub) to identify cross-language relations. Welcome to polyglot software development!

Transcript of Automatically Spotting Cross-language Relations

Spotting Automatically Cross-Language Relations

Federico Tomassetti (me)

Giuseppe Rizzo

Marco Torchiano

data.sql

CREATE TABLE Persons (
    ID int,
    FirstName varchar(255),
    LastName varchar(255),
    City varchar(255)
);

Person.java

String query = "select ID, FirstName, LastName, " +
               "City " +
               "from " + dbName + ".Persons";
try {
    ...
    while (rs.next()) {
        int id = rs.getInt("ID");
        String firstName = rs.getString("FirstName");
        String lastName = rs.getString("LastName");
        String city = rs.getString("City");
    }
} catch (SQLException e) {
    // (Hopefully it does not happen)
}

…the overall system works, sometimes

If we could automatically identify cross-language relations, we could:

• Recognize them: so I am aware that this ID is related to something else

• Support refactoring: if I change one, the others are updated

• Validate them: see broken relations as errors

• Navigate them: click to see the other side of the relation

CodeModels

ASTs

Embedded AST (figure taken from the paper)

index.html

<ul id="types">
  <li ng-repeat="t in types" ng-class="{'selected': t.id == type}">
    <a ng-href="#/{{t.id}}">{{t.title}}</a>
  </li>
</ul>

app.js

var types = [
  { id: 'sliding-puzzle', title: 'Sliding puzzle' },
  { id: 'word-search-puzzle', title: 'Word search puzzle' }
];

app.controller('slidingAdvancedCtrl', function($scope) {
  $scope.puzzles = [
    { src: './img/misko.jpg', title: 'Miško Hevery', rows: 4, cols: 4 },
    { src: './img/igor.jpg', title: 'Igor Minár', rows: 3, cols: 3 },
    { src: './img/vojta.jpg', title: 'Vojta Jína', rows: 4, cols: 3 }
  ];
});

<div ng-repeat="puzzle in puzzles">
  <h2>{{puzzle.title}}</h2>
</div>


Context of a node:

all its descendants + its siblings and their descendants
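The context definition above can be sketched in a few lines. This is only an illustration, not the CodeModels implementation: the node shape (`value`, `children`, `parent`) and the helper names `makeNode`, `descendants` and `contextOf` are assumptions made for the example.

```javascript
// Hypothetical minimal AST node; the real tool works on richer models.
function makeNode(value, children = []) {
  const node = { value, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// All nodes strictly below `node`, in depth-first order.
function descendants(node) {
  const result = [];
  for (const child of node.children) {
    result.push(child, ...descendants(child));
  }
  return result;
}

// Context = descendants of the node + its siblings + the siblings' descendants.
function contextOf(node) {
  const siblings = node.parent
    ? node.parent.children.filter((n) => n !== node)
    : [];
  const context = descendants(node);
  for (const sibling of siblings) {
    context.push(sibling, ...descendants(sibling));
  }
  return context;
}

// Toy tree, loosely inspired by the ng-repeat snippet (not a real parse).
const link = makeNode('a', [
  makeNode('t', [makeNode('id')]),
  makeNode('t', [makeNode('title')]),
]);
const types = makeNode('types');
makeNode('li', [types, link]);

console.log(contextOf(types).map((n) => n.value));
// → ['a', 't', 'id', 't', 'title']
```

Here `types` has no descendants of its own, so its whole context comes from its sibling `a` and the nodes below it.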


Some metrics we use:

• Number of shared values

• Min and max number of different values

• Tversky Index

$$TV(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha\,|X - Y| + \beta\,|Y - X|}$$

• Jaro, Jaccard, tf-idf and others
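As a rough sketch of how two of these metrics relate, the Tversky index can be computed directly from its definition, and the Jaccard index falls out as the special case α = β = 1. The function names and the convention of returning 0 for two empty sets are assumptions made for this example; the values are borrowed from the SQL/Java slide.

```javascript
// Number of values present in both sets.
function intersectionSize(x, y) {
  let count = 0;
  for (const v of x) if (y.has(v)) count++;
  return count;
}

// Tversky index: |X∩Y| / (|X∩Y| + alpha*|X−Y| + beta*|Y−X|).
// Returning 0 when both sets are empty is a convention chosen here.
function tversky(x, y, alpha, beta) {
  const shared = intersectionSize(x, y);
  const onlyX = x.size - shared;
  const onlyY = y.size - shared;
  const denominator = shared + alpha * onlyX + beta * onlyY;
  return denominator === 0 ? 0 : shared / denominator;
}

// With alpha = beta = 1 the denominator is |X ∪ Y|, i.e. the Jaccard index.
const jaccard = (x, y) => tversky(x, y, 1, 1);

const sqlValues = new Set(['ID', 'FirstName', 'LastName', 'City']);
const javaValues = new Set(['ID', 'FirstName', 'LastName', 'City', 'query', 'dbName']);

console.log(tversky(sqlValues, javaValues, 1, 0)); // → 1 (extra values in Y are ignored)
console.log(jaccard(sqlValues, javaValues));       // 4 / 6
```

Unequal α and β make the index asymmetric, which is useful when one context is expected to be a subset of the other.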

How to compare contexts:

1) Take all the values in the context (IDs, strings, numbers)

+

2) Employ different metrics
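The two steps above could look roughly like this. It is a sketch under assumptions: a context is taken to be a flat list of nodes of the hypothetical shape `{ kind, value }`, and "min and max number of different values" is read here as the smaller and larger count of distinct values in the two contexts, which is only one possible interpretation.

```javascript
// Step 1: keep only IDs, strings and numbers, as a set of distinct values.
// Values are compared as strings, so the number 4 and the string '4' coincide.
function collectValues(context) {
  const values = new Set();
  for (const node of context) {
    if (['id', 'string', 'number'].includes(node.kind)) {
      values.add(String(node.value));
    }
  }
  return values;
}

// Step 2: compute several metrics over the two value sets.
function compareContexts(contextA, contextB) {
  const a = collectValues(contextA);
  const b = collectValues(contextB);
  const shared = [...a].filter((v) => b.has(v)).length;
  return {
    sharedValues: shared,
    minDistinct: Math.min(a.size, b.size),
    maxDistinct: Math.max(a.size, b.size),
    jaccard: shared === 0 ? 0 : shared / (a.size + b.size - shared),
  };
}

// Hand-written contexts loosely based on the index.html / app.js slide.
const htmlContext = [
  { kind: 'id', value: 't' },
  { kind: 'id', value: 'id' },
  { kind: 'id', value: 'title' },
  { kind: 'string', value: 'selected' },
];
const jsContext = [
  { kind: 'id', value: 'id' },
  { kind: 'string', value: 'sliding-puzzle' },
  { kind: 'id', value: 'title' },
  { kind: 'string', value: 'Sliding puzzle' },
];

console.log(compareContexts(htmlContext, jsContext));
```

The resulting object is a small feature vector describing one candidate pair; the two contexts here share the values `id` and `title`.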

How to combine those metrics:

Random Tree tells us

We built a golden set of 1200 candidate relations

(around 140 real relations; the others just share the same ID)

We train it on the golden set

Random Tree finds the best way to combine those

metrics to decide whether a pair is related or not

Rule to decide whether two nodes with the same ID are

connected

Output of Random Tree
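To make the idea of a learned rule concrete, a decision tree can be applied as a chain of threshold tests over the metrics. The tree below is purely hypothetical: the metric names and thresholds are invented for illustration and are not the rule produced by the authors' training.

```javascript
// Inner nodes test one metric against a threshold; leaves carry the verdict.
// HYPOTHETICAL tree: metric names and thresholds are made up for this sketch.
const hypotheticalTree = {
  metric: 'sharedValues',
  threshold: 2,
  below: { related: false },
  atOrAbove: {
    metric: 'jaccard',
    threshold: 0.25,
    below: { related: false },
    atOrAbove: { related: true },
  },
};

// Walk from the root to a leaf, following the branch chosen by each test.
function isRelated(tree, metrics) {
  let node = tree;
  while (!('related' in node)) {
    node = metrics[node.metric] < node.threshold ? node.below : node.atOrAbove;
  }
  return node.related;
}

console.log(isRelated(hypotheticalTree, { sharedValues: 2, jaccard: 1 / 3 })); // → true
console.log(isRelated(hypotheticalTree, { sharedValues: 1, jaccard: 0.9 }));   // → false
```

Training amounts to choosing which metric to test at each node and where to put the thresholds, which is what the learner does from the golden set.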

How to evaluate it?

10-fold cross-validation
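For readers unfamiliar with the procedure, k-fold cross-validation can be sketched as follows. This is a generic illustration, not the authors' evaluation code: `kFoldSplits` and `crossValidate` are names chosen here, the folds are assigned round-robin for determinism (real setups usually shuffle, and often stratify, first), and the `train` and `score` callbacks are supplied by the caller.

```javascript
// Split `items` into k folds; each fold is used once as the test set
// while the remaining k - 1 folds form the training set.
function kFoldSplits(items, k) {
  const folds = Array.from({ length: k }, () => []);
  items.forEach((item, i) => folds[i % k].push(item));
  return folds.map((test, i) => ({
    test,
    train: folds.filter((_, j) => j !== i).flat(),
  }));
}

// Train on each training set, score on the held-out fold, average the scores.
function crossValidate(items, k, train, score) {
  const scores = kFoldSplits(items, k).map(({ train: trainSet, test }) => {
    const model = train(trainSet);
    return score(model, test);
  });
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// 1200 candidates, as in the golden set: each fold tests on 120 and trains on 1080.
const candidates = Array.from({ length: 1200 }, (_, i) => i);
const splits = kFoldSplits(candidates, 10);
console.log(splits.length, splits[0].test.length, splits[0].train.length);
// → 10 120 1080
```

Every candidate is used for testing exactly once, so the averaged score never rewards a model for items it was trained on.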

What now?

Code available at:

https://github.com/orgs/CrossLanguageProject

• We want to build a larger golden set

• We want to integrate support in editors

What we have

• A tool that automatically spots cross-language relations with precision and recall above 90% (on a first in-house dataset)
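As a reminder of what those two figures measure, precision and recall can be computed from predicted and actual labels as sketched below. The function name and the toy labels are assumptions for the example and have nothing to do with the authors' dataset; returning 0 when a denominator is empty is a convention chosen here.

```javascript
// predicted[i] and actual[i] say whether candidate pair i is (claimed to be) a real relation.
function precisionRecall(predicted, actual) {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  predicted.forEach((p, i) => {
    if (p && actual[i]) truePositives++;
    else if (p && !actual[i]) falsePositives++;
    else if (!p && actual[i]) falseNegatives++;
  });
  const claimed = truePositives + falsePositives; // everything the tool reported
  const real = truePositives + falseNegatives;    // everything it should have reported
  return {
    precision: claimed === 0 ? 0 : truePositives / claimed,
    recall: real === 0 ? 0 : truePositives / real,
  };
}

// Toy labels: 2 correct hits, 1 false alarm, 1 missed relation.
const predicted = [true, true, false, true, false];
const actual = [true, false, false, true, true];
console.log(precisionRecall(predicted, actual)); // precision 2/3, recall 2/3
```

Precision penalizes pairs that merely share an ID but are reported as related, while recall penalizes real relations the tool misses.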


www.slideshare.net/FTomassetti

Spotting Automatically Cross-Language Relations

Federico Tomassetti, Giuseppe Rizzo, Marco Torchiano

CSMR 2014, Antwerpen, Belgium

Preprint at:

http://www.di.unito.it/~rizzo/publications/Tomassetti_Rizzo-CSMRWCRE2014.pdf