Regular Object Types and XTATIC
Based on: a paper by Vladimir Gapeyev and Benjamin C. Pierce, and a presentation of the paper by Benjamin C. Pierce
Presented by: Lena Lempert
Introduction
  Regular types have been proposed as a basis for statically typed processing of XML.
  However, regular types have only been explored in special-purpose languages, i.e. languages whose type systems are designed around regular types (XDuce, CDuce, XQuery).
Our objective
  To develop XTATIC, a language whose goal is to bring regular types to a broader audience by offering them as a lightweight extension of a popular object-oriented language, C#.
Key ideas of XTATIC
  XTATIC data model: a combination of
    the tree-structured data model of XDuce
    the classes-and-objects data model of an object-oriented language
  XML structures are treated as objects.
FX: a core calculus for XTATIC
  A formal core of the XTATIC design is being developed.
  The tool for this investigation is a tiny language called FX.
  FX features are drawn from:
    FJ (Featherweight Java)
    the core of XDuce
  Points of interest include:
    a smooth interleaving of the two data models
    a definition of the “subtype” relation
    a natural encoding of XML documents using singleton classes
XTATIC example
  An XML fragment:
    <Person>
      <Name> Lena Lempert </Name>
      <Email> [email protected]</Email>
    </Person>
    <Person>
      <Name> Queen Elisabeth </Name>
      <Phone> +44 55 6666 </Phone>
    </Person>
  The corresponding XTATIC value:
    [ <Person> [
        <Name> [ ‘Lena Lempert’ ],
        <Email> [ ‘[email protected]’ ]
      ],
      <Person> [
        <Name> [ ‘Queen Elisabeth’ ],
        <Phone> [ ‘+44 55 6666’ ]
      ]
    ]
  A type for this expression:
    [ <Person> [ <Name> [ pcdata ],
                 ( <Email> [ pcdata ] | <Phone> [ pcdata ] ) ] * ]
  where
    |  union
    *  repetition
    ,  concatenation
XTATIC example (cont.)
  Sequence values can be examined using type-based pattern matching. Example:
    list is a variable that contains a sequence of the type given on the previous slide.
    If a Person has an Email, the first case extracts the email into the pcdata variable e and uses it to extend the text in spamlist.
    Otherwise, the person must have a Phone. The second case binds the whole entry to the variable p and adds it to the phonebook sequence.
    The last case, [], matches the empty sequence.
match (list) {
  case [ <Person>[ <Name>[pcdata], <Email>[pcdata e] ], any rest ]:
    spamlist = [ spamlist, ‘,’, e ];
  case [ <Person>[ <Name>[pcdata], <Phone>[pcdata] ] p, any rest ]:
    phonebook = [ phonebook, p ];
  case []:
    //..
}
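For readers more used to mainstream languages, the effect of this match can be approximated in Python. This is only a rough sketch: the encoding of a tree as a (label, children) tuple, the function name process, and the e-mail address (the one on the slide is redacted) are assumptions made for illustration, and the loop replaces the recursion over rest.

```python
# Trees are modelled as (label, children) tuples; text is a plain str child.
# This encoding, the names, and the sample address are illustrative assumptions.

def process(entries):
    """Walk a list of <Person> trees the way the XTATIC match does."""
    spamlist, phonebook = "", []
    for label, children in entries:
        assert label == "Person"
        (_, _name), (kind, payload) = children  # <Name>, then <Email> or <Phone>
        if kind == "Email":
            # first case: bind the e-mail text to e and extend spamlist
            spamlist = spamlist + "," + payload[0]
        elif kind == "Phone":
            # second case: bind the whole entry to p and add it to phonebook
            phonebook.append((label, children))
        else:
            raise ValueError("value does not belong to the declared type")
    return spamlist, phonebook


people = [
    ("Person", [("Name", ["Lena Lempert"]), ("Email", ["lena@example.org"])]),
    ("Person", [("Name", ["Queen Elisabeth"]), ("Phone", ["+44 55 6666"])]),
]
spam, phones = process(people)
```

Unlike XTATIC, nothing here is checked statically: the ValueError branch stands in for a situation that the regular type of list rules out at compile time.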
Data model
  The data model of a language is:
    the collection of values that programs in the language manipulate
    the types of those values
    fundamental relations such as value typing and subtyping
  Our primary goal is the combination of trees and objects (and their types). Therefore we will concentrate on the data model of FX, which is a combination of the data models of XDuce and FJ.
The XDuce Data Model
  XDuce: language of labels
  It consists of:
    a set of label values
    a set of label types
    a denotation function [[·]] giving, for each label type L, the set [[L]] of label values that are members of L
    the subtyping relation: L1 <: L2 (L1 is a subtype of L2) iff [[L1]] ⊆ [[L2]]
  A simple choice of label language:
    For each label value l, consider l to be a label type as well. Then l is the singleton type whose denotation contains just l.
    A wildcard label type ~ denotes the whole set of label values.
The XDuce Data Model (cont.)
  A tree value t consists of a label value and a sequence of child tree values:
    t ::= <(l)>[t1, …, tn]    where n ≥ 0
  XDuce types are regular types: regular expressions over an “alphabet” consisting of tree types <(L)>[X]:
    T ::= <(L)>[X]    tree
          []          empty sequence
          T, T        concatenation
          T | T       union
          T*          repetition
  Subtyping relation for regular types:
    T1 <: T2 iff [[T1]] ⊆ [[T2]]
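To make the denotation [[T]] concrete, here is a small Python sketch of a membership test for regular types. It is a naive backtracking matcher written for illustration, not the algorithm used by XDuce or XTATIC. The class names, the (label, children) encoding of tree values, and the simplification of writing the children type inline instead of through a type name X are all assumptions. It decides membership only; subtyping (inclusion of denotations) is a harder problem and is not attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:            # <(L)>[T]: a label type plus the type of the children
    label: str         # a label value used as a singleton type, or "~" (wildcard)
    body: object


@dataclass(frozen=True)
class Empty:           # []
    pass


@dataclass(frozen=True)
class Cat:             # T, T
    left: object
    right: object


@dataclass(frozen=True)
class Alt:             # T | T
    left: object
    right: object


@dataclass(frozen=True)
class Star:            # T*
    inner: object


def ends(ty, seq, i):
    """Yield every index j such that seq[i:j] belongs to the type ty."""
    if isinstance(ty, Empty):
        yield i
    elif isinstance(ty, Tree):
        if i < len(seq):
            label, children = seq[i]
            if ty.label in ("~", label) and member(ty.body, children):
                yield i + 1
    elif isinstance(ty, Cat):
        for j in ends(ty.left, seq, i):
            yield from ends(ty.right, seq, j)
    elif isinstance(ty, Alt):
        yield from ends(ty.left, seq, i)
        yield from ends(ty.right, seq, i)
    elif isinstance(ty, Star):
        yield i                                  # zero repetitions
        for j in ends(ty.inner, seq, i):
            if j > i:                            # insist on progress to avoid looping
                yield from ends(ty, seq, j)


def member(ty, seq):
    """True if the whole sequence of (label, children) trees is in [[ty]]."""
    return any(j == len(seq) for j in ends(ty, seq, 0))


# A simplified version of the Person type, with empty bodies instead of pcdata.
person = Tree("Person", Cat(Tree("Name", Empty()),
                            Alt(Tree("Email", Empty()), Tree("Phone", Empty()))))
entries = [("Person", [("Name", []), ("Email", [])]),
           ("Person", [("Name", []), ("Phone", [])])]
ok = member(Star(person), entries)
```

The matcher can take exponential time on adversarial types; it is meant to show what membership means, not how to implement it efficiently.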
The FJ Data Model
  FJ (Featherweight Java) is a tiny calculus designed to capture the essential typing mechanisms of class-based object-oriented languages such as Java and C#.
  Included: the core mechanisms of object creation, field access, method invocation, inheritance.
  Omitted: interfaces, overloading, static members, concurrency, and even assignment!
  But how can we manage without assignment?
  Here’s the trick: demand that the fields of an object be initialized from its constructor arguments and never touched again. A class definition must have the form:
    class C {
      D1 f1; … Dn fn;
      C (D1 x1, …, Dn xn) { f1 = x1; … fn = xn; }
      … method definitions …
    }
  Now identify an object with the expression new C(a1, …, an) used to create it, i.e., just treat new expressions as values.
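The same discipline can be imitated in Python with a frozen dataclass. The sketch below is only an analogy: the class Pair and its method swap are illustrative names, not taken from the slides. Fields are set once by the constructor, any later assignment is rejected, and an object prints as (and compares equal to) the expression that builds it.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object

    def swap(self):
        # "Updating" means building a new object from the old fields.
        return Pair(self.snd, self.fst)


p = Pair(1, 2)
q = p.swap()

try:
    p.fst = 99          # assignment after construction is refused
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because equality is structural, Pair(1, 2) == p holds, which matches the idea of identifying an object with its new expression.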
The FJ Data Model (cont. 1)
  An FJ program consists of a collection of class declarations plus a single expression to be evaluated.
  FJ types = class names C
  FJ values = objects:
    o ::= new C(o1, o2, …, on)    (n ≥ 0)
  The constructor arguments o1, o2, …, on (usually written just ō) must correspond exactly to the fields of class C. Example:
  Class C has private fields a and b; class D, apparently a subclass of C, adds private fields e and f:
    d = new D(a, b, e, f)
The FJ Data Model (cont. 2)
  We say that a value new C(ō) is a valid object of the class C if its field values ō conform to the field types declared for C.
  The denotation [[C]] of a class C is the set of all valid objects of the class C and its subclasses.
  Subtyping relation:
    C1 <: C2 (C1 is a subtype of C2) iff [[C1]] ⊆ [[C2]]
The FX Data Model
  The interleaving of:
    the XDuce data model
    the FJ data model
  Observation 1: we can treat sequences of trees as objects.
    A special class Seq, whose subtypes are all the regular types.
    All tree values are translated to objects of the class Seq.
The FX Data Model (cont.)
  Observation 2: we can treat the data model of classes and objects as a “label language”:
    objects are labels in XDuce trees
    classes are label types
FX values and types
  Values
    a ::=              FX value
        new C(ā)       object
        [t̄]            delimited sequence
    t ::=              tree value
        <(a)>[t̄]
  Types
    A ::=              FX type
        C              class name
        [X]            regular type name
        [T]            regular type
    T ::=              regular type
        <(A)>[X]       tree type
        []             empty sequence
        T, T           concatenation
        T | T          union
        T*             repetition
  (Diagram labels: a and A belong to the full FX language; t and T belong to the regular expression sub-language.)
Program context
  A program context is a tuple:
    Ctx = <Typenames, def, Classes, <:, fields, mtypes, mbody>
  where:
    Typenames: a set of names for regular types
    def: a function that maps each name in Typenames to its definition
    Classes: a set of class names, containing the special names Object and Seq
    <: a transitive subclass relation, such that C <: Object for all C, and such that Seq has no sub- or supertypes except Object
    fields: gives for each class a list F1 f1 … Fn fn, such that fields(Seq) and fields(Object) are empty, and if C is a subclass of D, then fields(D) is a prefix of fields(C)
    mtypes: method types
    mbody: method bodies
FX type membership
  The syntax of values given so far allows ill-formed object values new C(ā), where the actual field values ā do not conform to the field types declared for class C in the program context.
  To correct this, we introduce a type membership relation a ∈ A: a value a is valid if there is a type A such that a ∈ A.
  Type denotation (the set of values of the type):
    [[A]] = { a | a ∈ A }
  Denotation of Seq:
    does not contain objects (new Seq(ā))
    contains all valid sequence values
  Subtyping in FX:
    A1 is a subtype of A2 iff [[A1]] ⊆ [[A2]]
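The object half of the membership relation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class table CLASSES, the classes Point, ColorPoint, Int and Color, and the ("new", C, args) encoding of objects are all hypothetical, and sequence values and regular types are left out.

```python
# Hypothetical class table: class name -> (superclass, [(field type, field name)]).
CLASSES = {
    "Object": (None, []),
    "Int": ("Object", []),
    "Color": ("Object", []),
    "Point": ("Object", [("Int", "x"), ("Int", "y")]),
    "ColorPoint": ("Point", [("Color", "c")]),
}


def fields(cls):
    """All fields of cls, inherited ones first, so fields(D) is a prefix of fields(C)."""
    parent, own = CLASSES[cls]
    return (fields(parent) if parent else []) + own


def is_subclass(c, d):
    """Reflexive, transitive subclass check by walking up the hierarchy."""
    while c is not None:
        if c == d:
            return True
        c = CLASSES[c][0]
    return False


def is_member(value, cls):
    """value is ("new", C, args); it belongs to cls if C <: cls and every
    argument belongs to the declared type of the corresponding field."""
    _, c, args = value
    decl = fields(c)
    return (is_subclass(c, cls)
            and len(args) == len(decl)
            and all(is_member(a, ty) for a, (ty, _) in zip(args, decl)))


one = ("new", "Int", [])
red = ("new", "Color", [])
good = ("new", "ColorPoint", [one, one, red])
bad = ("new", "Point", [one, red])      # second field should be an Int
```

Here is_member(good, "Point") holds because ColorPoint is a subclass of Point and all three arguments fit their fields, while bad is an ill-formed value of exactly the kind the membership relation is meant to exclude.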
The FX Language Syntax
  The full-blown FX language syntax:
    e ::=                          expression
        x                          value variable
        new C(ē)                   new object creation
        e.f                        field access
        e.m(ē)                     method call
        <(e)>[ē]                   tree
        [ē]                        sequence
        match(e) { case [P]: ē }   pattern match
The FX Language Syntax (cont.)
    Q ::=              FX pattern
        C              class
        [X]            pattern name
        [P]            regular pattern
        Q x            FX var binding
    P ::=              regular pattern
        <(Q)>[P]       tree
        []             empty sequence
        P, P           concatenation
        P | P          alternative
        T*             type repetition
        P x            regular var binding
FX syntax: explanations and constraints
  Types (types of fields, or types appearing in method signatures) can be regular types as well as classes.
  Variables can hold any FX values, either objects or sequences.
  Only tree values can be members of sequence values:
    [ [t], (new C(a)), [s] ] is not allowed!
  Sequence expressions, however, may contain nested sequences. The reason: we want the following expression to be legal when the method getPapers() returns values of a sequence type:
    [ db.getPapers(“POPL”), db.getPapers(“ICFP”) ]
  An object is never legal as a member of a sequence.
  A tree expression <(e)>[d] is never allowed outside of sequence brackets […].
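One way to reconcile these rules is to let a sequence expression splice the values of nested sequence-valued subexpressions into a single flat sequence. The Python sketch below illustrates that reading; the helper seq, the stand-in get_papers and its made-up data, and the (label, children) tuple encoding of trees are assumptions, and the slides do not spell out the evaluation rule in this form.

```python
def seq(*parts):
    """Build a sequence value; nested sequences are spliced in, not nested."""
    out = []
    for part in parts:
        if isinstance(part, list):       # a sequence-valued subexpression
            out.extend(part)
        elif isinstance(part, tuple):    # a single tree (label, children)
            out.append(part)
        else:                            # plain objects may not be members
            raise TypeError("only trees may be members of a sequence")
    return out


def get_papers(venue):
    # Stand-in for db.getPapers; the returned data is made up.
    return [("paper", [(venue, [])])]


both = seq(get_papers("POPL"), get_papers("ICFP"))
```

The result both is a flat sequence of two paper trees, so the value still contains only trees even though the expression nested two sequences.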
Pattern matching
  Deconstruction of sequence values is done by matching them against patterns using the match construct.
    It syntactically resembles the C# switch.
    It behaves like the XDuce match.
match(d) {
case [P1]: e1;
case [P2]: e2;
…
case [Pn]: en;
}
Here d is a sequence and each [Pi] is a sequence pattern. There is no “fall through” from one case to the next.
Pattern matching (cont. 1)
  The syntax of FX patterns [P]: as in XDuce, a pattern is just a type annotated with variable binders.
  A class pattern has the form C x (C is a class name, x is the variable to be bound).
  In pattern matching, we do not examine object fields for conformance with the declared field types; only the class tag of the object is checked (in contrast to the type membership relation).
  Similarly, in pattern matching, the validity of sequences is not checked.
  This is safe, as only valid objects and sequences exist at run time.
Pattern matching (cont. 2)
  We can use a class pattern in the label position in a tree pattern.
  Classes can be types of labels in tree types.
  This allows a label to be extracted from a tree as an object, for later use in the program.
Properties
  We can now state for FX the standard results of static type safety:
    Preservation
    Progress
  Formal definitions:
    A value environment Σ
    A typing environment Γ
    Σ conforms to Γ (Σ ∈ Γ) if:
      dom(Σ) = dom(Γ)
      Σ(x) ∈ Γ(x), for all x
    ● is an environment with an empty domain
Properties (cont.)
  Expression e gets type A in the typing environment Γ:
    Γ ├ e ∈ A
  Expression e evaluates to value a in the value environment Σ:
    Σ ├ e ↓ a
  Evaluation of e gets stuck in a finite number of steps:
    Σ ├ e ↓
Properties: proposition
  Proposition (pattern matching preserves validity):
    Let a ∈ A, and let Q be a pattern.
    If a ► Q => Σ and ► Q => Γ
    then A <: tyof(Q) and Σ ∈ Γ
  tyof(Q) is the type obtained from Q by erasing value binding annotations.
Properties: preservation and soundness
  Preservation:
    For Σ ∈ Γ, if Γ ├ e ∈ A and Σ ├ e ↓ a, then a ∈ A
  Soundness:
    If ● ├ e ∈ A, then not ● ├ e ↓
XML in FX
  How can the “leaf data” (PCDATA) be treated?
  We extend the C# data model by introducing singleton classes for individual characters.
  The program context Ctx provides:
    a class Char (the standard C# character class)
    for each character c, a class Char_c extending Char
  Each Char_c contains a single object: new Char_c()
XML in FX: pcdata
  We can define a regular type pcdata, representing XML character data:
    def(pcdata) = (<(Char)>[])*
  That is, a sequence of trees, where each tree:
    has no children
    has a character object as its label
  <(Object)>[pcdata] is a tree whose body contains only character data.
Why not use C# String?
  First reason: the pcdata representation opens the way to interesting uses of pattern matching for string regular expression processing.
  Since Char_a is a subtype of Char, we can write types that restrict text to a particular form.
  Example: all character sequences starting with ‘a’ and ending with ‘b’:
    <(‘a’)>[], pcdata, <(‘b’)>[]
Why not use C# String? (cont.)
  Second reason: in XML, two character sequences following each other are indistinguishable from a single larger character sequence.
  pcdata satisfies this requirement:
    [pcdata, pcdata] = [<(Char)>[]*, <(Char)>[]*] = [<(Char)>[]*] = [pcdata]
  String does not satisfy this requirement:
    [<String>[], <String>[]] ≠ [<String>[]]
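The difference between the two encodings is easy to see in a small Python sketch. The helpers pcdata and string_tree and the (label, children) tuple encoding are illustrative assumptions; characters stand in for the singleton character objects.

```python
def pcdata(text):
    """Encode text as a sequence of childless trees labelled by characters."""
    return [(ch, []) for ch in text]


def string_tree(text):
    """Alternative encoding: one childless tree labelled by a whole string."""
    return [(text, [])]


# Concatenating two character sequences gives the encoding of the joined text.
joined_chars = pcdata("ab") + pcdata("cd")

# Concatenating two string-labelled trees leaves two separate trees.
joined_strings = string_tree("ab") + string_tree("cd")
```

With the character encoding, joined_chars is indistinguishable from pcdata("abcd"); with the string encoding, joined_strings has two elements and differs from string_tree("abcd").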
The encoding of XML documents in XTATIC
  Encoding of XML tags: exactly the same intuition we used for characters!
    a special class Tag
    for each tag <g>, a singleton class Tag<g>
      Tag<g> is a subclass of Tag
      Tag<g> contains a single object: new Tag<g>()
The encoding of XML documents in XTATIC: example
  XML fragment:
    <basket> <apple/> <banana/> </basket>
  XTATIC value:
    <new Tag<basket>()>[ <new Tag<apple>()>[], <new Tag<banana>()>[] ]
  XTATIC type:
    <Tag<basket>>[ <Tag<apple>>[], <Tag<banana>>[] ]
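The tag encoding can be imitated in Python by creating one subclass of a base class Tag per tag name. This is a sketch under assumptions: the helper names tag_class and encode, the cache _tag_classes, and the (label object, children) tuple encoding are illustrative, and character data and attributes are ignored. Parsing uses the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


class Tag:
    """Base class of all tag classes; each subclass has one object up to equality."""

    def __eq__(self, other):
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))

    def __repr__(self):
        return f"new {type(self).__name__}()"


_tag_classes = {}


def tag_class(name):
    """Return the unique subclass of Tag that stands for the XML tag <name>."""
    if name not in _tag_classes:
        _tag_classes[name] = type(f"Tag<{name}>", (Tag,), {})
    return _tag_classes[name]


def encode(elem):
    """Turn an ElementTree element into a (label object, children) tree."""
    return (tag_class(elem.tag)(), [encode(child) for child in elem])


basket = encode(ET.fromstring("<basket><apple/><banana/></basket>"))
```

Printing basket shows a value shaped like the XTATIC one above, with new Tag<basket>() as the label of the outer tree and two childless trees inside it.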
Status
  FX language definition: more or less finished
  Prototype typechecker / interpreter for FX: running
  Pattern match compilation: underway
  Run-time system: just starting
  Extension with attributes: underway
Some of the remaining challenges
  Run-time representation issues
  Exploring alternative pattern matching primitives
  Dealing with update operations on XML structures. Possible approaches:
    leave the type system alone; use run-time checking to maintain safety
    add types for mutable XML structures
  Namespaces
  Additional XML features (e.g. from XML-Schema)
  Integration with polymorphism (generics)
  Dealing with large XML structures (streaming)
Related work
  Current work at MS on integrating “native XML types” with C#
  Work on adding regular expression types and patterns to OCaml
  CDuce
  Relax-NG