DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS
AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Record Types in Scala: Design and Evaluation

OLOF KARLSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION



Record Types in Scala: Design and Evaluation

OLOF KARLSSON

Master in Computer Science
Date: June 28, 2017
Supervisor: Philipp Haller
Examiner: Mads Dam
Swedish title: Record-typer för Scala: Design och utvärdering
School of Computer Science and Communication


Abstract

A record type is a data type consisting of a collection of named fields that combines the flexibility of associative arrays in some dynamically typed languages with the safety guarantees and possible runtime performance of static typing. The structural typing of records is especially suitable for handling semi-structured data such as JSON and XML, making efficient records an attractive choice for high-performance computing and large-scale data analytics. It has proven difficult to implement record types in Scala, however. Existing libraries suffer from either severe compile-time penalties, large runtime overhead, or other restrictions in usability such as poor IDE integration and hard-to-comprehend error messages.

This thesis provides a systematic description and comparison of both existing and possible new approaches to records in Scala and Dotty, a new compiler for the Scala 3 language. A novel benchmarking suite is presented, built on top of the Java Microbenchmark Harness (JMH), for measuring runtime and compile-time performance of records running on the Java Virtual Machine, currently supporting Scala, Dotty, Java and Whiteoak.

To achieve field access times comparable to nominally typed classes, it is conjectured that width subtyping has to be restricted to explicit coercion, and a compilation scheme for such record types is sketched. For unordered record types with width and depth subtyping, however, hashmap-based approaches are found to have the most attractive runtime performance characteristics. In particular, Dotty provides native support for such an implementation using structural refinement types, which might strike a good balance between flexibility and runtime performance for records in the future.


Sammanfattning (Abstract in Swedish)

A record type is a data type consisting of a collection of named fields that combines the flexibility of associative arrays in some dynamically typed programming languages with the safety guarantees and the potential execution speed of static typing. The structural typing of records is particularly well suited to handling semi-structured data such as JSON and XML, which makes computationally efficient records an attractive choice for high-performance computing and large-scale data analysis. Implementing records in the Scala programming language has, however, proven difficult. Existing libraries suffer either from long compilation times, slow execution speed, or other usability problems such as poor integration with development environments and hard-to-understand error messages.

This thesis provides a systematic description and comparison of both existing and new solutions for records in Scala and Dotty, a new compiler for Scala 3. A new benchmarking tool for measuring the execution speed and compilation time of records running on the Java Virtual Machine is presented. The tool is built on the Java Microbenchmark Harness (JMH) and currently supports Scala, Dotty, Java and Whiteoak.

To achieve execution times comparable to nominally typed classes, it is assumed that width subtyping must be restricted to explicit conversion calls, and a sketch of a compilation strategy for such records is presented. For record types with unordered fields and both width and depth subtyping, records based on hash tables instead turn out to have the most attractive execution times. Dotty provides support for such an implementation with structural refinement types, which may come to strike a good balance between flexibility and execution speed for records in the future.


Dedication

To Dag, for providing shelter in times of need and always reminding me of what engineering is all about. I would also like to thank my friends and family for invaluable support, and A3J - thanks for all the coffee!


Contents

1 Introduction
   1.1 Problem Description and Objective
   1.2 Research Question and Report Structure
   1.3 Contribution
   1.4 Societal and Ethical Aspects

2 Background
   2.1 Definition of Record and Record Type
   2.2 Type Systems for Polymorphic Records
      2.2.1 Structural Subtyping
      2.2.2 Bounded Quantification
      2.2.3 Other Forms of Parametric Polymorphism
   2.3 The Scala Language

3 Method
   3.1 Qualitative Comparison
   3.2 Quantitative Comparison
      3.2.1 Wreckage Benchmarking Suite Generator Library
      3.2.2 Runtime Benchmarks
      3.2.3 Compile-Time Benchmarks
      3.2.4 Statistical Treatment
         3.2.4.1 Runtime Benchmarks
         3.2.4.2 Compile-Time Benchmarks

4 Description of Existing Approaches
   4.1 Scala's Structural Refinement Types
      4.1.1 Basic Features
      4.1.2 Implementation
   4.2 scala-records v0.3
      4.2.1 Basic Features
      4.2.2 Lack of Explicit Types
      4.2.3 Other Features
   4.3 scala-records v0.4
      4.3.1 Basic Features
      4.3.2 Explicit Types


      4.3.3 Other Features
   4.4 Compossible
      4.4.1 Creation through Extension through Concatenation
      4.4.2 Extension and (Unchecked) Update
      4.4.3 Access and Select
      4.4.4 Explicit Types
      4.4.5 Polymorphism
      4.4.6 Other Features
   4.5 Shapeless 2.3.2
      4.5.1 HList Records
      4.5.2 Create
      4.5.3 Field Access
      4.5.4 Explicit Types
      4.5.5 Subtyping
      4.5.6 Parametric Polymorphism
      4.5.7 Other Type Classes
      4.5.8 HCons Extension
   4.6 Dotty's New Structural Refinement Types
      4.6.1 Implementation
      4.6.2 Basic Features
      4.6.3 Polymorphism
      4.6.4 Extension
      4.6.5 Update

5 Comparison of Existing Approaches
   5.1 Qualitative Comparison
   5.2 Quantitative Evaluation using Benchmark
      5.2.1 Runtime Performance
         5.2.1.1 Creation Time against Record Size
         5.2.1.2 Access Time against Field Index
         5.2.1.3 Access Time against Record Size
         5.2.1.4 Access Time against Degree of Polymorphism
      5.2.2 Compile-Time Performance
         5.2.2.1 Create
         5.2.2.2 Create and Access All Fields

6 Analysis and Possible New Approaches
   6.1 Strengths and Weaknesses of Existing Approaches
   6.2 Design Space for Records
   6.3 Record Type Representations
   6.4 Compilation Schemes for Subtyped Records
      6.4.1 P−W−D±: No Permutation, No Width Subtyping
      6.4.2 P−W+D±: Width Subtyping for Ordered Fields
      6.4.3 P+W−D±: Unordered Records without Width Subtyping
      6.4.4 P+W+D±: Unordered Records with Width Subtyping
         6.4.4.1 Option 1: Searching
         6.4.4.2 Option 2: Information Passing


         6.4.4.3 Option 3: Use the JVM
      6.4.5 Summary
   6.5 Benchmarks of Possible Data Structures
      6.5.1 Access Time against Record Size
      6.5.2 Access Time against Degree of Polymorphism

7 Discussion and Future Work
   7.1 Subtyping and Field Access
   7.2 Type-level Operations
   7.3 Not One but Three Record Types to Rule Them All?
   7.4 Future Work

8 Related Work
   8.1 Theoretical Foundations
   8.2 Structural Types on the JVM

9 Conclusions

Bibliography

A Whiteoak 2.1 Benchmarks


Chapter 1

Introduction

Software is getting more and more complex, and programming languages need to constantly evolve to help programmers cut through this complexity. In a perfect world it would be effortless to develop systems in a short amount of time that are easy to understand, maintain and augment while at the same time being robust, with few bugs, high runtime performance and low operating cost. In the real world, however, there does not seem to be a silver bullet, and these factors have to be weighed against each other. Different programming paradigms tend to focus more on some aspects at the expense of others: scripting languages emphasize rapid development and syntactic simplicity, while compiled languages tend to focus more on robustness and runtime efficiency.

Scala is a statically typed language with lightweight syntax that is designed to provide a middle ground between these two extremes. It is a multi-paradigm language combining the virtues of object-oriented and functional programming, and an advanced type system is combined with local type inference to lessen the syntactic burden [1]. Furthermore, Scala has its theoretical foundation in the νObj calculus [2], recently replaced by DOT [3], which combines nominally typed classes and objects with structural typing. It is therefore natural to consider the possibility of extending the Scala language with structurally typed records.

A record type is a collection of named fields that combines the flexibility of associative arrays in some dynamically typed languages with the safety guarantees of static typing. Structural typing opens up several possibilities for record polymorphism, including width and depth subtyping, making records especially suitable for handling complex and semi-structured heterogeneous data such as JSON and XML. Together with the safety benefits and potential runtime performance of static typing, this makes records an attractive choice for high-performance computing and large-scale data analytics, and a potentially valuable addition to the Scala language.

1.1 Problem Description and Objective

Several attempts at implementing record types in Scala have been made, but each approach seems to suffer from some weakness preventing it from gaining widespread use. Existing libraries suffer from either severe compile-time penalties, large runtime overhead, or other restrictions in usability such as poor IDE integration and hard-to-comprehend error messages [4, 5, 6]. The nature and reasons behind these weaknesses are poorly understood, however, and current knowledge mainly consists of bug reports [7], online wiki pages [5] and blog posts [8]. The objective of this thesis project is therefore to describe and evaluate existing approaches to record types in Scala, provide a structured analysis of their strengths and weaknesses, and finally investigate the possibilities for a new approach addressing as many of the found weaknesses as possible.

1.2 Research Question and Report Structure

The main research question guiding the thesis is the following:

What are the possible approaches to record types in Scala and what are their respectivestrengths and weaknesses?

Here, possible approaches include both existing and novel implementations, and in orderto answer this question the thesis consists of the following parts:

The necessary theoretical background and an overview of common record type features are covered in Chapter 2. Chapter 3 describes the method used to carry out the assignment; in particular, the construction of a novel benchmarking suite for records running on the Java Virtual Machine is outlined. Chapter 4 contains an overview and description of existing approaches to records in Scala. This is followed by Chapter 5, where their qualitative features are summarized and the benchmarking suite is used to evaluate and compare their runtime and compile-time performance. The determined strengths and weaknesses of existing approaches are analyzed in Chapter 6, and various possibilities for a new approach are evaluated both in terms of their supported feature set and their performance. A discussion of the results from Chapters 5 and 6 is found in Chapter 7, which also outlines interesting paths of future work not covered by the analysis of Chapter 6. Related work is found in Chapter 8, and finally the thesis is concluded in Chapter 9.

1.3 Contribution

The novelty and contribution of the thesis follow from the following constituent parts:

• An overview of existing approaches to record types in Scala, displaying their respective feature sets, strengths and weaknesses.

• A novel benchmark suite called Wreckage that is publicly available under an open-source license, ensuring reproducible results and portability to other languages.

• An overview of possible new approaches and an evaluation of their potential features and performance.

1.4 Societal and Ethical Aspects

Hopefully, the outcome of this thesis is a deepened knowledge about the design space for records and how the feature can be implemented in the Scala programming language. Although it is possible to hide from societal and ethical questions by noting that this work is theoretical in nature and the contribution is limited to a small corner of human knowledge, it is worth thinking about the consequences of advancing knowledge and technology in general. While some might argue that humanity is not ready to handle the technology we develop in a responsible way and that the power of the tools we use should be limited, it is also possible to argue that the quality of life has increased tremendously over the years thanks to technological advances. Today's society undoubtedly faces several challenges, but as much as technology can be said to be the cause, it might also provide the solutions.

In the best case, this work will be a small step on the road towards better software that is faster and more fun to develop, easier to maintain, less buggy, and with lower resource demands and operating costs. This is important for several reasons. First, the energy consumption of IT systems and data centers around the world is increasing [9]. The study of how software can be made to run more efficiently is therefore important for allowing continued development in a future with reduced energy usage and lower CO2 emissions. Second, hard-to-maintain and complex software is not merely a nuisance to programmers but can be viewed as a cost to society as a whole. With less time spent on maintenance, more time can be spent on developing services that benefit people and meet real needs. Lastly, as more and more of society's infrastructure is computerized, it is of great importance to ensure software robustness and minimize the risk of failure in mission-critical systems. Here, static typing provides at least a partial solution, as it can provide guarantees against certain errors that are caught at compile time. Reducing code complexity might also help, as programs that are easier to read and understand probably also contain fewer bugs.

As with any technology, increased computing power can certainly be used to do both harm and good, and many times it might even be hard to tell the difference. But as long as there is a potential for doing good, I believe it is worth trying. Every step backwards can be compensated by at least two steps forward, and what better way to enjoy the journey than doing science and increasing our knowledge and understanding of life, the universe and everything?


Chapter 2

Background

This chapter provides a background on records and their corresponding type system features in theory and practice, as well as an overview of some characteristic features of the Scala programming language.

2.1 Definition of Record and Record Type

A record, sometimes also called a labeled product, is a data type consisting of a collection of labeled values called fields. Records provide a natural way of composing heterogeneous data and come in many forms in the literature and in real-world programming languages. Given that Scala already supports nominally typed class instances and objects for grouping labeled values together, the focus of this thesis is exclusively on structurally typed records.

Structural typing means that a record type is fully determined by its collection of named fields and the types of their corresponding values [10]. Thus, a record type does not have to be statically declared with any kind of name or qualifier in the program text before use, and the type of a record is not dependent on the data constructor used to instantiate it. It should be noted that not all programming languages that have a construct called a record define it in this way. Most notably, Haskell, OCaml and F# have records that are nominally typed and more similar to Scala's case classes in features and usage [11, 12, 13].

Without formalizing things too much, the following notation due to Pierce [10] will be used to talk about records and their types in a language- and implementation-agnostic way: a record consisting of n fields labeled l1, l2, ..., ln holding values v1, v2, ..., vn of types T1, T2, ..., Tn respectively will be written as

{l1 = v1, l2 = v2, ..., ln = vn}

with corresponding type

{l1 : T1, l2 : T2, ..., ln : Tn}.

Fields are accessed through their labels using the familiar dot notation. For example, accessing the name field of type String on a record r is written

r.name

and naturally returns the corresponding String value.


Record types have been extensively studied, and several type systems and calculi supporting record types have been proposed, with varying capabilities. Besides being able to create records and access their fields, common record operations are: updating a record's value (potentially also changing its type), extending or restricting a record by adding or removing fields, as well as relabeling existing values. Note that all record values are assumed to be immutable unless stated otherwise. That is, updating, extending, restricting or relabeling a record does not change the value of the original record, but rather creates a separate updated copy from the existing one.
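Pending a native record type, these operations can be mimicked at the value level with Scala's immutable Map, using field names as keys. The sketch below (the `DynRecord` alias and the sample fields are illustrative only) gives up static typing entirely; it is meant only to illustrate the copy-on-write semantics of update, extension, restriction and relabeling:

```scala
object RecordOps {
  // Untyped stand-in for a record: field names as keys, values as Any.
  // This loses the static guarantees real records provide.
  type DynRecord = Map[String, Any]

  val r: DynRecord = Map("name" -> "Mme Tortue", "age" -> 123)

  val updated    = r.updated("age", 124)                  // update: r itself is unchanged
  val extended   = r + ("height" -> 0.3)                  // extension: add a field
  val restricted = r - "age"                              // restriction: remove a field
  val relabeled  = (r - "name") + ("label" -> r("name"))  // relabeling: move a value to a new label
}
```

Because Map is immutable, every operation returns a fresh copy and the original record r is left untouched, matching the semantics described above.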

In particular, various mechanisms for supporting record polymorphism have been proposed, and Section 2.2 provides an overview of some of these approaches and their supported operations.

2.2 Type Systems for Polymorphic Records

To avoid code duplication it is often desirable to allow certain functionality to be defined once and then used anywhere it is applicable. In the case of records this may be illustrated by the following example, due to Ohori [14], of a getter function in a simply typed lambda calculus:

λx. x.name

Without some kind of record polymorphism, this function would have to be defined for every type of record we want to apply it to, like

getNameFromNameRec := λx : {name : String}. x.name
getNameFromNameAgeRec := λx : {name : String, age : Int}. x.name
getNameFromNameAgeHeightRec := λx : {name : String, age : Int, height : Float}. x.name
...

which quickly becomes tedious and error-prone. In object-oriented languages the answer to this problem is often to use some form of subtyping, whereas functional programming languages instead lean towards using some form of parametric polymorphism [10]. Both concepts can be adapted to the case of record types.

2.2.1 Structural Subtyping

In a nominal type system every subtyping relation is established explicitly by the programmer. A Dog is not a subtype of Animal unless the program somewhere says it is (in the case of Scala, by using the extends and with keywords). With structural subtyping, the subtyping relation is instead based on the very structure of the types in question. If an Animal type declares the field name of type String and age of type Int, any type containing these fields may be considered a structural subtype of Animal.
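The contrast can be made concrete with Scala 2's structural refinement types (the class and member names below are illustrative only, and not part of the formal development):

```scala
import scala.language.reflectiveCalls

object SubtypingDemo {
  // Nominal subtyping: the relation is declared explicitly with `extends`.
  class Animal { val name: String = "generic"; val age: Int = 0 }
  class Dog extends Animal { override val name: String = "Rex" }

  // Structural conformance: any type with these members is accepted,
  // whether or not it mentions Animal anywhere.
  type AnimalLike = { def name: String; def age: Int }
  class Robot { val name: String = "R2"; val age: Int = 40 }

  def describe(x: AnimalLike): String = s"${x.name} (${x.age})"
}
```

Both `new Dog` and `new Robot` can be passed to describe: Dog by its declared nominal relation, Robot purely by its structure. (In Scala 2 the member access on AnimalLike is implemented with runtime reflection, hence the reflectiveCalls import.)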

Following Pierce [10], the structural subtyping relation <: will be expressed using three different rules defining permutation, width and depth subtyping. The permutation subtyping rule states that a record type is a subtype of another record type if it consists of a permutation of the same fields.

    {k1 : S1, k2 : S2, ..., kn : Sn} is a permutation of {l1 : T1, l2 : T2, ..., ln : Tn}
    -------------------------------------------------------------------------------------  (PERMUTATION)
    {k1 : S1, k2 : S2, ..., kn : Sn} <: {l1 : T1, l2 : T2, ..., ln : Tn}


This rule allows record types to be viewed as unordered collections of fields. For example, {name : String, age : Int} and {age : Int, name : String} are subtypes of each other and can be used interchangeably.

The next rule is width subtyping:

    {l1 : T1, l2 : T2, ..., ln : Tn} <: {l1 : T1, l2 : T2, ..., ln−k : Tn−k}    (WIDTH)

For ordered records this means that a record type is a supertype of another record type if it is a prefix of the other record type. If combined with the permutation rule, however, a high degree of flexibility is achieved, where a record type is a supertype of another record type if it contains any subset of its fields. For example, the type {name : String} becomes applicable to all records containing a name field of type String in any position.

The third rule, depth subtyping, recursively applies the subtyping relation to a record type's fields:

    Si <: Ti  for each i
    --------------------------------------------------------------------  (DEPTH)
    {l1 : S1, l2 : S2, ..., ln : Sn} <: {l1 : T1, l2 : T2, ..., ln : Tn}

With these three rules in place, we can define our getter function once and for all as

getName := λx : {name : String}. x.name

and then apply it to any record containing a name field of type String or a subtype of String.
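The getName definition translates directly into a Scala 2 method with a structural parameter type; any value with a `name: String` member is accepted, regardless of its other fields or their order (the User and Country case classes are illustrative only):

```scala
import scala.language.reflectiveCalls

object GetterDemo {
  // Corresponds to getName := λx : {name : String}. x.name
  def getName(x: { def name: String }): String = x.name

  case class User(name: String, age: Int)            // width: extra fields are fine
  case class Country(capital: String, name: String)  // permutation: position is irrelevant
}
```

Both `getName(User("Ada", 36))` and `getName(Country("Stockholm", "Sweden"))` type-check, mirroring the combined width and permutation rules above.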

Casting, Coercion and Equality Not all type systems that support some kind of structural subtyping do it in its most general form as described above; the type conversion from a type to a structural supertype may be more or less restricted. In OCaml, for example, an object¹ of type {name : String, age : Int} may only be assigned to a reference of type {name : String} by applying an explicit coercion operator, and afterwards it is not possible to down-cast to get the hidden fields back [12]. Since the coercion is statically type-checked to respect the structural subtyping relation, however, OCaml can still be said to support some kind of limited structural subtyping.

This thesis follows the terminology used by Pierce [10] regarding casts and coercion. Casting is defined as the operation of changing the type of a value without changing the underlying value itself. As such, it is a purely static operation only affecting the type level of a program. Coercion, on the other hand, lets a value of a certain type be applied in a context requiring another type by actually creating a new value of the target type from the original value.²

Type casts can either change a type from a subtype to a supertype, known as an up-cast or widening, or from a supertype to some subtype, known as a down-cast or narrowing. In Scala (and also the lambda calculus with subtyping developed in [10]) up-casts always succeed and can be either explicit or implicit, whereas down-casts may generate a runtime exception and must always be explicit using .asInstanceOf[T] [15].

¹ In OCaml, objects are structurally typed.
² Using this terminology, casting is performed in Scala either implicitly by assignment or explicitly by using the .asInstanceOf[T] method, whereas coercion is performed using some conversion method of the form .toT (at least for reference values; for primitive values both actually perform coercion) [15].


Coercion is not bound to follow some class hierarchy (we may, for example, coerce the string "12" to the integer 12) and may or may not discard data in the process (for example by coercing the float 12.34 to the integer 12). This in turn affects subsequent equality checks. Consider the following example, where a record r containing the fields name and age is coerced (here denoted by the as operator) and assigned to a reference s of a type containing a name field only.

r := {name = "Mme Tortue", age = 123}
s := r as {name : String}
s == r // ?

If the coercion discards the age data, it is natural for the equality check to fail. If the coercion on the other hand keeps the runtime age data around and merely hides it from the static type, it is presumably up to the language specification to decide what should happen.

2.2.2 Bounded Quantification

In the presence of subtyping, the subtyping relation can be used to express a form of parametric polymorphism called bounded quantification, as described by Cardelli and Wegner [16]. In contrast to universal quantification, where the type parameter ranges over the whole universe of types, bounded quantification restricts the type parameter to only range over the subtypes of a given type bound. Knowing a base type for the type parameter, it is then possible to perform operations on the polymorphically typed arguments, for example accessing fields.

The getName function from before can be applied to all types that are subtypes of {name : String}, and with bounded quantification this can be expressed as

getName := λR <: {name: String}. λr:R. r.name

Since R may only be instantiated to subtypes of {name : String} it is safe to do the field access on the r parameter and the function type-checks.
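This can be encoded directly in Scala with a type parameter bounded by a structural refinement type (a sketch relying on Scala's reflective structural access; the Person case class is illustrative):

```scala
import scala.language.reflectiveCalls

// getName encoded in Scala: R ranges over structural subtypes of
// { def name: String }, so the field access below type-checks.
def getName[R <: { def name: String }](r: R): String = r.name

case class Person(name: String, age: Int)

assert(getName(Person("Achilles", 24)) == "Achilles")
```

Note that the field access on r compiles to a reflective call in current Scala, a point that matters for the runtime performance discussion later.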

This form of polymorphism has the benefit that the type parameter can capture the full type of a function’s argument and refer to it later, for example in the return type. Consider the following function that selects the record with the highest age.

oldest := λR <: {age: Int}. λa:R. λb:R. if (a.age >= b.age) a else b

If this function is applied to the arguments

a := {name="Achilles", age=24}

b := {name="Mme Tortue", age=123}

the parameter R will capture both the name and the age field allowing the return type to be the full type signature {name: String, age: Int}. This would not be possible using structural subtyping on the function arguments as all static information about any additional fields except age would be lost.
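The same point can be demonstrated in Scala: because the bounded type parameter captures the full argument type, the result keeps all fields, not just age (a sketch assuming reflective structural access):

```scala
import scala.language.reflectiveCalls

// R captures the full argument type, so the result type of `oldest`
// keeps all fields of the arguments, not just `age`.
def oldest[R <: { def age: Int }](a: R, b: R): R =
  if (a.age >= b.age) a else b

case class Person(name: String, age: Int)
val winner = oldest(Person("Achilles", 24), Person("Mme Tortue", 123))

// winner is statically a Person, so `name` is still accessible:
assert(winner.name == "Mme Tortue")
```

Had the parameters been typed directly as the structural type { def age: Int }, the result type would only expose age.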


2.2.3 Other Forms of Parametric Polymorphism

Using bounded quantification it is possible to let a type parameter capture a record type while keeping some information about the present fields so that they can be accessed in a type safe way. Several type systems have been proposed to provide similar functionality without relying on subtyping.

Wand [17] introduced the notion of a row variable to achieve extensible record types with polymorphism in a context without subtyping. A row is defined as a set of fields represented as a partial function ρ from labels to types, and a record type is written as a product over this set, Πρ. A row can be extended with a new field l : T or have a new type T associated with an existing field labeled l by extending the partial function, written as ρ [l← T]. For example, the expression r with name := "Achilles" extends the record r with a name field and given that r has record type Πρ the extended record has type Πρ [name← String]. Row extension is also used to express that certain fields are present, similar to bounded quantification. For example the getName function above has the type ρ [name← String]→ String in Wand’s system.

It was later shown that the proof of complete type inference in the original paper was incorrect [18], but the idea of using row variables to represent unknown record fields has seen many applications since. In OCaml, for example, polymorphic object types are expressed using an anonymous row variable denoted by .. (ellipsis). Such a type is called open and <name: string; .. > represents an object containing an arbitrary number of methods in addition to name: string [19].

Ohori [14] developed another typed lambda calculus for polymorphic records and implemented it as an extension of Standard ML (SML) called SML#. Ohori’s system uses kinded quantification to restrict the set of types a type parameter ranges over. The quantification ∀t :: k restricts the type parameter t to range only over the record types represented by the kind k, and a record kind is defined as a set of fields {{l1 : T1, ..., ln : Tn}}. Using this system the getName function above has type ∀t :: {{name : String}}.t → String, where t ranges over the kind of all record types containing a name field of type String.

2.3 The Scala Language

Scala is a statically typed language that runs on the Java Virtual Machine (JVM). It is multi-paradigm and provides the usual object oriented abstractions such as classes with inheritance and a form of interfaces called traits, as well as functional concepts such as first class functions, algebraic data types and pattern matching.

Classes, case classes, objects and traits Classes are declared with the class keyword. Member fields are declared to be mutable with var and immutable with val. Methods are defined with def. All statements in a class declaration body are part of the class constructor, allowing concise class declarations such as the following:

class Person(_name: String, _age: Int) {
  val name = _name
  var age = _age
  def birthday(): Unit = {
    age = age + 1
  }
}


Unit is a type with only one member (), analogous to void in C-style languages. Type ascriptions are placed to the right of a colon character :, but can in many cases be left out thanks to local type inference. The class is instantiated using the new keyword, for example val p = new Person("Achilles", 24).

Scala also has a special kind of class called case class providing equality by value and pattern matching by default. A case class Person with immutable public values name and age can be declared as

case class Person(name: String, age: Int)

and instantiated by the expression Person("Mme Tortue", 123) without using new. The constructor parameters are public vals by default, and the arguments determine case class equality and allow pattern matching:

p match {
  case Person("Mme Tortue", age) => "Hello, you "+age+" year old turtle!"
  case Person("Achilles", _) => "I used to be an adventurer like you..."
}

In addition to classes, Scala also has singleton objects declared by the object keyword. A class can have a companion object with the same name, where static members associated with the class can be defined.

object Person {
  def birthday(p: Person) = Person(p.name, p.age+1)
}

A trait is like an interface but with optional default implementations. A class can inherit from a single parent class and several traits using the extends and with keywords. For example a Cat class can inherit from an Animal parent class and mix in behavior from the Purrer and Hunter traits as follows:

class Animal { def eat() = ... }
trait Purrer { def purr() = ... }
trait Hunter { def hunt(prey: Animal) = ... }
class Cat extends Animal with Purrer with Hunter

The type Animal with Purrer with Hunter is called a compound type.

Algebraic data types are implemented in Scala using traits and case classes. For example a Tree data type consisting of a sum of types Node and Leaf where Node is a product of two Trees can be implemented as:

trait Tree
case class Node(left: Tree, right: Tree) extends Tree
case class Leaf() extends Tree


Type parameters and variance Scala classes can be parameterized by adding a type parameter in square brackets:

case class Box[T](x: T)

By default the class type is invariant in the type parameter, so for classes A and B where B is a subtype of A there is no subtyping relation between Box[B] and Box[A]. By adding a + (plus) modifier a class is declared covariant in the type parameter. Thus, implementing the Box class as

case class Box[+T](x: T)

makes the type Box[B] a subtype of Box[A]. Similarly, a class is made contravariant by adding a - (minus).

Generic functions and Bounded Quantification It is also possible to parameterize functions using the same square bracket notation:

def createBox[T](x: T) = new Box[T](x)

Bounded quantification is achieved by specifying an upper or lower bound using the <: and >: operators respectively, for example:

def putCatInBox[T <: Cat](a: T) = { a.purr(); new Box(a) }

Here, the type parameter T ranges only over subtypes of Cat so that it is safe to call purr() on the argument a. Note that the last expression in a block is the return value by default.

Implicit arguments A function can take an extra argument list with implicit arguments. These arguments can be omitted by the caller and are inserted automatically by the Scala compiler.

def goForAHunt(cat: Cat)(implicit prey: Rat) = cat.hunt(prey)

A value is eligible for being inserted as an implicit argument if it is declared with the implicit keyword:

implicit val rat = new Rat()
goForAHunt(cat) // rat inserted automatically

The implicit resolution process looks for such implicit arguments in the current scope and in the companion object of the Rat class. If no valid implicit argument is found or if several valid implicits are found it is a compile-time error.


Implicit conversions Scala also allows values to be implicitly converted to other values by declaring implicit conversion functions, for example turning a Person into a Cat:

implicit def metamorphosis(x: Person): Cat = ...

If such an implicit conversion is in scope, it is possible to call methods from the Cat class on an instance of a Person and the compiler will automatically insert a conversion from Person to Cat before calling the method:

val p = Person("Mme Tortue", 123)
p.purr() // compiled to metamorphosis(p).purr()

Type classes Implicits can be used to codify type classes in Scala. Consider the type class Adder[T] that defines a binary add operation taking two instances of T and returning their sum of type T:

abstract class Adder[T] {
  def add(a: T, b: T): T
}

A function that sums a list of elements implementing this type class can be defined as

def sumListOfAdders[T](l: List[T])(implicit adder: Adder[T]): T = {
  l.reduce( (x, y) => adder.add(x, y) )
}

where => denotes a lambda function. Any type can be made a member of the Adder type class by providing a suitable implementation of the Adder class. For integers it might be implemented as:

implicit object IntAdder extends Adder[Int] {
  def add(a: Int, b: Int): Int = a + b
}

If the sum function is applied to a list of integers, implicit resolution will look in the current scope, in the companion object of the Adder class and in the companion object of the Int class for an implementation of Adder[Int]. Given the IntAdder implementation the implicit resolution succeeds and the list can be summed.
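Putting the pieces together, the resolution can be exercised as follows (a self-contained sketch restating the definitions above):

```scala
abstract class Adder[T] { def add(a: T, b: T): T }

implicit object IntAdder extends Adder[Int] {
  def add(a: Int, b: Int): Int = a + b
}

def sumListOfAdders[T](l: List[T])(implicit adder: Adder[T]): T =
  l.reduce((x, y) => adder.add(x, y))

// IntAdder is found by implicit resolution:
assert(sumListOfAdders(List(1, 2, 3)) == 6)

// sumListOfAdders(List("a", "b")) would be a compile-time error,
// since no implicit Adder[String] is in scope.
```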

Def Macros Def macros are methods that are expanded into abstract syntax trees (ASTs) inlined at the call site during compilation. Macros are divided into whitebox and blackbox. The main difference is that whitebox macros allow the type of the expanded expression to be more specific than the declared return type whereas blackbox macros do not. This allows whitebox macros to interact with the typer in interesting ways. It is for example possible to implement a record as a hash map from String labels to Any values and then let field selection be implemented as a whitebox macro that refines the return type to the specific type of the accessed field. See the scala-records and Compossible libraries described in Chapter 4 for examples of this technique.

Implicit materializer macros Macros can also be used to instantiate implicits during implicit resolution. One application is to materialize type class implementations depending on the type parameter. For example, Adder implementations can be materialized by defining an implicit macro returning an Adder[T]:

implicit def materializeAdder[T]: Adder[T] = macro materializeAdder_impl[T]

This way it is not necessary to provide a separate implementation of Adder for every possible T, but the macro can inspect the type parameter T and provide a suitable implementation at compile-time as needed.

Dotty Dotty [20] is an experimental compiler for Scala, implementing new language concepts and features that will eventually replace Scala 2. For this thesis, important changes include the introduction of singleton types, intersection types, and a new compilation scheme for structural refinement types.

Singleton types are types with only one inhabitant. For example the string literal "foo" is a member of the singleton type String("foo") that is a subtype of String but with only this single member. Scala currently assigns such types to literal constants during typing and they can be assigned to values returned by whitebox macros, but it is not possible to express these types explicitly in program text. In Dotty it is possible to express these types by simply writing the literal constant in the type namespace, for example Box["onlythisstring"] [21, 20].

As the name suggests, the intersection of two types A and B is a type whose members are restricted to those included in both A and B. In Dotty, type intersection is expressed using the & operator and replaces the compound with statement in current Scala. Type intersection is commutative and recursive in covariant type members so that for example List[A] & List[B] is equivalent to List[B] & List[A] and List[A & B] [20].

Scala's current structural refinement types are described in Section 4.1 and the new compilation scheme for Dotty is described in Section 4.6.


Chapter 3

Method

The main research question guiding this thesis is the following:

What are the possible approaches to record types in Scala and what are their respectivestrengths and weaknesses?

Here, "possible approaches" include both existing and novel implementations, and in order to answer this question the thesis is divided into three main parts:

Chapter 4, Description of Existing Approaches: A detailed description of existing approaches to records in Scala covering their implementation, syntax and supported features.

Chapter 5, Comparison of Existing Approaches: A structured qualitative comparison of the existing approaches, as well as a quantitative comparison of their runtime and compile-time performance using a novel benchmarking suite.

Chapter 6, Analysis and Possible new Approaches: An analysis of the determined strengths and weaknesses of existing approaches, followed by an evaluation of possible new approaches addressing as many of these weaknesses as possible.

The existing approaches to records in Scala covered by the first and second part are

• scala-records 0.3 [22]

• scala-records 0.4 [4]

• Compossible 0.2 [23]

• Shapeless 2.3.2 [24], as well as

• Scala’s built in anonymous classes with structural refinement types, and

• Records using Dotty’s new structural refinement types and Selectables [25].

The contents of the qualitative comparison are described in Section 3.1, and the method used for the quantitative comparison is described in Section 3.2 below.

The focus of the last part, "Analysis and Possible new Approaches", is primarily on how different forms of record subtyping and polymorphism interact with possible underlying data structures and how this in turn affects runtime performance. New approaches are suggested and evaluated using the same benchmarking suite and methodology as used for existing approaches.


3.1 Qualitative Comparison

The qualitative comparison will summarize the results from the description in Chapter 4 by looking at the following aspects:

• Access syntax. What is the syntax for field access?

• Equality semantics. Is equality by reference or by value?

• Type safety. Is field access typed, and is it a compile-time error to access nonexistent fields?

• Subtyping. Are field permutation, width subtyping and/or depth subtyping supported?

• Explicit types. Can record types be expressed explicitly in program text?

• Parametric polymorphism. Is bounded quantification supported, or some other form of parametric polymorphism?

• Extension, restriction, update, relabeling. Are some of these operations supported for monomorphic or polymorphic record types?

• IDE support. What is the library support in Eclipse and IntelliJ respectively?

• Other. What other features of interest do the libraries provide?

The answers to these questions are provided by documentation, source code inspection and by REPL session examples in the descriptions. In Section 5.1 the results are compiled into a structured feature matrix.

3.2 Quantitative Comparison

A novel benchmarking library called Wreckage [26] was built to be able to measure the runtime and compile-time performance of various approaches to records on the JVM. The Wreckage library is built on top of the Java Microbenchmark Harness (JMH) and is capable of generating, building and running benchmarking code written in Scala, Dotty, Java and Whiteoak¹. The Wreckage library is publicly available at https://github.com/obkson/wreckage.

3.2.1 Wreckage Benchmarking Suite Generator Library

Benchmarking code running on the JVM is a non-trivial task. The result of a benchmark does not only depend on system factors such as the virtual machine it is run on, the garbage collection algorithm in use and the heap size, but is also subject to nondeterministic just-in-time compilation, lazy class loading and optimization strategies such as dead-code elimination and loop unrolling [27, 28].

To overcome at least some of these difficulties the Wreckage library was built on top of the Java Microbenchmark Harness (JMH) developed by Oracle [29]. JMH is a widely used framework for benchmarking on the JVM [30] that makes it possible to prevent dead code optimization, garbage collection and other disturbing events from happening during a benchmark.

¹ Whiteoak [6] is a Java extension that brings structural typing to the Java language, discussed in Sections 6.4.4.3 and 8.2.

The JMH documentation recommends putting the benchmarking source files in a standalone maven project that imports the code to be benchmarked as a library dependency. Then a custom build process using JMH bytecode generators builds and packages this project into an executable JAR-file containing everything needed to run the benchmarks. The Wreckage library respects this recommendation and is built as a source code generator capable of generating JMH benchmarking projects for records implemented in Scala, Dotty, Java and Whiteoak.

To introduce some hopefully clarifying terminology, Wreckage can be described as a Benchmarking Suite Generator Library. For each record implementation that should be benchmarked, a new Scala project is created and the Wreckage library is imported. A Benchmarking Suite Generator is then created by subclassing the appropriate abstract JMHProjectBuilder class depending on the language used (ScalaJMHProjectBuilder, DottyJMHProjectBuilder etc.) and implementing the missing methods needed to complete the implementation. The subclass should provide the following missing pieces:

• A maven artifact identifier for the records library that should be benchmarked, alternatively a path to an unpublished JAR-file on the local file system.

• An implementation of a special RecordSyntax class, providing methods that describe this record library’s particular syntax for record creation, field access, extension, explicit type signatures etc.

• A list of benchmarks to include in the generated JMH benchmarking suite.

This generator is then compiled and run to generate a JMH Benchmarking Suite in the form of a standalone maven project containing a source file for each Benchmark. From this point on, the project is built and packaged exactly as any other JMH project using JMH’s custom build process to produce a standalone JAR that can be run to take the measurements. The architecture and benchmark generation process is illustrated in Fig. 3.1.

The Wreckage library comes with prepared templates for a number of different benchmarks that do not depend on any particular record syntax. The templates contain all boilerplate needed in a source file to set up and run a benchmark with JMH, as well as the methods that are to be benchmarked but with placeholders for all record operations. The JMHProjectBuilder class contains a main method that takes the provided information and injects it into the templates to create the final source files.

An alternative solution to source code generation, used by for example the scala-records-benchmarks suite [31], is to use macros to expand the benchmarks into the correct abstract syntax trees for different record libraries during compilation. The main reason for choosing the slightly less elegant approach of generating source files that have to be compiled in a separate compilation step is to make the benchmarks portable to JVM languages other than Scala, such as Java, Whiteoak and Dotty. Furthermore, generated source files have the significant benefit of being easy to inspect and validate compared to macro expansions.


Figure 3.1: The Wreckage Benchmarking Library architecture and benchmark generation process.


3.2.2 Runtime Benchmarks

The runtime benchmarks are micro benchmarks that measure the time it takes to execute a single record operation such as record creation or field access in isolation. As one such operation typically takes less time than what is possible to measure accurately using the system clock, the execution time has to be measured as an average over multiple invocations. JMH achieves this by calling the method as many times as possible during a specified time bound, and then the total time² is divided by the invocation count [29]. One such sequence of invocations is called an iteration and accounts for one measurement. The benefit of this approach compared to running some predefined number of invocations is that the total run time of the benchmarks becomes predictable and independent of the execution time of the benchmarked function (a really slow function is simply called fewer times).
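The time-bound averaging scheme can be illustrated with a simplified sketch (not JMH's actual implementation; the function name is made up for illustration):

```scala
// Invoke `op` repeatedly until the time bound has elapsed, then report
// total elapsed time divided by the invocation count. The final
// invocation may run slightly past the bound, as noted in the footnote.
def avgNanosPerInvocation(op: () => Any, boundNanos: Long): Double = {
  val start = System.nanoTime()
  var count = 0L
  var now = start
  while (now - start < boundNanos) {
    op()
    count += 1
    now = System.nanoTime()
  }
  (now - start).toDouble / count
}

// Average the cost of a cheap operation over a 10 ms time bound:
val avg = avgNanosPerInvocation(() => math.sqrt(42.0), 10000000L)
assert(avg > 0.0)
```

A slow operation is simply invoked fewer times within the bound, keeping the total measurement time constant.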

The Wreckage benchmarking library is currently capable of generating the following runtime benchmarks:

Creation Time against Record Size The time it takes to create a record is measured as a function of the size of the created record. The record is created in a single expression using field labels f1, f2,... up to the size of the record, and storing integer values 1,2,....

The use of integer values will incur boxing and unboxing operations for some libraries, which may affect the run time. On the other hand, numeric values are assumed to be a common payload in the kind of large scale scientific computations where run time matters the most and so it seems reasonable to use such values in the benchmarks.

Access Time against Field Index Access time is measured as a function of the index of the accessed field. A record with 32 fields f1, f2, ..., f32 is created and used during all measurements, and then the execution time is measured for accessing field f1 up to f32. For ordered records the index will correspond with the field’s position in the record type, whereas for unordered records the index merely identifies the field’s name.

Access Time against Record Size In the previous benchmark the record size was constant and the accessed field was varied. In this benchmark, a record of increasing size is created and for each record size the access time is measured for the field with the highest index.

Access Time against Degree of Polymorphism For records that support subtyping, the degree of polymorphism at a method call site is defined as the total number of different runtime record types that are represented among the receivers of the call. The general benchmarking technique is described by Dubochet and Odersky [32], and is here implemented as follows: An array of 32 records with different record types is created, but where all records have size 32 and a field named g1. The type of the array is declared as Array[{g1: Int}]. For each record type the other 31 fields are a set of n fields f1, f2, ..., fn, and m = 31 − n fields h1, h2, ..., hm, and each record in the array has a different n from 0 to 31. For records with ordered fields, the fields are stored in sorted order as f1, f2, ..., fn, g1, h1, h2, ..., hm.

² which may be slightly more than the specified time bound to let the last invocation finish


To make a measurement of field access time at a call site with polymorphism degree d, the benchmark cycles over the first d records in this array during the measurement time bound and in each invocation the field g1 is accessed on a record with a different type from the preceding invocation. Due to this cycling, each measurement also includes a constant overhead of an index increment modulo d and a record array access in addition to the actual record field access.
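The cycling scheme can be sketched as follows (a simplified illustration with two receiver types instead of 32, using a Vector of a structural type in place of the array; class names are made up):

```scala
import scala.language.reflectiveCalls

// Two record-like classes sharing a g1 field but with distinct runtime types.
case class RecA(g1: Int, f1: Int)
case class RecB(g1: Int, h1: Int)

val receivers: Vector[{ def g1: Int }] = Vector(RecA(1, 0), RecB(2, 0))

// Cycle over the first d receivers; each invocation accesses g1 on a
// receiver whose runtime type differs from the previous invocation's.
// The modulo increment and container access are the constant overhead
// included in each measurement.
def cycleAccess(d: Int, invocations: Int): Int = {
  var i = 0; var n = 0; var sum = 0
  while (n < invocations) {
    sum += receivers(i).g1
    i = (i + 1) % d
    n += 1
  }
  sum
}

assert(cycleAccess(2, 4) == 6) // g1 read twice on each of RecA and RecB
```

With d = 1 the call site is monomorphic; increasing d forces the JVM to dispatch the structural access over more receiver classes, which is exactly the effect the benchmark measures.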

3.2.3 Compile-Time Benchmarks

The Scala compiler is written in Scala and available as the Global class in the scala.tools.nsc package. The compile-time benchmarks instantiate this Global compiler in an initial setup phase and store it in a global benchmark state. Before each iteration a new Global.Run is instantiated and then the benchmark measures the execution time of running Run.compileSources on a prepared code snippet. Using this approach the compilation can be benchmarked by JMH as any other method or function, and there is no overhead of setting up the compiler in the measured compile times.

The Wreckage benchmarking library is currently capable of generating the following compile-time benchmarks:

Create The compile time is measured for a code snippet that creates a class containing a record:

class C {
  val r = {f1=1,f2=2,...}
}

Compile time is measured as a function of record size, and a linear factor is expected as the length of the snippet also increases with record size.

Create and Access All Fields This benchmark extends the previous one with a field access operation for every field in the record:

class C {
  val r = {f1=1,f2=2,...}
  val f1 = r.f1
  val f2 = r.f2
  ...
}

This is the same benchmark as is used by scala-records-benchmarks [31] to measure compile time, except that record creation is included in the snippet.

3.2.4 Statistical Treatment

The raw JMH measurement data was treated in a post processing step to calculate mean execution times with confidence intervals. The runtime and compile-time cases are described separately below.


3.2.4.1 Runtime Benchmarks

The runtime benchmarks measure steady state performance. This means that the first iterations are discarded as warm up runs, allowing the JIT-compiled code to stabilize before making measurements. The following approach suggested by Georges et al. [27] is used to measure average steady state running time:

The total average steady state running time x̄ is calculated from n independent samples from separate JVM processes, called VM forks in the following. Each such sample is taken as follows: In VM fork i, a series of measurements x_{i,1}, x_{i,2}, ... are done. Starting from the kth measurement, the coefficient of variation (CoV) is calculated on a sliding window of the k previous measurements, defined as the standard deviation divided by the mean. When the CoV for such a window reaches below a threshold of 0.02, steady state is assumed and the mean of these k measurements is taken as the steady state running time for this trial. That is, if steady state is detected for measurements x_{i,j−k+1}, ..., x_{i,j} the mean x̄_i is calculated as

x̄_i = (1/k) ∑_{l=j−k+1}^{j} x_{i,l}

These k measurements are not statistically independent as they are run on the same JVM and are chosen based on their CoV. To get independent measurements the above process is instead repeated n times in separate VM forks, generating samples x̄_1, x̄_2, ..., x̄_n. The overall average steady state running time is taken as the mean over these samples:

x̄ = (1/n) ∑_{i=1}^{n} x̄_i.

The standard deviation s is then calculated as usual as

s = √( (1/(n−1)) ∑_{i=1}^{n} (x̄_i − x̄)² ).

Modeling these n measurements x̄_1, x̄_2, ..., x̄_n as independent samples from the same distribution with mean µ, the transformed variable

t = (x̄ − µ) / (s/√n)

can be assumed to follow the Student’s t-distribution with n−1 degrees of freedom. Confidence intervals for a confidence level of 99.9 % may then be computed around x̄ as

(x̄ − t_{0.9995,n−1} · s/√n, x̄ + t_{0.9995,n−1} · s/√n).

Here, t_{0.9995,n−1} is defined so that for a random variable T following the Student’s t-distribution with n−1 degrees of freedom it holds that the probability

Pr[T ≤ t_{0.9995,n−1}] = 0.9995.

In all experiments n = 10 VM forks were used.


The above scheme for dynamically detecting when steady state has occurred based on the measurement data is somewhat at odds with how JMH is designed, since JMH only allows a fixed number of warm up runs to be specified and a fixed number of measurements to be taken after that. Instead of using JMH’s built in warmup feature, a sequence of 20 raw measurements was taken using JMH without any warmup, and then the above algorithm was run in a post-processing step on the raw data with k = 10. If steady state was not reached by the end of the sequence, the mean over the last 10 measurements was taken anyway.
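The post-processing step can be sketched as follows (a hypothetical helper, not the actual Wreckage code):

```scala
// Slide a window of size k over the measurements; once the coefficient
// of variation (stddev / mean) of a window drops below the threshold,
// steady state is assumed and that window's mean is returned. If steady
// state is never reached, fall back to the mean of the last k values.
def steadyStateMean(xs: Seq[Double], k: Int = 10, threshold: Double = 0.02): Double = {
  def mean(w: Seq[Double]): Double = w.sum / w.size
  def cov(w: Seq[Double]): Double = {
    val m = mean(w)
    math.sqrt(w.map(x => (x - m) * (x - m)).sum / (w.size - 1)) / m
  }
  xs.sliding(k).find(w => cov(w) < threshold).map(mean)
    .getOrElse(mean(xs.takeRight(k)))
}

// Noisy warm-up iterations followed by stable measurements:
val series = Vector(9.0, 5.0, 12.0, 4.0, 10.0, 10.0, 10.0, 10.0, 10.0)
assert(steadyStateMean(series, k = 4) == 10.0)
```

The noisy warm-up windows all exceed the CoV threshold, so the first all-stable window is the one averaged.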

3.2.4.2 Compile-Time Benchmarks

For compile-time benchmarks the steady state performance is less relevant, since compilation is typically a one-time job. For these benchmarks JMH’s Single Shot mode is used, which measures the execution time of a single invocation, without any preceding warm up runs. The measured compile times are greater than the granularity of the system clock with good margin (seconds contra microseconds) and so there is no need to take the average across several invocations to get a single measurement. Several independent Single Shot trials are instead made in separate VM forks, allowing mean compile time with confidence intervals to be calculated in the same fashion as described for runtime benchmarks.


Chapter 4

Description of Existing Approaches

This chapter provides an overview of the current major implementations of records for Scala: scala-records [22, 4], Compossible [23] and Shapeless records based on HLists [24], as well as an implementation of records using Dotty’s new structural refinement types and Selectable trait [25]. For each approach, the basic implementation strategy is described as well as the support and syntax of common record features.

The overview includes both scala-records v0.3 and v0.4, although the latter is essentially an improvement over the former in every respect. There are two reasons for this: First, the documentation [4] of scala-records v0.4 mentions weaknesses and problems that are actually fixed in version 0.4, but do apply to version 0.3. Thus, by including version 0.3 here, the background of the claims in the official documentation can be better understood. Second, the weaknesses are fixed in v0.4 by making significant changes in how the record types are represented in Scala's type system. Thus, it is of interest for a possible new approach to investigate and compare these two different representations.

Before describing any of these libraries, though, the possibilities and limitations of Scala's native support for anonymous objects and structural refinement types are investigated.

4.1 Scala’s Structural Refinement Types

Scala has had native support for structural typing since version 2.6 [33], making it possible to cast any conforming class instance to a structural type and then call methods declared on that structural type. For example, the following is valid Scala code:

class Turtle { def run() = println("Slowly crawling along the race track...") }
class Achilles { def run() = println("Pushing it to the limit!") }

type Runner = { def run(): Unit }

def race(a: Runner, b: Runner) = {
  a.run()
  b.run()
}

race(new Turtle(), new Achilles())

Here, Runner is a structural type declaring a run method. By width and depth subtyping, both Turtle and Achilles are considered structural subtypes of Runner and can thus both participate in the race, without declaring this subtyping relation nominally.

By combining Scala's anonymous classes and refinement types with structural typing, much of the functionality normally associated with records can actually be achieved without any library support at all.1

4.1.1 Basic Features

The following is a quick overview of the various record-like features supported by structurally typed anonymous classes.

Create An instance of an anonymous class can be created with record fields in the form of val definitions:

scala> val r = new {val name="Mme Tortue"; val age=314}
r: AnyRef{val name: String; val age: Int} = $anon$1@403c3a01

The result type shown in the REPL is evidently a structural refinement of AnyRef.

Access Fields are accessed as usual with dot-notation:2

scala> val n = r.name
n: String = Mme Tortue

Equality Equality is by reference, which may or may not be what we want:

val r = new {val name="Mme Tortue"; val age=314}
val s = new {val name="Mme Tortue"; val age=314}

scala> r == s
res7: Boolean = false

Type safety We get all the usual type safety guarantees as for normal class instances. Field access is type checked:

scala> val n: Int = r.name
<console>:12: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

1 In Sections 4.1.2 and 5.2.1.4 we will see why this thesis does not stop here, however; the reflective calls used to realize Scala's structural typing come with a non-negligible performance cost on the JVM.

2 The code samples issue a feature warning unless the compiler option -language:reflectiveCalls is set or scala.language.reflectiveCalls is imported.


scala> val a = r.address
<console>:12: error: value address is not a member of AnyRef{val name: String; val age: Int}

Subtyping As noted above, any class can be cast to a structural type. Thus, the following up-cast works as expected:

scala> val r: {val name: String} = new {val name="Mme Tortue"; val age=123}
r: AnyRef{val name: String} = $anon$1@544d57e

as well as for function arguments

scala> def getName(x: {val name: String}) = x.name
getName: (x: AnyRef{val name: String})String

scala> getName(r)
res16: String = Mme Tortue

Bounded Quantification It is also possible to achieve parametric polymorphism with bounded quantification, exemplified by the oldest function:

def oldest[R <: {val age: Int}](a: R, b: R): R = if (a.age >= b.age) a else b

val t = new {val name="Mme Tortue"; val age=123}
val a = new {val name="Achilles"; val age=24}

scala> oldest(a,t).name
res19: String = Mme Tortue

Least Upper Bounds The example of bounded quantification above used two records with identical fields. By casting the argument records to their least upper bound (LUB), two heterogeneous records can be passed to the function as well, and the return type will preserve as much information about the records as possible:

val t = new {val name="Mme Tortue"; val age=123; val address="Zenos road 42, Elea"}
val a = new {val name="Achilles"; val age=24}

scala> oldest(a,t)
res22: AnyRef{val name: String; val age: Int} = $anon$1@4ebea12c

However, and perhaps surprisingly, this only works as long as one of the argument records is a direct supertype of the other. For example, the following does not work:


val t = new {val name="Mme Tortue"; val age=123; val address="Zenos road 42, Elea"}
val a = new {val name="Achilles"; val age=24; val height=1.88}

scala> oldest(a,t)
<console>:15: error: inferred type arguments [Object] do not conform to method oldest's type parameter bounds [R <: AnyRef{val age: Int}]

Here, the LUB is inferred to be Object rather than {val name: String; val age: Int}, and the typer needs a little nudge in the right direction to see that a more precise bound is possible:

scala> oldest(a: {val name: String; val age: Int},t)
res28: AnyRef{val name: String; val age: Int} = $anon$1@32057e6

4.1.2 Implementation

Since the JVM does not support structural typing natively, Scala realizes this feature using reflection and polymorphic inline caches [32]. To be able to pass any conforming object to a structural reference, the type of such references is erased to Object during compilation. When a method is called on the object, Scala's type system knows that the runtime class implements the method and that it can be safely called, but to convince the JVM of this fact a reflective call is needed.

A method call a.f(b, c), where a is of a structural type and b and c are of types B and C respectively, is thus mapped to:

a.getClass.getMethod("f", Array(classOf[B], classOf[C])).invoke(a, Array(b, c))

In [32] it is noted that such a reflective call is about 7 times slower than a regular call and that most of the time is spent in the getMethod lookup. Thus, to improve runtime performance, a strategy using polymorphic inline caches is employed. The method handle is cached at each call site using the receiver's class as key. The getMethod call is replaced by a cache lookup, and reflection need only be performed the first time a method is called on a certain class.

The cache is implemented as a linked list, and so the lookup time grows linearly with the degree of polymorphism at the call site. For monomorphic and moderately polymorphic call sites, however, the caching mechanism is found to be satisfactory and a good alternative to the generative technique used by Whiteoak v.1 [32, 6].
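The caching idea can be modelled in plain Scala as below. This is a minimal sketch of one call site for a no-argument method, not the compiler-generated code (which uses a linked list per call site and differs in detail); the CallSiteCache name is invented for the example.

```scala
import java.lang.reflect.Method
import scala.collection.mutable

// Models one call site: the reflective Method handle is cached per
// receiver class, so getMethod runs only on the first call for each
// class; subsequent calls hit the cache.
class CallSiteCache(methodName: String) {
  private val cache = mutable.Map.empty[Class[_], Method]

  def invoke(receiver: AnyRef): AnyRef = {
    val m = cache.getOrElseUpdate(
      receiver.getClass,
      receiver.getClass.getMethod(methodName)) // reflective lookup, once per class
    m.invoke(receiver)
  }
}
```

With a monomorphic call site the cache holds a single entry, matching the fast path described above; a megamorphic site would accumulate one entry per receiver class.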

4.2 scala-records v0.3

The scala-records library uses structural refinement types on the type level and hash maps on the value level. Whitebox macros are used to translate field access to direct hash map lookups, instead of the reflective calls the Scala compiler would normally use for the refinement types.

The essence of the approach is to translate record creation like

val r = Rec(name="Mme Tortue", age=123)

to a structural refinement of the trait Rec, adding name and age getter methods as well as a data container _data in the form of a HashMap:3

val r = new Rec {
  private val _data = HashMap[String,Any]("name"->"Mme Tortue", "age"->123)
  def name: String = macro selectField_impl
  def age: Int = macro selectField_impl
  // [other methods: toString, hashCode, dataExists etc...]
}

Since the name and age methods are implemented using macros, field access will not be compiled to reflective calls. Instead, a field access such as

val n = r.name

is expanded by the selectField_impl macro to

val n = r._data("name").asInstanceOf[String]

Thus, reflection is avoided and we get the same runtime performance as for a HashMap.4

Supported Scala versions are 2.10.x and 2.11.x.

4.2.1 Basic Features

Create Records are created either by using named arguments

scala> val r = Rec(name="Mme Tortue", age=123)
r: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

or using arrow associations:

scala> val r = Rec("name"->"Mme Tortue", "age"->123)
r: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

The resulting type of r is a refinement of Rec, as revealed by the REPL results above.

3 This code is simplified and "re-sugared" for clarity. The compiler flag -Ymacro-debug-lite was used to inspect the real macro-expansion.

4 Modulo a slight overhead from burying the hash lookup inside a series of interface calls in the actual implementation, see Section 5.2.


Access A record’s fields are accessed the same way as regular class fields using dot-notation:

scala> val n = r.name
n: String = Mme Tortue

Equality Equality is by value:

scala> Rec(name="Mme Tortue") == Rec(name="Mme Tortue")
res5: Boolean = true

Pattern Matching Pattern matching can be used to extract fields from a record (only in Scala 2.11.x):

scala> val n = r match { case Rec(name) => name }
n: String = Mme Tortue

Type-safety Representing the record as a structural refinement automatically gives basic type-safety; return types are checked:

scala> val n: Int = r.name
<console>:15: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

scala> val a = r.address
<console>:15: error: value address is not a member of records.Rec{def name: String; def age: Int}

4.2.2 Lack of Explicit Types

Unfortunately, by implementing the fields as def-macros, it is not possible to express record types explicitly. The following compiles happily in both Scala 2.10.x and 2.11.x:

scala> val s: Rec{def name: String; def age: Int} = r
s: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

but blows up at runtime when a field is accessed on the structurally typed reference s:

scala> s.name


warning: there was one feature warning; re-run with -feature for details
java.lang.NoSuchMethodException: $anon$1.name()

The feature warning reveals what goes on: after the assignment to s, the compiler no longer translates field access to macro expansion, but rather calls the name method using reflection (hence the feature warning). Since def-macros do not generate any actual code at the declaration site, the JVM is right - there really is no such method declared on the class of s. The issue is known as SI-7340 and is currently open (for all supported versions of Scala 2.10 and 2.11, that is, up to Scala 2.10.6 and 2.11.11).

Effects on Subtyping The lack of ability to express types explicitly puts rather severe limitations on how the library can be used. Subtyping expressions and up-casts such as the following will not work, since the parent type cannot be expressed:

scala> val s: Rec{def name: String} = r

This also affects the possibility to define functions with record parameters:

def getName(x: Rec{def name: String}) = x.name

scala> getName(r)
java.lang.NoSuchMethodException: $anon$1.name()

As long as the record types are inferred rather than stated explicitly, however, subtyping works as expected. For example, the following works:

scala> var s = Rec(name="Achilles")
s: records.Rec{def name: String} = Rec { name = Achilles }

scala> s = r
s: records.Rec{def name: String} = Rec { name = Mme Tortue, age = 123 }

scala> s.name
res5: String = Mme Tortue

Thus, the subtyping itself works - we are just not allowed to express it explicitly in client code.

Effects on Bounded Quantification There is no documented support for parametric polymorphism, and the following basic application of Scala generics also breaks due to SI-7340:

def getName[R <: Rec{def name: String}](x: R) = x.name

scala> getName(r)
java.lang.NoSuchMethodException: $anon$1.name()


4.2.3 Other Features

There is no documented, or otherwise known to the author, support for extension, restriction, updating or renaming of fields. There are, however, other features worth mentioning.

Case class conversion A record can be converted to a case class instance (explicitly as well as implicitly, if records.RecordConversions is imported), provided that the case class is a structural supertype:

case class Tortoise(name: String, age: Int)

scala> val c = r.to[Tortoise]
c: Tortoise = Tortoise(Mme Tortue,123)

and if the fields do not match it is a compile-time error:

var s = Rec(name="Achilles")

scala> s.to[Tortoise]
<console>:18: error: Converting to Tortoise would require the source record to have the following additional fields: [age: Int].

The conversion is one-directional, however, and a record cannot automatically be created from a case class instance.

Backend Agnostic Another interesting feature of the scala-records library is that it is prepared so that a custom backend for storing and fetching the actual data can easily be provided. By extending the core classes of the library, the default _data hash map seen above may be overridden. An example use-case given in the documentation is to use scala-records as an interface for type-safe database queries.

IDE Support The Eclipse IDE has support for whitebox macros, and since the fields are declared as methods they are included in the autocompletion feature. IntelliJ, on the other hand, relies on static code analysis and does not support whitebox macros.

4.3 scala-records v0.4

In scala-records v0.4 the record type signature has changed, and the refinement type has moved inside a type parameter on the Rec trait. What was Rec{def name: String; def age: Int} in scala-records v0.3 is now Rec[{def name: String; def age: Int}]. Most importantly, this solves the SI-7340 problem so that types can now be written explicitly, opening up true structural subtyping capabilities. The new approach works as follows:5

Record creation

val r = Rec(name="Mme Tortue", age=123)

5Again, the description is somewhat simplified for increased conceptual clarity.


is still translated to a structural refinement of the Rec trait, but it looks a bit different:

val r = new Rec[{def name: String; def age: Int}] {
  private val _data = Map[String, Any]("name"->"Mme Tortue", "age"->123)
}: Rec[{def name: String; def age: Int}]

Note that the name and age methods are no longer present; the refinement merely injects the hash map holding the actual data. Instead, field selection is implemented through an implicit conversion macro declared on the companion object:

object Rec extends Dynamic {
  // [... record creation using dynamics etc...]
  implicit def fld[Fields](rec: Rec[Fields]): Fields =
    macro accessRecord_impl[Fields]
}

If a field is accessed like r.name, the implicit macro expands the record reference r to a new structural refinement by inspecting the type signature in the Fields type parameter. This new refinement has the original record r embedded as a private value, but otherwise looks more like the old scala-records v0.3 refinement, with name and age methods declared. Since this expansion creates an object that implements the name method, it is accepted as a valid conversion by the implicit resolution algorithm and we get:

val n = (new {
  private val __rec = r
  def name: String = macro selectField_impl[String]
  def age: Int = macro selectField_impl[Int]
}).name

This expression is then transformed, in a second macro expansion of selectField_impl, into the actual hash lookup:

val n = r._data("name").asInstanceOf[String]

Again, yielding HashMap performance for field access.

4.3.1 Basic Features

Records are created as before, and the new type signature is visible in the REPL:

scala> val r = Rec(name="Mme Tortue", age=123)
r: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

Otherwise, all the basic features from scala-records v0.3 are unchanged.


4.3.2 Explicit Types

Now, record types can be written explicitly without a problem:

scala> val s: Rec[{def name: String; def age: Int}] = r
s: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

scala> s.name
res2: String = Mme Tortue

The s reference is not of a structural refinement type anymore and does not even implement the name method. Thus, the implicit conversion takes over at field access, and it can be achieved without reflection as described above.

Subtyping With SI-7340 out of the way, we get access to full structural subtyping:

def getName(x: Rec[{def name:String}]) = x.name

scala> val n = getName(r)
n: String = Mme Tortue

Bounded Quantification There is no documented support for parametric polymorphism, but the following basic use of generics now works as expected:

def getName[R <: Rec[{def name: String}]](x: R) = x.name

scala> getName(r)
res5: String = Mme Tortue

This allows us to implement the oldest function from before:

def oldest[R <: Rec[{def age: Int}]](a: R, b: R) =
  if (a.age >= b.age) a else b

val a = Rec(name="Achilles", age=24)
val b = Rec(name="Mme Tortue", age=123)

scala> val o = oldest(a,b)
o: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

Note that the returned record type has the name-field intact!


Figure 4.1: Eclipse whitebox macro support is broken for scala-records v0.4

4.3.3 Other Features

The feature set is, as far as the author can tell, otherwise unchanged, with one exception: unfortunately, the Eclipse IDE6 whitebox support is no longer enough to support the field access, see Fig. 4.1. It appears as though the second macro expansion does not get the AST from the first expansion as input, but rather the AST from before the first expansion, and therefore fails. This does not seem to be a fundamental issue, however, and can be fixed by providing a custom macro expansion implementation for Eclipse. IntelliJ support is unchanged.

4.4 Compossible

Where scala-records used structural refinement types to represent record fields, Compossible instead uses Scala's notion of compound types. A field is represented as a tuple of its label and its type, and a label is in turn represented by its singleton string type. For example, the field age: Int is represented as the Scala type Tuple2[String("age"), Int], where String("age") is the singleton type with only one inhabitant: the string "age". Current versions of Scala do not openly support singleton types [21], but the typer by default assigns them to constant literals, so they can be read and created using whitebox macros. The collection of fields f1: T1, f2: T2, ..., fn: Tn is represented by the compound type (String("f1"), T1) with (String("f2"), T2) with ... with (String("fn"), Tn), using customary tuple notation. Similar to scala-records v0.4, Compossible then represents an actual record type as a common base class Record[+Fields] with the field compound in the covariant type parameter, and with the actual data in a hash map. Whitebox macros are used to translate record creation, access and other record operations to value-level operations on the hash map and type-level operations on the type parameter.

4.4.1 Creation through Extension through Concatenation

Compossible records are extensible, and records are also created by starting from a record with a single field and then adding fields one by one. A record with a name and age field

6Scala IDE build of Eclipse SDK, v4.5.0


is created by the following syntax:

val r = Record name "Mme Tortue" age 123

Here, the method name is first called on the Record class's companion object. This object extends Dynamic, and so the call is translated to an applyDynamic call, in turn implemented by a macro:

object Record extends Dynamic {
  def applyDynamic[K <: String](key: K)(value: Any): Record[(String, Any)] =
    macro createMacro[K]
  // [... other methods ...]
}

The createMacro then goes on to create a record with the name field stored in a HashMap, and we end up with:

val r = (new Record[(String("name"), String)](Map("name" -> "Mme Tortue"))) age 123

Now, the age field is added to the record by calling applyDynamic on the Record class:

class Record[+T <: (String,Any)](val values: Map[String, Any]) extends Dynamic {
  def applyDynamic[K <: String](key: K)(value: Any): Record[(String, Any)] =
    macro appendFieldMacro[K]
  // [... other methods, incl "def &", see below ...]
}

This appendFieldMacro is in turn implemented, not as record extension, but as the more general record concatenation, or merge, operation implemented by the & method. That is, a new record is created containing the single age field

val r = (new Record[(String("name"), String)](Map("name" -> "Mme Tortue"))) &
  (new Record[(String("age"), Int)](Map("age" -> 123)))

and then concatenated with the first record. The concatenation method is implemented on the Record class as

def &[O <: (String,Any)](other: Record[O]) =
  new Record[T with O](values ++ other.values)

and performs the merge without using any macro magic, relying only on Scala compound types and hash map merge.
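At the value level this relies on the right bias of Scala's immutable Map ++ operator: when both operands contain the same key, the entry from the right-hand map wins. A standalone illustration (plain Scala, not Compossible code):

```scala
// Right-biased map merge, the value-level mechanism behind `&`.
// For the duplicate key "age", the right operand's entry wins.
val nameAndAge = Map[String, Any]("name" -> "Mme Tortue", "age" -> 123)
val newAge     = Map[String, Any]("age" -> 124)
val merged     = nameAndAge ++ newAge  // age is overwritten, name kept
```

This right bias is also what makes the field update shown in the next section behave as it does at runtime.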


4.4.2 Extension and (Unchecked) Update

The mechanism used internally by the library above to create a record with multiple fields is also the mechanism used to extend an existing record with additional fields. The record r from above can be extended using the & method like so:

scala> val s = r & (Record phone "+4670123456" address "Zenos road 42")
s: Record[(String("name"), String) with (String("age"), Int) with (String("phone"), String) with (String("address"), String)]
  = Record(Map(name -> Mme Tortue, age -> 123, phone -> +4670123456, address -> Zenos road 42))

Due to the way the concatenation is implemented, this mechanism can also be used to update an already existing field.

scala> val s = r & (Record age (r.age+1))
s: Record[(String("name"), String) with (String("age"), Int) with (String("age"), Int)]
  = Record(Map(name -> Mme Tortue, age -> 124))

scala> s.age
res9: Int = 124

However, the somewhat strange type signature above reveals that this update is actually not type-safe, since Scala does not employ the required overwrite semantics for compound types. By changing the type of an already existing label while keeping the old type signature, an exception can be triggered:

scala> var r = Record age "very old"
r: Record[(String("age"), String)] = Record(Map(age -> very old))

scala> r = r & (Record age 123)
r: Record[(String("age"), String)] = Record(Map(age -> 123))

scala> r.age
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String

4.4.3 Access and Select

As already seen above, the syntax for field access is through dot-notation. Since the Record class extends Dynamic, an access like r.name is translated to a selectDynamic call. This is in turn implemented by a lookupMacro that inspects the type of the prefix record r and, if a field with the right label exists, translates the access into a hash map lookup like

r.values("name").asInstanceOf[String]

Thus, field access is type checked:


scala> val n: Int = r.name
<console>:12: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

scala> val a = r.address
<console>:14: error: Record has no key .address

Besides accessing a single field, it is also possible to select multiple fields at once, thus projecting a record to a new record with a subset of the fields of the original record. To achieve this, a class called select is provided that represents the projection. A select object is built in a fashion similar to how records are created, but consists of a compound of stand-alone labels rather than whole fields:

val r = Record name "Mme Tortue" age 123 phone "+4670123456" address "Zenos road 42"

scala> val s = (select name & phone)
s: select[String("name") with String("phone")] = select@61c3767e

scala> r(s)
res2: Record[(String("name"), String) with (String("phone"), String)]
  = Record(Map(name -> Mme Tortue, phone -> +4670123456))

As demonstrated above, a record is projected by calling apply on it with a select object. This allows a very compact syntax, where the above projection could be written inline as r(select name & phone).

4.4.4 Explicit Types

Since labels are represented by singleton string types, Compossible types cannot be written explicitly in the program text.7 Instead, a special class RecordType is provided as a means of generating record types. The type corresponding to the r record above may be generated by first creating an instance of RecordType and then getting a path-dependent type Type on this instance.

scala> val rt = (RecordType name[String] & age[Int] &)8
rt: RecordType[(String("name"), String) with (String("age"), Int)] = RecordType@4bbad28f

scala> type NameAndAge = rt.Type
defined type alias NameAndAge

7 This will be possible in future Scala versions, though; see [21].

8 Here, the & symbols are not concatenation methods as before, but actually just dummy arguments that must be provided to overcome a restriction in Scala's syntax disallowing type application for postfix operators.


scala> val r: NameAndAge = (Record name "Mme Tortue" age 123)
r: NameAndAge = Record(Map(name -> Mme Tortue, age -> 123))

The above construction works similarly to how a record and a select are created. By chaining applyDynamic calls implemented as macros, a RecordType instance is created with the compound field representation in its type parameter. The corresponding record type can then be accessed through the Type member, as it is defined to reflect the fields in the RecordType's type parameter:

class RecordType[T <: (String, Any)] extends Dynamic {
  type Type = Record[T]
  // [... other methods ...]
}

4.4.5 Polymorphism

Equipped with a way of expressing types, the subtyping and parametric polymorphism support can be investigated.

Subtyping The compound types satisfy permutation, width and depth subtyping relations, and using a RecordType instance we can, for example, define the getName function from before as:

val rt = (RecordType name[String] &)
def getName(r: rt.Type) = r.name

scala> getName(Record name "Mme Tortue" age 123)
res8: String = Mme Tortue

Bounded Quantification Parametric polymorphism with bounded quantification is not supported, however. The field access macro inspects the type parameter to determine if a field is present, but with bounded quantification the type of the accessed record is just an abstract parameter R, and the macro actually breaks down with an exception:

val rt = (RecordType age[Int] &)

scala> def oldest[R <: rt.Type](a: R, b: R): R = if (a.age >= b.age) a else b
<console>:14: error: exception during macro expansion:
java.util.NoSuchElementException: head of empty list
...
       def oldest[R <: rt.Type](a: R, b: R): R = if (a.age >= b.age) a else b
                                                     ^

This does not seem to be a fundamental issue with using compound types, however. Field access could presumably be implemented by an implicit materialization macro similar to the one scala-records v0.4 uses, or by detecting in the access macro that the record type is in fact a generic type parameter and, in that case, inspecting the parameter's declared upper type bound.

Least Upper Bounds The LUB inference problem of refinement types also applies to compound types. It works as long as one type is a direct supertype of the other:

val a = Record name "Achilles"
val t = Record name "Mme Tortue" age 123

scala> if (true) t else a
res3: Record[(String("name"), String)] = Record(Map(name -> Mme Tortue, age -> 123))

But if the LUB is some different type, things break down.

val a = Record name "Achilles" height 1.88
val t = Record name "Mme Tortue" age 123

scala> if (true) t else a
<console>:15: error: type arguments [Product with Serializable] do not conform to class Record's type parameter bounds [+T <: (String, Any)]

Again, the typer needs a little help from a friend:

val rt = (RecordType name[String] &)

scala> if (true) t else a: rt.Type
res5: rt.Type = Record(Map(name -> Mme Tortue, age -> 123))

4.4.6 Other Features

Equality Compossible does not implement any particular equality check, and equality is therefore by reference:

val r = Record name "Mme Tortue"
val s = Record name "Mme Tortue"

scala> r == s
res13: Boolean = false

Case class and Tuple conversion In contrast to scala-records, a Compossible record can be created from a case class but not converted to one. A record can, however, be converted to a tuple.

case class Person(name: String, age: Int)
val p = Person("Mme Tortue", 123)

scala> val r = Record.fromCaseClass(p)
r: Record[(String("name"), String) with (String("age"), Int)]
  = Record(Map(name -> Mme Tortue, age -> 123))


scala> val t = Record.tuple(r)
t: (String, Int) = (Mme Tortue,123)

IDE Support Due to the use of whitebox macros, the situation is the same as for scala-records: Eclipse can infer the types correctly whereas IntelliJ lacks support. In contrast to scala-records, Compossible lacks autocompletion in Eclipse, since field access is implemented through selectDynamic rather than field selection on a refinement.

4.5 Shapeless 2.3.2

Shapeless is an extensive library for generic programming in Scala with a broad array of use-cases.9 At the core of many of the library's features is a rich implementation of the heterogeneous list (HList) data type, and one of the many features built on top of this data structure is an implementation of extensible records. Covering the entire shapeless library is well outside the scope of this thesis, and the following overview focuses exclusively on the parts of the library involving these records.

4.5.1 HList Records

An HList is, as the name suggests, a linked list where each element may have a unique type. A minimal implementation can be realized in Scala using traits and case classes as follows:

trait HList
case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
case class HNil() extends HList

Each HCons element contains a value in the head field and a link to the rest of the list in the tail field. A simple instance with a string element of value "Mme Tortue" and an integer element of value 123 can be constructed as

HCons("Mme Tortue", HCons(123, HNil()))

with resulting type

HCons[String, HCons[Int, HNil]]
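The minimal implementation can be exercised directly. In the following self-contained sketch (the object name MinimalHListDemo is just for illustration), both elements keep their precise static types, so no casts are needed:

```scala
sealed trait HList
case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
case class HNil() extends HList

object MinimalHListDemo extends App {
  // The full element types are tracked in the HList type itself.
  val l: HCons[String, HCons[Int, HNil]] = HCons("Mme Tortue", HCons(123, HNil()))
  val name: String = l.head     // statically a String
  val age: Int = l.tail.head    // statically an Int
  println(s"$name is $age")
}
```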

By tagging each element's head type with a label, a record-like data type can be constructed. Shapeless provides a trait KeyTag[L, V] where L is a type-level representation of a field's label and V is the field value's type. Using string singleton types as labels, the field (age: Int) is represented by a KeyTag as:

9 See for example https://github.com/milessabin/shapeless/wiki/Built-with-shapeless for a list of projects that use shapeless in one way or another.


Int with KeyTag[String("age"), Int]

Here the first Int is the type of the field, and the tagging is accomplished by creating a compound type with the KeyTag. By linking such tagged values, a record can be created. For example, the record {name="Mme Tortue", age=123} is represented on the value level exactly as the HList above:

HCons("Mme Tortue", HCons(123, HNil()))

but with labeled type

HCons[String with KeyTag[String("name"), String],
  HCons[Int with KeyTag[String("age"), Int],
    HNil]]

Note however that the above implementation does not provide a way of actually creating the records so that they get the suggested typing, nor of accessing the fields by label. Shapeless fills in these missing pieces by clever use of type classes, implicit conversions and whitebox macros.

4.5.2 Create

A shapeless record can be created in several different ways. First off, the labels are not actually limited to strings only, but can be any type that has a singleton type representation, such as integers, symbols10 and objects. To keep this presentation simple, only the case of string labels will be treated, however. Given this choice of label type, one way of creating a record is by using Shapeless' arrow operator ->>.

scala> val r = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
r: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

Several things are worth noting here. First, Shapeless' version of the HCons class above is named ::, analogous to Scala's built-in (homogeneous) list constructor. Second, the record fields are linked using a constructor method with the same name :: that may be written as a right-associative infix operator due to Scala's convention for method names ending with colons. And lastly, the arrow operator is called on each string label with the field value as an argument, although Scala's Strings do not define this operator. It is instead defined by Shapeless using an implicit materializer macro that converts each string label to an instance of a class named SingletonOps that does implement the ->> method.

10 Symbols do not have singleton types in current Scala, but Shapeless provides a workaround where singleton symbol types are represented as wrapped String singleton types.

The return value of this method is the field value tagged by the appropriate KeyTag; in the case of the age field:

123.asInstanceOf[Int with KeyTag[String("age"), Int]]

The infix method :: then links the values together to create the final record.

Alternative record creation Another way of creating a record is to use the Dynamic trait's applyDynamicNamed method, exactly as in scala-records:

scala> val r = Record(name="Mme Tortue", age=123)
r: ::[String with KeyTag[tag.@@[Symbol,String("name")],String],
   ::[Int with KeyTag[tag.@@[Symbol,String("age")],Int], HNil]] = Mme Tortue :: 123 :: HNil

However, as can be seen in the resulting type signature, this instead creates a record using symbols as labels.

Equality Since the records are built from case classes, equality is automatically by value:

scala> ("name" ->> "Mme Tortue") :: HNil == ("name" ->> "Mme Tortue") :: HNil
res3: Boolean = true

4.5.3 Field Access

Field access is expressed as function application directly on the record with the label's string literal as key:

scala> val n = r("age")
n: Int = 123

or equivalently by calling get on the record:

scala> val n = r.get("age")
n: Int = 123

Again, the record does not implement these methods, but the :: element that r refers to is implicitly converted to an instance of a class named RecordOps, which does implement them:

class RecordOps[L <: HList](val l : L) {
  def get(k: Witness)(implicit selector: Selector[L, k.T]): selector.Out =
    selector(l)

  def apply(k: Witness)(implicit selector: Selector[L, k.T]): selector.Out =
    selector(l)

  // ... other
}


To be able to call these methods, however, the string literal "age" has to be converted to an instance of Witness, and implicit resolution has to find a suitable instance of Selector[L, k.T] for the record type L and witness parameter T, both described below.

The Witness This trait bridges the gap between label literals and their singleton type-level representation. The Witness trait is declared as

trait Witness { type T; val value: T {} }

and holds both the type-level representation of a field in the abstract type T and its value-level representation in the value field. Implicit materialization macros are used to create Witnesses from label literals. In the example above, the "age" literal is implicitly converted to an instance of

Witness { type T = String("age"); val value = "age" }

The Selector The Selector[L, K] trait implements a type class providing the method apply(l: L) that takes a record of type L and returns the value for the field with label K, cast to the right field type. When an implicit selector of type Selector[L, K] is needed, an implicit materialization macro instantiates it, provided that the label K is present in the record L. Otherwise an implicit-not-found error "No field $K in record $L" is generated.

In the field access example above, the following selector is created:

Selector[::[String with KeyTag[String("name"),String],
         ::[Int with KeyTag[String("age"), Int],
         HNil]],
         String("age")] {
  type Out = Int
  def apply(l: ::[String with KeyTag ... , HNil]): Int =
    HList.unsafeGet(l, 1)
}

where HList.unsafeGet(l, i) gets the element at index i from record l.

Putting it all together When r("age") is called, r is converted to a RecordOps that implements the apply method. The label "age" is converted to a Witness holding its type-level representation. Furthermore, an implicit selector of type Selector[::[String with KeyTag..., HNil], String("age")] is materialized by a macro. The selector has an apply method that takes an HList as argument and returns the value stored at index 1, cast to an Int. RecordOps(r).apply("age") calls this selector with r, and 123: Int is returned.
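The inductive core of this resolution scheme can be modeled without macros in plain Scala. The following simplified sketch is not shapeless itself: all names (Field, FieldBuilder, Getter, SelectorDemo) are hypothetical, phantom traits stand in for the singleton string types, and field selection recurses through implicit instances rather than computing an index:

```scala
sealed trait HList
final case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
case class HNil() extends HList

// A field is its value, tagged with a phantom label type K.
trait KeyTag[K, +V]

object SelectorDemo {
  type Field[K, V] = V with KeyTag[K, V]
  final class FieldBuilder[K] {
    def apply[V](v: V): Field[K, V] = v.asInstanceOf[Field[K, V]]
  }
  def field[K]: FieldBuilder[K] = new FieldBuilder[K]

  // The Selector type class: evidence that record L has a field labeled K.
  trait Selector[L <: HList, K] { type Out; def apply(l: L): Out }
  object Selector {
    type Aux[L <: HList, K, O] = Selector[L, K] { type Out = O }
    // Base case: the head of the list carries label K.
    implicit def atHead[K, V, T <: HList]: Aux[HCons[Field[K, V], T], K, V] =
      new Selector[HCons[Field[K, V], T], K] {
        type Out = V
        def apply(l: HCons[Field[K, V], T]): V = l.head
      }
    // Inductive case: keep searching in the tail.
    implicit def inTail[H, T <: HList, K](
        implicit s: Selector[T, K]): Aux[HCons[H, T], K, s.Out] =
      new Selector[HCons[H, T], K] {
        type Out = s.Out
        def apply(l: HCons[H, T]): s.Out = s(l.tail)
      }
  }

  // Phantom label types standing in for String("name") and String("age").
  trait name; trait age

  final class Getter[K] {
    def from[L <: HList](l: L)(implicit s: Selector[L, K]): s.Out = s(l)
  }
  def get[K]: Getter[K] = new Getter[K]

  val r = HCons(field[name]("Mme Tortue"), HCons(field[age](123), HNil()))
}
```

With this encoding, `SelectorDemo.get[SelectorDemo.age].from(SelectorDemo.r)` selects 123 with static type Int; accessing a label that is not in the list fails at compile time because no implicit Selector instance can be derived.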


Type Safety As noted above, the return value of a field access is cast to the right type, giving type safety for field access:

scala> val n: Int = r("name")
<console>:23: error: type mismatch;
 found   : String
 required: Int

Furthermore, the implicit resolution only works for the Selector trait if the accessed field exists on the record type, and so it is a compile-time error to access nonexistent fields:

scala> val a = r("address")
<console>:23: error: No field String("address") in record ::[String with... HNil]

4.5.4 Explicit Types

Since Shapeless record types rely on singleton types for the labels, they cannot be written explicitly in the program text. Shapeless provides at least two different ways to circumvent this difficulty. First, it is possible to create Witnesses for the labels and get the type representation from the type parameter T:

val ageLabel = Witness("age")
val nameLabel = Witness("name")
type rt = ::[String with KeyTag[nameLabel.T,String],
          ::[Int with KeyTag[ageLabel.T,Int], HNil]]

scala> val r: rt = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
r: rt = Mme Tortue :: 123 :: HNil

As this is rather verbose and cumbersome to write, Shapeless also provides another way of expressing explicit types, using backticks to embed a record type in a path-dependent type:

type rt = Record.`"name" -> String, "age" -> Int`.T

scala> val r: rt = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
r: rt = Mme Tortue :: 123 :: HNil

Since the Record object extends Dynamic, the above path will result in a call to selectDynamic with the embedded record type as a string argument. This method is then implemented by a whitebox macro that creates a type carrier in the form of a dummy instance of unit (), cast to an anonymous refinement type with the desired record type in a type member T. The following pseudo-code illustrates the end result:

type rt = ( ().asInstanceOf[{ type T = {name: String, age: Int}}] ).T


In contrast to the RecordType class used by Compossible and the approach using Witnesses above, these embedded types can be expressed inline in type expressions, for example directly in a function type signature:

def getName(r: Record.`"name" -> String, "age" -> Int`.T): String = r("name")

However, the embedded types are limited to "standard" types; it is not possible to refer to fields holding custom class types or nested records.

class A

scala> type rt = Record.`"a" -> A`.T
<console>:20: error: Malformed literal or standard type A

4.5.5 Subtyping

The HLists are fundamentally ordered, and so permutation subtyping is not provided. The limited form of width subtyping described in section 2.2.1 is not provided either, as the :: class is not a subtype of HNil:

scala> val s: Record.`"name" -> String`.T = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
<console>:20: error: type mismatch;
 found   : ::[String with KeyTag[String("name"),String], ::[Int with ... , HNil]]
 required: ::[String with KeyTag[String("name"),String], HNil]

The elements are covariant in their value types, however, so depth subtyping is provided:

class A
class B extends A
val fld = Witness("fld")

scala> val r: ::[A with KeyTag[fld.T, A], HNil] = ("fld" ->> new B) :: HNil
r: ::[A with KeyTag[fld.T,A], HNil] = B@9c2b45e :: HNil

4.5.6 Parametric Polymorphism

Without permutation or width subtyping, it is not meaningful to express parametric polymorphism through bounded quantification. The solution is to instead use the Selector type class, and provide it along with argument records:

val nameLabel = Witness("name")
def getName[L <: HList](r: L)(implicit sel: Selector[L, nameLabel.T]): sel.Out =
  r("name")


scala> getName(r)
res27: String = Mme Tortue

At each site where the function is applied to a record, implicit resolution will provide an implicit Selector capable of accessing the name field of all HLists of type L, provided that the name field exists. When the field is then accessed on the record in the function body, this selector will be in scope for the implicit resolution process described above for field access. This way, field selection can be carried out in the polymorphic context exactly as in the monomorphic case above where all fields were known.

Note that this approach has strong similarities to the one suggested by Ohori [14]: the implicit selector can be viewed as a constraint or predicate on the type parameter, and the index to use for field selection is actually embedded in the implicit selector as a closure for each call site, similar to Ohori's indexing abstractions.

4.5.7 Other Type Classes

Shapeless does not only provide a type class for field selection, but supports various other record operations such as extension, restriction, update, relabeling and merge. All these features are implemented in the same consistent way, following the example of field access through the Selector class above: the method (select, update, ...) requires an implicit class instance (Selector, Updater, ...) that implements the behavior for a particular record type (select field at index i, ...). This instance is in turn created by an implicit materializer macro, provided that the operation can be performed (the field exists, ...).

The following is a short summary of the provided features and their syntax.

Extension Records can be extended by using the + operator:

scala> val s = r + ("address" ->> "Elea 42")
s: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int],
   ::[String with KeyTag[String("address"),String],
   HNil]]] = Mme Tortue :: 123 :: Elea 42 :: HNil

The corresponding type class is Updater, and a record can be extended in a polymorphic context by passing along an implicit parameter of this type:

val addressLabel = Witness("address")
type AddressStringField = String with KeyTag[addressLabel.T, String]

def addAddress[R <: HList](r: R)(implicit updater: Updater[R, AddressStringField]): updater.Out =
  r + ("address" ->> "Elea 42")

scala> val s = addAddress(r)
s: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int],
   ::[String with KeyTag[String("address"),String],
   HNil]]] = Mme Tortue :: 123 :: Elea 42 :: HNil


Note that the result type is represented as the path-dependent type updater.Out on the Updater instance.

Restriction Fields can be removed using the - operator:

scala> val anon = r - "name"
anon: ::[Int with KeyTag[String("age"),Int], HNil] = 123 :: HNil

The corresponding type class is Remover, but for unclear reasons it is defined so that a function taking such an implicit seems to require the following definition:

def removeName[Out <: HList, R <: HList](r: R)
    (implicit remover: Remover.Aux[R, nameLabel.T, (String, Out)]): Out =
  remover(r)._2

scala> val anon = removeName(r)
anon: ::[Int with KeyTag[String("age"),Int], HNil] = 123 :: HNil

Update If a field already exists when using the extension operator +, the value will be updated:

scala> val s = r + ("age" ->> (r("age") + 1))
s: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 124 :: HNil

If the type of the new value is different from the old one, the new value will be stored last in the record while keeping the old one:

scala> val s = r + ("age" ->> "very old")
s: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int],
   ::[String with KeyTag[String("age"),String],
   HNil]]] = Mme Tortue :: 123 :: very old :: HNil

This has the perhaps surprising consequence that when the label is subsequently accessed, the old value, which is "to the left" in the record, will be returned instead of the new:

scala> s("age")
res30: Int = 123

To guard against this behavior, there is also a replace method that requires the new value to be of the same type; otherwise there is a compile-time error:

scala> val s = r.replace("age", "very old")
<console>:24: error: could not find implicit value for parameter ev:
Selector[::[String with KeyTag[String("name"),String],
         ::[Int with KeyTag[String("age"),Int], HNil]],
         String("age")]{type Out = String}

This is achieved in a polymorphic context by taking both a Selector and an Updater for the updated field:

def birthday[R <: HList](r: R)
    (implicit sel: Selector.Aux[R, ageLabel.T, Int],
     updater: Updater[R, Int with KeyTag[ageLabel.T, Int]]): updater.Out =
  r + ("age" ->> (r("age") + 1))

scala> val s = birthday(r)
s: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 124 :: HNil

The selector has to be of type Selector.Aux instead of just Selector to specify that the type of the age field is Int. The reason for this was not investigated further.

Relabel Using the renameField method an existing label can be changed to another one:

scala> val s = r.renameField("name", "nick")
s: ::[String with KeyTag[String("nick"),String],
   ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

The corresponding type class is called Renamer:

val nickLabel = Witness("nick")
def nameToNick[R <: HList](r: R)
    (implicit renamer: Renamer[R, nameLabel.T, nickLabel.T]): renamer.Out =
  r.renameField("name", "nick")

scala> val s = nameToNick(r)
s: ::[String with KeyTag[String("nick"),String],
   ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

scala> s("nick")
res24: String = Mme Tortue

Merge Two records can be concatenated, with overwrite from the right, using the merge method:


val r = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
val s = ("name" ->> "Achilles") :: ("height" ->> 1.88) :: HNil

scala> val t = r.merge(s)
t: ::[String with KeyTag[String("name"),String],
   ::[Int with KeyTag[String("age"),Int],
   ::[Double with KeyTag[String("height"),Double],
   HNil]]] = Achilles :: 123 :: 1.88 :: HNil

The corresponding type class is Merger.

Other Other features include the possibility to convert a record to an HList containing only the labels using .keys, only the values using .values, or label-value pairs using .fields. A record can also be converted to its corresponding untyped Map[String,Any].

4.5.8 HCons Extension

Besides using the Updater type class, it is also possible to add more fields to an existing record by using the :: (HCons) method. The following is a working approach to extending an HList record by adding a new element at its head:

val ageLabel = Witness("age")
type AgeIntField = Int with KeyTag[ageLabel.T, Int]
def addAge[R <: HList](r: R): AgeIntField :: R = ("age" ->> 123) :: r

val r = "name" ->> "Mme Tortue" :: HNil

scala> val s = addAge(r)
s: ::[AgeIntField,
   ::[String with KeyTag[String("name"),String], HNil]] = 123 :: Mme Tortue :: HNil

This extension is completely unchecked, however, and already existing fields will naturally still be present in the extended record:

val r = "age" ->> "very old" :: HNil

scala> val s = addAge(r)
s: ::[AgeIntField,
   ::[String with KeyTag[String("age"),String], HNil]] = 123 :: very old :: HNil

Depending on the application, this might actually be a feature rather than a disadvantage; the new field will take precedence over the old field on subsequent field access:

scala> s("age")
res3: Int = 123


and the old value can be restored by removing the added field:

scala> val t = s - "age"
t: ::[String with KeyTag[String("age"),String], HNil] = very old :: HNil

This functionality is similar to the extensible records with scoped labels suggested by Leijen [34], where records are extended by adding new values to a stack for each label and restricted by popping this stack.
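This stack discipline can be illustrated with a small value-level toy model (untyped, and with hypothetical names; real scoped-label records are statically typed):

```scala
// Toy model of scoped labels: each label maps to a stack of values.
// Extension pushes onto the stack; restriction pops it.
case class ScopedRecord(data: Map[String, List[Any]]) {
  def get(label: String): Any = data(label).head
  def extended(label: String, value: Any): ScopedRecord =
    ScopedRecord(data.updated(label, value :: data.getOrElse(label, Nil)))
  def restricted(label: String): ScopedRecord = data(label) match {
    case _ :: Nil  => ScopedRecord(data - label)          // last value: drop label
    case _ :: rest => ScopedRecord(data.updated(label, rest))
    case Nil       => this                                 // defensive; never reached
  }
}

object ScopedDemo extends App {
  val r = ScopedRecord(Map("age" -> List("very old")))
  val s = r.extended("age", 123)          // new value shadows the old one
  println(s.get("age"))                   // the newer value, 123
  println(s.restricted("age").get("age")) // popping restores "very old"
}
```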

4.6 Dotty’s New Structural Refinement Types

In Dotty, the way methods are called on structural refinement types has changed. The new implementation is described by Odersky [25] and summarized below.

4.6.1 Implementation

Let r be a value of structural refinement type C { fields } where C is a class type and fields is a set of declarations refining C. Furthermore, let f be a field that is a member of fields but not a member of C. In current Scala, the structural field access r.f is compiled into a reflective call as described in Section 4.1.2. In Dotty, field access is instead translated into the following pseudo-code:

(r: Selectable).selectDynamic("f").asInstanceOf[T]

where Selectable is a trait defined as

trait Selectable extends Any {
  def selectDynamic(name: String): Any
  def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any =
    new UnsupportedOperationException("selectDynamicMethod")
}

The cast to Selectable succeeds if C extends Selectable or if there exists some implicit conversion method from C to Selectable. Either way, the field access logic is handed over to the provided implementation of the selectDynamic method. This allows programmable field access, and it is up to the implementation of the selectDynamic method to take the accessed field's name as a String parameter and return the corresponding value. This is very much like how the selectDynamic method works for the Dynamic marker trait in current Scala, although here the field access is type safe. Method calls that take arguments are instead translated into a call to selectDynamicMethod, returning a method based on the accessed method's name and parameter class tags.

It is still possible to access structural members using Java reflection by importing the implicit conversion method scala.reflect.Selectable.reflectiveSelectable. This method converts any structurally typed object to a scala.reflect.Selectable that implements the call like current Scala does.

As described by Odersky [25], the above compilation scheme for structural types allows a simple record class to be implemented as


case class Record(elems: (String, Any)*) extends Selectable {
  def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
}

By casting instances of this class to structural refinement types, fields can be accessed through the selectDynamic method. For example, a record r with fields name and age can be created as:

val r = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: Int}]

Since the name and age fields are declared on the refinement type but are not members of the Record class, accessing for example the name field of r is translated into:

(r: Selectable).selectDynamic("name").asInstanceOf[String]

The selectDynamic method is called, the stored name value is found, and the return value is cast to its statically known type String.

Although the above implementation clearly demonstrates the capabilities of the new structural types, it is not very efficient. The access time of the elems list is linear in the number of stored fields, making a string comparison for every field until a match is found. Therefore, another implementation will be considered in the following, where the elems list is replaced by a hash map from the Scala collections library. The Record case class is defined as follows:

case class Record(_data: Map[String, Any]) extends Selectable {
  def selectDynamic(name: String): Any = _data(name)
}

To avoid having to explicitly create an instance of a Map to pass as the _data argument, the following convenience method is provided on the companion object that creates a record with an immutable hash map as data store:

object Record {
  def apply(_data: (String, Any)*) = new Record(_data = HashMap(_data: _*))
}

The following is an overview of the record-like features supported by this approach to records in Dotty.

4.6.2 Basic Features

Create A record is created by calling apply on the companion object and casting the result to a structural refinement of the Record case class:


scala> val r = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: Int}]
val r: Record{name: String; age: Int} = Record(Map(name -> Mme Tortue, age -> 123))

Note that the type signature is cleaner compared to current Scala, with the val declarations removed on the refinement type.

Access Fields are accessed using dot-notation:

scala> r.name
val res1: String = "Mme Tortue"

Type-safety Type safety is provided by the structural refinement type, and it is a compile-time error to access a non-existent field:

scala> r.address
-- [E008] Member Not Found Error: <console>:12:2 -------------------------------
12 |r.address
   |  ^^^^^^^
   |value `address` is not a member of Record{name: String; age: Int}

For the name field that does exist, a call to r.name is translated into r.selectDynamic("name").asInstanceOf[String], so that the return type is type-checked:

scala> val n: Int = r.name
-- [E007] Type Mismatch Error: <console>:11:13 ---------------------------------
11 |val n: Int = r.name
   |             ^
   |             found:    String
   |             required: Int

As noted by Odersky [25] however, the initial cast to a structural type presents a single point of failure for this type-safety. If the cast is incorrect, everything breaks down:

scala> val e = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: String}]
val e: Record{name: String; age: String} = Record(Map(name -> Mme Tortue, age -> 123))

scala> e.age
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String

Equality Case class and Map equality is by value in Scala/Dotty, and since Record is a case class with the data map declared as a parameter, this applies to records as well:


scala> Record("name"->"Mme Tortue").asInstanceOf[Record{val name: String}] ==
       Record("name"->"Mme Tortue").asInstanceOf[Record{val name: String}]
val res2: Boolean = true

4.6.3 Polymorphism

In this section the subtyping capabilities are investigated, as well as the support for parametric polymorphism.

Subtyping As for current Scala, the structural refinements support permutation, width and depth subtyping:

scala> val s: Record{val age: Any} = r
val s: Record{age: Any} = Record(Map(name -> Mme Tortue, age -> 123))

As before, it is therefore possible to define and call the getName function implemented as:

def getName(r: Record{val name: String}) = r.name

scala> getName(r)
val res11: String = "Mme Tortue"

Least Upper Bounds The problem with inferring least upper bounds remains from current Scala; it works as long as one type is a direct supertype of the other (same r and s as above):

scala> if (true) r else s
val res21: Record{age: Any} = Record(Map(name -> Mme Tortue, age -> 123))

But otherwise all fields are lost:

val t = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: Int}]
val a = Record("name"->"Achilles", "height"->1.88).asInstanceOf[Record{val name: String; val height: Double}]

scala> if (true) t else a
val res22: Record = Record(Map(name -> Mme Tortue, age -> 123))

Bounded Quantification As for current structural refinement types, parametric polymorphism with bounded quantification is supported:

def oldest[R <: Record{val age: Int}](a: R, b: R): R = if (a.age >= b.age) a else b


// val t: Record{name: String; age: Int}
// val a: Record{name: String; age: Int}

scala> oldest(a, t).name
val res15: String = "Mme Tortue"

4.6.4 Extension

Disregarding the structural types for a moment, it is possible to add extra fields to a record by using the built-in add operation on the data map:

scala> val s = Record(r._data + ("color"->"green"))
val s: Record = Record(Map(name -> Mme Tortue, color -> green, age -> 123))

or even merge two records with overwrite from the right:

val t = Record("name"->"Mme Tortue", "age"->123)
val a = Record("name"->"Achilles", "height"->1.88)

scala> val m = Record(t._data ++ a._data)
val m: Record = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

The question is how to represent extension on the type level. There is no documented way of extending a record with additional fields in the description by Odersky [25], but Dotty's new intersection types provide at least a partial solution.

Extension by Intersection In Dotty, the compound type operator with is replaced by the type intersection operator &. As noted in the background, type intersection is commutative and recursive in covariant type members [20]. For record types this implies the desired property that Record{val f1: T1} & Record{val f2: T2} is equivalent to Record{val f2: T2} & Record{val f1: T1} and to Record{val f1: T1; val f2: T2}. Unfortunately, if the extension is in fact an update where the updated field gets a new type, the commutative and recursive property also means that Record{val f: T} & Record{val f: S} is equivalent to Record{val f: T & S} instead of Record{val f: S}. This section investigates both the correct and the incorrect case.

The following experiment with intersection types reveals that Dotty’s implicit resolution system is able to prove that the intersection of Record{val name: String; val age: Int} and Record{val name: String; val height: Double} is equivalent to the merged type Record{val name: String; val age: Int; val height: Double}:

type Turtle = Record{val name: String; val age: Int}
type Hero = Record{val name: String; val height: Double}
type Merged = Record{val name: String; val age: Int; val height: Double}

scala> implicitly[Merged =:= (Turtle & Hero)]
val res22: =:=[Merged, Turtle & Hero] = <function1>


This allows the merge of Mme Tortue and Achilles above to be typed as:

val t = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Turtle]
val a = Record("name"->"Achilles", "height"->1.88).asInstanceOf[Hero]

scala> val m = Record(t._data ++ a._data).asInstanceOf[Turtle & Hero]
val m: Turtle & Hero = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

Since Turtle & Hero is equivalent to Merged, we should now be able to access name, age and height on m. It works for the name field, which is present on both Turtle and Hero, but when accessing one of the non-overlapping fields, the Dotty REPL crashes:

scala> m.name
val res27: String = "Achilles"

scala> m.age
exception while typing m.age of class class dotty.tools.dotc.ast.Trees$Select # 90491
...
[error] (run-main-3) java.lang.AssertionError: NoDenotation.owner
...
[error] (compile:console) Nonzero exit code: 1

unless m is given type Merged explicitly first:

scala> (m: Merged).age
val res0: Int = 123

Provided that the above bug is fixed in such a way that the example works in the future, there is an even better alternative using the path-dependent types t.type and a.type on the merged records:

scala> val m = Record(t._data ++ a._data).asInstanceOf[t.type & a.type]
val m: a = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

scala> m.name
val res35: String = "Achilles"

scala> m.age
exception while typing m.age of class class dotty.tools.dotc.ast.Trees$Select # 85855
...

This also allows merge to be defined directly on the Record case class, hiding the unsafe cast from client code:

case class Record(_data: Map[String, Any]) extends Selectable {
  def selectDynamic(name: String): Any = _data(name)
  def ++(that: Record) =
    Record(this._data ++ that._data).asInstanceOf[this.type & that.type]
}


scala> val m = t ++ a
val m: a = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

scala> implicitly[m.type <:< Merged]
val res0: <:<[(Turtle(t) & Hero(a))(m), Merged] = <function1>

where the last line shows that the type of m is a subtype of Merged, as desired. It is harder to see how to define an extension operator accepting one key-value pair at a time, however, since there is no direct way of translating the new field label to its type representation:

def +[T](kv: (String, T)) =
  Record(this._data + kv).asInstanceOf[this.type & Record{val ???: T}]
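A compound-type encoding can sidestep this, since a label can be captured as a singleton string type from the literal argument. The following is a self-contained sketch in the spirit of Compossible's representation (Field, RecordOps and add are illustrative names, not part of any discussed library, and the cast is just as unchecked as before), assuming singleton-type inference as in Scala 2.13 or Dotty:

```scala
// Illustrative sketch only: capturing the new field's label as a
// singleton string type K, so the extended type can mention it.
object ExtensionSketch {
  case class Record(_data: Map[String, Any])
  trait Field[K <: String, V] // purely type-level marker, never instantiated

  implicit class RecordOps[R <: Record](val r: R) {
    // K is inferred as the literal argument's singleton type, e.g. "age".
    def add[K <: String with Singleton, V](k: K, v: V): R with Field[K, V] =
      Record(r._data + (k -> v)).asInstanceOf[R with Field[K, V]]
  }
}
```

Here the label survives at the type level without naming a refinement member, although field access for the Field-encoded part would still have to be wired up separately.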

As mentioned above, however, the ++ method is not entirely correct either. There is no check whether an extension is actually an update, and type intersection only works as expected if the type of the updated field does not change. If the new value is of another type, the correct behavior would be to either refuse to perform the update at compile time, or to overwrite the old type with the new type from the right, as is done for the values. But type intersection is commutative, and instead the intersection is applied recursively to the updated field, making it an intersection of the new and old type. This leads to an incorrect cast with runtime errors down the line:

val o = Record("age"->"very old").asInstanceOf[Record{val age: String}]

scala> val e: Record{val name: String; val age: Int & String} = t ++ o
val e: Record{name: String; age: Int & String} = Record(Map(name -> Mme Tortue, age -> very old))

scala> e.age
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

Section 7.2 provides some pointers for how this problem could be solved in the future.

Polymorphic Extension Provided that the field access bug is fixed for intersection types, and ignoring the fact that extension as defined above is unsound for type-changing updates, there is no difference between the monomorphic and the polymorphic case. The following function takes a record of parameterized type R as argument using bounded quantification and adds a color field:

def colorize[R <: Record](r: R): R & Record{val color: String} =
  r ++ Record("color"->"green").asInstanceOf[Record{val color: String}]

scala> colorize(t)
val res17: Turtle & Record{color: String} = Record(Map(name -> Mme Tortue, color -> green, age -> 123))


4.6.5 Update

Nothing prevents a record from having its values updated without changing the type:

scala> val u = Record(t._data + ("age"->124)).asInstanceOf[t.type]
val u: Turtle = Record(Map(name -> Mme Tortue, age -> 124))

scala> u.age
val res11: Int = 124

Note that the update is without safety guarantees though, as it is possible to pass any value to the data map and the unsafe cast will succeed without complaints at compile time.

Polymorphic Update Making sure that the new value is of the same type as the old value, it would seem as though the following function is a safe application of record update in a polymorphic context:

def updateX[R <: Record{val x: A}](r: R): R =
  Record(r._data + ("x"->new A())).asInstanceOf[R]

As noted by Pierce [10], however, this is actually not correct. The type Record{val x: A} is only an upper bound on the type of x, and if depth subtyping is used to instantiate R at the call site, the type of r.x might be any subtype of A. If r.x has some type B that is a subtype of A, the return type of the function will be Record{val x: B}. But the new value for x is of type A, and so the function actually makes an unsafe down-cast from A to B that results in a runtime error once the field is accessed:

val r = Record("x"->new B()).asInstanceOf[Record{val x: B}]

scala> updateX(r)
val res22: Record{x: B} = Record(Map(x -> A@17dc96c6))

scala> updateX(r).x
java.lang.ClassCastException: A cannot be cast to B

Using the merge operation ++ for Records defined above does not help either. Since Record{x: B} & Record{x: A} is equivalent to Record{x: B & A}, which is equivalent to Record{x: B}, the unsafe cast still passes without compile error:

def updateX[R <: Record{val x: A}](r: R): R = r ++ Record("x"->new A())

scala> updateX(r)
val res9: Record{x: B} = Record(Map(x -> A@2b23be99))

Note that this is not an issue that applies specifically to this implementation of records, but to functional update under bounded quantification in general. The problem here


is rather that the update operation is based on an unsafe cast that makes it possible to override an otherwise sound type system. The solution presented by Pierce [10] to allow sound record update under bounded quantification is to add a special mark that makes a record type invariant in the marked field types. That is, the depth subtyping rule is disabled for marked fields, so that when R is instantiated above it is statically guaranteed that r.x has exactly type A. Only marked fields are allowed to be updated. The corresponding pseudo-code for Dotty using annotations would be something along the lines of:

def updateX[R <: Record{@invariant val x: A}](r: R): R = r ++ Record("x"->new A())

stating that the Record type is invariant in the x field so that it is safe to update. It is also safe to let Record be contravariant in the updated fields, but then they are no longer safe to access.
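Scala's type members already behave like such a mark: an alias refinement type F = A fixes both bounds of F, whereas a bounded refinement type F <: A still permits depth subtyping. The following is a sketch of this idea only (Rec, F, A and B are illustrative names); it is not how the Record implementation above encodes field types:

```scala
// Illustrative sketch: emulating an invariance mark with a type member.
object InvarianceSketch {
  class A; class B extends A

  trait Rec { type F }

  // Bounded member: depth subtyping is allowed, F = B conforms to F <: A.
  val boundedOk = implicitly[(Rec { type F = B }) <:< (Rec { type F <: A })]

  // Alias member: F is pinned to exactly A, so F = B does NOT conform.
  // implicitly[(Rec { type F = B }) <:< (Rec { type F = A })] // does not compile
}
```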


Chapter 5

Comparison of Existing Approaches

This chapter summarizes the described features of existing approaches in a feature matrix and presents their runtime and compile-time performance obtained from the Wreckage benchmarking suite.

5.1 Qualitative Comparison

The features of the existing approaches to records described in Chapter 4 are summarized in Table 5.1. The scala-records v0.3 library is omitted as it is superseded by v0.4 on every point except Eclipse IDE support. The record implementation using Dotty’s new refinement types is referred to as Dotty Selectable.

Type Safety refers to whether field access is typed and whether accessing a non-existent field is a compile error. Compossible’s entry is in parentheses as it is type-checked in general, but it is possible to trip the type checker by making unsafe updates. Dotty Selectable’s entry is also in parentheses as the type safety relies on an unsafe initial cast to the refinement type.

The subtyping support is expressed using the following naming convention: P means permutation subtyping, W means width subtyping and D means depth subtyping. A plus after each letter means that it is supported, and a minus that it is not. For example, P+W+D+ means that all of permutation, width and depth subtyping are supported, and P−W−D+ that only depth subtyping is.
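The three kinds of subtyping can be illustrated directly with Scala's structural refinement types; all of the conversions below are verified by the compiler (the type and class names are illustrative):

```scala
// Illustrative sketch of the P/W/D notation using structural types.
object SubtypingDemo {
  type Full     = { val name: String; val age: Int }
  type Permuted = { val age: Int; val name: String }
  type Narrow   = { val name: String }

  // W+: a record type with more fields conforms to one with fewer.
  val width = implicitly[Full <:< Narrow]

  // P+: the order of fields is irrelevant (both directions hold).
  val perm1 = implicitly[Full <:< Permuted]
  val perm2 = implicitly[Permuted <:< Full]

  // D+: a field may be refined to a more specific type.
  class A; class B extends A
  val depth = implicitly[{ val x: B } <:< { val x: A }]
}
```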

Parametric Polymorphism refers to whether and how it is possible to let a generic type parameter capture a record type while keeping some usable information, for example that certain fields are present and can be accessed. Scala’s default way of achieving this is through bounded quantification, using a supertype in a subtyping relationship to express the information known about the parameterized type. This form of parametric polymorphism is also supported for structural refinement types, making it available both for anonymous refinement types and for the phantom refinement types used by scala-records 0.4 and Dotty Selectable. Shapeless records do not support permutation or width subtyping and so cannot use bounded quantification. Instead, it is possible to express the fact that certain fields are present on a parameterized record type by demanding that it implement a corresponding Selector type class for each field.

Extension, Restriction, Update and Relabeling describe whether these operations are supported, and in that case whether they are supported in a monomorphic context where the full record type is known, or also in a polymorphic context where only partial information


Feature                 | Anon. Refinements | scala-records 0.4 | Compossible 0.2          | shapeless 2.3.2      | Dotty Selectable
Access syntax           | r.f               | r.f               | r.f                      | r("f")               | r.f
Equality                | reference         | value             | reference                | value                | value
Type Safety             | X                 | X                 | (X)                      | X                    | (X)
Subtyping               | P+W+D+            | P+W+D+            | P+W+D+                   | P−W−D+               | P+W+D+
Explicit types          | X                 | X                 | type carrier, not inline | type carrier, inline | X
Parametric Polymorphism | Bounded quantification | Bounded quantification | -           | Selector type class  | Bounded quantification
Extension               | -                 | -                 | monomorph.               | polymorph.           | (polymorph.)
Restriction             | -                 | -                 | -                        | polymorph.           | -
Update                  | -                 | -                 | (monomorph.)             | polymorph.           | (monomorph.)
Relabeling              | -                 | -                 | -                        | polymorph.           | -
Eclipse IDE             | X                 | (-)               | X                        | X                    | ?
IntelliJ IDE            | X                 | -                 | -                        | -                    | ?
to case class           | -                 | X                 | -                        | X                    | -
from case class         | -                 | -                 | X                        | X                    | -

Table 5.1: Feature matrix for existing approaches to records in Scala


about a record’s type is available. Compossible’s entry for Update and Dotty Selectable’s entries for Extension and Update are in parentheses as the operations are supported but are not type-safe in general.

The Eclipse entry for scala-records 0.4 is in parentheses as support is not fully working, but seems easy to fix. The IDE support for Dotty was not investigated.

Lastly, the possibilities to convert records to and from case class instances are covered.

5.2 Quantitative Evaluation using Benchmark

JMH benchmarks were generated for each evaluated approach using the Wreckage benchmarking library. The benchmarks were then run using version 8 of the Java SE Runtime Environment on a Java HotSpot™ 64-Bit Server VM with an initial heap size of 256 MB and a maximum heap size of 4 GB. The host computer was a MacBook Pro with a 3.1 GHz Intel Core i7 processor.

Raw measurement data was collected using JMH’s JSON output format and then post-processed using a MATLAB® script, as described in Section 3.2.4.

Scala 2.11.8 and Dotty 0.1.1 were used in all benchmarks.

5.2.1 Runtime performance

The benchmarked approaches are Scala’s anonymous refinement types, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2 and records using a hash map with Dotty’s new structural refinement types (Dotty Selectable). Scala’s nominally-typed case classes were also included to provide a baseline of the performance achievable with classes using virtual method calls on the JVM.

5.2.1.1 Creation Time against Record Size

Creation time was measured against record size in terms of number of fields. The results are presented in Fig. 5.1. Both anonymous refinements and case classes are compiled to Java classes on the JVM. As expected, their creation times are overlapping and faster than any of the record libraries using linked lists (Shapeless) or underlying hash maps (scala-records, Compossible, Dotty Selectable). Shapeless requires the least creation time of the record libraries, followed by scala-records.

A Compossible record is created by extending it field by field: an immutable add operation is performed on the underlying data map for every field, along with creation of intermediate record class instances. Although Scala’s immutable maps are implemented as hash tries with effectively constant time complexity for adding key-value pairs [35], the runtime cost is shown to be significant compared to creating the complete hash map from the start, as is done by scala-records and Dotty Selectable.
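The difference between the two creation strategies can be sketched with plain immutable maps (illustrative only; no library API is involved):

```scala
// Illustrative sketch of the two record-creation strategies.
object CreationSketch {
  val fields: List[(String, Any)] = List("name" -> "Mme Tortue", "age" -> 123)

  // Compossible-style: one immutable `+` (hash-trie update) and one
  // intermediate value per field.
  val incremental: Map[String, Any] =
    fields.foldLeft(Map.empty[String, Any])((m, kv) => m + kv)

  // scala-records / Dotty Selectable style: build the complete map at once.
  val direct: Map[String, Any] = fields.toMap

  def same: Boolean = incremental == direct // same result, fewer allocations
}
```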

It is unclear why the creation time actually goes down for Shapeless records with more than 26 fields.

5.2.1.2 Access Time against Field Index

[Figure 5.1: Record creation time against record size in number of integer fields. Measured as mean steady state execution time per created record and plotted with 99.9% confidence intervals. The graphs for Anonymous Refinements and Case Class overlap close to the x-axis. Axes: record size (2-32) vs. creation time (0-9 ms). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2, Dotty structural.]

Access time was measured against the index of the accessed field. For ordered records (only Shapeless) the index corresponds to the index in the linked list, whereas for the unordered approaches the index merely identifies the field name. The results are presented in Fig. 5.2.

It is somewhat surprising that the cached reflection of Scala’s anonymous refinement types in many cases is the fastest of the tested approaches (except for the case class baseline). On the other hand, in this benchmark the call site is monomorphic and so reflection will only be carried out once per VM fork, after which the cached method handle will give an immediate match for every subsequent call. As expected, the access time is also independent of field index.

The hash map based approaches (scala-records, Compossible and Dotty Selectable) vary between two different access times depending on field index. A possible explanation is that hash lookup for certain keys requires one more indirection in the hash trie than for others. The constant overhead of scala-records compared to Compossible is believed to be due to the fact that scala-records wraps the hash lookup inside an extra interface call. It is unclear why Dotty Selectable’s hash lookup varies between scala-records’ and Compossible’s access times.

As expected, the linked list data structure used by Shapeless shows a clear linear access time in the field index. In practice though, this approach is actually the fastest for the first 6 fields and on par with the hash maps for at least the first 12 fields.

5.2.1.3 Access Time against Record Size

In the previous benchmark the record size was constant and the accessed field was varied. In this benchmark, a record of increasing size is created and the field with highest


[Figure 5.2: Record access time against field index on a record with 32 integer fields f1, f2, ..., f32. Measured as mean steady state execution time per access operation on fields f1, f2, f4, f6, ..., f32. Plotted with 99.9% confidence intervals. Axes: field index (2-32) vs. access time (0-50 ns). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2, Dotty structural.]

index is accessed. The results are presented in Fig. 5.3.Again, Shapeless has linear access time as the last index is the worst case from Sec-

tion 5.2.1.2 for each record size. For hash-based approaches the index again merely iden-tifies the field label. The same varying pattern between two different access times is ob-served as before with no noticeable increasing trend with record size. Anonymous refine-ments are also shown to have constant access time in record size.

5.2.1.4 Access Time against Degree of Polymorphism

Access time was measured against degree of polymorphism and the results are presented in Fig. 5.4. Shapeless was not included as the benchmark implementation requires the records to support permutation and width subtyping in order to store them in an array of least upper bound type {g1: Int}.

Anonymous refinement types clearly have linear access time in the degree of polymorphism. This is expected, as the inline cache is implemented as a linked list, and confirms the results of Dubochet and Odersky [32].

The other approaches are also affected slightly by increasing polymorphism, but not as much. A possible explanation is that polymorphism interferes with JIT compiler optimization. It is worth noting that although Compossible’s and Dotty’s hash lookup is faster than cached reflection already from polymorphism degree 2, it is not until around polymorphism degree 16 that the linear curve really starts to diverge from the others.


[Figure 5.3: Record access time against record size in number of integer fields. Measured as mean steady state execution time per access operation on records with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with highest index was accessed. Plotted with 99.9% confidence intervals. Axes: record size (2-32) vs. access time (0-50 ns). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2, Dotty structural.]

[Figure 5.4: Record access time against degree of polymorphism on an array of different records with 32 integer fields. Measured as mean steady state execution time per field access (including array indexing) and plotted with 99.9% confidence intervals. Axes: degree of polymorphism (2-32) vs. access time (0-50 ns). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Dotty structural.]


[Figure 5.5: Compilation times for a code snippet that creates a single record of varying size. Measured as single shot compile times for records with 1, 50, 100, 150, 200 and 250 fields and plotted with 99% confidence intervals. Axes: record size (20-240) vs. compilation time (0-20 s). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2.]

5.2.2 Compile-Time Performance

The benchmarked approaches are Scala’s anonymous refinement types, scala-records 0.4, Compossible 0.2 and Shapeless 2.3.2. Scala’s nominally-typed case classes were included to provide a baseline. Dotty Selectable was not included in the compile-time benchmarks.

5.2.2.1 Create

The results can be seen in Fig. 5.5. All approaches are found to have a more or less linear compile time in record size, as expected. Although the absolute numbers are machine dependent and hard to generalize, it is worth noting that the compile times are quite high; it may take up to 20 seconds to compile a single expression creating a large Shapeless or Compossible record on a modern computer. The corresponding scala-records record takes 6-8 seconds, and a case class of the same size around 1 second.

5.2.2.2 Create and Access All Fields

Figure 5.6 shows the compile times of a code snippet that creates a record and also accesses all its fields one at a time. This time Compossible has the fastest compile times of the library approaches, followed by Shapeless and scala-records. Case classes act as a baseline here as well, with a constant overhead of 1 s increasing up to 2 seconds for a record with 250 fields.

Clearly, Shapeless no longer shows the exponential compile times that were found by Jovanovic et al. using scala-records-benchmarks [31, 5]. The latest version (2.3.2) is actually faster to compile than scala-records. This difference in asymptotic compile time was traced back to a change in how the Selector (and Updater) type class is implemented


[Figure 5.6: Compilation times for a code snippet that creates a record and accesses all its fields. Measured as single shot compile times for records with 1, 50, 100, 150, 200 and 250 fields and plotted with 99% confidence intervals. Axes: record size (20-240) vs. compilation time (0-45 s). Series: Case Class, Anon. Refinements, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2.]

between Shapeless 2.2.5 and 2.3.0 [36] and verified by re-running the benchmark with these versions. The results are found in Fig. 5.7 and confirm the findings of Jovanovic et al. [5] for Shapeless 2.2.5.

Before version 2.3.0, field access was achieved by instantiating a Selector instance through a recursive implicit resolution process over the accessed list, instead of using a materializer macro directly as described in Section 4.5.3. This recursion incurs a linear number of implicit resolutions in the index of the accessed field, which is then multiplied by the record size as every field is accessed in the benchmark. This would account for quadratic compile times, however, and does not explain the super-quadratic compile time implied by the least squares fitted curves. A possible explanation is that the time of each implicit resolution also grows with record size, but the exact nature of this process was not investigated further.
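The recursive scheme can be sketched with a heavily simplified Selector that selects by element type rather than by field label (all names are illustrative, not shapeless's actual API): resolving the instance for an element at index n requires n nested implicit resolutions.

```scala
// Illustrative sketch of selection via recursive implicit resolution.
object SelectorSketch {
  sealed trait HList
  final case class ::[+H, +T <: HList](head: H, tail: T) extends HList
  sealed trait HNil extends HList
  case object HNil extends HNil

  trait Selector[L <: HList, U] { def apply(l: L): U }
  object Selector {
    // Base case: the head is the sought element.
    implicit def atHead[H, T <: HList]: Selector[H :: T, H] =
      new Selector[H :: T, H] { def apply(l: H :: T) = l.head }
    // Recursive case: one extra implicit resolution per skipped element,
    // which is what makes compile time grow with the accessed index.
    implicit def inTail[H, T <: HList, U](
        implicit s: Selector[T, U]): Selector[H :: T, U] =
      new Selector[H :: T, U] { def apply(l: H :: T) = s(l.tail) }
  }

  val l: Int :: (String :: HNil) = ::(1, ::("a", HNil))
  def select: String = implicitly[Selector[Int :: (String :: HNil), String]].apply(l)
}
```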


[Figure 5.7: Compilation times for different versions of the Shapeless library against record size, compiling a snippet that creates a record and accesses all fields. Measured for records with 1, 5, 10, 15, 20, 25 and 30 fields and plotted with 99% confidence intervals. An exponential and a quadratic curve were fitted to the Shapeless 2.2.5 data using the least squares method at record sizes 1 to 25 to see how well the models predict the compile time at record size 30. Axes: record size (2-30) vs. compilation time (0-140 s). Series: two Shapeless versions plus the fitted quadratic and exponential curves.]


Chapter 6

Analysis and Possible New Approaches

In this chapter the results from Chapter 5 are analyzed and the design space for new approaches to records in Scala is investigated. New approaches are suggested and evaluated using the same benchmarking suite and methodology used for existing approaches.

6.1 Strengths and Weaknesses of Existing Approaches

Overall, existing libraries were found to be in better shape than expected. Since version 0.4, scala-records supports explicit types, which in turn enables record types to be used for function parameters and bounded quantification. Furthermore, Shapeless no longer shows the exponential compile times found by Jovanovic et al. [5].

Looking at the feature matrix of Section 5.1, the one feature that scala-records 0.4 lacks compared to Compossible is monomorphic extension. But there does not seem to be any fundamental difficulty in adding this feature to scala-records as well through whitebox macros, along with monomorphic restriction, update and relabeling.

Shapeless’ fields are ordered, which limits the ways records can be stored in heterogeneous collections and makes it less straightforward to pass them as function arguments. On the other hand, Shapeless is the only library to provide extensive support for polymorphic extension, restriction, update, etc. through type classes. The problem of not being able to express the types explicitly is solved by using macro-parsed path-dependent types producing type carriers that can be inlined. This solution is far from perfect however, as the types are restricted to "Standard types" and nested record types cannot be expressed.

Three weaknesses stand out as being common for a majority of the existing approaches:

1. Whitebox macros prevent static code analysis tools such as IntelliJ from being used, and reduce the expected lifetime of the library since whitebox macro support will be dropped in the future [37]. (All approaches except Dotty.)

2. Suboptimal runtime performance for field access compared to case classes, using reflection, hash maps or linked lists.

3. Poor support for monomorphic extension, restriction, update and relabeling, and no support for polymorphic versions of the same. (All approaches except Shapeless.)

Regarding whitebox macros, all investigated libraries for Scala rely on them in one way or the other to be able to represent record types using existing Scala type primitives,


bridge the gap between this representation and the value level, and provide a clean syntax. Although the drawbacks of relying on whitebox macros are clear, it is hard to see any other possibility for current versions of Scala, besides writing a compiler plugin that adds new record syntax to the language and augments the typer. But this would ultimately have the same drawbacks as the whitebox macros: limited lifetime and poor integration with static analysis tools such as IntelliJ IDEA. At the same time, current versions of Scala do support whitebox macros, they work with the Eclipse IDE, and they can be excellent tools for experimenting with possible new approaches to records without forking the compiler. A successful whitebox approach can always be transformed into a native solution incorporated in future versions of Scala or Dotty later.

By this argument, whitebox macros will not be considered an issue in the following analysis, and the focus will be solely on how points 2 and 3 above might be addressed. Especially field access will be investigated: whether and how it is possible to achieve better runtime performance than using a hash map for unordered approaches and linked lists for ordered ones.

6.2 Design Space for Records

The quest for faster field access and polymorphic extension reveals that the data structure chosen to represent a record’s values is tightly coupled to what type level representation is used and what operations to support on that type representation. The problem of designing a new approach to records for Scala is reduced to the following four questions:

1. What type level representation is chosen for record fields and types?

2. What value level data structure is chosen to store record values?

3. What subtyping rules should apply to record types?

4. What type-level operations should be allowed?

Unfortunately, the possible answers to these questions do not provide an orthogonal basis for the design space of records in Scala, and the answer to one question affects the possible choices for the others. This thesis will refrain from giving subjective pointers as to what particular combination of features is desirable for a new approach to records; the focus is instead on providing a background and practical tools to aid such a decision in the future.

First, the possible answers to question 1 are limited to a selection of six different type representations in Section 6.3, "Record Type Representations". Next, questions 2 and 3 are tackled together in Section 6.4, "Compilation Schemes for Subtyped Records". A selection of seven different possible data structures for storing records is then benchmarked in Section 6.5, "Benchmarks of Possible Data Structures". Lastly, question 4 is postponed to Chapter 7, "Discussion and Future Work", where possible solutions for supporting record operations such as extension and update are discussed and interesting paths for further work are outlined.


6.3 Record Type Representations

So far, three different ways of representing record types in Scala have been presented: the structural refinement types used by anonymous classes, scala-records and Dotty Selectable; the compound types used by Compossible; and the tagged HLists used by Shapeless. By combining these approaches with the option of putting them in a phantom type parameter (as is done by scala-records 0.4 and Compossible) one obtains a total of six different type representations, each with its own characteristics:

1a) Refinement types

Scala syntax: Rec{val f1: T1; val f2: T2 ...}

Examples: scala-records 0.3 [22], Dotty Selectable [25]

1b) Phantom refinement types

Scala syntax: Rec[{val f1: T1; val f2: T2 ...}]

Examples: scala-records 0.4 [4]

2a) Compound types

Scala syntax: Rec with Field["f1", T1] with Field["f2", T2] ...

Examples: "Add records to Dotty" [38]

2b) Phantom compound types

Scala syntax: Rec[Field["f1", T1] with Field["f2", T2] ...]

Examples: Compossible 0.2 [23]

3a) HList records

Scala syntax: Field["f1", T1] :: Field["f2", T2] :: ... :: HNil

Examples: Shapeless 2.3.2 records [24]

3b) Phantom Type List (TList) records

Scala syntax: Rec[Field["f1", T1] :: Field["f2", T2] :: ... :: TNil]

Examples: "Type Lists and Heterogeneously Typed Arrays" [39]

The Scala syntax above should be understood as pseudo-code for a record type with fields labeled f1, f2, ... of types T1, T2, ... using each approach. A Field["f1", T1] is a pseudo-code type level representation of a field with label f1 and type T1. One possible concrete implementation is

trait Field[L <: String, V]

using singleton string types to represent the label, and with V as the type of the field value. Compossible uses this approach with Tuple2 instead of a custom Field trait, whereas Shapeless' version is called KeyTag. The HList approach is fundamentally different from the others, though, as the Fields are not only type level representations of fields but must also hold the actual values. Shapeless solves this by implementing the Field type as V with KeyTag[L, V], but there are other possibilities as well, for example using a case class:


case class Field[L <: String, V](val value: V)

This is also the reason for the distinction between HLists and TLists above; an HList is a list of heterogeneous elements whereas a TList is a list of types.
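To make the distinction concrete, here is a minimal sketch of the two kinds of field representation, using literal singleton string types for the labels. The names are ours and not any library's actual API:

```scala
// A purely type-level field marker: never instantiated, only usable as a
// phantom type inside a compound type or a TList.
trait PhantomField[L <: String, V]

// A value-carrying field for HList-style records: the label L lives only
// in the type, while the value is stored in the case class instance.
case class Field[L <: String, V](value: V)

object FieldDemo {
  // An HList-style record, degenerated here to an ordinary List for brevity.
  val person = List(
    Field["name", String]("Mme Tortue"),
    Field["age", Int](123)
  )
}
```

The PhantomField carries no data at all, so the record values must live in some separate data structure; the value-carrying Field is the storage itself, which is what forces the HList approach to chain its fields into a linked structure.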

The following analysis will be limited to the above selection of type level representations and their characteristics.

6.4 Compilation Schemes for Subtyped Records

Regardless of programming language and type system, the record fields and their values must be stored in some underlying data structure on the target platform. The choice of this data structure naturally affects what operations can be performed on the records, what level of structural subtyping can be achieved, and at what runtime cost. In this section, the relation between the value level representation and the subtyping relation is investigated.

Although the term structural subtyping commonly refers to the combination of all three of permutation, width and depth subtyping, records in other languages as well as the existing approaches in Scala show that this need not be the case. scala-records and Compossible support all three, whereas Shapeless only has depth subtyping through the covariance of the elements. SML, on the other hand, only allows permutation but no width or depth subtyping [14].

To be thorough, all 2³ = 8 possible ways of combining the subtyping rules are investigated below. For each set of subtyping rules the following questions will be asked:

• What type level representation can be used to achieve this subtyping relation?

• What runtime performance is possible to achieve for field selection?

The naming scheme from the qualitative comparison will be used to denote the various combinations of the subtyping rules: P means permutation subtyping, W means width subtyping and D means depth subtyping. A plus after each letter means that it is supported, and a minus that it is not.

6.4.1 P−W−D±: No Permutation, No Width Subtyping

Example: Shapeless

Both refinement types and compound types provide automatic permutation subtyping, and so the only options for representing ordered records are the HList and TList approaches. To optionally restrict depth subtyping, the fields can be made invariant in the value type parameter. With ordered fields and no width subtyping a record can be viewed as a tuple whose indices are aliased with labels. The compiler always has complete knowledge of the fields a record contains, and it is possible to translate field access to direct indexing. Thus, the TList approach can store the values in an Array and get efficient constant time field access. For the HList, however, the direct indexing results in a linear time pointer chase to the corresponding element, as demonstrated by Shapeless.


6.4.2 P−W+D±: Width Subtyping for Ordered Fields

As noted in Section 2.2.1, it is possible to define width subtyping without field permutation as a slicing operation where a record type is a supertype of another record type if it is a prefix of the other. Shapeless records do not satisfy this subtyping relation, but it could presumably be achieved for both HList and TList based approaches by letting HCons (TCons) extend HNil (TNil). Depth subtyping is again controlled by the variance of the list elements. The indexing scheme described above is not affected by the subtype slicing, as the static type will always be a proper prefix of the dynamic type, starting from the 0th index.

6.4.3 P+W−D±: Unordered Records without Width Subtyping

Example: SML

In the absence of width subtyping the compiler again has complete static knowledge of the fields a record contains, and although the fields are unordered it is possible to achieve constant time field access. A naive approach would be to give every field in each record type an arbitrary index, store the values in this order in an array, and translate every field access to indexing into this array. The problem with this approach is separate compilation, as we cannot guarantee that the indexing will be the same across different compilation units. The solution is simple, however: introduce a canonical ordering of the fields, for example sorting them in alphabetical order. Then a particular field name will always have the same index in a given record type, and a record declared in one compilation unit can be safely accessed by field index in another without confusion [14].
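The canonical-ordering scheme can be sketched as follows (all names are ours): sorting the labels alphabetically yields the same array index for a given field in every compilation unit, so field access can be compiled to direct indexing.

```scala
object CanonicalOrdering {
  // Computed independently in each compilation unit; always identical for
  // a given record type, since the labels are sorted alphabetically.
  def layout(labels: Seq[String]): Map[String, Int] =
    labels.sorted.zipWithIndex.toMap
}

object CanonicalDemo {
  // Record {name = "Mme Tortue", age = 123}: sorted label order is age, name.
  private val idx = CanonicalOrdering.layout(Seq("name", "age"))
  val values = new Array[Any](idx.size)
  values(idx("age")) = 123
  values(idx("name")) = "Mme Tortue"

  // A field access r.name would be compiled to values(1) directly,
  // since the index is statically known from the sorted layout.
  def name: Any = values(1)
}
```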

Scala refinement types and compound types have width and depth subtyping by default. By putting these representations as phantom types in an invariant type parameter, however, the desired level of subtyping is achieved. It does not seem possible to restrict only the width subtyping this way: the depth subtyping restriction comes with the package, making the P+W−D+ combination infeasible for these representations.

Another possibility is to use a sorted TList in the type parameter. If the sorting is done automatically, all explicit permutations in client code will be represented by the same sorted list of fields in the background, making the records appear unordered while restricting width subtyping. Depth subtyping can be controlled by the variance of the elements as before. Note, though, that the user must be prevented from creating such types explicitly in a way that interferes with the sorting invariant. For example, an HList-based implementation must be complemented with other means of creating the records than using the HCons constructor. One possibility is to let record types be expressed using Shapeless' parsed path dependent types (see Section 4.5.4).

6.4.4 P+W+D±: Unordered Records with Width Subtyping

Example: scala-records, Compossible

Possible type level representations for this typing scheme are: the refinement types used by scala-records 0.3, the phantom refinement types used by scala-records 0.4, the phantom compound types in a type parameter used by Compossible, as well as the compound of field types suggested for future records in Dotty by Odersky [38]. Depth subtyping can be switched off for the compound type representations by making the Field representations invariant in the value type. The HList and TList approaches are not suitable as they are inherently ordered, and the presence of width subtyping makes the sorting approach unusable.

The combination of unordered fields and width subtyping complicates the choice of data structure considerably. Again, consider the getName function, here expressed using pseudo-code for records:

def getName(r: {name: String}) = r.name

Since any record containing a name field of type String can be passed to this function, it is in general unknown at what index name might be stored in the record at hand. A theoretical solution suggested by Cardelli [40] is to give every field a globally unique index and let every record be represented by a potentially very large and sparse value array capable of containing every field ever declared in the code-base. But besides breaking separate compilation, this is of course not practical from a memory perspective.

Giving up on achieving some kind of statically known indexing, there are two different approaches taken in the literature and practical implementations [14, 34]:

1. Resort to runtime searching for the field.

2. Pass in some extra information with the argument record.

But in the case of Scala there is also a third alternative:

3. Use some approach provided by the JVM platform¹

These options will be considered in turn below and a selection of approaches from eachcategory is then benchmarked in Section 6.5.

6.4.4.1 Option 1: Searching

Using common data structures, the following asymptotic performance of field lookup can be achieved:

• Unordered list or array with linear search: O(n)

• Sorted array with binary search, as suggested by Leijen [34]: O(log₂ n)

• Scala's immutable HashMap with effectively constant lookup time [35]: O(log₃₂ n)

Here a sorted array may be advantageous if it is somehow known when the static type matches the runtime type exactly; in that case one gets the constant time field access outlined in Section 6.4.3. Otherwise a HashMap seems to be the most attractive alternative.
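The sorted-array variant can be sketched as follows (a toy, not taken from any of the cited libraries): labels are kept alphabetically sorted and lookup is a binary search over the label array.

```scala
// Record values stored alongside an alphabetically sorted label array;
// lookup is O(log n) via binary search on the labels.
final class SortedRecord(labels: Array[String], values: Array[Any]) {
  def apply(label: String): Any = {
    var lo = 0
    var hi = labels.length - 1
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      val c = labels(mid).compareTo(label)
      if (c == 0) return values(mid)
      else if (c < 0) lo = mid + 1
      else hi = mid - 1
    }
    throw new NoSuchElementException(label)
  }
}

object SortedRecordDemo {
  // Labels must be passed in sorted order for the search to be valid.
  val r = new SortedRecord(Array("age", "name"), Array(123, "Mme Tortue"))
}
```

If the static type is known to match the runtime type exactly, the binary search can be replaced by a statically computed index, as in Section 6.4.3.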

6.4.4.2 Option 2: Information Passing

An approach inspired by how Golang solves its structural interface typing [41] is the following: let a record consist of a "fat pointer" containing

• a reference to an arbitrarily ordered array of values (the record data),

¹ Presumably using some strategy from option 1 or 2 under the hood.


• some unique id (for example a hash) identifying this particular field order (the "runtime type"), and

• a reference to an itable.

The itable is a mapping from field names to indices in the value array. It can be implemented as an array sorted on the field names in alphabetical order, exactly like the simple SML-style records described above. When a record subtype is cast to some structural supertype, an itable is created containing the supertype's fields in sorted order, mapping each field to its index in the value array of the record at hand. This itable creation can be done in linear time, and the itable can then be globally cached on the (runtime type, static type) pair. Thus, subsequent casts can be done in effectively constant time if the cache is implemented as a hash map. More interestingly, this approach allows constant time field access by simple array indexing: the itables can be accessed just like ordered records by the sorted field name index, and that index can then be used to access the desired value.
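A minimal sketch of this fat-pointer scheme (the names and shapes are ours, not from any implementation):

```scala
// A record is a reference to the value array plus an itable that maps the
// static type's sorted field order to indices in that array.
final case class FatRecord(values: Array[Any], itable: Array[Int]) {
  // Field access: two constant time indexing operations.
  def get(staticFieldIndex: Int): Any = values(itable(staticFieldIndex))
}

object Coerce {
  // Linear time itable construction at the (explicit) upcast; in a real
  // implementation the result would be cached per
  // (runtime type, static type) pair.
  def apply(values: Array[Any], runtimeLabels: Array[String],
            staticLabels: Array[String]): FatRecord =
    FatRecord(values, staticLabels.map(l => runtimeLabels.indexOf(l)))
}

object FatDemo {
  // Runtime record {age = 123, name = "Mme Tortue"}, coerced to the static
  // supertype {name: String}, whose only (sorted) field is "name".
  val r = Coerce(Array[Any](123, "Mme Tortue"),
                 Array("age", "name"), Array("name"))
}
```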

This approach also has similarities to the approach proposed by Ohori [14] and used in SML# to achieve constant time field access under parametric polymorphism, although in [14] the itables are represented by lambda abstractions containing the lookup indices.

The Problem: Variant Generics

The problem with this approach is that it requires a run-time operation at the point of the implicit up-cast from a subtype to some structural supertype. This in turn requires every up-cast operation to have some explicit point in the program where it happens, and this is where Scala's variant generics become a problem. Consider for example Scala's List data type, which is covariant in its type argument, and let A and B be two types where B is a subtype of A. Then it is possible to pass a list of type List[B] to a reference of type List[A] by means of an implicit upcast. But if the up-cast requires some run-time operation to be performed on each of the elements of the list, it is unclear where the compiler should insert them. Automatically mapping the coercion operation over collections would make a simple reference assignment a costly linear time operation [42], and could potentially be disastrous for infinite lazy streams.
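The covariance problem in miniature (a sketch with throwaway class names): the upcast below is a plain reference assignment, so there is no program point at which a per-element coercion could be inserted.

```scala
object VarianceDemo {
  class A
  class B extends A

  val bs: List[B] = List(new B, new B)
  // Implicit upcast: O(1), no element is touched. If A and B were record
  // types needing a representation change, the compiler would have no
  // obvious place to perform it short of mapping over the whole list.
  val as: List[A] = bs
}
```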

The Solution: Explicit Width Coercion (P+Wcoerced D±)

By requiring every upcast to be an explicit coercion operation, the above problem can be avoided altogether. That is, simply forbid casts from List[B] to List[A] if A and B are record types, and let the responsibility for iterating through collections fall on the programmer, making the coercion and the linear performance hit explicit. The benefit is potentially huge: a form of statically type-checked structural subtyping for records with constant time field access.

One could argue that this is essentially the "Unordered records without width subtyping (P+W−D±)" scheme from Section 6.4.3 all over again, as a coercion operator can be applied to change the type of those records as well. The difference lies in the efficiency of the cast: whereas a coercion for the approach using itables is a cacheable one-time cost, arbitrarily coercing the sorted records of Section 6.4.3 would be a linear time operation every time. The possible type level representations are the same, however: refinement types or compound types in an invariant type parameter to lock down width subtyping (and unfortunately also depth subtyping), or sorted HList or TList based approaches.


6.4.4.3 Option 3: Use the JVM

In Scala there is also the possibility of letting the JVM handle the underlying field selection logic by using classes for data storage and the various types of virtual, interface, reflective or dynamic calls for field access. This section covers some of these possibilities.

(Cached) Method Reflection

This is the approach used by Scala for general structural typing, covered in Section 4.1. The problem with this is poor performance for polymorphic and megamorphic call-sites, as shown in Section 5.2.1.4. It is unclear whether megamorphic call-sites are a real threat in practice, though, as the findings of Hölzle et al. [43] suggest that polymorphism degrees above 10 might be rare in real-life code.

(Cached) Field Reflection

Method reflection on the JVM requires a method name lookup that is also checked against the static types of the method parameters [32]. For storing and accessing record fields, however, reflection can be performed directly on the Java class fields instead. This lookup is potentially faster as it only involves name comparison.
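Field reflection can be sketched as follows (the class and field names are stand-ins): the lookup goes through java.lang.reflect.Field rather than Method, so only the name is compared.

```scala
import java.lang.reflect.{Field => JField}

object FieldReflectionDemo {
  // A class-backed record; Scala compiles the val into a private JVM field
  // named "name" plus an accessor method.
  class RecData(val name: String, val age: Int)

  val rec: AnyRef = new RecData("Mme Tortue", 123)

  // Lookup by field name only; unlike getMethod there is no check against
  // the static types of any parameters.
  val f: JField = rec.getClass.getDeclaredField("name")
  f.setAccessible(true) // the backing field is private
  val value: AnyRef = f.get(rec)
}
```

Caching the JField per (class, label) pair would give the "cached field reflection" variant, analogous to Scala's cached method reflection.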

One Interface per Field

This method is suggested in the "Add Record To Dotty" discussion post by Odersky [38]. For example, creating the record {name = "Mme Tortue", age = 123} could be translated into:

trait FieldName[T](val name: T)
trait FieldAge[T](val age: T)

class Rec$$anon$1(name: String, age: Int)
  extends FieldName[String](name)
  with FieldAge[Int](age)

new Rec$$anon$1("Mme Tortue", 123)

The problem with this approach is that it breaks separate compilation. The same interfaces may be generated in several different compilation units but will not be treated as the same by the JVM. The different instances of the "same" interface will either cause a name conflict, or get different symbols and cause the structural subtyping relation to break between records created in different compilation units. A possible solution is to postpone interface generation until runtime, as described next.

One Interface per Field (Generated at Runtime)

To admit separate compilation, the interfaces could instead be generated at runtime using some bytecode generation library. This kind of generative approach has some problems, however, as noted by Dubochet and Odersky [32]:

• Dependency on a bytecode generation framework.

• Needs access to the class-loader, which may not be permitted in e.g. web applications for security reasons.


One Interface per Record Type + Runtime Generated Wrappers

Whiteoak [6] solves the problem of separate compilation another way: instead of creating one interface per field-type pair, an interface is generated at compile-time for each declared structural type in the application. Then a wrapper class is created lazily at runtime for each (runtime class, interface) pair as needed. The wrapper class implements the interface and delegates all method calls to the wrapped class. The wrapper class is then cached so that it only has to be created once for each combination of runtime class and structural type. However, the initial wrapper class generation comes with a noticeable runtime cost and has the same dependency on a bytecode generation framework and class-loader access mentioned above [32].

Furthermore, if the wrapper classes are created at first assignment to a structural reference, this solution cannot support implicit structural subtyping, for the same reason as the solutions outlined under Option 2 above. Whiteoak instead generates the wrapper classes at first field access, but then the performance benefit is unclear in the case of records: if every field access requires a cache lookup in some data structure to fetch the wrapper class, that time could be spent accessing the field value from a similar data structure directly instead.

6.4.5 Summary

For each level of subtyping described above, the possible type level representations from Section 6.3 are summarized in Table 6.1.

For ordered records, it is clearly possible to achieve constant time field access by translating labels to array indices. This approach can also be used for unordered records by introducing a canonical ordering of the labels and storing the values in that order, provided that width subtyping is not allowed. The next step in subtyping flexibility is to allow a limited form of width subtyping, where every cast has to be an explicit coercion. Then it is possible to maintain a mapping from the current static type of the record to the dynamic type of the record, making field access possible in constant time by two indexing operations, whereas the cast itself is a linear time operation. The benefit of this approach over simply disallowing width subtyping completely, and instead allowing records to be projected field by field to arbitrary supertypes, is that the coercion is cacheable.

Once unrestricted permutation, width and depth structural subtyping is required, the above approaches cannot be used. Instead it seems as though the only alternatives are to rely on runtime searching, hashing or some native JVM mechanism.

The above overview merely provides the theoretical asymptotic runtime performance of the various compilation schemes, and to quantify the statements a further benchmark was carried out for a selection of interesting approaches. The benchmarked data structures are:

• Scala’s mutable ArrayBuffer

• Scala’s (linked) List

• Scala’s immutable HashMap

• One Scala trait per field

• Java field reflection

• Java method reflection

• Scala's cached method reflection (using the Anon. Refinements from Section 4.1)

The results for these data structures are presented in Section 6.5.


            Refinement  Phantom     Compound     Phantom      HList           Phantom TList
            types       refinement  types        compound
                        types                    types
A  P−W−D−   -           -           -            -            inv. fields     inv. fields
   P−W−D+   -           -           -            -            cov. fields     cov. fields
   P−W+D−   -           -           -            -            HCons <: HNil,  TCons <: TNil,
                                                              inv. fields     inv. fields
   P−W+D+   -           -           -            -            HCons <: HNil,  TCons <: TNil,
                                                              cov. fields     cov. fields
B  P+W−D−   -           inv. t.p.   -            inv. t.p.    auto sorted,    auto sorted,
                                                              inv. fields     inv. fields
   P+W−D+   -           -           -            -            auto sorted,    auto sorted,
                                                              cov. fields     cov. fields
C  P+W+D−   -           -           inv. fields  inv. fields  -               -
   P+W+D+   default     cov. t.p.,  cov. fields  cov. t.p.,   -               -
                        cov. fields              cov. fields

Table 6.1: Possible type level representations for each combination of subtyping rules. The column syntax is: Refinement types Rec{val f1: T1; ...}; Phantom refinement types Rec[{val f1: T1; ...}]; Compound types Rec with Field["f1", T1] with ...; Phantom compound types Rec[Field["f1", T1] with ...]; HList Field["f1", T1] :: ... :: HNil; Phantom TList Rec[Field["f1", T1] :: ... :: TNil]. In group A it is possible to achieve constant time field access by translating labels to indices. In B it is possible to achieve constant time field access by sorting the labels and storing the values in this order. In C permutation is combined with width subtyping, and the other methods from Section 6.4.4 have to be used. "inv." is short for invariant, "cov." is short for covariant, and "t.p." is short for type parameter.


The difference between field reflection and method reflection turned out to be small, and therefore cached field reflection was not benchmarked. The generative approach described by Gil and Maman [6] has been dropped in Whiteoak version 2.1, and the old version is no longer available for download. Although the Wreckage benchmarking library has support for benchmarking Whiteoak 2.1, the details of the new compilation strategy were not investigated further and the benchmarking results are not presented here. For the interested reader, the benchmarking results for Whiteoak 2.1 can instead be found in Appendix A.

6.5 Benchmarks of Possible Data Structures

6.5.1 Access Time against Record Size

In Fig. 6.1, field reflection is shown to be faster than method reflection but still significantly slower than any of the other approaches. A zoomed-in version of the same results for the faster approaches is found in Fig. 6.2. The list's access times grow linearly as expected. Cached method reflection, array indexing and interface calls are shown to take constant time across record sizes, with cached reflection taking about 3 times longer than the other two. It is also worth noting that in this benchmark interface calls are shown to have the same performance as case class field access. The result for the hash map agrees with the results for Compossible and Dotty Selectable in Fig. 5.3. Again, the hash lookup is shown to be slightly faster than cached reflection for all but a few accessed keys.

6.5.2 Access Time against Degree of Polymorphism

The execution times for accessing a field at a polymorphic call site are shown in Fig. 6.3 and Fig. 6.4. Again, both method and field reflection are orders of magnitude slower than any of the other approaches. To explain the large variance in access time for method reflection, the steady state access time measurements for each VM fork were included as a scatter plot. This reveals that steady state is detected at two different levels of JIT compiler optimisation for different VM forks, one much slower than the other.

For low degrees of polymorphism, cached method reflection is shown to be around 2 times slower than making an interface call, confirming the results of Dubochet and Odersky [32]. The execution times of Java interface calls grow linearly with the degree of polymorphism, however, and for polymorphism degree 32 the slow-down is down to a factor of 1.6. It is also worth noting that using a hash map is actually faster than making interface calls for polymorphism degrees higher than 10.

The array and hash map data structures are shown to have more or less constant access times across the varying degrees of polymorphism. The slightly faster execution time at lower degrees of polymorphism can possibly be explained by more successful JIT compiler optimizations.

The list data structure's linearly increasing access times can be explained by the fact that the accessed field's maximal index also grows with increasing degrees of polymorphism. To discern the effect of polymorphism itself the accessed index would have to be kept constant, but then the performance would be determined by the chosen index instead and not easily compared with the other approaches.


[Figure 6.1 — plot data omitted in this rendition; series: Method Reflection, Field Reflection, Case Class, Anon. Refinements, One trait per field, Array, List, HashMap; x-axis: Record Size, y-axis: Access time [ns]]

Figure 6.1: Access time against record size in number of integer fields for various data structures on the JVM. Measured as mean steady state execution time per access operation on a record with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with highest index was accessed. Plotted with 99 % confidence intervals. See Fig. 6.2 for a zoomed-in view of the faster approaches.

[Figure 6.2 — plot data omitted in this rendition; series: Case Class, Anon. Refinements, One trait per field, Array, List, HashMap; x-axis: Record Size, y-axis: Access time [ns]]

Figure 6.2: Zoomed-in view of Fig. 6.1, without the method reflection or field reflection approaches.


[Figure 6.3 — plot data omitted in this rendition; series: Method Reflection, Field Reflection, Case Class, Anon. Refinements, One trait per field, Array, List, HashMap; x-axis: Degree of Polymorphism, y-axis: Access time [ns]]

Figure 6.3: Record access time against degree of polymorphism for various data structures on the JVM. Measured as mean steady state execution time per field access (including array indexing) for polymorphism degree 1, 2, 4, 6, ... up to 32. Plotted with 99 % confidence intervals. See Fig. 6.4 for a zoomed-in view of the faster approaches.

[Figure 6.4 — plot data omitted in this rendition; series: Case Class, Anon. Refinements, One trait per field, Array, List, HashMap; x-axis: Degree of Polymorphism, y-axis: Access time [ns]]

Figure 6.4: Zoomed-in view of Fig. 6.3, without the method reflection or field reflection approaches.


Chapter 7

Discussion and Future Work

7.1 Subtyping and Field Access

For unordered records with width subtyping, the benchmarking results in Section 6.5 suggest that a hash map is the best choice of underlying data structure among the ones covered in this thesis. Cached reflection may provide faster field access if the call site is monomorphic, but the difference is small, and for polymorphic call sites the linear cache lookup time eventually outgrows the advantage. Perhaps surprisingly, hash lookup is also faster than accessing the fields through interface calls for high enough degrees of polymorphism.

To achieve even faster field access, one possibility is to restrict width subtyping to an explicit coercion operation. Then record values can be stored in an array and the compiler can translate field access to direct array indexing, as outlined in Section 6.4.4.2. The results in Fig. 6.2 and Fig. 6.4 suggest that this solution would be on par with native class field access and significantly faster than using a hash map. If it is acceptable to also restrict depth subtyping to coercion, this level of subtyping can be achieved in current versions of Scala by storing an unordered field representation in an invariant type parameter, see Table 6.1. The field representation can be the refinement types used by scala-records or some compound type of fields like the one Compossible uses. Another possibility is to let a phantom TList represent the fields in a covariant type parameter, but then this list must be automatically sorted to make record types appear unordered. All of these approaches currently require whitebox macros to be realized in practice, but accepting this fact it is also possible to provide type classes for additional record operations such as extension, restriction, update and relabeling through implicit materialization macros, as is done by Shapeless.

Dotty's new approach to structural types provides unordered record types with width and depth subtyping and makes it possible to implement records with a hash map as the underlying data structure without using whitebox macros. However, the supported operations are restricted to creation and field access, as the structural refinement types do not allow type-safe record extension, restriction, update or relabeling to be expressed at the type level. These operations are discussed next.


7.2 Type-level Operations

As shown for Compossible in Section 4.4.2, the with operator is not suitable for representing record extension when the extension is in fact an update that changes the type of an already existing field. The same problem was shown for Dotty's structural refinement types using the type intersection operator & in Section 4.6.4. In the presence of subtyping, the fact that the with and & operators do not overwrite the type of an updated field also prevents record update from being implemented in a sound way, as shown for Dotty in Section 4.6.5. In the monomorphic context where all existing fields are known it might be possible to implement record extension and record update as whitebox macros that make sure that the return types are correct. It is unclear how to use this approach in a polymorphic context, however, where only a subset of the present fields is known.

Shapeless’ implementation of record operations through type classes solves all of the above problems. The type classes allow record extension, restriction, update and relabeling to be performed in a consistent way, both in a monomorphic context where all fields are statically known and under parametric polymorphism. Furthermore, the Selector type class for field access makes polymorphic field access possible for record types that do not support the permutation and width subtyping required for bounded quantification to work, such as records with ordered fields or coercion-based width subtyping.

The type classes are encoded in Scala in the standard way, by letting evidence of type class membership be provided by an implicit parameter that implements the required functionality. Since the result type of an operation can be stored as a path-dependent type on the implicit instance, there is no need to express record extension using native type operations such as the with operator or type intersection. This presumably allows the type classes to be implemented for any record type representation.
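A minimal sketch of this encoding (illustrative names, not Shapeless’ actual definitions) might look as follows, with the result type carried as a path-dependent type member Out on the implicit evidence rather than computed with with or &:

```scala
// Hypothetical type class for record extension; the result type Out is a
// type member on the evidence, resolved by implicit search.
trait Extender[R, V] {
  type Out
  def apply(rec: R, value: V): Out
}

// Toy record representations standing in for real record types.
case class Name(name: String)
case class NameAge(name: String, age: Int)

// Evidence that a Name record extended with an Int yields a NameAge record.
implicit val extendNameWithAge: Extender[Name, Int] { type Out = NameAge } =
  new Extender[Name, Int] {
    type Out = NameAge
    def apply(rec: Name, value: Int): NameAge = NameAge(rec.name, value)
  }

// The caller never names the result type: it is e.Out, a path-dependent type.
def extend[R, V](rec: R, value: V)(implicit e: Extender[R, V]): e.Out =
  e.apply(rec, value)

val alice = extend(Name("Alice"), 42) // inferred type: NameAge
```

In Shapeless the evidence instances are materialized by macros for each record type; here a single instance is written by hand to keep the sketch self-contained.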

However, this approach currently relies on whitebox macros to be able to materialize the implicit evidence for each required record type. This will not be possible in Dotty, where whitebox macros are no longer supported. An interesting line of future work is therefore to investigate the possibility of adding native compiler support for this implicit materialization in Dotty. This could potentially improve compile times compared to existing macro-based approaches in Scala, but might also make certain runtime optimizations possible. For example, the implicits that are used solely as type carriers do not have to be instantiated after typing is finished and can presumably be erased from the compiled code. The Wreckage benchmarking library could be extended to help verify these optimizations.

7.3 Not One but Three Record Types to Rule Them All?

At the end of the day, it might be the case that different record operations are needed at different stages of a program, and that there is no need to provide a record type supporting all possible operations with the best possible performance at all times. Instead, different record type representations might be used depending on the situation:

• If fast record extension is needed but it does not matter if the record type is ordered, a linked list approach seems like the best option, as new fields can be added to the head of the list in constant time.

• For maximal flexibility, Scala’s structural types provide permutation, width and depth subtyping as well as parametric polymorphism with bounded quantification. A hash map can be used as the underlying data structure for acceptable field access times.

• If fast field access is the main criterion, an approach capable of translating field access to direct array indexing is preferable. With the right compiler support this is possible for ordered records, but also for unordered records if width subtyping can be restricted to explicit coercion.
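The linked-list option from the first bullet can be sketched as follows (a simplified illustration with runtime string labels, rather than an HList carrying labels at the type level): extension prepends in constant time, while lookup walks the list:

```scala
// Simplified linked-list record: each cell holds a label, a value, and a tail.
sealed trait RList
case object RNil extends RList
final case class RCons[V, T <: RList](label: String, value: V, tail: T)
  extends RList

// Record extension: O(1) prepend; a re-added label shadows older entries.
def extendRec[V, T <: RList](r: T, label: String, value: V): RCons[V, T] =
  RCons(label, value, r)

// Field access: linear walk from the head, returning the newest binding.
@annotation.tailrec
def lookup(r: RList, label: String): Any = r match {
  case RCons(l, v, _) if l == label => v
  case RCons(_, _, t)               => lookup(t, label)
  case RNil => throw new NoSuchElementException(label)
}

val r1 = extendRec(RNil, "name", "Alice")
val r2 = extendRec(r1, "age", 42)
```

The trade-off the bullet list describes is visible directly: extension is constant time, but field access degrades linearly with record size, unlike the array and hash-map representations.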

Shapeless currently provides an answer to the first scenario, although it remains to be seen how much of Shapeless can be ported to Dotty and what will instead be provided by standard libraries and native compiler support for HLists in the future.

The second scenario is covered by the new structural refinement types in Dotty, whereas the scala-records library seems like a viable approach with a similar feature set for current Scala.
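A hash-map-backed record in the spirit of this second scenario can be sketched as follows. Note that this is a plain-Scala stand-in: select is an ordinary method here so the sketch runs on any Scala version, whereas Dotty routes structural field selection through its Selectable mechanism:

```scala
// Sketch of a hash-map-backed record: field names map to values, so access
// is a hash lookup rather than direct indexing, but width and depth
// subtyping need no coercion since extra entries are simply ignored.
class MapRec(private val fields: Map[String, Any]) {
  def select(name: String): Any = fields(name)
}

val rec = new MapRec(Map("name" -> "Alice", "age" -> 42))
```

In Dotty, the refinement type would record which labels are present, and a field access like rec.age would be rewritten by the compiler into the equivalent of select("age").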

The last scenario remains open, however. The approach outlined in Section 6.4.4.2 can presumably be realized for current versions of Scala using whitebox macros, whereas it is unclear if a native solution for Dotty can be implemented without substantial changes to the language and the compiler.

7.4 Future work

Besides implementing native support for record type classes in Dotty, an interesting line of future work is to implement the approach outlined in Section 6.4.4.2 for current versions of Scala and provide efficient conversion methods between this implementation and, for example, Shapeless and scala-records. This would fill a gap in the design space of records for Scala and could provide valuable insight into how records are used in practice and which representation is preferable in which situation.

Another important continuation of this work is to extend the benchmarking suite with more possible approaches to structural typing on the JVM. In particular, the invokedynamic bytecode instruction introduced in Java 1.7 was not covered in this thesis due to limited time, but could potentially improve runtime performance over hash maps for unordered records with width subtyping.

Furthermore, the benchmarking suite would benefit from being extended with benchmarks of real-world use cases. Microbenchmarks should always be interpreted with a grain of salt, and it would be valuable to complement the results presented in this thesis with benchmarks where the record operations are used in context: for example, reading a stream of JSON data, manipulating it in some way and then serializing it to JSON again. Such a benchmark could also be used to investigate the potential benefit of switching between different record representations in different parts of the test program. Especially the difference in runtime performance between cached reflection and hash maps would be interesting to investigate further, as it is surprisingly small in the microbenchmarks presented in this thesis.

Lastly, this thesis did not cover possible Scala support for recursive record types, type inference for record types or pattern matching. Nor was the dual of records, called variants, covered.


Chapter 8

Related Work

8.1 Theoretical Foundations

Records have been extensively studied in programming language research, both in their own right and as a theoretical foundation for encoding object-oriented programming into pure lambda calculi.

Cardelli and Wegner [16] described the mechanism of bounded quantification over structurally subtyped records, and in [17] Wand introduced the notion of row variables to achieve record polymorphism in a setting without subtyping. The presented proof of complete type inference was later shown not to be correct [18], but the idea of using row variables prevailed: for example, OCaml uses a form of anonymous row variables to achieve object polymorphism, as described by Rémy and Vouillon [19].

Ohori [14] extended Standard ML with polymorphic records using a kind system that makes it possible to annotate type parameters with fields that must be present, similar to bounded quantification but without relying on subtyping.

Pierce [10] provides a thorough introduction to typed lambda calculus extended with record types. A lambda calculus is developed with records supporting both structural subtyping and parametric polymorphism through bounded quantification. Records with both ordered and unordered fields are treated, where the unordered records are achieved by defining field permutation as a subtyping rule. Pierce also considers the performance consequences of allowing record subtyping with unordered fields, and introduces a "coercion semantics" that inserts runtime coercions everywhere subtyping is used in a program. Knowing the exact order of the fields in a runtime record, a compilation scheme translating field access to direct array indexing is suggested. However, the consequences of combining this semantics with (co)variant collections are not treated.

Among the more recent work on records we find the extensible record calculus with scoped labels developed by Leijen [34]. This provides an interesting solution to the problem of unchecked extension by implementing each field as a stack of values that is pushed for extension and popped for restriction. Several possible implementation schemes are discussed.

8.2 Structural Types on the JVM

Whiteoak [6] is a language extension that brings structural typing to Java. Any conforming Java class can be cast to a structural type, and when a method is called on such a structurally typed reference a wrapper class is generated at runtime that implements the interface corresponding to the structural type and delegates all method calls to the runtime class of the receiver. A caching scheme is implemented to amortize the runtime penalty of class generation. In this way, efficient structural method dispatch is made possible on the nominally typed JVM.

In [32], Dubochet and Odersky describe their implementation of structural typing on the JVM used by the Scala compiler and compare it to Whiteoak’s approach. Scala uses reflection instead of bytecode generation to dispatch structural method calls to the right runtime class, and the method handles are cached inline at each call site for improved runtime performance. Although Whiteoak is found to be faster in scenarios where its global caching scheme works well, for example in very tight loops with a low degree of polymorphism, the difference is conjectured to be small in practice. Scala’s approach is all in all found to be a good alternative to Whiteoak’s, considering that the reflective technique is simpler to implement and maintain, does not incur a runtime dependency on a bytecode generation framework, and does not require access permissions to the class loader.

It should be noted that the compilation of structural types on the JVM as discussed by Gil and Maman [6] and Dubochet and Odersky [32] is a more general problem than the one considered in this thesis. There are several reasons why it is possible to devise more efficient compilation schemes for records than for general structural types: First, the records considered in this thesis are exclusively viewed as data containers, thus avoiding problems with method dispatch. Second, records are declared as such at the creation site and can be prepared for structural typing from scratch. Finally, the field selection problem is here simplified by considering a weaker form of subtyping using explicit coercion.


Chapter 9

Conclusions

The goal of this thesis was to answer the question:

What are the possible approaches to record types in Scala and what are their respective strengths and weaknesses?

To that end, existing and possible new approaches to records have been described and compared both qualitatively and quantitatively.

Six different existing approaches to records in Scala were described: scala-records 0.3, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2, as well as Scala’s native anonymous structural refinement types and Dotty’s new structural refinement types. The syntax and semantics of basic features such as record creation, field access, type safety and equality were investigated, as well as the support for structural subtyping and record polymorphism.

To complement the qualitative evaluation with quantitative benchmarks of runtime and compile-time performance, a novel benchmarking suite for records running on the JVM, called Wreckage, was presented. The benchmarking suite is built on top of the Java Microbenchmark Harness (JMH), a widely used and trusted microbenchmarking framework for the JVM.

Overall, the existing libraries were found to be in better shape than expected: Shapeless no longer suffers from the exponential compile times it used to, and contrary to what the documentation says, explicit types are now supported by scala-records 0.4. However, three common weaknesses were found among the investigated approaches: dependency on whitebox macros, suboptimal runtime performance compared to nominally typed classes, and poor support for record operations such as extension, restriction, update and relabeling. As current versions of Scala do support whitebox macros and the features currently provided by macros can be ported to native compiler support in Dotty in the future, the first point was not investigated further. Instead, the focus was put on finding ways to improve the second and third points.

Various possible compilation schemes for record types with different subtyping rules were described along with their possible type-level representation and potential runtime performance. Seven different possible approaches for storing and accessing record values on the JVM were then benchmarked using the presented Wreckage benchmarking suite: arrays, linked lists, hash maps, Scala classes with one trait per field, Java classes using field reflection, Java classes using method reflection and Scala structural types using cached method reflection.




To achieve field access times comparable to nominally typed classes, it is conjectured that width subtyping has to be restricted to explicit coercion, and a compilation scheme for such record types using an array as the underlying data structure was sketched. For unordered record types with width and depth subtyping, however, the hash map was found to have the most attractive runtime performance characteristics. For records using Dotty’s new structural refinement types, the hash-map-based implementation presented in Section 4.6 therefore seems like a good option.

Shapeless was found to provide a promising approach to type-safe extension, restriction, update and relabeling of records, using type classes and implicit resolution to guarantee the correctness of the operations. Provided that these type classes can be implemented in Dotty, either by some kind of macros or by native compiler support, the new structural types in Dotty might strike a good balance between flexibility and runtime performance for records in the future.


Bibliography

[1] Martin Odersky. What is Scala? https://www.scala-lang.org/what-is-scala.html. [Online; accessed 22-June-2017].

[2] Martin Odersky, Vincent Cremet, Christine Röckl, and Matthias Zenger. A nominal theory of objects with dependent types. In 17th European Conference on Object-Oriented Programming (ECOOP ’03), pages 201–224, 2003.

[3] Nada Amin, Samuel Grütter, Martin Odersky, Tiark Rompf, and Sandro Stucki. The essence of dependent object types. In A List of Successes That Can Change the World, pages 249–272. Springer, 2016.

[4] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. scala-records 0.4. https://github.com/scala-records/scala-records/tree/v0.4. [Online; accessed 4-June-2017].

[5] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. Why Scala records with structural types and macros? https://github.com/scala-records/scala-records/wiki/Why-Scala-Records-with-Structural-Types-and-Macros%3F, 2015. [Online; accessed 22-May-2017].

[6] Joseph Gil and Itay Maman. Whiteoak: Introducing structural typing into Java. In Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’08), pages 73–90, 2008.

[7] Rob Norris. Issue #486 compilation time for record access depends on value types.https://github.com/milessabin/shapeless/issues/486, 2015. [Online; accessed 22-June-2017].

[8] Rob Norris. Why no one uses Scala’s structural typing. http://www.draconianoverlord.com/2011/10/04/why-no-one-uses-scala-structural-typing.html, 2011. [Online; accessed 22-June-2017].

[9] Ward Van Heddeghem, Sofie Lambert, Bart Lannoo, Didier Colle, Mario Pickavet, and Piet Demeester. Trends in worldwide ICT electricity consumption from 2007 to 2012. Computer Communications, 50:64–76, 2014.

[10] Benjamin C Pierce. Types and programming languages. MIT press, 2002.

[11] Miran Lipovaca. Learn You a Haskell for Great Good!: A Beginner’s Guide. No Starch Press, 2011.




[12] Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml: Functional programming for the masses. O’Reilly Media, Inc., 2013.

[13] Don Syme, Anar Alimov, Keith Battocchi, Jomo Fisher, Michael Hale, Jack Hu, Luke Hoban, Tao Liu, Dmitry Lomov, James Margetson, Brian McNamara, Joe Pamer, Penny Orwick, Daniel Quirk, Kevin Ransom, Chris Smith, Matteo Taveggia, Donna Malayeri, Wonseok Chae, Uladzimir Matsveyeu, Lincoln Atkinson, et al. The F# 3.1 language specification. fsharp.org, January 2016.

[14] Atsushi Ohori. A polymorphic record calculus and its compilation. ACM Transactions on Programming Languages and Systems (TOPLAS), 17(6):844–895, 1995.

[15] Martin Odersky, Philippe Altherr, Vincent Cremet, Gilles Dubochet, Burak Emir, Philipp Haller, Stéphane Micheloud, Nikolay Mihaylov, Adriaan Moors, Lukas Rytz, Michel Schinz, Erik Stenman, and Matthias Zenger. Scala 2.11 language specification. scala-lang.org, March 2006.

[16] Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys (CSUR), 17(4):471–522, 1985.

[17] Mitchell Wand. Complete type inference for simple objects. In Proceedings of the Symposium on Logic in Computer Science (LICS ’87), pages 37–44, 1987.

[18] Mitchell Wand. Corrigendum: Complete type inference for simple objects. In Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS ’88), page 132, 1988.

[19] Didier Rémy and Jérôme Vouillon. Objective ML: An effective object-oriented extension to ML. Theory and Practice of Object Systems (TAPOS), 4(1):27–50, 1998.

[20] Dotty documentation 0.1.1. http://dotty.epfl.ch/docs/. [Online; accessed 3-June-2017].

[21] George Leontiev, Eugene Burmako, Jason Zaugg, Adriaan Moors, Paul Phillips, and Oron Port. SIP-23 - Literal-based singleton types. http://docs.scala-lang.org/sips/pending/42.type.html. [Online; accessed 29-May-2017].

[22] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. scala-records 0.3. https://github.com/scala-records/scala-records/tree/v0.3. [Online; accessed 4-June-2017].

[23] Jan Christopher Vogt. Compossible. https://github.com/cvogt/compossible. [Online; accessed 15-May-2017].

[24] Miles Sabin et al. Shapeless. https://github.com/milessabin/shapeless. [Online; accessed 15-May-2017].

[25] Martin Odersky. Rethink structural types #1886. https://github.com/lampepfl/dotty/issues/1886, 2017. [Online; accessed 22-May-2017].

[26] Olof Karlsson. The Wreckage Records Benchmarking Library. https://github.com/obkson/wreckage.



[27] Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. ACM SIGPLAN Notices, 42(10):57–76, 2007.

[28] Vojtech Horky, Peter Libic, Antonin Steinhauser, and Petr Tuma. DOs and DON’Ts of conducting performance measurements in Java. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, pages 337–340. ACM, 2015.

[29] Oracle Corporation. Java Microbenchmark Harness (JMH). http://openjdk.java.net/projects/code-tools/jmh/, 2017. [Online; accessed 16-May-2017].

[30] Petr Stefan, Vojtech Horky, Lubomir Bulej, and Petr Tuma. Unit testing performance in Java projects: Are we there yet? In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering, pages 401–412. ACM, 2017.

[31] Vojin Jovanovic. scala-records-benchmarks. https://github.com/scala-records/scala-records-benchmarks. [Online; accessed 25-May-2017].

[32] Gilles Dubochet and Martin Odersky. Compiling structural types on the JVM: a comparison of reflective and generative techniques from Scala’s perspective. In Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems (ICOOOLPS ’09), pages 34–41, 2009.

[33] Gilles Dubochet. Scala git commit cea527a9dc7cfef933ed911b8196858f412827b2. https://github.com/scala/scala/commit/cea527a9dc7cfef933ed911b8196858f412827b2, 2007. [Online; accessed 14-May-2017].

[34] Daan Leijen. Extensible records with scoped labels. In Proceedings of the 2005 Symposium on Trends in Functional Programming (TFP ’05), pages 297–312, 2005.

[35] Scala collections performance characteristics. http://docs.scala-lang.org/overviews/collections/performance-characteristics.html. [Online; accessed 8-June-2017].

[36] Miles Sabin. Shapeless git commit d4c3c71933e8c4ab6bc1fcde17e92961f9c0f897. https://github.com/milessabin/shapeless/commit/d4c3c71933e8c4ab6bc1fcde17e92961f9c0f897, 2015. [Online; accessed 25-May-2017].

[37] Martin Odersky. Scala, the road ahead. https://www.slideshare.net/Odersky/scala-days-nyc-2016, 2016. Scala Days NYC 2016.

[38] Martin Odersky. Add records to Dotty #964. https://github.com/lampepfl/dotty/issues/964, 2015. [Online; accessed 22-May-2017].

[39] Jesper Nordenberg. Type lists and heterogeneously typed arrays. http://jnordenberg.blogspot.se/2009/09/type-lists-and-heterogeneously-typed.html, 2009. [Online; accessed 22-May-2017].

[40] Luca Cardelli. Extensible records in a pure calculus of subtyping. In Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming, pages 373–425. MIT Press, 1994.



[41] Siarhei Matsiukevich. Golang internals, part 2: Diving into the Go compiler. https://blog.altoros.com/golang-internals-part-2-diving-into-the-go-compiler.html, 2015.

[42] Scott McKinney. Structural types in Gosu. https://gosu-lang.github.io/2014/04/22/structural-types-in-gosu.html, 2014. [Online; accessed 2-May-2017].

[43] Urs Hölzle, Craig Chambers, and David Ungar. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. In European Conference on Object-Oriented Programming (ECOOP ’91), pages 21–38. Springer, 1991.


Appendix A

Whiteoak 2.1 Benchmarks

In Fig. A.1 and Fig. A.2, Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala’s cached reflection for anonymous refinement types. Note that this version of Whiteoak does not employ the generative technique described by Gil and Maman [6] and is therefore not included in the comparison in Chapter 6. It is instead supplied here for reference, without further investigation.

[Figure A.1 appears here: access time [ns] plotted against record size, comparing Whiteoak 2.1, Java method reflection, Java field reflection and Scala’s anonymous refinements.]

Figure A.1: Record access time against record size in number of integer fields. Measured as mean steady state execution time per access operation on records with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with the highest index was accessed. Plotted with 99.9% confidence intervals. Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala’s cached reflection for anonymous refinement types.




[Figure A.2 appears here: access time [ns] plotted against degree of polymorphism, comparing Whiteoak 2.1, Java method reflection, Java field reflection and Scala’s anonymous refinements.]

Figure A.2: Record access time against degree of polymorphism on an array of different records with 32 integer fields. Measured as mean steady state execution time per field access (including array indexing) and plotted with 99.9% confidence intervals. Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala’s cached reflection for anonymous refinement types.

