RATIONAL DESIGN OF POLYMER DIELECTRICS VIA FIRST PRINCIPLES COMPUTATIONS AND MACHINE LEARNING

Arun Kumar Mannodi-Kanakkithodi

Materials Science and Engineering Department

Institute of Materials Science

University of Connecticut, Storrs, USA

Principal Adviser: Dr. Rampi Ramprasad

Associate Adviser: Dr. Yang Cao

Associate Adviser: Dr. George Rossetti

ABSTRACT

Polymers are commonly used as dielectric materials in capacitors owing to their easy processability and tendency to display ‘graceful failure’ at high electric fields. In capacitors tailored for high energy density applications, Biaxially Oriented Polypropylene (BOPP) is the current state-of-the-art polymer dielectric. Although BOPP shows low dielectric losses and high breakdown strengths, it has a rather low dielectric constant, which adversely affects the energy density. The prevalent electrification of transportation as well as military and civilian systems has increased the demand for high energy density capacitors, and the need of the hour is to significantly expand the pool of polymer choices that can serve as dielectrics. Towards this aim, a first principles computational approach is applied in combination with machine learning techniques to study selected chemical subspaces of organic and organometallic polymers. Density Functional Theory (DFT) is used to obtain stable crystalline arrangements of the polymers along with their two key properties: the dielectric constant and the band gap. Promising candidates are selected from regions where both property values are high, and these systems are recommended for synthesis and taken up for in-depth computational studies. All the computational data thus generated is collected in a comprehensive database that enables easy lookup of specific polymers and their properties. Further, the data is mined to unearth hidden polymer design rules and to develop machine learning models that accelerate prediction and design. This happens via an intermediate polymer ‘fingerprinting’ step, with qualitative and quantitative correlations made between the fingerprint components and the properties. ‘On-demand’ property prediction and polymer design models are thus trained, tested and validated for the selected polymer chemical space. The successes highlighted with the polymer dielectrics problem make this general materials design philosophy suitable for application to many other fields in materials science.

1. INTRODUCTION

1.1 Background

Dielectric materials find wide applicability owing to their ability to polarize under applied electric fields. One such application is in capacitors, which are useful in electronic devices for energy storage, in pulsed power applications, or as temporary batteries, for instance in car audio and stereo systems. The pervasive utility of capacitors comes from the fact that not only can they store large amounts of electrical energy, they can also discharge it all in a single flash.

In any capacitor, the amount of energy that can be stored depends on the dielectric and electronic characteristics of the dielectric interface between the metal plates (Figure 1a). In a linear capacitor, the charge stored (or the polarization) is directly and linearly proportional to the voltage, resulting in a constant capacitance. However, unless the dielectric interface is vacuum, there is always a nonlinear dependence of the polarization on the voltage (or the electric field), as shown in Figure 1b. The area under the curve between the polarization and the applied electric field, known as the D-E loop, yields the energy stored in the capacitor. This translates into a relationship between the energy on one side and the electric field and dielectric constant on the other, as the dielectric constant is given by the change in polarization with applied electric field. That is,

$$\frac{dP}{dE} = \varepsilon$$

$$U = \int_{0}^{E_b} E \, dP = \int_{0}^{E_b} \varepsilon E \, dE$$

where dP is the change in polarization induced by the applied electric field E, ε is the dielectric constant, E_b is the breakdown electric field and U gives the energy density of the capacitor. For a linear capacitor, this converts into the following simple equation:

$$U = \tfrac{1}{2} \varepsilon E_b^{2}$$

Figure 1. (a) A simple capacitor containing a dielectric in between conductive plates, with area A and thickness d marked. (b) A standard polarization curve for a nonlinear dielectric [1].

While the relationship is not quite as simple for nonlinear dielectrics, the energy storage is still dependent on ε and Eb, and increasing both quantities will help increase the energy density of the capacitor. Thus, the optimal dielectric for high energy density applications would have a high dielectric constant and would break down at high electric fields. Many kinds of ceramics have traditionally served as capacitor dielectrics, for example lead zirconate and lead titanate. However, polymers are particularly attractive dielectric material options due to their easy processability, flexibility, high resistance to external chemical attacks, and graceful failure [1]. Many organic polymers such as Polyvinylidene fluoride (PVDF), Polypropylene (PP) and Polyethylene terephthalate (PET) have been used as dielectrics in a variety of energy and electronics related applications [2-4]. The key properties of consideration in dielectric polymers are not only the dielectric constant and the dielectric breakdown strength, but also the dielectric loss and mechanical properties, among others.

The current state-of-the-art polymer dielectric is Biaxially Oriented Polypropylene (BOPP), which has a modest dielectric constant of 2.25 but a very high dielectric breakdown strength of 750 MV/m for a small area of ~1 cm². This leads to an energy density of ~6 J/cm³. However, BOPP has quite a few limitations: while it can function at high electric fields, its low dielectric constant restricts the energy density. BOPP also suffers from significant dielectric losses due to electronic conduction at higher temperatures [3,5,6]. Thus, there have been experimental as well as computational efforts to improve upon BOPP as the dielectric polymer of choice.
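As a quick check of the BOPP numbers quoted above, the linear-capacitor expression can be evaluated directly. This is a minimal sketch that assumes the dielectric constant in the text is the relative permittivity, so that the SI form of the energy density carries an explicit factor of the vacuum permittivity; the function name is illustrative only.

```python
# Vacuum permittivity in F/m. Assumption: the dielectric constant quoted in the
# text is relative, so the SI energy density is U = 1/2 * eps0 * eps_r * Eb^2.
EPSILON_0 = 8.8541878128e-12


def linear_energy_density(relative_permittivity, breakdown_field_v_per_m):
    """Energy density of a linear dielectric at breakdown, in J/cm^3."""
    joules_per_m3 = (
        0.5 * EPSILON_0 * relative_permittivity * breakdown_field_v_per_m ** 2
    )
    return joules_per_m3 / 1.0e6  # 1 m^3 = 1e6 cm^3


# BOPP values quoted in the text: dielectric constant 2.25, breakdown at 750 MV/m.
bopp_energy_density = linear_energy_density(2.25, 750e6)
print(f"BOPP: {bopp_energy_density:.1f} J/cm^3")  # prints "BOPP: 5.6 J/cm^3"
```

The result of about 5.6 J/cm³ is consistent with the ~6 J/cm³ stated above. Because the energy density scales linearly with the dielectric constant and quadratically with the breakdown field, both quantities are worth increasing.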

Much of the work in this regard has focused on Polyvinylidene fluoride (PVDF) and related modifications. BOPP and most of the other dielectric polymer candidates are nonpolar polymers, in which atomic and electronic polarizations alone cannot contribute sufficiently to increasing the dielectric constant. PVDF was thus pursued, given that its orientational polarization and high dipole density could be exploited for high energy densities. Defect-modified PVDF, PVDF-HFP and PVDF-CTFE, as well as PVDF with inorganic fillers added to the matrix, have been studied and recommended as polymer dielectrics, with high dielectric constants of ~10 and energy densities of 30 J/cm³ achieved [7-11]. However, a major problem with PVDF and its derivatives is their ferroelectric nature, which results in a hysteretic D-E loop. This causes heavy energy losses compared to a paraelectric material, and makes the polymer unsuitable as a dielectric for energy storage.

1.2 Need for new dielectric polymers and opportunities

Polymer dielectrics for modern power electronics applications require not only high energy densities, but also high temperature capabilities and miniaturization, without a large increase in cost. Moreover, there has been a massive increase in the demand for high energy density capacitors owing to the electrification of land and sea transportation, as well as other military and civilian systems. Each of the current choices for polymer dielectric applications suffers from one shortcoming or another. The need of the hour is a much larger pool of such polymer choices, so that weaknesses can be eliminated and prospects improved [12].

Given the substantial known and unknown polymer chemical universe, the greatest challenge is the difficulty of experimentally considering a large set of polymers; the synthesis and property measurement of polymers in a case-by-case manner, leading (one hopes) to the eventual identification of desirable systems, is a very involved and expensive process. Computations come to the rescue here: it is much faster to study a large number of materials on a computer, and apply initial screening criteria to down-select polymers that can then be studied experimentally. Therefore, computations, when combined with experiments in a rational manner, can result in the quick and efficient design of new and improved polymer dielectrics [13].

Once a polymer chemical subspace has been identified, property data can be generated for a selected set of polymers using some flavor of computational materials science, such as first principles computations. Such a database will provide the following opportunities: a) Down-selection of promising dielectric polymer candidates for laboratory synthesis and electrical testing, b) In-depth computational study of synthesized polymers for further validation and expansion to more chemical spaces, and c) Extracting underlying chemical rules and machine learning models out of the data in order to further accelerate the prediction of properties and design of polymers.

Whereas steps (a) and (b) constitute a rational co-design approach to discovery of new materials, step (c) assumes great importance in terms of appropriately utilizing the data and conserving resources by not having to perform unnecessary future computations. This strategy of learning from past data would negate the need for computational consideration of all materials in a case-by-case manner, and in the case of dielectric polymers, provide the means to go from any polymer to its properties as well as any target properties to the appropriate polymer(s) in an on-demand fashion. Further, this results in the generation of a useful ‘polymer database’ that can be queried for information on polymers.

2. MY PROPOSAL

2.1 Chemical space of polymers

The first step towards the exploration of a large number of polymers as possible dielectric candidates is to identify a specific chemical space that the polymers will be chosen from. Figure 2 gives a glimpse of the immense expanse of such possibilities, with a variety of chemical units and building blocks, both organic and organometallic, well studied and rarely explored, shown as likely constituents of polymers. This includes simple linear polymers such as Polyethylene and Polypropylene, polymers containing aromatic units such as Kevlar and Polythiophene, and polymers containing metal-based units such as Sn-halide and Ti-ester, among others.

There are thousands of polymer possibilities such as these; as explained earlier, experimental consideration of this entire space is impractical. While this space is too vast to study exhaustively even by computation, limited chemical subspaces can be selected on which to perform controlled, automated first principles computations and generate useful polymer property databases. Starting with purely organic polymers, I consider a chemical space where polymers are constructed by simple linear combinations of the following building blocks: CH2, CO, NH, C6H4, C4H2S, CS and O units, as shown in Figure 3a. The rationale for selecting these particular chemical units comes from their ubiquity in much of the well-known organic polymer chemical space today, as well as the ease of computations given the simple, well-defined atoms and bonds they contain. Any polymer in this chemical space containing n blocks in its repeating unit is referred to as an ‘n-block polymer’ [14].

Figure 2. The vast chemical space of polymer building blocks.

Next, I consider organometallic polymers: the rationale here comes from a study of the compounds of Group 14 elements [15,16] as well as from a study of single chain polymers containing Group 14 element based units [17], both of which revealed Sn and Ge as attractive metal atoms to be introduced in the polymer backbone. I take a chemical space of Sn-containing polymers, in which Sn-ester based units are incorporated in otherwise organic polymer chains (like PE), and perform first principles computations to explore their relevant properties. Figure 3b gives a glimpse of these polymers; a variety of such systems are considered, with different numbers of -CH2- units joining one Sn-ester unit to the next [18,19]. Further, Sn is replaced in these polymers by other metal atoms such as Ti, Zr, Cd and Zn in order to study many more metal-containing polymer dielectrics.

Figure 3. (a) The pool of organic building blocks. Also shown is a typical 4-block polymer, containing linearly connected blocks B1, B2, B3 and B4 in its repeating unit. (b) Sn-ester units introduced in a PE chain to form organo-Sn polymer chains.

2.2 Property Space of Polymers

With regard to polymer dielectric applications, the properties of interest include, but are not limited to, the following: dielectric constant, dielectric breakdown field, electronic bandgap, dielectric loss, morphology, glass transition temperature and mechanical strength. However, in my work, I restrict myself to using Density Functional Theory (DFT) as the means to computationally study dielectric polymers, and DFT only permits the reliable estimation of two of these properties: the dielectric constant and the bandgap. As explained in Section 1.1, high energy density capacitors need polymers that show high dielectric constants and large dielectric breakdown strengths. The bandgap provides a good theoretical substitute for the dielectric breakdown field strength since a higher bandgap would imply a higher threshold for impact ionization. A polymer that breaks down at a high electric field would be expected to have a large bandgap, and although a large bandgap does not necessarily imply a large breakdown strength, it can still serve the preliminary screening purpose along with the dielectric constant [20,21]. ‘High dielectric constant and large bandgap’ can thus be used as effective initial screening criteria for new and promising dielectric polymers before experiments (that can yield breakdown and energy density estimates) and further in-depth computations are done.
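The screening criterion described above amounts to a simple filter over computed records. The sketch below is only an illustration: the record type, the threshold values and the three entries are hypothetical placeholders, not results from this work.

```python
from dataclasses import dataclass


@dataclass
class PolymerRecord:
    """One database entry; the field names are illustrative assumptions."""
    name: str
    dielectric_constant: float
    bandgap_ev: float


def screen(records, min_dielectric_constant, min_bandgap_ev):
    """Keep only records that meet BOTH screening thresholds."""
    return [
        record
        for record in records
        if record.dielectric_constant >= min_dielectric_constant
        and record.bandgap_ev >= min_bandgap_ev
    ]


# Placeholder entries with made-up numbers, used only to exercise the filter.
candidates = [
    PolymerRecord("polymer_A", dielectric_constant=3.0, bandgap_ev=7.0),
    PolymerRecord("polymer_B", dielectric_constant=5.5, bandgap_ev=5.5),
    PolymerRecord("polymer_C", dielectric_constant=6.5, bandgap_ev=2.0),
]

# Hypothetical thresholds; suitable values depend on the application.
selected = screen(candidates, min_dielectric_constant=4.0, min_bandgap_ev=3.0)
print([record.name for record in selected])  # prints ['polymer_B']
```

Requiring both thresholds at once reflects the point made above: a large bandgap alone, or a high dielectric constant alone, is not sufficient for a promising candidate.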

2.3 Specific Objectives and Goals

A. Data generation using high-throughput Density Functional Theory (DFT) computations.

Accurate DFT estimation of polymer properties involves first determining the stable polymer crystal structures, and then subjecting the most stable structure to DFT property calculations. The computations are performed in a ‘high-throughput’ manner [12,22-24], that is, on a substantial number of polymer systems, and in an automated fashion at a uniform level of theory. The output of this stage is a database of the DFT computed dielectric constants and bandgaps of organic polymers and organometallic polymers, as shown in Figure 4.

Figure 4. Generation of dielectric polymer database using DFT.

B. Learning from data: extract guidelines and design rules.

The data that has been generated can be used with machine learning techniques to bring the essential underlying rules of the chemical space to light. A look at the property regime of interest (high values of dielectric constant and bandgap) will show what kinds of polymers, or what kinds of constituent chemical blocks in polymers, give rise to desirable combinations of properties. Further, factors such as chemical coordination environments and basic bond/atom characteristics can affect the properties. I will convert all the polymers into numerical fingerprints based on all these factors, and find correlations between the numbers and the properties (this is depicted in Figure 5). General design rules can then be devised for desirable dielectric polymers, providing guidance for experimentation and future studies based on the features.

Figure 5. Correlating important features of polymers to the properties.

C. Rapid prediction and design algorithms: going beyond the chemical subspace.

I will attempt a mapping from the polymer fingerprint to the properties using the machine learning technique of regression, in order to train a property prediction model. After testing and validation, such a model allows the on-demand prediction of dielectric constants and bandgaps of polymers that did not belong to the initial database. I will further combine a genetic algorithm with the prediction model to efficiently search for polymers showing desirable properties, thus developing an on-demand dielectric polymer design model. These steps have been shown in Figure 6.

Figure 6. Development of property prediction and polymer design models.

D. Streamline guidance to synthesis by creating design tools, and ascertain iterative feedback between experiments and computations.

Based on the above three objectives, I would have the following kinds of ‘tools’: a) a polymer database that enables looking up of properties, and selection of promising cases, b) a set of design rules that tells us what combinations of chemical blocks should be polymerized to obtain desirable properties, c) an on-demand property prediction model that requires just the identity of any new n-block polymer to give as output its properties, and d) an on-demand polymer design model that, given a set of property requirements, outputs the specific n-block polymer(s) showing those properties. Each of these tools provides a certain utility to the experimentalist, and makes it much simpler to decide on the next dielectric polymer candidate to be synthesized and tested.

Further, an extremely valuable iterative feedback loop can be established between experiments and computations [21]. It may happen that a polymer recommended by a design tool is not exactly amenable to synthesis, but a slight variation of it may be; at this point, the properties of the new polymer can be obtained immediately from the on-demand prediction model. Moreover, polymers chosen for synthesis can also be studied further computationally, in a much more detailed manner than before. This enables delving deeper into the systems than was possible with the earlier high-throughput DFT computations and the statistical models, which have inherent theoretical limitations.

TIMELINE

The following timeline captures the current status and future work related to each of the objectives-

3. METHODS

The methods I will use as part of my thesis can be divided into three broad steps with respect to a general computational materials design philosophy, as shown in Figure 7. This includes the data generation step, the instant property prediction (using machine learning or statistical learning) step, and the direct design of desirable materials (using genetic algorithm) step.

Figure 7. Schematic showing the data generation, property prediction (using regression) and direct design (using genetic algorithm) steps towards materials design.

3.1 High-throughput computations

In order to perform DFT estimation of polymer properties, it is essential to first determine their stable crystal structural conformations. Given that we do not possess structure information for the largely novel (even hypothetical) polymers that constitute the selected chemical subspace, it is important to energetically examine different configurations of polymer chain arrangement. I use a structure prediction algorithm called the Minima Hopping Method (MHM) [25,26] to explore many possible crystal structures, with the necessary total potential energies and atomic forces determined using DFT.

The most stable polymer crystal structure thus obtained undergoes tight DFT relaxation, following which the properties are computed. The relaxed structure goes as input into a density functional perturbation theory (DFPT) [27] calculation, which yields the dielectric constant tensor that includes the electronic component [28] as well as the ionic (lattice) component [29]. Further, the Heyd-Scuseria-Ernzerhof (HSE) [30] functional is used to obtain the bandgap values, which are known to be accurate with respect to experimental measurements. The application of MHM and DFT on a large number of systems in an automated fashion makes this a ‘high-throughput’ treatment, and this enables the crucial first step towards computational design of dielectric polymers, as shown in Figure 7.

3.2 Machine Learning: Kernel Ridge Regression

Kernel Ridge Regression (KRR) is used to map the fingerprints to the properties and thus train a prediction model, the second step shown in Figure 7. KRR is a widely used similarity-based prediction technique [31-33] in which the Euclidean distances between fingerprints are used to compute a distance kernel; in this work, a Gaussian kernel will be used. The property is expressed as a weighted sum of the Gaussians. The parameters that go into the training of such a model are the Gaussian width parameter, the regularization parameter (which guards against overfitting), and the coefficients of the Gaussians, each of which is varied in a systematic manner so as to achieve maximum closeness between the weighted sum of the kernels and the property. The DFT generated data will be divided into two sets, the training set and the test set. The training set is used to train the KRR model and thus arrive at the prediction model with the minimum error in property prediction. The best models thus obtained are used to predict the properties of the test set in order to evaluate their true out-of-sample performance. To ensure the best possible training in an unbiased manner, a cross-validation technique is used, where the training set is divided into two sets and one set is used for preliminary training with validation done on the other.
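The KRR procedure described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: for a fixed width and regularization the coefficients are obtained in closed form by solving (K + λI)α = y, and the two hyperparameters are chosen by a simple grid search against a held-out validation split. All function names and the toy numbers are hypothetical.

```python
import math


def gaussian_kernel(x, y, width):
    """Gaussian kernel built from the squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def krr_fit(fingerprints, targets, width, regularization):
    """Return the kernel coefficients alpha solving (K + lambda*I) alpha = y."""
    n = len(fingerprints)
    kernel = [
        [gaussian_kernel(fingerprints[i], fingerprints[j], width) for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        kernel[i][i] += regularization
    return solve_linear_system(kernel, targets)


def krr_predict(x, fingerprints, coefficients, width):
    """Predict a property as a weighted sum of Gaussians centred on training points."""
    return sum(
        c * gaussian_kernel(x, f, width) for c, f in zip(coefficients, fingerprints)
    )


def select_hyperparameters(train_x, train_y, valid_x, valid_y, widths, regs):
    """Grid search returning (validation RMSE, width, regularization) of the best pair."""
    best = None
    for width in widths:
        for reg in regs:
            coeffs = krr_fit(train_x, train_y, width, reg)
            sq_errors = [
                (krr_predict(x, train_x, coeffs, width) - y) ** 2
                for x, y in zip(valid_x, valid_y)
            ]
            rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
            if best is None or rmse < best[0]:
                best = (rmse, width, reg)
    return best


# Toy one-dimensional "fingerprints" and targets (placeholder numbers only).
train_x = [[0.0], [0.5], [1.0]]
train_y = [1.0, 2.0, 4.0]
coefficients = krr_fit(train_x, train_y, width=0.5, regularization=1e-8)
prediction = krr_predict([0.5], train_x, coefficients, width=0.5)
best_rmse, best_width, best_reg = select_hyperparameters(
    train_x, train_y, [[0.25]], [1.4], widths=[0.25, 0.5, 1.0], regs=[1e-8, 1e-2]
)
```

With a very small regularization the model nearly interpolates the training points, which is why the regularization parameter matters for out-of-sample performance. A production workflow would more likely use an established library and a k-fold split, but the structure of the search is the same.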

3.3 Direct Design: Genetic Algorithm

A genetic algorithm (GA) approach is applied as the means to optimize polymers, given some target properties. It has been shown that GA is a very efficient approach in searching for materials with desired properties when compared to other approaches like random search and even chemical-rules based search [34]. The idea here is to start with a random initial population of new polymers and let them undergo evolution (in terms of the polymer fingerprint) based on the principles of GA, finally yielding a set of polymers with properties closest to the provided targets. At any step, the properties of the polymers can be computed instantly using the on-demand prediction model that came out from step two. This would be the direct design step, the third step shown in Figure 7. Given the target dielectric constant and bandgap, the algorithm will generate a list of a few hundred polymers, which serve as the first generation. Based on the predicted property values, a fitness score is assigned to every polymer and all the polymers are ranked according to this score. While polymers with satisfactory fitness scores survive, the rest undergo different kinds of evolutions, namely crossover, elitism and mutation. New generations of polymers are produced in this manner; a stopping criterion is provided based on the fitness score, and once polymers with suitable fitness scores are obtained, the algorithm stops.
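The evolutionary loop described above can be sketched as follows. This is a toy illustration under several assumptions: polymers are evolved directly as block sequences rather than as fingerprints, the stand-in predictor uses arbitrary index-based numbers rather than a trained model, the exclusion rules for adjoining blocks are not enforced, and all function and parameter names are hypothetical.

```python
import random

BLOCKS = ["CH2", "CO", "NH", "C6H4", "C4H2S", "CS", "O"]

# ARBITRARY placeholder contributions derived from the list index. They are NOT
# DFT data and NOT the trained model; they only make the sketch runnable.
TOY_EPS = {block: 2.0 + i for i, block in enumerate(BLOCKS)}
TOY_GAP = {block: 8.0 - i for i, block in enumerate(BLOCKS)}


def toy_predict(polymer):
    """Stand-in for the on-demand prediction model: block-averaged toy values."""
    n = len(polymer)
    eps = sum(TOY_EPS[block] for block in polymer) / n
    gap = sum(TOY_GAP[block] for block in polymer) / n
    return eps, gap


def fitness(polymer, target, predict):
    """Negative relative distance to the target (0 is a perfect match)."""
    eps, gap = predict(polymer)
    return -(abs(eps - target[0]) / target[0] + abs(gap - target[1]) / target[1])


def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two block sequences of equal length."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(polymer, rng, rate):
    """Replace each block by a random block with probability `rate`."""
    return tuple(
        rng.choice(BLOCKS) if rng.random() < rate else block for block in polymer
    )


def run_ga(target, predict=toy_predict, n_blocks=4, population_size=50,
           generations=30, n_elite=5, mutation_rate=0.2, tolerance=0.01, seed=0):
    """Evolve block sequences toward the target; returns the ranked population."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(BLOCKS) for _ in range(n_blocks))
        for _ in range(population_size)
    ]

    def score(polymer):
        return fitness(polymer, target, predict)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        if score(ranked[0]) >= -tolerance:  # stopping criterion on fitness
            break
        elite = ranked[:n_elite]  # elitism: best polymers survive unchanged
        parents = ranked[: population_size // 2]
        children = []
        while len(children) < population_size - n_elite:
            parent_a, parent_b = rng.sample(parents, 2)
            child = crossover(parent_a, parent_b, rng)
            children.append(mutate(child, rng, mutation_rate))
        population = elite + children
    return sorted(population, key=score, reverse=True)


ranked_polymers = run_ga(target=(4.0, 6.0))
best_polymer = ranked_polymers[0]
print(best_polymer, toy_predict(best_polymer))
```

Because the elite are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Swapping `toy_predict` for a trained prediction model is the only change needed to connect this loop to the second step of Figure 7.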

4. COMPLETED WORK

4.1 Organic Polymer Data Generated

As explained in Section 2.1, organic polymers are generated by linearly connecting chemical building units chosen out of a pool of 7 basic blocks. The higher the number of blocks in the polymer repeat unit, the larger the polymer system (more atoms and bonds, and thus more orientational degrees of freedom), which creates a computational bottleneck. Keeping this in mind, I considered only 4-block polymers for structure prediction calculations followed by DFT estimation of dielectric constants and bandgaps. The simplest polymer of this kind would simply be -CH2-CH2-CH2-CH2- (PE), that is, all 4 blocks in the repeat unit being CH2. All polymers with O-O, CS-CS, NH-NH and CO-CO adjoining blocks in the repeat unit were eliminated due to stability concerns, and further applying symmetry-uniqueness considerations yielded 284 4-block polymers on which I performed computations [14].
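The construction of the 4-block candidate list can be sketched as an enumeration followed by filtering. The sketch below makes two assumptions that the text does not spell out: the excluded adjoining pairs are also checked across the periodic boundary of the repeat unit, and two sequences are treated as the same polymer if they differ only by a cyclic shift or a reversal. The resulting count depends on these assumptions and is not expected to reproduce the 284 polymers reported above exactly.

```python
from itertools import product

BLOCKS = ["CH2", "CO", "NH", "C6H4", "C4H2S", "CS", "O"]

# Blocks that may not sit next to a copy of themselves (O-O, CS-CS, NH-NH, CO-CO).
NO_SELF_NEIGHBOR = {"O", "CS", "NH", "CO"}


def is_allowed(sequence):
    """Reject sequences with an excluded adjoining pair, including the wrap-around."""
    n = len(sequence)
    for i in range(n):
        left, right = sequence[i], sequence[(i + 1) % n]
        if left == right and left in NO_SELF_NEIGHBOR:
            return False
    return True


def canonical_form(sequence):
    """Pick one representative among all cyclic shifts and reversals."""
    n = len(sequence)
    variants = []
    for oriented in (sequence, sequence[::-1]):
        for shift in range(n):
            variants.append(oriented[shift:] + oriented[:shift])
    return min(variants)


def enumerate_polymers(n_blocks):
    """Return the sorted list of symmetry-unique, allowed n-block repeat units."""
    unique = set()
    for sequence in product(BLOCKS, repeat=n_blocks):
        if is_allowed(sequence):
            unique.add(canonical_form(sequence))
    return sorted(unique)


four_block_polymers = enumerate_polymers(4)
print(len(four_block_polymers), "of", len(BLOCKS) ** 4, "raw sequences survive")
```

Polyethylene appears in this list as `("CH2", "CH2", "CH2", "CH2")`. Whether repeat units with a shorter internal period (for example an alternating two-block pattern) should count as 4-block polymers is a further choice that this sketch leaves open.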

Using MHM to obtain stable crystal structure arrangements of all 284 4-block polymers is a vital extension of prior computational work done on the same chemical space of polymers. Sharma et al. [12] considered long isolated polymer chains and computed their properties using DFT in a method that was fast and efficient, but yielded results with significant error bars. The explicit consideration of stable polymer crystal structures in this work makes the results much more accurate and useful, albeit at a higher computational cost. Figure 8 shows the DFT computed dielectric constants and bandgaps (Egap) of the 4-block polymers, plotted against each other. The total dielectric constant (εtot) is divided into two contributions that are obtained individually from DFT: the electronic part (εelec) and the ionic part (εion). Shown in the inset of Figure 8 is εion plotted against Egap. It can be seen that εion takes values almost an order of magnitude smaller than εtot, meaning that the major contribution to the dielectric constant comes from εelec. Further, there is an inverse relationship between εtot and Egap, pointing towards the difficulty of increasing both properties simultaneously.

Figure 8. Dielectric constants and bandgaps of all 4-block organic polymers.

4.2 Organometallic Polymer Data Generated

The inverse relationship of εelec with Egap is an indication that the way to increase εtot while keeping Egap sufficiently high is to specifically try to enhance εion. This involves considering atoms and bonds that increase the orientational polarization of the system. The stretching and swinging of dipoles at low frequencies is what leads to a high ionic contribution, and one way to do this would be by having bonds between metal atoms and electronegative atoms in the polymer backbone.

Following the discovery by Pilania et al. [17] that SnF2 and SnCl2 units in polymer chains lead to a substantial increase in εtot, the compounds of Group 14 elements [15] were studied computationally and the results are shown in Figure 9a. Sn based fluorides and chlorides clearly show the largest dielectric constants, while maintaining high bandgap values as well. Incorporating Sn-containing units into polymer chains is thus an attractive option to improve their dielectric properties.

Figure 9. (a) Dielectric constants of compounds based on Group 14 elements. (b) The variation of dielectric constant, and (c) bandgap as the number of CH2 units linking the Sn-ester units changes.

This led to the computational and experimental design of Sn-ester based polymers [18,19]. These polymers contained chains with Sn-ester units linked to each other via intermediate CH2 units. Figure 9b and Figure 9c show the dielectric constants and bandgaps respectively for a series of Sn-ester based polymers differing in the number of CH2 linker units (from 0 to 11). These properties have been shown for three different kinds of crystal structural motifs seen in the Sn-esters, and experimental measurements have also been shown for comparison. The dielectric constants are uniformly high, with values greater than 6 for systems with 4-7 CH2 units, whereas the bandgaps are all around 6 eV.

Figure 10. εelec, εion and εtot plotted against Egap for all the organic and organometallic polymers (as well as a few non-polymeric crystals, denoted by ‘NP’) studied computationally so far.

Clearly, much more favorable combinations of high dielectric constant and large bandgap could be attained with the Sn-ester based polymers than with purely organic polymers, which makes inserting metal based units in the polymer backbone a clearly promising direction. This led to the computational study of a number of polymers containing other metals, such as Ti, Zn and Cd, in similar polymer repeat units and coordination environments as the Sn-esters. The consideration of a variety of different organometallic polymers like this was very illuminating in terms of the combinations of properties obtained, and further highlighted the limitations of purely organic polymers.

Figure 10 shows εelec, εion and εtot plotted against Egap for all the organic and organometallic polymers considered in this work so far [35]. It can be seen that the organometallic polymers stand out in terms of high dielectric constants and large bandgaps when compared to the purely organic polymers. With all the different kinds of metals used, especially Sn and Zn, there has been a significant increase in εion. For a given band gap, the accessible dielectric constant of the organometallic polymers is, in general, superior to that of the organic materials (within the scope of the adopted initial screening criteria). Although organic polymers are more common and offer definite advantages like being amenable to synthesis and forming films, their dielectric behavior clearly lags behind that of organometallics, indicating that incorporating metal atoms in polymer backbones is a promising way forward.

4.3 Learning from organic polymers data

While we can use the high-throughput generated data to obtain useful ‘lead candidates’ with desired properties, a natural question is whether we can understand the origins of the attractive behavior in polymers, and harness this understanding to search for other suitable options. Within the context of organic polymeric materials described in Section 4.1, the origins should be traceable to the identities of the chemical building blocks. If the polymers can be numerically represented, or fingerprinted, on the basis of their constituent building blocks, correlations can be established between the fingerprints (or parts of them) and the properties [32,33]. The key requirements are that the fingerprints should be intuitive, easily computable, invariant with respect to translations and rotations of the polymer or permutations of like atoms or motifs, and generalizable to all cases within the chemical subspace.

A polymer fingerprint could therefore be a count of the number of different types of building blocks (e.g., the number of CH2 blocks, the number of C6H4 blocks, etc.; this is fingerprint MI), or the types of pairs of blocks (e.g., CH2-NH pairs, CS-O pairs, etc.; this is fingerprint MII), or the types of triplets of blocks (e.g., CH2-NH-CO triplets, C4H2S-C6H4-CS triplets, etc.; this is fingerprint MIII) in the polymer [14]. Any such fingerprint would be normalized by the total number of blocks in the repeat unit to make it generalizable to polymers with any number of blocks in the repeat unit. With this prescription, the fingerprint for any given n-block polymer is populated by assigning a certain score to every block or pair of blocks or triplet of blocks that is encountered, with the counting done from either end of the polymer repeat unit to take periodicity and inversion into account.
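The counting prescription above can be illustrated with a minimal Python sketch. The function name `fingerprint`, the use of a unit score per motif, and the choice to store each motif under one canonical reading direction are assumptions made here for illustration; the scoring scheme actually used in [14] may differ in detail.

```python
from collections import Counter

def fingerprint(blocks, order):
    """Normalized counts of single blocks (order=1), pairs (2) or triplets (3).

    `blocks` is the ordered list of building blocks in one repeat unit.
    Assumption: every motif scores 1 and is stored under one canonical
    orientation, so reading the chain in either direction gives the same result.
    """
    n = len(blocks)
    counts = Counter()
    for i in range(n):
        # Wrap around the end of the repeat unit to respect periodicity.
        motif = tuple(blocks[(i + j) % n] for j in range(order))
        # A motif and its mirror image describe the same chemistry.
        counts[min(motif, motif[::-1])] += 1
    # Normalize by the number of blocks in the repeat unit.
    return {motif: c / n for motif, c in counts.items()}

polymer = ["NH", "CO", "NH", "C6H4"]   # a 4-block repeat unit
m1 = fingerprint(polymer, 1)           # fingerprint MI
m2 = fingerprint(polymer, 2)           # fingerprint MII
m3 = fingerprint(polymer, 3)           # fingerprint MIII
```

Because motifs are collected cyclically and stored orientation-free, shifting the starting block of the repeat unit or reversing the chain leaves all three fingerprints unchanged, which is the invariance requirement stated in Section 4.3.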

Figure 11. Correlations between fingerprint components and the 4 properties.

The degree of linear correlation between the different components of fingerprint MI and the 4 properties (εelec, εionic, εtotal and Egap) is shown in Figure 11a, and similar correlations between fingerprint MII and the properties are shown in Figure 11b. The opposite behavior of εelec and Egap can be ascertained by observing their respective plots in each case: CH2 and O blocks, and CH2-CH2 and CH2-O pairs, make notable positive contributions to Egap and negative contributions to εelec, whereas C4H2S and CS blocks and their respective pairs contribute positively to εelec and negatively to Egap. The same effects largely translate to εtotal as well, while for εionic, CO and NH blocks and NH-CO, NH-CS and CO-O pairs make the most positive contributions [14]. Based on this knowledge, it is possible to come up with combinations of different kinds of blocks and block pairs in polymers targeted towards increasing the dielectric constant or the bandgap or, indeed, both.
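As a hedged illustration of what a linear correlation between one fingerprint component and one property means, the sketch below computes a Pearson coefficient. It assumes that the Pearson coefficient is the measure of linear correlation, which the text does not state explicitly, and the numbers are synthetic placeholders chosen only to mimic the qualitative trend described above; they are not values from the polymer dataset.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# SYNTHETIC placeholder values, not data from this work:
# fraction of CH2 blocks in five hypothetical polymers and two toy properties.
ch2_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
band_gap = [3.0, 3.9, 5.1, 6.0, 7.0]
eps_elec = [4.0, 3.6, 3.1, 2.9, 2.5]

r_gap = pearson(ch2_fraction, band_gap)   # positive: more CH2, larger gap
r_eps = pearson(ch2_fraction, eps_elec)   # negative: more CH2, lower eps_elec
```

Repeating this for every fingerprint component and every property would give one bar per component and property, in the spirit of Figure 11.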

4.4 On-demand prediction and design models for organic polymers

The polymer fingerprints can be applied to the database of dielectric polymers and their properties to develop statistical learning models that can stand in for further DFT calculations within the same chemical space. I map the organic polymer fingerprints to their calculated dielectric constants and bandgaps using Kernel Ridge Regression (KRR, described in Section 3.2). A prediction model is trained, tested and validated, following which it is combined with a genetic algorithm to design polymers with targeted properties.

After testing with the three fingerprints, MIII was selected as the optimal fingerprint to use for regression. The dataset of 284 polymers was split 90/10 into a training set (used for training the KRR model) and a test set (used for testing the performance of the trained KRR model). Figure 12a shows the KRR predicted εelec, εion and Egap plotted against the respective DFT computed values; the two agree closely, giving an instant prediction model that takes a polymer fingerprint as input and outputs its εelec, εion and Egap values [14].
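The train/test procedure can be sketched in a few lines of dependency-free Python. This is a toy illustration under stated assumptions, not the model of [14]: the one-dimensional "fingerprints" and the sine-shaped "property" are synthetic, the hyperparameters `sigma` and `lam` are picked by hand, and the helper names (`gaussian_kernel`, `solve`, `krr_fit`, `krr_predict`) are hypothetical.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two fingerprint vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def krr_fit(X, y, sigma, lam):
    """Return the weights alpha that solve (K + lam * I) alpha = y."""
    K = [[gaussian_kernel(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def krr_predict(X_train, alpha, x, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xt, x, sigma) for a, xt in zip(alpha, X_train))

# SYNTHETIC stand-in for (fingerprint, property) pairs; not data from this work.
X = [[i / 19.0] for i in range(20)]
y = [math.sin(2.0 * math.pi * x[0]) for x in X]

# 90/10 split: hold out 2 of the 20 points as the test set.
test_idx = {5, 14}
X_train = [x for i, x in enumerate(X) if i not in test_idx]
y_train = [v for i, v in enumerate(y) if i not in test_idx]

sigma, lam = 0.2, 1e-4   # hand-picked for this toy problem
alpha = krr_fit(X_train, y_train, sigma, lam)
errors = [abs(krr_predict(X_train, alpha, X[i], sigma) - y[i])
          for i in sorted(test_idx)]
```

In practice the kernel width and the regularization strength would be chosen by cross-validation on the training set rather than by hand.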

Figure 12. (a) Prediction model performances for the three properties. (b) Predictions for all 6-block and 8-block polymers using the prediction models.

The prediction model is validated by comparing with DFT results as well as available laboratory measurements for some polymers, as shown in Figure 13a. I selected a series of 8-block polymers, performed structure prediction calculations for them and determined their properties, which were seen to match quite well with the KRR predicted values. This demonstrates the out-of-sample predictive ability of the model, given that purely 4-block polymers were used for training. Further, I selected a few polymers with available experimental and DFT results and, again as seen in Figure 13a, these agree closely with the KRR predictions [14].

Thus, the prediction model can be applied to explore newer areas of the polymer chemical space, that is, higher block polymers. Figure 12b shows the predictions of εtot plotted against the predictions of Egap for ~6000 6-block polymers and ~200000 8-block polymers, which were all listed and fingerprinted in a strategy called enumeration. Also plotted for reference here are the DFT computed properties of 4-block polymers, showing how it was possible to go from merely a few hundred DFT results to several thousands of results using the prediction model [14].
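A minimal sketch of enumeration is given below, assuming that two repeat units are the same polymer whenever they differ only by a cyclic shift or by reversal of the chain. The function names are hypothetical, and no chemical feasibility rules are applied, so the raw counts produced here need not match the numbers of polymers quoted above.

```python
from itertools import product

BLOCKS = ["CH2", "NH", "CO", "C6H4", "C4H2S", "CS", "O"]

def canonical(blocks):
    """Smallest tuple among all rotations of the chain and of its reverse."""
    n = len(blocks)
    forms = []
    for seq in (tuple(blocks), tuple(blocks)[::-1]):
        forms.extend(seq[i:] + seq[:i] for i in range(n))
    return min(forms)

def enumerate_polymers(block_types, n):
    """All distinct n-block repeat units, one representative per polymer."""
    return sorted({canonical(p) for p in product(block_types, repeat=n)})

four_block = enumerate_polymers(BLOCKS, 4)
```

Each enumerated repeat unit can then be fingerprinted and passed to the prediction model; the cost grows as 7^n before symmetry reduction, which is why this brute-force listing becomes impractical for large n.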

Figure 13. (a) DFT and experimental validation of the KRR prediction models. (b) The speed-up in using the direct design approach upon exponential increase in polymer possibilities.

Next, I combined the prediction model with a genetic algorithm to develop a direct design model for the organic polymer chemical space: this method follows the steps explained in Section 3.3. Given a target dielectric constant and bandgap, a randomly chosen set of n-block polymers undergoes evolution to yield optimal polymers that would show the target properties based on the instant prediction model. While enumeration can possibly be used to cover the entire expanse of the chemical space, the direct design model is of great value because of the exponential explosion in total polymer possibilities as we move to higher block polymers. Figure 13b shows how the total n-block polymer possibilities increase as n goes from 8 to 12, and how the percentage of points explored by the genetic algorithm implemented in the direct design scheme gradually goes down. This means that no matter how high the total number of possible polymers, the direct design scheme will obtain the optimal polymer(s) for some target properties by traversing a small percentage of points [14]. Thus, for very large polymer systems, the direct design strategy will provide us with the desired polymers significantly quicker than enumeration would.
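The direct design loop can be sketched as follows. This is an illustrative genetic algorithm under loud assumptions, not the scheme of Section 3.3: the per-block scores in `TOY_EPS` and `TOY_GAP` are invented placeholders standing in for the trained KRR model, and the operators (one-point crossover, single-block mutation, survival of the best half) are generic choices.

```python
import random

BLOCKS = ["CH2", "NH", "CO", "C6H4", "C4H2S", "CS", "O"]

# HYPOTHETICAL per-block scores replacing the KRR model; placeholder values,
# not results from this work.
TOY_EPS = {"CH2": 2.5, "NH": 3.5, "CO": 3.5, "C6H4": 4.0,
           "C4H2S": 5.0, "CS": 5.5, "O": 2.8}
TOY_GAP = {"CH2": 7.0, "NH": 5.5, "CO": 5.0, "C6H4": 4.0,
           "C4H2S": 3.0, "CS": 2.5, "O": 6.5}

def toy_predict(polymer):
    """Stand-in predictor: average the block scores over the repeat unit."""
    n = len(polymer)
    return (sum(TOY_EPS[b] for b in polymer) / n,
            sum(TOY_GAP[b] for b in polymer) / n)

def fitness(polymer, target):
    """Squared distance from the target properties (lower is better)."""
    eps, gap = toy_predict(polymer)
    return (eps - target[0]) ** 2 + (gap - target[1]) ** 2

def evolve(n_blocks, target, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(BLOCKS) for _ in range(n_blocks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, target))
        parents = pop[: pop_size // 2]            # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_blocks)      # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_blocks)] = rng.choice(BLOCKS)  # point mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda p: fitness(p, target))

target = (4.0, 4.5)   # (dielectric constant, bandgap in eV), chosen arbitrarily
best = evolve(8, target)
```

With these defaults the search evaluates at most a few hundred new candidates (15 children per generation over 40 generations), a tiny fraction of the 7^8 raw 8-block sequences, which is the kind of saving Figure 13b illustrates.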

5. PROPOSED WORK

5.1 Learning from a large polymer dataset

While reliable learning models have been developed for purely organic polymers based on 7 basic building blocks, it will be very valuable to have similar models for the much more substantial dataset shown in Figure 10. This set contains an eclectic mix of polymers, from purely organic to organometallics based on a variety of metals, with the organometallics clearly occupying a higher dielectric constant region compared to the organics for a given bandgap. The essential ingredients of such an exercise would be, as explained in Section 4.3, the generation of polymer fingerprints and the application of machine learning algorithms.

Figure 10 also highlights, to some extent, the effect of having different metal atoms in the polymer backbone; for instance, Sn-halide and Zn based polymers clearly show the highest dielectric constants. Further, Figure 9b and Figure 9c illustrate how different kinds of crystal structures affect the properties of polymers containing any given metal atom, Sn in this case. The metal coordination environment and the metal concentration would also have pronounced effects; I try to capture all these contributing factors in Figure 14.

Figure 14. Dependence of dielectric constant on metal identity, metal concentration (proportional to the circle size) and coordination number.

Based on all these factors, the following universal polymer fingerprint can be proposed:

POLYMERS | MI / MII / MIII | Coordination Number | Structure Type (Intra/Hybrid/Inter)
Organic
Sn based
Ti based
Zn based
SnCl2 based
…

The fingerprints MI, MII or MIII, which quantify respectively the type of blocks, the type of block pairs and the type of block triplets as explained in Section 4.3, will take care of the population of metal containing blocks in the polymer, thus quantifying both the metal identity and the metal concentration. The last two columns take care of the metal coordination and the type of crystal structure. I shall map this fingerprint to the polymer properties in order to obtain correlations such as in Figure 11 and prediction models such as in Figure 12, but this time for a significantly larger database. This can further assist the accelerated predictions and design of newer, more promising organometallic polymers.

5.2 Chemical Space Expansion

Figure 15. The metals considered (in red) and to be considered (in green) in the polymer backbone.

As Figure 10 clearly demonstrates the benefit of using organometallic polymers, the natural next extensions of this work include not only a deeper analysis of the organometallics, but an expansion of the polymer chemical space with the consideration of a number of new metals in the polymer backbone. I plan to scan across the metals in the periodic table and study them in some given coordination environments (for example, in the same environments adopted by Sn atoms in the Sn-ester based polymers), which simultaneously enables us to learn the effect of the metal itself as well as the environment on the polymer properties. Further, I plan to uniformly change the amount of metal in the polymer system as well to study its effect (similar to uniformly increasing the number of CH2 units in the Sn-ester polymer chain). Figure 15 shows the periodic table, with the metals already considered shown with green ticks (shown in the legend of Figure 10) and metals I shall consider shown with blue stars. The idea is to significantly enhance the (already substantial) polymer database with new computations on not only new choices of metal atoms, but also the metals previously considered, but in newer chemical environments.

5.3 New ML techniques applied on dielectric polymer data

The carefully created, controlled organic polymer dataset can be further exploited to test other useful machine learning models. The testing of such techniques on this sample dataset would be with the idea of eventually applying them on similar, larger datasets (such as all the data shown in Figure 10), thus establishing a general philosophy of prediction, design and learning from materials data. I will apply the following machine learning techniques on the polymer database:

Gaussian Process Regression (GPR)

This is an alternative regression algorithm to KRR, which has been used extensively in the materials modeling community in recent times. In GPR, some prior knowledge about the nature of the relationship between the fingerprint (regression input) and the property (regression output) is incorporated into the system in terms of an a priori probability distribution over all possible values of the property. This prior knowledge is combined with the training data to obtain a nonlinear regression data fit, with a Gaussian kernel used similar to the KRR method. GPR would then yield not only the mean predicted values for new data points, but also an a posteriori probability distribution over all the possible property values. This means that not only can I get prediction models as before, but the uncertainty in every prediction, which can be called σ, can be estimated as well. Every prediction made on a test data point will thus be Ypred ± σ, which is a more informative, point-specific error estimate than dataset-wide measures such as root mean squared errors or coefficients of determination.

Efficient Global Optimization (EGO)

An important problem in machine learning is selecting the optimal training dataset, or the minimum number of calculations or experiments needed to develop learning models. In other words, given a starting set of points one may have, what would be the next best system to obtain data on, that would immediately improve the learning model?
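The Ypred ± σ output of GPR, on which a point-selection scheme such as EGO builds, can be sketched with the standard zero-mean Gaussian process posterior formulas. The code below is a toy illustration under stated assumptions: the training values are synthetic, the kernel length scale and noise level are hand-picked, and the helper names are hypothetical.

```python
import math

def kernel(a, b, length=0.3):
    """Gaussian kernel with an assumed, hand-picked length scale."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gpr_predict(X, y, x_new, noise=1e-6):
    """Posterior mean and standard deviation at x_new (zero-mean prior)."""
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    k_star = [kernel(a, x_new) for a in X]
    alpha = solve(K, y)          # (K + noise * I)^-1 y
    v = solve(K, k_star)         # (K + noise * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

# SYNTHETIC one-dimensional training data, not data from this work.
X = [[0.0], [0.2], [0.4], [0.6]]
y = [-0.3, 0.1, 0.4, 0.2]

y_near, s_near = gpr_predict(X, y, [0.3])   # between training points
y_far, s_far = gpr_predict(X, y, [1.5])     # far from all training points
```

The predicted σ is small between training points and approaches the prior value far away from them, so it flags exactly those regions of the chemical space where the model has seen little data.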
I plan to use a technique called Efficient Global Optimization (EGO) for this, by applying it on the organic polymer database as a test case. Since a regression algorithm such as GPR gives a probability distribution along with the predicted property values for new systems, an 'expected improvement' can be computed that helps identify the point that is the least predictable at the moment, and should thus be added into the training data for improving the predictability of the model. This essentially means identifying the next system that one needs to do calculations/experiments on, given a certain available dataset. I will use this technique to go from a limited number of polymer data points to the optimal training dataset in an iterative manner, where every new point is selected based on the EGO formalism.

Multi-Objective Optimization (MOO)

The problem of maximizing the dielectric constant and bandgap of polymers simultaneously is, in mathematical terms, a multi-objective optimization (MOO) problem. The set of polymers with the best possible combinations of both properties are said to lie on the Pareto front, the line that ultimately restricts the increase in one property with increase in the other (as indicated in Figure 8). It would be very useful to model the 'front' in order to determine newer polymers that would lie on it. Given the organic polymer dataset and the on-demand property prediction models, I plan to use a MOO technique wherein a random archive of n-block polymers will be modified with respect to the polymer constituent blocks, and the Pareto front will be gradually populated on the basis of a 'dominance function' defined in terms of the properties. This function is a mathematical relationship that determines whether or not a point is a Pareto point based on all other points. In this manner, I will perform 2-objective optimization (εelec and Egap being the objectives) and 3-objective optimization (εelec, εtotal and Egap being the objectives). This would be an efficient way of increasing the population of the polymer Pareto front, without having to traverse through thousands of newer points.
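A minimal sketch of the Pareto test is given below, assuming the usual definition of dominance for maximization (at least as good in every objective and strictly better in at least one); the 'dominance function' to be used in the proposed work may be defined differently. The candidate values are illustrative placeholders, not computed polymer properties.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Points that are not dominated by any other point in the set."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# PLACEHOLDER (dielectric constant, bandgap) pairs, not data from this work.
candidates = [(3.0, 6.5), (4.0, 5.0), (3.5, 4.0), (5.5, 3.0), (5.0, 2.5)]
front = pareto_front(candidates)
# front == [(3.0, 6.5), (4.0, 5.0), (5.5, 3.0)]
```

The same two functions work unchanged for the 3-objective case, since `dominates` compares tuples of any length.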

5.4 Development of a uniform, comprehensive Polymer Database

Figure 16. Steps towards building a computational polymer database.

All calculations that have been performed on organic and organometallic polymers (yielding stable structure, energies, dielectric constants and bandgaps) contribute directly towards enhancing a comprehensive, extremely useful polymer database [35]. The steps involved in the generation of such a database have been shown in Figure 16. Such a database can be added to from time to time, and enables us to instantly look up desirable systems and avoid future calculations on previously studied materials, as well as mine the data for further learning. An initial version of the database is already on the web [36], and will be improved upon subsequently.

SUMMARY and IMPACT

• A comprehensive computational DATABASE has been generated that contains information about the repeat units, stable crystal structures, the (electronic and ionic) dielectric constants and the electronic band gaps of a substantial number of organic and organometallic polymers.

• Correlations have been made between simple polymer features, such as the types of constituent blocks, and the properties of interest, resulting in crucial design rules for the given polymer chemical space.

• Reliable ‘on-demand’ property prediction and polymer design models have been developed, such that any new polymer in the chemical space can be fingerprinted and its properties can be instantly known, and any desired set of properties can be correlated to specific polymer(s).

• Computational guidance to experiments has led to the successful synthesis, characterization and electrical testing of a number of new polymers in the following polymer subclasses: polyurea, polythiourea, polyimide and organo-Sn [12,18,19,20,37,38].

• High measured breakdown strengths, low dielectric losses and good film formability seen in some of the synthesized polythioureas and polyimides [20,37] already make them strong candidates for replacing BOPP in high energy density capacitor applications.

REFERENCES

1. H. S. Nalwa, Ed., Handbook of Low and High Dielectric Constant Materials and Their Applications, Vol. 2, (Academic Press, New York, 1999).

2. J. Ho, R. Ramprasad, S. Boggs, IEEE Trans. Dielectr. Electr. Insul. 14, 1295 (2007).

3. M. Rabuffi, G. Picci, IEEE Trans. Plasma Sci. 30, 1939 (2002).

4. N. Tu, K. Kao, J. Appl. Phys. 85, 7267 (1997).

5. E. J. Barshaw, J. White, M. J. Chait, J. B. Cornette, J. Bustamante, F. Folli, D. Biltchick, G. Borelli, G. Picci, M. Rabuffi, IEEE Trans. Magn. 43, 223 (2007).

6. J. H. Tortai, N. Bonifaci, A. Denat, C. Trassy, J. Appl. Phys. 97, 053304 (2005).

7. L. Yang, J. Ho, E. Allahyarov, R. Mu, L. Zhu, ACS Appl. Mater. Interfaces 2015;7:19894-905

8. Q.M. Zhang, V. Bharti, X. Zhao, Science 1998;280:2101-4

9. W. Li, L. Jiang, X. Zhang, Y. Shen, C.W. Nan, J. Mater. Chem. A 2014;2:15803-7

10. Jiang L., Li W., Zhu J., Huo X., Luo L, and Zhu Y., Appl. Phys. Lett. 2015;106:052901

11. Zhang S., Zou C., Kushner D.I., Zhou X., Orchard R.J., Zhang N. et al., IEEE Trans. Dielectr. Electr. Insul. 2012;19:1158-66

12. Sharma V., Wang C.C., Lorenzini R.G., Ma R., Zhu Q., Sinkovits D.W. et al., Rational design of all organic polymer dielectrics. Nature Communications 2014;5:4845

13. Wang C.C., Pilania G., Boggs S., Kumar S., Breneman C., and Ramprasad R., Computational strategies for polymer dielectrics design. Polymer 2014;55:979

14. Mannodi-Kanakkithodi A., Pilania G., Huan T.D., Lookman T., Ramprasad R., Informatics-Driven Strategy for the Accelerated Design of Polymer Dielectrics, Nature Scientific Reports (under review) 2015

15. Mannodi-Kanakkithodi A., Wang C.C., and Ramprasad R., Compounds based on Group 14 elements: building blocks for advanced insulator dielectrics design. J. Mater. Sci. 2015;50:801

16. Wang C.C., Pilania G., and Ramprasad R., Dielectric properties of carbon-, silicon-, and germanium-based polymers: A first-principles study. Phys. Rev. B 2013;87:035103

17. Pilania G., Wang C.C., Wu K., Sukumar N., Breneman C., Sotzing G. et al., New Group IV Chemical Motifs for Improved Dielectric Permittivity of Polyethylene. J. Chem. Inf. Model. 2013;53:879–886

18. Baldwin A.F., Ma R., Mannodi-Kanakkithodi A., Huan T.D., Wang C.C., Tefferi M. et al., Poly(dimethyltin glutarate) as a Prospective Material for High Dielectric Applications. Adv. Mater. 2015;27:346

19. Baldwin A.F., Huan T.D., Ma R., Mannodi-Kanakkithodi A., Tefferi M., Katz N. et al., Rational Design of Organotin Polyesters. Macromolecules 2015;48:2422-2428

20. Ma R., Sharma V., Baldwin A.F., Tefferi M., Offenbach I., Cakmak M. et al., Rational design and synthesis of polythioureas as capacitor dielectrics. J. Mater. Chem. A 2015;3:14845

21. A. Mannodi-Kanakkithodi, G. Treich, T.D. Huan et al., Rational Co-Design of Polymer Dielectrics for Energy Storage, Advanced Materials Progress Report (under review)

22. Jain, A., Hautier, G., Moore, C.J., Ong, S.P., Fischer, C.C., Mueller, T., Persson, K. A., & Ceder, G. Comp. Mat. Sc. 50, 8 (2011).

23. Strasser, P., Fan, Q., Devenney, M., Weinberg, W. H., Liu, P., & Norskov, J. K. J. Phys. Chem. B 40, 107 (2003).

24. Greeley, J., Jaramillo, T. F., Bonde, J., Chorkendorff, I. & Norskov, J. K. Nat. Mater. 5, 11 (2006).

25. Goedecker S., J. Chem. Phys. 2004;120:9911

26. Amsler M. and Goedecker S., J. Chem. Phys. 2010;133:224104

27. Baroni, S., de Gironcoli, S. & Corso, A. D. Rev. Mod. Phys. 73, 515 (2001).

28. Bernardini, F. & Fiorentini, V. Phys. Rev. Lett. 79, 04523 (1997).

29. Zhao, X. & Vanderbilt, D. Phys. Rev. B 65, 075105 (2002).

30. Heyd, J., Scuseria, G. E. & Ernzerhof, M. J. Chem. Phys. 124, 219906 (2006).

31. Vu, K., Snyder, J., Li, L., Rupp, M., Chen, B. F., Khelif, T., Muller, K. & Burke, K. Int. J. Quantum Chem. 115, 16 (2015).

32. Pilania G., Wang C.C., Jiang X., Rajasekaran S., and Ramprasad R., Sci. Rep. 2013;3:2810.

33. Huan T.D., Mannodi-Kanakkithodi A., and Ramprasad R., Accelerated materials property predictions and design using motif-based fingerprints. Phys. Rev. B 2015;92:014106

34. Jain, A., Castelli, I. E., Hautier, G., Bailey, D. H. & Jacobsen, K. W. J. Mat. Sc. 48, 19 (2013).

35. Huan T.D., Mannodi-Kanakkithodi A., Kim C., Sharma V., Pilania G., and Ramprasad R., A comprehensive polymer dataset for accelerated property prediction and design. Sci. Data (under review) 2015

36. Khazana. [Online]. http://khazana.uconn.edu/

37. Ma R., Baldwin A.F., Wang C.C., Offenbach I., Cakmak M., Ramprasad R. et al., Rationally Designed Polyimides for High-Energy Density Capacitor Applications. ACS Appl. Mater. Interfaces 2014;6:10445

38. Lorenzini R.G., Kline W.M., Wang C.C., Ramprasad R., and Sotzing G.A., The rational design of polyurea & polyurethane dielectric materials. Polymer 2013;54:3529