Hansen, Jacob

## Machine Learning with Scattering Transforms

Faculty Mentor: Gus Hart, Physics and Astronomy

### Introduction

Our goal was to implement scattering transforms as a mathematical representation of materials. The intention of this project was to build intuition for this technique using model data in one and two dimensions. The tools created here will be used as templates in further projects on real materials data. The intuition built during this project is crucial to the machine learning framework for materials design that we hope to build in the near future.

### Methodology

The first few months of the project were spent understanding the underlying components of scattering transforms to validate that they would work as intended in a materials setting. A major component of scattering transforms is the wavelet, so most of this research time was spent reading and reviewing literature on scattering transforms and wavelets. Wavelets are important because wavelet transforms are the primary driving force of scattering transforms; by spending time understanding wavelet transforms, we are more confident that we applied scattering transforms correctly to our data. In short, wavelet transforms alone do not possess the desired attributes that would help us use machine learning in materials space, but they enable scattering transforms, which do. These features include isometry invariance as well as deformation stability.

Because scattering transforms use wavelet transforms, one of the first obstacles was deciding which type of wavelet was most appropriate for scattering transforms in materials. Although many types of wavelets exist, in our tests we use Morlet wavelets: a complex exponential windowed by a Gaussian function. Morlet wavelets are attractive because they are continuous and differentiable, and they capture the features of atomic densities better than other types of wavelets. This makes them better suited to mapping atoms than other wavelets.
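As a concrete illustration, a Morlet wavelet of this form can be written in a few lines of Python; the center frequency and width below are illustrative choices, not the parameters used in the project:

```python
import numpy as np

def morlet(t, omega0=5.0, sigma=1.0):
    """Complex Morlet wavelet: a complex exponential windowed by a Gaussian.

    omega0 (center frequency) and sigma (Gaussian width) are illustrative
    values, not the project's actual parameters.
    """
    envelope = np.exp(-t**2 / (2.0 * sigma**2))  # Gaussian window
    carrier = np.exp(1j * omega0 * t)            # complex exponential
    return carrier * envelope

# The Gaussian envelope decays rapidly, so the wavelet is localized
# around t = 0, which makes it useful for probing local atomic structure.
t = np.linspace(-5, 5, 1001)
psi = morlet(t)
```

Because the window is a smooth Gaussian, the resulting wavelet is continuous and differentiable, the properties noted above.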

Once we decided which type of wavelet to use in our scattering transforms, we needed data on which to test and validate our scattering transforms and our machine learning. For test data we used the Lennard-Jones potential to create chains of atoms spaced such that they are in their lowest energy state, which is favorable because nature tends toward lower energies. These chains consist of any number of atoms of two types. The chain pictured in figure 1 has 10 atoms; in practice we use similar chains that include hundreds of atoms. Once we have the positioning, we fit the atoms with a Gaussian surface to mimic the atomic densities of real atoms, as in figure 2. As it creates these chains, the code calculates a corresponding energy for each chain.
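A minimal sketch of this kind of test-data generation is shown below. For simplicity it uses a single atom type placed at the pairwise Lennard-Jones equilibrium spacing; the epsilon, sigma, and Gaussian-width values are toy parameters in the same spirit as those described above, not the project's actual values:

```python
import numpy as np

def lj(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential with toy parameters."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

# Equilibrium spacing of the pairwise LJ potential: r_min = 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)

# A 1D chain of n atoms at (approximately) their lowest-energy spacing.
n = 10
positions = np.arange(n) * r_min

# Total energy of the chain as a sum over all atom pairs.
energy = sum(lj(abs(positions[j] - positions[i]))
             for i in range(n) for j in range(i + 1, n))

# Gaussian "atomic density" on a grid, mimicking real atomic densities.
x = np.linspace(positions[0] - 3, positions[-1] + 3, 2000)
width = 0.2  # illustrative width
density = sum(np.exp(-(x - p) ** 2 / (2 * width ** 2)) for p in positions)
```

In this sketch the chain energy comes out negative, as expected for atoms bound near the potential minimum; a two-type chain would simply use different epsilon and sigma per pair.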

The code that produces the test data was written to be modular and can create 1D as well as 2D atom chains. It also has built-in functionality to allow the mapping of any binary atom system. To create the data set, we used toy values as the parameters for the atoms in the chains, which allowed for faster calculations, cleaner data, and easier testing. The modular structure of the data-generation code lets us change these parameters in the future to match well-known binary atom chains.

With a good set of model data to test with, the next step was to write code that could take the scattering transform of each atom chain. At its core, the scattering transform code is a multilayered wavelet transform from which we save some critical information. Figure 3 is a basic diagram of how the scattering works. The large black circle represents the original data. The orange arrows represent the operations that produce the data we need, and the orange circles are the scattering coefficients: the information that we save and later give to the computer for machine learning. The black arrows represent wavelet transforms that are methodically scaled to cover the entire original function, and each black circle in the lower tiers represents a different scale. We again save the scattering coefficients and repeat the process until the desired resolution is achieved.
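The layered structure described above can be sketched in Python. The filter shapes, scales, and parameter values here are illustrative stand-ins rather than the project's actual implementation, but the pipeline is the same: wavelet transform, modulus, save an averaged coefficient, repeat on the next layer:

```python
import numpy as np

def gaussian_lowpass(n, sigma):
    """Fourier-domain Gaussian low-pass filter (the averaging window)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)

def morlet_filter(n, xi, sigma=4.0):
    """Fourier-domain Morlet-like band-pass filter centered at frequency xi."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-2.0 * (np.pi * sigma * (freqs - xi)) ** 2)

def scattering_1d(f, scales=(0.1, 0.2, 0.4), sigma_phi=16.0):
    """Two-layer scattering sketch: wavelet transform, modulus, then average.

    Each layer convolves with band-pass filters at several scales, takes
    the modulus, and saves a low-pass average as a scattering coefficient.
    """
    n = len(f)
    phi = gaussian_lowpass(n, sigma_phi)
    F = np.fft.fft(f)
    # Zeroth order: low-pass average of the original signal.
    coeffs = [np.real(np.fft.ifft(F * phi)).mean()]
    for xi1 in scales:
        # First layer: wavelet transform at scale xi1, then modulus.
        u1 = np.abs(np.fft.ifft(F * morlet_filter(n, xi1)))
        coeffs.append(np.real(np.fft.ifft(np.fft.fft(u1) * phi)).mean())
        for xi2 in scales:
            # Second layer: repeat the same operations on the modulus.
            u2 = np.abs(np.fft.ifft(np.fft.fft(u1) * morlet_filter(n, xi2)))
            coeffs.append(np.real(np.fft.ifft(np.fft.fft(u2) * phi)).mean())
    return np.array(coeffs)

# A toy signal with two frequency components stands in for an atomic density.
x = np.linspace(0, 1, 512, endpoint=False)
f = np.sin(2 * np.pi * 40 * x) + np.sin(2 * np.pi * 90 * x)
coeffs = scattering_1d(f)  # 1 + 3 + 3*3 = 13 coefficients
```

The modulus between layers is what distinguishes this from a plain multilayer wavelet transform: it discards phase so the averaged coefficients become stable summaries of the signal.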

Our goal is to give the computer a set of these atom chains with their scattering coefficients and then ask the computer to predict what the energy of each chain should be. This is done by creating a large data set of hundreds of chains of atoms and allowing the computer to learn on a subset of that data, called the training set. Once the computer has been trained, it is asked to predict the energies of chains that it hasn't seen yet. These predicted energies are then compared to the energies that were associated with each chain as we created them, which allows us to see how well the computer is learning our data.
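The train/predict/compare loop just described can be sketched with a plain least-squares regression. The synthetic coefficient matrix and the 80/20 split below are illustrative stand-ins for the real scattering coefficients and chain energies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: one row of "scattering coefficients" per chain, plus the
# chain's energy. In the project these come from the scattering and
# Lennard-Jones codes; here they are synthetic for illustration.
n_chains, n_coeffs = 300, 13
X = rng.normal(size=(n_chains, n_coeffs))
true_w = rng.normal(size=n_coeffs)
y = X @ true_w + 0.01 * rng.normal(size=n_chains)

# Split the chains into a training set and a held-out test set.
perm = rng.permutation(n_chains)
train, test = perm[:240], perm[240:]

# Fit a linear regression on the training set by least squares.
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Predict energies for chains the model has not seen and compare them
# to the energies recorded when the chains were created.
pred = X[test] @ w
rmse = np.sqrt(np.mean((pred - y[test]) ** 2))
```

A small test-set error here indicates the model generalizes beyond its training chains, which is exactly the check described above.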

### Results/Discussion

Over the course of the project we designed and tested several tools, including all of the components necessary to do machine learning with scattering transforms in 2D. Currently we have created tools that allow us to find the scattering coefficients of 1D atom chains. We have also written basic machine learning code that splits large data sets into training and test sets; this code currently allows a basic regression with random data, and we are working on configuring it to accept the scattering coefficients that our scattering code produces. We also created a modular data set that we can use to validate that our tools work in 1D and 2D. The next step is to couple these pieces and use them to do machine learning on our data set in both 1D and 2D. We will then tune the input parameters to better represent the physical world.

### Conclusion

Although this project has taken much longer than expected, we have made solid steps toward having a functional 2D model of machine learning with scattering transforms. Overall, this project has helped build much of the intuition needed to expand this technique to real data, thus producing real steps forward for the materials science community.