Progress in the variational method for QFT

Today I put online a major update of my pair of papers on the variational method for quantum field theory (short here, long here). The idea is still to use the same class of variational wave functions (relativistic continuous matrix product states) to find the ground state of (so far bosonic) quantum field theories in 1+1 dimensions. The novelty comes from the algorithm I now use to compute expectation values, which has a cost only proportional to $D^3$, where $D$ is the bond dimension. Using backpropagation techniques, the cost of computing the gradient of observables with respect to the parameters is also only $D^3$. This is basically the same asymptotic scaling as for standard continuous matrix product states.

Previously, the method was a nice theoretical advance as it worked without any cutoff, but it was not numerically competitive compared to brute-force discretization + standard tensor methods (at least not competitive for most observables insensitive to the UV). With the new algorithm and its improved scaling, I can go fairly easily from $D=9$ to $D=32$, which gives $\simeq 10^{-5}$ relative error for the renormalized energy density at a coupling of order 1. Crucially, the error really seems to decrease exponentially as a function of the bond dimension, and thus with only a slightly higher numerical effort (say 100 times more) one could probably get close to machine precision. Already at $10^{-5}$ relative error for the renormalized energy density, that is, a quantity where the leading lattice contribution has been subtracted, I doubt methods relying on a discretization can compete. This makes me a bit more confident that methods working directly in the continuum are a promising way forward, even if only for numerics.

Energy density and relative error for $\phi^4$ theory with RCMPS. RHT is the state-of-the-art renormalized Hamiltonian truncation result, which is manifestly less precise.

Now, let me go a bit more into the technicalities. Computing expectation values of local functions of the field $\phi$ at such a low cost $D^3$ seems difficult at first because the ansatz is not written in terms of local functions of $\phi$. Naively, this should at the very least square the cost to $D^6$. The main idea to obtain the cheap scaling is to realize that the expectation value of vertex operators, i.e. operators of the form $:\exp(\alpha \phi):$, can be computed by solving an ordinary differential equation (ODE) whose generator costs $D^3$ to apply. Basically, computing vertex operators for relativistic CMPS is as expensive as computing field expectation values for standard non-translation-invariant CMPS. To solve this ODE, one can use powerful methods whose errors decay extremely quickly as a function of the discretization step (e.g. very high order Runge-Kutta). So vertex operators are, in fact, cheap. But local powers of the field are merely derivatives of vertex operators with respect to $\alpha$, taken at $\alpha = 0$, and thus can be computed for the same cost. Finally, to get the gradient, one can differentiate through the ODE with backpropagation, and obtain the result for only twice the cost. This allows the full variational optimization for all well-defined bosonic Hamiltonians with polynomial and exponential potentials.
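To make the cost argument a bit more concrete, here is a minimal sketch in Python. It does not reproduce the actual ODE for RCMPS vertex operators from the papers. As a stand-in, it integrates the generator of a standard CMPS, $\rho \mapsto Q\rho + \rho Q^\dagger + R\rho R^\dagger$, which only involves products of $D \times D$ matrices (hence $D^3$ per evaluation), with a plain fourth-order Runge-Kutta scheme. The matrices, their size, and the step counts are all made up for illustration.

```python
import numpy as np


def cmps_generator(Q, R):
    """Return the map rho -> Q rho + rho Q^dag + R rho R^dag.

    Only D x D matrix products are used, so one evaluation costs O(D^3).
    """
    Qd, Rd = Q.conj().T, R.conj().T

    def generator(rho):
        return Q @ rho + rho @ Qd + R @ rho @ Rd

    return generator


def rk4_solve(generator, rho0, length, n_steps):
    """Integrate d(rho)/dx = generator(rho) from 0 to length with classical RK4."""
    h = length / n_steps
    rho = rho0.copy()
    for _ in range(n_steps):
        k1 = generator(rho)
        k2 = generator(rho + 0.5 * h * k1)
        k3 = generator(rho + 0.5 * h * k2)
        k4 = generator(rho + h * k3)
        rho = rho + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho


def random_cmps(D, seed=0):
    """Hypothetical random CMPS matrices, scaled to have norms of order 1."""
    rng = np.random.default_rng(seed)
    scale = 0.5 / np.sqrt(D)
    A = scale * (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D)))
    H = (A + A.conj().T) / 2  # Hermitian part
    R = scale * (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D)))
    Q = -1j * H - 0.5 * R.conj().T @ R  # this choice makes the flow trace preserving
    return Q, R


D = 6
Q, R = random_cmps(D)
gen = cmps_generator(Q, R)
rho0 = np.eye(D, dtype=complex) / D

# A very fine grid serves as the reference solution.
reference = rk4_solve(gen, rho0, 1.0, 2000)
err_coarse = np.linalg.norm(rk4_solve(gen, rho0, 1.0, 20) - reference)
err_fine = np.linalg.norm(rk4_solve(gen, rho0, 1.0, 40) - reference)

print("error with 20 steps:", err_coarse)
print("error with 40 steps:", err_fine)
print("ratio:", err_coarse / err_fine)
```

Halving the step should divide the error by roughly $2^4 = 16$ for this fourth-order scheme. Higher-order schemes improve on this further, which is why the discretization of the ODE is not the bottleneck.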

Two recent articles on tensor networks: Monte-Carlo chimera and the phi^4 supremacy

There are 2 recent preprints on tensor networks I found interesting.

The first is Collective Monte Carlo updates through tensor network renormalization (arxiv:2104.13264) by Miguel Frias-Perez, Michael Marien, David Perez Garcia, Mari Carmen Banuls and Sofyan Iblisdir. I am certainly a bit biased because two authors come from our group at MPQ, and I heard about the result twice in seminars, but I think it is a genuinely new and interesting use of tensor networks. The main idea is to reduce the rejection rate of Markov Chain Monte Carlo (MCMC) by crudely estimating the probability distribution to be sampled from with tensor network renormalization. This combines the advantage of the Monte Carlo method (it is exact asymptotically, but slow especially for frustrated systems) with the advantage of tensor renormalization (it is very fast for approximations).

A quick word of caution for purists: technically, they use the tensor renormalization group (TRG), which is the simplest approximate contraction method for 2d tensor networks. Tensor network renormalization (TNR) can refer to a specific (and more subtle) renormalization method introduced by Evenbly and Vidal (arxiv:1412.0732) that is in fact closer to what the Wilsonian RG does. This is just a matter of terminology.

Terminology aside, it seems the hybrid Monte Carlo method they obtain is dramatically faster than standard algorithms, at least in the number of steps needed to thermalize the Markov chain (which remains of order 1 even for reasonably low bond dimensions). These steps are however necessarily more costly than for typical methods because sampling from the approximate distribution generated with TRG gets increasingly expensive as the bond dimension increases. As a result, for a large frustrated system, it is not clear to me how much faster the method can be in true computing time. I think in the future it would be really interesting to have a benchmark to see if the method can crush the most competitive Markov chain heuristics on genuinely hard problems, like spin glasses or hard spheres for example. It is known that TRG struggles to get precise results for such frustrated problems, and so the approximation will surely degrade. But perhaps there is a sweet spot where the crude approximation with TRG (of high but manageable bond dimension) is sufficient to have an acceptance rate in the MCMC of $10^{-4}$ instead of $10^{-15}$ for other methods in a hard-to-sample phase.
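The mechanism can be illustrated with a toy independence Metropolis-Hastings sampler, sketched below in Python. This is not the algorithm of the paper and involves no tensor network at all. The "crude approximation" is simply the exact Boltzmann weight with hypothetical noise added to the energies, standing in for what TRG would provide. The number of states, the temperature, and the noise level are arbitrary choices.

```python
import numpy as np


def independence_sampler(target, proposal, n_steps, seed=0):
    """Metropolis-Hastings with proposals drawn from a fixed approximate distribution.

    target:   unnormalized positive weights of the distribution to sample.
    proposal: positive weights of the approximate distribution used to propose moves.
    Returns the visited states and the fraction of accepted proposals.
    """
    rng = np.random.default_rng(seed)
    n_states = len(target)
    q = proposal / proposal.sum()
    proposals = rng.choice(n_states, size=n_steps, p=q)
    uniforms = rng.random(n_steps)
    x = rng.choice(n_states, p=q)  # initial state
    samples = np.empty(n_steps, dtype=int)
    accepted = 0
    for t in range(n_steps):
        y = proposals[t]
        # Acceptance ratio for an independence proposal: w(y) / w(x) with w = target / q.
        ratio = (target[y] * q[x]) / (target[x] * q[y])
        if uniforms[t] < ratio:
            x = y
            accepted += 1
        samples[t] = x
    return samples, accepted / n_steps


# Toy target: Boltzmann weights for made-up random energies.
rng = np.random.default_rng(1)
n_states, beta = 50, 2.0
energies = rng.normal(size=n_states)
target = np.exp(-beta * energies)

# Hypothetical crude approximation: same weights with slightly wrong energies.
approx = np.exp(-beta * (energies + 0.1 * rng.normal(size=n_states)))
uniform = np.ones(n_states)

samples_good, acc_good = independence_sampler(target, approx, 20000)
samples_flat, acc_flat = independence_sampler(target, uniform, 20000)

print("acceptance rate, approximate proposal:", acc_good)
print("acceptance rate, uniform proposal:    ", acc_flat)
```

The chain remains exact asymptotically whatever the quality of the proposal. A better approximation only raises the acceptance rate, which is the trade-off discussed above: a more accurate proposal means fewer rejected steps, but each step is more expensive to generate.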

The second one is Entanglement scaling for $\lambda\phi^4_2$ by Bram Vanhecke, Frank Verstraete, and Karel Van Acoleyen. I am of course interested because it is on $\lambda \phi^4$, my favorite toy model of relativistic quantum field theory. There are many ways to solve this non-trivial theory numerically, and I really find them all interesting. Recently, I explored ways to work straight in the continuum, which is perhaps the most robust or rigorous approach, but so far not the most efficient for most observables.

So far, the most efficient method is to discretize the model, use the most powerful lattice tensor techniques to solve it, and then extrapolate the results to the continuum limit. The extrapolation step is crucial. Indeed, such relativistic field theories are free conformal field theories at short distance. This implies that for a fixed error, the bond dimension of the tensor representation (and thus the computational cost) explodes as the lattice spacing goes to zero. One can take the lattice spacing small, but not arbitrarily small.

With Clément Delcamp, we explored this strategy in the most naive way. We used a very powerful and to some extent subtle algorithm (tensor network renormalization with graph independent local truncation, aka GILT-TNR) to solve the lattice theory. We pushed the bond dimension to the maximum that could run on our cluster, plotted 10 points and extrapolated. At the smallest lattice spacing, we were already very close to the real continuum theory, and a fairly easy 1-dimensional fit with 3 parameters gave us accurate continuum limit results. A crucial parameter for $\lambda \phi^4$ theory is the value of the critical coupling $f_c$, and our estimate was $f_c = 11.0861(90)$. We estimated it naively: by putting our lattice models exactly at criticality, and extrapolating the line of results to the continuum limit. This was in 2020, and back then this was the best estimate. It was the best because of the exceptional accuracy of GILT-TNR, which allowed us to simulate systems well even near criticality, and to go closer to the continuum limit than Monte Carlo people.

Extrapolation of the critical coupling as the lattice spacing (lambda is a proxy for it here) goes to zero in our paper with Clément.

Bram Vanhecke and collaborators follow a philosophically different strategy. They use an arguably simpler method to solve the lattice theory, boundary matrix product states. Compared to GILT-TNR, this method is simpler but a priori less efficient to solve an exactly critical problem. But, spoiler, they ultimately get better estimates of the critical coupling than we did. How is that possible? Instead of simulating exactly the critical theory with the highest possible bond dimensions, the authors simulate many points away from criticality and with different bond dimensions (see image below). Then they use a hypothesis about how the results should scale as a function of the bond dimension and distance from criticality. This allows them to fit the whole theory manifold around the critical point without ever simulating it. This approach is much more data driven in a way. It bites the bullet that some form of extrapolation is going to be needed anyway and turns it into an advantage. Since the extrapolation is done with respect to more parameters (not just lattice spacing), the whole manifold of models is fitted instead of simply a critical line. This gives the fit more rigidity. By putting sufficiently many points, which are cheap to get since they are not exactly at criticality or with maximal $D$, one can get high precision estimates. They find $f_c = 11.09698(31)$, which, if their error estimate is correct, is about one and a half digits more precise than our earlier estimate (which is a posteriori confirmed, since the new result is about 1 standard deviation away from the mean we had proposed).

The points sampled by Bram Vanhecke et al around the critical line (also for different bond dimensions not shown)

Naturally, an extrapolation is only as good as the hypothesis one has for the functional form of the fit. There, I like to hope that our earlier paper with Clément helped. By looking at the critical scaling as a function of the lattice spacing, and going closer to the continuum limit than before, we could see that the best fit clearly contained logarithms and was not just a very high order polynomial (see above). I think it was the first time one could be 100% sure that such non-obvious log corrections existed (previous Monte Carlo papers had suspected them and then ruled them out): the $\chi^2$ left no doubt. Interestingly, Bram and his Ghent collaborators had tried their smart extrapolation techniques back in 2019 (arxiv:1907.08603) but their ansatz for the functional form of the critical coupling as a function of the lattice spacing was polynomial. As a result their estimates looked precise, between 11.06 and 11.07, but were in fact underestimating the true value (like almost all papers in the literature did).
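The sensitivity of the extrapolated value to the assumed functional form is easy to reproduce on synthetic data. The Python sketch below uses entirely made-up numbers, not the data or the fit ansatz of either paper. It generates points from a hypothetical law $a + b x + c\, x \log x$, where $x$ plays the role of the lattice spacing, and extrapolates to $x \to 0$ with two three-parameter fits: a quadratic polynomial and a form containing the logarithm.

```python
import numpy as np


def fit_intercept(x, y, basis):
    """Least-squares fit of y on the given basis functions; returns the x -> 0 intercept.

    The first basis function is assumed to be the constant 1, so its
    coefficient is the extrapolated value.
    """
    design = np.column_stack([f(x) for f in basis])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0]


# Made-up "true" law with a log correction (purely illustrative values).
a_true, b_true, c_true = 1.0, 0.5, 0.8
x = np.linspace(0.01, 0.1, 10)  # 10 synthetic points, none at x = 0
y = a_true + b_true * x + c_true * x * np.log(x)

poly_basis = [np.ones_like, lambda t: t, lambda t: t**2]
log_basis = [np.ones_like, lambda t: t, lambda t: t * np.log(t)]

a_poly = fit_intercept(x, y, poly_basis)
a_log = fit_intercept(x, y, log_basis)

print("true intercept:      ", a_true)
print("polynomial ansatz:   ", a_poly)
print("ansatz with log term:", a_log)
```

On noiseless data both fits can look smooth over the sampled range, yet the polynomial one lands at a visibly shifted intercept because $x \log x$ has an infinite slope at the origin. This is the kind of systematic bias that a wrong ansatz introduces, independently of how precise the individual points are.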

Could this scaling have been found theoretically? Perhaps. In this new paper, Vanhecke and collaborators provide a perturbative justification. My understanding (which may be updated) is that such a derivation is only a hint of the functional form because the critical point is deep in the non-perturbative regime and thus all diagrams contribute with a comparable weight. For example, individual log contributions from infinitely many diagrams could get re-summed into polynomial ones, e.g. $\sum_k \log(\lambda)^k/k! = \lambda$ non-perturbatively. In any case, I think (or rather hope…) that precise but “less extrapolated” methods will remain complementary and even guide the hypothesis for more data driven estimates.

In the end, congratulations to Bram, Frank, and Karel for holding the critical $\phi^4_2$ crown now. I will dearly miss it, and the bar is now so high that I am not sure we can contest it…

Variational method in field theory – videos

A bit more than a month ago, I put on arxiv two preprints (short and long) that summarize my recent work on applying the variational method in relativistic quantum field theory. I am happy that they were (so far) well received, and I got the chance to present the corresponding results in a few seminars already. The last two, at the University of Helsinki and EPFL, were recorded. I had a better microphone at EPFL, and expanded more on tensor networks, so I embed the video below (or direct link here).

I skipped the introduction to the basics of relativistic quantum field theory at EPFL, so you may prefer the Helsinki recording if this is not well known to you (as was my case basically two years ago). Like the typical Frenchman, I say “eeeeuuuh” a lot when presenting, which makes the whole experience particularly awful for people listening. Now that I am painfully aware of it, I will try to work on it for future presentations…

Random news

The preprint on subcritical reactors I had written about two months ago has finally appeared on arxiv. Apparently it was “on hold” for that long because the primary category (applied physics) was not the right one, and the moderators thought “instrumentation and detectors” was better. Honestly, I am not sure they are right, given how theoretical the paper is, but I am just glad the paper appeared (I would have just preferred it to be faster).

I also have one article written with Vincent Vennin and one interview by Philippe Pajot in the latest edition of La Recherche. Both are, of course, in French. This edition, with a new format, features many interesting articles, in particular an interview of the president of the Max Planck Society.

Sustaining the chain reaction without criticality

I have found a neat nuclear physics problem, and have written a draft article about it. Below I explain how I came to be interested in this problem, give some context most physicists may not be familiar with, and explain the result briefly.

A while back, I took a small part in the organization of the French public debate on radioactive waste management. My work was not very technical, but got me interested in the rich physics involved. The debate itself didn’t allow me to do real research on the subject: it was certainly not what was asked from me, and there was already so much to learn about the practical details.

After the debate ended, I kept on reading about the subject on the side. In particular, I read more on advanced reactor designs, where I found a neat theoretical question which I believe was unanswered.

The sound of quantum jumps

Before I finally go on holidays, I put on arxiv an essay on quantum jumps, in fact rather on collapse models, that I initially submitted to the FQXI essay contest.

In this essay/paper, I just make a simple point which I have made orally for years at conferences. Every time, people looked quite surprised and so I thought it made sense to write it down.

The argument is simple enough that I can try to reproduce it here. Collapse models are stochastic non-linear modifications of the Schrödinger equation. The modification is meant to solve the measurement problem. The measurement problem is the fact that in ordinary quantum mechanics, what happens in measurement situations is postulated rather than derived from the dynamics. This is a real problem (contrary to what some may say), the dynamics should say what can be measured and how, it makes no sense to have an independent axiom (and it could bring contradictions). Decoherence explains why the measurement problem does not bring contradictions for all practical purposes, but again contrary to what some may say, it certainly does not solve the measurement problem. So the measurement problem is a real problem and collapse models provide a solution that works.

The stochastic non-linearity brought by collapse models creates minor deviations from the standard quantum mechanical predictions (it makes sense, the dynamics has been modified). This is often seen, paradoxically perhaps, as a good thing, because it makes the approach falsifiable. It is true that collapse models are falsifiable. What is not true is that collapse models modify the predictions of quantum mechanics understood broadly. This is what is more surprising, sometimes seems to contradict the previous point, and is the subject of my essay.

How is it possible? Collapse models are non-linear and stochastic, surely ordinary quantum mechanics cannot reproduce that? But in fact it can. As was understood when collapse models were constructed in the eighties, the non-linearity of collapse models, which is useful to solve the measurement problem, has to vanish upon averaging the randomness away. Since we have no a priori access to this randomness, all the things we can measure in practice can be deduced from a linear equation, even in the context of collapse models. This linear equation is not the Schrödinger equation, but one that does not preserve purity, the Lindblad equation. However, it is also known that by enlarging the Hilbert space (essentially assuming hidden particles), Lindblad dynamics can be reproduced by Schrödinger dynamics. Hence, the predictions of collapse models can always be reproduced exactly by a purely quantum theory (linear and deterministic) at the price of enlarging the Hilbert space with extra degrees of freedom. Collapse models do not deviate from quantum theory, they deviate from the Standard Model of particle physics, which is an instantiation of quantum theory. Even if experiments showed precisely the kind of deviations predicted by collapse models, one could still defend orthodox quantum mechanics (not that it would necessarily be advisable to do so).
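Both steps of the argument can be checked on a one-shot caricature, sketched below in Python for a single qubit. This is not a realistic collapse model (those act continuously in time and in position). Here the "collapse" is a single random projection on a basis state with Born weights, and the "hidden particle" is one hypothetical ancilla qubit coupled with a CNOT gate.

```python
import numpy as np

# Projectors on the two basis states of a qubit.
P = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]


def average_after_collapse(psi):
    """Average of |psi_k><psi_k| over the random outcomes k of a collapse.

    Each branch is renormalized, which is the non-linear step, and weighted
    by its Born probability, which is the stochastic step.
    """
    rho = np.zeros((2, 2), dtype=complex)
    for Pk in P:
        branch = Pk @ psi
        prob = np.vdot(branch, branch).real
        if prob > 0:
            post = branch / np.sqrt(prob)
            rho += prob * np.outer(post, post.conj())
    return rho


def dephasing_channel(rho):
    """Linear map rho -> sum_k P_k rho P_k (kills the off-diagonal terms)."""
    return sum(Pk @ rho @ Pk for Pk in P)


def dilated_channel(rho):
    """Same map obtained from a unitary on system + ancilla, then tracing out the ancilla."""
    cnot = np.array(
        [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
    )
    ancilla = np.array([[1, 0], [0, 0]], dtype=complex)  # ancilla starts in |0>
    joint = cnot @ np.kron(rho, ancilla) @ cnot.conj().T
    # Indices after reshape: (system, ancilla, system', ancilla'); trace over the ancilla.
    return np.einsum("iaja->ij", joint.reshape(2, 2, 2, 2))


psi = np.array([0.6, 0.8j])
rho = np.outer(psi, psi.conj())

print(np.round(average_after_collapse(psi), 3))
print(np.round(dephasing_channel(rho), 3))
print(np.round(dilated_channel(rho), 3))
```

The three matrices coincide: the averaged non-linear stochastic evolution is a linear map on the density matrix, and that linear map is reproduced by ordinary unitary dynamics on a larger Hilbert space.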

Collapse models are still useful in that they solve the measurement problem, which is an ontological problem (what the theory says the world is like or what the world is made of). However, the empirical content of collapse models (what the theory predicts) is less singular than one might think. In the essay, I essentially make this point in a more precise way, and illustrate it with what I believe is the most shocking example, the sound of quantum jumps, borrowed from a paper by Feldmann and Tumulka. It doesn’t make sense to write more here since I would end up paraphrasing the essay, but I encourage whoever is interested to read it here.

The skyscraper and pile of dirt approaches to QFT

Quantum field theory (QFT) is the main tool we use to understand the fundamental particles and their interactions. It also appears in the context of condensed matter physics, as an effective description. But it is unfortunately also a notoriously difficult subject: first because it is tricky to define non-trivial instances rigorously (this has not been done for any of the theories realized in Nature), and also because, even assuming it can be done, it is then very difficult to solve them and extract accurate predictions.

There is a subset of QFTs where there is no difficulty: free QFTs. Free QFTs are easy because one can essentially define them in a non-rigorous way first, physicist style, then “solve” them exactly, and finally take the solution itself as a rigorous definition of what we actually meant in the first place. Then, to define the interacting theories, the historical solution has been to see them as perturbations of the free ones. This comes with well known problems: interacting theories are not as close to free ones as one would naively think, so the expansions one obtains are weird: they diverge term by term, and if the divergences are subtracted in a smart way (renormalization), the expansions still diverge as a whole.

Far more than you need to know about the nuclear fuel cycle and the associated recycling rate

Spent nuclear fuel from French power reactors is recycled. Some argue that this recycling recovers 96% of the fuel: the industry is virtuous and exemplary! Others say on the contrary that recycling allows only 1% of reuse: recycling is bullshit! Who is right?

I find that this controversy is a good pretext to explain the associated physics, which is interesting. My goal is to explain the back end of the fuel cycle in detail for its own sake, the resolution of the controversy then being a trivial corollary. Along the way, it is an opportunity to learn a bit more about the civilian and military history of nuclear power, the principle of a nuclear reactor, and the subtleties of the different isotopes of uranium and plutonium.

Nice quantum field theory videos

These days I am trying to improve my understanding of quantum field theory with as little perturbation theory as possible. I came across videos from a workshop at IHES in Bures-sur-Yvette on Hamiltonian methods for QFT and videos from a semester at the Newton Institute in Cambridge, which both happened about a year ago. Both events are quite well filmed (especially at IHES), most presentations are made on a blackboard, and most talks I checked were well explained and interesting, so I definitely recommend them.

The workshop at IHES could have been called 50 shades of $\phi^4_2$, since many talks try to find the critical point of the theory with more or less elaborate methods (8 loop perturbation theory, and various non-perturbative Hamiltonian methods). I recommend in particular the talk of Joan Elias Miro on renormalized Hamiltonian truncation methods, which I found very clear and interesting. There are also nice tensor network talks by the usual suspects (Mari Carmen Banuls, Frank Pollmann, Guifre Vidal, Karel Van Acoleyen, Philippe Corboz). Finally there is an intriguing talk by Giuseppe Mussardo on the sinh-Gordon model.

The semester at the Newton Institute was clearly geared more towards mathematics, with important emphasis on modern probabilistic approaches, starting from the stochastic quantization of Euclidean field theories. The semester opens with 4 really amazing lectures by Antti Kupiainen on the renormalization group (supplemented by lecture notes). He works with Euclidean $\phi^4$ in all dimensions, on the lattice and in the continuum limit, and explains everything that can happen. He distinguishes very well the IR scaling limit and UV continuum limit problems, the various fixed point structures, the easy and hard problems, many issues which had always been quite confused in my mind. It’s a pleasure to listen to people who understand what they are doing. There is another talk, more like a work in progress, where Martin Hairer attempts the stochastic quantization of Yang-Mills (which starts from a quite original explanation of what a gauge theory is!). I have not had much time to check the other talks, but the whole program looks really interesting (with a lot of different ways to rigorously define $\phi^4_3$). I watch these while ironing my shirts, so I will know more at the next laundry.

Great work from friends

My smart friends have been doing great work recently and I think it deserves attention.

I Understanding deep neural networks theoretically

Jonathan Donier, who now works for Spotify in London after a PhD in applied maths in Paris, has put online a series of 3 fundamental articles on theoretical machine learning:

1) Capacity allocation analysis of neural networks: A tool for principled architecture design — arXiv:1902.04485
2) Capacity allocation through neural network layers — arXiv:1902.08572
3) Scaling up deep neural networks: a capacity allocation perspective — arXiv:1903.04455

In these papers, Jonathan defines and explores the notion of capacity allocation of a neural network, which formalizes the intuitive idea that some parts of a network encode more information about certain parts of the input space. The objective is to understand how a given architecture of network manages to capture the structure of correlations in the input. Ultimately, this should allow one to go beyond fuzzily grounded heuristics and expensive trial and error in order to design networks with a topology adapted to the problem right from the start.

Jonathan very progressively builds up the theory from basic definitions to non-trivial scaling prescriptions for deep networks. The first paper defines the capacity rigorously in the simplest settings and deals mostly with the linear case. The second one considers special non-linear settings where the capacity analysis can still be carried out exactly and where one gets insights about the decoupling role of non-linearity. The final one puts all the pieces together and, among other things, allows one to rigorously recover many initialization prescriptions for deep networks that were known only from heuristics. This super quick summary does not do justice to the content: this series of papers is, in my opinion, a major advance in the theoretical understanding of deep neural networks.

II Making measurements crystal clear in Bohmian mechanics

Dustin Lazarovici, who is now a philosopher of physics in Lausanne after a PhD in mathematical physics in Munich, has put online a very clear paper explaining how position measurements work in Bohmian mechanics and what their relation with particle positions is.

Position Measurements and the Empirical Status of Particles in Bohmian Mechanics — arXiv:1903.04555

Dustin is perhaps one of the people with the clearest mind on foundations, and on Bohmian mechanics in particular. The notion of measurement in Bohmian mechanics is usually so deeply misunderstood that Dustin’s concise explanation is a great reference for anyone interested in these questions. I particularly enjoyed the very end, where the link (or lack of link) with consciousness is precisely discussed. I think it exemplifies what useful work by philosophers of physics can be like: not muddying the waters (as physicists usually think philosophers do) but sharpening the reasoning to save physicists from their own confusion.

III Popularizing tricky mathematical notions

Antoine Bourget, who is now a postdoc at Imperial College in London, after a postdoc in Oviedo and a PhD at ENS in Paris (in the same office as me), has put a series of pedagogical videos on YouTube, through his account Scientia Egregia.

The videos are in French, and I recommend in particular the dictionnaire entre algèbre et géométrie. Antoine starts with many simple examples to show the subtleties and motivate the definitions. He explains very well how one constructs mathematical notions to fit a certain intuition, a certain purpose, and thereby manages to make “obvious” really non-trivial concepts. Go check his videos so that he gets pressure to make more.

Antoine Tilloy's research log

orthodox and unorthodox explorations of Quantum Theory
