Author Archives: Antoine Tilloy


About Antoine Tilloy

postdoc at MPQ

Through two doors at once

I have really enjoyed Anil Ananthaswamy’s latest book: Through Two Doors at Once: The Elegant Experiment That Captures the Enigma of Our Quantum Reality. It is very well written and one reads through it like a novel. But, most importantly, it gets the physics right, and the subtleties are not washed away with metaphors. Accurate and captivating, the book strikes a balance rarely reached in popular science books.

The foundations of quantum mechanics is a difficult branch of physics, and almost every narrative shortcut that was invented to convey its subtlety is, strictly speaking, a bit wrong. Further, foundations is an unfinished branch of physics: different groups of experts disagree about what the main message of quantum mechanics is and about what should be done to make progress in understanding it. This makes it hard to popularize the subject without writing incorrect platitudes or pushing one orthodoxy.

Anil’s strategy is to use the simplest experiment illustrating quantum phenomena: the double slit experiment. He discusses the results and shows why they are so counter-intuitive. However, the simple double slit experiment is not enough to get to the bottom of the mystery. Anil thus very progressively refines the experimental setup, gradually adding the subtleties that prevent naive stories from explaining away the weirdness of quantum theory. As in a police investigation, Anil interviews the experts of the main interpretations of quantum mechanics and guides the reader through the explanations they give for each setup. The reader can then decide for herself which story she finds most appealing.

Crucially, I think the different interpretations are presented fairly. Anil does not take a side. I personally much prefer “non-romantic and realist” interpretations of quantum theory: I find accounts of the world where stuff simply moves, be it with non-local laws of motion, far more convincing than alternatives (where there are infinitely many worlds, or where “reality” has a subjective nature). The “realist” view is well represented in the book (which is rare, because it is not “hype”), but I was not annoyed by the thorough discussion of the other possibilities. More radical proponents of one or the other interpretation may however be annoyed by this attempted neutrality.

Anil’s writing style is very enjoyable. He does not make the all too common mistake of using cheap metaphors, which are dangerous in the context of quantum mechanics where they provide a deceptive impression of depth and understanding. In this book, you actually learn something. Sure, you do not become an expert in foundations, but you get an accurate sense of what motivates researchers in the field. This is both nice in itself and useful if you want to keep digging with a more specialized book. Even though I already knew the technical content of the book, I found the inquiry captivating. I definitely recommend Through Two Doors at Once, especially to my friends and family who want to quickly yet genuinely understand the sorts of questions that drive me.

Disclaimer: I provided minor help with proofreading an almost finished draft of the book.

Survival bias and the non-empirical confirmation of physical theories

Survival Bias

[Figure: survivorship bias illustrated on a WWII bomber, drawing by McGeddon]

During World War II, the US military gathered statistics on where its bombers were primarily damaged. The pattern looked like the picture on the right. The first intuition of the engineers was to reinforce the parts that were hit the most. Abraham Wald, a statistician, realized that the distribution of impacts was observed only for the airplanes that actually came back from combat. Those that were hit somewhere else probably could not even make it back home. Hence it is precisely where the planes seemed to be the least damaged that adding reinforcements was the most useful! This famous story illustrates the problem of survival bias (or survivorship bias).

Definition (Wikipedia): Survival bias is the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not.

Survival bias is the reason why we tend to believe in planned obsolescence and, more generally, why we sometimes feel nostalgia for a golden age that never existed. “Back in the day, cars and refrigerators were reliable, unlike today! And back then, buildings were beautiful and lasted forever, unlike the crap they construct today!”

But actually none of this is true. Most refrigerators from the sixties stopped working within a few years, and the very few that still function today are just the 0.1% that made it. The same goes for cars, which are more reliable than they used to be: the vintage cars we see around show an impressive number of kilometers, but only because they are part of the infinitesimal fraction that miraculously survived. Finally, most buildings in earlier centuries were poorly constructed, lacking both taste and durability. Most of them collapsed or were destroyed, and this is why new buildings now stand in their place. The few old monuments that remain are still there precisely because they were particularly beautiful and well constructed for their time. More generally, the remnants of the past we see in our everyday life are not a fair sample of what life used to be. They are, with rare exceptions, the only things that were good enough not to be replaced.

[Comic: Survivorship Bias, from the great xkcd]

Survival bias can explain an impressively wide range of phenomena. For example, most hedge funds show stellar historical returns (even after fees), while investing in hedge funds is not profitable on average. This is easy to understand if hedge funds simply have random returns: the hedge funds that lose money over a period go bankrupt or have to downsize for lack of investors, while the hedge funds that make money survive and increase in size. The same bias explains why tech success stories are often overrated and why cats seem not to get more injured when they fall from a greater height (Wikipedia).
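
To see the mechanism concretely, here is a minimal simulation sketch of my own (the liquidation rule and all numbers are illustrative assumptions, not data): funds with purely random, zero-mean returns are closed when they draw down too much, and the survivors then display flattering average historical returns.

    import numpy as np

    rng = np.random.default_rng(0)
    n_funds, n_years = 10_000, 10
    # Annual returns with zero mean and 15% volatility: no skill anywhere.
    returns = rng.normal(loc=0.0, scale=0.15, size=(n_funds, n_years))

    value = np.cumprod(1.0 + returns, axis=1)
    # Assume a fund is liquidated if its value ever drops 30% below par.
    survived = value.min(axis=1) > 0.7

    print(returns.mean())            # ~0: no edge in the full population
    print(returns[survived].mean())  # noticeably positive: the survivors look skilled
    print(survived.mean())           # fraction of funds still alive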

This bias very often misleads us in our daily life. My worry is that it may also mislead us in our assessment of physical theories, especially when we lack experimental data. To understand why, I need to discuss the problem of the “non-empirical confirmation” of physical theories.

Non-empirical confirmation of physical theories

Physicists always use some form of non-empirical assessment of physical theories. Most theories never get the chance to be explicitly falsified experimentally and are just abandoned for non-empirical reasons: it is just impossible to make computations with them or they turn out to violate principles we thought should be universal. As the time between the invention of new physical theories and their possible experimental test widens, it becomes important to know more precisely what non-empirical reasons we use to temporarily trust theories. The current situation of String Theory, which predicts new physics that seems untestable in the foreseeable future, is a prime example of this need.

This is a legitimate question that motivated a conference in Munich about two years ago, “Why trust a theory? Reconsidering scientific methodology in light of modern physics”, which was then actively discussed and reported on online (see e.g. Massimo Pigliucci, Peter Woit, and Sabine Hossenfelder). Among the speakers was philosopher Richard Dawid, who has come up with a theory (or formalization) of non-empirical confirmation in physics, notably in the book String Theory and the Scientific Method.

Dawid contends that, so far, physicists have primarily used the following criteria to assess physical theories in the absence of empirical confirmation:

  • Elegance and beauty,
  • Gut feelings (or the gut feelings of famous people),
  • Mathematical fertility.

I think Dawid is unfortunately correct in this first analysis. The reasons why physicists provisionally trust theories before they can be empirically probed are largely subjective and sociological. This anecdote, recalled by Alain Connes in an interview about 10 years ago, is quite telling:

“How can it be that you attended the same talk in Chicago and you left before the end and now you really liked it? The guy was not a beginner and was in his forties; his answer was ‘Witten was seen reading your book in the library in Princeton’.”

Note that this does not mean that science is a mere social construct: this subjectivity only affects the transient regime when “anything goes”, before theories can be practically killed or vindicated by facts. Yet, it means there is at least room for improvement in this transient theory building phase.

Dawid puts forward 3 principles, which I will detail below, to more rigorously ground the assessment of physical theories in the absence of experimental data. Before going any further, I have to clarify what we may expect from non-empirical confirmation. There is a weak form: by non-empirical confirmation we mean simply a small improvement in the fuzzily defined Bayesian prior we have that a theory will turn out to be correct. This is the uncontroversial understanding of non-empirical confirmation, but one that Dawid deems too weak. There is also a strong form, where “confirmation” is understood in its non-technical sense, that of definitively validating the theory without even requiring experimental evidence. This one, which some high energy theorists might sometimes foolishly defend, is manifestly too strong. Part of the controversy around non-empirical confirmation is that Dawid wants something stronger than the weak form (which would be trivial in his opinion) but weaker than the strong form (which would be simply wrong). However, because it is quite difficult to understand precisely where this sweet spot would lie, Dawid has often been caricatured as defending an unacceptably strong form of his theory.
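
To make the weak form concrete, here is a toy Bayesian update of my own (the numbers are purely illustrative, not Dawid’s). Suppose a non-empirical argument A, say “no alternative has been found despite much effort”, is somewhat more likely to hold if the theory T is on the right track than if it is not (0.8 versus 0.6), and start from a prior P(T)=0.10. Then

P(T \mid A) = \frac{P(A \mid T)\, P(T)}{P(A \mid T)\, P(T) + P(A \mid \neg T)\, P(\neg T)} = \frac{0.8 \times 0.10}{0.8 \times 0.10 + 0.6 \times 0.90} \simeq 0.13

A modest shift of this kind, from 0.10 to 0.13, is all the weak reading claims: nothing close to a definitive validation.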

What we may expect from non-empirical hints is an important question and I will come back to it later. Right now, I ask: can we find good guides to increase our chances to stay on the right track whilst experiments are still out of reach?

Dawid’s principles

  1. No Alternative Argument (aka “only game in town”):
    Physicists tried hard to find alternatives, they did not succeed.
    ~
  2. Meta-Inductive Argument:
    Theories with the same characteristics (obeying the same heuristic principles) proved successful in the past.
    ~
  3. Unexpected Explanatory Interconnections:
    The theory was developed to solve a problem but surprisingly solves other problems it was not meant to.

These principles are manifestly crafted with String Theory in mind, which they seem to fit perfectly. String Theory is not the only game in town, but it is arguably more developed than the alternatives (including some alternatives I find interesting). String Theory also fares well on the Meta-Inductive Argument: it extensively uses the ideas and principles that made previous theories successful, especially those of quantum field theory. In the course of the development of String Theory, a lot of unexpected interconnections also emerged. Many of them are internal to the theory: different formulations of String Theory actually seem to describe different limits of the same thing. But there are also unexpected byproducts: e.g. a theory constructed to deal with the strong nuclear force ends up containing gravitational physics.

At this stage, one may be tempted to nitpick and find good reasons why String Theory does not actually satisfy Dawid’s principles, possibly to defend one’s alternative theory. However, I am not sure this is a good line of defense, and I think it draws attention away from the interesting question: independently of String Theory, are Dawid’s principles a good way to get closer to the truth?

Naive meta check

We may do a first meta check of Dawid’s principles, i.e. ask the question:

Would these principles have worked in the past?
or
Would they have guided us to what we now know are viable theories?

We may carry this meta check on the Standard Model of particle physics (an instance of Quantum Field Theory) and General Relativity.

At first sight, both theories fare pretty well. It seems that quantum field theory quickly became the main tool to describe fundamental particles while being the simplest extension of the principles that were previously successful (quantum mechanics and special relativity). Further, quantum field theoretic techniques unexpectedly applied to a wide range of problems including classical statistical mechanics. General relativity also seemed like it was the only game in town, minimally extending the earlier principles of special relativity introduced by Einstein. The question of the origins of the universe, which General Relativity did not primarily aim to answer, was also unexpectedly brought from metaphysics to the realm of physics. I chose simple examples, but it seems that for these two theories, there are plenty of details which fit into the 3 guiding principles proposed by Dawid. The latter look almost tailored to get the maximum number of points in the meta check game.

Fooled by survival bias

As convincing as it may seem, the previous meta check is essentially useless. It shows that successful theories indeed fit Dawid’s principles. But we have looked only at the very small subset of successful theories. It thus does not tell us that following the principles would have led us to successful theories rather than unsuccessful ones. In the previous assessment, we were being dangerously fooled by survival bias. We looked at the path ultimately taken in the tree of possibilities, focusing on its characteristics, but forgetting that what matters is rather the difference with the other possible paths.

To really meta check Dawid’s principles, it is important to study failures as well: the theories that looked promising but were then disproved and ultimately forgotten. For obvious reasons, such theories are no longer taught and are thus all too easy to overlook.

A brief history of failures

Let us start our brief history of promising-theories-that-failed with Nordström’s gravity. This theory slightly predates General Relativity and was proposed by Gunnar Nordström in 1913 (with crucial improvements by Einstein, Laue, and others; see Norton for its fascinating history). It is built upon the same fundamental principles as General Relativity and differs only subtly in its field equations. Mainly, General Relativity is a tensor theory of gravity, in that the Einstein tensor G_{\mu\nu}= R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} is proportional to the matter stress-energy tensor T_{\mu\nu}:

R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} = \frac{8 \pi G}{c^4} T_{\mu\nu}

Nordström’s theory is a simpler scalar theory of gravity. The curvature R is sourced by the trace T:=T_{\mu}^\mu of the stress-energy tensor. This field equation is insufficient to fully fix the metric and one just adds the constraint that the Weyl tensor C_{abcd} is zero:

R = \frac{24 \pi G}{c^4} T
C_{abcd} = 0

This makes Nordström’s theory arguably mathematically neater than Einstein’s theory. Further, while it has all the modern features of metric theories of gravity, its predictions are in many cases quantitatively closer to the predictions of Newton’s theory. Finally, for two years it was the only game in town, as Einstein’s tensor theory was not yet finished.

But Nordström’s theory predicts no light deflection by gravitational fields and the wrong value (by a factor -\frac{1}{6}) for the advance of the perihelion of Mercury. These experimental results were not known in 1913. If we had had to compare Nordström’s and Einstein’s theories with Dawid’s principles at the time, I think we would have hastily given Nordström the win.

Another example of a promising theory that was ultimately falsified is the SU(5) Grand Unified Theory, proposed by Georgi and Glashow in 1974. The idea is to embed the gauge group U(1)\times SU(2) \times SU(3) of the Standard Model into the simple gauge group SU(5). In this theory, the three (non-gravitational) forces are the low-energy manifestations of a single force. Going towards greater unification had been a successful way to proceed, from Maxwell’s fusion of electric and magnetic phenomena to the Glashow-Salam-Weinberg electroweak unification. Further, the introduction of a simple gauge group mimics earlier approaches successfully applied to quarks and the strong interaction. The theory of Georgi and Glashow seems to leverage the unreasonable effectiveness of mathematics (as Wigner put it) in its purest form.

The SU(5) Grand Unified Theory predicts that protons can decay and have a lifetime of ~10^{31} years. The Super-Kamiokande detector in Japan has looked for such events, without success: if protons actually decay, they do so at least a thousand times too rarely to be compatible with SU(5) theory. Despite the early enthusiasm and its high score at the non-empirical confirmation game, this theory is now falsified.

Physics is full of such examples of theoretically appealing yet empirically inadequate ideas. We may also mention Kaluza-Klein type theories unifying gauge theories and gravity, S-matrix approaches to the understanding of fundamental interactions, and Einstein’s and Schrödinger’s attempts at unified theories. We can probably add many supersymmetric extensions of the Standard Model to this list, given the recent LHC null results. In many cases, we have theories that fit Dawid’s principles even better than our currently accepted theories, but that nonetheless fail experimental tests. The Standard Model and General Relativity do pretty well in the non-empirical confirmation game, but they would have been beaten by many alternative proposals. Only experiments allowed us to choose the right, yet not-so-beautiful, path.

Conclusion

Looking at failed theories makes Dawid’s principles seem less powerful than a test on a surviving subset. But I do not have a proposal to improve on them. It may very well be that they are the best one can get: perhaps we just cannot expect too much from non-empirical principles. In the end, I am not sure we can defend more than the weakest meaning of non-empirical confirmation: a slight improvement of an anyway fuzzily defined Bayesian prior.

Looking at modern physics, we see an extremely biased sample of theories: they are the old fridge that is still miraculously working. Their success may very well be more contingent than we think.

I think this calls for more modesty and open-mindedness from theorists. In light of the historically mixed record of non-empirical confirmation principles, we should be careful not to put too much trust in neat but untested constructions and remain open to alternatives.

Theorists often behave like deluded zealots, putting an absurdly high level of trust in their models and the principles on which they are built. While this may be efficient for obtaining funding, it is suboptimal for understanding Nature. Theoretical physicists too can be fooled by survival bias.

This post is a write-up of a talk I gave at an informal seminar at MPQ a few months ago. As my main reference, I used Dawid’s article The Significance of Non-Empirical Confirmation in Fundamental Physics (arXiv:1702.01133), which is Dawid’s contribution to the “Why trust a theory?” conference.

Self-promotion

Anil Ananthaswamy has written a very nice piece for New Scientist about semiclassical gravity. It deals with recent attempts (in which I have taken part) at making sense of a theory in which gravity is fundamentally classical.

[Screenshot of the New Scientist article]

The article is a bit too kind and likely gives me a more central place in this adventure than I actually deserve. Nonetheless, the article contains the right amount of qualifiers and lets the skeptics speak. I understand them all too well: our approach is clearly not without flaws. It is more a counterexample to pessimistic views about semiclassical gravity than a believable proposal for a theory of everything. And I would not be surprised if it were to be falsified in the near future. But as Lajos is quoted saying at the end: “we must explore”.

Towards the end of the article, Carlo Rovelli says he gives gravity a 99% chance of being quantum. There, I think he is being a bit overconfident about the path he and his collaborators are pursuing, although his skepticism about our own work is again warranted. Are the reasons why we think gravity should be quantum so strong? I am not sure; after all, we know very little about gravity (see this recent essay). If gravity is not semiclassical in the way we have proposed, it could be in many other ways. Fortunately, this question is answerable and will not require a particle accelerator the size of the Milky Way. If gravity is not quantum, this proposed experiment (which I had advertised here) will see it. Meanwhile, we have to remain open.

Spacetime essay

I have finally decided to put on arXiv a slightly remastered version (with figures) of my submission to the Beyond Spacetime essay contest. I have slightly tamed the provocative tone but it remains a bit rough. I still post it because I think it puts together many arguments I often make informally in seminars.

The winning essays took a slightly less head-on approach to the subject of the contest, which was “Space and Time After Quantum Gravity”:

  • Why Black Hole Information Loss is Paradoxical, by David Wallace, arXiv
  • Problems with the cosmological constant problem, by Adam Koberinski, PhilSci

Wallace’s article is an answer to Tim Maudlin’s article I had mentioned here, but I have not read it yet.

Podcast on quantum mechanics

I have been interviewed by Vincent Debierre (in French) for Liberté Académique. 

In this podcast we talk mainly about quantum mechanics and a little about its popularization. While I may be a bit provocative in my criticism of the showmanship of some physicists, I believe I did not say too much nonsense. I did get one date wrong: von Neumann’s axiomatization of quantum mechanics dates from 1932, not 1926 (which corresponds to the “first wave”, with Heisenberg, Born, and Jordan, followed by Dirac’s formulation in 1930).

Thanks to Thomas Leblé, I found a draft I had written 3 years ago on what I understood of the difficulties of quantum mechanics, which echoes what I say in this podcast. Rereading it, I am surprised to largely agree with what I wrote back then. Naturally, there are quite a few somewhat unfortunate formulations (in particular, I think one can drop “realist” in “local realist”), as well as quite a few spelling mistakes. That said, I prefer to leave the text in its raw state for now, with its problems, rather than spend an indefinite amount of time reworking it without anyone ever being able to read it. In short, you are welcome to have a look if it interests you, keeping in mind that it is an old draft.

Status of the article pipeline

Ideas have a long way to go before they become articles. For me, the first step is usually to write down the computations and main arguments on paper. Then I write a better typeset draft and iterate on it until it looks like a decent preprint. I then put it on arXiv, hoping to gather some feedback. Then I iterate further on the draft, rewrite things, correct mistakes, and send it to a first journal. The first journal typically rejects it, but hopefully I get constructive reports. So I go to a second journal, or a third, and an article usually quite different from the original idea gets published. In the process, my ego gets shattered, but the article becomes, I think, substantially better. This process is very long, and so many articles are at different stages in the “pipeline” at the same time. This week and last, I have made a bit of progress emptying the pipeline.

1) Exact signal correlators in continuous quantum measurement

This is a new preprint I am quite happy about. In continuous quantum measurement, the objective is usually to reconstruct a continuous quantum trajectory \rho_t from a noisy continuous measurement readout I_t. People often make the confusing remark that the quantum trajectory \rho_t is not directly “measurable” and is just reconstructed from a model. This is misleading: one can do projective measurements every time the state reaches a given value \rho_t = \sigma and then check that the statistics obtained do match the theoretical predictions of continuous measurement theory. Nonetheless, there is a valid point: this procedure is inconvenient, and to validate the model or measure its parameters it would be more convenient to talk only about the statistics of the continuous measurement readout. So instead of using the theory to reconstruct the state \rho_t, we can use the theory to compute the statistics of the signal I_t. This would allow one to read off the free parameters of the model from “directly” obtained quantities. This point is not mine; it has been made recently, notably in a preprint by Atalaya et al. In that article, the authors compute the n-point correlation functions of the signal for qubits with a method that is (or at least seems) ad hoc. Reading their preprint, I remembered that I knew how to compute the n-point functions in full generality; I had just never understood that it could be useful. My only fear is that the result is known but buried in the Russian literature of the nineties. In that case, it would be the end of the journey for this preprint in the pipeline.

2) Binding quantum matter and spacetime, without romanticism

This is an essay I had written for the “Beyond Spacetime” essay contest. In the absence of empirical evidence, I defend semiclassical gravity as a sober and metaphysically sounder alternative to quantum gravity. While most people advocating semiclassical gravity merely criticize the cheap rebuttals made against it, I have tried to also be constructive, pushing the explicit examples we have introduced with Lajos Diosi. The essay is deliberately provocative and certainly not optimally constructed. Perhaps I tried a bit too hard to stretch my arguments to fit the subject (and actually take its counterpoint). Anyway, while it was apparently shortlisted, it did not win the prize. So I am left with a rather specific essay I do not know what to do with.

I am not sure what will happen to it. The organizers of the contest kindly gave me the referee reports along with a suggestion to submit the essay (after corrections) to a philosophy of science journal. But I think I would need to substantially modify it to turn it into a proper article, and I am a bit overwhelmed by the task. On the other hand, although imperfect, I think this essay makes a few points that are insufficiently known. I am also tempted by the sunk cost fallacy: I have already spent quite some time on this and I would hate for it to be totally wasted. Before I make up my mind, you are welcome to read the present version of the essay.

3) Ghirardi Rimini Weber model with massive flashes

This is an article I had already talked about here. It started as a simple toy model to explain the basic idea of new approaches to semiclassical gravity. Stimulated by the prospect that such non-bullshitty foundational work might be acceptable in Physical Review Letters, I spent a bit of time polishing the sentences and shortening the article. The objective was to make it as easy to read as possible (thanks to Dustin Lazarovici, who helped me a great deal). This is a decently good letter, I think, in that it is really self-contained and readable by a general audience interested in foundations (and not the grandiosely oversold summary of a 20-page Supplementary Material that letters tend to be). Well, PRL eventually refused, but it was accepted as is in Phys. Rev. D Rapid Communications, which is probably the next best thing I could hope for. It will soon be published, but you can already check the latest version here.

Random news

  • An interesting experimental proposal to finally pin down the quantum (or classical) character of gravity has just been accepted in Physical Review Letters (arXiv). I really think it is far more urgent to explore (theoretically and experimentally) whether gravity really has to be quantized than to brute-force quantize everything head on. Hence I am very sympathetic to this proposal by Sougato Bose and his collaborators (I had already heard of it a year ago in Bangalore). Anil Ananthaswamy has written a popular summary of the article for New Scientist, and I am quoted (together with Maaneli Derakhshani) saying all the good things I think about such attempts.
  • I have written a comment (arXiv) to correct some technical mistakes made in a couple of papers on collapse models. The trouble in the original articles came from slightly too crude perturbative expansions of stochastic differential equations. I have tried to show different ways one could have obtained the correct results and explained what I think went wrong in the original derivations. The bottom line is that to compute the empirical predictions of all “standard” collapse models, it is never necessary to consider the full stochastic differential equation: the average of the solutions, which obeys a simpler linear master equation with smooth solutions, is always sufficient (a toy illustration of this averaging is sketched just after this list). This is really not a big deal, but perhaps it will simplify the life of people who want to do computations with such models.
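
To make the averaging point concrete, here is a minimal toy sketch of my own (it is not the computation of the comment, and all parameters are made up): a single qubit whose \sigma_z is continuously monitored. Each individual trajectory obeys a nonlinear stochastic master equation, but averaging many trajectories reproduces the smooth solution of the corresponding linear (Lindblad) dephasing equation, for which \langle\sigma_x\rangle(t) = e^{-2\gamma t}.

    import numpy as np

    rng = np.random.default_rng(1)
    sz = np.diag([1.0, -1.0]).astype(complex)
    sx = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)

    gamma, dt, n_steps, n_traj = 1.0, 2e-3, 1000, 300
    rho0 = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]], dtype=complex)  # qubit starts in |+>

    avg_sx = np.zeros(n_steps)
    for _ in range(n_traj):
        rho = rho0.copy()
        for k in range(n_steps):
            avg_sx[k] += np.real(np.trace(sx @ rho)) / n_traj
            # Euler-Maruyama step of the (Ito) stochastic master equation for
            # continuous monitoring of sigma_z with strength gamma.
            dW = rng.normal(0.0, np.sqrt(dt))
            mean_z = np.real(np.trace(sz @ rho))
            drift = gamma * (sz @ rho @ sz - rho)
            diffusion = np.sqrt(gamma) * (sz @ rho + rho @ sz - 2.0 * mean_z * rho)
            rho = rho + drift * dt + diffusion * dW
            rho = 0.5 * (rho + rho.conj().T)    # keep rho Hermitian
            rho = rho / np.real(np.trace(rho))  # and normalized

    t = np.arange(n_steps) * dt
    # The trajectory average should match the Lindblad prediction exp(-2*gamma*t)
    # up to Monte Carlo noise (a few percent with 300 trajectories).
    print(np.max(np.abs(avg_sx - np.exp(-2.0 * gamma * t))))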

Breaking 3h

On Saturday May 6th, on the Monza Formula 1 track in Italy, three world-class marathoners (Eliud Kipchoge, Zersenay Tadese, and Lelisa Desisa), helped by a team of Nike physiologists and biomechanics engineers, attempted to break the 2h barrier on the marathon. Eliud Kipchoge came the closest with a breathtakingly fast 2h00’25”, setting an unofficial world record (smashing the official record of Dennis Kimetto by 2 minutes). A beautiful 1h documentary by National Geographic has been made about this attempt and the 6 months preceding it. It is a beautiful ode to running, replacing Chariots of Fire in my pantheon of running movies.

While the 2h barrier is about to be broken by professional runners from the East African highlands, the more reachable 3h mark is still a tough target for amateur runners. This post is about my attempt to break it.

Why run?

Running hard a priori makes no sense. It brings fewer health benefits than running moderately, is far more painful, gets you tired, and often wears down your will for other activities. The endorphins are only a mild compensation for this absurdity. They are probably not the main reason why people train: in my opinion, runners are not primarily endorphin junkies. Running is great rather because it lets you project your worth onto hard numbers. It is hard for me to know how good a researcher I am, but I know I am worth 36′ on a flat 10km. I also know quite precisely what I could do to improve this number, almost deterministically. In running, progress is straightforward to quantify and well correlated with the work put into it.

I have been running regularly, more or less seriously, for about 12 years. But I have not really made progress in recent years. I was decently well trained when I set my records on middle distances (mainly 1500m in 4’15”), and it has thus become difficult to beat my old self. The exhilarating feeling of slow but steady progress has faded away. Trying a new distance for which I had no previous reference was a good way to get some motivation back. Marathon training is also quite different from middle distance training, and thus provides some novelty in an otherwise dull workout routine. So I started a training plan for the Munich marathon in order to add a new “number” to my profile.

The theory

There is a lot of theory around running. Not all of it is scientific, and there are quite a lot of running “tips” that are probably just superstition. That said, I think the fundamentals are more or less undisputed and so may be worth quickly explaining. I will have to be very sketchy, and the reader interested in constructing a good training plan should crawl the web or get in touch with a professional coach. I will leave out the problems of nutrition/hydration, running form/technique, and gear.

My target for the marathon was 3h. This translates into roughly 4’15” per kilometer, or 14.1 km/h. At what pace should I train? Surprisingly, rarely at 14.1 km/h. What distance should I run? Counter-intuitively, never close to the marathon distance before race day.
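
As a small sanity check of these numbers (my own helper snippet, not from the original plan): 3h over the 42.195 km distance works out to just under 4’16” per kilometer, so aiming at 4’15” (14.1 km/h) leaves a tiny buffer.

    MARATHON_KM = 42.195

    def pace_per_km(total_hours, distance_km=MARATHON_KM):
        """Average pace (min'sec" per km) needed to cover distance_km in total_hours."""
        seconds_per_km = total_hours * 3600 / distance_km
        minutes, seconds = divmod(round(seconds_per_km), 60)
        return f"{minutes}'{seconds:02d}\""

    print(pace_per_km(3.0))   # 4'16" per km for a 3h marathon
    print(42.195 / 3.0)       # ~14.07 km/h average speed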

The core of the training should be aimed at building fundamental endurance without exhausting the body for the harder workouts. It should be at a pace that is easy, at which one can “easily speak but not sing”. For me, fundamental endurance is around 5′ per kilometer, or 12 km/h. This is much slower than marathon pace, but there is no need to go faster. Going faster would not improve fundamental endurance. It might slightly enhance other abilities I will mention later, but it would do so at the price of increased muscular fatigue, which would reduce the benefits of the other workouts that require a fresh body.

At the opposite end of the training pace “spectrum” lies the speed at VO2 max. If you run faster and faster on a treadmill, the amount of oxygen consumed by your body per unit time increases steadily until it reaches a plateau. The speed at which this happens is the speed at VO2 max (vitesse maximale aérobie, VMA, in French). You can still go faster, because the body knows how to produce energy without oxygen. But at these faster speeds, the muscles produce massive amounts of lactic acid through the incomplete processing of glucose. Runners who specialize in distances from 400m to 1500m (and even 5000m, for the finish) need to increase their tolerance to lactic acid because their race pace is faster than their pace at VO2 max. For the marathon this is useless, so the fastest training pace is the pace around VO2 max, where the cardio-respiratory system is already at its maximum. It can be crudely estimated as the pace at which one can run 6 min full blast (the half Cooper test). For me it is around 19 km/h, much faster than marathon pace. Training at this pace puts the heart and the whole respiratory system under maximum strain. By pushing the aerobic boundary, it makes the slower marathon pace easier to sustain. Training at this pace is extremely tiring and difficult, and thus requires a fresh body.

[Figure: oxygen consumption as a function of running speed]

The next important pace, more fuzzily defined, is the pace close to the “lactic threshold”. Actually, even before reaching VO2 max, where the cardio-respiratory system is at its maximum, the muscles already start producing lactic acid. This lactic acid is continuously eliminated by blood circulation, but at some point it starts accumulating. The delimitation is not sharp, but for me it starts perhaps around 16-17 km/h (roughly 10km race pace).

Last comes the “specific” speed, that is, the race pace. For me again, this is a tiny bit more than 14 km/h. Interestingly, nothing much happens physiologically at this pace, so it is not great for training. One needs to run at this pace so that the body gets used to it (to make it the automatic cruising speed) and to improve running form and running economy. But there is no need to overdo it, especially at the beginning.

Training

Starting from a decent level in middle distance running but a recent lack of proper runs, I aimed for a 3-month training plan containing 5 workouts a week. The objective of a standard training plan is to put the previous pieces together. Here is what a typical week contains:

  • 2 easy endurance runs (45 min to 1h at 12 km/h)
  • 1 “short interval” training session at VO2max pace (typically 20*[30″ fast 30″ slow]), starting with a 20-30′ warm-up
  • 1 “long interval” training session around the lactic threshold (typically 6*3′), starting with a 20-30′ warm-up
  • 1 long run, 1h30 to 2h15, at endurance pace (12 km/h) with a few easy accelerations inside (say 4*5′ at 15 km/h). Going beyond 2h15 provides no benefit (or rather, the potential benefits are outweighed by the difficulty of recovering and the tremendously increased risk of injury).

The two pure endurance runs are easy and can typically come right after the 3 other, harder workouts. The two interval workouts, which both put a strong strain on the heart, should be spaced as far apart as possible.

This typical week is adapted to the first part of the training plan, as it optimally develops the physiology. It is done at paces either much slower or much faster than race pace. Only in the second half of the training plan is the gap bridged and this latter speed introduced, either during the long interval session or during the “accelerations” of the long run. This plan could be improved by adding at least one strength-training session (targeting the core muscles and legs), possibly after a shortened endurance run. The benefits are more long-term, so I did not put any in my 3-month plan. The plan made me run between 60 and 85 km a week, and I could feel tremendous progress. The two weeks before race day were 30% and 60% lighter in volume than the typical weeks, to get back some freshness. I was ready.
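
As a rough cross-check of the weekly volume (my own back-of-the-envelope tally; the warm-up, cool-down, and recovery-jog distances are assumptions, not figures from the plan), the typical week described above indeed lands in this range:

    def km(hours, speed_kmh):
        return hours * speed_kmh

    low = high = 0.0
    # 2 easy endurance runs, 45 min to 1h at 12 km/h
    low += 2 * km(0.75, 12.0); high += 2 * km(1.0, 12.0)
    # short intervals: assumed 30-40' of easy running around 20*(30" at 19 km/h, 30" at 12 km/h)
    session = 20 * (km(30 / 3600, 19.0) + km(30 / 3600, 12.0))
    low += km(0.5, 12.0) + session; high += km(0.67, 12.0) + session
    # long intervals: assumed 30-40' easy + 6*3' at ~16.5 km/h + 6*2' recovery jogs
    session = 6 * km(3 / 60, 16.5) + 6 * km(2 / 60, 10.0)
    low += km(0.5, 12.0) + session; high += km(0.67, 12.0) + session
    # long run: 1h30 to 2h15 at 12 km/h (the accelerations barely change the distance)
    low += km(1.5, 12.0); high += km(2.25, 12.0)

    print(round(low), "to", round(high), "km per week")   # roughly 60 to 80 km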

Race day

After so much training, cruising at marathon pace (4’15” per km) feels very easy, especially after two weeks of lighter workouts and supercompensation. I actually started a bit faster, stabilizing around 4’09” per km, where I felt just fine. The first 10 km were naturally very easy; I just had to focus on staying calm and drinking properly. I clocked 41′ (3h pace gives 42’30”) without any effort. Going up to the half marathon mark at the same pace required a bit more effort. From the half mark to the 30 km mark, the pain slowly ramped up. Maintaining the pace started to require constant pressure from the mind, and the feeling of cruising seamlessly faded away. After 30 km, the amateur marathoner enters terra incognita. Long runs are almost never longer than 30 km during the training months because going beyond destroys the body too much for vanishing benefits. But on race day, the end is at 42 km. From 30 km to about 38 km, I almost maintained the pace, at the price of a huge mental effort. At that point the pain is everywhere in the body; every single cell lacks oxygen and glycogen. Around the 38 km mark, physiology beat my mind. At some point, no matter how much the mind pushes, the chemistry in the muscles says no. My body slowed down and my mind had to agree. I struggled to stay under 4’45” per kilometer, which is only barely faster than the usually trivial fundamental endurance pace. This tough mind-body bargain brought me to the finish line.

Official chip time 2h56’55”. Mission accomplished.


It is a decently good marathon, especially for a first one. The slowdown happened quite late in the race and was not catastrophic. Otherwise, the pace was steady. Now I can say I am worth “below 3”. This will compensate for any blow to the ego from the rejection of an article by angry referees. At least for a short while. But next time, to get the same feeling of achievement, I will have to go faster.

The risk of false positives in epidemiology

Recently, a study on endocrine disruptors carried out partly by INSERM researchers has been discussed quite a lot in the French mainstream media (see for example Le Monde, France Inter, Le Point, Le Parisien, Libération). The alleged conclusion of the study is the following: young kids whose mother was exposed to certain phenols and phthalates have an increased risk of developing behavioral disorders (such as hyperactivity). Such a conclusion, if it were really robustly supported by the study, would no doubt deserve wide coverage. Unfortunately (or fortunately), that does not seem to be the case. After reading the study, it seems to me that if an adverse effect exists, it is small enough to be statistically insignificant in a sample of more than 500 people.

Before jumping into the details and criticizing the publicity around a statistically underpowered study, let me provide a disclaimer. I am neither a biologist nor a medical doctor. I have no specific knowledge of the mechanisms by which phthalates and phenols may adversely impact the human body. It is entirely possible that these compounds indeed have a damaging effect on humans at a large scale, something the rest of the literature suggests. My objective is also not to discuss the experimental aspects of the study’s protocol (urine samples, surveys, etc.); I am not qualified to do so. I would just like to discuss the statistical part, which illustrates a recurring difficulty in epidemiology and psychology. It is also an excuse to think about the mainstream reporting of such discoveries.

The study

[Screenshot of the study’s abstract]

Let me briefly introduce the study. The objective of the authors was to see whether chemicals from the families of phenols and phthalates had adverse effects on the behavior of young boys. They used a urine sample from the mother as a proxy for the fetus’s exposure in utero, and two surveys (answered by the mother) as proxies for the boy’s behavior at ages 3 and 5. The number of participants declined from 529 to 464 over the course of the study. According to the abstract, they found that bisphenol A (BPA) was associated with relationship problems at age 3 and hyperactivity-inattention problems at age 5. Three other compounds (mono-n-butyl phthalate, monobenzyl phthalate, and triclosan) were also found to have the same kind of adverse effects on various behavioral subtests.

[Screenshot of the results reported in the abstract]

These results all seem statistically significant and are presented as such. So it looks like the news reports are accurate. Reading the main text, however, gives a different message. To understand why, we need to make a detour through elementary statistics.

The objective of the study is to see whether there exists a correlation (hinting at causation, in this context) between exposure to some compounds and behavior. The difficulty is that behavioral problems may happen to all kids, whether or not they have been exposed to endocrine disruptors. To simplify, we can imagine that there are two groups of equal size, one that has been exposed and one that has not. Because there will be kids with behavioral problems in both groups, it may happen that the difference in the rates observed in the two groups is just the result of chance. Tossing a coin 100 times may give 55 heads and 45 tails even if the coin is not biased, just out of “typical” randomness. We can never be sure that a result is not just a statistical fluke. What we can do, however, is quantify the probability that the result would have occurred had it been produced by randomness alone. We can ask: what is the probability p that we would have found the observed ratio of kids with problems in each group, had the two groups been the same? The definition of this probability may look convoluted, but one should be careful not to simplify it into “the probability that there is an effect”. This latter probability is impossible to know without additional assumptions.

The threshold for the probability p is often put at 5%. Usually, one says a result is statistically significant if the data are such that chance would have produced them in less than 5% of cases. The results found in the abstract of the study and mentioned above all meet this threshold. They thus look quite robust according to the standards of the field. But they are not.
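
To make the coin example quantitative (a small sketch of mine, not part of the study): 55 heads out of 100 fair tosses is entirely unremarkable by this criterion.

    from scipy.stats import binomtest  # scipy >= 1.7

    # Probability that a fair coin gives at least 55 heads out of 100 tosses.
    result = binomtest(55, n=100, p=0.5, alternative="greater")
    print(result.pvalue)   # ~0.18, far above the 5% threshold: no evidence of bias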

False discovery rate, or the “look-elsewhere effect”

Let us go back to the coin toss example. Let us say I have tossed a coin 100 times and found 65 heads and 35 tails. The probability that a fair coin gives 65 heads or more is well below 5%, so I would tend to say that my coin is biased. Let us now say that I have 1000 different coins. I toss the first one 100 times and get, say, 55 heads and 45 tails. Maybe it’s unbiased. So I test the next one, and the next one, and the next one… At some point, say at the 258th coin, I find 65 heads and 35 tails. Does that mean this coin is biased? Taken alone, the result looks statistically significant. Considering the experiment as a whole, it could perfectly well have happened by chance: something unlikely is likely to happen if it is attempted many times. In that case, unless I get the opportunity to test the 258th coin again, I cannot say much.
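
A quick simulation of this many-coins story (my own illustration): even with perfectly fair coins, a few of the 1000 clear the 5% bar, and occasionally one reaches 65 heads.

    import numpy as np

    rng = np.random.default_rng(0)
    heads = rng.binomial(n=100, p=0.5, size=1000)   # 1000 fair coins, 100 tosses each

    # Individually "significant" outcomes produced by chance alone:
    print(np.sum(heads >= 60))   # typically a few dozen coins (p of about 3% each)
    print(np.sum(heads >= 65))   # typically one or two coins (p of about 0.2% each)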

So if we test many hypotheses at the same time, we are sure that some will look individually statistically significant. If this is not taken care of, there is a serious risk of false discovery. This is a very well known problem in epidemiology and in psychology. Most researchers now correct their significance thresholds to take into account the many hypotheses they have explored. Sometimes, however, this is difficult because the hypotheses that were tested but gave null results are simply not published, or not even remembered if they were only briefly considered. This difficulty is at the root of the replication crisis in psychology, where many historically important results cannot be reproduced and are now understood to have been the result of pure chance.

In particle physics, experimentalists face a similar problem. To look for new particles, they check whether the signal they measure can be explained by the random background of known physics. But because they have many signals, at different energies and in different production channels, they are effectively testing many hypotheses, and so they have to be careful not to trust results that are only weakly locally significant. To avoid false discoveries, the threshold on p, the probability that the background would have produced such a result, is traditionally pushed to an absurdly small value of 5 sigma (i.e. p < 3\times 10^{-7}). Additionally, when preliminary searches do not reach this stringent threshold, experimentalists usually provide the statistical significance of their results corrected for the “look-elsewhere effect”, that is, for the fact that they have explored many hypotheses. This does not mean that physicists never get excited by statistical flukes (see e.g. this), but at least the field is well protected from false positives.
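
For reference, here is the translation between the two conventions (my own two-liner; the exact definition of “5 sigma” can vary slightly between experiments):

    from scipy.stats import norm

    print(norm.sf(5))          # one-sided 5-sigma tail: ~2.9e-7, the "p < 3x10^-7" above
    print(2 * norm.sf(1.96))   # ~0.05: the usual 5% threshold corresponds to about 2 sigma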

Back to the study

The study we are interested in is very transparent about which hypotheses were tested and is extremely precise from that point of view. The authors are models of rigor. The results are summarized in 4 similar tables that only differ in some minor subtleties (test at 3 or 5 years) and in the way statistical corrections are applied. The first one is shown below as an illustration:

[First results table of the study]

Two stars next to a number mean that it is individually statistically significant. For example, in the “Bisphenol A” row and “peer relationship problems” column, we have a number larger than 1 (which here means a positive correlation) and two stars, meaning that the result is statistically significant (p < 5%). But considering the table as a whole immediately points to the previous difficulty. The authors have measured the correlation between 15 compounds and 7 behavioral subtests, that is, a total of 105 hypotheses. For 7 of them, the authors find a statistically significant effect, which is close to the ~5 one would typically obtain from pure chance alone (105 × 5%), as in the coin tossing example. Interestingly, MiBP is even found to reduce hyperactivity problems! This is not surprising and is easily attributed to the testing of too many hypotheses with a low significance threshold. But if one rightfully finds the beneficial effect of MiBP likely spurious, one is bound to conclude that the same could just as well be true of the adverse effects reported.

In the main text, the authors are perfectly honest and acknowledge the fact that the results considered as a whole are not statistically significant.

“When we applied a correction for multiple comparisons using an FDR [false discovery rate] method, none of the associations reported in the results section remained significant, the lowest corrected p-value being 0.42 for the association between BPA and the peer relations problem score at 3y.”

As a result, they later acknowledge that the results may be due to chance.

“Because we tested many associations, some of our results might be chance finding, as suggested by the fact that none of the observed p-values remained significant after FDR correction.”

The “might” might be a bit too weak given that the results are far from being statistically significant globally. The rest of the article is devoted to an interesting discussion of the previous literature that may go in the same direction as the individually significant results. This is aimed at providing a justification to “keep” them and mention them in the conclusion. It is indeed rational, from a Bayesian point of view, to give more credit to results that confirm previous findings. Nonetheless, as the justification is likely found a posteriori in the literature, there is a strong risk of confirmation bias. I think that in the present situation, the best one can honestly conclude is that the study does not explicitly contradict the existing literature, but does not allow one to reliably conclude one way or the other. Perhaps there is one comforting, robust result: the effect of the compounds tested, if it exists, is small enough to remain statistically insignificant in a sample of 500 people. If endocrine disruptors are causing some behavioral problems, this study would suggest that they are not the main cause.
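
For readers curious about what such an FDR (Benjamini-Hochberg) correction does in practice, here is a sketch with simulated p-values (mine, not the study’s): a handful of raw p-values below 0.05 among roughly 105 null tests typically does not survive the correction.

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(3)
    p_values = rng.uniform(size=105)     # 105 tests with no true effect anywhere

    rejected, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

    print(np.sum(p_values < 0.05))   # raw "discoveries": about 5 expected by chance
    print(np.sum(rejected))          # surviving the FDR correction: usually none
    print(p_corrected.min())         # smallest corrected p-value, usually well above 0.05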

One may wonder why the researchers tested so many hypotheses knowing it would likely ruin the statistical significance of their results. It is actually totally understandable. The authors had a unique and rare dataset that took years to collect. It is tempting to “make the best of it” and explore as many associations as possible. The drawback is that the study becomes merely exploratory. Further studies are needed to rigorously assess the effects of each molecule individually. This is very standard, but it is important to understand that one should not conclude anything, and especially not base any public policy, on exploratory studies (unless they find globally statistically significant results). Indeed, such studies will typically have a huge rate of false positives. The same study, replacing the 15 molecules by 15 different varieties of carrots or potatoes, would typically have found that a handful of them had individually statistically significant adverse effects on some behavioral trait. As I hope this illustrates, basing public policies on associations coming from such exploratory studies would not be a reasonable application of the precautionary principle; it would be pure insanity.

Who is to blame?

Given the important limitations of the study, the reporting around it was clearly excessive and people were given an inaccurate presentation of the results. Not a single one of the news articles I have seen alludes to the statistical weakness of the study. So who is to blame? Interestingly, I do not think that there is any major culprit in this situation, but rather a succession of small inefficiencies in modern science and in the way it is reported.

The authors of the study cannot easily be blamed. Their article is a marvel of academic writing: it is perfectly correct while subliminally implying a little bit more than it actually demonstrates. The abstract is factually correct and only “omits” the qualifier that the findings are only locally statistically significant. But the authors might defend themselves by arguing that all the limitations of a study cannot be mentioned in the abstract (otherwise, who would need the full study?). The editor or referees could have requested that the abstract not only display technically correct information but also not mislead the public. This would have forced the authors to report a negative result. But after all, one can read the main text, especially as the journal is open access.

In the present case, the journalists have most likely not even read the abstract of the article. Most of the mainstream articles were clearly reporting only what was written in the official press release from INSERM. The press release does not mention a single relevant limitation of the study. I think this is where a mistake was made. Journalists should never take a press release on recent scientific findings as the sole source for a piece. One can write anything in a press release. Even if INSERM is a respectable and reliable institution, its researchers are human. They will naturally be tempted to inflate the impact of their results in a non-peer-reviewed release. Researchers need to secure funding, they want their students to be hired in the future, and they have an ego like everyone else.

It is natural for journalists to distrust press releases coming from companies like Monsanto and to cross-check everything that is written in them. This is natural and desirable: the company may present its findings in a biased way to push its agenda. But journalists should not entirely drop their skepticism when the research comes from public or independent research institutions. Again, researchers are almost always acting in good faith, but they are human.

Conclusion

Do low doses of phenols and phthalates have adverse effects on the health of young boys? From the study I discussed, I think it is impossible to know. This does not mean it is not the case. The effect may be subtle enough that the study I mentioned only spotted it in a statistically insignificant way. Research on larger populations may find a robust association in the future. Should one modify the regulation of phenols and phthalates based only on the study I discussed? I think not. If one takes seriously all the locally statistically significant adverse effects found in exploratory studies, one could end up banning carrots, potatoes, or even water. This does not mean that there are no other studies telling us to tightly regulate phenols or phthalates. Unfortunately I do not know the literature well, but this review, for example, seems to point to unquestionable adverse effects of BPA, especially on reproductive functions. So there are probably good reasons to tightly regulate BPA. That said, pushing forward studies that are not really conclusive and thus easy to rebut is probably not the best way to get such a regulation in place.

New article in Quantum

My preprint from January “Time-local unraveling of non-Markovian stochastic Schrödinger equations” has been accepted and published in Quantum. It is a new open-access journal managed by physicists (mine is just the 29th article!):


“Quantum is a non-profit and open access peer-reviewed journal that provides high visibility for quality research on quantum science and related fields. It is an effort by researchers and for researchers to make science more open and publishing more transparent and efficient.”

I indeed had a smooth experience with the peer-review process, which is done through the well-designed Scholastica platform. It was not particularly fast, but at least it was very transparent, and the editor was quick to kindly update me about what was going on.

I have actually been lucky that peer review took some time, because it allowed me to spot a serious but quite subtle mistake in the original submission. I had severely misunderstood a step, which made a whole section incorrect. The upside is that I finally managed to do what I initially claimed to do in this section, although I ended up doing it in a totally different (and perhaps even nicer) way. At that point, I was a bit suspicious of my own ability to carry out analytical computations correctly. So I added numerics to check, on a not-totally-trivial example, that the critical step I had messed up was now correct. So please check only the new published version and forgive me for the previous flawed preprints on arXiv. Everything should be fine now.

[Figure: the reassuring numerics]