<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">NEJSDS</journal-id>
<journal-title-group><journal-title>The New England Journal of Statistics in Data Science</journal-title></journal-title-group>
<issn pub-type="ppub">2693-7166</issn><issn-l>2693-7166</issn-l>
<publisher>
<publisher-name>New England Statistical Society</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">NEJSDS4REJ</article-id>
<article-id pub-id-type="doi">10.51387/22-NEJSDS4REJ</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Commentary and/or Historical Perspective</subject></subj-group>
<subj-group subj-group-type="area"><subject>Statistical Methodology</subject></subj-group>
</article-categories>
<title-group>
<article-title>Rejoinder of “Four Types of Frequentism and Their Interplay with Bayesianism”<xref ref-type="fn" rid="j_nejsds4rej_fn_001"><sup>✩</sup></xref></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Berger</surname><given-names>James</given-names></name><email xlink:href="mailto:berger@duke.edu">berger@duke.edu</email><xref ref-type="aff" rid="j_nejsds4rej_aff_001"/>
</contrib>
<aff id="j_nejsds4rej_aff_001">Department of Statistical Science, <institution>Duke University</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:berger@duke.edu">berger@duke.edu</email></aff>
</contrib-group>
<author-notes>
<fn id="j_nejsds4rej_fn_001"><label>✩</label>
<p>Main article: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/22-NEJSDS4">https://doi.org/10.51387/22-NEJSDS4</ext-link>.</p></fn>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2022</year></pub-date><volume>1</volume><issue>2</issue><fpage>147</fpage><lpage>148</lpage><history><date date-type="accepted"><day>18</day><month>7</month><year>2022</year></date></history>
<permissions><copyright-statement>© 2023 New England Statistical Society</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions><related-article related-article-type="commentary-article" ext-link-type="doi" xlink:href="https://doi.org/10.51387/22-NEJSDS4" id="j_nejsds4rej_ppc_001"/>
</article-meta>
</front>
<body>
<p>Our thanks to all the discussants for their enlightening comments and valuable perspectives.</p>
<sec id="j_nejsds4rej_s_001">
<title>Response to Luis Pericchi</title>
<p>Pericchi’s Table 1 and Figure 1 are interesting in that they indicate that the empirical error in testing (called fdr therein) is more sensitive to <inline-formula id="j_nejsds4rej_ineq_001"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> than to <italic>β</italic>. This is also clear from the odds expression in equation 18 in the paper; the odds change more rapidly with changes in <inline-formula id="j_nejsds4rej_ineq_002"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> (the derivative of the log odds with respect to <inline-formula id="j_nejsds4rej_ineq_003"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> is <inline-formula id="j_nejsds4rej_ineq_004"><alternatives><mml:math>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mo fence="true" stretchy="false">[</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo fence="true" stretchy="false">]</mml:mo></mml:math><tex-math><![CDATA[$1/[{\pi _{0}}(1-{\pi _{0}})]$]]></tex-math></alternatives></inline-formula>) than with changes in <italic>β</italic> (the derivative of the log odds with respect to <italic>β</italic> is <inline-formula id="j_nejsds4rej_ineq_005"><alternatives><mml:math>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">β</mml:mi></mml:math><tex-math><![CDATA[$1/\beta $]]></tex-math></alternatives></inline-formula>). So refusal to even consider <inline-formula id="j_nejsds4rej_ineq_006"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> is questionable.</p>
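<p>The two derivatives quoted above can be spelled out in one line. The factored form of the odds below is our assumed reading of equation 18 (with <italic>β</italic> denoting power and <italic>α</italic> the Type I error); only the magnitudes of the derivatives are used in the comparison.</p>

```latex
% Assumed reading of equation 18 (beta = power, alpha = Type I error):
%   O = \frac{(1-\pi_0)\,\beta}{\pi_0\,\alpha}
\log O = \log(1-\pi_0) - \log \pi_0 + \log\beta - \log\alpha
\quad\Longrightarrow\quad
\left|\frac{\partial \log O}{\partial \pi_0}\right|
  = \frac{1}{\pi_0} + \frac{1}{1-\pi_0}
  = \frac{1}{\pi_0(1-\pi_0)} \ge 4,
\qquad
\frac{\partial \log O}{\partial \beta} = \frac{1}{\beta}.
```

<p>Since <inline-formula><tex-math><![CDATA[$1/[{\pi _{0}}(1-{\pi _{0}})]$]]></tex-math></inline-formula> is at least 4 (its minimum, at <inline-formula><tex-math><![CDATA[${\pi _{0}}=1/2$]]></tex-math></inline-formula>) and grows without bound as <inline-formula><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></inline-formula> approaches 0 or 1, while <inline-formula><tex-math><![CDATA[$1/\beta $]]></tex-math></inline-formula> stays moderate unless <italic>β</italic> is small, the greater sensitivity to <inline-formula><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></inline-formula> is immediate.</p>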
<p>Indeed, when <inline-formula id="j_nejsds4rej_ineq_007"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> is unknown, it is natural for a Bayesian to treat it as just another unknown to be given a prior distribution. Frequentists, however, more commonly estimate <inline-formula id="j_nejsds4rej_ineq_008"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> via, say, empirical Bayes.</p>
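<p>As one concrete illustration of such an empirical-Bayes route (not taken from the paper), a Storey-type estimator exploits the fact that p-values well above a cutoff <inline-formula><tex-math><![CDATA[$\lambda $]]></tex-math></inline-formula> come almost entirely from true nulls; the simulated mixture below is purely illustrative.</p>

```python
import numpy as np

# Minimal sketch of a Storey-type empirical-Bayes estimate of pi_0.
# The mixture below is illustrative, not data from the paper.
rng = np.random.default_rng(0)

n_null, n_alt = 9000, 1000               # true pi_0 = 0.9
p_null = rng.uniform(size=n_null)        # p-values of true nulls are Uniform(0,1)
p_alt = rng.beta(0.1, 1.0, size=n_alt)   # alternatives concentrate near 0
p = np.concatenate([p_null, p_alt])

# p-values above lambda are mostly from true nulls, so
# pi_0 ~= #{p_i > lambda} / (N * (1 - lambda)).
lam = 0.5
pi0_hat = np.mean(p > lam) / (1 - lam)
print(round(pi0_hat, 3))  # close to the true value 0.9
```

<p>The estimator is slightly biased upward because a few alternatives also land above <inline-formula><tex-math><![CDATA[$\lambda $]]></tex-math></inline-formula>, which is why it is usually treated as a conservative estimate of <inline-formula><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></inline-formula>.</p>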
<p>Improving the <inline-formula id="j_nejsds4rej_ineq_009"><alternatives><mml:math>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mspace width="0.1667em"/>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mspace width="0.1667em"/>
<mml:mo movablelimits="false">log</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi></mml:math><tex-math><![CDATA[$-e\hspace{0.1667em}p\hspace{0.1667em}\log p$]]></tex-math></alternatives></inline-formula> bound on a Bayes factor is certainly a worthy goal, and we wish Pericchi success in this endeavor.</p>
</sec>
<sec id="j_nejsds4rej_s_002">
<title>Response to Judith Rousseau</title>
<p>Rousseau’s concern about the lack of precision in the notion of empirical frequentism is understandable, since we purposely avoided being precise, to allow for flexibility. She does make the helpful and clarifying distinction that, however it is defined, empirical frequentism should be based on a sequence of observable events, rather than a sequence of unobservable events. In hypothesis testing, for instance, ‘rejections’ are observable events, so studying what happens under ‘rejections’ is compatible with empirical frequentism. But basing the evaluation on a series of unobservable events, such as the set of all true null hypotheses (the series of events used to define Type I error), would not qualify as empirical frequentism.</p>
<p>This is also complicated by the fact that empirical frequentism imagines that one learns the truth for the considered events, e.g., learns which of the null hypotheses are true in the sequence of rejected events. Sometimes this is somewhat realistic, in that rejections are ideally followed by efforts at replication. But we could never learn which nulls were true in the set of acceptances, so Type I error could not be determined from the series of real experiments.</p>
<p>Rousseau shows that one can make some rather strange and unhelpful empirical frequentist statements, reinforcing that care in the definition is needed.</p>
<p>Rousseau mentions E-values and states that <inline-formula id="j_nejsds4rej_ineq_010"><alternatives><mml:math>
<mml:mo movablelimits="false">log</mml:mo>
<mml:mi mathvariant="italic">E</mml:mi></mml:math><tex-math><![CDATA[$\log E$]]></tex-math></alternatives></inline-formula> might well have an empirical frequentist justification. That would be nice because E-values lack a procedural frequentist justification. An E-value, <inline-formula id="j_nejsds4rej_ineq_011"><alternatives><mml:math>
<mml:mi mathvariant="italic">E</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$E(x)$]]></tex-math></alternatives></inline-formula>, satisfies the condition <inline-formula id="j_nejsds4rej_ineq_012"><alternatives><mml:math>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">E</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi>
<mml:mo stretchy="false">∣</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi></mml:math><tex-math><![CDATA[$P(1/E(x)\le \alpha \mid {H_{0}})\le \alpha $]]></tex-math></alternatives></inline-formula>, so the procedure “reject if <inline-formula id="j_nejsds4rej_ineq_013"><alternatives><mml:math>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">E</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi></mml:math><tex-math><![CDATA[$1/E(x)\le \alpha $]]></tex-math></alternatives></inline-formula>” does have the procedural frequentist property of having Type I error controlled at level <italic>α</italic>. But reporting <inline-formula id="j_nejsds4rej_ineq_014"><alternatives><mml:math>
<mml:mi mathvariant="italic">E</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$E(x)$]]></tex-math></alternatives></inline-formula> itself has no obvious procedural frequentist justification. (The situation is exactly the same as with a <italic>p</italic>-value: <inline-formula id="j_nejsds4rej_ineq_015"><alternatives><mml:math>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi>
<mml:mo stretchy="false">∣</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi></mml:math><tex-math><![CDATA[$P(p(x)\le \alpha \mid {H_{0}})=\alpha $]]></tex-math></alternatives></inline-formula>, but directly reporting <italic>p</italic> does not have any procedural frequentist justification.)</p>
</sec>
<sec id="j_nejsds4rej_s_003">
<title>Response to Aad van der Vaart</title>
<p>Van der Vaart starts out with a fun journey detailing his personal impressions of the various frequentist types. Everything he says here is sensible. I particularly liked the statement that an empirical frequentist is a practicing statistician, while a procedural frequentist is a theoretician, and that both have value. The comments on consistency are also nice; indeed, consistency does not exactly fit the definition of procedural frequentism. The suggestion that one probably needs more refinement in the ‘types’ of frequentism, such as making empirical Bayes its own ‘type’, certainly has merit; this point is reiterated later in the discussion, in the context of multiple testing.</p>
<p>In regard to ordinary testing, Van der Vaart notes that there are possible empirical frequentist targets other than the empirical false discovery rate. He mentions two: the fraction of incorrect rejections amongst all true nulls, and the fraction of incorrect rejections amongst all tests, and notes that both are bounded by <italic>α</italic>. The first is just the Type I error, and we would argue that this is not a valid empirical frequentist target, as it is not based on what happens with observables. The second is a valid empirical frequentist target, but not a reasonable one; why normalize the incorrect rejections by all tests <italic>N</italic>, rather than just the rejections?</p>
<p>In the discussion of multiple testing in the paper, note that each <inline-formula id="j_nejsds4rej_ineq_016"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${E_{i}}$]]></tex-math></alternatives></inline-formula> in the sequence <inline-formula id="j_nejsds4rej_ineq_017"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${E_{1}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_nejsds4rej_ineq_018"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${E_{2}}$]]></tex-math></alternatives></inline-formula>, … is itself a multiple testing scenario, so we are looking at a sequence of different multiple tests; it is this sequence that is evaluated according to empirical frequentism. Bonferroni and regular FDR are both procedural frequentist properties, computed under the null hypothesis, so the goal was to study the empirical frequentist performance of such reports in repeated use. As with ordinary testing, it does not seem possible to find a sensible empirical frequentist measure that avoids involvement of the prior probabilities of the hypotheses. As van der Vaart notes, such involvement is clearly feasible in situations where <inline-formula id="j_nejsds4rej_ineq_019"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></alternatives></inline-formula> can be estimated.</p>
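<p>To make the procedural/empirical contrast concrete, the sketch below (our illustration, not an example from the paper) runs the Benjamini–Hochberg step-up rule on a simulated mixture in which <inline-formula><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></inline-formula> is known, so the realized false discovery proportion can be observed alongside the procedural guarantee <inline-formula><tex-math><![CDATA[$\text{FDR}\le \alpha $]]></tex-math></inline-formula>.</p>

```python
import numpy as np

# Illustration (not from the paper): Benjamini-Hochberg is a procedural
# frequentist guarantee (FDR <= alpha), computed before knowing which nulls
# are true; in a simulated mixture we can also watch the realized false
# discovery proportion, which concentrates near pi_0 * alpha.
rng = np.random.default_rng(2)

n_null, n_alt = 9000, 1000               # true pi_0 = 0.9
p = np.concatenate([rng.uniform(size=n_null),
                    rng.beta(0.05, 1.0, size=n_alt)])  # alternatives near 0
is_null = np.arange(p.size) < n_null

alpha = 0.1
order = np.argsort(p)
thresh = alpha * np.arange(1, p.size + 1) / p.size   # BH step-up thresholds
below = p[order] <= thresh
k = below.nonzero()[0].max() + 1 if below.any() else 0  # largest step-up index
rejected = np.zeros(p.size, dtype=bool)
rejected[order[:k]] = True                           # reject k smallest p-values

fdp = is_null[rejected].mean() if k else 0.0         # realized FDP
print(k, round(fdp, 3))
```

<p>In such runs the realized FDP sits near <inline-formula><tex-math><![CDATA[${\pi _{0}}\alpha $]]></tex-math></inline-formula> rather than <italic>α</italic>, which is precisely the gap that an estimate of <inline-formula><tex-math><![CDATA[${\pi _{0}}$]]></tex-math></inline-formula> could recover.</p>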
</sec>
</body>
</article>
