1 Introduction
Digital marketing through emails, social media, webinars, podcasts, and other channels is commonly used across industries. For example, email marketing is a powerful tool used by many business-to-business and business-to-consumer companies, and about 87% of marketers use it to disseminate their content. Email marketing has a high return on investment (ROI) ($42 for every $1 invested, on average); about 4 billion individuals send about 300 billion emails each day, and these figures are expected to grow. About 80% of small and medium-sized businesses rely on emails for customer acquisition and retention, and a wide spectrum of industries, including software and technology, hospitality, entertainment, retail, and consumer goods, depends on emails as the primary form of promotion [7].
In light of the widespread use of emails, it is not surprising that experiments are often used to improve email effectiveness. Common factors tested in email experiments include plain text vs. HTML, image A vs. B, the location of an image (e.g., right vs. left aligned), template design C vs. D, day of the week, time of day, personalization (e.g., first name vs. no name), image call to action (CTA) vs. text CTA, familiar vs. professional tone, and long vs. short emails.
Statistical design (A/B or multivariate testing) plays a key role in email marketing. While A/B tests are effective in assessing the performance of one factor at a time, multivariate tests are far more powerful because they can determine the optimal combination of several factors at once. Interaction effects of email factors can also be assessed in a multivariate experiment. Online testing is a popular method to improve the layout of digital products such as a website or an app. It is usually conducted to increase engagement and conversion metrics, e.g., page visits, click-through rate, and purchases. In its general form, online testing includes multiple attributes of a digital product, and the effects of these attributes on a response variable are studied simultaneously. Factorial designs are increasingly used to perform online testing; for example, see [3]. As a unique challenge in digital spaces, online testing is conducted across multiple platforms, including desktops, tablets, smartphones, and smartwatches. A customer can interact with an application on any of these platforms, and a different set of attribute combinations may optimize the engagement metric on each platform. For example, although the presence of multiple images may work best for an application on a tablet, a series of links might work best for the same application on a smartwatch.
Recent research in marketing points to the fact that potential buyers follow different paths to purchase [4] that may involve different devices. For example, a person may initiate a purchase process on their smartphone at home, continue to evaluate alternatives at their desktop computer during lunch, and purchase the product on their laptop or tablet at home. This multi-device path to purchase requires marketers to ensure that their website is optimized for the user experience on a variety of device types (smartphone, tablet, laptop, and desktop). The display advertising [2] and retargeted display advertising [5] copy that potential buyers see should be optimized for the different device types. Such optimization is not limited to device types alone. Variations also occur across four browser types (Chrome, Internet Explorer, Firefox, and Safari). Moreover, marketers may want to optimize their display advertising campaigns across the four social media outlets Facebook, Instagram, Twitter, and LinkedIn.
[8] introduced a sliced version of the minimum aberration criterion to accommodate online experiments with two platforms. This article extends this method to construct sliced factorial designs for online experiments with four platforms using the method of replacement from [1] and [11]. The proposed designs are applied to an industrial email campaign by a network company. The goal of the campaign is to identify which attributes of the campaign are most effective in impacting the measured outcome (e.g., the open rate). The email design team of the company identified six binary design factors for the multivariate test for four platforms: Android, iOS, Windows, and macOS.
The remainder of the article is organized as follows. Section 2 introduces the email campaign problem faced by the company. Section 3 provides a design solution to this campaign using sliced factorial designs for the four platforms and generalizes the method to any number of design factors. Section 4 gives the results of applying this design in the email campaign. Section 5 concludes with a discussion.
2 Motivating Example
The network company launched an email blast to identify which among the six attributes are most effective in impacting the measured outcome. The email design team sent its customers an email with brief information on a market research report. To maintain confidentiality, we have masked parts of the email that may reveal the company name. There are six binary design factors in the multivariate test for four platforms ${P_{1}}$, ${P_{2}}$, ${P_{3}}$, and ${P_{4}}$. Platform ${P_{1}}$ refers to Android, ${P_{2}}$ to iOS, ${P_{3}}$ to Windows, and ${P_{4}}$ to macOS. The slice factor S is defined as a four-level factor whose jth level represents ${P_{j}}$. The six binary design factors are thumbnail, subject line, asset type, header image, preview text, and content display. If a full factorial design were used, we would have to create ${2^{6}}$ versions for each of the four platforms. Blocking is a common method to form blocks of homogeneous units in a factorial design. While this method works well in agricultural and engineering applications where the treatment-blocking interaction is negligible [10], it is ill-suited for online experiments with multiple platforms [8]. If one uses the slice factor S as a block factor to construct a blocked factorial design d with blocks ${d_{1}},\dots ,{d_{4}}$ (for example, a 32-run ${2^{6-1}}$ fractional factorial design with generators $6=12345$, ${B_{1}}=134$, and ${B_{2}}=234$), then S would be aliased with higher-order interaction effects of the design factors. This assumes that the slice factor S has a negligible interaction with the design factors, an assumption that contradicts the primary goal of understanding how the effects of the six design factors may interact with the four platforms. Practical constraints, such as the budget for extensive programming, further limit the number of versions.
The company we work with can only afford up to eight versions for each of the four platforms and is interested in modeling the interaction between the design factors and the four platforms in addition to the factorial effects of the design factors. None of the aforementioned designs fits these requirements. We decided to use the ${2^{6+2-3}}$ minimum sliced aberration design constructed in Section 3 for the email campaign.
Using the design to be generated in Section 3, we created ${2^{3}}$ versions to perform the multivariate testing. We use the ${2^{6+2-3}}$ minimum sliced aberration design for four platforms. Table 1 lists the six binary design factors identified for this study: 1: thumbnail, 2: subject line, 3: asset type, 4: header image, 5: preview text, and 6: content display. For each factor, we label the two levels as + and −.
Table 1
Six binary design factors for an industrial email blast.
Factor  +  − 
1 Thumbnail  Yes  No 
2 Subject Line  Direct ("Juniper Is a Leader...")  Indirect ("Take me to your Leader") 
3 Asset Type  With ("Report:")  Without 
4 Header Image  Including  No 
5 Preview Text  No  Including 
6 Content Display  Paragraph in the body  Bullet Points 
Each platform has eight versions for this multivariate testing. The eight versions form a ${2^{6-3}}$ fractional factorial design. The first version of our design has all six design factors at the − level, as presented in Table 3. Version two has factors 1, 2, and 3 at the + level and the remaining three factors at the − level. Similarly, version three has factors 1, 4, and 5 at the + level while the other three factors are at the − level. Table 2 lists the descriptions of the eight versions. Tables 3 and 4 show the emails of all eight versions used in the campaign.
Table 2
Description of eight versions of the email study.
Version  Thumbnail  Subject Line  Asset Type  Header Image  Preview Text  Content Display 
1  No  Indirect  Without  No  Including  Bullet Points 
2  With  Direct  With  No  Including  Bullet Points 
3  With  Indirect  Without  Including  No  Bullet Points 
4  No  Direct  Without  Including  Including  Paragraph 
5  No  Indirect  With  No  No  Paragraph 
6  With  Direct  Without  No  No  Paragraph 
7  With  Indirect  With  Including  Including  Paragraph 
8  No  Direct  With  Including  No  Bullet Points 
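The eight versions in Table 2 form a regular fraction generated from the basic factors 1, 2, and 3 via the words 124, 135, and 236. As an illustrative check (our own sketch, not part of the original analysis), the code below regenerates the eight runs; the negative signs attached to the generators are an assumption chosen so that the runs reproduce Table 2 exactly (the aliasing structure is identical for either sign choice).

```python
from itertools import product

# Generate the 2^(6-3) sub-design used on each platform. Columns 1-3 are the
# basic factors; columns 4-6 come from the interaction words 124, 135, 236.
# The minus signs on the generators (4 = -12, 5 = -13, 6 = -23) are one sign
# choice that reproduces the versions listed in Table 2.
def sub_design():
    runs = []
    for f1, f2, f3 in product([-1, 1], repeat=3):
        runs.append((f1, f2, f3, -f1 * f2, -f1 * f3, -f2 * f3))
    return runs

# The eight versions of Table 2, coded with the +/- levels of Table 1
# (note that for factor 5, Preview Text, "+" means No and "-" means Including).
TABLE2 = {
    (-1, -1, -1, -1, -1, -1),  # version 1: all factors at "-"
    ( 1,  1,  1, -1, -1, -1),  # version 2
    ( 1, -1, -1,  1,  1, -1),  # version 3
    (-1,  1, -1,  1, -1,  1),  # version 4
    (-1, -1,  1, -1,  1,  1),  # version 5
    ( 1,  1, -1, -1,  1,  1),  # version 6
    ( 1, -1,  1,  1, -1,  1),  # version 7
    (-1,  1,  1,  1,  1, -1),  # version 8
}

assert set(sub_design()) == TABLE2
```

The assertion confirms that, up to the order of the runs, the generated fraction coincides with the eight versions actually fielded.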
3 Sliced Factorial Designs with Four Platforms
We now discuss how we constructed the sliced design used in Section 2 for the email campaign. Using the same notation as [8], we cast our email campaign as a multi-platform experiment with four platforms: Android, iOS, Windows, and macOS. Readers unfamiliar with the design of experiments may refer to Appendix A and [10].
3.1 Four-Platform Experiment: Android, iOS, Windows, and macOS
Consider the four-platform experiment involving Android, iOS, Windows, and macOS discussed above. Denote the six two-level design factors by $1,\dots ,6$ and the four platforms by ${P_{1}},\dots ,{P_{4}}$. The complete design d of the experiment consists of four sub-designs, ${d_{1}},\dots ,{d_{4}}$, with ${d_{j}}$ associated with ${P_{j}}$. To quantify the difference among the platforms, let S denote a categorical factor, called the slice factor, with four levels. The jth level of S is associated with ${P_{j}}$.
We consider the following properties from [8] to guide the construction of our design:
Property 1.
For $j=1,\dots ,4$, the sub-design ${d_{j}}$ should achieve desirable estimation capacity for the design factors on platform ${P_{j}}$.
Property 2.
Taken together, the complete design d should achieve desirable estimation capacity for the slice factor S and the two-way interactions between S and the design factors.
As a result of Property 1, each sub-design ${d_{j}}$ estimates the effects of the design factors on platform ${P_{j}}$, and according to effect hierarchy [10, p. 168], the focus of estimation is on the lower-order effects: main effects and two-way interactions. Property 2 suggests that the complete design d focuses on the estimation of the slice factor S and its two-way interactions with the design factors. This requires an ordering of effects different from the effect hierarchy: for the complete design d, S is more likely to be important than the main effects of the design factors, and the two-way interactions of S with the design factors are more likely to be important than the two-way interactions among the design factors. [8] proposed the sliced effect hierarchy for the complete design d to accommodate Property 2. To formally define this ordering of effects for the design d in our experiment, let ${E_{I}}$ be the set of all effects that exclude the slice factor S and ${E_{S}}$ be the set of all effects that include the slice factor S. [8] defined the sliced effect hierarchy as follows:
Sliced Effect Hierarchy.

• For ${E_{I}}$ or ${E_{S}}$, the lower-order effects are more likely to be important than the higher-order effects.

• For ${E_{I}}$ or ${E_{S}}$, effects of the same order are equally likely to be important.

• Any effect in the set ${E_{S}}$ is likely to be more important than an effect in ${E_{I}}$ that is of the same order.

• Any effect in the set ${E_{S}}$ is likely to be less important than an effect in ${E_{I}}$ that is of a lower order.
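To make this ordering concrete, a minimal sketch (our own illustration, with effects represented as sets of factor labels and the slice factor marked "S") encodes the four rules above as a single sort key:

```python
def sliced_hierarchy_key(effect):
    # effect: a set of factor labels, e.g. {"1", "2"} or {"S", "1"};
    # "S" marks the slice factor, so effects containing "S" belong to E_S.
    # Lower keys are more likely to be important: the order of the effect
    # decides first, and within the same order an effect in E_S precedes
    # one in E_I, exactly as the sliced effect hierarchy prescribes.
    return (len(effect), 0 if "S" in effect else 1)

effects = [{"1", "2"}, {"S", "1"}, {"1"}, {"S"}]
ranked = sorted(effects, key=sliced_hierarchy_key)
# ranked: S first, then main effect 1, then the S x 1 interaction, then 1 x 2
```

Sorting by this key ranks S above the design-factor main effects and the S-by-factor interactions above the factor-by-factor interactions, while any lower-order effect still outranks any higher-order one.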
In this experiment, the slice factor differs from the design factors in two ways. First, our four-platform experiment aims to detect which levels of the design factors should be chosen for each platform; it does not try to select among platforms. Second, according to the sliced effect hierarchy, the effects involving the slice factor are more important than the same-order effects of the design factors. A design for the experiment should therefore distinguish between the slice factor effects and the effects of the design factors.
We wanted to use the sliced factorial designs in [8] for our experiment. In a sliced factorial design, each sub design ${d_{j}}$ follows the effect hierarchy and the complete design d follows the sliced effect hierarchy. Unfortunately, [8] only constructed such designs for two platforms. Since our problem consists of four platforms, we cannot use that method directly. Below we discuss a solution by extending the method in [8] to accommodate our four platforms: Android, iOS, Windows, and macOS.
Our solution generates a design d with ${2^{6+2-p}}$ runs for our experiment, which is a ${(\frac{1}{2})^{p}}$ fraction of a ${2^{6+2}}$ full factorial design. First, we describe the construction of a full factorial design d for the experiment. Consider a saturated two-level design with $N={2^{8}}$ runs. We represent its $N-1$ columns by eight independent columns, denoted $1,\dots ,8$, and their interactions of order two to eight, $12,13,\dots ,12\cdots 8$ [11]. Any three columns of the form $(a,b,ab)$, where $ab$ is the interaction column of columns a and b, can be used to represent the levels of the slice factor S without affecting orthogonality [1]. This replacement follows the rule in Table 5.
Table 5
Rule for replacing any three columns of the form $(a,b,ab)$ by the four-level column S.
a  b  $ab$  four-level column S 
0  0  0  ⟶  0 
0  1  1  ⟶  1 
1  0  1  ⟶  2 
1  1  0  ⟶  3 
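Since $ab$ is determined by a and b ($ab=a\oplus b$ in 0/1 coding), the rule of Table 5 is a bijection between the pairs $(a,b)$ and the four levels of S. A one-line sketch of the mapping:

```python
def slice_level(a, b):
    # Map the triple (a, b, ab) with ab = a XOR b to the four-level column S,
    # following Table 5: (0,0,0)->0, (0,1,1)->1, (1,0,1)->2, (1,1,0)->3.
    # Since ab is determined by a and b, only (a, b) is needed: S = 2a + b.
    return 2 * a + b

for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b, "->", slice_level(a, b))
```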
Next, we discuss the details of our design d with ${2^{6+2-p}}$ runs. Consider a full factorial design with ${2^{6+2-p}}$ runs, with the four-level column represented by $\mathbf{S}=({s_{1}},{s_{2}},{s_{3}})$, where ${s_{3}}={s_{1}}{s_{2}}$, and the two-level columns represented by $1,\dots ,6-p$. The remaining p columns, $6-p+1,\dots ,6$, can be generated as interactions of the $6-p+2$ independent columns ${s_{1}},{s_{2}},1,\dots ,6-p$. How these p columns are picked determines the generators and the defining relation of the design d. For a two-platform experiment, [8] defined the sliced wordlength pattern to accommodate the aliasing relation of the slice factor S. For a four-platform experiment, this definition does not work because the slice factor S has three aliasing relations, one each for ${s_{1}}$, ${s_{2}}$, and ${s_{3}}$. The aliasing relation of ${s_{j}}$ is obtained by multiplying the defining relation of d by ${s_{j}}$. Therefore, a word W in the defining relation of d appears in the three aliasing relations of the slice factor S as ${s_{1}}W$, ${s_{2}}W$, and ${s_{3}}W$. We extend [8]'s definition of the sliced wordlength pattern to be based on the minimum length among ${s_{1}}W$, ${s_{2}}W$, and ${s_{3}}W$. Defining the sliced wordlength pattern over this minimum length ensures that minimum sliced aberration protects against the worst-case scenario.
We use [11]'s definition of the wordlength pattern for designs with two-level and four-level factors to define the sliced wordlength pattern. The design d with ${2^{6+2-p}}$ runs has two types of words in its defining relation. The first, called type 0, involves only the design factors $1,\dots ,6$; the second, called type 1, involves one of the ${s_{j}}$'s and some of the design factors $1,\dots ,6$. Because of the relation ${s_{1}}{s_{2}}{s_{3}}=I$, any two ${s_{j}}$'s appearing in a word can be replaced by the third ${s_{j}}$; therefore, these two types cover all possible combinations. Following [11], the vector
\[W(d)={([{A_{i0}}(d),{A_{i1}}(d)])_{i\ge 3}}\hspace{2em}(3.1)\]
is the wordlength pattern of d, in which ${A_{i0}}(d)$ and ${A_{i1}}(d)$ are the numbers of type 0 and type 1 words of length i in the defining relation of d, respectively. The term $[{A_{20}}(d),{A_{21}}(d)]$ is not included in (3.1) because any design d with a positive $[{A_{20}}(d),{A_{21}}(d)]$ is not useful, as two of its main effects would be aliased. We define the sliced wordlength pattern of a design d for a four-platform experiment as follows: for a design d with the wordlength pattern $W(d)={([{A_{i0}}(d),{A_{i1}}(d)])_{i\ge 3}}$, the sliced wordlength pattern is the vector $SW(d)={([S{A_{i0}}(d),S{A_{i1}}(d)])_{i\ge 2}}$, where
\[S{A_{i0}}(d)={A_{(i+1)1}}(d)\hspace{1em}\text{and}\hspace{1em}S{A_{i1}}(d)={A_{(i-1)0}}(d).\]
A type 0 word W in the defining relation of d appears as a type 1 word in each of the three aliasing relations of the slice factor S. It is counted as a type 1 word in the sliced wordlength pattern, giving $S{A_{i1}}(d)={A_{(i-1)0}}(d)$. A type 1 word W in the defining relation of d appears as a type 1 word in the aliasing relations of two of the ${s_{j}}$'s and as a type 0 word in the aliasing relation of the third ${s_{j}}$. It is counted as a type 0 word in the sliced wordlength pattern, giving $S{A_{i0}}(d)={A_{(i+1)1}}(d)$, because the sliced wordlength pattern is defined over the minimum length of a word in the three aliasing relations.
The sliced resolution of d is defined as the smallest i for which at least one of $S{A_{i0}}(d)$ and $S{A_{i1}}(d)$ is positive. Further discrimination among designs with the same sliced resolution is provided by the minimum sliced aberration criterion below. The two types of words of the design d are not treated equally. According to the sliced effect hierarchy, a type 1 word in the aliasing relations of the slice factor S is more serious because it involves one of the ${s_{j}}$'s. This is consistent with [11]'s result, which ranks a type 0 word in the defining relation of d as more important than a type 1 word, because a type 0 word in the defining relation appears as a type 1 word in the aliasing relations of the slice factor S. Therefore, it is more important to require a smaller $S{A_{i1}}(d)$ than a smaller $S{A_{i0}}(d)$ for the same i. We define minimum sliced aberration designs for a four-platform experiment as follows:
Definition 1 (Minimum Sliced Aberration Designs).
Suppose that, for our experiment, two designs ${d^{(1)}}$ and ${d^{(2)}}$ with ${2^{6+2-p}}$ runs are to be compared. Let r be the smallest integer such that $[S{A_{r0}}({d^{(1)}}),S{A_{r1}}({d^{(1)}})]\ne [S{A_{r0}}({d^{(2)}}),S{A_{r1}}({d^{(2)}})]$. If $S{A_{r1}}({d^{(1)}})\lt S{A_{r1}}({d^{(2)}})$, or $S{A_{r1}}({d^{(1)}})=S{A_{r1}}({d^{(2)}})$ but $S{A_{r0}}({d^{(1)}})\lt S{A_{r0}}({d^{(2)}})$, then ${d^{(1)}}$ is said to have less sliced aberration than ${d^{(2)}}$. If there is no design with less sliced aberration than ${d^{(1)}}$, then ${d^{(1)}}$ is called a minimum sliced aberration design.
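Definition 1 can be sketched as a short comparison routine (our own illustration; a sliced wordlength pattern is passed as a list of $[S{A_{i0}},S{A_{i1}}]$ pairs starting at $i=2$). The two example patterns below are those of the two 32-run designs ${d^{(1)}}$ and ${d^{(2)}}$ compared in the example that follows.

```python
def less_sliced_aberration(sw1, sw2):
    # Returns True if the design with pattern sw1 has less sliced aberration
    # than the one with pattern sw2. At the first index where the pairs
    # differ, a smaller SA_i1 wins; on a tie in SA_i1, a smaller SA_i0 wins.
    for (a0, a1), (b0, b1) in zip(sw1, sw2):
        if (a0, a1) != (b0, b1):
            return a1 < b1 if a1 != b1 else a0 < b0
    return False  # identical patterns: neither has less aberration

# SW(d1) = ([0,0]_2, [0,0]_3, [0,4]_4, [0,3]_5)
# SW(d2) = ([0,0]_2, [4,0]_3, [2,0]_4, [0,1]_5)
sw_d1 = [[0, 0], [0, 0], [0, 4], [0, 3]]
sw_d2 = [[0, 0], [4, 0], [2, 0], [0, 1]]

assert less_sliced_aberration(sw_d1, sw_d2)      # d1 has less aberration
assert not less_sliced_aberration(sw_d2, sw_d1)
```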
For our experiment with six design factors, let ${s_{1}}$, ${s_{2}}$, 1, 2, and 3 be the five independent columns of the 32-run ${2^{5}}$ full factorial design. Consider two designs:
\[\begin{aligned}{}{d^{(1)}}:S,1,2,3& ,12,13,23\\ {} {d^{(2)}}:S,1,2,3& ,13{s_{2}},23{s_{2}},123{s_{1}}\end{aligned}\]
where $\mathbf{S}=({s_{1}},{s_{2}},{s_{3}}={s_{1}}{s_{2}})$ and the last three columns represent the last three design factors. For example, $4=12$, $5=13$ and $6=23$ in ${d^{(1)}}$ and $4=13{s_{2}}$, $5=23{s_{2}}$ and $6=123{s_{1}}$ in ${d^{(2)}}$. Therefore, the defining relations of ${d^{(1)}}$ and ${d^{(2)}}$ are:
\[\begin{aligned}{}{d^{(1)}}:I& =124=135=236=2345=1346=1256=456\\ {} {d^{(2)}}:I& =134{s_{2}}=235{s_{2}}=1236{s_{1}}=1245=246{s_{3}}=156{s_{3}}\\ {} & =3456{s_{1}}.\end{aligned}\]
The defining relation of ${d^{(1)}}$ has seven words of type 0: four of length three and three of length four. The wordlength pattern of ${d^{(1)}}$ is $W({d^{(1)}})=({[4,0]_{3}},{[3,0]_{4}})$. Multiplying the defining relation of ${d^{(1)}}$ by ${s_{j}}$’s provides the following three aliasing relations of the slice factor S:
\[\begin{aligned}{}{s_{1}}& \hspace{0.1667em}=\hspace{0.1667em}124{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}135{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}236{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}2345{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}1346{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}1256{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}456{s_{1}}\\ {} {s_{2}}& \hspace{0.1667em}=\hspace{0.1667em}124{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}135{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}236{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}2345{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}1346{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}1256{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}456{s_{2}}\\ {} {s_{3}}& \hspace{0.1667em}=\hspace{0.1667em}124{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}135{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}236{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}2345{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}1346{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}1256{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}456{s_{3}}.\end{aligned}\]
Each type 0 word in the defining relation of ${d^{(1)}}$ appears as type 1 word in all three aliasing relations of ${s_{j}}$’s. The sliced wordlength pattern of ${d^{(1)}}$ is $SW({d^{(1)}})=({[0,0]_{2}},{[0,0]_{3}},{[0,4]_{4}},{[0,3]_{5}})$. The defining relation of ${d^{(2)}}$ has one word of type 0 of length four and six words of type 1: four of length four and two of length five. The wordlength pattern of ${d^{(2)}}$ is $W({d^{(2)}})=({[0,0]_{3}},{[1,4]_{4}},{[0,2]_{5}})$. Multiplying the defining relation of ${d^{(2)}}$ by ${s_{j}}$’s provides the following three aliasing relations of the slice factor S:
\[\begin{aligned}{}{s_{1}}& \hspace{0.1667em}=\hspace{0.1667em}134{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}235{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}\underline{1236}\hspace{0.1667em}=\hspace{0.1667em}1245{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}246{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}156{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}\underline{3456}\\ {} {s_{2}}& \hspace{0.1667em}=\hspace{0.1667em}\underline{134}\hspace{0.1667em}=\hspace{0.1667em}\underline{235}\hspace{0.1667em}=\hspace{0.1667em}1236{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}1245{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}246{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}156{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}3456{s_{3}}\\ {} {s_{3}}& \hspace{0.1667em}=\hspace{0.1667em}134{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}235{s_{1}}\hspace{0.1667em}=\hspace{0.1667em}1236{s_{2}}\hspace{0.1667em}=\hspace{0.1667em}1245{s_{3}}\hspace{0.1667em}=\hspace{0.1667em}\underline{246}\hspace{0.1667em}=\hspace{0.1667em}\underline{156}\hspace{0.1667em}=\hspace{0.1667em}3456{s_{2}}.\end{aligned}\]
The type 1 word $134{s_{2}}$ in the defining relation of ${d^{(2)}}$ appears as a type 0 word of length three in the aliasing relation of ${s_{2}}$ because ${s_{2}}{s_{2}}=I$, and as a type 1 word of length four in the aliasing relations of ${s_{1}}$ and ${s_{3}}$ because ${s_{1}}{s_{2}}={s_{3}}$ and ${s_{3}}{s_{2}}={s_{1}}$. It therefore contributes a type 0 word of length three to the sliced wordlength pattern. Similar arguments apply to the other six words in the defining relation of ${d^{(2)}}$. The sliced wordlength pattern of ${d^{(2)}}$ is $SW({d^{(2)}})=({[0,0]_{2}},{[4,0]_{3}},{[2,0]_{4}},{[0,1]_{5}})$. Between the two designs, $r=3$ is the smallest integer at which the patterns differ: ${[0,0]_{3}}$ for ${d^{(1)}}$ versus ${[4,0]_{3}}$ for ${d^{(2)}}$. The design ${d^{(1)}}$ has less sliced aberration than ${d^{(2)}}$ because $S{A_{31}}({d^{(1)}})=S{A_{31}}({d^{(2)}})=0$ and $S{A_{30}}({d^{(1)}})=0\lt 4=S{A_{30}}({d^{(2)}})$. We will show later that ${d^{(1)}}$ is a minimum sliced aberration design with six design factors and 32 runs. Here ${d^{(2)}}$ is a minimum aberration design with 32 runs from [11], which is inferior to a minimum sliced aberration design for a four-platform experiment.
Equipped with a suitable design criterion for our experiment, we are now ready to construct the minimum sliced aberration design used in Section 2. Theorem 1 below guides the construction of minimum sliced aberration designs from readily available minimum aberration designs with fewer factors.
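As a computational check of this example (a sketch we added; the word representation is our own), the defining relation and the sliced wordlength pattern can be enumerated mechanically. A word is stored as a pair (set of design factors, s) with s in {0, 1, 2, 3} encoding I, ${s_{1}}$, ${s_{2}}$, ${s_{3}}$; factor sets multiply by symmetric difference, and the s-parts multiply like the Klein four-group (bitwise XOR, since ${s_{1}}{s_{2}}={s_{3}}$ and ${s_{j}}{s_{j}}=I$).

```python
from itertools import combinations

def multiply(w1, w2):
    # Product of two words: symmetric difference of the factor sets and
    # XOR of the two-bit s-codes (Klein four-group: s1*s2 = s3, sj*sj = I).
    return (w1[0] ^ w2[0], w1[1] ^ w2[1])

def sliced_wlp(generators):
    # Sliced wordlength pattern {i: [SA_i0, SA_i1]} of the design whose
    # defining relation is generated by the given words.
    sw = {}
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            word = (frozenset(), 0)
            for g in combo:
                word = multiply(word, g)
            factors, s = word
            if s == 0:
                # Type 0 word of length |factors|: appears one letter longer,
                # as type 1, in every aliasing relation of the slice factor.
                i, t = len(factors) + 1, 1
            else:
                # Type 1 word of length |factors| + 1: its shortest appearance
                # is type 0 and one letter shorter, i.e. length |factors|.
                i, t = len(factors), 0
            sw.setdefault(i, [0, 0])[t] += 1
    return sw

# d1: 4 = 12, 5 = 13, 6 = 23 (type 0 generators; s-code 0)
d1 = [(frozenset({1, 2, 4}), 0), (frozenset({1, 3, 5}), 0), (frozenset({2, 3, 6}), 0)]
# d2: 4 = 13 s2, 5 = 23 s2, 6 = 123 s1 (s-codes: s1 = 1, s2 = 2, s3 = 3)
d2 = [(frozenset({1, 3, 4}), 2), (frozenset({2, 3, 5}), 2), (frozenset({1, 2, 3, 6}), 1)]

assert sliced_wlp(d1) == {4: [0, 4], 5: [0, 3]}            # ([0,4]_4, [0,3]_5)
assert sliced_wlp(d2) == {3: [4, 0], 4: [2, 0], 5: [0, 1]}  # ([4,0]_3, [2,0]_4, [0,1]_5)
```

The two assertions reproduce $SW({d^{(1)}})$ and $SW({d^{(2)}})$ exactly as derived above.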
Theorem 1.
A minimum sliced aberration design as defined above corresponds to a defining relation in which all words are type 0.
As a result of Theorem 1, constructing a minimum sliced aberration design entails a search among designs whose defining relations contain only type 0 words. Therefore, minimizing the number of words of the shortest length in the sliced wordlength pattern of d with ${2^{6+2-p}}$ runs is equivalent to minimizing the number of words of the shortest length in the wordlength pattern of a ${2^{6-p}}$ fractional factorial design consisting of the design factors only. We use Theorem 1 to generate the minimum sliced aberration design given in Table 6.
The minimum sliced aberration designs in Theorem 1 have a cross-array structure similar to the product arrays used in parameter design [10].
3.2 Generalization to a General Number of Factors
The theoretical results and the construction method above extend to a general number k of factors by replacing six with k. Following the general version of Theorem 1, constructing a minimum sliced aberration design entails a search among designs whose defining relations contain only type 0 words. Therefore, minimizing the number of words of the shortest length in the sliced wordlength pattern of d with ${2^{k+2-p}}$ runs is equivalent to minimizing the number of words of the shortest length in the wordlength pattern of a ${2^{k-p}}$ fractional factorial design consisting of the design factors only. For a four-platform experiment, we use Theorem 1 to provide minimum sliced aberration designs with 16, 32, and 64 runs in Tables 7–9, respectively.
Table 7
Minimum sliced aberration designs with 16 runs, $\mathbf{S}=({s_{1}},{s_{2}},{s_{1}}{s_{2}})$.
k  Design d  $SW{(d)_{i\ge 4}}$ 
3  S, 1, 2, 12  $({[0,1]_{4}})$ 
Table 8
Minimum sliced aberration designs with 32 runs, $\mathbf{S}=({s_{1}},{s_{2}},{s_{1}}{s_{2}})$.
k  Design d  $SW{(d)_{i\ge 4}}$ 
4  S, 1, 2, 3, 123  $({[0,0]_{4}},{[0,1]_{5}})$ 
5  S, 1, 2, 3, 12, 13  $({[0,2]_{4}},{[0,1]_{5}})$ 
6  S, 1, 2, 3, 12, 13, 23  $({[0,4]_{4}},{[0,3]_{5}})$ 
7  S, 1, 2, 3, 12, 13, 23, 123  $({[0,7]_{4}},{[0,7]_{5}},{[0,0]_{6}},{[0,0]_{7}},{[0,1]_{8}})$ 
Table 9
Minimum sliced aberration designs with 64 runs, $\mathbf{S}=({s_{1}},{s_{2}},{s_{1}}{s_{2}})$.
k  Design d  $SW{(d)_{i\ge 4}}$ 
5  S, 1, 2, 3, 4, 1234  $({[0,0]_{4}},{[0,0]_{5}},{[0,1]_{6}})$ 
6  S, 1, 2, 3, 4, 123, 124  $({[0,0]_{4}},{[0,3]_{5}})$ 
7  S, 1, 2, 3, 4, 123, 124, 134  $({[0,0]_{4}},{[0,7]_{5}})$ 
8  S, 1, 2, 3, 4, 123, 124, 134, 234  $({[0,0]_{4}},{[0,14]_{5}},{[0,0]_{6}},{[0,0]_{7}},{[0,0]_{8}},{[0,1]_{9}})$ 
9  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234  $({[0,4]_{4}},{[0,14]_{5}},{[0,8]_{6}},{[0,0]_{7}},{[0,4]_{8}},{[0,1]_{9}})$ 
10  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34  $({[0,8]_{4}},{[0,18]_{5}},{[0,16]_{6}},{[0,8]_{7}},{[0,8]_{8}},{[0,5]_{9}})$ 
11  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34, 24  $({[0,12]_{4}},{[0,26]_{5}},{[0,28]_{6}},{[0,24]_{7}},{[0,20]_{8}},{[0,13]_{9}},{[0,4]_{10}})$ 
12  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34, 24, 14  $({[0,16]_{4}},{[0,39]_{5}},{[0,48]_{6}},{[0,48]_{7}},{[0,48]_{8}},{[0,39]_{9}},{[0,16]_{10}},{[0,0]_{11}},{[0,0]_{12}},{[0,1]_{13}})$ 
13  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34, 24, 14, 23  $({[0,22]_{4}},{[0,55]_{5}},{[0,72]_{6}},{[0,96]_{7}},{[0,116]_{8}},{[0,87]_{9}},{[0,40]_{10}},{[0,16]_{11}},{[0,6]_{12}},{[0,1]_{13}})$ 
14  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34, 24, 14, 23, 13  $({[0,28]_{4}},{[0,77]_{5}},{[0,112]_{6}},{[0,168]_{7}},{[0,232]_{8}},{[0,203]_{9}},{[0,112]_{10}},{[0,56]_{11}},{[0,28]_{12}},{[0,7]_{13}})$ 
15  S, 1, 2, 3, 4, 123, 124, 134, 234, 1234, 34, 24, 14, 23, 13, 12  $({[0,35]_{4}},{[0,105]_{5}},{[0,168]_{6}},{[0,280]_{7}},{[0,435]_{8}},{[0,435]_{9}},{[0,280]_{10}},{[0,168]_{11}},{[0,105]_{12}},{[0,35]_{13}},{[0,0]_{14}},{[0,0]_{15}},{[0,1]_{16}})$ 
4 Results
In this section, we discuss the results of the application of our design to the email campaign under consideration.
4.1 Summary of the Design
Using the design criterion in Section 3, we created ${2^{3}}$ versions to perform the multivariate testing. Each platform has eight versions generated from that criterion. The complete design is a ${2^{6+2-3}}$ minimum sliced aberration design for four platforms. By Theorem 1, the sub-design for each platform is a ${2^{6-3}}$ minimum aberration design with the three generators 124, 135, and 236 [10, p. 252]. The sliced wordlength pattern is $({[0,4]_{4}},{[0,3]_{5}})$. More details of this design were discussed in Section 2.
4.2 Data Display and Summary
The response variable in the study is the email open rate. Because the data are aggregated across the users exposed to each version, the within-version variability of the response is unknown to us. We therefore use Lenth's method [6], which is specifically designed for testing effects in experiments where variance estimates are not available, to identify significant factors.
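Lenth's method estimates a pseudo standard error (PSE) directly from the effect estimates, so no replicate-based variance estimate is needed. A minimal sketch of the standard computation follows (the constants 1.5 and 2.5 come from [6]; significance is then judged from t-like ratios, whose reference distribution comes from tables or simulation, which we omit here; the numerical example is made up for illustration):

```python
from statistics import median

def lenth_pse(effects):
    # Lenth's pseudo standard error for effect estimates from an
    # unreplicated factorial experiment.
    abs_e = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_e)                      # initial robust scale
    trimmed = [a for a in abs_e if a < 2.5 * s0]  # drop apparently active effects
    return 1.5 * median(trimmed)

def t_ratios(effects):
    # Effect estimates divided by the PSE; a large |t| suggests a real effect.
    pse = lenth_pse(effects)
    return [e / pse for e in effects]

# Seven effect estimates (one per aliased set, as in Tables 13-16), made up
# for illustration; the fourth clearly stands out from the noise level.
example = [0.10, -0.20, 0.15, 3.0, -0.05, 0.08, -0.12]
print(round(lenth_pse(example), 3))  # 0.165
```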
Table 10 includes some descriptive statistics of the study. The total number of recipients is 139,033, divided into eight roughly equal sets, each receiving one of the eight versions of the email. Table 11 is a two-way table giving the number of opened emails for each combination of operating system and email version.
4.3 Identification of Platform-Specific Significant Effects
Since there are eight versions for each operating system (platform), seven effects of design factors can be estimated per platform. Table 12 includes the aliased effects within each platform. For convenience, we label each set of aliased effects.
Table 12
Aliased effects.
Labels  Aliased effects 
A  $\mathbf{1}=\mathbf{24}=\mathbf{35}=\mathbf{346}=\mathbf{256}=\mathbf{1236}=\mathbf{1456}=\mathbf{12345}$ 
B  $\mathbf{2}=\mathbf{14}=\mathbf{36}=\mathbf{345}=\mathbf{156}=\mathbf{2456}=\mathbf{1235}=\mathbf{12346}$ 
C  $\mathbf{3}=\mathbf{15}=\mathbf{26}=\mathbf{245}=\mathbf{146}=\mathbf{1234}=\mathbf{3456}=\mathbf{12356}$ 
D  $\mathbf{4}=\mathbf{12}=\mathbf{56}=\mathbf{235}=\mathbf{136}=\mathbf{1345}=\mathbf{2346}=\mathbf{12456}$ 
E  $\mathbf{5}=\mathbf{13}=\mathbf{46}=\mathbf{126}=\mathbf{234}=\mathbf{1245}=\mathbf{2356}=\mathbf{13456}$ 
F  $\mathbf{6}=\mathbf{23}=\mathbf{45}=\mathbf{134}=\mathbf{125}=\mathbf{1246}=\mathbf{1356}=\mathbf{23456}$ 
G  $\mathbf{16}=\mathbf{34}=\mathbf{25}=\mathbf{145}=\mathbf{246}=\mathbf{356}=\mathbf{123}=\mathbf{123456}$ 
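The alias sets in Table 12 follow mechanically from the defining contrast subgroup of the ${2^{6-3}}$ sub-design, whose words (up to sign) are generated by 124, 135, and 236. A sketch that reproduces row A:

```python
from itertools import combinations

GENERATORS = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}]  # defining words, up to sign

def defining_words():
    # All nonempty products of the generators (symmetric difference), giving
    # the seven words 124, 135, 236, 2345, 1346, 1256, 456.
    words = []
    for r in range(1, len(GENERATORS) + 1):
        for combo in combinations(GENERATORS, r):
            w = set()
            for g in combo:
                w = w ^ g
            words.append(frozenset(w))
    return words

def aliases(effect):
    # Multiply an effect by every defining word to get its alias set,
    # listed from shortest to longest.
    strings = {"".join(str(f) for f in sorted(set(effect) ^ w))
               for w in defining_words()}
    return sorted(strings, key=lambda s: (len(s), s))

# Row A of Table 12: the aliases of main effect 1.
print(aliases({1}))
```

The printed set matches row A: $\mathbf{1}=\mathbf{24}=\mathbf{35}=\mathbf{346}=\mathbf{256}=\mathbf{1236}=\mathbf{1456}=\mathbf{12345}$.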
In the sliced factorial design framework, the slices are used to analyze the data of the four operating systems together. Tables 13 to 16 report the effects of the design factors estimated from the design in Table 2 on each platform. Lenth's method is used to test the significance of the effects and to report the p-values. The same procedure is applied within each operating system to estimate the effects of the design factors on that platform.
Table 13
Results for Android.
Effect  Estimate  P-value  
A  2.07e−4  $\gt 0.2$  
B  −1.80e−3  0.015  ∗ 
C  −5.84e−4  0.158  
D  8.13e−5  $\gt 0.2$  
E  −3.44e−4  $\gt 0.2$  
F  −5.38e−4  0.18  
G  −3.42e−6  $\gt 0.2$ 
Note: Any p-value less than 0.1 is considered statistically significant and is indicated by ∗.
Table 14
Results for iOS.
Effect  Estimate  P-value  
A  1.78e−4  $\gt 0.2$  
B  −1.15e−3  0.074  ∗ 
C  6.03e−4  $\gt 0.2$  
D  −5.16e−4  $\gt 0.2$  
E  −1.14e−4  $\gt 0.2$  
F  −2.71e−3  0.014  ∗ 
G  −2.68e−4  $\gt 0.2$ 
Note: Any p-value less than 0.1 is considered statistically significant and is indicated by ∗.
Table 15
Results for Windows.
Effect  Estimate  P-value  
A  2.07e-3  $\gt 0.2$  
B  -3.72e-3  $\gt 0.2$  
C  1.11e-3  $\gt 0.2$  
D  -2.57e-3  $\gt 0.2$  
E  -3.60e-3  $\gt 0.2$  
F  -4.95e-3  0.183  
G  -1.51e-3  $\gt 0.2$ 
Note: A p-value less than 0.1 is considered statistically significant and is indicated by ∗.
Table 16
Results for macOS.
Effect  Estimate  P-value  
A  7.76e-5  $\gt 0.2$  
B  2.30e-4  $\gt 0.2$  
C  -1.17e-5  $\gt 0.2$  
D  -1.10e-3  0.061  ∗ 
E  -3.66e-4  $\gt 0.2$  
F  3.46e-4  $\gt 0.2$  
G  -6.36e-4  0.195 
Note: A p-value less than 0.1 is considered statistically significant and is indicated by ∗.
Comparing Tables 13 to 16 indicates that effect B is significant on ${P_{1}}$ (Android), effects B and F are significant on ${P_{2}}$ (iOS), and effect D is significant on ${P_{4}}$ (macOS), while no effect is significant on ${P_{3}}$ (Windows). Table 12 reveals that effect B is the sum of the aliased effects $\mathbf{2},\mathbf{14},\mathbf{36},\mathbf{345},\mathbf{156},\mathbf{2456},\mathbf{1235},\mathbf{12346}$. As the slices follow the effect hierarchy principle, B can be viewed as representing effect $\mathbf{2}$ under the assumption that all higher-order aliased effects are negligible. The main takeaway for Android from Table 13 is that using the direct subject line will likely decrease the open rate, while the other factors are not expected to change the metric. Similar arguments can be made for the other three platforms ${P_{2}}$, ${P_{3}}$, and ${P_{4}}$. For ${P_{2}}$, Table 14 shows that using the direct subject line and displaying content in paragraph form are each expected to decrease the open rate, and the remaining factors are unlikely to affect the metric. The main takeaway for Windows from Table 15 is that no factor is likely to affect the metric. The main takeaway for macOS from Table 16 is that including the header image is expected to decrease the open rate, while the other factors are not expected to affect the response. In summary, a comparison of Tables 13 to 16 indicates that different sets of design factors work differently across these operating systems.
4.4 Calculation of the Factorial Effects for the Multiple Operating Systems
From Table 11, the open rate for Windows is substantially larger than those of the other platforms. Since no design factor is likely to affect the open rate for Windows, the operating system itself may impact the metric. It is therefore important to determine whether the operating systems have significant interactions with the platform-specific significant effects. To compare the results of the four platforms, the complete design d is used to estimate the slice factor and its interactions with the platform-specific significant effects. The slice factor S is represented by $\mathbf{S}=({s_{1}},{s_{2}},{s_{3}})$ with ${s_{3}}={s_{1}}{s_{2}}$. Table 17 describes the relation between the platforms ${P_{j}}$ and $({s_{1}},{s_{2}},{s_{3}})$.
Table 17
Relation between ${P_{j}}$ and $({s_{1}},{s_{2}},{s_{3}})$.
${s_{1}}$  ${s_{2}}$  ${s_{3}}={s_{1}}{s_{2}}$  Platform  
−  −  +  ${P_{1}}$  
−  +  −  ${P_{2}}$  
+  −  −  ${P_{3}}$  
+  +  +  ${P_{4}}$ 
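The coding in Table 17 decomposes the four-level slice factor into three mutually orthogonal two-level contrasts, which can be verified directly:

```python
# (s1, s2) pairs from Table 17; s3 = s1*s2 completes the coding.
platforms = {"P1": (-1, -1), "P2": (-1, 1), "P3": (1, -1), "P4": (1, 1)}
coding = {p: (s1, s2, s1 * s2) for p, (s1, s2) in platforms.items()}

# The three contrast columns are pairwise orthogonal across the platforms.
columns = list(zip(*coding.values()))
for i in range(3):
    for j in range(i + 1, 3):
        assert sum(a * b for a, b in zip(columns[i], columns[j])) == 0
print(coding)
```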
We use Lenth’s method to test the significance of the effects. Because the slice factor S has four levels, the effect of S comprises three effects ${s_{1}}$, ${s_{2}}$, and ${s_{3}}$. Take the interaction between S and $\mathbf{2}$ as an example: it comprises the three effects $\mathbf{2}{s_{1}}$, $\mathbf{2}{s_{2}}$, and $\mathbf{2}{s_{3}}$. Table 18 lists these 12 effects: the slice factor S and the interactions between S and the three platform-specific significant effects $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$. We first focus on the three effects ${s_{1}}$, ${s_{2}}$, and ${s_{3}}$ in Table 18. Combining their estimates according to the coding in Table 17, the magnitude of the Android effect is 2.41e-2, that of iOS is 0.79e-2, that of Windows is 5.01e-2, and that of macOS is 1.81e-2. The magnitude of the Windows effect is about twice that of Android, six times that of iOS, and three times that of macOS, which explains why the open rate for Windows is so much larger than those of the other platforms. Further, the magnitudes of ${s_{1}}$, ${s_{2}}$, and ${s_{3}}$ are roughly ten to one hundred times larger than those of the design-factor effects. This finding is consistent with the sliced effect hierarchy principle. The remaining effects in Table 18 uncover how $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$ interact with the slice factor S, that is, how these effects differentially affect the open rate across ${P_{1}}$ to ${P_{4}}$. Only $\mathbf{6}{s_{3}}$ is significant, implying that the differential effect of $\mathbf{6}$ on the open rate between ${P_{2}}$, ${P_{3}}$ and ${P_{1}}$, ${P_{4}}$ is significant. As a result, a version that displays content in paragraph form is expected to decrease the open rate on ${P_{2}}$ and ${P_{3}}$, but will likely increase the metric on ${P_{1}}$ and ${P_{4}}$.
Table 18
Slice factor behavior.
Effect  Estimate  P-value  
${s_{1}}$  1.60e-2  $\lt 0.001$  ∗ 
${s_{2}}$  -1.30e-2  $\lt 0.001$  ∗ 
${s_{3}}$  -2.11e-2  $\lt 0.001$  ∗ 
$\mathbf{2}{s_{1}}$  -1.34e-4  $\gt 0.2$  
$\mathbf{2}{s_{2}}$  1.15e-3  0.193  
$\mathbf{2}{s_{3}}$  8.24e-4  $\gt 0.2$  
$\mathbf{4}{s_{1}}$  -8.09e-4  $\gt 0.2$  
$\mathbf{4}{s_{2}}$  2.18e-4  $\gt 0.2$  
$\mathbf{4}{s_{3}}$  5.17e-4  $\gt 0.2$  
$\mathbf{6}{s_{1}}$  -3.39e-4  $\gt 0.2$  
$\mathbf{6}{s_{2}}$  7.82e-4  $\gt 0.2$  
$\mathbf{6}{s_{3}}$  1.87e-3  0.046  ∗ 
Note: A p-value less than 0.1 is considered statistically significant and is indicated by ∗.
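The platform-level magnitudes quoted in the discussion of Table 18 follow from the ${s_{1}}$, ${s_{2}}$, ${s_{3}}$ estimates and the coding in Table 17; the snippet below reproduces that arithmetic (a consistency check, not a new analysis):

```python
# Estimates of s1, s2, s3 from Table 18.
s_effects = (1.60e-2, -1.30e-2, -2.11e-2)

# (s1, s2, s3) coding of each platform from Table 17.
coding = {"Android": (-1, -1, 1), "iOS": (-1, 1, -1),
          "Windows": (1, -1, -1), "macOS": (1, 1, 1)}

# Magnitude of the combined slice effect on each platform.
magnitudes = {p: abs(sum(c * e for c, e in zip(code, s_effects)))
              for p, code in coding.items()}
for platform, magnitude in magnitudes.items():
    print(f"{platform}: {magnitude:.2e}")
```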
In conclusion, the application of the sliced design in Section 3 to the email campaign shows that different sets of design factors would increase the open rate for each of the four operating systems. From Tables 13 to 16, each design factor that is significant on its platform has a negative effect, which means factors $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$ should be set at the − level. See Table 1 for the descriptions of the design factors. Because we do not know in advance which platform a particular user will use to open the email, it is desirable to choose a version with factors $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$ at the − level. This hypothesis should be tested by examining the interactions between S and the three design factors $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$ from the complete design d. Table 18 indicates that the effect $\mathbf{6}{s_{3}}$ is significant, implying that $\mathbf{6}$ at the − level is expected to increase the open rate on ${P_{2}}$ and ${P_{3}}$ but to decrease the metric on ${P_{1}}$ and ${P_{4}}$. To test this hypothesis, we fit a regression model using the open rate as the response and ${s_{1}}$, ${s_{2}}$, ${s_{3}}$, $\mathbf{2}$, $\mathbf{4}$, $\mathbf{6}$, and $\mathbf{6}{s_{3}}$ as the covariates. The average open rate is estimated by
\[ \hat{y}={\hat{\beta }_{0}}+{\hat{\beta }_{1}}{s_{1}}+{\hat{\beta }_{2}}{s_{2}}+{\hat{\beta }_{3}}{s_{3}}+{\hat{\beta }_{4}}{x_{2}}+{\hat{\beta }_{5}}{x_{4}}+{\hat{\beta }_{6}}{x_{6}}+{\hat{\beta }_{7}}{x_{6}}{s_{3}},\tag{4.1}\]
where ${x_{2}}$, ${x_{4}}$, ${x_{6}}\in \{-1,+1\}$ denote the levels of design factors $\mathbf{2}$, $\mathbf{4}$, and $\mathbf{6}$, and the ${\hat{\beta }_{j}}$ are the least-squares estimates.
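The fit behind (4.1) is an ordinary least-squares regression on the covariates just listed. The sketch below shows the setup with a synthetic response in place of the campaign data (which are not reproduced here) and an illustrative full crossing of platforms and factor levels rather than the actual design d; the coefficient layout, not the numbers, is the point.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)

# Covariate rows: platforms coded by (s1, s2, s3) as in Table 17,
# crossed with the +/-1 levels of design factors 2, 4, and 6.
rows, y = [], []
for s1, s2 in product([-1, 1], repeat=2):
    s3 = s1 * s2
    for x2, x4, x6 in product([-1, 1], repeat=3):
        rows.append([1, s1, s2, s3, x2, x4, x6, x6 * s3])
        # Synthetic open rate for illustration only (made-up coefficients).
        y.append(0.02 + 0.016 * s1 - 0.013 * s2 - 0.0211 * s3
                 - 0.001 * x2 - 0.0005 * x4 - 0.001 * x6
                 + 0.0009 * x6 * s3 + rng.normal(0.0, 1e-4))

X = np.array(rows, dtype=float)
beta_hat, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
print(np.round(beta_hat, 4))
```

Because the eight covariate columns are mutually orthogonal under this crossing, the least-squares estimates recover the assumed coefficients up to noise.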
We use (4.1) to compare the average open rates of two versions for each of the four platforms: version A with design factors $\mathbf{2}$, $\mathbf{4}$, $\mathbf{6}$ at the − level, and version B with $\mathbf{2}$ and $\mathbf{4}$ at the − level but $\mathbf{6}$ at the + level. The results are given in Table 19.
Table 19
Comparison between two versions: design factors $\mathbf{2}$, $\mathbf{4}$, $\mathbf{6}$ at − level vs. design factors $\mathbf{2}$, $\mathbf{4}$ at − level but $\mathbf{6}$ at + level for each of the four platforms.
Version A  Version B  
$(\mathbf{B},\mathbf{D},\mathbf{F})=(-,-,-)$  $(\mathbf{B},\mathbf{D},\mathbf{F})=(-,-,+)$  
${P_{1}}$  0.00566  0.00556 
${P_{2}}$  0.01556  0.01173 
${P_{3}}$  0.04464  0.04081 
${P_{4}}$  0.00867  0.00858 
Table 19 indicates that the average open rate of version A is larger than that of version B on each of the four operating systems. To conclude, we recommend the following changes for the network company to increase the open rate: (i) use the indirect subject line, (ii) drop the header image, and (iii) display content in bullet points. These statistics-guided recommendations can help the network company optimize its email campaigns and increase the ROI of its marketing efforts.
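As a consistency check on Table 19: under the model in (4.1), versions A and B differ only in ${x_{6}}$, so the predicted gap is ${\hat{y}_{A}}-{\hat{y}_{B}}=-2{\hat{\beta }_{6}}-2{\hat{\beta }_{7}}{s_{3}}$, writing ${\hat{\beta }_{6}}$ and ${\hat{\beta }_{7}}$ for the fitted coefficients of $\mathbf{6}$ and $\mathbf{6}{s_{3}}$. The gap therefore depends on the platform only through ${s_{3}}$, so it must agree between ${P_{2}}$ and ${P_{3}}$ (${s_{3}}=-1$) and between ${P_{1}}$ and ${P_{4}}$ (${s_{3}}=+1$), which the reported figures confirm up to rounding:

```python
# Predicted open rates (version A, version B) from Table 19.
table19 = {"P1": (0.00566, 0.00556), "P2": (0.01556, 0.01173),
           "P3": (0.04464, 0.04081), "P4": (0.00867, 0.00858)}
gaps = {p: a - b for p, (a, b) in table19.items()}

# Platforms sharing an s3 level share the same predicted gap.
assert abs(gaps["P2"] - gaps["P3"]) < 1e-5   # s3 = -1 pair
assert abs(gaps["P1"] - gaps["P4"]) < 2e-5   # s3 = +1 pair (rounding)
print({p: round(g, 5) for p, g in gaps.items()})
```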
5 Discussion and Conclusions
Email marketing is a big business (US spending of 2.84 billion dollars in 2020). The average email open rate is 21%, while welcome emails have a much higher open rate of about 82%. Many customers actively seek emails (e.g., for coupons and sales), and targeted, personalized emails are known to have higher open rates (50%) and to be more effective. [9] find that adding the name of the message recipient to the email’s subject line increased the probability of the recipient opening the email from 9% to 11% and increased sales leads from 0.39% to 0.51%.
We successfully applied a sliced design solution to an industry email campaign with four platforms. This application revealed interesting insights into how different sets of design factors work differently for the four operating systems. We identified the best version for the four platforms. Our statisticsguided recommendations can help the network company and the industry, in general, optimize email campaigns and increase the ROI of marketing efforts.
There are many possible directions for future exploration. It will be of interest to apply the proposed method to marketing campaigns on popular social networking services such as Facebook, YouTube, WhatsApp, and Instagram. Another possibility is to apply the method to general multivariate testing problems in industry, such as web design, hyperparameter tuning of deep learning models, and test-and-learn programs in insurance and finance.