1 Introduction
The spatial organization of cells, such as orientation, density, and shape, are fundamental to characterizing physical mechanisms and biological functions [8, 37], as the arrangement of cells influences their functions, including proliferation and differentiation [30]. For example, high cell density can signal cells to die to maintain tissue homeostasis [9], and cell shape has been linked to DNA synthesis and cell growth [12]. Advances in microscopy techniques enable detailed video recordings of cellular motions. Image segmentation approaches have been developed for microscopy images [33], where the segmentation results, including cell locomotion, alignment, and density, can be used for modeling cell behaviors [47, 15, 10, 25]. Therefore, precise cell image segmentation is essential for quantifying cell location, alignment, and morphology, as these statistics enhance our understanding of how cellular spatial organization influences human health outcomes [53, 2].
Cellular segmentation tools can be broadly classified as unsupervised and supervised methods, based on whether labeled data is needed for training the models. One of the most popular tools of unsupervised cellular segmentation is the ImageJ [1, 46, 44], which enables segmenting both cell nuclei and whole cells, and linking cells across frames of a microscopy video via plug-ins such as TrackMate [51]. Prior to using ImageJ for cell detection and segmentation, a series of user-driven image pre-processing steps must be completed in the program. For instance, the image is typically converted into a binary image first, based on the threshold chosen either manually or automatically via several global threshold setting methods [34, 29]. Additional image processing steps, such as despeckling for noise reduction [11], typically need to be run before segmentation. To detect cell objects within an image, the algorithm scans the corresponding binary image until the edge of an object is reached. Then, the algorithm traces around the edge of the object until the starting point is reached. This process is repeated for pixels of each object, or the foreground pixels, in the binary image. The centroid of each cell object can be identified by fitting an ellipse for each object using the second-order central moments [46], which assumes the cell boundary is elliptical. The parameters for the best-fit ellipse, including the major and minor axes and centroids, and matrices for the object masks for each object boundary, are recorded in ImageJ [11]. Additional functionality, such as extension of elliptical outlines, can be added to the user interface by writing macro code to streamline image analysis or adding third-party plug-ins [46]. However, achieving reasonable segmentation results is often labor-intensive and requires human interaction at each step. Furthermore, the assumption of the shape of the cells in ImageJ is restrictive. Though the ellipse can approximate nuclei of certain cells, it cannot approximate the shape of cytoplasm or the whole cell of frequently used cells in experiments, such as fibroblasts or epithelial cells.
Supervised cellular segmentation methods, are typically built upon a convolutional neural network structure, such as the U-net [41]. A popular tool is Cellpose, which builds U-nets for segmenting images that contain one type of cells that are trained on a dataset of over 70,000 segmented objects, including annotated images that contain numerous types of cells [50]. The model first compresses the image and then reconstructs the image to its original dimensions through a U-net to capture both local and global features. It further generates a topological map from diffusion simulation to product gradient maps for the model to learn the horizontal and vertical gradients of the image. The final layer of Cellpose produces three feature maps, including the horizontal and vertical directional gradients, and the sigmoid transformation of the probability map of the output image from the U-net, and the parameters of the Cellpose are then estimated by fitting a loss function of the gradient vector and binary label of each pixel of the image. As the Cellpose segmentation requires massive training data, it may not work well for detecting novel cell types, where high-labeled image data that may not be readily available. Furthermore, training Cellpose and using it for segmenting large images with several million pixels can be computationally expensive.
In response to the challenges the current methods face, we present a novel unsupervised cell image segmentation method for segmenting images containing one type of cells, based on separable Gaussian processes, automated thresholding, and watershed operations for image segmentation. The proposed method has several advantages. First, the separable covariance structure on image data enables us to decompose the large covariance into a Kronecker product of two small covariances, which substantially reduces the computational cost for computing the matrix inversion and the logarithm of the determinant of the covariance matrix. Second, unlike conventional unsupervised image segmentation methods such as ImageJ, which require manual tuning of multiple hyperparameters, including Gaussian blur radius, thresholding method, and minimum object size, our proposed segmentation pipeline is free of manual parameter selection. We developed fast algorithms for maximum likelihood estimation of the parameters in GPs and the predictive distribution. The predictive mean surface naturally facilitates foreground-background separation by applying a data-driven threshold with the optimal threshold automatically estimated based on the absolute second difference in foreground pixel counts between the thresholds. Third, the cell masks are segmented using the watershed algorithm, which simulates water flowing through a topographical map of the image to separate cell objects. Fourth, our approach is entirely unsupervised, negating any need for annotated training data, in contrast to the supervised methods, such as Cellpose. Thus, the proposed method is particularly suitable for segmenting new cell types or other biological structures. We studied our methodology using simulated data for noise filtering. We also compared our approaches with the alternatives using real microscopy images for segmenting both the cell nucleus and whole cell in optically dense images, a challenging scenario even when labeled data are available. These studies demonstrate that our approach substantially outperforms the alternatives.
The rest of the article is organized as follows. We discuss GP models of images and a fast algorithm for parameter estimation and predictive distribution in Section 2. In Section 3, we introduce our workflow for cellular image segmentation from Gaussian processes, including image smoothing, binary thresholding, and segmentation. In Section 4, we use simulated data and microscopy images of cells to compare our methods and other alternatives for denoising, including principal component analysis and dynamic mode decomposition. We also demonstrate the computational advantage of the fast algorithm against direct computation. Finally, we generate cell masks for both the cell nucleus and cytoplasm, or the whole cell, by our methods and ImageJ segmentation method to compare their accuracy with the manual labels in Section 5. Our findings not only present a novel cell segmentation technique but also highlight the applicability of fast Gaussian processes in image processing tasks. The data and code of this article are made publicly available: https://github.com/UncertaintyQuantification/cell_segmentation.
2 Gaussian Process Models of Images
2.1 The Likelihood Function and Predictive Distribution
Let us consider a Gaussian process (GP) model for a two-dimensional (2D) microscopy image, where the intensity at the pixel location x is modeled as $y(\mathbf{x})=f(\mathbf{x})+\epsilon $, with $f(\cdot )$ being a latent Gaussian process (GP) and $\epsilon \sim \mathcal{N}(0,{\sigma _{0}^{2}})$ being a Gaussian noise with variance ${\sigma _{0}^{2}}$ [38]. Consider an ${n_{1}}\times {n_{2}}$ image Y, where the $(i,j)$th term is $y({\mathbf{x}_{i,j}})$ for the pixel location ${\mathbf{x}_{i,j}}=({x_{i,1}},{x_{j,2}})$. The data matrix can be written as:
where F follows a matrix normal distribution $\mathbf{F}\sim \text{Matrix-Normal}\left(\mu {\mathbf{1}_{{n_{1}}\times {n_{2}}}},\hspace{0.1667em}{\sigma ^{2}}{\mathbf{R}_{1}},{\mathbf{R}_{2}}\right)$ with μ being a mean parameter, ${\mathbf{1}_{{n_{1}}\times {n_{2}}}}$ being a ${n_{1}}\times {n_{2}}$ matrix of ones, ${\sigma ^{2}}$ being the variance parameter, ${\mathbf{R}_{1}}$ and ${\mathbf{R}_{2}}$ being correlation matrices for two inputs, respectively, and the vectorized noise matrix follows $\text{Vec}(\mathbf{E})\sim \mathcal{MN}(0,{\sigma _{0}^{2}}{\mathbf{I}_{N\times N}})$, with $N={n_{1}}{n_{2}}$. Additional trend structure can be modeled in the mean function, and the covariance can be generalized to be semi-separable in the model [14].
Vectorize the observations matrix ${\mathbf{y}_{v}}=\text{Vec}(\mathbf{Y})$. After marginalizing out the latent signal matrix F, the observation vector of the image follows a multivariate normal distribution,
where $\mathbf{\tilde{R}}=\mathbf{R}+\eta {\mathbf{I}_{N}}$, $\mathbf{R}={\mathbf{R}_{2}}\otimes {\mathbf{R}_{1}}$, ⊗ is a Kronecker product and $\eta ={\sigma _{0}^{2}}/{\sigma ^{2}}$ is a nugget parameter. Here the correlation matrices are parameterized kernel functions ${R_{1}}(i,{i^{\prime }})={K_{1}}({x_{i,1}},{x_{{i^{\prime }},1}})$ and ${R_{2}}(j,{j^{\prime }})={K_{2}}({x_{j,2}},{x_{{j^{\prime }},2}})$. Frequently used kernel functions include the power exponential kernel and Matérn kernel [38]. Denote ${K_{l}}({x_{i,l}},{x_{{i^{\prime }},l}})={K_{l}}({d_{l}})$ with ${d_{l}}=|{x_{i,l}}-{x_{{i^{\prime }},l}}|$. For instance, the Matérn kernel with roughness parameter $5/2$ follows [20]:
where ${\gamma _{l}}$ is a range parameter to be estimated for $l=1,2$. A GP with the kernel function in Equation (2.3) is twice mean-squared differentiable [38], and it is used as a default choice of some software packages of GP surrogate models [42, 16]. We use the kernel function in (2.3) for demonstration purposes.
(2.2)
\[ ({\mathbf{y}_{v}}\mid {\sigma ^{2}},\eta ,{\mathbf{R}_{1}},{\mathbf{R}_{2}})\sim \mathcal{MN}(\mu {\mathbf{1}_{N}},{\sigma ^{2}}\mathbf{\tilde{R}}),\](2.3)
\[ {K_{l}}({d_{l}})=\left(1+\frac{\sqrt{5}{d_{l}}}{{\gamma _{l}}}+\frac{5{d_{l}^{2}}}{3{\gamma _{l}^{2}}}\right)\exp \left(-\frac{\sqrt{5}{d_{l}}}{{\gamma _{l}}}\right),\]After specifying the model, we compute the likelihood function and predictive distribution for parameter estimation and prediction, respectively. The parameters of the model in (2.2) include the mean, variance, range and nugget parameters $\{\mu ,{\sigma ^{2}},\boldsymbol{\gamma },\eta \}$ with $\boldsymbol{\gamma }=({\gamma _{1}},{\gamma _{2}})$ being the range parameters in the kernel function in Equation (2.3). As the number of pixels is large, we employed the maximum likelihood estimator (MLE) to estimate the parameters, as the result is similar to the robust estimation by marginal posterior mode [17]. Given the range and nugget parameters, the MLE of the mean and variance parameters has a closed-form expression: $\hat{\mu }={({\mathbf{1}_{N}^{T}}{\mathbf{\tilde{R}}^{-1}}{\mathbf{1}_{N}})^{-1}}{\mathbf{1}_{N}^{T}}{\mathbf{\tilde{R}}^{-1}}{\mathbf{y}_{v}}$, where ${\hat{\sigma }^{2}}=\frac{{S^{2}}}{N}$ with ${S^{2}}={({\mathbf{y}_{v}}-\hat{\mu }{\mathbf{1}_{N}})^{T}}{\tilde{\mathbf{R}}^{-1}}({\mathbf{y}_{v}}-\hat{\mu }{\mathbf{1}_{N}})$. The logarithm of the profile likelihood after plugging the MLE of the mean and variance parameters follows:
where C is a constant not related to $(\boldsymbol{\gamma },\eta )$.
(2.4)
\[ \log (L(\boldsymbol{\gamma },\eta ))=C-\frac{1}{2}\log (|\tilde{\mathbf{R}}|)-\frac{N}{2}\log ({S^{2}}),\]The range and nugget parameters are then estimated by maximizing the logarithm of the profile likelihood function:
Vectorize signal ${\mathbf{f}_{v}}=\text{Vec}(\mathbf{F})$ from model (2.1). After plugging the MLE of the parameters, the posterior distribution of the image vector follows
where the predictive mean and predictive covariance follow
The predictive mean ${\mathbf{f}_{v}^{\ast }}$ and predictive variances, the diagonal terms of ${\hat{\sigma }^{2}}{\mathbf{R}^{\ast }}$, are required for prediction and uncertainty quantification. However, direct computation of the likelihood function involves inverting the covariance matrix, which takes $\mathcal{O}({N^{3}})$ operations, where the number of pixels N can be at the order of ${10^{6}}-{10^{7}}$ for microscopy images. A fast computational way without approximation is introduced in Section 2.2 to solve this computational challenge.
(2.5)
\[ (\hat{\boldsymbol{\gamma }},\hat{\eta })=\underset{(\boldsymbol{\gamma },\eta )}{\operatorname{argmax}}\log (L(\boldsymbol{\gamma },\eta )).\](2.6)
\[ ({\mathbf{f}_{v}}\mid {\mathbf{y}_{v}},\hat{\mu },{\hat{\sigma }^{2}},\hat{\eta },\hat{\boldsymbol{\gamma }})\sim \mathcal{MN}({\mathbf{f}_{v}^{\ast }},\hspace{0.1667em}{\hat{\sigma }^{2}}{\mathbf{R}^{\ast }}),\]2.2 Fast Computation for Images
A wide range of approximation approaches of GPs have been developed [48, 28, 43, 7, 23, 13, 19, 24, 57]. For GPs with product covariances on image data, no approximation is required. Denote the eigendecomposition of the subcovariance ${\mathbf{R}_{l}}={\mathbf{U}_{l}}{\boldsymbol{\Lambda }_{l}}{\mathbf{U}_{l}^{T}}$, where ${\mathbf{U}_{l}}$ and ${\boldsymbol{\Lambda }_{l}}$ are matrices of eigenvectors and a diagonal matrix of eigenvalues for $l=1,2$, respectively. Furthermore, denote ${\boldsymbol{\lambda }_{l}}={({\lambda _{1,l}},\dots ,{\lambda _{{n_{l}},l}})^{T}}$, an ${n_{l}}$-vector of the diagonal terms in ${\boldsymbol{\Lambda }_{l}}$, for $l=1,2$.
The log profile likelihood in Equation (2.4) and the predictive distributions in Equation (2.6) can be written as a function in terms of eigenpairs in Lemma 1 below. Lemma 1 eliminates the need to invert the $N\times N$ covariance matrix and reduce the computational complexity to $\mathcal{O}({n_{1}^{3}}+{n_{2}^{3}})$, which is significantly smaller than $\mathcal{O}({n_{1}^{3}}{n_{2}^{3}})$. The proof of Lemma 1 is provided in the Appendix.
Lemma 1.
-
1. (Profile likelihood). Denote the transformed likelihood vector ${\tilde{\mathbf{y}}_{v}}=\textit{Vec}(\tilde{\mathbf{Y}})=\textit{Vec}({\mathbf{U}_{1}^{T}}(\mathbf{Y}-\hat{\mu }{\mathbf{1}_{{n_{1}}\times {n_{2}}}}){\mathbf{U}_{2}})$. The logarithm of the profile likelihood in Equation (2.4) can be written aswhere
(2.9)
\[\begin{aligned}{}\log (L(\boldsymbol{\gamma },\eta ))=& C-\frac{1}{2}{\sum \limits_{i=1}^{{n_{1}}}}{\sum \limits_{j=1}^{{n_{2}}}}\log \left({\lambda _{i,1}}{\lambda _{j,2}}+\eta \right)-\\ {} & \frac{N}{2}\log \left({\sum \limits_{i=1}^{{n_{1}}}}{\sum \limits_{j=1}^{{n_{2}}}}\frac{{\tilde{Y}_{i,j}^{2}}}{{\lambda _{i,1}}{\lambda _{j,2}}+\eta }\right),\end{aligned}\]with ${\tilde{Y}_{i,j}}$ being the $(i,j)$th entry in $({\mathbf{U}_{1}^{T}}(\mathbf{Y}-\hat{\mu }{\mathbf{1}_{{n_{1}}\times {n_{2}}}}){\mathbf{U}_{2}})$, ${\tilde{Y}_{i,j,0}}$ being the $(i,j)$th entry of the ${n_{1}}\times {n_{2}}$ matrix ${\mathbf{U}_{1}^{T}}\mathbf{Y}{\mathbf{U}_{2}}$, ${\tilde{u}_{i,1}}$ being the ith term of the ${n_{1}}$ vector ${\mathbf{1}_{{n_{1}}}^{T}}{\mathbf{U}_{1}}$ $i=1,\dots ,{n_{1}}$ and ${\tilde{u}_{j,2}}$ being the jth term of the ${n_{2}}$ vector ${\mathbf{1}_{{n_{2}}}^{T}}{\mathbf{U}_{2}}$, for $j=1,\dots ,{n_{2}}$.(2.10)
\[\begin{aligned}{}\hat{\mu }=& {\left({\sum \limits_{j=1}^{{n_{2}}}}{\sum \limits_{i=1}^{{n_{1}}}}\frac{{\tilde{u}_{i,1}^{2}}{\tilde{u}_{j,2}^{2}}}{{\lambda _{i,1}}{\lambda _{j,2}}+\eta }\right)^{-1}}\times \\ {} & \left({\sum \limits_{j=1}^{{n_{2}}}}{\sum \limits_{i=1}^{{n_{1}}}}\frac{{\tilde{u}_{i,1}}{\tilde{Y}_{i,j,0}}{\tilde{u}_{j,2}}}{{\lambda _{i,1}}{\lambda _{j,2}}+\eta }\right),\end{aligned}\] -
2. (Predictive distribution). The predictive mean vector in Equation (2.7) followswhere ${\mathbf{\tilde{Y}}_{0}}$ is a ${n_{1}}\times {n_{2}}$ with the $(i,j)$th entry being $\frac{{\tilde{Y}_{i,j}}}{{\lambda _{i,1}}{\lambda _{j,2}}+\eta }$ for $i=1,\dots ,{n_{1}}$ and $j=1,\dots ,{n_{2}}$. The predictive variance at the $(i,j)$th pixel follows
(2.11)
\[ {\mathbf{f}_{v}^{\ast }}=\hat{\mu }{\mathbf{1}_{N}}+\textit{Vec}({\mathbf{U}_{1}}{\boldsymbol{\Lambda }_{1}}{\mathbf{\tilde{Y}}_{0}}{\boldsymbol{\Lambda }_{2}}{\mathbf{U}_{2}^{T}}),\]\[ {\hat{\sigma }^{2}}{c_{i,j}^{\ast }}={\hat{\sigma }^{2}}\left(1-{\sum \limits_{{j^{\prime }}=1}^{{n_{2}}}}{\sum \limits_{{i^{\prime }}=1}^{{n_{1}}}}\frac{{\tilde{r}_{i,{i^{\prime }},1}^{2}}{\tilde{r}_{j,{j^{\prime }},2}^{2}}}{{\lambda _{i,1}}{\lambda _{j,2}}+\eta }\right)\]where ${\tilde{r}_{i,{i^{\prime }},1}}$ is the ${i^{\prime }}$th term of the vector ${\mathbf{r}_{i,1}^{T}}{\mathbf{U}_{1}}$ and ${\tilde{r}_{j,{j^{\prime }},2}}$ is the ${j^{\prime }}$th term of the vector ${\mathbf{r}_{j,2}^{T}}{\mathbf{U}_{2}}$, for ${i^{\prime }}=1,\dots ,{n_{1}}$ and ${j^{\prime }}=1,\dots ,{n_{2}}$, with ${\mathbf{r}_{i,1}^{T}}={({K_{1}}({x_{i,1}},{x_{1,1}}),\dots ,{K_{1}}({x_{i,1}},{x_{{n_{1}},1}}))^{T}}$ and ${\mathbf{r}_{j,2}^{T}}={({K_{2}}({x_{j,2}},{x_{1,2}}),\dots ,{K_{2}}({x_{j,2}},{x_{{n_{2}},2}}))^{T}}$.
3 Image Segmentation from Gaussian Processes
Figure 1
Workflow for segmenting and labeling cell images: (A) Divide the image into different sub-images to enable locally estimated mean and variance parameters for capturing local properties such as the change of brightness. (B) Compute the predictive mean of fast GPs in Section 2.2 to each sub-image, which greatly reduces image noise. (C) Threshold each smoothed sub-image based on the criterion discussed in Section 3.2 to produce binary images, separating cells from the background. The optimal threshold is estimated for each sub-image. (D) Recombine the binary sub-images into a single binary image, and apply the watershed algorithm discussed in Section 3.3 to the image for segmentation and labeling, with each cell marked by a unique color.
Though Gaussian processes have been widely used in as surrogate models to emulate expensive computer simulations, nonparametric regression, and modeling spatially correlated data [38, 6], the flexibility and scalability of GPs in unsupervised image segmentation have not been studied yet. The overall flow of our cellular image segmentation process is summarized in Figure 1. First, some microscopy images of cells are particularly large with ${10^{6}}-{10^{7}}$ pixels. Similar to other image segmentation methods [46, 50], we segment each large image into multiple sub-images. We then apply the fast algorithm of GPs in Lemma 1 to get the predictive mean of each sub-image, illustrated in Parts (A) and (B) in Figure 1, respectively. We assume the range parameters and nugget of the GPs are the same for each subimage estimated by MLE in Lemma 1, yet the mean and variance parameters are distinct for each separable image, and the estimation of these parameters has a closed-form expression. Such a step enables different estimated parameters to capture the change of optical properties, such as brightness difference and out-of-focus blur from distinct parts of a large microscopy image, which are crucial in practice. Second, we developed a robust measure that will be introduced in Section 3.2 to threshold the predictive mean of each sub-image into background and objects, as shown in Panel (C). Finally, the binary sub-images are restitched in Panel (D), and the watershed algorithm is applied to the binary matrix to retrieve the labeled cell masks, which will be introduced in Section 3.3.
3.1 Image Denoising from Gaussian Processes on Lattice
Let Y represent the original cell image, which has size $N={n_{1}}\times {n_{2}}$. We denote $y({\mathbf{x}_{i,j}})$ as the value of the image at pixel ${\mathbf{x}_{i,j}}=({x_{i,1}},{x_{j,2}})$. All possible pixel values are between 0 and 1, as each image in this analysis is in grayscale. Microscopy images of cells can contain more than ${10^{6}}$ pixels, and some areas of an image can have substantially higher pixel values than other areas. Thus, for a large image Y, it is normally partitioned into a set of cropped images, denoted as ${\{{\mathbf{Y}_{k}}\}_{k=1}^{\tilde{K}}}$, where $\tilde{K}$ is the total number of cropped images as illustrated in Figure 1. This step can substantially reduce the computational cost, and enable segmentation outcomes more adaptive to the features of the local areas in a large image.
As experimental images often contain substantial noise, for each cropped image, we first use GPs with fast computation introduced in Section 2.1 for denoising the images. The range and nugget parameters of GPs are estimated by maximizing the likelihood in Equation (2.9), and they are held the same for all sub-images as they have similar smoothness properties. The mean and variance parameters are estimated differently for each cropped image to capture the local change of the optical properties. As the sub-images will be turned into a binary image, the discontinuity of the boundary between different sub-images does not impact the image segmentation task.
3.2 Generating Binary Cell Masks
Figure 2
Comparison of binary image results across thresholds based on the absolute second difference in foreground pixels. The threshold, which ranges from 0-1, refers to the proportion of the maximum value of the predictive mean that is set as the binary cutoff. Images A, B, C, D, and E are the binary images generated by setting the corresponding threshold on cropped predictive mean image F. Note that the Image B corresponds with ${\operatorname{argmax}_{m}}\Delta {c_{k}}({\alpha _{m}})$ in Equation (3.1).
The predictive mean at the interior pixel of a cell object typically has a larger value than the one in the background of a microscopy image. Inspired by the IsoData algorithm [40] in ImageJ [1], we developed an automated and robust way to determine the threshold value for separating objects from background noise using the predictive mean from GP models.
Suppose each cropped image is ${n_{k,1}}\times {n_{k,2}}$ with predictive mean ${\mathbf{F}_{k}^{\ast }}$ with the $(i,j)$th entry being ${F_{k}^{\ast }}({\mathbf{x}_{i,j}})$ and $\text{Vec}({\mathbf{F}_{k}^{\ast }})={\mathbf{f}_{k,v}^{\ast }}$ given in Equation (2.11). As the intensity values of the pixels are between 0 to 1 and the predictive uncertainty is small, the range of ${F_{k}^{\ast }}({\mathbf{x}_{i,j}})$ is also approximately between 0 to 1. As the image has only one type of cells, the change of pixel values ${F_{k}^{\ast }}({\mathbf{x}_{i,j}})$ is large if ${\mathbf{x}_{i,j}}$ is the pixel at the boundary of a cell object, and small elsewhere. Thus we first compute the normalized predictive mean exceeds each threshold value:
\[ {c_{k}}({\alpha _{m}})=\sum \limits_{i,j}{1_{\frac{{F_{k}^{\ast }}({\mathbf{x}_{i,j}})}{\max ({\mathbf{F}_{k}^{\ast }})}\gt {\alpha _{m}}}},\]
where ${\{{\alpha _{m}}\}_{m=1}^{M}}$ a sequence of equally spaced thresholds from 0 to 1 with the default value $M=100$, and $\max ({\mathbf{F}_{k}^{\ast }})$ denotes the maximum value of the predictive mean ${\mathbf{F}_{k}^{\ast }}$ of the kth cropped image for $k=1,\dots ,\tilde{K}$. In general, the range of ${\{{\alpha _{m}}\}_{m=1}^{M}}$ can be chosen to be the range of ${F_{k}^{\ast }}({\mathbf{x}_{i,j}})/\max ({\mathbf{F}_{k}^{\ast }})$. Subsequently, we calculate the difference $\Delta {c_{k}}({\alpha _{m}})$ in pixel counts below
As $\Delta {c_{k}}({\alpha _{m}})$ may not be smooth over ${\alpha _{m}}$, we compute the predictive mean of $\Delta {c_{k}}({\alpha _{m}})$, denoted by $\Delta {c_{k}^{\ast }}({\alpha _{m}})$, via a GP model with the default Matérn kernel in Equation (2.3) by the RobustGaSP package [16].We define ${\alpha ^{\ast }}$ as the smallest threshold ${\alpha _{m}}$ after the point of maximum fluctuation, denoted as $\arg {\max _{m}}\Delta {c_{k}^{\ast }}({\alpha _{m}})$, where the smoothed differences $\Delta {c_{k}^{\ast }}({\alpha _{m}})$ stabilize within a specified tolerance relative to the variability of $\Delta {c_{k}}({\alpha _{m}})$, an example of which is plotted in Figure 2. Formally, let ${\tau _{k}}$ represent the standard deviation of the differences $\Delta {c_{k}^{\ast }}({\alpha _{m}})$. The optimal threshold ${\alpha _{k}^{\ast }}$ is given by:
(3.1)
\[\begin{aligned}{}{\alpha _{k}^{\ast }}=& \underset{m}{\min }\{{\alpha _{m}}:\left|\Delta {c_{k}^{\ast }}({\alpha _{m}})-\Delta {c_{k}^{\ast }}({\alpha _{m-1}})\right|\lt 0.05{\tau _{k}},\hspace{0.1667em}\\ {} & m\gt \arg \underset{m}{\max }\Delta {c_{k}^{\ast }}({\alpha _{m}})\}.\end{aligned}\]Figure 2 illustrates this process for determining the optimal threshold for segmenting cell objects from the background. While the first difference $\Delta {c_{k}}({\alpha _{m}})$ measures the immediate change in the number of pixels above each threshold, the second difference more precisely captures the stabilization process of these changes. Initially, at extremely small thresholds, almost all pixels are classified as foreground or cell objects, leading to minimal changes in pixel counts between threshold values, shown as A in Figure 2. However, once the threshold starts to intersect the main background distribution, even small increments can trigger substantial variations, resulting in high second differences. As the threshold increases, the second difference reaches its maximum value at B in Figure 2 and then becomes erratic between B and C in Figure 2. This erratic behavior occurs because, in this transitional range, as the predictive mean values near the decision boundary are very similar, even minor changes in the threshold result in disproportionately large shifts in the number of pixels classified as foreground or cell objects. Beyond this range, the second difference begins to stabilize and falls below a predetermined tolerance level show as E in Figure 2, indicating that further increases yield only negligible changes in segmentation. Any further increase in the threshold results in over-segmentation.
Figure 3
(A) Frequency of the predictive mean of the intensity values over all pixels from the same microscopy image shown in Figure 2 F. The optimal threshold is annotated and lies right after the bulk of the background pixel values. The thresholds that generate images A, B, C, D, and E from Figure 2 are represented as vertical lines with the same color. All pixel intensity values less than the optimal threshold are background pixels and have a symmetric distribution. (B) The predictive mean is shown for each pixel with the optimal threshold plotted as the horizontal plane.
Panel (A) in Figure 3 plots the location of different threshold values in the distribution of pixel intensity values. The optimal threshold selected by the algorithm lies on the right side of the mode. Panel (B) of Figure 3 plots a horizontal plane of the optimal threshold value. A threshold value much smaller than this value misclassifies the background as cell objects, whereas a threshold value much larger than the threshold value misses some small cell objects with low intensity peaks.
After estimating the threshold value ${\alpha _{k}^{\ast }}$, we construct a binary image ${\mathbf{B}_{k}^{\ast }}$ for each cropped image at each pixel:
\[ {B_{k}^{\ast }}({\mathbf{x}_{i,j}})=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \text{if}\hspace{2.5pt}{F_{k}^{\ast }}({\mathbf{x}_{i,j}})\gt {\alpha _{k}^{\ast }}\max ({\mathbf{F}_{k}^{\ast }})\\ {} 0\hspace{1em}& \text{otherwise}\end{array}\right.\]
for $k=1,\dots ,\tilde{K}$. We complete panel (C) in Figure 1 by generating binary cell masks for all sub-images. To address a rare scenario with a higher-than-expected object count, which only happened once for all sub-images, we implemented a re-thresholding step discussed in Section S3 in the Supplementary Material. Finally, each object group is separated and detected via flood fill [4], as described in Section S4 in the Supplementary Material.3.3 Labeling Cell Masks by the Watershed Algorithm
Figure 4
An image of two cell nuclei (upper row) and heights of negative distances to the nearest background pixels (lower row). (A) Heights of negative distances along the red straight line in the upper panel are plotted in the lower panel before the watershed algorithm starts. (B) At water level $=-13$, both catch basins are partially filled, as both local minima are less than the current water level. The two separate water sources have unique labels and are not yet touching. (C) At water level at around $-10.2$, the water sources from the two catch basins flow into each other and the watershed line is formed at water level at around $-10.2$. (D) At water level $=0$, the cell objects are filled with water, and the watershed operation is complete. Each cell is labeled and separated.
After obtaining the binary image, we apply the watershed algorithm to separate cell objects from EBImage package in R [35], which is particularly useful for segmenting objects connecting to each other. The watershed algorithm creates a topological representation of an image based on a distance map, which is derived from the binary image representing foreground objects (cells). The distance map assigns a value to each foreground pixel based on its Euclidean distance to the nearest background pixel; higher values represent larger distances from the background. These distance values act as heights in a topological landscape, providing a basis for the watershed function to simulate water through. The negative distance map is then calculated by taking the negative of the distance map. Examples of the binary image and negative distance map are provided in Section S5 in the Supplementary Material.
The local minima of the negative distance map generally correspond to the centers of cell objects and are used as the starting point for the “flooding” process in the watershed algorithm. The water fills the cell object from these local minima as the negative distance map is dunked into water, so the cell object acts as a catch basin as the map floods. As water from different starting points meets, watershed lines are formed, which delineate boundaries for distinct binary cell objects. Additionally, whenever the water from a catch basin meets a background pixel, a watershed line is formed [55]. These watershed lines separate and create distinct cell objects, and these cell objects can be assigned unique labels.
An example of the watershed algorithm at distinct water levels is provided in Figure 4 with two cells denoted by ${d_{1}}$ and ${d_{2}}$. The top row of Figure 4 plots the cell images where the negative heights of a slice over the red line are plotted in the bottom row. Figure 4 (A) shows depths for the two local minima for the 1D example are $D({d_{1}})=-14.14$ and $D({d_{2}})=-14.21$. Each height acts as a water level, and the local minima act as seed points for water to flow through into each valley or catch basin. When the water level rises to touch a local minimum, the water is given the same label as the local minimum. As the water level rises, all unidentified pixels at or below the water level with neighboring labeled pixels are given the same label.
Figure 4 (B) shows the water levels at $-13$. Each catch basin is partially filled, as the water level is higher than the two local minima. As the water in each valley is still well separated, all emerged pixels in each valley are either labeled corresponding with ${d_{1}}$ or ${d_{2}}$, with no neighboring pixels containing a different label.
At water level around $-10.2$, shown in Figure 4 (C), the water from the two different basins begins to touch. The water from the two catch basins corresponding to ${d_{1}}$ and ${d_{2}}$ meet at the unlabeled pixel ${p_{0}}$ with a position close to 14, plotted in the x-coordinate. The Euclidean distances between ${p_{0}}$ and the local minima are utilized to determine its assignment, and a watershed line is formed. The Euclidean distance between ${p_{0}}$ and ${d_{2}}$ ($\approx 5.01$ units) is smaller than the distance between ${p_{0}}$ and ${d_{1}}$ ($\approx 5.62$ units), so ${p_{0}}$ is grouped with cell 2. The watershed line will serve as the boundary between the two cells.
At the water level 0, the objects are completely segmented, shown in Figure 4 (D). The watershed operation then halts, as this height is used as a stopping point since all cell objects have negative heights in the negative distance map. Additional details of the watershed algorithm, including definitions and edge cases, can be found in [55].
Once all objects are segmented by watershed, the average foreground object pixel count is calculated. To filter out any noise that may have been included as a foreground object, all objects with pixel counts less than or equal to 15% of the average are removed. If an object is touching the image boundary, the threshold count for removal is reduced to 5% of the average, as some objects could be partially cut off during imaging.
Figure 5
Violin plots of RMSE for five methods applied to the Branin function and the linear diffusion equation across various noise levels. Each experiment is repeated 10 times. The Fast-Mat and Fast-Exp represent the fast GPs with Matérn kernels in Equation (2.3) and exponential kernels, respectively.
4 Numerical Studies of Image Denoising
In this section, we numerically compare our fast GPs with several alternative methods by simulated experiments. The comparison includes the fast GP of a Matérn kernel with roughness parameter being 2.5 in Equation (2.3) and the exponential kernels ${K_{l}}({d_{l}})=\exp (-{d_{l}}/{\gamma _{l}})$ where ${d_{l}}$ is the distance of the lth coordinate of the input and ${\gamma _{l}}$ is a range parameter, for $l=1,2$. We also include three other alternative methods, the principal component analysis (PCA) [3], fast algorithm of multivariate Ornstein-Uhlenbeck processes (FMOU) [27], and dynamic mode decomposition (DMD) [45]. PCA is commonly used in image denoising by projecting image data onto low-dimensional spaces spanned by the leading eigenvectors of the data covariance matrix. FMOU provides a fast expectation-maximization (E-M) algorithm for parameter estimation in a latent factor model $\mathbf{Y}={\mathbf{U}_{0}}\mathbf{Z}+\boldsymbol{\epsilon }$, where ${\mathbf{U}_{0}}$ is an orthogonal factor loading matrix and $\mathbf{Z}={[{\mathbf{z}_{1}},\dots ,{\mathbf{z}_{d}}]^{T}}$ consists of d independent latent processes, each modeled as an Ornstein-Uhlenbeck process with distinct correlation and variance parameters, and $\boldsymbol{\epsilon }$ are independent Gaussian noises. In each E-M iteration, parameters are updated using closed-form expressions. DMD is popular in video processing to extract the spatiotemporal structure in nonlinear dynamical systems of high-dimensions. It assumes a linear transition matrix A, estimated as $\hat{\mathbf{A}}={\operatorname{argmin}_{\mathbf{A}}}||{\mathbf{Y}_{2:{n_{2}}}}-\mathbf{A}{\mathbf{Y}_{1:{n_{2}}-1}}||$, where ${\mathbf{Y}_{2:{n_{2}}}}$ and ${\mathbf{Y}_{1:{n_{2}}-1}}$ represent the last and first ${n_{2}}-1$ columns of Y, respectively. An efficient algorithm, known as exact DMD [54], is proposed to identify the leading eigenpairs of A with computational complexity of $\min (\mathcal{O}({n_{1}}{n_{2}^{2}}),\mathcal{O}({n_{1}^{2}}{n_{2}}))$.
We study two examples. In the first example, we studied two scenarios, where the means of the observations follow the Branin function [36] and the linear diffusion equation [31], respectively. In the second example, the means of the observations are images of cell nuclei and whole cells.
Figure 6
Violin plots of RMSE for five methods applied to images of noisy cell nuclei and whole cells across various noise levels. Each experiment is repeated 10 times. The Fast-Mat and Fast-Exp represent the fast GPs with Matérn kernels in Equation (2.3) and exponential kernels, respectively.
Example 1.
In this example, we use the Branin function and linear diffusion equation to generate the mean of the images.
-
1. (Branin function) The mean of the observation is defined by the Branin function\[\begin{aligned}{}f({x_{1}},{x_{2}})=& a{({x_{2}}-b{x_{1}^{2}}+c{x_{1}}-r)^{2}}+\\ {} & s(1-t)\cos ({x_{1}})+s,\end{aligned}\]where ${x_{1}}\in [-5,10]$ and ${x_{2}}\in [0,15]$. The default parameter values are: $a=1$, $b=\frac{5.1}{4{\pi ^{2}}}$, $c=\frac{5}{\pi }$, $r=6$, $s=10$ and $t=\frac{1}{8\pi }$. The input domain is discretized into a uniform grid with ${n_{1}}=100$ points along the ${x_{1}}$-axis and ${n_{2}}=100$ points along the ${x_{2}}$-axis. The observations are generated by $y({x_{1}},{x_{2}})=f({x_{1}},{x_{2}})+\epsilon $, where ϵ is an independent Gaussian noise with the noise variance ${\sigma _{0}^{2}}\in \{{1^{2}},{5^{2}},{10^{2}}\}$.
-
2. (Linear diffusion) The mean of the observation is governed by the partial differential equation where $f(x,t)$ represents the concentration of the diffusing material at location x and time t, and D is the diffusion coefficient. We set $D=1$ and discretize the spatial domain $[0,1]$ into 200 equally spaced grid points. The initial condition is set as $f(x,0)=0$, with a boundary condition at one end, maintaining a constant external concentration of 1. The signal is simulated over the time interval $t\in [0,0.2]$ using 200 time steps, computed with a numerical solver [49]. The observations are generated by $y(x,t)=f(x,t)+\epsilon $, where ϵ is an independent Gaussian noise. We evaluate the model performance under three configurations with noise variances ${\sigma _{0}^{2}}\in \{{0.05^{2}},0.{1^{2}},0.{3^{2}}\}$.
Example 2 (Cell images).
We consider two noisy cell microscopy images [32]: one for cell nuclei and the other for the whole cell. Independent Gaussian noise with ${\sigma _{0}^{2}}\in \{0.{1^{2}},0.{3^{2}},0.{5^{2}}\}$ is added to each image to generate the observations.
To compare the model performance in noise filtering, we consider the error of the predictive mean of the noisy observations, quantified by the root mean squared error (RMSE):
\[ \text{RMSE}=\sqrt{\frac{{\textstyle\textstyle\sum _{i=1}^{{n_{1}}}}{\textstyle\textstyle\sum _{j=1}^{{n_{2}}}}{(\hat{y}({\mathbf{x}_{i,j}})-\mathbb{E}[y({\mathbf{x}_{i,j}})])^{2}}}{{n_{1}}{n_{2}}}},\]
where $\hat{y}({\mathbf{x}_{i,j}})$ is the estimated mean of $y({\mathbf{x}_{i,j}})$.Figure 5 presents the RMSE in estimating the mean, based on the data-generating processes described in Example 1. Our fast GPs with the Matérn kernel in Equation (2.3) consistently achieve the highest accuracy across varying noise levels in both scenarios. This is because GPs with separable kernels capture spatial dependencies in both input directions, which are crucial for modeling images. In contrast, PCA implicitly assumes that the prior in one direction is independent, as the corresponding probabilistic model assumes that the latent factors are distributed as a standard Gaussian distribution [52]. Though DMD and FMOU assume the latent factors are from Gaussian processes, which capture output correlation over one input dimension, the latent factor loading matrix is estimated without modeling the correlation over the other input by a kernel function. In comparison, Fast GPs with Matérn kernel directly model the correlation over two inputs through a product kernel, which induces better predictions of the signals. Additionally, since DMD implicitly assumes a noise-free model [18], leading to degraded performance as noise variance increases. Figure S6 in the Supplementary Material shows the predictive signal by Fast GPs with Matérn kernel from noisy observations generated by the Branin function and the linear diffusion equation, which demonstrates close alignment with the ground truth.
Figure 7
(A) True mean of cell nuclei. (B) Noisy observation of cell nuclei (${\sigma _{0}}=0.1$). (C) Predictive mean of cell nuclei by fast GPs on lattice data with Matérn kernels in Equation (2.3). (D) True mean of the whole cell. (E) Noisy observation of the whole cell (${\sigma _{0}}=0.1$). (F) Predictive mean of the whole cell by Fast GPs on lattice data with Matérn kernels.
Figure 6 compares the accuracy of five approaches in denoising cell nuclei and whole-cell images in Example 2. The fast GPs with Matérn and exponential kernels significantly outperform PCA, FMOU and DMD across all settings, owing to their ability to estimate correlation parameters in both directions. Figure 7 plots the signal, noisy observations, and predictive mean from the fast GPs with a Matérn kernel in Equation (2.3) for denoising the noisy cell images. The predictive mean is close to the latent mean of the observations.
Figure 8
Comparison of the computational time in seconds for evaluating the profile likelihood of ${\mathbf{Y}_{v}}$ using direct and fast computation under varying sample sizes N. The direct computation with the exponential kernel (Direct-Exp) is plotted as orange dots; the direct computation with the Matérn kernels in Equation (2.3) (Direct-Mat) is plotted as green triangles; 3) the fast computation with the exponential kernel (Fast-Exp) is plotted as pink squares; The fast computation with the Matérn kernels in Equation (2.3) (Fast-Mat) is plotted as blue crosses. The left panel shows the computational time in seconds (original scale), while the right panel displays the logarithmic scale.
Additionally, we compare the computational cost of evaluating the profile likelihood by direct computation and fast computation. Simulations are conducted across varying sample sizes, ranging from ${10^{2}}$ to ${80^{2}}$. As shown in Figure 8, the fast algorithm in Equation (2.9) significantly reduces computational time compared to the direct approach, particularly for large sample sizes.
5 Numerical Studies of Cell Segmentation
5.1 Criteria and Methods
We use microscopy images of human dermal fibroblasts (hDFs) in [32] for testing different approaches. More details on the experimental conditions and techniques can be found in Section S7 in the Supplementary Material. These images contain both nuclei and cytoplasm channels, and evaluation of the proposed GP-based segmentation method was performed on both channels.
To evaluate the accuracy of the truth and segmentation results, we compare the area of overlap between the detected object and the true object relative to the overall union area of the two objects [39]. Let g represent a ground truth mask and p a predicted mask. The Intersection over Union (IoU) metric between g and p is defined as:
A higher IoU indicates a better match between the predictive mask and true mask of an object, with the perfect match occurring at an IoU of 1. We test several levels of IoU for each method: $\alpha =[0.5,0.55,0.6,0.65,0.7,0.75,0.8]$. When $\alpha =0.6$, for instance, it means that whenever the IoU defined in Equation (5.1) of a cell is larger than 0.6 by a certain method, this object will be classified as a true positive for this method. The range of thresholds allows us to assess image segmentation performance under distinct precision levels for defining true positives.
The average precision metric ($\text{AP}(\alpha )$) at each IoU threshold $\alpha \in [0,1]$ is often used for evaluating the correction of image segmentation [50]. For $\alpha \in [0,1]$, a True Positive (TP) means that the ground truth mask g is matched to a predicted mask p with $\text{IoU}(g,p)\ge \alpha $. Otherwise, unmatched predicted masks are counted as False Positives (FP), and unmatched ground truth masks are counted as False Negatives (FN). The $\text{AP}(\alpha )$ is calculated as follows:
We compare ImageJ segmentation and the GP-based unsupervised detection algorithm developed in this work. We followed the conventional processing steps in ImageJ to generate the cell masks [46]. All input images are converted to grayscale. To detect the foreground and background pixels, a threshold was set for each image using the default methods. To reduce noises in ImageJ, a despeckle operation, which applies a $3\times 3$ median filter, was then applied to each image. A watershed algorithm was used to ensure better separation of connecting cells. Finally, the cell masks were generated. The cell size is selected to be larger than 30 pixels. Then, the binary image and the labeled mask can be generated in ImageJ. A detailed description of each step in this process for ImageJ can be found in Section S8 in the Supplementary Material.
Figure 9
Results for segmenting cell nuclei. (A) AP scores for GP-based segmentation and ImageJ across different thresholds. (B) Boxplots of AP scores for each method. These plots are created using the APs for each of the 5 images using each method across different thresholds. (C) True boundaries of the fifth test image with $1024\times 1024$ pixels. (D) Boundaries generated by the GP-based method for the fifth test image. (E) Boundaries generated using ImageJ for the fifth test image.
5.2 Image Segmentation Results for Cell Nuclei
We first use five images of cell nuclei to test different methods. We choose a wide range of image sizes to test methods for images with different sizes. The smallest dimension of these images is $374\times 250$, and the largest image is $962\times 1128$. The experimental details of these images are shown in Table S1 in the Supplementary Material.
In panel (A) of Figure 9, we plot the $\text{AP}(\alpha )$ between the GP-based segmentation and ImageJ segmentation methods by solid lines and dashed lines, respectively, for 5 test images of nuclei at distinct threshold α. For any image and threshold level, the solid line is almost always above the dashed line with the same color, as the GP-based method consistently achieves higher AP than ImageJ at any threshold level α. The performance gap becomes more obvious at intermediate IoU thresholds between 0.6 to 0.7, which means that the GP-based method performs better at recovering the overall structure of the object than ImageJ. The AP score of both methods drops at a high IoU threshold, as the annotated object may not be fully accurate, which can affect defining the truth. Panel (B) in Figure 9 shows the distribution of these AP scores from GPs and ImageJ segmentation methods averaged at each threshold. Similar to the results in panel (A), the GP-based method achieves higher median AP scores across all thresholds. The difference between the GP-based method and ImageJ is more pronounced at a high threshold level, such as $\alpha =0.75$ or $\alpha =0.8$, where the boxes of two colors nearly have no overlap, indicating that GP-based method better capture the fine-scale boundary details than ImageJ.
The true annotated boundaries, those generated by the GP-based method, and those generated by ImageJ of the fifth test image are plotted in Figure 9 (C)-(E). The GP-based image segmentation method generally produces smoother, more continuous boundaries capturing the actual cell nuclei, even in cases where cells were closely clustered. In contrast, the ImageJ segmentation method sometimes produces fragmented boundaries and erroneously segmented individual nuclei into multiple smaller regions, especially in regions where the variations of pixel intensity are large. Additionally, ImageJ fails to detect multiple cells entirely. Further image refinement may be achieved using additional plug-ins or filters from ImageJ. However, this process would require extensive human intervention for trial and error, which would substantially increase the image processing time.
5.3 Image Segmentation Results for Whole Cells
Figure 10
Results for segmenting whole cells. (A) AP scores for GP-based segmentation and ImageJ across different thresholds. (B) Boxplots of AP scores for each method. (C) True boundaries of the first test image with $602\times 600$ pixels. (D) Boundaries generated by the GP-based method of the first test image. (E) Same for panel (D) but generated using ImageJ.
We compare the performance of the GP-based segmentation method and the ImageJ method for five whole-cell images. The smallest image is $602\times 600$, and the largest is $1000\times 1000$.
Panel (A) of Figure 10 shows the AP with different thresholds by the two methods for five images of whole cells. The variation of AP by both methods for the five images of whole cells is larger than the five images of the nuclei images. This is expected as segmenting whole cells is much more challenging due to the distinct shapes of the whole-cell images and more objects connecting to each other. Nonetheless, the solid line representing the AP for the GP-based method is again almost always above the dashed line representing the AP of the ImageJ segmentation results with the same color for all thresholds, meaning that the GP method consistently achieves higher AP scores than ImageJ. Both methods show much lower precision at a smaller IoU threshold, as correctly identifying irregular boundary shapes in connecting whole cells is extremely hard to capture. Such boundaries also pose challenges for humans to annotate, and for supervised methods to segment, even when annotated data are available.
Panel (B) of Figure 10 shows the distribution of the AP scores between the two methods. Compared to the segmentation results of the cell nuclei, the improvement by the GP-based segmentation methods is more pronounced for images of whole cells. The variation of AP by the GP-based segmentation is larger for whole-cell images than the ones for the images of nuclei, due to the difficulty in identifying the whole-cell shape either by statistical learning methods or by humans.
The ground truth boundaries, the GP-based method generated boundaries, and the ImageJ generated boundaries of the first text image are given in panels (C)-(E) in Figure 10. The GP-based method generally produces boundaries that better fit the actual cell shapes, preserving the irregular shapes of the cell cytoplasm. Although the GP-based method generally better preserves the true cell shapes, cell objects were also often grouped together. In contrast, ImageJ segmentation often produces fragmented boundaries and splits individual cell shapes into multiple smaller regions.
6 Concluding Remarks
In this study, we have presented a scalable cell image segmentation method for both the cell nucleus and cytoplasm. By leveraging fast Gaussian processes, automated thresholding and watershed operations, we have shown that the process is not only more automated, but the results are more precise than other unsupervised segmentation methods, such as ImageJ. A key strength of the proposed segmentation algorithm is that it does not rely on parameter tuning or labeled training datasets. The results of our unsupervised method demonstrate versatility across different cell channels, making it an appealing option for more general segmentation tasks beyond cell segmentation.
Several future directions are worth exploring. First, modern microscopes provide time-lapse videos for tracking the changes in cellular behavior over time. A future goal is to integrate our segmentation methods with particle tracking and linking algorithms that solve a linear assignment problem [22]. As our approach does not restrict the shape of the object, it can be extended for image segmentation with objects undergoing dynamical changes, including microscopy images for the nucleation and growth process in crystallization [56] and protein dynamics [26]. Second, when labeled image data are available, it is worthwhile to explore how these data can be used for constraining the shapes of the objects to improve estimation and simultaneously enable segmenting new objects not in the database. Third, the majority of cell segmentation methods are developed for 2D microscopy images. It is of interest to develop cell segmentation methods for microscopy that capture cell behaviors in a 3D environment, such as the orientational order and alignment [21]. Similar to some popular cell segmentation tools, such as Cellpose [50], our approach aims to segment images that contain the same type of cell. A future direction is to leverage techniques in tools such as CellProfiler [5] to segment images that contain multiple types of cells. Furthermore, the uncertainty is often not quantified by the image segmentation methods, as the segmentation process often contains multiple steps, and the uncertainty from the convolutional neural network is an open question. As we model the image data with a probabilistic model, a future goal is to propagate the uncertainty throughout the analysis for uncertainty quantification.