Unsupervised Cell Segmentation by Fast Gaussian Processes
Pub. online: 28 January 2026
Type: Methodology Article
Open Access
Area: Engineering Science
Accepted: 5 January 2026
Published: 28 January 2026
Abstract
Cell boundary information is crucial for analyzing cell behaviors from time-lapse microscopy videos. Existing supervised cell segmentation tools, such as ImageJ, require tuning various parameters and rely on restrictive assumptions about the shape of the objects. While recent supervised segmentation tools based on convolutional neural networks improve accuracy, they depend on high-quality labeled images, which makes them unsuitable for segmenting new types of objects that are not in the database. We developed a novel unsupervised cell segmentation algorithm based on fast Gaussian processes for noisy microscopy images, without the need for parameter tuning or restrictive assumptions about the shape of the object. We derived robust thresholding criteria that adapt to heterogeneous images, in which brightness differs from one region to another, to separate objects from the background, and employed watershed segmentation to distinguish touching cell objects. Both simulation studies and real-data analysis of large microscopy images demonstrate the scalability and accuracy of our approach compared with the alternatives.
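The smooth, threshold, and label stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: a 3x3 mean filter stands in for the Gaussian process predictive mean, Otsu's between-class variance rule stands in for the paper's thresholding criteria, and a flood fill labels the objects. The watershed step for touching cells is omitted. All function names and the synthetic image are assumptions made for illustration.

```python
import random
from collections import deque


def mean_filter(img):
    """3x3 mean filter (window truncated at the borders).
    Assumption: a crude stand-in for a GP predictive mean."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(i - 1, 0), min(i + 2, h))
                    for b in range(max(j - 1, 0), min(j + 2, w))]
            out[i][j] = sum(vals) / len(vals)
    return out


def otsu_threshold(img):
    """Threshold maximizing between-class variance (Otsu).
    Assumption: a stand-in, not the paper's criterion."""
    vals = sorted(v for row in img for v in row)
    n, total = len(vals), sum(vals)
    best_t, best_score, left_sum = vals[0], -1.0, 0.0
    for k in range(1, n):
        left_sum += vals[k - 1]
        if vals[k] == vals[k - 1]:
            continue  # no valid cut between equal values
        w0, w1 = k / n, (n - k) / n
        m0, m1 = left_sum / k, (total - left_sum) / (n - k)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score, best_t = score, (vals[k - 1] + vals[k]) / 2
    return best_t


def label_components(mask):
    """Label 4-connected foreground regions with a breadth-first flood fill."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:
                    a, b = queue.popleft()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if (0 <= x < h and 0 <= y < w
                                and mask[x][y] and labels[x][y] == 0):
                            labels[x][y] = count
                            queue.append((x, y))
    return labels, count


def segment(img):
    """Smooth, threshold, then label; returns (label image, object count)."""
    smooth = mean_filter(img)
    t = otsu_threshold(smooth)
    mask = [[v > t for v in row] for row in smooth]
    return label_components(mask)


def make_test_image(seed=0):
    """Hypothetical 24x24 image: two bright squares on a dark, noisy background."""
    rng = random.Random(seed)
    img = [[0.2 + rng.gauss(0.0, 0.05) for _ in range(24)] for _ in range(24)]
    for i in range(3, 9):
        for j in range(3, 9):
            img[i][j] += 0.6
    for i in range(14, 21):
        for j in range(13, 20):
            img[i][j] += 0.6
    return img


labels, n_objects = segment(make_test_image())
print("objects found:", n_objects)
```

A single global threshold such as this one is exactly what tends to fail on images with uneven brightness, which is the case the adaptive criteria in the abstract are meant to handle.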
Supplementary material
The supplementary material provides additional details on image segmentation, experiments, and the generation of ground truth.