The New England Journal of Statistics in Data Science

Unsupervised Cell Segmentation by Fast Gaussian Processes
Laura Baracaldo, Blythe King, Haoran Yan, et al. (6 authors)

https://doi.org/10.51387/26-NEJSDS97
Pub. online: 28 January 2026
Type: Methodology Article
Open Access
Area: Engineering Science

Accepted: 5 January 2026
Published: 28 January 2026

Abstract

Cell boundary information is crucial for analyzing cell behaviors from time-lapse microscopy videos. Existing supervised cell segmentation tools, such as ImageJ, require tuning various parameters and rely on restrictive assumptions about the shape of the objects. While recent supervised segmentation tools based on convolutional neural networks improve accuracy, they depend on high-quality labeled images, making them unsuitable for segmenting new types of objects that are not in the database. We developed a novel unsupervised cell segmentation algorithm based on fast Gaussian processes for noisy microscopy images, which requires neither parameter tuning nor restrictive assumptions about the shape of the object. We derived robust thresholding criteria that adapt to heterogeneous images whose brightness differs across regions, in order to separate objects from the background, and employed watershed segmentation to distinguish touching cell objects. Both simulation studies and real-data analyses of large microscopy images demonstrate the scalability and accuracy of our approach compared with the alternatives.
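To make the first two stages of this pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes a separable exponential covariance on the pixel lattice with fixed, hand-picked hyperparameters (`length_scale`, `noise_var`), so that the Gaussian process posterior mean can be computed from two small eigendecompositions instead of one very large matrix inverse. The paper's adaptive thresholding criteria are not given in the abstract, so a plain Otsu threshold is swapped in as a stand-in, and the watershed step is omitted. All function names are hypothetical.

```python
import numpy as np

def exp_kernel(n, length_scale):
    """Exponential correlation between n equally spaced lattice coordinates."""
    x = np.arange(n, dtype=float)
    return np.exp(-np.abs(x[:, None] - x[None, :]) / length_scale)

def gp_smooth_lattice(Y, length_scale=3.0, noise_var=0.5):
    """GP posterior mean on an n1 x n2 lattice with covariance K2 (x) K1.

    Assumption: hyperparameters are fixed for illustration; noise_var is
    the noise variance relative to a unit signal variance.
    """
    n1, n2 = Y.shape
    mu = Y.mean()  # constant mean, estimated by the sample mean
    lam1, U1 = np.linalg.eigh(exp_kernel(n1, length_scale))
    lam2, U2 = np.linalg.eigh(exp_kernel(n2, length_scale))
    # Rotate into the joint eigenbasis, shrink each coefficient by
    # lambda / (lambda + noise_var), and rotate back.
    Z = U1.T @ (Y - mu) @ U2
    eig = np.outer(lam1, lam2)
    return mu + U1 @ (eig / (eig + noise_var) * Z) @ U2.T

def otsu_threshold(values, bins=256):
    """Otsu's threshold: maximize the between-class variance (a stand-in)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)            # pixel count at or below each bin
    w1 = w0[-1] - w0                # pixel count above each bin
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    return centers[np.argmax(w0 * w1 * (m0 - m1) ** 2)]

def segment_foreground(Y, **gp_kwargs):
    """Smooth with the lattice GP, then threshold the predictive mean."""
    smoothed = gp_smooth_lattice(Y, **gp_kwargs)
    return smoothed > otsu_threshold(smoothed.ravel())

if __name__ == "__main__":
    # Synthetic example: one bright disc on a dark, noisy background.
    rng = np.random.default_rng(0)
    yy, xx = np.mgrid[0:48, 0:48]
    truth = (yy - 24) ** 2 + (xx - 24) ** 2 <= 100
    noisy = truth.astype(float) + 0.2 * rng.standard_normal(truth.shape)
    mask = segment_foreground(noisy)
    print("pixel agreement with ground truth:", (mask == truth).mean())
```

The design point is that a separable covariance on a lattice only ever requires eigendecompositions of the n1 x n1 and n2 x n2 factor matrices, which is what makes Gaussian process smoothing feasible for large images. A single global threshold, as used here, would not handle images with uneven brightness; that is the case the paper's adaptive criteria are designed for.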

Supplementary material

The supplementary material provides additional details for image segmentation, experiments, and generation of ground truth.



Copyright
© 2026 New England Statistical Society
Open access article under the CC BY license.

Keywords
Bayesian inference, Image segmentation, Lattice, Microscopy, Scalability

Funding
This research is partially supported by the Materials Research Science and Engineering Center (MRSEC, Data Expert Group and IRG-2) by the National Science Foundation under Award No. DMR-2308708 and the Cyberinfrastructure for Sustained Scientific Innovation program by the National Science Foundation under Award No. OAC-2411043.

ISSN: 2693-7166