The New England Journal of Statistics in Data Science

Automatically Score Tissue Images Like a Pathologist by Transfer Learning
Volume 2, Issue 3 (2024), pp. 330–338
Iris Yan  

https://doi.org/10.51387/23-NEJSDS53
Pub. online: 14 December 2023      Type: Methodology Article      Open Access
Area: NextGen

Accepted: 2 November 2023
Published: 14 December 2023

Abstract

Cancer is the second leading cause of death in the world, and diagnosing it early can save many lives. Pathologists must examine tissue microarray (TMA) images manually to identify tumors, a process that can be time-consuming, inconsistent, and subjective. Existing automatic algorithms either have not reached the accuracy level of a pathologist or require substantial human involvement. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, yet such images are severely limited due to privacy and regulatory concerns in medical organizations. TMA images from different cancer types may share certain common characteristics, but combining them directly harms accuracy because of heterogeneity in their staining patterns. Transfer learning is an emerging learning paradigm that allows borrowing strength from similar problems. However, existing approaches typically require a large sample from the similar learning problems, whereas TMA images of different cancer types are often available only in small samples; moreover, existing algorithms are limited to transferring from a single similar problem. We propose a new transfer learning algorithm that can learn from multiple related problems, where each problem has a small sample and can have a substantially different distribution from the original one. The proposed algorithm breaks the critical accuracy barrier (the 75% accuracy level of pathologists), with a reported accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database. It is supported by recent developments in transfer learning theory and by empirical evidence in clustering technology. This will allow pathologists to confidently adopt automatic algorithms that recognize tumors consistently, with higher accuracy, and in real time.

Copyright
© 2024 New England Statistical Society
Open access article under the CC BY license.

Keywords
Tissue image scoring; Transfer learning; Small training sample; Multiple auxiliary sets

  • ISSN: 2693-7166