{"id":231,"date":"2021-09-15T16:00:57","date_gmt":"2021-09-15T07:00:57","guid":{"rendered":"https:\/\/aidalab.cafe24.com\/?page_id=231"},"modified":"2025-06-25T19:22:24","modified_gmt":"2025-06-25T10:22:24","slug":"deep-learning-image-segmentation","status":"publish","type":"page","link":"https:\/\/aida.korea.ac.kr\/?page_id=231","title":{"rendered":""},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Deep Learning \u2013 Image Segmentation<\/h1>\n\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>PrevMatch: Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation<\/strong><\/strong><\/h2>\n\n\n\n<p><strong>Objective<\/strong><\/p>\n\n\n\n<p>In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatch framework that effectively mitigates the aforementioned limitations by maximizing the utilization of the temporal knowledge obtained during the training process.<\/p>\n\n\n\n<p><strong>Data<\/strong><\/p>\n\n\n\n<p>We use the Pascal-VOC [1], Cityscapes [2], COCO [3], and ADE20K [4] datasets.<\/p>\n\n\n\n<p class=\"has-small-font-size\">[1] M. Everingham et al., \u201cThe pascal visual object classes (voc) challenge\u201d, 2010.<br>\n[2] M. Cordts et al., \u201cThe cityscapes dataset for semantic urban scene understanding\u201d, 2016.<br>\n[3] T.-Y. Lin et al., \u201cMicrosoft coco: Common objects in context\u201d, 2014.<br>\n[4] B. 
Zhou et al., \u201cScene parsing through ade20k dataset\u201d, 2017.\n<\/p>\n\n\n\n<p><strong>Related Work<\/strong><\/p>\n\n\n\n<p>PS-MT and Dual Teacher methods implement a dual EMA teacher-based framework to mitigate the coupling problem between the teacher and student models, with the two teachers updated alternately each epoch using the EMA of the student\u2019s weights.<\/p>\n\n\n\n<p>Co-training approaches provide diverse and stable pseudo-label guidance without concerns regarding the coupling problem.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/Prematch.png\" alt=\"\" class=\"wp-image-1727\"\/>\n    <figcaption class=\"wp-element-caption\">\n        [5] Shin, Wooseok, et al. &#8220;Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation&#8221;, 2024.\n    <\/figcaption>\n<\/figure>\n\n\n<p><strong>Proposed Method<\/strong><\/p>\n\n\n\n<p>We propose the PrevMatch framework, which efficiently expands pseudo-label views by maximizing the utilization of previous models obtained during training. The PrevMatch framework is based on two main ideas. First, to efficiently address the coupling problem, we revisit the utilization of temporal knowledge. Specifically, we save several models at specific epochs during training and utilize their predictions as additional guidance, referred to as previous guidance, which acts as a regularizer in conjunction with standard guidance. Second, we design a highly randomized ensemble strategy to maximize the effectiveness of utilizing the previous guidance. This approach involves selecting a random number of models from those previously saved and ensembling their predictions using randomized weights. 
These strategies can efficiently provide diverse and reliable pseudo-labels while avoiding the complexities inherent in dual EMA and co-training-based approaches.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/Prematch2.png\" alt=\"\" class=\"wp-image-1727\"\/>\n    <figcaption class=\"wp-element-caption\">\n        [5] Shin, Wooseok, et al. &#8220;Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation&#8221;, 2024.\n    <\/figcaption>\n<\/figure>\n\n\n<p>Extensive experiments conducted across various evaluation protocols on the Pascal VOC, Cityscapes, COCO, and ADE20K datasets reveal that the proposed PrevMatch significantly outperforms existing methods.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/Prematch3.png\" alt=\"\" class=\"wp-image-1727\"\/>\n    <figcaption class=\"wp-element-caption\">\n        [5] Shin, Wooseok, et al. &#8220;Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation&#8221;, 2024.\n    <\/figcaption>\n<\/figure>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model<\/strong><\/strong><\/h2>\n\n\n\n<p><strong>Objective<\/strong><\/p>\n\n\n\n<p>In semi-supervised semantic segmentation, existing studies have shown promising results in academic settings with controlled splits of benchmark datasets. However, the potential benefits of leveraging significantly larger sets of unlabeled images remain unexplored. In real-world scenarios, abundant unlabeled images are often available from online sources (web-scraped images) or large-scale datasets. 
However, these images may have different distributions from those of the target dataset, a situation known as out-of-distribution (OOD). Using these images as unlabeled data in semi-supervised learning can lead to inaccurate pseudo-labels, potentially misguiding network training.<\/p>\n\n\n\n<p><strong>Data<\/strong><\/p>\n\n\n\n<p>We use the Pascal VOC [1], Pascal Context [2], and COCO [3] datasets.<\/p>\n\n\n\n<p class=\"has-small-font-size\">[1] M. Everingham et al., \u201cThe pascal visual object classes (voc) challenge\u201d, 2010.<br>\n[2] R. Mottaghi et al., \u201cThe role of context for object detection and semantic segmentation in the wild\u201d, 2014.<br>\n[3] T.-Y. Lin et al., \u201cMicrosoft coco: Common objects in context\u201d, 2014.\n<\/p>\n\n\n\n<p><strong>Related Work<\/strong><\/p>\n\n\n\n<p>Existing semi-supervised semantic segmentation studies have shown promising results in academic settings where benchmark datasets are split into various setups based on different proportions or numbers of labeled and unlabeled images. However, there has been little exploration of the potential benefits of leveraging more unlabeled images.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/SemiOVS.png\" alt=\"\" class=\"wp-image-1727\"\/>\n<\/figure>\n\n\n<p><strong>Proposed Method<\/strong><\/p>\n\n\n\n<p>We propose a new semi-supervised segmentation framework with an open-vocabulary segmentation model (SemiOVS) to effectively utilize unlabeled OOD images. Specifically, we integrate an open-vocabulary segmentation (OVS) model into the existing semi-supervised learning process. In particular, the OVS model generates pseudo-labels for OOD images. Then, the standard segmentation model uses these pseudo-labels to learn OOD objects. 
This strategy provides the standard segmentation model with reliable guidance for OOD images, expanding its understanding to objects and scenes beyond the in-distribution data.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/SemiOVS2.png\" alt=\"\" class=\"wp-image-1727\"\/>\n    <figcaption class=\"wp-element-caption\">\n        [4] Shin, Wooseok, et al., \u201cLeveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model\u201d, (Under Review)\n    <\/figcaption>\n<\/figure>\n\n\n<p>Extensive experiments on the Pascal VOC and Pascal Context datasets reveal that (1) leveraging additional unlabeled images from the COCO dataset or online sources significantly improves the performance of the semi-supervised learner, and (2) using the OVS model to pseudo-label OOD images substantially improves performance.\n<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full\">\n    <img decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2025\/02\/SemiOVS3.png\" alt=\"\" class=\"wp-image-1727\"\/>\n    <figcaption class=\"wp-element-caption\">\n        [4] Shin, Wooseok, et al., \u201cLeveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model\u201d, (Under Review)\n    <\/figcaption>\n<\/figure>\n\n\n\n\n<hr class=\"wp-block-separator is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>Propagating Complementary Multi-Level Aggregation Network for Polyp Segmentation<\/strong><\/strong><\/h2>\n\n\n\n<p><strong>Objective<\/strong><\/p>\n\n\n\n<p>Colorectal cancer (CRC) usually begins as a polyp in the intestinal mucosa, and approximately one quarter of untreated polyps can develop into colon cancer.<\/p>\n\n\n\n<p>As the polyps are usually small and the boundaries are low in contrast to their surroundings, 
polyps can easily be mistaken for wrinkles or other intestinal structures.<\/p>\n\n\n\n<p>Polyp detection using colonoscopy images is a challenging task owing to the ambiguous image context.<\/p>\n\n\n\n<p><strong>Data<\/strong><\/p>\n\n\n\n<p>Train &amp; Validation: Kvasir (900 images), CVC-ClinicDB (550)<\/p>\n\n\n\n<p>Test: Kvasir (100), CVC-ClinicDB (62), CVC-ColonDB (380), ETIS (196), EndoScene.CVC-300 (60)<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-34.png\" alt=\"\" class=\"wp-image-1686\" width=\"646\" height=\"512\" srcset=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-34.png 764w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-34-300x238.png 300w\" sizes=\"auto, (max-width: 646px) 100vw, 646px\" \/><figcaption>Kvasir &#8211; https:\/\/datasets.simula.no\/kvasir-seg\/#download<br>CVC-Clinic &#8211; <a href=\"https:\/\/polyp.grand-challenge.org\/site\/Polyp\/CVCClinicDB\/\">https:\/\/polyp.grand-challenge.org\/site\/Polyp\/CVCClinicDB\/<\/a><br>CVC-Colon &#8211; <a href=\"http:\/\/www.cvc.uab.es\/CVC-Colon\/index.php\/databases\/\">http:\/\/www.cvc.uab.es\/CVC-Colon\/index.php\/databases\/<\/a><br>ETIS, EndoScene &#8211; http:\/\/www.cvc.uab.es\/CVC-Colon\/index.php\/databases\/cvc-endoscenestill\/<\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>Related Work<\/strong><\/p>\n\n\n\n<p>U-Net and U-Net++ exhibit a distribution discrepancy between the low-level and high-level representations when aggregating multi-level features.<\/p>\n\n\n\n<p>Psi-Net and SFA adopted a joint training strategy using the polyp region and boundary detection tasks. 
<\/p>\n\n\n\n<p>PraNet employed a parallel reverse attention method with partial decoders to incorporate the polyp area and boundary features.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"520\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-35-1024x520.png\" alt=\"\" class=\"wp-image-1687\" srcset=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-35-1024x520.png 1024w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-35-300x152.png 300w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-35-768x390.png 768w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-35.png 1421w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>U-Net: Convolutional Networks for Biomedical Image Segmentation<br>UNet++: A Nested U-Net Architecture for Medical Image Segmentation<br>Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation<br>SFA: Selective Feature Aggregation Network with Area-Boundary Constraints for Polyp Segmentation<br>PraNet: Parallel Reverse Attention Network for Polyp Segmentation<\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>Proposed Method<\/strong><\/p>\n\n\n\n<p>The proposed network, COMMA, is designed to reduce the multi-level distribution discrepancy by propagating both refined levels and explicit boundary information. To propagate distinct information, we employ multi-decoder structures consisting of CMMs and a BPM.<\/p>\n\n\n\n<p>CMM: CMM clarifies the boundary noise in the low-level features through the abstracted high-level representation and propagates the refined information to another decoder.<\/p>\n\n\n\n<p>BPM: BPM is designed to propagate the explicit boundary information to the complementary multi-level features by incorporating the lowest- and highest-level representations. 
The boundary information is propagated to the CMMs in the next decoder to enhance the segmentation performance.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"628\" src=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-36-1024x628.png\" alt=\"\" class=\"wp-image-1688\" srcset=\"https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-36-1024x628.png 1024w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-36-300x184.png 300w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-36-768x471.png 768w, https:\/\/aida.korea.ac.kr\/wp-content\/uploads\/2022\/05\/image-36.png 1253w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>COMMA: Propagating Complementary Multi-Level Aggregation Network for Polyp Segmentation<\/figcaption><\/figure><\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Deep Learning \u2013 Image Segmentation PrevMatch: Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation Objective In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. 
However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/aida.korea.ac.kr\/?page_id=231\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-231","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/pages\/231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=231"}],"version-history":[{"count":11,"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/pages\/231\/revisions"}],"predecessor-version":[{"id":2389,"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/pages\/231\/revisions\/2389"}],"wp:attachment":[{"href":"https:\/\/aida.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}