Deep Learning – Image Segmentation
Propagating Complementary Multi-Level Aggregation Network for Polyp Segmentation
Objective
Colorectal cancer (CRC) usually begins as a polyp in the intestinal mucosa, and approximately one quarter of untreated polyps can develop into colon cancer.
As polyps are usually small and their boundaries have low contrast with the surroundings, they can easily be mistaken for wrinkles or other intestinal structures.
Polyp detection using colonoscopy images is a challenging task owing to the ambiguous image context.
Data
Train & Validation: Kvasir (900 images), CVC-ClinicDB (550)
Test: Kvasir (100), CVC-ClinicDB (62), CVC-ColonDB (380), ETIS (196), EndoScene.CVC-300 (60)

CVC-Clinic – https://polyp.grand-challenge.org/site/Polyp/CVCClinicDB/
CVC-Colon – http://www.cvc.uab.es/CVC-Colon/index.php/databases/
ETIS, EndoScene – http://www.cvc.uab.es/CVC-Colon/index.php/databases/cvc-endoscenestill/
Related Work
U-Net and U-Net++ exhibit a distribution discrepancy between the low-level and high-level representations when aggregating multi-level features.
Psi-Net and SFA adopted a joint training strategy over the polyp region and boundary detection tasks.
PraNet employed a parallel reverse attention method with partial decoders to incorporate the polyp area and boundary features.

UNet++: A Nested U-Net Architecture for Medical Image Segmentation
Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation
SFA: Selective Feature Aggregation Network with Area-Boundary Constraints for Polyp Segmentation
PraNet: Parallel Reverse Attention Network for Polyp Segmentation
Proposed Method
The proposed network, COMMA, is designed to reduce the multi-level distribution discrepancy by propagating both refined levels and explicit boundary information. To propagate distinct information, we employ multi-decoder structures consisting of CMMs and a BPM.
CMM: The CMM reduces the boundary noise in the low-level features using the abstracted high-level representation and propagates the refined information to another decoder.
BPM: The BPM is designed to propagate explicit boundary information to the complementary multi-level features by incorporating the lowest- and highest-level representations. The boundary information is propagated to the CMMs in the next decoder to enhance the segmentation performance.

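Object Segmentation in Facial Skin
For segmenting small objects in facial skin (pores, pigmentation, sebum, blemishes, etc.), we are developing a learning model that minimizes the effect of radiance noise and produces highly reproducible results.
(1) Skin data from 150 people were collected through a prototype app and preprocessed by crop → tiling → augmentation, yielding about 4,000 images as model input (random brightness and contrast augmentation is applied to reduce the effect of radiance noise).

(2) Because the objects are very small, Mask R-CNN, one of the SOTA models, is applied to learn accurate location information based on RoI Align (the smallest objects are 2 to 4 px).

(3) For higher performance, we plan to fine-tune and apply models such as graph-based approaches and Vision Transformers, and then compare and evaluate the final performance.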
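The tiling and brightness/contrast steps in (1) can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names, the tile size, the stride, and the jitter ranges are all hypothetical, and a plain list of rows with values in [0, 1] stands in for a real skin image.

```python
import random


def tile_image(image, tile, stride):
    """Split a 2-D grayscale image (list of rows) into tile x tile patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            patches.append([row[left:left + tile] for row in image[top:top + tile]])
    return patches


def random_brightness_contrast(patch, rng, max_brightness=0.2, max_contrast=0.2):
    """Randomly jitter brightness and contrast of a patch with values in [0, 1].

    The jitter ranges are assumed values, not taken from the project.
    """
    alpha = 1.0 + rng.uniform(-max_contrast, max_contrast)  # contrast gain
    beta = rng.uniform(-max_brightness, max_brightness)     # brightness shift
    pixels = [value for row in patch for value in row]
    mean = sum(pixels) / len(pixels)
    # Scale around the patch mean (contrast), shift (brightness), then clip.
    return [
        [min(1.0, max(0.0, (value - mean) * alpha + mean + beta)) for value in row]
        for row in patch
    ]


rng = random.Random(0)
# Synthetic 8x8 gradient image standing in for a cropped skin photo.
image = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]
patches = tile_image(image, tile=4, stride=4)
augmented = [random_brightness_contrast(patch, rng) for patch in patches]
```

Because each patch receives a different random gain and shift, the model sees the same small objects under varying illumination, which is the intuition behind using this augmentation against radiance noise.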
Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model
Objective
In semi-supervised semantic segmentation, existing studies have shown promising results in academic settings with controlled splits of benchmark datasets. However, the potential benefits of leveraging significantly larger sets of unlabeled images remain unexplored. In real-world scenarios, abundant unlabeled images are often available from online sources (web-scraped images) or large-scale datasets. However, these images may have different distributions from those of the target dataset, a situation known as out-of-distribution (OOD). Using these images as unlabeled data in semi-supervised learning can lead to inaccurate pseudo-labels, potentially misguiding network training.
Data
We use the Pascal VOC [1], Pascal Context [2], and COCO [3] datasets.
[1] M. Everingham et al., “The pascal visual object classes (voc) challenge”, 2010.
[2] R. Mottaghi et al., “The role of context for object detection and semantic segmentation in the wild”, 2014.
[3] T.-Y. Lin et al., “Microsoft coco: Common objects in context”, 2014.
Related Work
Existing semi-supervised semantic segmentation studies have shown promising results in academic settings where benchmark datasets are split into various setups based on different proportions or numbers of labeled and unlabeled images. However, there has been little exploration of the potential benefits of leveraging more unlabeled images.

Proposed Method
We propose a new semi-supervised segmentation framework with an open-vocabulary segmentation (OVS) model, called SemiOVS, to effectively utilize unlabeled OOD images. Specifically, we integrate an OVS model into the existing semi-supervised learning process. The OVS model generates pseudo-labels for OOD images, and the standard segmentation model then uses these pseudo-labels to learn OOD objects. This strategy provides the standard segmentation model with reliable guidance for OOD images, expanding its understanding to objects and scenes beyond the in-distribution data.
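The routing of pseudo-label sources described above might be sketched as follows. This is an illustration under stated assumptions, not the SemiOVS implementation: the confidence threshold, the ignore index of 255, the function names, and the choice to let the learner label in-distribution images itself are all assumptions. Each image is represented as a flat list of per-pixel class probabilities.

```python
IGNORE_INDEX = 255  # assumed label for pixels excluded from the loss


def pseudo_label(probs, threshold=0.9):
    """Take the per-pixel argmax; mark low-confidence pixels as ignored."""
    labels = []
    for pixel_probs in probs:
        best = max(range(len(pixel_probs)), key=pixel_probs.__getitem__)
        labels.append(best if pixel_probs[best] >= threshold else IGNORE_INDEX)
    return labels


def select_pseudo_labels(is_ood, ovs_probs, learner_probs, threshold=0.9):
    """Use the OVS model's predictions for OOD images.

    For in-distribution images this sketch falls back to the learner's own
    predictions, which is an assumption about the base semi-supervised method.
    """
    source = ovs_probs if is_ood else learner_probs
    return pseudo_label(source, threshold)


# Two pixels, two classes; the probabilities are made up for illustration.
ovs_probs = [[0.95, 0.05], [0.60, 0.40]]
learner_probs = [[0.30, 0.70], [0.02, 0.98]]

ood_labels = select_pseudo_labels(True, ovs_probs, learner_probs)
in_dist_labels = select_pseudo_labels(False, ovs_probs, learner_probs)
```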

Extensive experiments on the Pascal VOC and Pascal Context datasets reveal that (1) leveraging additional unlabeled images from the COCO dataset or online sources significantly improves the performance of the semi-supervised learner, and (2) using the OVS model to pseudo-label OOD images substantially improves performance.

PrevMatch: Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation
Objective
In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatch framework that effectively mitigates the aforementioned limitations by maximizing the utilization of the temporal knowledge obtained during the training process.
Data
We use the Pascal VOC [1], Cityscapes [2], COCO [3], and ADE20K [4] datasets.
[1] M. Everingham et al., “The pascal visual object classes (voc) challenge”, 2010.
[2] M. Cordts et al., “The cityscapes dataset for semantic urban scene understanding”, 2016.
[3] T.-Y. Lin et al., “Microsoft coco: Common objects in context”, 2014.
[4] B. Zhou et al., “Scene parsing through ade20k dataset”, 2017.
Related Work
PS-MT and Dual Teacher methods implement a dual EMA teacher-based framework to mitigate the coupling problem between the teacher and student models, with two teachers alternately updating each epoch based on the EMA of the student’s weights.
Co-training approaches provide diverse pseudo-label guidance with stability and without concerns regarding the coupling problem.
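The alternating dual-teacher update mentioned above can be sketched in a few lines. This is a simplified illustration, not code from PS-MT or Dual Teacher: the decay value and the dictionary-of-floats stand-in for model weights are assumptions, and a single EMA step per epoch is used for brevity, whereas real implementations typically apply the EMA at every training iteration.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher weights moved toward the student by an EMA step."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Scalar "weights" stand in for full parameter tensors.
teachers = [{"w": 0.0}, {"w": 0.0}]
student = {"w": 1.0}

for epoch in range(4):
    active = epoch % 2  # the two teachers take turns, one per epoch
    teachers[active] = ema_update(teachers[active], student, decay=0.9)
```

Since each teacher only tracks the student during its own epochs, the two teachers drift apart from each other and from the student, which is how such frameworks try to loosen the teacher-student coupling.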

Proposed Method
We propose the PrevMatch framework, which efficiently expands pseudo-label views by maximizing the utilization of previous models obtained during training. The PrevMatch framework is based on two main ideas. First, to efficiently address the coupling problem, we revisit the utilization of temporal knowledge. Specifically, we save several models at specific epochs during training and utilize their predictions as additional guidance, referred to as previous guidance, which acts as a regularizer in conjunction with standard guidance. Second, we design a highly randomized ensemble strategy to maximize the effectiveness of utilizing the previous guidance. This approach involves selecting a random number of models from those previously saved and ensembling their predictions using randomized weights. These strategies can efficiently provide diverse and reliable pseudo-labels while avoiding the complexities inherent in dual EMA and co-training-based approaches.
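The randomized ensemble step could look roughly like the sketch below. It is written under assumptions and is not the PrevMatch code: the function name, the cap on the number of models, and the way random weights are drawn and normalized are hypothetical. Each saved model's prediction is a list of per-pixel class probabilities.

```python
import random


def random_ensemble(saved_predictions, rng, max_models=3):
    """Ensemble a random subset of saved-model predictions with random weights."""
    # Pick how many previous models to use, then which ones.
    k = rng.randint(1, min(max_models, len(saved_predictions)))
    chosen = rng.sample(saved_predictions, k)

    # Draw random positive weights and normalize them to sum to 1.
    raw = [rng.random() + 1e-8 for _ in range(k)]
    total = sum(raw)
    weights = [w / total for w in raw]

    num_pixels, num_classes = len(chosen[0]), len(chosen[0][0])
    ensembled = [
        [sum(w * pred[p][c] for w, pred in zip(weights, chosen)) for c in range(num_classes)]
        for p in range(num_pixels)
    ]
    # The pseudo-label of each pixel is the argmax of the ensembled probabilities.
    pseudo = [max(range(num_classes), key=probs.__getitem__) for probs in ensembled]
    return ensembled, pseudo, weights


# Predictions of three saved models for two pixels and two classes (made-up values).
saved_predictions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.4, 0.6]],
    [[0.7, 0.3], [0.1, 0.9]],
]
rng = random.Random(0)
ensembled, pseudo, weights = random_ensemble(saved_predictions, rng)
```

Drawing both the subset and the weights at random means the previous guidance changes from call to call, which is what gives the learner diverse pseudo-label views without training extra networks.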

Extensive experiments conducted across various evaluation protocols on the Pascal VOC, Cityscapes, COCO, and ADE20K datasets reveal that the proposed PrevMatch significantly outperforms existing methods.
