University of Zanjan - Mostafa Charmi

Last Update: 2025-1-7

Mahnaz Moghaddam, Mostafa Charmi, Hossein Hassanpoor
Jointly Human Semantic Parsing and Attribute Recognition with Feature Pyramid Structure in EfficientNets

Abstract

Pedestrian attribute recognition is an important problem in computer vision and plays a special role in video surveillance. Previous methods for this problem are mainly based on multi-label, end-to-end deep neural networks. These methods do not use attributes to define local feature regions, and they suffer from problems caused by the presence of the bounding box. This paper proposes a new framework that jointly performs human semantic parsing and pedestrian attribute recognition to achieve effective attribute recognition. By extracting human parts via semantic parsing, the framework exploits both semantic and spatial information while eliminating the background. It also uses multi-scale features to capture rich details and contextual information through the proposed Attribute Recognition-Bidirectional Feature Pyramid Network (AR-BiFPN). Because the baseline network has a significant impact on performance, EfficientNet-B3 is selected from the EfficientNet family as the baseline; it provides an appropriate trade-off among the three CNN scaling factors (depth, width, and resolution). Finally, the proposed framework is evaluated on the PETA, RAP, and PA-100k datasets. Experimental results show that our method achieves superior performance in both mean accuracy (mA) and instance-based metrics compared with state-of-the-art methods.
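The abstract names the evaluation metrics without defining them. As a rough illustration only, the sketch below computes mA and the instance-based metrics using the definitions commonly found in the pedestrian attribute recognition literature; it is an assumption that the paper uses exactly these formulas. The function names `mean_accuracy` and `instance_metrics` are hypothetical, and scoring an empty denominator as 0.0 is a convention chosen here, not one taken from the paper.

```python
def mean_accuracy(y_true, y_pred):
    """Label-based mA: mean over attributes of (TPR + TNR) / 2.

    y_true, y_pred: lists of equal-length 0/1 lists, one row per image.
    """
    num_attrs = len(y_true[0])
    per_attr = []
    for j in range(num_attrs):
        pos = sum(1 for t in y_true if t[j] == 1)
        neg = len(y_true) - pos
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        tpr = tp / pos if pos else 0.0  # assumed convention for empty classes
        tnr = tn / neg if neg else 0.0
        per_attr.append((tpr + tnr) / 2)
    return sum(per_attr) / num_attrs


def instance_metrics(y_true, y_pred):
    """Instance-based accuracy, precision, recall, and F1, averaged per image."""
    n = len(y_true)
    acc = prec = rec = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
        union = sum(1 for a, b in zip(t, p) if a == 1 or b == 1)
        acc += inter / union if union else 0.0
        prec += inter / sum(p) if sum(p) else 0.0
        rec += inter / sum(t) if sum(t) else 0.0
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return acc, prec, rec, f1


# Toy example: two images, two binary attributes.
y_true = [[1, 0], [0, 1]]
y_pred = [[1, 0], [1, 1]]
print(mean_accuracy(y_true, y_pred))
print(instance_metrics(y_true, y_pred))
```

mA weights positive and negative examples of each attribute equally, which matters when attributes are heavily imbalanced, whereas the instance-based metrics measure how consistent the full set of predicted attributes is for each image.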
