
Semantic-Aware Scene Recognition


Official PyTorch implementation of Semantic-Aware Scene Recognition by Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós and Álvaro García-Martín (Elsevier Pattern Recognition).



This paper proposes to improve scene recognition by using object information to focus learning during the training process. The main contributions of the paper are threefold:

  • We propose an end-to-end multi-modal deep learning architecture which gathers both image and context information using a two-branched CNN architecture.
  • We propose to use semantic segmentation as an additional information source to automatically create, through a convolutional neural network, an attention model to reinforce the learning of relevant contextual information.
  • We validate the effectiveness of the proposed method through experimental results on public scene recognition datasets such as ADE20K, MIT Indoor 67, SUN 397 and Places365, obtaining state-of-the-art results.

The proposed CNN architecture is as follows:


State-of-the-art Results

ADE20K Dataset

RGB  Semantic  Top@1  Top@2  Top@5  MCA
Yes  No        55.90  67.25  78.00  20.96
No   Yes       50.60  60.45  72.10  12.17
Yes  Yes       62.55  73.25  82.75  27.00

MIT Indoor 67 Dataset

Method                 Backbone                          Number of Parameters  Top@1
PlaceNet               Places-CNN                        62 M                  68.24
MOP-CNN                CaffeNet                          62 M                  68.90
CNNaug-SVM             OverFeat                          145 M                 69.00
HybridNet              Places-CNN                        62 M                  70.80
URDL + CNNaug          AlexNet                           62 M                  71.90
MPP-FCR2               AlexNet                           62 M                  75.67
DSFL + CNN (7 Scales)  AlexNet                           62 M                  76.23
MPP + DSFL             AlexNet                           62 M                  80.78
CFV                    VGG-19                            143 M                 81.00
CS                     VGG-19                            143 M                 82.24
SDO (1 Scale)          2 x VGG-19                        276 M                 83.98
VSAD                   2 x VGG-19                        276 M                 86.20
SDO (9 Scales)         2 x VGG-19                        276 M                 86.76
Ours                   ResNet-18 + Sem Branch + G-RGB-H  47 M                  85.58
Ours*                  ResNet-50 + Sem Branch + G-RGB-H  85 M                  87.10

SUN 397 Dataset

Method                 Backbone                          Number of Parameters  Top@1
Decaf                  AlexNet                           62 M                  40.94
MOP-CNN                CaffeNet                          62 M                  51.98
HybridNet              Places-CNN                        62 M                  53.86
Places-CNN             Places-CNN                        62 M                  54.23
Places-CNN ft          Places-CNN                        62 M                  56.20
CS                     VGG-19                            143 M                 64.53
SDO (1 Scale)          2 x VGG-19                        276 M                 66.98
VSAD                   2 x VGG-19                        276 M                 73.00
SDO (9 Scales)         2 x VGG-19                        276 M                 73.41
Ours                   ResNet-18 + Sem Branch + G-RGB-H  47 M                  71.25
Ours*                  ResNet-50 + Sem Branch + G-RGB-H  85 M                  74.04

Places 365 Dataset

Network       Number of Parameters  Top@1  Top@2  Top@5  MCA
AlexNet       62 M                  47.45  62.33  78.39  49.15
AlexNet*      62 M                  53.17  -      82.59  -
GoogLeNet*    7 M                   53.63  -      83.88  -
ResNet-18     12 M                  53.05  68.87  83.86  54.40
ResNet-50     25 M                  55.47  70.40  85.36  55.47
ResNet-50*    25 M                  54.74  -      85.08  -
VGG-19*       143 M                 55.24  -      84.91  -
DenseNet-161  29 M                  56.12  71.48  86.12  56.12
Ours          47 M                  56.51  71.57  86.00  56.51



The repository has been tested with the following software versions:

  • Ubuntu 16.04
  • Python 3.6
  • Anaconda 4.6

Clone Repository

Clone the repository by running the following command:

$ git clone

Anaconda Environment

To create and set up the Anaconda environment, run the following terminal commands from the repository folder:

$ conda env create -f Config/Conda_Env.yml
$ conda activate SA-Scene-Recognition


Download and setup instructions for each dataset are provided in the following links:


Model Zoo

To evaluate the models independently, download them from the following links and set their path in the YAML configuration files (usually /Data/Model Zoo/DATASET FOLDER).

[Recommended] Alternatively, run the following script from the repository folder to download all the available Model Zoo models:

bash ./Scripts/


MIT Indoor 67

SUN 397

Places 365

Run Evaluation

To evaluate a model, run the evaluation script from the repository folder, passing the path to the dataset's YAML configuration file:

python --ConfigPath [PATH to configuration file]

Example for ADE20K Dataset:

python --ConfigPath Config/config_ADE20K.yaml

All configuration options (backbone architecture, model to load, batch size, etc.) are set in each dataset's YAML configuration file.
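For readers writing their own wrapper around the evaluation, the snippet below is a minimal sketch of how a script could accept the `--ConfigPath` flag shown above. It is not the repository's code: the function name `parse_args` and the description string are assumptions made for illustration, and reading the YAML file itself (for example with PyYAML) is left out.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; `parse_args` is a hypothetical helper name."""
    parser = argparse.ArgumentParser(description="Evaluate a scene recognition model.")
    # Same flag name as in the commands above; everything else is illustrative.
    parser.add_argument(
        "--ConfigPath",
        type=Path,
        required=True,
        help="Path to the dataset YAML configuration file",
    )
    return parser.parse_args(argv)


# Passing an explicit argument list mirrors the ADE20K example command.
args = parse_args(["--ConfigPath", "Config/config_ADE20K.yaml"])
print(args.ConfigPath.name)
```

Calling `parse_args()` without arguments reads `sys.argv` instead, which is what a script launched from the terminal would do.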

Computed performance metrics for both training and validation sets are:

  • Top@1
  • Top@2
  • Top@5
  • Mean Class Accuracy (MCA)
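To make the metrics above concrete, here is a small framework-free sketch of how Top@k and MCA can be computed from per-class scores. It assumes the usual definitions (Top@k counts a sample as correct when its true label is among the k highest-scoring classes; MCA is the mean of the per-class Top@1 accuracies). The function names and the toy data are illustrative and are not taken from the repository.

```python
from collections import defaultdict


def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Mean of per-class Top@1 accuracies, so every class weighs the same."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda c: row[c])
        total[label] += 1
        correct[label] += predicted == label
    per_class = [correct[c] / total[c] for c in total]
    return 100.0 * sum(per_class) / len(per_class)


# Toy data: 4 samples, 3 classes (class 0 is over-represented).
scores = [
    [0.7, 0.2, 0.1],  # label 0, predicted 0
    [0.5, 0.3, 0.2],  # label 0, predicted 0
    [0.6, 0.3, 0.1],  # label 1, predicted 0 (label 1 is second best)
    [0.1, 0.2, 0.7],  # label 2, predicted 2
]
labels = [0, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
mca = mean_class_accuracy(scores, labels)
print(f"Top@1={top1:.2f} Top@2={top2:.2f} MCA={mca:.2f}")
```

On this toy data Top@1 is 75.00 while MCA is 66.67, because the single sample of class 1 is misclassified and counts as a full class in the average. This is why MCA can sit well below Top@1 on datasets with unbalanced classes.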


If you find this code and work useful, please consider citing:

  title={Semantic-Aware Scene Recognition},
  author={L{\'o}pez-Cifuentes, Alejandro and Escudero-Vi{\~n}olo, Marcos and Besc{\'o}s, Jes{\'u}s and Garc{\'\i}a-Mart{\'\i}n, {\'A}lvaro},
  journal={Pattern Recognition},


This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.
