GeoSay: A geometric saliency for extracting buildings

in remote sensing images

Gui-Song Xia1*, Jin Huang1, Nan Xue1, Qikai Lu2*, Xiaoxiang Zhu3

1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China
2. Electronic Information School, Wuhan University, Wuhan, 430079, China
3. Civil, Geo and Environmental Engineering, Technische Universität München, Germany

- Abstract -

This work addresses the problem of detecting buildings in remote sensing images with very high resolution (VHR). Inspired by the observation that buildings are usually more distinguishable by their geometry than by their texture or spectral properties, we present a new geometric building index (GBI) for accurate building detection, which relies on the geometric saliency of building structures. We derive the geometric saliency of buildings from a mid-level geometric representation, i.e., meaningful junctions that locally describe the anisotropic geometric structures of images. The resulting GBI is measured by integrating the computed geometric saliency bounded in a parallelogram. Experiments on three public datasets demonstrate that, without any training samples, the proposed GBI achieves very promising performance and shows impressive generalization capability.

- Codes -

The code is available here.

- Experiment -

- Methods

We compared our method against the following four kinds of building detection methods to demonstrate its effectiveness:

  1. Learning-based method: Hierarchically fused fully convolutional network (HF-FCN) [4].
  2. Texture-based method: Built-up areas saliency index (BASI) [1].
  3. Morphology-based method: Morphological building index (MBI) [2].
  4. Geometry-based method: Perceptual building index (PBI) [3].

- Datasets

We conducted the experiments on three public datasets:
  1. Spacenet-65 contains 65 RGB images of 2000x2000 pixels at 0.5 m resolution, covering both urban and rural areas. We manually corrected the ground truth of this dataset for the accurate computation of the prior probability.

  2. Potsdam contains 214 orthorectified RGB images of 2000x2000 pixels; the ground-truth resolution is 0.05 m. It covers most of the historical city of Potsdam.

  3. Massachusetts contains 10 images of 1500x1500 pixels from the test set, at 1 m resolution. Due to the low quality of the images, we smoothed them with a 3x3 Gaussian kernel.
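As a sketch of the pre-processing step for the Massachusetts images, the 3x3 Gaussian smoothing could be implemented as below. The exact kernel weights are not stated in the text; a standard binomial approximation of a 3x3 Gaussian and reflect padding at the borders are assumed here.

```python
import numpy as np

# 3x3 Gaussian kernel (binomial approximation; the exact weights
# used by the authors are not stated)
KERNEL = np.array([[1., 2., 1.],
                   [2., 4., 2.],
                   [1., 2., 1.]]) / 16.0

def gaussian_smooth(image):
    """Smooth a single-channel image with the 3x3 kernel above,
    using reflect padding so the output keeps the input shape."""
    padded = np.pad(image.astype(float), 1, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for dy in range(3):
        for dx in range(3):
            # accumulate each shifted copy of the image, weighted by the kernel
            out += KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

For RGB images, the same filter would be applied to each channel independently.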

- Evaluation metrics

To compare detection accuracy, we employed two commonly used metrics: mean average precision (mAP) [5] and F-score [6] (also called F-factor). Since an index map is difficult to assess directly, we applied 100 thresholds in [0, 1] to obtain binary segmentation results and measured performance on them.

For each binary segmentation result, pixels are divided into true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). TP is the number of correctly detected building pixels and FP is the number of wrongly detected building pixels; FN is the number of wrongly detected background pixels (i.e., missed building pixels) and TN is the number of correctly detected background pixels. Precision is the proportion of correctly detected building pixels among all detected building pixels, and recall is the proportion of correctly detected building pixels among all building pixels.

Equation 1. Precision and recall:

    Precision = TP / (TP + FP),    Recall = TP / (TP + FN)
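A minimal sketch of computing precision and recall from a binary segmentation result and its ground truth (function and variable names are our own, not from the authors' code):

```python
import numpy as np

def precision_recall(pred, gt):
    """Compute precision and recall of a binary building mask.

    pred, gt: boolean arrays of the same shape (True = building pixel).
    """
    tp = np.logical_and(pred, gt).sum()    # correctly detected building pixels
    fp = np.logical_and(pred, ~gt).sum()   # wrongly detected building pixels
    fn = np.logical_and(~pred, gt).sum()   # missed building pixels
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall
```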

We plot a precision-recall curve, i.e., precision p(r) as a function of recall r, from the precision and recall scores obtained with the 100 thresholds. The average precision (AP) is the area under the precision-recall curve, and the mean of the AP scores over a dataset is the mean average precision. The F-score is the harmonic mean of precision and recall. For each image, every threshold yields an F-score, and we take the maximal F-score as the final result; the F-score of a dataset is the average over all of its images.

Equation 2. mAP and F-score, where n is the number of images:

    mAP = (1/n) * sum_{i=1}^{n} AP_i,    F-score = 2 * Precision * Recall / (Precision + Recall)
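The threshold sweep, AP, and maximal F-score described above can be sketched as follows. This is a simplified illustration (our own code, not the authors'): AP is computed as the trapezoidal area under the precision-recall curve.

```python
import numpy as np

def ap_and_best_f(index_map, gt, n_thresholds=100):
    """Sweep n_thresholds thresholds over an index map in [0, 1]; return
    the area under the precision-recall curve (AP) and the maximal F-score.

    index_map: float array with values in [0, 1]; gt: boolean array, same shape.
    """
    precisions, recalls, f_scores = [], [], []
    for t in np.linspace(0.0, 1.0, n_thresholds):
        pred = index_map >= t                    # binary segmentation at threshold t
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        p = tp / (tp + fp) if tp + fp > 0 else 1.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    # integrate precision p(r) over recall with the trapezoidal rule
    r_arr, p_arr = np.array(recalls), np.array(precisions)
    order = np.argsort(r_arr)
    r_s, p_s = r_arr[order], p_arr[order]
    ap = float(np.sum(np.diff(r_s) * (p_s[:-1] + p_s[1:]) / 2.0))
    return ap, max(f_scores)
```

The dataset-level mAP and F-score are then the averages of the per-image AP and maximal F-score values.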

- Results

Table 1. mAP and F-score of all methods on the three datasets; the highest score is in bold.

Besides the mAP and F-score in Table 1 above, we also show below all of the original images in the three datasets and the results given by each building detection method.


(Interactive results viewer: for each image in the three datasets, the page shows the original image together with the index maps and segmented results of GBI, BASI, MBI, PanTex, PBI and HF-FCN, along with the per-image precision, recall, F-measure and threshold.)

- Reference -

  1. Z. Shao, Y. Tian, and X. Shen, "BASI: A new index to extract built-up areas from high-resolution remote sensing images by visual attention model" Remote Sensing Letters, vol. 5, no. 4, pp. 305–314, 2014.
  2. X. Huang and L. Zhang, "A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery" Photogramm. Eng. & Rem. Sens., vol. 77, no. 7, pp. 721–732, 2011.
  3. G. Liu, G. Xia, X. Huang, W. Yang, and L. Zhang, "A perception-inspired building index for automatic built-up area detection in high-resolution satellite images" in IGARSS 2013, pp. 3132–3135.
  4. T. Zuo, J. Feng, and X. Chen, "HF-FCN: Hierarchically fused fully convolutional network for robust building extraction" in Asian Conference on Computer Vision, 2016, pp. 291–302.
  5. C. Buckley and E. Voorhees, "Evaluating evaluation measure stability" in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2000, SIGIR ’00, pp. 33–40, ACM.
  6. D. Powers, "Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation" Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.