Learning the Synthesizability of Dynamic Texture Samples


Feng Yang, Gui-Song Xia*, Dengxin Dai, Liangpei Zhang

[pdf] [code coming soon]

-Abstract-

A dynamic texture (DT) refers to a sequence of images that exhibits temporal regularities, and it has many applications in computer vision and graphics. Given an exemplar of a dynamic texture, it is a challenging task to generate new samples of high quality that are perceptually similar to the input exemplar, a task known as example-based dynamic texture synthesis (EDTS). Numerous approaches have been devoted to this problem in the past decades, but none of them can tackle all kinds of dynamic textures equally well. In this paper, we investigate the synthesizability of dynamic texture samples: given a dynamic texture sample, how synthesizable is it by EDTS, and which EDTS method is the most suitable to synthesize it? To this end, we propose to learn regression models that connect dynamic texture samples with synthesizability scores, with the help of a compiled dynamic texture dataset annotated in terms of synthesizability. More precisely, we first define the synthesizability of DT samples and characterize them by a set of spatiotemporal features. Based on these features and an annotated dynamic texture dataset, we then train regression models to predict the synthesizability scores of texture samples, and learn classifiers to select the most suitable EDTS method. We further complete the selection, partition and synthesizability prediction of dynamic texture samples in a hierarchical scheme. We finally apply the learnt synthesizability to detecting synthesizable regions in videos. The experiments demonstrate that our method can effectively learn and predict the synthesizability of DT samples.

1. Motivation

Figure 1. A short summary of our motivations.

Although it is intuitive that some videos are easier to synthesize than others, this intuition has not been quantified in previous work. We investigate the synthesizability of dynamic textures motivated by the following observations:

  1. Is dynamic texture synthesizability predictable? Can we predict the synthesizability score of a particular dynamic texture before feeding it into any exemplar-based synthesis method?
  2. No previous study has tried to quantify individual DTs in terms of how synthesizable they are, and no computer vision system has tried to predict dynamic texture synthesizability.
  3. There is no database of videos calibrated in terms of the degree of synthesizability of each dynamic texture.
  4. It is also imperative to investigate the characterization of dynamic textures by spatiotemporal features related to synthesizability.

2. Dynamic texture synthesizability

We characterize the synthesizability of a dynamic texture sample as the probability that existing synthesis methods will produce a good synthesized result for it, estimated prior to actually synthesizing it. The synthesizability score of a DT sample is thus an index that indicates, in advance, how well the sample would be synthesized by EDTS methods, i.e., how well its underlying dynamic patterns can be reproduced by analysing only the original sample. The quantified synthesizability of several dynamic textures is shown in Fig. 2.

Predicted scores (left to right): 0.78 (TDT), 0.64 (TDT), 0.08 (TDT), 0.70 (SHDT), 0.51 (SHDT), 0.65 (SHDT).

Figure 2. Synthesizability (in [0, 1]) of dynamic texture examples predicted by our method. TDT: time-stationary dynamic texture; SHDT: spatially homogeneous dynamic texture.

To do so, we built a database depicting a variety of dynamic textures and measured the synthesizability of each video over a set of example-based dynamic texture synthesis results. Fig. 3 exhibits two classes of dynamic textures (water and smoke) that span a wide range of synthesizabilities. Each class contains eight videos, sorted from more synthesizable (left) to less synthesizable (right).


Figure 3. Samples of the database used for the dynamic texture synthesizability study. The videos are sorted from more synthesizable (left) to less synthesizable (right). Top: spatially homogeneous dynamic textures (SHDTs); Bottom: time-stationary dynamic textures (TDTs).

3. Learning and predicting synthesizability

We investigate dynamic texture synthesizability as a learnable and predictable property. We collected a dataset of dynamic textures with synthesizability annotations. We represent dynamic textures with a set of spatiotemporal features, including the generic video feature C3D [1] and the dynamic texture descriptor LBP-TOP [2]. We also designed a novel descriptor, SCOP-DT, specific to dynamic textures, which extends the SCOP descriptor [3] by implicitly incorporating temporal cues. We formalise synthesizability prediction as a regression problem, where a regression model is learnt from the collection of annotated data.
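For a flavour of the representation, the sketch below computes an LBP-TOP-style descriptor [2]: uniform local binary pattern histograms on the three orthogonal planes (XY, XT, YT) of a video volume, concatenated into one vector. This is a minimal sketch assuming scikit-image; for brevity it uses a single central slice per plane, whereas the full LBP-TOP aggregates histograms over all slices, and the parameters are common defaults rather than the settings used in the paper.

```python
# Minimal LBP-TOP-style sketch (assumed parameters, not the paper's).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(video, P=8, R=1):
    """video: (T, H, W) grayscale volume -> concatenated LBP histograms."""
    T, H, W = video.shape
    # One central slice per orthogonal plane for brevity; the full
    # descriptor aggregates histograms over all slices of each plane.
    planes = [
        video[T // 2, :, :],   # XY plane: a single frame
        video[:, H // 2, :],   # XT plane: one row over time
        video[:, :, W // 2],   # YT plane: one column over time
    ]
    hists = []
    for plane in planes:
        codes = local_binary_pattern(plane, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2),
                               density=True)
        hists.append(hist)
    return np.concatenate(hists)  # 3 * (P + 2) dimensional descriptor
```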

We predict synthesizability by training regression models (SVM and Random Forest) on each feature and combining the per-feature predictions with weights at the decision level, as shown in Fig. 4. First, we train SVM and RF regression models on every single feature and compare the performance of the two regression models. Second, we choose the optimal feature and regression model among them, and set the combination weights manually. Finally, the decision-level combination of the per-feature predictions yields the output synthesizability score.


Figure 4. Learning and predicting synthesizability for dynamic texture samples by aggregating features to train regression models.
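As a concrete illustration of this pipeline, the sketch below trains one regressor per feature with scikit-learn and fuses the per-feature predictions with manually set weights at the decision level. The feature names, fusion weights, and hyperparameters are placeholders for illustration, not the paper's settings.

```python
# Per-feature regression plus decision-level fusion (illustrative sketch).
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def train_per_feature(features, y):
    """features: {name: (n_samples, dim) array}; y: synthesizability scores.
    For each feature, keep whichever of SVR / RF cross-validates better."""
    models = {}
    for name, X in features.items():
        candidates = [SVR(kernel="rbf"),
                      RandomForestRegressor(n_estimators=200)]
        best = max(candidates,
                   key=lambda m: cross_val_score(m, X, y, cv=3).mean())
        models[name] = best.fit(X, y)
    return models

def predict_fused(models, features, weights):
    """Decision-level fusion: weighted sum of per-feature predictions."""
    score = sum(w * models[name].predict(features[name])
                for name, w in weights.items())
    return np.clip(score, 0.0, 1.0)

# Hypothetical usage with precomputed feature matrices Xc, Xl, Xs:
# models = train_per_feature({"c3d": Xc, "lbp_top": Xl, "scop_dt": Xs}, y)
# pred = predict_fused(models, test_feats,
#                      {"c3d": 0.4, "lbp_top": 0.3, "scop_dt": 0.3})
```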

4. Annotated Dataset

We collected a new dynamic texture dataset and manually annotated it with synthesizability scores. The dataset contains 1729 DT samples: 452 SHDTs and 1277 TDTs. We used six example-based dynamic texture synthesis methods to synthesize the dynamic textures in the dataset. Each dynamic texture is annotated with a synthesizability label according to its synthesized results: good, acceptable, or bad. The best synthesis method for each texture sample is also recorded. See Fig. 5 for examples of such annotations.
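For illustration, one plausible way to organize such annotations and turn the three-way labels into numeric regression targets is shown below; the record fields and the label-to-score mapping are assumptions, not the paper's exact scheme.

```python
# Hypothetical annotation record; the label-to-score mapping is an
# assumption for illustration (good=1.0, acceptable=0.5, bad=0.0).
from dataclasses import dataclass

LABEL_TO_SCORE = {"good": 1.0, "acceptable": 0.5, "bad": 0.0}

@dataclass
class DTSample:
    video_path: str    # path to the DT clip
    dt_type: str       # "SHDT" or "TDT"
    label: str         # "good", "acceptable", or "bad"
    best_method: str   # e.g. "Graphcut Textures", "SN-dynTexton"

    @property
    def score(self) -> float:
        return LABEL_TO_SCORE[self.label]
```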

Annotations (left: exemplar; right: synthesized result): [SHDT, Good, SN-dynTexton]; [SHDT, Acceptable, SN-dynTexton]; [SHDT, Bad, Graphcut Textures]; [TDT, Good, Graphcut Textures]; [TDT, Acceptable, Graphcut Textures]; [TDT, Bad, LDS].

Figure 5. Three SHDT (top) and three TDT (bottom) examples from our dataset with their annotations of spatial and temporal synthesizability, respectively. Left: DT exemplar; right: synthesized result.

5. Experimental Analysis

For the prediction of synthesizability, all three single features (C3D, LBP-TOP, SCOP-DT) and their combination were evaluated on SHDTs and TDTs respectively. 50% of the dataset videos were used for training and the rest for testing; we report results over 100 random training-testing splits in all quantitative experiments. For quantitative evaluation, we performed two retrieval tasks and measured the average precision of synthesizability prediction with individual features and their combination: (1) retrieve videos with "good" synthesizability (>= good); (2) retrieve videos with "good" or "acceptable" synthesizability (>= acceptable).
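This retrieval-style evaluation can be sketched as follows: for each random 50/50 split, train on one half, rank the test videos by predicted score, and compute average precision for the two binarizations. The sketch assumes scikit-learn and the numeric targets from the mapping above (good=1.0, acceptable=0.5, bad=0.0); the regressor and thresholds are illustrative, not the paper's exact protocol.

```python
# Two-level retrieval evaluation over 100 random 50/50 splits (sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score
from sklearn.ensemble import RandomForestRegressor

def evaluate(X, y, n_splits=100, seed=0):
    rng = np.random.RandomState(seed)
    ap_good, ap_acceptable = [], []
    for _ in range(n_splits):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.5, random_state=rng.randint(1 << 31))
        pred = (RandomForestRegressor(n_estimators=200)
                .fit(Xtr, ytr).predict(Xte))
        # (1) retrieve ">= good"; (2) retrieve ">= acceptable".
        ap_good.append(average_precision_score(yte >= 1.0, pred))
        ap_acceptable.append(average_precision_score(yte >= 0.5, pred))
    return np.mean(ap_good), np.mean(ap_acceptable)
```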

5.1 Prediction of synthesizability for SHDTs


Table 1: The average precision (%) of synthesizability prediction for SHDTs with individual features and their combination

Predicted scores and best EDTS methods (left to right):

Row 1: 0.83 (Graphcut Textures), 0.85 (SN-dynTexton), 0.44 (Gatys-DT), 0.19 (Graphcut Textures), 0.10 (SN-dynTexton)

Row 2: 0.68 (Gatys-DT), 0.65 (Graphcut Textures), 0.57 (SN-dynTexton), 0.86 (Graphcut Textures), 0.42 (SN-dynTexton)

Row 3: 0.10 (AR-dynTexton), 0.49 (Gatys-DT), 0.56 (Graphcut Textures), 0.55 (SN-dynTexton), 0.26 (Gatys-DT)


Figure 6. Spatial synthesizability scores of SHDT examples and the best synthesized dynamic textures by EDTS methods. Top: exemplar; bottom: synthesized in space.

5.2 Prediction of synthesizability for TDTs


Table 2: The average precision (%) of synthesizability prediction for TDTs with individual features and their combination

Predicted scores and best EDTS methods (left to right):

Row 1: 0.98 (Graphcut Textures), 0.54 (LDS), 0.23 (LDS), 0.64 (STGConvNet), 0.12 (Graphcut Textures), 0.52 (Graphcut Textures)

Row 2: 0.49 (Graphcut Textures), 0.19 (LDS), 0.94 (Graphcut Textures), 0.69 (Graphcut Textures), 0.25 (STGConvNet), 0.72 (Graphcut Textures)

Row 3: 0.56 (Graphcut Textures), 0.96 (Graphcut Textures), 0.28 (STGConvNet), 0.87 (Graphcut Textures), 0.40 (LDS), 0.53 (Graphcut Textures)

Figure 7. Temporal synthesizability scores of TDT examples and the best synthesized results by EDTS methods. Top: exemplar; bottom: synthesized along time.

5.3 Detection of synthesizable regions

We first use the dynamic texture detection method of Fazekas et al. [7] to detect rough and irregular DT regions in videos. The detected region is then trimmed into a regular shape with good synthesizability in mind. After detection, the synthesizability of subregions within the DT region is computed and compared, and the most synthesizable subregion is suggested.
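A sketch of the subregion selection step is given below: candidate windows are slid over the detected DT region, each crop is scored with the learned predictor, and the best window is suggested. The sliding-window scheme, window size, and the predict_synthesizability function are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical subregion search over a detected DT region (sketch).
import numpy as np

def best_subregion(video, predict_synthesizability, win=128, stride=32):
    """video: (T, H, W[, C]) array; returns (score, (y, x)) of the most
    synthesizable win x win window according to the learned predictor."""
    T, H, W = video.shape[:3]
    best = (-1.0, (0, 0))
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            crop = video[:, y:y + win, x:x + win]
            score = predict_synthesizability(crop)  # learned regressor
            best = max(best, (score, (y, x)))
    return best
```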

Predicted scores of the four examples (VS: video synthesis; RS: region synthesis): VS 0.36 / RS 0.60; VS 0.33 / RS 0.83; VS 0.38 / RS 0.62; VS 0.37 / RS 0.71.

Figure 8. Predicted synthesizability and synthesis results for the original videos and detected DT regions. VS: predicted synthesizability of the entire video; RS: predicted synthesizability of the detected DT region.

-References-

  1. D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3d convolutional networks," in ICCV, 2015, pp. 4489-4497.
  2. G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE TPAMI, vol. 29, no. 6, pp. 915-928, 2007.
  3. G. S. Xia, G. Liu, X. Bai, and L. Zhang, "Texture characterization using shape co-occurrence patterns," IEEE TIP, vol. 26, no. 10, pp. 5005-5018, Oct 2017.
  4. D. Dai, H. Riemenschneider, and L. Van Gool, "The synthesizability of texture examples," in CVPR, 2014, pp. 3027-3034.
  5. G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto, "Dynamic textures," IJCV, vol. 51, no. 2, pp. 91-109, 2003.
  6. G.-S. Xia, S. Ferradans, G. Peyre, and J.-F. Aujol, "Synthesizing and mixing stationary gaussian texture models," SIAM J. Imaging Sciences, vol. 7, no. 1, pp. 476-508, 2014.
  7. S. Fazekas, T. Amiaz, D. Chetverikov, and N. Kiryati, "Dynamic texture detection based on motion analysis," IJCV, vol. 82, no. 1, p. 48, 2009.
  8. G.-S. Xia, J. Delon, and Y. Gousseau, "Shape-based invariant texture indexing," IJCV, vol. 88, no. 3, pp. 382-403, 2010.
  9. P. Saisan, G. Doretto, Y. N. Wu, and S. Soatto, "Dynamic texture recognition," in CVPR, vol. 2. IEEE, 2001, pp. II-58.

* Gui-Song Xia is the corresponding author.