Filters, random fields, and maximum entropy model

From Wikipedia, the free encyclopedia

In physics and probability theory, the filters, random fields, and maximum entropy (FRAME) model[1][2] is a Markov random field model (or a Gibbs distribution) of stationary spatial processes, in which the energy function is a sum of translation-invariant potential functions, each a one-dimensional non-linear transformation of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns, such as grass, tree leaves, brick walls, and water waves. The model is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters (such as Gabor filters or Gabor wavelets), where for each filter, tuned to a specific scale and orientation, the marginal histogram is pooled over all pixels in the image domain. The FRAME model has also been proved equivalent to a micro-canonical ensemble,[3] which was named the Julesz ensemble. A Gibbs sampler[4] is used to synthesize texture images by drawing samples from the FRAME model.
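In a common notation (the symbols here are illustrative, following the description above), the FRAME density over an image I defined on a lattice D can be written as

```latex
p(\mathbf{I}) \;=\; \frac{1}{Z}\,
\exp\Bigl\{-\sum_{k=1}^{K}\,\sum_{x \in D}
\lambda_k\bigl((F_k * \mathbf{I})(x)\bigr)\Bigr\},
```

where \(F_k * \mathbf{I}\) is the response of the linear filter \(F_k\), the \(\lambda_k\) are the one-dimensional potential functions, and \(Z\) is the normalizing constant. The \(\lambda_k\) act as Lagrange multipliers of the maximum entropy problem: they are fitted so that the marginal histograms of filter responses under the model match the histograms pooled from the observed texture.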

The original FRAME model is homogeneous and designed for texture modeling. Xie et al. proposed the sparse FRAME model,[5][6] an inhomogeneous generalization of the original FRAME model for modeling object patterns, such as animal bodies and faces. It is a non-stationary Markov random field model that reproduces the observed statistical properties of filter responses at a subset of selected locations, scales, and orientations. The sparse FRAME model can be regarded as a deformable template.
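The selection of that subset can be illustrated with a simplified, hypothetical sketch: greedily pick the (filter, location) pairs whose responses are largest on average over the training images, with local inhibition so that nearby duplicates of the same filter are not chosen twice. (The actual papers use a shared matching-pursuit / generative-boosting criterion; the function and parameter names below are invented for illustration.)

```python
import numpy as np

def select_sparse_filters(response_maps, n_select, inhibit=2):
    """Toy stand-in for the sparse selection step of sparse FRAME.

    Greedily chooses (filter, row, col) triples with the largest
    average absolute response across training images, zeroing out a
    small window around each pick (local inhibition).

    response_maps: array of shape (n_images, n_filters, H, W)
    """
    # average absolute response over training images: (n_filters, H, W)
    score = np.abs(response_maps).mean(axis=0)
    selected = []
    for _ in range(n_select):
        k, r, c = np.unravel_index(np.argmax(score), score.shape)
        selected.append((int(k), int(r), int(c)))
        # inhibition: suppress nearby locations of the same filter
        score[k,
              max(0, r - inhibit):r + inhibit + 1,
              max(0, c - inhibit):c + inhibit + 1] = 0.0
    return selected

# toy usage: random "responses" from 5 images, 4 filters, 16x16 maps
rng = np.random.default_rng(0)
maps = rng.normal(size=(5, 4, 16, 16))
picked = select_sparse_filters(maps, n_select=10)
```

Only the responses at the selected triples then enter the energy function, which is what makes the resulting random field non-stationary.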

The deep FRAME model[7][8] is a deep generalization of the original FRAME model. Instead of using linear filters as in the original FRAME model, Lu et al. use the filters at a given convolutional layer of a pre-learned ConvNet.[7] Rather than relying on the pre-trained filters of an existing ConvNet, Xie et al. parameterize the energy function of the FRAME model by a ConvNet structure and learn all parameters from scratch.[8] The deep FRAME model is the first framework to integrate modern deep neural networks from deep learning with the Gibbs distribution from statistical physics. Deep FRAME models have been further generalized to model video patterns[9][10] and 3D volumetric shape patterns.[11]
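In the generative ConvNet formulation of Xie et al.,[8] the model can be written as an exponential tilting of a reference distribution (notation illustrative):

```latex
p(\mathbf{I};\theta) \;=\; \frac{1}{Z(\theta)}\,
\exp\{f_\theta(\mathbf{I})\}\, q(\mathbf{I}),
```

where \(f_\theta\) is a scoring function defined by a ConvNet with parameters \(\theta\), \(q\) is a reference measure (e.g., Gaussian white noise), and \(Z(\theta)\) is the normalizing constant. Setting \(f_\theta\) to the negated sum of potentials of linear filter responses recovers the original FRAME form, which is what makes this a generalization rather than a different model class.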

References

  1. ^ Zhu, Song-Chun; Wu, Ying Nian; Mumford, David (1998). "Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling". International Journal of Computer Vision. 27 (2): 107–126.
  2. ^ Zhu, Song Chun; Wu, Ying Nian; Mumford, David (November 1997). "Minimax Entropy Principle and Its Application to Texture Modeling". Neural Computation. 9 (8): 1627–1660. doi:10.1162/neco.1997.9.8.1627. ISSN 0899-7667. S2CID 15926.
  3. ^ Ying Nian Wu; Song Chun Zhu; Xiuwen Liu (1999). "Equivalence of Julesz and Gibbs texture ensembles". Proceedings of the Seventh IEEE International Conference on Computer Vision. IEEE. pp. 1025–1032 vol.2. doi:10.1109/iccv.1999.790382. ISBN 0-7695-0164-8. S2CID 7550898.
  4. ^ Smith, Grahame B. (1987), "Stuart Geman and Donald Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images";", Readings in Computer Vision, Elsevier, pp. 562–563, doi:10.1016/b978-0-08-051581-6.50056-8, ISBN 978-0-08-051581-6
  5. ^ Xie, Jianwen; Hu, Wenze; Zhu, Song-Chun; Wu, Ying Nian (2014-10-02). "Learning Sparse FRAME Models for Natural Image Patterns". International Journal of Computer Vision. 114 (2–3): 91–112. CiteSeerX 10.1.1.434.7360. doi:10.1007/s11263-014-0757-x. ISSN 0920-5691. S2CID 8742525.
  6. ^ Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (July 2016). "Inducing wavelets into random fields via generative boosting". Applied and Computational Harmonic Analysis. 41 (1): 4–25. doi:10.1016/j.acha.2015.08.004. ISSN 1063-5203. S2CID 521731.
  7. ^ a b Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "Learning FRAME Models Using CNN Filters". Proceedings of the AAAI Conference on Artificial Intelligence. 30. arXiv:1509.08379. doi:10.1609/aaai.v30i1.10238. S2CID 2387309.
  8. ^ a b Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "A theory of generative ConvNet". International Conference on Machine Learning. arXiv:1602.03264.
  9. ^ Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (July 2017). "Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet". 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1061–1069. arXiv:1606.00972. doi:10.1109/cvpr.2017.119. ISBN 978-1-5386-0457-1. S2CID 763074.
  10. ^ Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (2019). "Learning energy-based spatial-temporal generative ConvNet for dynamic patterns". IEEE Transactions on Pattern Analysis and Machine Intelligence. 43 (2): 516–531. arXiv:1909.11975. doi:10.1109/TPAMI.2019.2934852. PMID 31425020. S2CID 201098397.
  11. ^ Xie, Jianwen; Zheng, Zilong; Gao, Ruiqi; Wang, Wenguan; Zhu, Song-Chun; Wu, Ying Nian (June 2018). "Learning Descriptor Networks for 3D Shape Synthesis and Analysis". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 8629–8638. arXiv:1804.00586. doi:10.1109/cvpr.2018.00900. ISBN 978-1-5386-6420-9. S2CID 4564025.