research-article

VNect: real-time 3D human pose estimation with a single RGB camera

Published: 20 July 2017
  Abstract

    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new CNN-based (convolutional neural network) pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; until now, the only monocular methods for such applications used specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions: it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
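The abstract states that the network regresses 2D and 3D joint positions jointly with a fully-convolutional formulation, but it does not say how a 3D coordinate is obtained from such an output. The Python sketch below shows one plausible readout, under the assumption (not stated on this page) that each joint has a 2D heatmap plus three aligned per-pixel maps holding its x, y and z coordinates, and that the joint's 3D position is read at the pixel where its heatmap peaks. The names `argmax_2d`, `read_pose`, `loc_x`, `loc_y` and `loc_z` are hypothetical, and the kinematic skeleton fitting stage is not modeled.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one 2D map, indexed as grid[row][col]


def argmax_2d(heatmap: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest response in a single heatmap."""
    best_r, best_c = 0, 0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


def read_pose(
    heatmaps: List[Grid],
    loc_x: List[Grid],
    loc_y: List[Grid],
    loc_z: List[Grid],
) -> Tuple[List[Tuple[int, int]], List[Tuple[float, float, float]]]:
    """Read per-joint 2D and 3D positions from aligned output maps.

    Hypothetical layout: heatmaps[j] scores where joint j appears in the
    image, and loc_x[j], loc_y[j], loc_z[j] store that joint's 3D
    coordinates at every pixel. The 3D position is looked up at the
    pixel where the joint's heatmap peaks.
    """
    joints_2d: List[Tuple[int, int]] = []
    joints_3d: List[Tuple[float, float, float]] = []
    for j, heatmap in enumerate(heatmaps):
        r, c = argmax_2d(heatmap)
        joints_2d.append((c, r))  # (x, y) in map pixel coordinates
        joints_3d.append((loc_x[j][r][c], loc_y[j][r][c], loc_z[j][r][c]))
    return joints_2d, joints_3d


if __name__ == "__main__":
    # One joint on a 3x3 map; its heatmap peaks at row 1, column 2.
    heat = [[0.0, 0.1, 0.0],
            [0.0, 0.2, 0.9],
            [0.0, 0.0, 0.1]]
    lx = [[0.0] * 3, [0.0, 0.0, 0.25], [0.0] * 3]
    ly = [[0.0] * 3, [0.0, 0.0, -0.5], [0.0] * 3]
    lz = [[0.0] * 3, [0.0, 0.0, 1.0], [0.0] * 3]
    p2d, p3d = read_pose([heat], [lx], [ly], [lz])
    print(p2d, p3d)  # [(2, 1)] [(0.25, -0.5, 1.0)]
```

Because the lookup happens at whatever pixel the heatmap peaks, a readout of this kind does not assume the person sits at a fixed position in a tightly cropped frame, which is one way to interpret the abstract's claim about cropping.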

    Supplementary Material

    ZIP File (a44-mehta.zip)
    Supplemental files.
    MP4 File (papers-0079.mp4)


    Cited By

    • (2024) HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation. Sensors 24, 3 (829). DOI: 10.3390/s24030829. Online publication date: 26-Jan-2024.
    • (2024) Gestures recognition based on multimodal fusion by using 3D CNNs. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 46, 1 (1647-1661). DOI: 10.3233/JIFS-234791. Online publication date: 1-Jan-2024.
    • (2024) Human Image Generation: A Comprehensive Survey. ACM Computing Surveys 56, 11 (1-39). DOI: 10.1145/3665869. Online publication date: 28-Jun-2024.


      Information & Contributors

      Information

      Published In

      ACM Transactions on Graphics  Volume 36, Issue 4
      August 2017
      2155 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/3072959
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published in TOG Volume 36, Issue 4


      Author Tags

      1. body pose
      2. monocular
      3. real time


      Article Metrics

      • Downloads (last 12 months): 642
      • Downloads (last 6 weeks): 80
