TY - JOUR
T1 - Imitation Learning-Based Algorithm for Drone Cinematography System
AU - Dang, Yuanjie
AU - Huang, Chong
AU - Chen, Peng
AU - Liang, Ronghua
AU - Yang, Xin
AU - Cheng, Kwang-Ting
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Viewpoint selection for capturing human motion is an important task in autonomous aerial videography, animation, and virtual 3-D environments. Existing methods rely on heuristics to select the 'best' viewpoint, which requires human effort to summarize viewpoint selection rules and integrate them into a visual servo system that controls the camera. In this work, we propose an integrated aerial filming system that autonomously captures cinematic shots of action scenes from a set of demonstrations given for imitation. Our model, built on the deep deterministic policy gradient, takes a sequence of a subject's skeleton poses and the camera pose as input and outputs the camera motion toward an optimal viewpoint relative to the subject. In addition, we design a spatial attention network that selectively focuses on the discriminative joints of the skeleton within each frame. Given demonstrations of human motions, our framework learns to predict the next best viewpoint by imitating how the demonstrations view the subject's motion. Extensive experimental results in simulated and real outdoor environments demonstrate that our method successfully mimics the viewpoint selection strategy and captures more accurate viewpoints than state-of-the-art autonomous cinematography methods.
AB - Viewpoint selection for capturing human motion is an important task in autonomous aerial videography, animation, and virtual 3-D environments. Existing methods rely on heuristics to select the 'best' viewpoint, which requires human effort to summarize viewpoint selection rules and integrate them into a visual servo system that controls the camera. In this work, we propose an integrated aerial filming system that autonomously captures cinematic shots of action scenes from a set of demonstrations given for imitation. Our model, built on the deep deterministic policy gradient, takes a sequence of a subject's skeleton poses and the camera pose as input and outputs the camera motion toward an optimal viewpoint relative to the subject. In addition, we design a spatial attention network that selectively focuses on the discriminative joints of the skeleton within each frame. Given demonstrations of human motions, our framework learns to predict the next best viewpoint by imitating how the demonstrations view the subject's motion. Extensive experimental results in simulated and real outdoor environments demonstrate that our method successfully mimics the viewpoint selection strategy and captures more accurate viewpoints than state-of-the-art autonomous cinematography methods.
KW - Cinematography system
KW - imitation filming
KW - unmanned aerial vehicles
KW - viewpoint control
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000809402600017
UR - https://openalex.org/W3113355707
UR - https://www.scopus.com/pages/publications/85097939374
U2 - 10.1109/TCDS.2020.3043441
DO - 10.1109/TCDS.2020.3043441
M3 - Journal Article
SN - 2379-8920
VL - 14
SP - 403
EP - 413
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
IS - 2
ER -