This course focuses on deep learning models and their applications in computer vision and multimodal data. Topics include various deep neural network architectures with applications in computer vision, natural language processing, and graph data. Some of the latest deep learning models will be introduced, including diffusion models, transformers, graph neural networks, and normalizing flows. Students will have the opportunity to implement deep learning models for practical tasks such as visual perception, generative AI, graph processing, and 3D vision.