This course explores cutting-edge topics in vision-language models (VLMs) and AI-generated content (AIGC). The first half focuses on VLMs and their foundations, covering models such as LLaVA, InternVL, and Qwen, as well as more advanced architectures such as mixture-of-experts and retrieval-augmented models. Topics include reasoning, few-shot learning, in-context learning, chain-of-thought prompting, and associated challenges in scalability and ethics. Beyond image-level understanding, this part also covers VLMs for video-level understanding, e.g., event/action captioning and localization. The second half shifts focus to AIGC, exploring how generative models are revolutionizing content creation across text, image, video, and 3D. We will analyze their technical mechanisms, including model architectures and system design, as well as broader societal implications such as ethical risks and creative disruption. Throughout the course, students will read and present key papers, lead discussions, and complete a hands-on research project. This advanced graduate course requires prior knowledge of machine learning, NLP, and deep learning architectures such as Transformers and diffusion models.