AI Security (CSED490H)
A practical red-teaming course covering modern AI systems, attack methods, and optimization tools for AI security research.
Objective
As AI systems advance and are deployed at scale, safety and security concerns are growing rapidly. In this course, we learn the art of attacking AI systems, along with the core concepts and tools of modern AI.
In particular, we study two core axes: victim models (e.g., LLMs, VLAs, and agentic AI) and attack methods (e.g., adversarial examples and jailbreaking), together with optimization tools such as gradient descent, policy optimization, and prompt tuning with LoRA.
By the end of this class, students will have a strong understanding of current AI model families, a broad range of AI red-teaming methods, and the practical AI tooling required for security research and engineering.
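To give a flavor of the gradient-based attacks covered in the course, here is a minimal FGSM-style adversarial-example sketch against a toy logistic-regression "victim model". All weights, inputs, and the model itself are illustrative assumptions, not course material.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy victim model: p(y=1 | x) = sigmoid(w.x + b)  (weights chosen arbitrarily)
w = np.array([2.0, -1.0])
b = 0.0

x = np.array([1.0, 1.0])  # clean input, confidently classified as y=1
y = 1.0                   # true label

# Gradient of the cross-entropy loss w.r.t. the *input* x:
# dL/dx = (sigmoid(w.x + b) - y) * w
grad_x = (sigmoid(w @ x + b) - y) * w

# FGSM step: perturb the input in the direction that increases the loss
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

p_clean = sigmoid(w @ x + b)   # ~0.73 on the clean input
p_adv = sigmoid(w @ x_adv + b) # ~0.38 on the perturbed input
print(f"clean p(y=1) = {p_clean:.3f}, adversarial p(y=1) = {p_adv:.3f}")
```

With a small, bounded perturbation, the model's confidence in the correct label drops enough to flip the prediction. The same idea, scaled up with automatic differentiation, underlies the adversarial-example attacks discussed in weeks 2 and onward.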
Course Snapshot
Course Staff
Instructor
Prof. Sangdon Park
Assistant Professor
Graduate School of Artificial Intelligence (GSAI)
Department of Computer Science and Engineering (CSE)
POSTECH
Teaching Assistant
Sechan Lee
Teaching Assistant for CSED490H. Supports course operations and student discussions, and provides technical guidance on assignments and project milestones.
Email: chan1031@postech.ac.kr
Schedule
| Week | Topics |
|---|---|
| 1 | Introduction to AI Security; Course Logistics |
| 2 | Preliminary: Neural Networks / SGD; Inference-time Attacks: Adversarial Examples / Adversarial Patches / Transfer Attacks |
| 3 | Preliminary: Transformers / LLMs / LCMs / LRMs; Preliminary: RAG |
| 4 | Student Presentation and Discussion on HW 1 |
| 5 | Preliminary: Diffusion Models; Preliminary: Vision-Language-Action Models |
| 6 | Preliminary: Optimization for Whitebox Victim Models -- Prompt Tuning Methods (e.g., LoRA); Preliminary: Optimization for Blackbox Victim Models -- Zeroth-Order Optimization |
| 7 | Preliminary: Optimization for Blackbox Victim Models -- RL / Policy Optimization; Inference-time Attacks: Prompt Leaking, Prompt Injection, Jailbreaking |
| 8 | Preliminary: Agentic AI / Tool-calling Agents; Inference-time Attacks: Current Trends in Red Teaming |
| 9 | Student Presentation and Discussion on HW 2 |
| 10 | Introduction to OpenClaw |
| 11 | Training-set Attacks: Membership Inference Attacks; Training-set Attacks: Data Poisoning Attacks |
| 12 | Model Attacks: Model Extraction Attacks |
| 13 | Final Remarks: Overview of Defense Methods |
| 14 | Student Presentation and Discussion on Final Projects |
| 15 | Student Presentation and Discussion on Final Projects |