AI Security (CSED490H)

A practical red-teaming course covering modern AI systems, attack methods, and optimization tools for AI security research.

Objective

As AI advances and is deployed at scale, safety and security concerns are rapidly emerging. In this course, we study the art of attacking AI systems alongside the core concepts and tools of modern AI.

In particular, we study two core axes: victim models (e.g., LLMs, VLAs, and Agentic AI) and attack methods (e.g., adversarial examples and jailbreaking), along with optimization tools such as gradient descent, policy optimization, and prompt tuning with LoRA.
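As a small taste of the attack methods above, an adversarial example can be crafted by taking a gradient step on the *input* rather than the model weights. The sketch below, on a toy logistic-regression "victim" (all weights and data are made up for illustration), follows the sign-of-gradient (FGSM-style) recipe:

```python
# Adversarial-example sketch (FGSM-style) against a toy linear classifier.
# The victim model, weights, and input below are illustrative, not from the course.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Victim: logistic regression, p(y=1|x) = sigmoid(w.x + b)
w = np.array([2.0, -1.0])
b = 0.0

x = np.array([1.0, 1.0])  # clean input, true label y = 1
y = 1.0

# Gradient of the cross-entropy loss w.r.t. the INPUT (not the weights):
# dL/dx = (sigmoid(w.x + b) - y) * w
grad_x = (sigmoid(w @ x + b) - y) * w

# FGSM step: perturb the input in the direction that increases the loss.
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

print(sigmoid(w @ x + b))      # confidence on the clean input
print(sigmoid(w @ x_adv + b))  # confidence drops on the adversarial input
```

The same idea scales to deep networks, where the input gradient is obtained by backpropagation through a frozen model.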

By the end of this class, students will have a solid understanding of current AI model families, a broad range of AI red-teaming methods, and the practical AI tooling required for security research and engineering.

Course Snapshot

Credits: 3-0-3
Time / Location: Thu 14:00–15:15 / Building 2 - 109
Prerequisite: Artificial Intelligence
Assessment: 80% Projects · 20% Participation

Course Staff

Instructor

Prof. Sangdon Park

Assistant Professor
Graduate School of Artificial Intelligence (GSAI)
Department of Computer Science and Engineering (CSE)
POSTECH

Teaching Assistant

Sechan Lee

Teaching Assistant for CSED490H. Supporting course operations, student discussions, and technical guidance on assignments and project milestones.

Email: chan1031@postech.ac.kr

Schedule

Week 1: Introduction to AI Security · Course Logistics
Week 2: Preliminary: Neural Networks / SGD · Inference-time Attacks: Adversarial Examples / Adversarial Patches / Transfer Attacks
Week 3: Preliminary: Transformers / LLMs / LCMs / LRMs · Preliminary: RAG
Week 4: Student Presentation and Discussion on HW 1
Week 5: Preliminary: Diffusion Models · Preliminary: Vision-Language-Action Models
Week 6: Preliminary: Optimization for Whitebox Victim Models (Prompt Tuning Methods, e.g., LoRA) · Preliminary: Optimization for Blackbox Victim Models (Zeroth-Order Optimization)
Week 7: Preliminary: Optimization for Blackbox Victim Models (RL / Policy Optimization) · Inference-time Attacks: Prompt Leaking, Prompt Injection, Jailbreaking
Week 8: Preliminary: Agentic AI / Tool-Calling Agents · Inference-time Attacks: Current Trends in Red Teaming
Week 9: Student Presentation and Discussion on HW 2
Week 10: Introduction to OpenClaw
Week 11: Training-set Attacks: Membership Inference Attacks · Training-set Attacks: Data Poisoning Attacks
Week 12: Model Attacks: Model Extraction Attacks
Week 13: Final Remarks: Overview of Defense Methods
Weeks 14–15: Student Presentation and Discussion on Final Projects
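To give a flavor of the black-box optimization tools covered in the schedule, zeroth-order optimization estimates gradients from loss *queries* alone, with no access to the victim model's internals. A minimal sketch, with an illustrative stand-in objective (in practice this would be a victim model's loss):

```python
# Zeroth-order (black-box) gradient estimation via two-point finite differences.
# The objective f below is a made-up stand-in for a victim model's loss.
import numpy as np

def f(x):
    # Stand-in black-box objective; we may only query it, not differentiate it.
    return np.sum((x - 3.0) ** 2)

def zo_gradient(f, x, mu=1e-4):
    # Coordinate-wise estimate: g_i ≈ (f(x + mu*e_i) - f(x - mu*e_i)) / (2*mu)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)
    return g

# Plain gradient descent, using only the estimated gradients.
x = np.zeros(3)
for _ in range(100):
    x -= 0.1 * zo_gradient(f, x)
print(x)  # approaches the minimizer [3, 3, 3]
```

The same query-only principle underlies black-box attacks on deployed models, where each "query" is an API call to the victim.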