Hi! I am a postdoc at UC Berkeley, where I work with Pieter Abbeel on developing intelligent agents
that can interact with the physical world.
Previously, I was a Research Scientist at the Dyson Robot Learning Lab, working with Stephen James.
I received my PhD from KAIST, advised by Jinwoo Shin.
During my PhD, I was a visiting scholar at UC Berkeley working with Pieter Abbeel and Kimin Lee, collaborated with Honglak Lee at the University of Michigan, and interned at
Microsoft Research Asia.
I am on the academic job market this year! 🎓🌟
Feel free to send me an e-mail if you want to have a chat! Contact: mail AT younggyo.me
I aim to develop an intelligent agent that can understand how the world works and improve itself
through online experiences. To this end, I am interested in studying representation learning to
extract representations from observations through which the agent perceives the world, world models with which the agent understands the
consequences of its own actions, and reinforcement
learning to enable the agent to actively look for new data to improve itself over
time. Through real-world experiments with
robots in collaboration with expert roboticists, I
have demonstrated the effectiveness of my algorithms in developing AI systems for real-world setups.
We introduce CoordTok, a scalable video tokenizer that learns a mapping from coordinate-based
representations to the corresponding patches of input videos.
We present Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a value-based RL
algorithm that trains a critic network to output Q-values over a sequence of actions.
We present Coarse-to-fine Reinforcement Learning (CRL), which trains RL agents to zoom into the
continuous action space in a coarse-to-fine manner. Within this framework, we present
Coarse-to-fine Q-Network (CQN), a
value-based RL algorithm for continuous control.
We present BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven
robotic manipulation. BiGym consists of 40 diverse tasks in home environments and provides
human-collected demonstrations.
We present RSP, a framework for visual representation learning from videos that learns
representations capturing temporal information between frames by training a stochastic future
frame prediction model.
We introduce Render and Diffuse (R&D), a method that unifies low-level robot actions and RGB
observations within the image space using virtual renders of
the 3D model of the robot.
We introduce a new exploration technique that maximizes value-conditional state entropy, which
takes into account the value estimates of states for computing the intrinsic bonus.
We introduce RE3, an exploration technique for visual RL that utilizes a k-NN state entropy
estimate in the representation space of a randomly initialized & fixed CNN encoder.
We introduce T-MCL, which learns a multi-headed dynamics model in which each prediction head is
specialized in certain environments with similar dynamics, i.e., it clusters environments.
We introduce CaDM, which learns a context encoder to extract contextual information from
recent history with self-supervised losses of future and backward prediction.
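
As a rough illustration of the k-NN state entropy idea mentioned in the RE3 summary above, here is a minimal NumPy sketch. It is not the paper's implementation: the frozen random linear map stands in for the randomly initialized, fixed CNN encoder, and the bonus form log(1 + distance to the k-th nearest neighbor), the function names, and all sizes are assumptions chosen for illustration.

```python
import numpy as np


def knn_state_entropy_bonus(features, k=3):
    """Per-sample bonus: log(1 + distance to the k-th nearest neighbor).

    Assumed form of a k-NN entropy-style bonus; requires len(features) > k.
    """
    # Pairwise Euclidean distances between all feature vectors, shape (N, N).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # After sorting each row, column 0 is the self-distance (0),
    # so column k holds the distance to the k-th nearest neighbor.
    knn_dists = np.sort(dists, axis=1)[:, k]
    return np.log1p(knn_dists)


# Hypothetical stand-in for a randomly initialized, fixed encoder:
# a frozen random linear projection that is never trained.
rng = np.random.default_rng(0)
obs_dim, feat_dim = 64, 8
W = rng.normal(size=(obs_dim, feat_dim)) / np.sqrt(obs_dim)


def encode(obs):
    """Map flattened observations of shape (N, obs_dim) to (N, feat_dim)."""
    return obs @ W


# Usage: a batch of flattened observations gets one bonus value per sample.
batch = rng.normal(size=(16, obs_dim))
bonus = knn_state_entropy_bonus(encode(batch), k=3)
```

Under these assumptions, samples whose features lie far from their neighbors receive a larger bonus, which is what nudges the agent toward rarely visited states.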