
          1. 2019-05-31 | Yi Wu: On Building Generalizable Learning Agents



            Despite the recent practical successes of deep reinforcement learning (DRL), one critical issue for existing DRL work is generalization. A learned neural policy can be extremely specialized to its training scenarios and can easily fail when the agent is tested in a scenario only slightly different from the training ones. In contrast, humans can easily adapt their learned skills to unseen situations without further training. This generalization challenge indicates a fundamental gap on the way to our ultimate goal of building agents with artificial general intelligence (AGI).


            This talk presents progress on this challenge. We found that one solution is to equip learning agents with long-term planning abilities. We first describe why an agent with a simple feed-forward policy fails to generalize well even on simple tasks, and then propose plannable policy representations for both fully observable and partially observable settings. We show empirically that simply augmenting conventional neural agents with our proposed planning modules significantly improves generalization performance on a variety of tasks, including real-world applications in both the language and vision domains.






            Yi Wu is currently a researcher at OpenAI Inc. and will join the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, as an assistant professor in fall 2020. He recently earned his PhD from UC Berkeley under the supervision of Prof. Stuart Russell.


            His research focuses on improving the generalization ability of AI systems. He is broadly interested in a variety of topics in AI, including deep reinforcement learning, natural language processing, and probabilistic programming. His work, Value Iteration Networks, won the Best Paper Award at NIPS 2016.