Roger Creus Castanyer

creus99@protonmail.com

PhD student @ Mila / UdeM


About me

I have been a PhD student at Mila Québec & University of Montréal since Fall 2024, working with Professors Pablo Samuel Castro and Glen Berseth. I obtained my MSc from Mila Québec & University of Montréal in Montréal, Canada, and my BSc in Data Science and Engineering from Universitat Politècnica de Catalunya (UPC) in Barcelona, Spain.

From Barcelona, Spain
Currently in Montreal, Canada


Research

My research aims to develop autonomous agents that can operate in open-ended environments without human supervision. I am particularly interested in the intersection of Deep Reinforcement Learning (e.g. exploration, representation learning) and Foundation Models (e.g. LLMs, VLMs). I aim to bridge the gap between the high-level reasoning capabilities of Foundation Models and the low-level control capabilities of Reinforcement Learning agents, enabling agents to learn and discover skills in a task-agnostic manner.

Interests

  • Deep Reinforcement Learning
  • Unsupervised Reinforcement Learning and Intrinsic Exploration
  • Foundation Models
  • Game AI

Experience

  • Research Intern @ Ubisoft LaForge (Montreal, Canada)
  • Junior Data Scientist @ HP Inc (Barcelona, Spain)
  • Teaching Assistant @ UPC School (Barcelona, Spain)
  • Research Intern @ UPC (Barcelona, Spain)
  • Basketball Coach @ Sagrada Familia Claror (Barcelona, Spain)

Publications

Improving Intrinsic Exploration by Creating Stationary Objectives

Roger Creus Castanyer, Joshua Romoff, Glen Berseth

Published at ICLR 2024

Accepted at Agent Learning in Open Endedness workshop @ NeurIPS 2023.


We identify that any intrinsic reward function derived from count-based methods is non-stationary and hence induces a difficult objective for the agent to optimize. The key contribution of our work lies in transforming the original non-stationary rewards into stationary rewards through an augmented state representation. We introduce the Stationary Objectives For Exploration (SOFE) framework. Our experiments show that SOFE improves the agents' performance in challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.
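
To give a flavor of the construction, here is a minimal, hypothetical sketch (not the paper's code) for a small discrete environment with a gym-style API: the observation is augmented with the visitation counts, so the count-based bonus becomes a stationary function of the agent's input.

```python
import numpy as np

# Illustrative sketch of the core idea: a count-based bonus 1/sqrt(N(s))
# is non-stationary because N(s) changes during training. Augmenting the
# state with the current counts makes the reward a fixed (stationary)
# function of the augmented observation. Wrapper and names are hypothetical.

class CountBonusEnvWrapper:
    def __init__(self, env, n_states):
        self.env = env
        self.counts = np.zeros(n_states)

    def reset(self):
        s = self.env.reset()
        return self._augment(s)

    def step(self, action):
        s, r_ext, done, info = self.env.step(action)
        self.counts[s] += 1
        r_int = 1.0 / np.sqrt(self.counts[s])   # count-based exploration bonus
        return self._augment(s), r_ext + r_int, done, info

    def _augment(self, s):
        # Augmented observation: (state, normalized visitation counts).
        norm_counts = self.counts / max(1.0, self.counts.sum())
        return np.concatenate(([s], norm_counts))
```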


Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Adriana Hugessen*, Roger Creus Castanyer*, Faisal Mohamed*, Glen Berseth

Published at RLC 2024

(Oral Presentation) Accepted at Intrinsically Motivated Open-ended Learning workshop @ NeurIPS 2023.


Both surprise-minimizing and surprise-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method can perform well across all entropy regimes. In an effort to find a single surprise-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective depending on the entropy conditions it faces, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit which captures the ability of the agent to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes.
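
As a rough illustration of the bandit framing (a simplified, hypothetical sketch rather than our implementation), a UCB-style chooser over the two surprise objectives might look like this:

```python
import numpy as np

# Two-armed bandit over surprise objectives: arm 0 = minimize surprise,
# arm 1 = maximize surprise. The feedback rewards the arm that moved the
# observed entropy away from a random-policy baseline, i.e. that
# demonstrated control over the environment's entropy. All names are
# illustrative assumptions.

class SurpriseObjectiveBandit:
    def __init__(self, c=2.0):
        self.counts = np.zeros(2)
        self.values = np.zeros(2)
        self.c = c

    def select_arm(self):
        # UCB1: try each arm once, then trade off value against uncertainty.
        for arm in range(2):
            if self.counts[arm] == 0:
                return arm
        t = self.counts.sum()
        ucb = self.values + self.c * np.sqrt(np.log(t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm, episode_entropy, baseline_entropy):
        # Feedback: relative deviation of achieved entropy from baseline;
        # a larger deviation indicates more control over entropy.
        feedback = abs(episode_entropy - baseline_entropy) / (baseline_entropy + 1e-8)
        self.counts[arm] += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]
```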


RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Mingqi Yuan*, Roger Creus Castanyer*, Bo Li, Xin Jin, Wenjun Zeng, Glen Berseth

Accepted at RL Beyond Rewards workshop @ RLC 2024.


Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore.
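
As a flavor of the plug-and-play design (the class and method names below are illustrative assumptions, not RLeXplore's actual interface; see the repository for the real API), here is a self-contained sketch of one family of intrinsic rewards, Random Network Distillation:

```python
import torch

# Random Network Distillation, sketched as a modular intrinsic-reward
# component: the bonus is the prediction error of a trained predictor
# against a fixed, randomly initialized target network, so rarely seen
# observations yield higher rewards. Hypothetical, simplified code.

class RNDReward:
    def __init__(self, obs_dim, feat_dim=64, lr=1e-3):
        self.target = torch.nn.Linear(obs_dim, feat_dim)      # frozen
        self.predictor = torch.nn.Linear(obs_dim, feat_dim)   # trained
        for p in self.target.parameters():
            p.requires_grad = False
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def compute(self, obs):
        # Intrinsic reward: per-sample prediction error (novelty proxy).
        with torch.no_grad():
            target_feat = self.target(obs)
        error = (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)
        return error.detach()

    def update(self, obs):
        # Train the predictor to match the frozen target on visited states.
        loss = (self.predictor(obs) - self.target(obs).detach()).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```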


PixelEDL: Unsupervised Skill Discovery and Learning from Pixels

Roger Creus Castanyer, Juan José Nieto, Xavier Giró-i-Nieto

Accepted at Embodied AI workshop @ CVPR 2021.


We tackle embodied visual navigation in a task-agnostic set-up by putting the focus on the unsupervised discovery of skills that provide a good coverage of states. Our approach intersects with empowerment: we address the reward-free skill discovery and learning tasks to discover what can be done in an environment and how.
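
One standard way to make reward-free skill discovery concrete is a DIAYN-style discriminator reward; the sketch below is an illustrative stand-in for this family of methods (hypothetical names, not the paper's code):

```python
import torch
import torch.nn as nn

# A discriminator q(z|s) infers the active skill z from the visited
# state s, and the agent is rewarded when its skills are distinguishable,
# encouraging skills that cover different parts of the state space.

N_SKILLS, STATE_DIM = 8, 32
discriminator = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_SKILLS)
)
opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def intrinsic_reward(state, skill_id):
    # r = log q(z|s) - log p(z): high when the state reveals the skill.
    with torch.no_grad():
        log_q = torch.log_softmax(discriminator(state), dim=-1)[skill_id]
    log_p = torch.log(torch.tensor(1.0 / N_SKILLS))  # uniform skill prior
    return (log_q - log_p).item()

def update_discriminator(states, skill_ids):
    # Train q(z|s) to classify which skill produced each visited state.
    loss = nn.functional.cross_entropy(discriminator(states), skill_ids)
    opt.zero_grad()
    loss.backward()
    opt.step()
```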


Unsupervised Skill-Discovery and Skill-Learning in Minecraft

Juan José Nieto, Roger Creus Castanyer, Xavier Giró-i-Nieto

Accepted at Unsupervised Reinforcement Learning workshop @ ICML 2021.


Pre-training Reinforcement Learning agents in a task-agnostic manner has shown promising results. However, previous works still struggle to discover and learn meaningful skills in high-dimensional state spaces, such as pixel spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations.
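
As a minimal illustration of the self-supervised representation-learning ingredient (hypothetical, not the paper's code), a contrastive InfoNCE loss over two augmented views of the same observation can be written as:

```python
import torch
import torch.nn.functional as F

# InfoNCE: embeddings of two views of the same observation (z1[i], z2[i])
# are pulled together, while views of different observations are pushed
# apart; positives lie on the diagonal of the similarity matrix.

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature        # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, labels)
```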


PiCoEDL: Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates

Juan José Nieto, Roger Creus Castanyer, Xavier Giró-i-Nieto

Accepted at Embodied AI workshop @ CVPR 2021.


Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly. For this reason, there is great interest in training agents in a task-agnostic manner, making use of intrinsic motivations and unsupervised techniques. We hypothesize that RL agents will also benefit from unsupervised pre-training with no extrinsic rewards, analogously to how humans mostly learn, especially in the early stages of life.


Integration of Convolutional Neural Networks in Mobile Applications

Roger Creus Castanyer, Silverio Martínez-Fernández, Xavier Franch

Accepted at Workshop on AI Engineering @ ICSE 2021.


When building Deep Learning (DL) models, data scientists and software engineers manage the trade-off between their accuracy, or any other suitable success criteria, and their complexity. In an environment with high computational power, a common practice is making the models go deeper by designing more sophisticated architectures. However, in the context of mobile devices, which possess less computational power, keeping complexity under control is a must.
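
As a toy illustration of this trade-off (with hypothetical architectures, not the paper's models), parameter count gives a first-order proxy for on-device memory and latency:

```python
import torch.nn as nn

# Compare a deeper CNN suited to high-compute environments against a
# compact CNN suited to mobile deployment; fewer parameters mean a
# smaller binary and cheaper inference on-device.

def n_params(model):
    return sum(p.numel() for p in model.parameters())

server_cnn = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 10),
)
mobile_cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
print(f"server: {n_params(server_cnn):,} params, "
      f"mobile: {n_params(mobile_cnn):,} params")
```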



Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI?

Roger Creus Castanyer, Silverio Martínez-Fernández, Xavier Franch

Published in the EMSE Journal


The construction, evolution and usage of complex artificial intelligence (AI) models demand expensive computational resources. While currently available high-performance computing environments support this complexity well, the deployment of AI models on mobile devices, which is an increasing trend, is challenging. Our objective is to systematically assess the trade-off between accuracy and complexity when deploying complex AI models (e.g. neural networks) to mobile devices, which have an implicit resource limitation.


Enhancing sequence-to-sequence modelling for RDF triples to natural text

Oriol Domingo, David Bergés, Roser Cantenys, Roger Creus Castanyer, José Adrian Rodríguez Fonollosa

Accepted at WebNLG workshop 2021.


This work establishes key guidelines on how, when, and which Machine Translation (MT) techniques are worth applying to the RDF-to-Text task. Not only do we apply and compare the most prominent MT architecture, the Transformer, but we also analyze state-of-the-art techniques such as Byte Pair Encoding and Back Translation to demonstrate an improvement in generalization.
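
For intuition, here is a toy sketch of Byte Pair Encoding (character-level merges on a single string; real BPE learns merges from corpus word frequencies):

```python
from collections import Counter

# BPE repeatedly merges the most frequent adjacent symbol pair, so
# common character sequences collapse into subword units.

def bpe_merges(word, n_merges):
    symbols = list(word)
    for _ in range(n_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]   # most frequent adjacent pair
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_merges("lowlowlowest", 3))  # frequent pairs collapse into subwords
```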


MT-adapted datasheets for datasets: Template and repository

Marta Costa-jussà, Roger Creus Castanyer, Oriol Domingo, Albert Domínguez, Miquel Escobar, Cayetana López, Marina Garcia, Margarita Geleta

In this report, we adopt the standardized model proposed by Gebru et al. (2018) to document the popular machine translation datasets EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). Within this documentation process, we have adapted the original datasheet to the particular case of data consumers within the Machine Translation area. We also propose a repository for collecting the adapted datasheets in this research area.


Unsupervised Skill Learning from Pixels

Roger Creus Castanyer

BSc Thesis


This work focuses on the self-acquirement of the fundamental task-agnostic knowledge available within an environment. The aim is to discover and learn baseline representations and behaviors that can later be useful for solving embodied visual navigation downstream tasks.


Talks


Projects