@ShenaoZhang
Excited to share our recent research: “Learning to Reason as Action Abstractions with Scalable Mid-Training RL”. We theoretically study how mid-training shapes post-training RL. The findings lead to a scalable algorithm for learning action hierarchies from expert demonstrations, which we successfully apply to 1B Python code data. A thread: 🧵