Realistic rendering of human behavior is of great interest for applications such as video animations, virtual reality and gaming engines. Commonly animations of persons performing actions are rendered by articulating explicit 3D models based on sequences of coarse body shape representations simulating a certain behavior. While the simulation of natural behavior can be efficiently learned, the corresponding 3D models are typically designed in manual, laborious processes or reconstructed from costly (multi-)sensor data. In this work, we present an approach towards a holistic learning framework for rendering human behavior in which all components are learned from easily available data. To enable control over the generated behavior, we utilize motion capture data and generate realistic motions based on user inputs. Alternatively, we can directly copy behavior from videos and learn a rendering of characters using RGB camera data only. Our experiments show that we can further improve data efficiency by training on multiple characters at the same time. Overall our approach shows a new path towards easily available, personalized avatar creation.