Modeling other agents (MOA) constructs, within each agent, a model of the other agents. It enables agents to predict the actions of others and achieve coordinated, effective interactions in multi-agent systems. However, the relationship between agents' executed and predicted actions is vague and diverse. To clarify this relationship, we propose a method in which an agent constructs its MOA through communication, using the historical data of other agents, and asymmetrically treats itself and its MOA in a non-cooperative game to obtain a Stackelberg equilibrium (SE). The SE is then used to choose actions. We experimentally demonstrate that, in a partially observable, mixed cooperative-competitive environment, agents using our method with reinforcement learning establish better coordination and behave more appropriately than agents using conventional methods. We then analyze the coordinated interaction structure that emerges in the trained network to clarify the relationships between individual agents.
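To illustrate the asymmetric (leader-follower) treatment the abstract describes, the following minimal sketch computes a Stackelberg equilibrium of a small bimatrix game by enumeration: the agent (leader) commits to an action, its MOA (follower) best-responds, and the leader picks the action maximizing its own payoff under that response. The payoff matrices and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 bimatrix game: rows = leader (the agent),
# columns = follower (the agent's model of the other agent, MOA).
# Payoff values are made up for illustration.
leader_payoff = np.array([[3.0, 1.0],
                          [4.0, 2.0]])
follower_payoff = np.array([[2.0, 4.0],
                            [1.0, 3.0]])

def stackelberg_equilibrium(A, B):
    """Enumerate leader actions; the follower best-responds to each.

    Returns (leader_action, follower_action) maximizing the leader's
    payoff A under the follower's best response to B (ties broken by
    the lowest action index).
    """
    best = None
    for a in range(A.shape[0]):
        b = int(np.argmax(B[a]))  # follower's best response to action a
        if best is None or A[a, b] > A[best[0], best[1]]:
            best = (a, b)
    return best

print(stackelberg_equilibrium(leader_payoff, follower_payoff))  # → (1, 1)
```

In this toy game the follower's best response to either leader action is column 1, so the leader commits to row 1, which yields its higher payoff (2.0 instead of 1.0) given that response.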