This paper presents a proposal for object detection as a first stage in the analysis of Human-Object Interaction (HOI) in the context of automated functional assessment. The proposed system follows a two-step strategy: in the first stage, the people in the scene are detected, together with large objects (tables, chairs, etc.), using a pre-trained YOLOv8. Then, a region of interest (ROI) is defined around each person and processed with a custom YOLO to detect small elements (forks, plates, spoons, etc.). Since no large image dataset includes all the objects of interest, a new dataset has also been compiled, combining images from different sets and improving the available labels. The proposal has been evaluated on the new dataset and on images acquired in the area where the functional assessment is performed, obtaining promising results.
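As an illustration of the two-step strategy described above, the following is a minimal sketch assuming the Ultralytics YOLO API. The weight file names (yolov8n.pt, custom_small_objects.pt), the use of the COCO "person" class, and the ROI margin are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal sketch of a two-stage detection pipeline (assumptions noted above;
# not the authors' implementation).
from ultralytics import YOLO
import cv2

stage1 = YOLO("yolov8n.pt")               # pre-trained detector: people and large objects
stage2 = YOLO("custom_small_objects.pt")  # hypothetical custom detector for small elements

def detect_hoi_objects(image_path, roi_margin=0.2):
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    detections = []

    # Stage 1: detect people (and large objects) in the full image.
    for box in stage1(image)[0].boxes:
        cls_name = stage1.names[int(box.cls[0])]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        detections.append(("stage1", cls_name, (x1, y1, x2, y2)))

        if cls_name != "person":
            continue

        # Stage 2: expand the person box into a ROI and run the custom detector on it.
        mx, my = int((x2 - x1) * roi_margin), int((y2 - y1) * roi_margin)
        rx1, ry1 = max(0, x1 - mx), max(0, y1 - my)
        rx2, ry2 = min(w, x2 + mx), min(h, y2 + my)
        roi = image[ry1:ry2, rx1:rx2]

        for small in stage2(roi)[0].boxes:
            sx1, sy1, sx2, sy2 = map(int, small.xyxy[0])
            # Map ROI coordinates back to the full-image frame.
            detections.append(("stage2", stage2.names[int(small.cls[0])],
                               (rx1 + sx1, ry1 + sy1, rx1 + sx2, ry1 + sy2)))
    return detections
```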