Appendix A - Dataset

Appendix A.1: Dataset Scenes and Objects

The dataset encompasses three primary scenes: kitchen, living room, and tabletop settings.

1
Kitchen
2
Living Room
3
Tabletop Settings

It includes a variety of common household items, such as kitchen utensils, daily necessities, fruits and vegetables, drinks and snacks, stationery, books, clothing and toys.

Objects in the dataset

A.2 Dataset Samples

Semantic Grounding Dataset

Messages Image
User: What is the state of the cutting board?
Assistant: The brown cutting board stands next to the wall.
Sample Image
User: What is the state of the cutting board?
Assistant: The brown cutting board is on the counter.
Sample Image
Messages Image
User: Where is pitaya in the image?
Assistant: In the image, pitaya is on the counter.
Sample Image
User: Where is pitaya in the image?
Assistant: In the image, pitaya is on the fridge.
Sample Image

Cue Generation Dataset

Messages Image
User: Lisa wants the robot to organize items. If you were Lisa, what reminders would you give to the robot?
Assistant: Put all vegetables in the fridge. The fridge is already open. Close the fridge after putting items inside. Put the pot in the sink.
Sample Image
User: Zak wants the robot to sauté carrot slices. If you were Zak, what reminders would you give to the robot?
Assistant: The brown cutting board stands next to the wall. You need to lay it flat on the table first. You need to find and grab a knife before cutting. You should put the knife on the counter after cutting. The pan is already on the stove. Grab the carrot slices before putting them in the pan.
Sample Image
User: Lisa wants the robot to organize the table. If you were Lisa, what reminders would you give to the robot?
Assistant: Put the stationery on the first shelf. Place stationery into the pen holder before arranging them. Put documents and books on the second shelf.
Sample Image
User: Zak wants the robot to put things in boxes. If you were Zak, what reminders would you give to the robot?
Assistant: Put the toys in the blue box. Place the light-colored clothes in the transparent box. Put the dark clothes in the white box.
Sample Image