Appendix C - AlignBot

C.1. Task Details

| Task | Description |
| --- | --- |
| put the apple in fridge | Put the apples in the fridge. If there is a bowl on the table, put the apples in the bowl before placing them in the fridge. |
| boil water | Fill the pot with water and boil it. Turn off the stove after completing the task. |
| cut carrot | Cut carrots on the cutting board. |
| heat the pot | Heat a pot on the stove. Turn off the stove after completing the task. |
| heat the potato | Microwave the potatoes. If there is a bowl on the table, put the potatoes in the bowl before placing them in the microwave. |
| put the apple in bowl | Put the apples on the table into the bowl. |
| put all the fruits in the bowl | Put the fruits on the table into the bowl. |
| put pear in the Drawer | Put the pears into the white drawer. |
| Saute carrot slices | Cut carrots on the cutting board, then fry the carrot slices in the pan. |
| Organize items | Place the items on the counter at their designated locations. |
| put things in drawer | Place the items on the counter in the designated drawer. |
| give me something to drink | Get the user's favorite drink, according to the user's preferences. |
| give me some snacks | Get the user's favorite snacks, according to the user's preferences. |
| give me my cup | Get the user's favorite cup, according to the user's preferences. |
| give me salt | Give the user the container with salt. |
| give me sugar | Give the user the container with sugar. |
| give me vinegar | Give the user the bottle with vinegar. |
| give me soy sauce | Give the user the bottle with soy sauce. |
| Organize the table | Put the items on the table onto the shelves. |
| put things in box | Put the items on the floor into their designated boxes. |

C.2. Human Rater Evaluation Criteria

| Points | Rating Guidelines |
| --- | --- |
| 0 | The cues are so irrelevant to the scene that they have no corrective effect at all. |
| 1 | The cues contain some errors but include partially correct item status reminders, personalized reminders, or planning step reminders. |
| 2 | The cues contain no errors but include only partially correct item status reminders, personalized reminders, or planning step reminders. |
| 3 | The cues accurately include item status reminders, personalized reminders, and planning step reminders. |
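
The rubric above can be read as a short decision procedure. The following Python sketch shows one possible reading, not the procedure the human raters actually followed. The function name `rate_cue` and its three boolean flags are hypothetical and do not appear in the source; they stand in for judgments a rater would make by inspecting the cues.

```python
def rate_cue(relevant: bool, has_errors: bool, covers_all_reminders: bool) -> int:
    """Map a rater's judgments about a set of cues to a 0-3 score.

    All three flags are illustrative assumptions:
      relevant             -- the cues relate to the scene at all
      has_errors           -- the cues contain at least one error
      covers_all_reminders -- item status, personalized, and planning step
                              reminders are all present and accurate
    """
    if not relevant:
        return 0  # no corrective effect at all
    if has_errors:
        return 1  # some errors, but partially correct reminders
    if not covers_all_reminders:
        return 2  # no errors, but only partially correct reminders
    return 3      # all three reminder types are accurate


# Example: relevant cues without errors that miss one reminder type.
score = rate_cue(relevant=True, has_errors=False, covers_all_reminders=False)
print(score)
```

Under this reading, any error caps the score at 1 regardless of how many reminder types are covered, which matches the ordering of the rubric rows.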

C.3. Detailed Experimental Results

Table 1

| Method | put apple in fridge | boil water | cut carrot | heat pot | heat potato | put apple in bowl | put fruits in bowl | put pear in Drawer | Saute carrot slices |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT4o only | 0% | 13.3% | 36.7% | 46.7% | 3.3% | 40% | 56.7% | 0% | 6.7% |
| GPT4o with memory base | 33.3% | 26.7% | 40% | 16.7% | 13.3% | 36.7% | 46.7% | 6.7% | 13.3% |
| LLaMA2-7b with GPT4o | 13.3% | 13.3% | 3.3% | 3.3% | 10% | 40% | 6.7% | 20% | 20% |
| AlignBot | 90% | 83.3% | 96.7% | 96.7% | 80% | 93.3% | 73.3% | 93.3% | 56.7% |

Table 2

| Method | Organize items | put things in drawer | give something to drink | give snacks | give cup | give sugar | give vinegar | Organize table | put things in box |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT4o only | 0% | 0% | 30% | 33.3% | 30% | 3.3% | 20% | 0% | 0% |
| GPT4o with memory base | 10% | 3.3% | 40% | 36.7% | 43.4% | 80% | 83.3% | 40% | 26.7% |
| LLaMA2-7b with GPT4o | 3.3% | 6.7% | 56.7% | 40% | 96.7% | 96.7% | 93.3% | 43.4% | 80% |
| AlignBot | 70% | 76.7% | 96.7% | 90% | 96.7% | 96.7% | 96.7% | 86.7% | 93.3% |
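
To compare the methods at a glance, the per-task percentages in Table 1 and Table 2 can be averaged per method. The Python sketch below does this with values transcribed from the two tables. It assumes an unweighted mean, meaning every task counts equally; the source does not say how an aggregate should be computed, and the helper name `mean_per_method` is hypothetical.

```python
from statistics import mean

# Percentages transcribed from Table 1, in column order.
TABLE_1 = {
    "GPT4o only": [0, 13.3, 36.7, 46.7, 3.3, 40, 56.7, 0, 6.7],
    "GPT4o with memory base": [33.3, 26.7, 40, 16.7, 13.3, 36.7, 46.7, 6.7, 13.3],
    "LLaMA2-7b with GPT4o": [13.3, 13.3, 3.3, 3.3, 10, 40, 6.7, 20, 20],
    "AlignBot": [90, 83.3, 96.7, 96.7, 80, 93.3, 73.3, 93.3, 56.7],
}

# Percentages transcribed from Table 2, in column order.
TABLE_2 = {
    "GPT4o only": [0, 0, 30, 33.3, 30, 3.3, 20, 0, 0],
    "GPT4o with memory base": [10, 3.3, 40, 36.7, 43.4, 80, 83.3, 40, 26.7],
    "LLaMA2-7b with GPT4o": [3.3, 6.7, 56.7, 40, 96.7, 96.7, 93.3, 43.4, 80],
    "AlignBot": [70, 76.7, 96.7, 90, 96.7, 96.7, 96.7, 86.7, 93.3],
}


def mean_per_method(*tables):
    """Return the unweighted mean percentage per method across all tables.

    Assumption: each task carries equal weight in the aggregate.
    """
    methods = tables[0].keys()
    return {m: mean(v for t in tables for v in t[m]) for m in methods}


means = mean_per_method(TABLE_1, TABLE_2)
for method, value in means.items():
    print(f"{method}: {value:.1f}%")
```

Because the tabulated values are already rounded to one decimal place, the averages carry the same rounding error and should be read as approximate.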