Spatial perspective-taking is predicted to involve two crucial processes: the detachment of the representational self and the movement of the self-representation. This study examined the characteristics of these processes in three age groups: 36 children aged 5–6 years (13 girls), 29 students aged 19–24 years (20 women), and 33 older adults aged 60–84 years (14 women). Participants performed a spatial perspective-taking task presented as a video game while their response times and eye movements were recorded. Reaction latency (RL), the interval from stimulus presentation to the onset of gaze movement, was taken as an index of detachment. The remaining time (RT), calculated as the perspective-taking operation time minus RL, was taken as an index of the self-representation movement. A two-way mixed-design analysis of variance (ANOVA) on the RTs showed a significant main effect of age group: children were significantly slower than both students and older adults, and older adults were significantly slower than students. A two-way mixed-design ANOVA on the RLs also showed a significant main effect of age group: children were significantly slower than both students and older adults. The results suggest that the core of spatial perspective-taking comprises the two predicted processes.
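
The split of each trial into RL and RT can be illustrated with a minimal sketch. It assumes that operation time runs from stimulus presentation to the participant's response and that all timestamps are in milliseconds; the `Trial` class, its field names, and the `decompose` function are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical per-trial timestamps in milliseconds (assumed, not from the study).
    stimulus_onset_ms: float
    gaze_onset_ms: float
    response_ms: float


def decompose(trial: Trial) -> tuple[float, float]:
    """Return (RL, RT) for one trial.

    RL: stimulus presentation to onset of gaze movement.
    RT: operation time minus RL, where operation time is assumed to run
        from stimulus presentation to the response.
    """
    if not (trial.stimulus_onset_ms <= trial.gaze_onset_ms <= trial.response_ms):
        raise ValueError("expected stimulus onset <= gaze onset <= response")
    rl = trial.gaze_onset_ms - trial.stimulus_onset_ms
    operation_time = trial.response_ms - trial.stimulus_onset_ms
    rt = operation_time - rl
    return rl, rt


# Illustrative values only, not data from the study.
rl, rt = decompose(Trial(stimulus_onset_ms=0.0, gaze_onset_ms=450.0, response_ms=1800.0))
print(rl, rt)
```

Under these assumptions RL and RT always sum to the operation time, so the two measures partition each trial without overlap.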