Discussions about the adult L2 learning capacity often take as their starting point stages where considerable L2 knowledge has already been accumulated. This paper probes the absolute earliest stages of learning and investigates what lexical knowledge adult learners can extract from complex, continuous speech in an unknown language after minimal exposure and without any help. Dutch participants were exposed to naturalistic but controlled audiovisual input in Mandarin Chinese, in which item frequency and gestural highlighting were manipulated. The results from a word recognition task showed that adults are able to draw on frequency to recognize disyllabic words appearing only eight times in continuous speech. The findings from a sound-to-picture matching task revealed that the mapping of meaning to word form requires a combination of cues: disyllabic words accompanied by a gesture were correctly assigned meaning after eight encounters. Overall, the study suggests that the adult learning mechanism is considerably more powerful than typically assumed in the SLA literature, drawing on frequency, gestural cues and syllable structure. Even in the absence of pre-existing knowledge about cognates and the sound system to bootstrap and boost learning, it deals efficiently with very little, very complex input.