A detailed reconstruction of the environment is a crucial component of mobile robotic systems and enables higher-level scene understanding. To achieve information redundancy, heterogeneous sensors need to be used, as each sensor has specific strengths and weaknesses. Therefore, the goal of this work is to fuse information from multiple lidars, radars, a stereo camera, and semantic camera information into one common scene representation. In contrast to past publications, we focus on combining distance measurements and semantic estimates in the image domain in one common evidential framework. Grid maps are used as the common fusion structure, which enables efficient data processing. The approach is validated on an automated driving platform in real traffic scenarios. Experiments show that the precision of the scene reconstruction increases while real-time capability is retained.