Lehe Yu, Zhengxiu Gui
February 22, 2021
Social media platforms typically contain hundreds of millions of nodes, linked through follow and fan relationships into one huge social network through which news spreads. This paper studies techniques for acquiring social media topic data and enterprise data. It introduces a topic-locating technique based on Sina meta search and topic-related keywords, and analyzes the crawling efficiency of topic crawlers. Because webpage structures on the Internet are diverse and changeable, the paper studies the general regularities in webpage structure and proposes a new Web information extraction algorithm that combines the DOM (Document Object Model) tree with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The steps of the algorithm are described in detail: Web page processing, DOM tree construction, acquisition of segmented text content, and DBSCAN-based extraction of web content. Simulation results show that, with information extracted by the DOM tree and DBSCAN method, collaboration strategies covering intelligence culture, the intelligence system, the technology platform, and the intelligence organization ecology can raise the level of participation of all employees in intelligence work. The level of participation is significantly and positively correlated with the level of the intelligence environment for all employees. These results indicate that the proposed DOM tree and DBSCAN extraction method can extract an enterprise’s all-employee intelligence and support the effective implementation of the related collaborative strategies, and can thereby provide guidance for implementing all-employee intelligence in practice.
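The four extraction steps named above (page processing, DOM tree traversal, segmented text acquisition, DBSCAN-based extraction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature vector (document order, tag depth, log of text length), the parameters `eps` and `min_pts`, the rule of keeping the cluster with the most text, and all names such as `SegmentCollector` and `extract_main_text` are assumptions made for illustration. The sketch tracks nesting depth while walking the tags instead of building a full DOM tree, and uses only the standard library.

```python
import math
from html.parser import HTMLParser

class SegmentCollector(HTMLParser):
    """Records (depth, text) for every non-empty text node outside script/style."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.skip = 0
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        self.depth += 1
        if tag in self.SKIP:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if tag in self.SKIP and self.skip:
            self.skip -= 1
        self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self.skip:
            self.segments.append((self.depth, text))

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns a cluster id >= 0 per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels

def extract_main_text(html, eps=1.5, min_pts=2):
    """Assumed heuristic: keep the cluster holding the most characters."""
    parser = SegmentCollector()
    parser.feed(html)
    segs = parser.segments
    points = [(float(i), float(d), math.log2(len(t))) for i, (d, t) in enumerate(segs)]
    labels = dbscan(points, eps, min_pts)
    totals = {}
    for label, (_, text) in zip(labels, segs):
        if label >= 0:
            totals[label] = totals.get(label, 0) + len(text)
    if not totals:
        return ""
    best = max(totals, key=totals.get)
    return "\n".join(t for label, (_, t) in zip(labels, segs) if label == best)

SAMPLE_HTML = """<html><body>
<ul><li><a>Home</a></li><li><a>News</a></li><li><a>About</a></li></ul>
<div><p>Topic crawlers collect posts that mention the selected keywords.</p>
<p>Each page is parsed into a tree and split into text segments.</p>
<p>Dense groups of long segments are kept as the main content.</p></div>
<div><span>Copyright notice</span></div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

In this toy page the navigation links form one small cluster, the three paragraphs form a second, and the isolated footer line is labeled noise, so only the paragraphs are returned. On real pages the feature choice and the `eps` and `min_pts` values would need tuning.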