Research on micro-blog hot topic detection based on semantic analysis
BAI Jian-pu, TIAN Fang
(Information Science and Engineering School, Inner Mongolia University of Science and Technology, Baotou 014010, China)
Key words: semantic analysis; micro-blog; hot topics; topic detection
CLC number: TP391 Document code: A
Abstract: In recent years, hot topic detection on micro-blogs has become a focus of research on Internet public opinion analysis. Micro-blog posts are short, fragmented, and colloquial, and on such texts the Vector Space Model (VSM) text representation suffers from high dimensionality, sparsity, synonymy, and polysemy. To address these problems, this paper models micro-blog information with latent semantic analysis (LSA) and then detects topics with a Bayesian classification algorithm. A micro-blog hot topic detection system was implemented with the J2EE development kit and the Eclipse integrated development environment, together with technologies such as Hibernate and Lucene. Experiments show that the method is effective.
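The synonymy problem and the LSA remedy named in the abstract can be illustrated with a minimal sketch. It is not the authors' Java implementation: the five toy posts, the English tokens standing in for segmented micro-blog words, the function names, and the use of power iteration with deflation in place of a full singular value decomposition are all assumptions made for illustration. The Bayesian classification step is not shown.

```python
import math

def term_document_matrix(docs):
    """Build a raw term-frequency matrix (rows = terms, columns = documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            a[row[w]][j] += 1.0
    return vocab, a

def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    return sum(p * q for p, q in zip(x, y)) / (nx * ny) if nx and ny else 0.0

def lsa_document_coords(a, k, iters=200):
    """Rank-k LSA document coordinates (sigma_i * v_i[j]) obtained by power
    iteration with deflation on G = A^T A. Suitable for toy sizes only."""
    n = len(a[0])
    g = [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _component in range(k):
        # Fixed, non-uniform start vector (hypothetical choice), normalized.
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of G, i.e. sigma squared.
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n):
            coords[j].append(sigma * v[j])
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return coords

# Toy corpus (assumed data): posts 0-2 concern a disaster, posts 3-4 a match.
docs = [
    "earthquake rescue",
    "quake relief",
    "earthquake quake rescue relief",
    "football match",
    "football goal match",
]
vocab, a = term_document_matrix(docs)
columns = [[r[j] for r in a] for j in range(len(docs))]
latent = lsa_document_coords(a, k=2)

vsm_sim = cosine(columns[0], columns[1])    # posts 0 and 1 share no terms
lsa_sim = cosine(latent[0], latent[1])      # same posts in the latent space
lsa_cross = cosine(latent[0], latent[3])    # disaster post vs. sports post
print(vsm_sim, lsa_sim, lsa_cross)
```

Posts 0 and 1 use different words for the same event, so their VSM vectors are orthogonal. The third post links the two vocabularies through co-occurrence, which is what lets the two-dimensional latent space place posts 0 and 1 together while keeping them apart from the sports posts. A real system would use a library SVD routine on a weighted (for example TF-IDF) matrix rather than this hand-rolled iteration.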