
12. LDA-Based Event Extraction from Professional Individual Micro-blogs

Date: 2015-10-19 15:21:11

LDA-Based Event Extraction from Professional Individual Micro-blogs

Gao Yongbing (高永兵)1, Xiong Zhenhua (熊振华)1

(School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010)1

Keywords: professional individual micro-blog; LDA; similarity; event extraction

CLC number: TP399  Document code: A

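Abstract: To automatically identify a blogger's professional interests and activities, this paper proposes an LDA-based algorithm for extracting events from professional individual micro-blogs. The algorithm extracts feature words with an improved TF-IDF and models the corpus with LDA to uncover the relationships between topics and words, so that words with larger weights better reflect a post's topic. It then obtains each post's probability distribution over the topics and combines it with time similarity to compute a combined similarity between posts. Finally, the posts are clustered with an improved K-Means and the result is compared with manually labeled data. The experimental results verify the effectiveness of the algorithm and show that it can present the micro-blog events that interest people in a structured and organized way.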

Technical Algorithm to Extract Individual Micro-blog Events on the Basis of LDA

GAO Yong-bing1  XIONG Zhen-hua1

(Information Engineering School, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China)1

Keywords: professional individual micro-blog; LDA; similarity; event extraction

Abstract: An LDA-based algorithm is proposed for extracting events from professional individual micro-blogs, so that a blogger's professional interests and activities can be identified automatically. The algorithm uses an improved TF-IDF to pick out feature words and models the corpus with LDA to uncover the relationships between topics and words, so that words with larger weights reflect a post's topic more distinctly. It then obtains the probability distribution of each post over the topics and combines it with time similarity to compute a combined similarity between posts. Finally, the posts are clustered with an improved K-Means and the clusters are compared with manually labeled data; the comparison shows hardly any errors. The experimental results verify the effectiveness of the algorithm and show that it can present the micro-blog events that interest people in a structured and organized way.
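
The abstract does not give the similarity formula, so the following is only a minimal sketch of how a post's topic distribution and its timestamp might be merged into one score. It assumes cosine similarity over LDA topic vectors, an exponential decay on the time gap, and a linear weight between the two. The function names and the parameters `alpha` and `scale_hours` are hypothetical and are not taken from the paper, and the topic vectors are made-up illustrative values rather than real LDA output.

```python
import math
from datetime import datetime


def cosine_similarity(p, q):
    """Cosine similarity between two topic-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def time_similarity(t1, t2, scale_hours=24.0):
    """Assumed time similarity: 1 for equal timestamps, decaying toward 0.

    scale_hours is a hypothetical tuning parameter controlling how fast
    the similarity drops as the gap between two posts grows.
    """
    gap_hours = abs((t1 - t2).total_seconds()) / 3600.0
    return math.exp(-gap_hours / scale_hours)


def combined_similarity(theta1, t1, theta2, t2, alpha=0.7, scale_hours=24.0):
    """Assumed combination: weighted sum of topic and time similarity.

    alpha (hypothetical) is the weight given to topic similarity;
    the remaining 1 - alpha goes to time similarity.
    """
    topic_part = cosine_similarity(theta1, theta2)
    time_part = time_similarity(t1, t2, scale_hours)
    return alpha * topic_part + (1.0 - alpha) * time_part


# Illustrative posts: (topic distribution, posting time). Values are made up.
post_a = ([0.8, 0.1, 0.1], datetime(2015, 3, 1, 9, 0))
post_b = ([0.7, 0.2, 0.1], datetime(2015, 3, 1, 15, 0))
post_c = ([0.1, 0.1, 0.8], datetime(2015, 3, 20, 9, 0))

sim_ab = combined_similarity(post_a[0], post_a[1], post_b[0], post_b[1])
sim_ac = combined_similarity(post_a[0], post_a[1], post_c[0], post_c[1])

print("similarity(a, b):", round(sim_ab, 3))
print("similarity(a, c):", round(sim_ac, 3))
```

Under these assumptions, posts a and b (similar topics, posted hours apart) score higher than posts a and c (different topics, posted weeks apart), which is the behavior a clustering step such as K-Means would rely on to group posts about the same event.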



Received:

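About the author: Gao Yongbing (born 1974), from Baotou, Inner Mongolia; associate professor at Inner Mongolia University of Science and Technology.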

Address: No. 7 Aerding Street, Kundulun District, Baotou, Inner Mongolia  Postal code: 014010  Tel: 0472-5951610 or 0472-5953910  Email: cky@imust.edu.cn nkdxb@imust.edu.cn

Copyright: Editorial Office of the Journal of Inner Mongolia University of Science and Technology (©2013)