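On July 4, 2017, Baidu open-sourced a topic modeling project named Familia.
InfoQ contacted Jiang Di (姜迪), the lead of Baidu's Familia project, as soon as the news broke and interviewed him. In this article, he walks us through the technical details of the project.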
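What is Familia
The Familia open-source project includes a document topic inference tool, a semantic matching tool, and three topic models trained on industrial-scale corpora: Latent Dirichlet Allocation (LDA), SentenceLDA, and Topical Word Embedding (TWE).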
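To make the idea of semantic matching on top of topic inference more concrete, here is a minimal sketch of how two documents could be compared once each has been reduced to a topic distribution. It is not Familia's API: the function names and the three-topic example distributions are hypothetical, and Jensen-Shannon divergence and Hellinger distance are used here only as two common choices for comparing probability distributions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log(2)."""
    # m is the midpoint distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance: symmetric, in the range [0, 1]."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


# Hypothetical topic distributions over three topics (each sums to 1).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]

if __name__ == "__main__":
    print("JS(a, b) =", js_divergence(doc_a, doc_b))
    print("JS(a, c) =", js_divergence(doc_a, doc_c))
    print("Hellinger(a, b) =", hellinger_distance(doc_a, doc_b))
    print("Hellinger(a, c) =", hellinger_distance(doc_a, doc_c))
```

A smaller value means the two documents have more similar topic mixtures, so under these made-up numbers doc_a is closer to doc_b than to doc_c.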
File list
Familia-master
    .gitignore
    .travis.yml
    AUTHORS
    LICENSE
    Makefile
    README.md
    build.sh
    depends.mk
    include
    model
    proto
    python
    run_doc_distance_demo.sh
    run_inference_demo.sh
    run_query_doc_sim_demo.sh
    run_show_topic_demo.sh
    run_topic_word_demo.sh
    run_word_distance_demo.sh
    src