Forest cover types classification based on online machine learning on distributed cloud computing platforms of storm and SAMOA
Li, Guang Di; Wang, Guo Yin; Zhang, Xue Rui; Deng, Wei Hui; Zhang, Fan
2014
Abstract: Storm is a popular real-time stream processing platform that can be used for online machine learning. Similar to how Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation. SAMOA includes distributed algorithms for the most common machine learning tasks, much as Mahout does for Hadoop. SAMOA is both a platform and a library. In this paper, Forest cover types, a large benchmarking dataset available at the UCI KDD Archive, is used as the data stream source. Vertical Hoeffding Tree, a parallel streaming decision tree induction algorithm for distributed environments that is incorporated in the SAMOA API, is applied on the Storm platform. This study compares the stream processing technique for predicting forest cover types from cartographic variables with traditional machine learning algorithms applied to this dataset. The test-then-train method used in this system is entirely different from the traditional train-then-test approach. The results indicate that the output of the stream processing technique is asymptotically nearly identical to that of a conventional learner, while the model derived from this system is scalable, real-time, capable of dealing with evolving streams, and insensitive to stream ordering. © (2014) Trans Tech Publications, Switzerland.
Language: English
DOI: 10.4028/www.scientific.net/AMR.955-959.3803
Conference (Proceedings) Name: 3rd International Conference on Energy and Environmental Protection, ICEEP 2014
Pages: 3803-3812
Indexed by: EI
Conference Location: Xi'an, China
Conference Date: April 26, 2014 - April 28, 2014
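
The test-then-train scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the SAMOA or Storm APIs. The `MajorityClassLearner` class, the `prequential_accuracy` function, and the toy stream values are all hypothetical, and a trivial majority-class learner stands in for the Vertical Hoeffding Tree. The sketch only shows the ordering that sets this scheme apart from train-then-test: each instance is first used to test the current model and only afterwards to update it.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical stand-in for a streaming classifier.

    It predicts the most frequent label seen so far. This is NOT the
    Vertical Hoeffding Tree; it only keeps the example self-contained.
    """

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        # Before any training there is nothing to predict from.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label):
        # Incremental update: one instance at a time, no stored dataset.
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: score each instance first, then learn from it."""
    correct = 0
    total = 0
    for features, label in stream:
        if learner.predict(features) == label:  # 1. test on the unseen instance
            correct += 1
        learner.learn(features, label)          # 2. then train on it
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, cover_type) pairs. The values are invented for
# illustration and are not records from the Forest cover types dataset.
stream = [
    ((2600, 5), 5),
    ((2590, 6), 5),
    ((2610, 4), 5),
    ((2800, 14), 2),
    ((2595, 5), 5),
]

accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(f"prequential accuracy: {accuracy:.2f}")
```

Because every instance is scored before the model has seen it, no separate held-out test set is needed, and the running accuracy reflects how the model performs as the stream evolves.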