An Intrusion Detection Algorithm for Imbalanced Datasets
Chen Meixia; Guo Gongde; Huang Jie; Liu Yongfen
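Abstract:
This paper presents an intrusion detection algorithm that combines the Synthetic Minority Over-sampling Technique (SMOTE) with a binary-tree multi-class support vector machine (BTSVM) to address the class-imbalance classification problem frequently encountered in practical applications. The method first classifies the class-imbalanced training set with BTSVM, then applies SMOTE to over-sample the support vectors obtained from each classifier, and finally tests the new classification model on the class-imbalanced test set. Experimental results show that the algorithm effectively improves classification performance on imbalanced datasets.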
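The over-sampling step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the generic SMOTE interpolation rule (a synthetic point is placed on the segment between a sample and one of its k nearest neighbours), applied to a small, made-up set standing in for the minority-class support vectors of one classifier. The function name `smote`, its parameters, and the sample values are assumptions for illustration; the BTSVM training that would supply the support vectors is omitted.

```python
import math
import random


def smote(samples, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours within `samples`.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other samples by Euclidean distance to the base point.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class support vectors (illustrative values only).
minority_svs = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority_svs, n_synthetic=6, k=2)
augmented = minority_svs + new_points
```

Under these assumptions, `augmented` would replace the original minority-class support vectors when the corresponding classifier is retrained. Because every synthetic point is a convex combination of two existing points, the new samples stay inside the region already spanned by the minority class.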
Keywords: imbalanced data; SMOTE; binary-tree multi-class SVM; ROC
Foundation: Supported by the Ministry of Education fund for returned overseas scholars (Jiao Wai Si Liu [2008] No. 890)