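Dataset Resources for Data Mining

Collected from the Internet:

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets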

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at: http://www.research.att.com/~lewis/reuters21578.html

The following site hosts a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, one more usable dataset is the one that comes with rainbow:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

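3. I have collected many test datasets. Anyone writing a paper will certainly need them, at least for checking how well an algorithm performs.
Some of the links may be unreachable, but at least a few should still work: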

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

A website about data mining for investment funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/ 
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and the Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html


Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
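
Test data like this is meant as input for frequent-itemset miners such as Apriori. Below is a minimal sketch of the level-wise counting, assuming a plain-text layout with one transaction per line and whitespace-separated item identifiers (as far as I know this is the layout of the FIMI repository files; output from the IBM generator may need converting first). The names `parse_transactions` and `apriori` are illustrative and do not come from any library.

```python
from itertools import combinations


def parse_transactions(lines):
    """Turn text lines into a list of item sets, skipping blank lines."""
    return [frozenset(line.split()) for line in lines if line.strip()]


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    min_support transactions (min_support is an absolute count)."""
    # Level 1: count single items.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per candidate.
        counts = {
            cand: sum(1 for t in transactions if cand <= t) for cand in candidates
        }
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

To try it on a downloaded file, something like `apriori(parse_transactions(open(path)), min_support=100)` should work for small inputs. The counting here is deliberately naive, so it will be slow on the larger benchmark files.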

Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html


Association rules:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
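
A jarfile is an ordinary zip archive, so the datasets can be inspected without Java. The sketch below uses Python's standard `zipfile` module and assumes the datasets inside the jar are stored as `.arff` files, which I have not verified for every archive listed here. The helper names `list_arff_members` and `read_member_text` are made up for this example.

```python
import zipfile


def list_arff_members(jar_source):
    """Return the sorted names of all .arff entries in a jar (zip) archive.

    jar_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_source) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".arff")
        )


def read_member_text(jar_source, member, encoding="utf-8"):
    """Return one archive entry decoded as text.

    Bytes that are not valid in the given encoding are replaced rather than
    raising, because the encoding of the files is an assumption.
    """
    with zipfile.ZipFile(jar_source) as archive:
        return archive.read(member).decode(encoding, errors="replace")
```

For example, after downloading one of the jars, `list_arff_members("datasets-UCI.jar")` should list the dataset files, and `read_member_text` can then show the header of any one of them.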

Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 


Download the Financial Data (~17.5M zipped file, ~67M unzipped data) 
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm


KDnuggets dataset links:
http://www.kdnuggets.com/datasets/index.html

Another very good resource is http://kdd.ics.uci.edu/, which contains the following datasets (grouped by application area):

Direct Marketing 
  KDD CUP 1998 Data 

GIS 
  Forest CoverType 

Indexing 
  Corel Image Features 
  Pseudo Periodic Synthetic Time Series 

Intrusion Detection 
  KDD CUP 1999 Data 

Process Control 
  Synthetic Control Chart Time Series 

Recommendation Systems 
  Entree Chicago Recommendation Data 

Robots 
  Pioneer-1 Mobile Robot Data 
  Robot Execution Failures 

Sign Language Recognition 
  Australian Sign Language Data 
  High-quality Australian Sign Language Data 

Text Categorization 
  20 Newsgroups Data 
  Reuters-21578 Text Categorization Collection 
  NSF Research Awards Abstracts 1990-2003

World Wide Web 
  Microsoft Anonymous Web Data 
  MSNBC Anonymous Web Data 
  Syskill Webert Web Data 

Here is one more, found on an overseas blog (the day before Children's Day):
http://www.fs.fed.us/fire/fuelman/
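
Since some of the links collected here may no longer respond, a small script can check them in bulk. This is only a sketch using Python's standard `urllib`: it sends a HEAD request to each distinct `http://` or `https://` URL and leaves other schemes such as `ftp://` unprobed. The names `http_probe` and `check_links` are my own, and some servers reject HEAD requests, so a `False` result is a hint rather than proof that a page is gone.

```python
import http.client
from urllib.request import Request, urlopen


def http_probe(url, timeout=10.0):
    """Return True if the URL answers a HEAD request with a status below 400."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def check_links(urls, probe=http_probe):
    """Probe each distinct URL once and return {url: True, False or None}.

    None marks URLs whose scheme this sketch does not probe.
    The result keeps the order in which URLs were first seen.
    """
    results = {}
    for url in urls:
        url = url.strip()
        if not url or url in results:
            continue
        if url.lower().startswith(("http://", "https://")):
            results[url] = probe(url)
        else:
            results[url] = None
    return results
```

Passing the URLs from this page as a list of strings returns a dictionary that can be filtered for the entries still worth visiting. The `probe` argument exists so the network call can be swapped out, for instance with a stub when testing offline.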
