Open/Public Dataset Repositories

Open/public datasets are useful for several purposes, including:
    1. Research that proposes a new model or algorithm. We need an open/public dataset as a benchmark to compare the proposed model against the existing state-of-the-art models.
    2. Working on a project or prototyping an application/system.
    3. Teaching, or assigning coursework, on data-related subjects (e.g. Statistics or Data Mining).
    4. Practice while learning the various methods/algorithms in data science, machine learning/AI, and so on.
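As a rough illustration of the first purpose, here is a minimal sketch of benchmarking a candidate model against a baseline on the same dataset and the same train/test split. Everything in it is an assumption made for illustration: the inline `SAMPLE_CSV` is made-up toy data standing in for a downloaded public dataset, the column names `x1`, `x2`, `label` are hypothetical, and a 1-nearest-neighbour classifier simply plays the role of "the proposed model". A real benchmark would use a published dataset and the evaluation protocol from the papers being compared against.

```python
import csv
import io
import random
from collections import Counter

# Toy stand-in for a downloaded public CSV file (made-up values, not real data).
SAMPLE_CSV = """\
x1,x2,label
1.0,1.2,a
0.8,1.0,a
1.3,0.9,a
1.1,1.4,a
5.0,5.2,b
4.8,5.1,b
5.3,4.9,b
5.1,5.4,b
"""


def load_rows(text):
    """Parse CSV text into a list of (features, label) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [([float(r["x1"]), float(r["x2"])], r["label"]) for r in reader]


def majority_baseline(train):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def nearest_neighbour(train):
    """Candidate model: predict the label of the closest training point."""
    def predict(features):
        def squared_distance(item):
            return sum((a - b) ** 2 for a, b in zip(item[0], features))
        return min(train, key=squared_distance)[1]
    return predict


def accuracy(predict, test):
    """Fraction of test rows whose label is predicted correctly."""
    return sum(predict(f) == y for f, y in test) / len(test)


def benchmark(rows, seed=0, test_fraction=0.25):
    """Evaluate both models on one shared, reproducible train/test split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return {
        "majority_baseline": accuracy(majority_baseline(train), test),
        "nearest_neighbour": accuracy(nearest_neighbour(train), test),
    }


if __name__ == "__main__":
    for name, score in benchmark(load_rows(SAMPLE_CSV)).items():
        print(f"{name}: {score:.2f}")
```

The point of the sketch is the shape of the comparison: both models see exactly the same split, and the split is seeded so that others can reproduce the numbers.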
Below is a compilation of dataset repositories for the purposes above, or for any other use. If you know of another data source that might be useful to others, please leave a comment at the bottom of this page.
Indonesian Datasets:
  1. Indonesia Datasets [data.go.id]
  2. Badan Pusat Statistik (BPS, Statistics Indonesia): summary data only.
  3. UN Global Pulse research data on Indonesia
  4. Global Open Data Index: Indonesia
  5. OECD Data Indonesia
  6. Food Prices
  7. Commodity Prices
General Datasets:
  1. Google Dataset Search
  2. Microsoft Research Open Datasets
  3. Kaggle Datasets
  4. A collection of datasets categorized by field, e.g. agriculture, biology, finance, etc. (more than 20 fields!).
  5. Amazon Open Data
  6. UC Irvine Machine Learning Repository
  7. Time series data for various cryptocurrencies
  8. CERN Dataset (up to 2 Terabytes!)
  9. National Flight Data Center (NFDC)
  10. FAA Data & Research
  11. Flight Delay Information
  12. FAA Aviation Safety Information Analysis and Sharing (ASIAS)
  13. Aircraft Situation Display to Industry (ASDI)
  14. NTSB Accident Database & Synopses
  15. OpenFlights.org
  16. The Center for Innovation in Engineering and Science Education Real time data sites
  17. MIT Airline Data Project
  18. Space – Real-Time Space Weather Data Sources
  19. Politics – Data on the U.S. Congress – A Joint Effort from Brookings and the American Enterprise Institute
  20. Sports – Open Sports Data/API
  21. Sports – Football (Soccer) Stats
  22. Government – Public Government Data Sets
  23. U.S. Department of Homeland Security Data
  24. Public Data for the State of Utah
  25. Finding Data on the Internet – Inside-R
  26. Nathan Yau’s collection of data sets
  27. Dr. Jerry A. Smith’s Favorite Data sets
  28. Hilary Mason’s “Research Quality” Data-sets
  29. Peter Skomoroch’s list of data sets on Delicious
  30. Data Wrangling blog data set list
  31. DonorsChoose.org – Hacking Education: A Contest for Developers and Data Crunchers
  32. Datasets for “The Elements of Statistical Learning”
  33. Enron Email Dataset
  34. Yandex
  35. The Data Page
  36. Public Data Sets on Amazon
  37. Miami School of Business Statistical Data Sets
  38. Public data put to good use
  39. ASU GeoDA Center Data
  40. European Cities 1M Data Sets
  41. University of Edinburgh School of Informatics Data Sets for Data Mining
  42. Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
  43. Quandl – Intelligent search for numerical data
  44. Gephi Graph Visualization Sample Data Sets
  45. CitiBike, by NYC Bike Share – Station data
  46. Large Datasets 
  47. Air Quality Notifications
  48. The GDELT Project – Global Database of Events, Language, and Tone
  49. http://www.kdnuggets.com/datasets/index.html
  50. http://goo.gl/9eNqFq  [more from KDnuggets]
  51. http://archive.ics.uci.edu/ml/
  52. http://www.stat.ucla.edu/data/
  53. http://lib.stat.cmu.edu/
  54. http://www.umass.edu/statdata/statdata/
  55. http://datamarket.com/data/list/?q=provider:tsdl
  56. http://lib.stat.cmu.edu/DASL/
  57. http://www.statsci.org/data/index.html
  58. http://trec.nist.gov/data.html
  59. http://graphlab.org/resources/datasets.html
  60. http://www.scaleunlimited.com/datasets/public-datasets/
  61. http://www.datawrangling.com/some-datasets-available-on-the-web
  62. http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html
  63. http://pami.uwaterloo.ca/~hammouda/webdata/
  64. http://www.daviddlewis.com/resources/testcollections/reuters21578/
  65. http://dumps.wikimedia.org/
  66. http://www.cs.cmu.edu/~WebKB/
  67. http://www.uco.es/~in1rosaj/utiles/datasets.html
  68. http://www.ke.tu-darmstadt.de/resources/eurlex/eurlex.html
  69. KEEL
Hope this is useful,
Cheers,
</TES>®
