Open/public datasets can be used for several purposes, including:
- Research that proposes a new model/algorithm. We need an open/public dataset as a benchmark to compare the proposed model against the existing state-of-the-art models.
- When working on a project or prototyping an application/system.
- When teaching or assigning coursework on data-related subjects (e.g. Statistics or Data Mining).
- For practice while learning the various methods/algorithms in data science, machine learning/AI, and so on.
Below is a compilation of dataset repositories for the purposes above or any other purpose. If you know of other data sources that might be useful to others, please leave a comment at the bottom of this page.
Indonesian Datasets:
- Indonesian datasets [data.go.id]
- Badan Pusat Statistik (BPS): summary data only.
- UN Global Pulse research data on Indonesia
- Global Open Data Index: Indonesia
- OECD Data: Indonesia
- Food prices
- Commodity prices
General Datasets:
- Google Dataset Search
- Microsoft Research Open Datasets
- Kaggle Datasets
- A collection of datasets categorized by field: e.g. agriculture, biology, finance, etc. (more than 20 fields!).
- Amazon Open Data
- UC Irvine Machine Learning Repository
- Time series data for various cryptocurrencies
- CERN Dataset (up to 2 Terabytes!)
- National Flight Data Center (NFDC)
- FAA Data & Research
- Flight Delay Information
- FAA Aviation Safety Information Analysis and Sharing (ASIAS)
- Aircraft Situation Display to Industry (ASDI)
- NTSB Accident Database & Synopses
- OpenFlights.org
- The Center for Innovation in Engineering and Science Education Real time data sites
- MIT Airline Data Project
- Space – Real-Time Space Weather Data Sources
- Politics – Data on the U.S. Congress – A Joint Effort from Brookings and the American Enterprise Institute
- Sports – Open Sports Data/API
- Sports – Football (Soccer) Stats
- Government – Public Government Data Sets
- U.S. Department of Homeland Security Data
- Public Data for the State of Utah
- Finding Data on the Internet – Inside-R
- Nathan Yau’s collection of data sets
- Dr. Jerry A. Smith’s Favorite Data sets
- Hilary Mason’s “Research Quality” Data-sets
- Peter Skomoroch’s list of data sets on Delicious
- Data Wrangling blog data set list
- DonorsChoose.org – Hacking Education: A Contest for Developers and Data Crunchers
- Datasets for “The Elements of Statistical Learning”
- Enron Email Dataset
- Yandex
- The Data Page
- Public Data Sets on Amazon
- Miami School of Business Statistical Data Sets
- Public data put to good use
- ASU GeoDA Center Data
- European Cities 1M Data Sets
- University of Edinburgh School of Informatics Data Sets for Data Mining
- Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
- Quandl – Intelligent search for numerical data
- Gephi Graph Visualization Sample Data Sets
- CitiBike, by NYC Bike Share – Station data
- Large Datasets
- Air Quality Notifications
- The GDELT Project – Global Database of Events, Language, and Tone
- http://www.kdnuggets.com/datasets/index.html
- http://goo.gl/9eNqFq [more from KDnuggets]
- http://archive.ics.uci.edu/ml/
- http://www.stat.ucla.edu/data/
- http://lib.stat.cmu.edu/
- http://www.umass.edu/statdata/statdata/
- http://datamarket.com/data/list/?q=provider:tsdl
- http://lib.stat.cmu.edu/DASL/
- http://www.statsci.org/data/index.html
- http://trec.nist.gov/data.html
- http://graphlab.org/resources/datasets.html
- http://www.scaleunlimited.com/datasets/public-datasets/
- http://www.datawrangling.com/some-datasets-available-on-the-web
- http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html
- http://pami.uwaterloo.ca/~hammouda/webdata/
- http://www.daviddlewis.com/resources/testcollections/reuters21578/
- http://dumps.wikimedia.org/
- http://www.cs.cmu.edu/~WebKB/
- http://www.uco.es/~in1rosaj/utiles/datasets.html
- http://www.ke.tu-darmstadt.de/resources/eurlex/eurlex.html
- KEEL
Hope this is useful,
Cheers,
</TES>®