Prasyarat DFDS-01
- ADSP-01 : https://tau-data.id/adsp/ (Memahami dasar Python dengan baik)
- Kalkulus Dasar
tau-data Indonesia
Pendahuluan SQL dasar untuk Data Science - 01
https://tau-data.id/dfds-01/
Instalasi:¶
- XAMPP: https://www.apachefriends.org/download.html
- IBM Watson: https://www.ibm.com/cloud/sql-query
PHPMyAdmin: Obat Sakit Kepala¶
!pip install --upgrade mysql-connector-python
Requirement already up-to-date: mysql-connector-python in c:\winpython\python-3.6.8.amd64\lib\site-packages (8.0.21) Requirement already satisfied, skipping upgrade: protobuf>=3.0.0 in c:\winpython\python-3.6.8.amd64\lib\site-packages (from mysql-connector-python) (3.12.2) Requirement already satisfied, skipping upgrade: six>=1.9 in c:\winpython\python-3.6.8.amd64\lib\site-packages (from protobuf>=3.0.0->mysql-connector-python) (1.12.0) Requirement already satisfied, skipping upgrade: setuptools in c:\winpython\python-3.6.8.amd64\lib\site-packages (from protobuf>=3.0.0->mysql-connector-python) (49.2.0)
import mysql.connector as mysql
import pandas as pd
from time import time
Koneksi ke MySQL lokal¶
par = {'db_': 'fga', 'usr':'root', 'pas':'', 'hst':'localhost'}
par['db_'], par['hst']
('fga', 'localhost')
def conMql(dbPar, maxTry=3):
try_ = 0
while try_<maxTry:
try:
return mysql.connect(host=dbPar['hst'],user=dbPar['usr'],passwd=dbPar['pas'],db=dbPar['db_'])
except (mysql.Error) as e:
print ("Error Connecting to MySQL %d: %s, please wait retrying" % (e.args[0],e.args[1]))
try_ += 1; time.sleep(1)
db = conMql(par)
db
<mysql.connector.connection.MySQLConnection at 0x2984b284f98>
Catatan:¶
- Buka PHPMyAdmin ==> Generate Query untuk membantu pemula di Query Dasar
- "Bisa karena terbiasa" ==> jangan hawatir untuk "mengingat"
qry = "CREATE TABLE `mahasiswa` ( `nim` VARCHAR(12) NOT NULL COMMENT 'Nomer Induk Mahasiswa' , \
`nama_lengkap` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT \
'Nama Lengkap' , `jenis_kelamin` VARCHAR(1) CHARACTER SET ascii COLLATE ascii_general_ci \
NOT NULL COMMENT 'Jenis Kelamin P atau W' , `angkatan` YEAR NOT NULL COMMENT \
'Tahun Masuk UIN Jakarta' , PRIMARY KEY (`nim`)) \
ENGINE = MyISAM CHARSET=utf8 COLLATE utf8_general_ci COMMENT = 'Tabel Mahasiswa UIN Jakarta';"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
cur.close()
db.close()
res
qry = "SHOW TABLES"
db = conMql(par)
cur = db.cursor()
cur.execute(qry)
data = cur.fetchall() # lakukan Pagination untuk data yang sangat besar
cur.close(); db.close()
data
[('mahasiswa',)]
qry = "INSERT INTO `mahasiswa` (`nim`, `nama_lengkap`, `angkatan`, `jenis_kelamin`) \
VALUES ('1234', 'bambang', '2016', 'P'), ('1235', 'Wati', '2017', 'W'), \
('1239', 'Iwan', '2017', 'P');"
db = conMql(par)
cur = db.cursor()
result = cur.execute(qry)
cur.close()
db.close()
qry = "SELECT * FROM mahasiswa"
db = conMql(par)
cur = db.cursor()
cur.execute(qry)
data = cur.fetchall() # lakukan Pagination untuk data yang sangat besar
cur.close(); db.close()
data
[('1234', 'bambang', 'P', 2016), ('1235', 'Wati', 'W', 2017), ('1239', 'Iwan', 'P', 2017)]
More Practical Way untuk Data Scientist/Analyst: Use Pandas¶
qry = "SELECT * FROM mahasiswa"
db = conMql(par)
data = pd.read_sql(qry, db)
data.head()
nim | nama_lengkap | jenis_kelamin | angkatan | |
---|---|---|---|---|
0 | 1234 | bambang | P | 2016 |
1 | 1235 | Wati | W | 2017 |
2 | 1239 | Iwan | P | 2017 |
Contoh Data¶
# Create Table 2
qry = "CREATE TABLE committees (committee_id INT AUTO_INCREMENT, name VARCHAR(100), PRIMARY KEY (committee_id));"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
cur.close(); db.close()
'Done'
'Done'
# Create Table 1
qry = "CREATE TABLE members (member_id INT AUTO_INCREMENT, name VARCHAR(100), PRIMARY KEY (member_id))"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
cur.close(); db.close()
'Done'
'Done'
# Insert Data 2
qry = "INSERT INTO committees (name) VALUES ('John'), ('Mary'), ('Amelia'), ('Joe')"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
db.commit() # Hati-hati ... Butuh tambahan perintah ini!!!
cur.close(); db.close()
'Done'
'Done'
# Insert Data 1
qry = "INSERT INTO members (name) VALUES('John'),('Jane'),('Mary'),('David'),('Amelia');"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
db.commit() # Hati-hati ... Butuh tambahan perintah ini!!!
cur.close(); db.close()
'Done'
'Done'
# Query to DataFrame
db = conMql(par)
query = "SELECT * FROM committees" # or members
data = pd.read_sql(query, db)
data.head()
committee_id | name | |
---|---|---|
0 | 1 | John |
1 | 2 | Mary |
2 | 3 | Amelia |
3 | 4 | Joe |
# order by
db = conMql(par)
query = "SELECT * FROM committees ORDER BY name"
data = pd.read_sql(query, db)
data.head()
committee_id | name | |
---|---|---|
0 | 3 | Amelia |
1 | 4 | Joe |
2 | 1 | John |
3 | 2 | Mary |
# add new names
qry = "INSERT INTO committees (name) VALUES('John'),('Mary'),('Amelia'),('Bambang');"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
db.commit() # Hati-hati ... Butuh tambahan perintah ini!!!
cur.close(); db.close()
# order by
db = conMql(par)
query = "SELECT * FROM committees ORDER BY name"
data = pd.read_sql(query, db)
data.head(10)
committee_id | name | |
---|---|---|
0 | 3 | Amelia |
1 | 7 | Amelia |
2 | 8 | Bambang |
3 | 4 | Joe |
4 | 1 | John |
5 | 5 | John |
6 | 2 | Mary |
7 | 6 | Mary |
# group by
db = conMql(par)
query = "SELECT * FROM committees GROUP BY name" # or members
data = pd.read_sql(query, db)
data.head(10)
committee_id | name | |
---|---|---|
0 | 3 | Amelia |
1 | 8 | Bambang |
2 | 4 | Joe |
3 | 1 | John |
4 | 2 | Mary |
# Select Distinct
db = conMql(par)
query = "SELECT DISTINCT(name) FROM committees" # or members
data = pd.read_sql(query, db)
data.head(10)
name | |
---|---|
0 | John |
1 | Mary |
2 | Amelia |
3 | Joe |
4 | Bambang |
# WHERE clause
db = conMql(par)
query = "SELECT * FROM committees WHERE committee_id>4" # or members
data = pd.read_sql(query, db)
data.head(10)
committee_id | name | |
---|---|---|
0 | 5 | John |
1 | 6 | Mary |
2 | 7 | Amelia |
3 | 8 | Bambang |
Search with index - Magick command¶
%lsmagic
Available line magics: %alias %alias_magic %autoawait %autocall %automagic %autosave %bookmark %cd %clear %cls %colors %conda %config %connect_info %copy %ddir %debug %dhist %dirs %doctest_mode %echo %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %macro %magic %matplotlib %mkdir %more %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pip %popd %pprint %precision %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %ren %rep %rerun %reset %reset_selective %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode Available cell magics: %%! %%HTML %%SVG %%bash %%capture %%cmd %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile Automagic is ON, % prefix IS NOT needed for line magics.
%%timeit
db = conMql(par)
query = "SELECT * FROM committees WHERE name='John'" # or members
data = pd.read_sql(query, db)
4.17 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Hati-hati var "data" tidak disimpan ke memori akibat penggunaan Magic Command "timeit"
Sekarang kita coba tambahkan index di field "name"¶
qry = "ALTER TABLE committees ADD FULLTEXT name (name);"
db = conMql(par)
cur = db.cursor()
res = cur.execute(qry)
cur.close();db.close()
'Done'
'Done'
Sekarang kita coba Query lagi dan analisa performanya¶
- Catt: pada data yang kecil mungkin tidak signifikan, tapi pada data yang besar sangat besar pengaruhnya.
%%timeit
db = conMql(par)
query = "SELECT * FROM committees WHERE name LIKE '%ohn%'" # or members
data = pd.read_sql(query, db)
4.13 ms ± 69.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
data.head()
committee_id | name | |
---|---|---|
0 | 5 | John |
1 | 6 | Mary |
2 | 7 | Amelia |
3 | 8 | Bambang |
SQL Joins¶
Perintah JOIN dalam SQL digunakan untuk menampilkan data pada table yang saling berhubungan atau berelasi. Artinya kita dapat menampilkan data dalam beberapa table hanya dengan satu kali perintah.
# Query to DataFrame Kita lihat dulu datanya
db = conMql(par)
dt1 = pd.read_sql("SELECT * FROM committees", db)
dt2 = pd.read_sql("SELECT * FROM members", db)
pd.concat([dt1.reset_index(drop=1),dt2.reset_index(drop=1)], axis=1)
committee_id | name | member_id | name | |
---|---|---|---|---|
0 | 1 | John | 1.0 | John |
1 | 2 | Mary | 2.0 | Jane |
2 | 3 | Amelia | 3.0 | Mary |
3 | 4 | Joe | 4.0 | David |
4 | 5 | John | 5.0 | Amelia |
5 | 6 | Mary | NaN | NaN |
6 | 7 | Amelia | NaN | NaN |
7 | 8 | Bambang | NaN | NaN |
db = conMql(par)
query = "SELECT m.member_id, m.name member, c.committee_id, c.name committee \
FROM members m \
INNER JOIN committees c \
ON c.name = m.name"
data = pd.read_sql(query, db)
data.head(10)
member_id | member | committee_id | committee | |
---|---|---|---|---|
0 | 1 | John | 1 | John |
1 | 3 | Mary | 2 | Mary |
2 | 5 | Amelia | 3 | Amelia |
3 | 1 | John | 5 | John |
4 | 3 | Mary | 6 | Mary |
5 | 5 | Amelia | 7 | Amelia |
# Query to DataFrame Kita lihat dulu datanya
db = conMql(par)
dt1 = pd.read_sql("SELECT * FROM committees", db)
dt2 = pd.read_sql("SELECT * FROM members", db)
pd.concat([dt1.reset_index(drop=1),dt2.reset_index(drop=1)], axis=1)
committee_id | name | member_id | name | |
---|---|---|---|---|
0 | 1 | John | 1.0 | John |
1 | 2 | Mary | 2.0 | Jane |
2 | 3 | Amelia | 3.0 | Mary |
3 | 4 | Joe | 4.0 | David |
4 | 5 | John | 5.0 | Amelia |
5 | 6 | Mary | NaN | NaN |
6 | 7 | Amelia | NaN | NaN |
7 | 8 | Bambang | NaN | NaN |
query = "SELECT m.member_id, m.name member, c.committee_id, c.name committee\
FROM members m\
LEFT JOIN committees c USING(name)"
db = conMql(par)
data = pd.read_sql(query, db)
data.head(10)
member_id | member | committee_id | committee | |
---|---|---|---|---|
0 | 1 | John | 1.0 | John |
1 | 3 | Mary | 2.0 | Mary |
2 | 5 | Amelia | 3.0 | Amelia |
3 | 1 | John | 5.0 | John |
4 | 3 | Mary | 6.0 | Mary |
5 | 5 | Amelia | 7.0 | Amelia |
6 | 2 | Jane | NaN | None |
7 | 4 | David | NaN | None |
Bisa juga¶
- Catt hati-hati performa query
- ini left join
query = "SELECT member_id, name\
FROM members \
WHERE name IN\
(SELECT name FROM committees)"
db = conMql(par)
data = pd.read_sql(query, db)
data.head(10)
member_id | name | |
---|---|---|
0 | 1 | John |
1 | 3 | Mary |
2 | 5 | Amelia |
# Query to DataFrame Kita lihat dulu datanya
db = conMql(par)
dt1 = pd.read_sql("SELECT * FROM committees", db)
dt2 = pd.read_sql("SELECT * FROM members", db)
pd.concat([dt1.reset_index(drop=1),dt2.reset_index(drop=1)], axis=1)
committee_id | name | member_id | name | |
---|---|---|---|---|
0 | 1 | John | 1.0 | John |
1 | 2 | Mary | 2.0 | Jane |
2 | 3 | Amelia | 3.0 | Mary |
3 | 4 | Joe | 4.0 | David |
4 | 5 | John | 5.0 | Amelia |
5 | 6 | Mary | NaN | NaN |
6 | 7 | Amelia | NaN | NaN |
7 | 8 | Bambang | NaN | NaN |
query = "SELECT m.member_id, m.name member, c.committee_id, c.name committee\
FROM members m\
CROSS JOIN committees c"
db = conMql(par)
data = pd.read_sql(query, db)
data.head(30)
member_id | member | committee_id | committee | |
---|---|---|---|---|
0 | 1 | John | 1 | John |
1 | 2 | Jane | 1 | John |
2 | 3 | Mary | 1 | John |
3 | 4 | David | 1 | John |
4 | 5 | Amelia | 1 | John |
5 | 1 | John | 2 | Mary |
6 | 2 | Jane | 2 | Mary |
7 | 3 | Mary | 2 | Mary |
8 | 4 | David | 2 | Mary |
9 | 5 | Amelia | 2 | Mary |
10 | 1 | John | 3 | Amelia |
11 | 2 | Jane | 3 | Amelia |
12 | 3 | Mary | 3 | Amelia |
13 | 4 | David | 3 | Amelia |
14 | 5 | Amelia | 3 | Amelia |
15 | 1 | John | 4 | Joe |
16 | 2 | Jane | 4 | Joe |
17 | 3 | Mary | 4 | Joe |
18 | 4 | David | 4 | Joe |
19 | 5 | Amelia | 4 | Joe |
20 | 1 | John | 5 | John |
21 | 2 | Jane | 5 | John |
22 | 3 | Mary | 5 | John |
23 | 4 | David | 5 | John |
24 | 5 | Amelia | 5 | John |
25 | 1 | John | 6 | Mary |
26 | 2 | Jane | 6 | Mary |
27 | 3 | Mary | 6 | Mary |
28 | 4 | David | 6 | Mary |
29 | 5 | Amelia | 6 | Mary |
End of Module¶
Code Lesson DFDS-01
Code dari lesson ini dapat di akses di Link berikut (wajib login ke Google/Gmail): Code DFDS-01
Di link tersebut anda langsung bisa merubah code dan menjalankannya. Keterangan lebih lanjut di Video DFDS-01.
Sangat disarankan untuk membuka code dan video "side-by-side" untuk mendapatkan pengalaman belajar yang baik (Gambar dibawah). SIlahkan modifikasi (coba-coba) hal lain, selain yang ditunjukkan di video untuk mendapatkan pengalaman belajar yang lebih mendalam. Tentu saja juga silahkan akses berbagai referensi lain untuk memperkaya pengetahuan lalu diskusikan di forum yang telah disediakan.
Video Lesson DFDS-01 (Coming Soon)
Referensi DFDS:
- SQL Notes for Professionals, 2019, GoalKickers
- Silberschatz, A., Korth, H. F., & Sudarshan, S. (1997). Database system concepts(Vol. 4). New York: McGraw-Hill.
- DeBarros, A. (2018). Practical SQL: A Beginner’s Guide to Storytelling with Data. No Starch Press.
No comments:
Post a Comment
Relevant & Respectful Comments Only.