Code Lesson EDA-02-C
The code for this lesson can be accessed at the following link (Google/Gmail login required): Code EDA-02-C
At that link you can edit and run the code directly. Further explanation is given in the accompanying video.
Opening the code and the video side by side is strongly recommended for a good learning experience (see the figure below). Feel free to modify and experiment with things beyond what the video shows to deepen your understanding. You are also encouraged to consult other references to broaden your knowledge, then discuss them in the forum provided.
Video EDA-02-C
tau-data Indonesia
Exploratory Data Analysis-02-C Visualizations: How should we design
https://tau-data.id/eda-02c/ ~ taufik@tau-data.id
Outline¶
- Determining the Goal & Data for Visualizations/Infographics.
- Visual Encodings
- Position
- Application: Interactive Visualization
Notes:¶
- Code & module are available at https://tau-data.id/eda-02c/
- Students may record the lecture for personal use only; do not upload or reshare it.
Visualization Goals¶
- The goal is set at the outset and is shaped by our own motivation and by what the client/user/reader needs from the visualization.
- A good visualization goal is specific and focused: which dimensions/variables to include, which relationships to show, how the indicators relate to one another, why, and so on.
- Basic examples of visualization goals: monitoring a system, tracking (KPIs/statistics), telling stories, showing outliers/trends, supporting an argument, or simply giving an overview of the data (e.g. Kibana).
- Abela Chart
Guiding Questions¶
- What values or data dimensions are relevant in this context?
- Which of these dimensions matter; matter most; and matter least?
- What are the key relationships that need to be communicated?
- What properties or values would make some individual data points more interesting than the rest?
- What actions might be taken once this information need is satisfied, and what values will justify that action?
It follows that in a report (at work or in a company), or even in a scientific paper, we must be able to explain the purpose of every visualization (figure) shown.¶
Note: this relates to the "dangling pictures" found in some student theses and papers.
Avoid Putting Too Much Information in a Single Visualization¶
Data Types & Visualizations¶
image source: https://www.qlik.com/blog/mapping-data-to-visualizations-data-attributes
Visual Encoding: Color¶
- Cleveland-McGill: https://www.jstor.org/stable/2288400
- Natural ordering: position has a natural ordering; shape doesn’t. Length has a natural ordering; texture doesn’t (but pattern density does). Line thickness or weight has a natural ordering; line style (solid, dotted, dashed) doesn’t. Depending on the specifics of the visual property, its natural ordering may be well suited to representing quantitative differences (27, 33, 41), or ordinal differences (small, medium, large, enormous).
- Color (hue) has no natural ordering; use brightness/lightness to encode order (e.g. in a heatmap).
- A tool that can help with choosing colors: https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
- In Python/Seaborn: https://seaborn.pydata.org/tutorial/color_palettes.html
A very good reference for more comprehensive discussion: https://homepage.divms.uiowa.edu/~luke/classes/STAT4580/percep.html¶
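The point about lightness carrying order can be illustrated with a small sketch. It uses only the standard-library `colorsys` module as a stand-in for the ColorBrewer and Seaborn palettes linked above; the function name `sequential_ramp` and its default lightness and saturation values are assumptions made for illustration, not part of the lesson.

```python
import colorsys


def sequential_ramp(hue, n, light_min=0.25, light_max=0.9, saturation=0.6):
    """Return n hex colors sharing one hue, ordered from light to dark."""
    colors = []
    for i in range(n):
        # Interpolate lightness from light_max down to light_min
        frac = i / (n - 1) if n > 1 else 0.0
        lightness = light_max - frac * (light_max - light_min)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Five steps of a single green-ish hue; only lightness changes between steps
ramp = sequential_ramp(hue=0.45, n=5)
print(ramp)
```

Because the hue is fixed and only lightness varies, a reader can rank the colors without a legend, which is what a sequential palette for a heatmap relies on.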
Recommended Order of Color Use¶
Color Associations and Perception¶
- Red is associated with warning, danger, and warfare. It can also be associated with passion—either love or anger—and blood. In the East it is associated with good luck and prosperity.
- Green is associated with nature, the earth, environmentalism, and renewal. It can also be associated with permission to move ahead, clearance, etc. (as in “green light”)—especially when paired with red.
- Yellow is associated with happiness, sunshine, and playfulness. However, on its own or in large fields, it can be irritating. It is also associated with caution.
- Blue is associated with water, coolness, and calm. Depending on the shade, it may be associated with religion or the military.
- Black is associated with mourning and death, but also with luxury and sophistication.
- White is associated with purity, innocence, and weddings, but also with sympathy and the afterlife (and therefore, with death).
- Pink is associated with affection, imagination, and childishness. Light pink is associated with young girls, and light blue with young boys—especially when paired together.
- Grey is associated with neutrality, conservatism, modesty, and maturity.
- Orange is associated with fire, energy, and—in the East—spirituality. It is named for the fruit, and so can also be associated with health and vigor.
- Brown is associated with dirt, leather, stone, and “earthiness.” It may also be associated with animal waste.
- Purple is associated with royalty (nobility) and magic (falsehood or artificiality).
Color Base/Space¶
in Python: https://matplotlib.org/stable/tutorials/colors/colors.html¶
image source: https://id.pinterest.com/pin/184647653459593611/
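As a minimal sketch of moving between color specifications, the snippet below converts one color between hex, RGB, and HSV using `matplotlib.colors`. The choice of `#00CED1` simply reuses the line color from the slider example in this lesson; the variable names are illustrative.

```python
import matplotlib.colors as mcolors

hex_color = "#00CED1"                     # same color as the CSS name "darkturquoise"
rgb = mcolors.to_rgb(hex_color)           # tuple of floats in [0, 1]
hsv = mcolors.rgb_to_hsv(rgb)             # array: hue, saturation, value in [0, 1]
round_trip = mcolors.to_hex(rgb)          # back to a lowercase hex string
named = mcolors.to_rgb("darkturquoise")   # named colors resolve to the same RGB

print("RGB:", rgb)
print("HSV:", hsv)
print("hex:", round_trip)
```

Working in HSV (or a similar space) makes it easier to change one perceptual attribute, such as brightness, while leaving the hue alone.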
Visual Encoding: Distinct Values¶
image source: Designing Data Visualizations, Representing Informational Relationships By Noah Iliinsky, Julie Steele · 2011
Visual Encoding: Position¶
image source:
- Designing Data Visualizations, Representing Informational Relationships By Noah Iliinsky, Julie Steele · 2011
- https://towardsdatascience.com/visualizing-word-embedding-with-pca-and-t-sne-961a692509f5
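When the data has more dimensions than a plot has axes, one common way to obtain positions is to project the data to two dimensions, as the word-embedding article above does with PCA and t-SNE. The sketch below is a plain NumPy PCA via SVD on synthetic data; the data, the random seed, and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points in 5 dimensions whose variance
# comes mostly from two hidden directions plus a little noise
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# PCA: center the data, then take the top right-singular vectors
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                  # 2-D positions, one row per point
explained = S**2 / (S**2).sum()         # share of variance per component

print(coords.shape)
print("variance kept by 2 components:", explained[:2].sum())
```

The two columns of `coords` can be passed directly as the x and y of a scatter plot, so that distance on the page approximates distance in the original space.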
Images Graphics Format¶
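As a small sketch of how the output format is chosen in practice, the snippet below saves the same Matplotlib figure as a raster file (PNG) and as a vector file (SVG). The toy data and file names are assumptions; inside a notebook the `matplotlib.use("Agg")` line can be dropped.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "example.png")
svg_path = os.path.join(out_dir, "example.svg")

fig.savefig(png_path, dpi=150)  # raster: fixed pixel grid, dpi sets the resolution
fig.savefig(svg_path)           # vector: shapes stored as drawing instructions
plt.close(fig)

print(os.path.getsize(png_path), os.path.getsize(svg_path))
```

A raster image becomes blurry when enlarged beyond its resolution, while a vector image stays sharp at any size, which usually makes vector formats the better fit for line charts in reports.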
Interactive Visualization in Python¶
1. 3D Plotting¶
# Need additional module
!pip install chart_studio
!pip install plotly
%matplotlib inline
# Load modules
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
# Enable offline plotly rendering inside the notebook (needed before iplot)
init_notebook_mode(connected=True)
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
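# Note: np.meshgrid(s, t) returns the s-grid first, so despite the names,
# tGrid holds the s values (0..2π) and sGrid holds the t values (0..π).
tGrid, sGrid = np.meshgrid(s, t)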
r = 2 + np.sin(7 * sGrid + 5 * tGrid) # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid) # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid) # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid) # z = r*cos(t)
surface = go.Surface(x=x, y=y, z=z)
data = [surface]
data
# Same styling for all three axes: white grid lines on a light grey background
layout = go.Layout(
    title='Parametric Plot',
    scene=dict(
        xaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='jupyter-parametric_plot')
2. Animated Plots¶
# One sine trace per frequency step; all hidden at first
data = [dict(
    visible=False,
    line=dict(color='#00CED1', width=6),
    name='ν = ' + str(step),
    x=np.arange(0, 10, 0.01),
    y=np.sin(step * np.arange(0, 10, 0.01))) for step in np.arange(0, 5, 0.1)]
data[10]['visible'] = True  # show the trace that matches the slider's initial position
# Each slider step shows exactly one trace and hides the rest
steps = []
for i in range(len(data)):
    step = dict(
        method='restyle',
        args=['visible', [False] * len(data)],
    )
    step['args'][1][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)
sliders = [dict(
    active=10,
    currentvalue={"prefix": "Frequency: "},
    pad={"t": 50},
    steps=steps
)]
layout = dict(sliders=sliders)
fig = dict(data=data, layout=layout)
iplot(fig, filename='Sine Wave Slider')
import plotly.express as px
# Iris scatter plot: color encodes species, marker size encodes sepal length
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", size="sepal_length")
fig.show()
3. Bokeh¶
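from bokeh.plotting import figure, show, output_notebook
import plotly.express as px  # used here only for its bundled iris dataset
output_notebook()
data = px.data.iris()
# Species names are not valid colors, so map each species to a color first
palette = {"setosa": "red", "versicolor": "green", "virginica": "blue"}
colors = data["species"].map(palette).tolist()
p = figure(x_axis_label="sepal_width", y_axis_label="sepal_length")
# Position encodes the two sepal measurements, color the species, size the sepal length
p.scatter(data["sepal_width"], data["sepal_length"], fill_color=colors, line_color=colors, size=data["sepal_length"])
show(p)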
End of Module¶