Commit 4f44253

wrightaprilm authored and maxim-belkin committed
08-putting-it-all-together.md: rewrite to use random data and draw histograms
Pull Request: #320 + minor fixes in 07-visualization-ggplot-python.md
1 parent e38161e commit 4f44253

5 files changed

Lines changed: 54 additions & 26 deletions

_episodes/07-visualization-ggplot-python.md

Lines changed: 2 additions & 2 deletions
@@ -293,7 +293,7 @@ group, a boxplot can be used:
293293
294294
![png](../fig/06_boxplot.png)
295295
296-
By adding points of he individual observations to the boxplot, we can have a
296+
By adding points of the individual observations to the boxplot, we can have a
297297
better idea of the number of measurements and of their distribution:
298298
299299
~~~
@@ -452,7 +452,7 @@ arranged via formula notation (`rows ~ columns`; a `.` can be used as a
452452
placeholder that indicates only one row or column).
453453
454454
~~~
455-
# only selecte the years of interest
455+
# only select the years of interest
456456
survey_2000 = surveys_complete[surveys_complete["year"].isin([2000, 2001])]
457457

458458
(p9.ggplot(data=survey_2000,

_episodes/08-putting-it-all-together.md

Lines changed: 52 additions & 24 deletions
@@ -146,25 +146,26 @@ styles and the source codes that create them.
146146
147147
### `plt` pyplot versus object-based matplotlib
148148
149-
Matplotlib integrates nicely with the Numpy package and can use Numpy arrays
150-
as input of the available plot functions. Consider the following example data,
151-
created with Numpy:
149+
Matplotlib integrates nicely with the NumPy package and can use NumPy arrays
150+
as input to the available plot functions. Consider the following example data,
151+
created with NumPy by drawing 1000 samples from a normal distribution with a mean value of 0 and
152+
a standard deviation of 0.1:
152153
153154
~~~
154155
import numpy
155-
x = numpy.linspace(0, 5, 10)
156-
y = x ** 2
156+
sample_data = numpy.random.normal(0, 0.1, 1000)
157+
157158
~~~
158159
{: .language-python}
159160
160-
To make a scatter plot of `x` and `y`, we can use the `plot` command directly:
161+
To plot a histogram of our draws from the normal distribution, we can use the `hist` function directly:
161162
162163
~~~
163-
plt.plot(x, y, '-')
164+
plt.hist(sample_data)
164165
~~~
165166
{: .language-python}
166167
167-
![Line plot of y versus x](../fig/08_line_plot.png)
168+
![Histogram of 1000 samples from normal distribution](../fig/08-normal-distribution.png)
168169
169170
> ## Tip: Cross-Platform Visualization of Figures
170171
> Jupyter Notebooks make many aspects of data analysis and visualization much simpler. This includes
@@ -175,36 +176,47 @@ plt.plot(x, y, '-')
175176
> colleagues who aren't using a Jupyter notebook to reproduce your work on their platform.
176177
{: .callout}
177178
178-
or create matplotlib `figure` and `axis` objects first and add the plot later on:
179+
or create matplotlib `figure` and `axis` objects first and subsequently add a histogram with 30
180+
data bins:
179181
180182
~~~
181183
fig, ax = plt.subplots() # initiate an empty figure and axis matplotlib object
182-
ax.plot(x, y, '-')
184+
ax.hist(sample_data, 30)
183185
~~~
184186
{: .language-python}
185187
186-
![Simple line plot](../fig/08_line_plot.png)
187-
188188
Although the latter approach requires a little bit more code to create the same plot,
189189
the advantage is that it gives us **full control** over the plot and we can add new items
190-
such as labels, grid lines, title, etc.. For example, we can add additional axes to
191-
the figure and customize their labels:
190+
such as labels, grid lines, title, and other visual elements. For example, we can add
191+
additional axes to the figure and customize their labels:
192192
193193
~~~
194194
fig, ax1 = plt.subplots() # prepare a matplotlib figure
195-
ax1.plot(x, y, '-')
195+
ax1.hist(sample_data, 30)
196196

197+
# Add a plot of a Beta distribution
198+
a = 5
199+
b = 10
200+
beta_draws = numpy.random.beta(a, b, 1000)  # draw 1000 samples
197201
# adapt the labels
198-
ax1.set_ylabel('y')
199-
ax1.set_xlabel('x')
202+
ax1.set_ylabel('count')
203+
ax1.set_xlabel('value')
200204

201205
# add additional axes to the figure
202-
ax2 = fig.add_axes([0.2, 0.5, 0.4, 0.3])
203-
ax2.plot(x, y*2, 'r-')
206+
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
207+
# add_axes takes [left, bottom, width, height] as fractions of the figure
208+
ax2.hist(beta_draws)
204209
~~~
205210
{: .language-python}
206211
207-
![Plot with additional axes](../fig/08_line_plot_inset.png)
212+
![Plot with additional axes](../fig/08-dualdistribution.png)
213+
214+
> ## Challenge - Drawing from distributions
215+
> Have a look at the NumPy
216+
> random documentation <https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html>.
217+
> Choose a distribution you have no familiarity with, and try to sample from and visualize it.
218+
{: .challenge}
219+
208220
209221
### Link matplotlib, Pandas and plotnine
210222
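Reviewer's sketch: the dual-distribution example from this hunk, written out as one self-contained script. It is a sketch under a few assumptions that are not in the commit: the `Agg` backend (so it runs without a display), a fixed seed (only for repeatability), an explicit sample size of 1000 for the Beta draws, and a `count` label, since `hist` plots counts unless `density=True` is passed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy

numpy.random.seed(0)  # assumption: fixed seed, only so reruns match

# 1000 draws from a normal distribution (mean 0, standard deviation 0.1)
sample_data = numpy.random.normal(0, 0.1, 1000)
# 1000 draws from a Beta(5, 10) distribution
beta_draws = numpy.random.beta(5, 10, 1000)

fig, ax1 = plt.subplots()
counts, bin_edges, _ = ax1.hist(sample_data, 30)  # 30 bins
ax1.set_xlabel("value")
ax1.set_ylabel("count")

# inset axes: [left, bottom, width, height] as fractions of the figure
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
inset_counts, _, _ = ax2.hist(beta_draws)  # default of 10 bins
```

The rectangle passed to `add_axes` gives the inset's lower-left corner plus its width and height, so `[0.125, 0.575, 0.3, 0.3]` places a box 30% of the figure wide and tall in the upper-left area.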
@@ -253,9 +265,9 @@ plt.show() # not necessary in Jupyter Notebooks
253265
254266
> ## Challenge - Pandas and matplotlib
255267
> Load the streamgage data set with Pandas, subset the week of the 2013 Front Range flood
256-
> (September 9 through 15) and create a hydrograph (line plot) of the discharge data using
257-
> Pandas, linking it to an empty maptlotlib `ax` object. Adapt the title, x-axis and y-axis label
258-
> using matplotlib.
268+
> (September 11 through 15) and create a hydrograph (line plot) of the discharge data using
269+
> Pandas, linking it to an empty matplotlib `ax` object. Create a second axis that displays the
270+
> whole dataset. Adapt the title and axes' labels using matplotlib.
259271
>
260272
> > ## Answers
261273
> >
@@ -273,6 +285,23 @@ plt.show() # not necessary in Jupyter Notebooks
273285
> > ax.set_xlabel("") # no label
274286
> > ax.set_ylabel("Discharge, cubic feet per second")
275287
> > ax.set_title(" Front Range flood event 2013")
288+
> > discharge = pd.read_csv("../data/bouldercreek_09_2013.txt",
289+
> >                         skiprows=27, delimiter="\t",
290+
> >                         names=["agency", "site_id", "datetime",
291+
> >                                "timezone", "flow_rate", "height"])
292+
> > fig, ax = plt.subplots()
293+
> > flood = discharge[(discharge["datetime"] >= "2013-09-11") &
294+
> >                   (discharge["datetime"] < "2013-09-15")]
295+
> >
296+
> > ax2 = fig.add_axes([0.65, 0.575, 0.25, 0.3])
297+
> > flood.plot(x="datetime", y="flow_rate", ax=ax)
298+
> > discharge.plot(x="datetime", y="flow_rate", ax=ax2)
299+
> > ax2.legend().set_visible(False)
300+
> >
301+
> > ax.set_xlabel("") # no label
302+
> > ax.set_ylabel("Discharge, cubic feet per second")
303+
> > ax.legend().set_visible(False)
304+
> > ax.set_title(" Front Range flood event 2013")
276305
> > ~~~
277306
> > {: .language-python}
278307
> >
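Reviewer's sketch: how the date bounds in the answer behave. The challenge text asks for September 11 through 15, while the answer filters with `< "2013-09-15"`, which stops before the 15th. The example below uses a small synthetic table with hypothetical flow values and parsed datetimes (the answer reads the column as plain text, where the comparison gives the same cut-off), so it is an illustration rather than the lesson's data.

```python
import pandas as pd

# Synthetic stand-in for the stream gauge table (hypothetical values):
# one reading per day at midnight, 9 to 17 September 2013.
discharge = pd.DataFrame({
    "datetime": pd.date_range("2013-09-09", "2013-09-17", freq="D"),
    "flow_rate": [50, 55, 60, 900, 4000, 2500, 1200, 700, 400],
})

# Upper bound as written in the answer: stops before 15 September.
flood_exclusive = discharge[(discharge["datetime"] >= "2013-09-11") &
                            (discharge["datetime"] < "2013-09-15")]

# Moving the bound to the 16th keeps the whole of 15 September.
flood_inclusive = discharge[(discharge["datetime"] >= "2013-09-11") &
                            (discharge["datetime"] < "2013-09-16")]

print(len(flood_exclusive), len(flood_inclusive))
```

If the intent is to include September 15, the second form of the mask is one way to do it; either subset can then be passed to `.plot(x="datetime", y="flow_rate", ax=ax)` as in the answer.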
@@ -311,7 +340,6 @@ Which will save the `fig` created using Pandas/matplotlib as a png file with the
311340
> {: .solution}
312341
{: .challenge}
313342
314-
315343
## Make other types of plots:
316344
317345
Matplotlib can make many other types of plots in much the same way that it makes two-dimensional line plots. Look through the examples in

fig/08-dualdistribution.png (7.56 KB)

fig/08-normal-distribution.png (4.32 KB)

fig/08_flood_event.png (93.3 KB)
