 ---
 title: Data Ingest and Visualization - Matplotlib and Pandas
-teaching: 20
-exercises: 25
+teaching: 40
+exercises: 65
 questions:
 - "What other tools can I use to create plots apart from ggplot?"
 - "Why should I use Python to create plots?"
@@ -32,7 +32,7 @@ through as well as the Python documentation to help you along.
 There are many repositories online from which you can obtain data. We are
 providing you with one data file to use with these exercises, but feel free to
 use any data that is relevant to your research. The file
-[`bouldercreek_09_2013.txt`]({{ page.root }}/data/bouldercreek_09_2013.txt)
+[`bouldercreek_09_2013.txt`]({{ page.root }}/data/bouldercreek_09_2013.txt)
 contains stream discharge data, summarized at
 15 minute intervals (in cubic feet per second) for a streamgage on Boulder
 Creek at North 75th Street (USGS gage 06730200) for 1-30 September 2013. If you'd
@@ -127,25 +127,25 @@ plt.show() # not necessary in Jupyter Notebooks
 > ~~~
 > {: .language-python}

-The returned object is a matplotlib object (check it yourself with `type(my_plot)`),
+The returned object is a matplotlib object (check it yourself with `type(my_plot)`),
 to which we may make further adjustments and refinements using other matplotlib methods.

 > ## Tip
-> Matplotlib itself can be overwhelming, so a useful strategy is to
+> Matplotlib itself can be overwhelming, so a useful strategy is to
 > do as much as you easily can in a convenience layer, _i.e._ start
 > creating the plot in Pandas or plotnine, and then use matplotlib
 > for the rest.
 {: .callout}

-We will cover a few basic commands for creating and formatting plots with matplotlib in this lesson.
+We will cover a few basic commands for creating and formatting plots with matplotlib in this lesson.
 A great resource for help creating and styling your figures is the matplotlib gallery
 (<http://matplotlib.org/gallery.html>), which includes plots in many different
-styles and the source code that creates them.
+styles and the source code that creates them.


 ### `plt` pyplot versus object-based matplotlib

-Matplotlib integrates nicely with the numpy package and can use numpy arrays
+Matplotlib integrates nicely with the numpy package and can use numpy arrays
 as input to the available plot functions. Consider the following example data,
 created with numpy:

@@ -169,8 +169,8 @@ plt.plot(x, y, '-')
 > Jupyter Notebooks make many aspects of data analysis and visualization much simpler. This includes
 > doing some of the labor of visualizing plots for you. But, not every one of your collaborators
 > will be using a Jupyter Notebook. The `.show()` command allows you to visualize plots
-> when working at the command line, with a script, or at the IPython interpreter. In the
-> previous example, adding `plt.show()` after the creation of the plot will enable your
+> when working at the command line, with a script, or at the IPython interpreter. In the
+> previous example, adding `plt.show()` after the creation of the plot will enable your
 > colleagues who aren't using a Jupyter notebook to reproduce your work on their platform.
 {: .callout}

@@ -186,7 +186,7 @@ ax.plot(x, y, '-')
186186
 Although the latter approach requires a little bit more code to create the same plot,
 the advantage is that it gives us **full control** over the plot and we can add new items
-such as labels, grid lines, title, etc. For example, we can add additional axes to
+such as labels, grid lines, title, etc. For example, we can add additional axes to
 the figure and customize their labels:

 ~~~
@@ -207,12 +207,12 @@ ax2.plot(x, y*2, 'r-')
207207
 ### Link matplotlib, Pandas and plotnine

-When we create a plot using pandas or plotnine, both libraries use matplotlib
-to create those plots. The plots created in pandas or plotnine are matplotlib
-objects, which enables us to use some of the advanced plotting options available
+When we create a plot using pandas or plotnine, both libraries use matplotlib
+to create those plots. The plots created in pandas or plotnine are matplotlib
+objects, which enables us to use some of the advanced plotting options available
 in the matplotlib library. Because the objects output by pandas and plotnine
-can be read by matplotlib, we have many more options than any one library can
-provide, offering a consistent environment to make publication-quality visualizations.
+can be read by matplotlib, we have many more options than any one library can
+provide, offering a consistent environment to make publication-quality visualizations.

 ~~~
 fig, ax1 = plt.subplots() # prepare a matplotlib figure
@@ -232,7 +232,7 @@ To retrieve the matplotlib figure object from plotnine for customization, use th
232232
 ~~~
 import plotnine as p9
-myplot = (p9.ggplot(data=surveys,
+myplot = (p9.ggplot(data=surveys,
                     mapping=p9.aes(x='hindfoot_length', y='weight')) +
           p9.geom_point())

@@ -253,20 +253,20 @@ plt.show() # not necessary in Jupyter Notebooks
 > ## Challenge - Pandas and matplotlib
 > Load the streamgage data set with Pandas, subset the week of the 2013 Front Range flood
 > (September 9 through 15) and create a hydrograph (line plot) of the discharge data using
-> Pandas, linking it to an empty matplotlib `ax` object. Adapt the title, x-axis and y-axis label
+> Pandas, linking it to an empty matplotlib `ax` object. Adapt the title, x-axis and y-axis label
 > using matplotlib.
 >
 > > ## Answers
 > >
 > > ~~~
-> > discharge = pd.read_csv("data/bouldercreek_09_2013.txt",
-> >                         skiprows=27, delimiter="\t",
+> > discharge = pd.read_csv("data/bouldercreek_09_2013.txt",
+> >                         skiprows=27, delimiter="\t",
 > >                         names=["agency", "site_id", "datetime",
 > >                                "timezone", "discharge", "discharge_cd"])
 > > discharge["datetime"] = pd.to_datetime(discharge["datetime"])
-> > front_range = discharge[(discharge["datetime"] >= "2013-09-09") &
+> > front_range = discharge[(discharge["datetime"] >= "2013-09-09") &
 > >                         (discharge["datetime"] < "2013-09-15")]
-> >
+> >
 > > fig, ax = plt.subplots()
 > > front_range.plot(x ="datetime", y="discharge", ax=ax)
 > > ax.set_xlabel("") # no label
@@ -284,7 +284,7 @@ plt.show() # not necessary in Jupyter Notebooks
 Once satisfied with the resulting plot, you can save the plot with the `.savefig(*args)` method from matplotlib:

 ~~~
-fig.savefig("my_plot_name.png")
+fig.savefig("my_plot_name.png")
 ~~~
 {: .language-python}

@@ -297,7 +297,7 @@ Which will save the `fig` created using Pandas/matplotlib as a png file with the
 {: .callout}

 > ## Challenge - Saving figure to file
-> Check the documentation of the `savefig` method and check how
+> Check the documentation of the `savefig` method and check how
 > you can comply with journals requiring figures as a `pdf` file with
 > dpi >= 300.
 >
@@ -321,9 +321,9 @@ save as a text file with a `.py` extension and run in the command line).
 > ## Challenge - Final Plot
 > Display your data using one or more plot types from the example gallery. Which
 > ones to choose will depend on the content of your own data file. If you are
-> using the streamgage file [`bouldercreek_09_2013.txt`]({{ page.root }}/data/bouldercreek_09_2013.txt), you could make a
-> histogram of the number of days with a given mean discharge, use bar plots
-> to display daily discharge statistics, or explore the different ways matplotlib
+> using the streamgage file [`bouldercreek_09_2013.txt`]({{ page.root }}/data/bouldercreek_09_2013.txt), you could make a
+> histogram of the number of days with a given mean discharge, use bar plots
+> to display daily discharge statistics, or explore the different ways matplotlib
 > can handle dates and times for figures.
 {: .challenge}

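The histogram suggested in the final challenge could be approached roughly as follows. This is only a minimal sketch under stated assumptions: the readings are randomly generated stand-ins for the streamgage file (not the real Boulder Creek data), and the names `discharge`, `daily_mean` and the output file name are illustrative choices that do not come from the lesson.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the streamgage table: one reading every 15 minutes
# for 1-30 September 2013. The values are made up for illustration only.
rng = np.random.default_rng(42)
timestamps = pd.date_range("2013-09-01", "2013-09-30 23:45", freq="15min")
discharge = pd.DataFrame({
    "datetime": timestamps,
    "discharge": rng.gamma(shape=2.0, scale=100.0, size=len(timestamps)),
})

# Average the 15 minute readings within each calendar day.
daily_mean = discharge.set_index("datetime")["discharge"].resample("D").mean()

# Create the plot with Pandas, linked to an empty matplotlib Axes,
# then adjust the labels with matplotlib methods.
fig, ax = plt.subplots()
daily_mean.plot(kind="hist", bins=10, ax=ax)
ax.set_xlabel("Mean daily discharge (cubic feet per second)")
ax.set_ylabel("Number of days")
ax.set_title("Distribution of mean daily discharge")
fig.savefig("daily_mean_hist.png")  # hypothetical output file name
```

With the real file, the synthetic `discharge` table would be replaced by the `pd.read_csv` call shown in the Pandas and matplotlib challenge answer; the resampling and plotting steps would stay the same.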