@@ -69,7 +69,7 @@ single animal, and the columns represent:

  The first few rows of our first file look like this:

- ```
+ ~~~
  record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
  1,7,16,1977,2,NL,M,32,
  2,7,16,1977,3,NL,M,33,
@@ -80,7 +80,8 @@ record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
  7,7,16,1977,2,PE,F,,
  8,7,16,1977,1,DM,M,37,
  9,7,16,1977,1,DM,F,34,
- ```
+ ~~~
+ >> {: .output}

  ---

@@ -138,7 +139,7 @@ pd.read_csv("data/surveys.csv")

  The above command yields the **output** below:

- ```
+ ~~~
         record_id  month  day  year  plot_id  species_id  sex  hindfoot_length  weight
  0              1      7   16  1977        2          NL    M               32     NaN
  1              2      7   16  1977        3          NL    M               33     NaN
@@ -153,7 +154,8 @@ record_id month day year plot_id species_id sex hindfoot_length weight
  35548      35549     12   31  2002        5         NaN  NaN              NaN     NaN

  [35549 rows x 9 columns]
- ```
+ ~~~
+ >> {: .output}

  We can see that there were 35,549 rows parsed. Each row has 9
  columns. The first column is the index of the DataFrame. The index is used to
@@ -239,8 +241,8 @@ easier to fit on one window, you can see that pandas has neatly formatted the da
  our screen:

  ~~~
- >>> surveys_df.head() # The head() function displays the first several lines of a file. It
-                       # is discussed below.
+ surveys_df.head() # The head() method displays the first several lines of a file. It
+                   # is discussed below.
  ~~~
  {: .language-python}
  ~~~
@@ -265,10 +267,13 @@ our screen:
  Again, we can use the `type` function to see what kind of thing `surveys_df` is:

  ~~~
- >>> type(surveys_df)
- <class 'pandas.core.frame.DataFrame'>
+ type(surveys_df)
  ~~~
  {: .language-python}
+ ~~~
+ <class 'pandas.core.frame.DataFrame'>
+ ~~~
+ {: .output}

  As expected, it's a DataFrame (or, to use the full name that Python uses to refer
  to it internally, a `pandas.core.frame.DataFrame`).
@@ -277,7 +282,7 @@ What kind of things does `surveys_df` contain? DataFrames have an attribute
  called `dtypes` that answers this:

  ~~~
- >>> surveys_df.dtypes
+ surveys_df.dtypes
  ~~~
  {: .language-python}
  ~~~
@@ -352,11 +357,12 @@ surveys_df.columns

  which **returns**:

- ```
+ ~~~
  Index(['record_id', 'month', 'day', 'year', 'plot_id', 'species_id', 'sex',
         'hindfoot_length', 'weight'],
        dtype='object')
- ```
+ ~~~
+ >> {: .output}

  Let's get a list of all the species. The `pd.unique` function tells us all of
  the unique values in the `species_id` column.
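
As a side note to this hunk, here is a minimal sketch of what `pd.unique` does. It assumes a small made-up frame: `toy_df` and its values are hypothetical stand-ins, not rows taken from `surveys.csv`.

```python
import pandas as pd

# Hypothetical miniature of the surveys table (illustrative values only).
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "species_id": ["NL", "NL", "DM", "PE", "DM"],
})

# pd.unique keeps each distinct value once, in order of first appearance.
species = pd.unique(toy_df["species_id"])

# Convert to a plain list so the result prints the same way across pandas versions.
species_list = list(species)
print(species_list)
print(len(species_list))
```

With the real data, the same call on `surveys_df['species_id']` would list every species code that occurs in the column.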
@@ -472,8 +478,8 @@ summary stats.
  > 1. How many recorded individuals are female `F` and how many male `M`
  > 2. What happens when you group by two columns using the following syntax and
  > then grab mean values:
- > - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
- > - `grouped_data2.mean()`
+ > - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
+ > - `grouped_data2.mean()`
  > 3. Summarize weight values for each site in your data. HINT: you can use the
  > following syntax to only create summary statistics for one column in your data
  > `by_site['weight'].describe()`
@@ -482,18 +488,19 @@ summary stats.
  >> ## Did you get #3 right?
  >> **A Snippet of the Output from challenge 3 looks like:**
  >>
- >> ```
- >> site
- >> 1     count    1903.000000
- >>       mean       51.822911
- >>       std        38.176670
- >>       min         4.000000
- >>       25%        30.000000
- >>       50%        44.000000
- >>       75%        53.000000
- >>       max       231.000000
+ >> ~~~
+ >> site
+ >> 1     count    1903.000000
+ >>       mean       51.822911
+ >>       std        38.176670
+ >>       min         4.000000
+ >>       25%        30.000000
+ >>       50%        44.000000
+ >>       75%        53.000000
+ >>       max       231.000000
  >> ...
- >> ```
+ >> ~~~
+ >> {: .output}
  > {: .solution}
  {: .challenge}

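
For readers following the challenge in this hunk, the sketch below shows the two-column grouping and the per-site summary on a tiny made-up table. The frame `toy_df` and its numbers are assumptions for illustration only. `numeric_only=True` is passed to `mean()` because the real surveys table also holds text columns, which recent pandas versions will not average.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df (illustrative values only).
toy_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2, 2],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "weight": [40.0, 48.0, 52.0, 30.0, 34.0, 44.0],
})

# 1. Number of recorded individuals of each sex.
sex_counts = toy_df.groupby("sex")["record_count"].count() if "record_count" in toy_df else toy_df.groupby("sex").size()

# 2. Grouping by two columns gives one row per (plot_id, sex) pair.
grouped_data2 = toy_df.groupby(["plot_id", "sex"])
mean_by_site_sex = grouped_data2.mean(numeric_only=True)

# 3. Summary statistics for a single column, one row per site.
by_site = toy_df.groupby("plot_id")
weight_summary = by_site["weight"].describe()

print(sex_counts)
print(mean_by_site_sex)
print(weight_summary)
```

The result of step 2 is indexed by both grouping columns, so a single value can be looked up with a tuple such as `mean_by_site_sex.loc[(1, "M"), "weight"]`.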
@@ -586,13 +593,14 @@ total_count.plot(kind='bar');
  >
  > shows the following data
  >
- > ```
+ > ~~~
  >       one  two
  >   a    1    1
  >   b    2    2
  >   c    3    3
  >   d  NaN    4
- > ```
+ > ~~~
+ > {: .output}
  >
  > We can plot the above with
  >
@@ -617,15 +625,15 @@ total_count.plot(kind='bar');
  >>
  >> First we group data by site and by sex, and then calculate a total for each site.
  >>
- >> ```python
+ >> ~~~
  >> by_site_sex = surveys_df.groupby(['plot_id','sex'])
  >> site_sex_count = by_site_sex['weight'].sum()
  >> ~~~
- >> {: .language-python }
+ >> {: .language-python}
  >>
  >> This calculates the sums of weights for each sex within each site as a table
  >>
- >> ```
+ >> ~~~
  >> site  sex
  >> plot_id  sex
  >> 1        F      38253
@@ -637,11 +645,12 @@ total_count.plot(kind='bar');
  >> 4        F      39796
  >>          M      49377
  >> <other sites removed for brevity>
- >> ```
+ >> ~~~
+ >> {: .output}
  >>
  >> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each site.
  >>
- >> ```python
+ >> ~~~
  >> by_site_sex = surveys_df.groupby(['plot_id','sex'])
  >> site_sex_count = by_site_sex['weight'].sum()
  >> site_sex_count.unstack()
650659>>
651660>> The `unstack` method above will display the following output:
652661>>
653- >> ```
662+ >> ~~~
654663>> sex F M
655664>> plot_id
656665>> 1 38253 59979
657666>> 2 50144 57250
658667>> 3 27251 28253
659668>> 4 39796 49377
660669>> <other sites removed for brevity>
661- >> ```
670+ >> ~~~
671+ >> {: .output}
662672>>
663673>> Now, create a stacked bar plot with that data where the weights for each sex are stacked by site.
664674>>
665675>> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:
666676>>
667- >> ```python
677+ >> ~~~
668678>> by_site_sex = surveys_df.groupby(['plot_id','sex'])
669679>> site_sex_count = by_site_sex['weight'].sum()
670680>> spc = site_sex_count.unstack()
671681>> s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by site and sex")
672682>> s_plot.set_ylabel("Weight")
673683>> s_plot.set_xlabel("Plot")
674684>> ~~~
675- >> {: .language-python }
685+ >> {: .language-python}
676686>>
677687>> 
678688> {: .solution}
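
To make the reshaping step in the solution above easier to follow, here is a small sketch of what `.unstack()` does to a Series grouped by two keys. The frame `toy_df` is a hypothetical stand-in with invented weights. The plotting call is left out so the example runs without a graphics backend.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df (illustrative values only).
toy_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2, 2],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "weight": [40, 48, 52, 30, 34, 44],
})

# Long form: one total per (plot_id, sex) pair, indexed by both keys.
site_sex_count = toy_df.groupby(["plot_id", "sex"])["weight"].sum()

# Wide form: the inner index level (sex) moves to the columns,
# leaving one row per plot_id and one column per sex.
spc = site_sex_count.unstack()

print(site_sex_count)
print(spc)
```

Each column of the wide table then becomes one layer of the stacked bars when it is passed to `spc.plot(kind='bar', stacked=True)` as in the solution.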