Commit afea9db

Episode 02: fix python code blocks
1 parent 939dd80 commit afea9db

1 file changed

Lines changed: 45 additions & 35 deletions

_episodes/02-starting-with-data.md

@@ -69,7 +69,7 @@ single animal, and the columns represent:
 
 The first few rows of our first file look like this:
 
-```
+~~~
 record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
 1,7,16,1977,2,NL,M,32,
 2,7,16,1977,3,NL,M,33,
@@ -80,7 +80,8 @@ record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
 7,7,16,1977,2,PE,F,,
 8,7,16,1977,1,DM,M,37,
 9,7,16,1977,1,DM,F,34,
-```
+~~~
+>> {: .output}
 
 ---
 
@@ -138,7 +139,7 @@ pd.read_csv("data/surveys.csv")
 
 The above command yields the **output** below:
 
-```
+~~~
 record_id month day year plot_id species_id sex hindfoot_length weight
 0 1 7 16 1977 2 NL M 32 NaN
 1 2 7 16 1977 3 NL M 33 NaN
@@ -153,7 +154,8 @@ record_id month day year plot_id species_id sex hindfoot_length weight
 35548 35549 12 31 2002 5 NaN NaN NaN NaN
 
 [35549 rows x 9 columns]
-```
+~~~
+>> {: .output}
 
 We can see that there were 35,549 rows parsed. Each row has 9
 columns. The first column is the index of the DataFrame. The index is used to
@@ -239,8 +241,8 @@ easier to fit on one window, you can see that pandas has neatly formatted the da
 our screen:
 
 ~~~
->>> surveys_df.head() # The head() function displays the first several lines of a file. It
-# is discussed below.
+surveys_df.head() # The head() method displays the first several lines of a file. It
+# is discussed below.
 ~~~
 {: .language-python}
 ~~~
@@ -265,10 +267,13 @@ our screen:
 Again, we can use the `type` function to see what kind of thing `surveys_df` is:
 
 ~~~
->>> type(surveys_df)
-<class 'pandas.core.frame.DataFrame'>
+type(surveys_df)
 ~~~
 {: .language-python}
+~~~
+<class 'pandas.core.frame.DataFrame'>
+~~~
+{: .output}
 
 As expected, it's a DataFrame (or, to use the full name that Python uses to refer
 to it internally, a `pandas.core.frame.DataFrame`).
@@ -277,7 +282,7 @@ What kind of things does `surveys_df` contain? DataFrames have an attribute
 called `dtypes` that answers this:
 
 ~~~
->>> surveys_df.dtypes
+surveys_df.dtypes
 ~~~
 {: .language-python}
 ~~~
@@ -352,11 +357,12 @@ surveys_df.columns
 
 which **returns**:
 
-```
+~~~
 Index(['record_id', 'month', 'day', 'year', 'plot_id', 'species_id', 'sex',
 'hindfoot_length', 'weight'],
 dtype='object')
-```
+~~~
+>> {: .output}
 
 Let's get a list of all the species. The `pd.unique` function tells us all of
 the unique values in the `species_id` column.
@@ -472,8 +478,8 @@ summary stats.
 > 1. How many recorded individuals are female `F` and how many male `M`
 > 2. What happens when you group by two columns using the following syntax and
 > then grab mean values:
-> - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
-> - `grouped_data2.mean()`
+> - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
+> - `grouped_data2.mean()`
 > 3. Summarize weight values for each site in your data. HINT: you can use the
 > following syntax to only create summary statistics for one column in your data
 > `by_site['weight'].describe()`
@@ -482,18 +488,19 @@ summary stats.
 >> ## Did you get #3 right?
 >> **A Snippet of the Output from challenge 3 looks like:**
 >>
->> ```
->> site
->> 1 count 1903.000000
->> mean 51.822911
->> std 38.176670
->> min 4.000000
->> 25% 30.000000
->> 50% 44.000000
->> 75% 53.000000
->> max 231.000000
+>> ~~~
+>> site
+>> 1 count 1903.000000
+>> mean 51.822911
+>> std 38.176670
+>> min 4.000000
+>> 25% 30.000000
+>> 50% 44.000000
+>> 75% 53.000000
+>> max 231.000000
 >> ...
->> ```
+>> ~~~
+>> {: .output}
 > {: .solution}
 {: .challenge}
 
@@ -586,13 +593,14 @@ total_count.plot(kind='bar');
 >
 > shows the following data
 >
-> ```
+> ~~~
 > one two
 > a 1 1
 > b 2 2
 > c 3 3
 > d NaN 4
-> ```
+> ~~~
+> {: .output}
 >
 > We can plot the above with
 >
@@ -617,15 +625,15 @@ total_count.plot(kind='bar');
 >>
 >> First we group data by site and by sex, and then calculate a total for each site.
 >>
->> ```python
+>> ~~~
 >> by_site_sex = surveys_df.groupby(['plot_id','sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> ~~~
->> {: .language-python }
+>> {: .language-python}
 >>
 >> This calculates the sums of weights for each sex within each site as a table
 >>
->> ```
+>> ~~~
 >> site sex
 >> plot_id sex
 >> 1 F 38253
@@ -637,11 +645,12 @@ total_count.plot(kind='bar');
 >> 4 F 39796
 >> M 49377
 >> <other sites removed for brevity>
->> ```
+>> ~~~
+>> {: .output}
 >>
 >> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each site.
 >>
->> ```python
+>> ~~~
 >> by_site_sex = surveys_df.groupby(['plot_id','sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> site_sex_count.unstack()
@@ -650,29 +659,30 @@ total_count.plot(kind='bar');
 >>
 >> The `unstack` method above will display the following output:
 >>
->> ```
+>> ~~~
 >> sex F M
 >> plot_id
 >> 1 38253 59979
 >> 2 50144 57250
 >> 3 27251 28253
 >> 4 39796 49377
 >> <other sites removed for brevity>
->> ```
+>> ~~~
+>> {: .output}
 >>
 >> Now, create a stacked bar plot with that data where the weights for each sex are stacked by site.
 >>
 >> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:
 >>
->> ```python
+>> ~~~
 >> by_site_sex = surveys_df.groupby(['plot_id','sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> spc = site_sex_count.unstack()
 >> s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by site and sex")
 >> s_plot.set_ylabel("Weight")
 >> s_plot.set_xlabel("Plot")
 >> ~~~
->> {: .language-python }
+>> {: .language-python}
 >>
 >> ![Stacked Bar Plot](../fig/stackedBar.png)
 > {: .solution}
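
Note on the last hunks: the `groupby` / `sum` / `unstack` sequence in the solution can be tried without `data/surveys.csv`. The following is a minimal sketch only. `toy_df` and its values are made up for illustration and are not taken from the survey data; the column names mirror those used in the lesson. Plotting is left out so the snippet needs nothing beyond pandas.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the values are invented for
# illustration and do not come from data/surveys.csv.
toy_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2, 2],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "weight": [40.0, 48.0, 52.0, 30.0, 35.0, 45.0],
})

# Sum weight for each (plot_id, sex) pair. The result is a Series
# indexed by two levels: plot_id, then sex.
site_sex_count = toy_df.groupby(["plot_id", "sex"])["weight"].sum()

# Move the inner index level (sex) into columns, giving one row per
# plot_id and one column per sex.
spc = site_sex_count.unstack()
print(spc)
```

The unstacked frame has the same shape that the lesson passes to `spc.plot(kind='bar', stacked=True, ...)`, so that call should work on it unchanged once matplotlib is available.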
