Commit 6c792bc

04-data-types-and-format.md: split and fix code/output cells
1 parent 534c31b commit 6c792bc

1 file changed

Lines changed: 52 additions & 10 deletions

_episodes/04-data-types-and-format.md

@@ -3,8 +3,8 @@ title: Data Types and Formats
 teaching: 20
 exercises: 25
 questions:
-- "What types of data can be contained in a DataFrame?"
-- "Why is the data type important?"
+- "What types of data can be contained in a DataFrame?"
+- "Why is the data type important?"
 objectives:
 - "Describe how information is stored in a Python DataFrame."
 - "Define the two main types of data in Python: text and numerics."
@@ -32,7 +32,7 @@ numeric data, you get an error.
 In this lesson we will review ways to explore and better understand the
 structure and format of our data.
 
-# Types of Data
+## Types of Data
 
 How information is stored in a
 DataFrame or a Python object affects what we can do with it and the outputs of
@@ -170,26 +170,48 @@ subtraction, division and multiplication work on floats and integers as we'd exp
 
 ~~~
 print(5+5)
+~~~
+{: .language-python}
+
+~~~
 10
+~~~
+{: .output}
 
+~~~
 print(24-4)
-20
 ~~~
 {: .language-python}
 
+~~~
+20
+~~~
+{: .output}
+
 If we divide one integer by another, we get a float.
 The result on Python 3 is different than in Python 2, where the result is an
 integer (integer division).
 
 ~~~
 print(5/9)
+~~~
+{: .language-python}
+
+~~~
 0.5555555555555556
+~~~
+{: .output}
 
+~~~
 print(10/3)
-3.3333333333333335
 ~~~
 {: .language-python}
 
+~~~
+3.3333333333333335
+~~~
+{: .output}
+
 We can also convert a floating point number to an integer or an integer to
 floating point number. Notice that Python by default rounds down when it
 converts from floating point to integer.
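A note on the context lines in this hunk: the lesson text says Python "rounds down" when converting a float to an integer. A minimal sketch, assuming plain Python 3 and nothing from the lesson's dataset, suggests the behaviour is better described as truncation toward zero, which only matches rounding down for non-negative values. The variable names below are illustrative and do not appear in the lesson.

```python
import math

# True division always returns a float in Python 3
quotient = 10 / 3

# Floor division gives the integer result that Python 2 produced for int / int
floored = 10 // 3

# int() drops the fractional part (truncates toward zero)
truncated_pos = int(7.83)
truncated_neg = int(-7.83)

# math.floor() really does round down, so it differs for negative inputs
floor_neg = math.floor(-7.83)

print(quotient, floored, truncated_pos, truncated_neg, floor_neg)
# prints: 3.3333333333333335 3 7 -7 -8
```

For the positive example in the lesson (`7.83`), truncation and rounding down agree, so the displayed output `7` is unaffected.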
@@ -198,16 +220,28 @@ converts from floating point to integer.
 # Convert a to an integer
 a = 7.83
 int(a)
+~~~
+{: .language-python}
+
+~~~
 7
+~~~
+{: .output}
 
+~~~
 # Convert b to a float
 b = 7
 float(b)
-7.0
 ~~~
 {: .language-python}
 
-# Working With Our Survey Data
+~~~
+7.0
+~~~
+{: .output}
+
+
+## Working With Our Survey Data
 
 Getting back to our data, we can modify the format of values within our data, if
 we want. For instance, we could convert the `record_id` field to floating point
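The context lines at the end of this hunk mention converting the `record_id` field to floating point. The sketch below shows one way this could look with pandas `Series.astype`. The small DataFrame is a hypothetical stand-in for the lesson's `surveys_df`, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the lesson loads the real data from a CSV
surveys_df = pd.DataFrame({"record_id": [1, 2, 3], "weight": [40.0, None, 48.0]})

# astype returns a new Series, so assign it back to change the column's dtype
surveys_df["record_id"] = surveys_df["record_id"].astype("float64")

print(surveys_df["record_id"].dtype)
# prints: float64
```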
@@ -250,9 +284,14 @@ over those cells.
 
 ~~~
 surveys_df['weight'].mean()
-42.672428212991356
 ~~~
 {: .language-python}
+
+~~~
+42.672428212991356
+~~~
+{: .output}
+
 Dealing with missing data values is always a challenge. It's sometimes hard to
 know why values are missing - was it because of a data entry error? Or data that
 someone was unable to collect? Should the value be 0? We need to know how
@@ -298,10 +337,14 @@ out or ignored.
 
 ~~~
 df1['weight'].mean()
-38.751976145601844
 ~~~
 {: .language-python}
 
+~~~
+38.751976145601844
+~~~
+{: .output}
+
 We can fill NaN values with any value that we chose. The code below fills all
 NaN values with a mean for all weight values.
 
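The lesson code referred to by "The code below" falls outside this hunk. As a rough illustration of filling NaN values with the column mean, here is a minimal sketch using pandas `Series.fillna`. The three-row `df1` is a hypothetical example, not the lesson's data.

```python
import pandas as pd

# Hypothetical weights with one missing value
df1 = pd.DataFrame({"weight": [40.0, None, 48.0]})

# mean() skips NaN by default, so this is the mean of 40.0 and 48.0
mean_weight = df1["weight"].mean()

# Replace the missing entry with that mean
df1["weight"] = df1["weight"].fillna(mean_weight)

print(df1["weight"].tolist())
# prints: [40.0, 44.0, 48.0]
```

Filling with the mean leaves the column mean unchanged, whereas filling with 0 would pull it down, which is consistent with the lower value shown in this hunk's output cell.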
@@ -342,7 +385,6 @@ By default, dropna removes rows that contain missing data for even just one colu
 
 ~~~
 df_na = surveys_df.dropna()
-
 ~~~
 {: .language-python}
 
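The hunk header notes that `dropna` removes rows with missing data in even one column. A small sketch of that default, next to the `subset` parameter that restricts the check to chosen columns, might look like this. The DataFrame is a hypothetical stand-in for `surveys_df`, and `df_weight` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df with gaps in two different columns
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "weight": [40.0, None, 48.0],
    "sex": ["M", "F", None],
})

# Default: drop any row that has a missing value in any column
df_na = surveys_df.dropna()

# Only drop rows where weight is missing; gaps elsewhere are kept
df_weight = surveys_df.dropna(subset=["weight"])

print(len(surveys_df), len(df_na), len(df_weight))
# prints: 3 1 2
```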
