@@ -3,8 +3,8 @@ title: Data Types and Formats
 teaching: 20
 exercises: 25
 questions:
-- "What types of data can be contained in a DataFrame?"
-- "Why is the data type important?"
+- "What types of data can be contained in a DataFrame?"
+- "Why is the data type important?"
 objectives:
 - "Describe how information is stored in a Python DataFrame."
 - "Define the two main types of data in Python: text and numerics."
@@ -32,7 +32,7 @@ numeric data, you get an error.
 In this lesson we will review ways to explore and better understand the
 structure and format of our data.
 
-# Types of Data
+## Types of Data
 
 How information is stored in a
 DataFrame or a Python object affects what we can do with it and the outputs of
@@ -170,26 +170,48 @@ subtraction, division and multiplication work on floats and integers as we'd exp
 
 ~~~
 print(5+5)
+~~~
+{: .language-python}
+
+~~~
 10
+~~~
+{: .output}
 
+~~~
 print(24-4)
-20
 ~~~
 {: .language-python}
 
+~~~
+20
+~~~
+{: .output}
+
 If we divide one integer by another, we get a float.
 The result on Python 3 is different than in Python 2, where the result is an
 integer (integer division).
 
 ~~~
 print(5/9)
+~~~
+{: .language-python}
+
+~~~
 0.5555555555555556
+~~~
+{: .output}
 
+~~~
 print(10/3)
-3.3333333333333335
 ~~~
 {: .language-python}
 
+~~~
+3.3333333333333335
+~~~
+{: .output}
+
 We can also convert a floating point number to an integer or an integer to
 floating point number. Notice that Python by default rounds down when it
 converts from floating point to integer.
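
A side note on the context lines above, as a minimal sketch rather than part of the change itself: `int()` truncates toward zero, so it matches "rounding down" only for non-negative values. The negative value and the comparison with `math.floor` are illustrative additions, not taken from the lesson.

```python
import math

a = 7.83     # value used in the lesson
neg = -7.83  # illustrative value, not from the lesson

# int() discards the fractional part, i.e. truncates toward zero
truncated_pos = int(a)
truncated_neg = int(neg)

# math.floor() always rounds toward negative infinity
floored_neg = math.floor(neg)

print(truncated_pos, truncated_neg, floored_neg)
```

For positive inputs the two behaviours agree, which is likely why the lesson text describes the conversion as rounding down.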
@@ -198,16 +220,28 @@ converts from floating point to integer.
 # Convert a to an integer
 a = 7.83
 int(a)
+~~~
+{: .language-python}
+
+~~~
 7
+~~~
+{: .output}
 
+~~~
 # Convert b to a float
 b = 7
 float(b)
-7.0
 ~~~
 {: .language-python}
 
-# Working With Our Survey Data
+~~~
+7.0
+~~~
+{: .output}
+
+
+## Working With Our Survey Data
 
 Getting back to our data, we can modify the format of values within our data, if
 we want. For instance, we could convert the `record_id` field to floating point
@@ -250,9 +284,14 @@ over those cells.
 
 ~~~
 surveys_df['weight'].mean()
-42.672428212991356
 ~~~
 {: .language-python}
+
+~~~
+42.672428212991356
+~~~
+{: .output}
+
 Dealing with missing data values is always a challenge. It's sometimes hard to
 know why values are missing - was it because of a data entry error? Or data that
 someone was unable to collect? Should the value be 0? We need to know how
@@ -298,10 +337,14 @@ out or ignored.
 
 ~~~
 df1['weight'].mean()
-38.751976145601844
 ~~~
 {: .language-python}
 
+~~~
+38.751976145601844
+~~~
+{: .output}
+
 We can fill NaN values with any value that we chose. The code below fills all
 NaN values with a mean for all weight values.
 
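
For readers of this hunk, a minimal sketch of the fill-with-mean idea described in the context lines. It assumes pandas and NumPy are installed; `toy_df` and its values are hypothetical stand-ins for the survey data, not the lesson's own code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; values are made up for illustration
toy_df = pd.DataFrame({"weight": [40.0, np.nan, 50.0, np.nan]})

# Series.mean() skips NaN by default, so only 40.0 and 50.0 are averaged
mean_weight = toy_df["weight"].mean()

# Work on a copy so the original frame keeps its NaN values
filled = toy_df.copy()
filled["weight"] = filled["weight"].fillna(mean_weight)

print(mean_weight)
print(filled["weight"].tolist())
```

Filling with the mean leaves the column mean unchanged but reduces its variance, so whether this is appropriate depends on the analysis.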
@@ -342,7 +385,6 @@ By default, dropna removes rows that contain missing data for even just one colu
 
 ~~~
 df_na = surveys_df.dropna()
-
 ~~~
 {: .language-python}
 
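
A small sketch of the default `dropna` behaviour mentioned in the hunk header, assuming pandas and NumPy are available. The frame `toy_df` and its contents are hypothetical; the `subset` call is shown only as one possible alternative and is not something this change uses.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with missing values in two different columns
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "weight": [40.0, np.nan, 50.0],
    "sex": ["M", "F", None],
})

# Default: drop every row that has at least one missing value
any_missing_dropped = toy_df.dropna()

# Alternative: only look at the 'weight' column when deciding what to drop
weight_present = toy_df.dropna(subset=["weight"])

print(len(toy_df), len(any_missing_dropped), len(weight_present))
```

The default can discard far more rows than expected when several columns each have a few gaps, which is why restricting the check to the columns of interest is often worth considering.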