@@ -8,31 +8,39 @@ permalink: /extra_challenges/
 
 A collection of challenges that have been either removed from or not (yet) added to the main lesson.
 
-> ## Looping Over Dataframe
+> ## Looping Over DataFrame
 >
 > (Please refer to lesson `06-loops-and-functions.md`)
 >
 > The file `surveys.csv` in the `data` folder contains 25 years of data from surveys,
-> starting from 1977. We can load the data and print all the years surveyed using a `for` loop:
+> starting from 1977. We can extract data corresponding to each year in this DataFrame
+> to individual CSV files, by using a `for` loop:
 >
 > ~~~
 > import pandas as pd
 >
 > # Load the data into a DataFrame
 > surveys_df = pd.read_csv('data/surveys.csv')
 >
-> # Loop through a sequence of years and print the year
+> # Loop through a sequence of years and export selected data
 > start_year = 1977
 > end_year = 2002
 > for year in range(start_year, end_year+1):
->     print(year)
+>
+>     # Select data for the year
+>     surveys_year = surveys_df[surveys_df.year == year]
+>
+>     # Write the new DataFrame to a CSV file
+>     filename = 'data/surveys' + str(year) + '.csv'
+>     surveys_year.to_csv(filename)
 > ~~~
 > {: .language-python}
 >
 > What happens if there is no data for a year in a sequence? For example,
-> imagine we used `1976` as the year in `surveys_df[surveys_df.year == year]`
+> imagine we used `1976` as the `start_year`
 >
 > > ## Solution
-> > An empty file with only the headers
+> > We get the expected files for all years between 1977 and 2002,
+> > plus an empty `data/surveys1976.csv` file with only the headers.
 > {: .solution}
 {: .challenge}
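
Note on the empty-file case described in the solution: the following is a minimal sketch, not part of this change, of how the export loop could skip years that have no rows instead of writing a header-only file. The helper name `export_by_year`, the small in-memory DataFrame standing in for `data/surveys.csv`, and the temporary output directory are all assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def export_by_year(df, start_year, end_year, out_dir):
    """Write one CSV per year in [start_year, end_year], skipping empty years.

    Hypothetical helper; returns the list of file paths that were written.
    """
    written = []
    for year in range(start_year, end_year + 1):
        # Select data for the year
        df_year = df[df.year == year]

        # Skip years with no rows so that no header-only file is created
        if df_year.empty:
            continue

        filename = os.path.join(out_dir, 'surveys' + str(year) + '.csv')
        # index=False leaves the DataFrame index out of the exported file
        df_year.to_csv(filename, index=False)
        written.append(filename)
    return written


# Hypothetical stand-in for data/surveys.csv (no rows for 1976)
demo_df = pd.DataFrame({
    'record_id': [1, 2, 3],
    'year': [1977, 1977, 1978],
    'species_id': ['NL', 'DM', 'DM'],
})

out_dir = tempfile.mkdtemp()
written = export_by_year(demo_df, 1976, 1978, out_dir)
print([os.path.basename(path) for path in written])
```

With `start_year` set to 1976, the sketch writes files for 1977 and 1978 only. Whether skipping is preferable to an empty file depends on what downstream code expects, so the behaviour in the diff above may still be the better fit for the lesson.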