If students have trouble generating the output, there is a file called "sample output" that
contains the data file they should have generated in lesson 3.
Answers are embedded with challenges in this lesson.

Note that `plotnine` emits a *lot* of deprecation warnings with some versions of Python/matplotlib. These may need to be suppressed (shown only once each) with:
~~~
import warnings
warnings.filterwarnings(action='once')
~~~
{: .language-python}
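
As a rough illustration of what the `action` argument does, the sketch below counts how many copies of a repeated warning get through under different filter actions. It uses only the standard library rather than `plotnine` itself, and the helper `shown_count` and the demo message are made up for the example.

```python
import warnings


def shown_count(action, repeats=3):
    """Emit the same DeprecationWarning `repeats` times and count how many are shown."""
    # catch_warnings restores the previous filters on exit; record=True collects
    # the warnings that pass the filter instead of printing them.
    with warnings.catch_warnings(record=True) as caught:
        warnings.filterwarnings(action=action)
        for _ in range(repeats):
            # Hypothetical message; a distinct text per action keeps the runs independent.
            warnings.warn(f"demo warning ({action})", DeprecationWarning)
    return len(caught)


counts = {action: shown_count(action) for action in ("always", "once", "ignore")}
print(counts)
```

With `action='once'` each distinct warning is still shown the first time it occurs, whereas `action='ignore'` hides it entirely.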

IPython notebooks for plotting can be viewed in the `_extras` folder.

## 08-putting-it-all-together

Answers are embedded with challenges in this lesson, other than the random distribution, which is left to the learner to choose, and the final plot, for which the learner should investigate the matplotlib gallery.

Scientists often operate on mathematical equations. Being able to use them in their graphics has a
lot of added value. Luckily, Matplotlib provides powerful tools for text control. One of them is the
ability to use LaTeX mathematical notation, whenever text is used (you can learn more about LaTeX
@@ -740,3 +750,101 @@ plt.show()

{% include links.md %}

## 09-working-with-sql

### Challenge - SQL

* Create a query that returns survey data collected between 1998 and 2001 for observations of sex “male” or “female”, including each observation’s genus, species, and site type. How many records are returned?

~~~
import sqlite3

# Connect to the database
con = sqlite3.connect("data/portal_mammals.sqlite")

cur = con.cursor()

# Return all results of query: year, plot type (site type), genus, species and sex
# from the join of the tables surveys, plots and species, for the years 1998-2001 where sex is 'M' or 'F'.
# The SELECT column list is inferred from the description above.
cur.execute('SELECT surveys.year, plots.plot_type, species.genus, species.species, surveys.sex \
FROM surveys INNER JOIN plots ON surveys.plot_id = plots.plot_id INNER JOIN species ON \
surveys.species_id = species.species_id WHERE surveys.year>=1998 AND surveys.year<=2001 \
AND ( surveys.sex = "M" OR surveys.sex = "F")')

print(len(cur.fetchall()))

# Close the connection
con.close()
~~~
{: .language-python}
~~~
5546
~~~
{: .output}

Answer: 5546 records are found.
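
The same join can be tried without the lesson data. The sketch below assumes a toy in-memory database whose table and column names mirror the ones used in the query above (the rows themselves are invented), and passes the year range as query parameters instead of writing it into the SQL string.

```python
import sqlite3

# Toy stand-in for the lesson database; all rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);
CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER,
                      plot_id INTEGER, species_id TEXT, sex TEXT);
INSERT INTO plots VALUES (1, 'Control'), (2, 'Rodent Exclosure');
INSERT INTO species VALUES ('DM', 'Dipodomys', 'merriami'),
                           ('PP', 'Chaetodipus', 'penicillatus');
INSERT INTO surveys VALUES
  (1, 1997, 1, 'DM', 'M'),
  (2, 1998, 1, 'DM', 'F'),
  (3, 1999, 2, 'PP', NULL),
  (4, 2001, 2, 'PP', 'M'),
  (5, 2002, 1, 'DM', 'F');
""")

query = """
SELECT surveys.year, plots.plot_type, species.genus, species.species, surveys.sex
FROM surveys
INNER JOIN plots ON surveys.plot_id = plots.plot_id
INNER JOIN species ON surveys.species_id = species.species_id
WHERE surveys.year BETWEEN ? AND ?
  AND surveys.sex IN ('M', 'F')
"""

# The two "?" placeholders are filled from the tuple, in order.
rows = con.execute(query, (1998, 2001)).fetchall()
print(len(rows))

con.close()
```

Only the 1998 and 2001 records survive here: the 1997 and 2002 rows fall outside the year range, and the 1999 row has no recorded sex.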

* Create a dataframe that contains the total number of observations (count) made for all years, and the sum of observation weights for each site, ordered by site ID.

This question is a little ambiguous, but we could, for example, run two SQL queries into dataframes, then pivot the second and merge them to create a table of observation count and total weight per plot for each year. The pivot operation could alternatively be performed in SQL.
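
SQLite has no `PIVOT` keyword, so one way to pivot in SQL is conditional aggregation, with one `SUM(CASE ...)` column per plot. The following is a minimal sketch of that alternative, assuming a toy in-memory `surveys` table with invented rows; the output column names such as `plot_1_weight` are hypothetical.

```python
import sqlite3

# Toy stand-in for the surveys table; all rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surveys (year INTEGER, plot_id INTEGER, weight REAL);
INSERT INTO surveys VALUES
  (1998, 1, 40.0), (1998, 1, 10.0), (1998, 2, 5.0),
  (1999, 1, 20.0), (1999, 2, NULL), (1999, 2, 7.5);
""")

# Find the plot IDs first, because each one becomes its own output column.
plot_ids = [row[0] for row in
            con.execute("SELECT DISTINCT plot_id FROM surveys ORDER BY plot_id")]

# CASE without ELSE yields NULL for other plots, and SUM skips NULL values.
pivot_columns = ", ".join(
    f"SUM(CASE WHEN plot_id = {int(pid)} THEN weight END) AS plot_{int(pid)}_weight"
    for pid in plot_ids
)

query = (f"SELECT year, COUNT(*) AS n_obs, {pivot_columns} "
         "FROM surveys GROUP BY year ORDER BY year")
rows = con.execute(query).fetchall()
for row in rows:
    print(row)

con.close()
```

Each result row holds the year, the observation count for that year, and one total-weight column per plot, which is the shape the pivot-and-merge approach aims for.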

~~~
import pandas as pd
import sqlite3

# Run two SQLite queries and read the results as pandas DataFrames.
# Include 'year' in both queries so we have something to merge (join) on.
con = sqlite3.connect("data/portal_mammals.sqlite")
df1 = pd.read_sql_query("SELECT year,COUNT(*) FROM surveys GROUP BY year", con)
df2 = pd.read_sql_query("SELECT year,plot_id,SUM(weight) FROM surveys GROUP BY \
year,plot_id ORDER BY plot_id ASC",con)

# Turn the plot_id column values into column names by pivoting
~~~
{: .language-python}

* What are some of the reasons you might want to save the results of your queries back into the database? What are some of the reasons you might avoid doing this?

If the database is shared with others, and common queries (and potentially data corrections) are likely to be required by many, it may be more efficient for one person to perform the work and save it back to the database as a new table. Others can then access the results directly instead of running the query themselves, which helps particularly if the query is complex.
However, we might avoid doing this if the database is an authoritative source (potentially version controlled) which should not be modified by users. Instead, we might save the query results to a new database that is more appropriate for downstream work.
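
One way to keep derived results out of an authoritative database is to attach a second database and write the new table there. The following is a minimal sketch, assuming in-memory stand-ins for both databases (in practice the attached database would be a file path of your choosing) and a hypothetical table name `yearly_counts`.

```python
import sqlite3

# In-memory stand-in for the authoritative database; rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surveys (year INTEGER, plot_id INTEGER, weight REAL);
INSERT INTO surveys VALUES (1998, 1, 40.0), (1998, 2, 5.0), (1999, 1, 20.0);
""")

# Attach a second, separate database under the schema name "derived".
# ':memory:' is used here only so the sketch leaves no files behind.
con.execute("ATTACH DATABASE ':memory:' AS derived")

# Save the query result as a new table in the attached database only.
con.execute("""
CREATE TABLE derived.yearly_counts AS
SELECT year, COUNT(*) AS n_obs FROM surveys GROUP BY year
""")

saved = con.execute(
    "SELECT year, n_obs FROM derived.yearly_counts ORDER BY year").fetchall()

# The original database still contains only its own table.
main_tables = [row[0] for row in con.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'")]

print(saved)
print(main_tables)

con.close()
```

Because the new table lives in the attached database, the source database is read but never modified.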