Data science at two-year colleges

StatPREP leaders Kate Kozak (Coconino Community College and AMATYC), Doug Ensley (MAA), and Danny Kaplan (Macalester College) represented StatPREP at last week’s “Two-Year College Data Science Summit.” The purpose of the summit was to lay out a guide to action for two-year colleges interested in developing AA, AAS, or certificate programs in data science.

Data science presents two-year colleges with both significant challenges and significant opportunities. For the two-year college instructor pressed by high enrollments and working to develop greater skills in teaching introductory statistics, taking on the larger field of data science may well seem a daunting task. StatPREP’s approach is one step at a time: helping instructors first to become comfortable engaging contemporary, real-world data in their statistics teaching and then, as appropriate, to develop greater technical skills with data computing, visualization, and statistical interpretation and modeling.

Working groups at the Two-Year College Data Science Summit looked at many aspects of what two-year college students need in order to gain employment in data science. These include employability, collaboration, and communication skills, as well as effective knowledge of data security, privacy, and ethics. Building on previous work by the Oceans of Data Institute, the groups also identified core technical skills that form the foundation for employment in data science.

Here are the core technical skills identified by one of the working groups. We’re writing about them to show how they align with the tools that StatPREP employs in its workshops and teaching materials.

  • Programming in R or Python. StatPREP materials are all based in R. The StatPREP tutorials directly develop skills in using R commands, and even the StatPREP Little Apps use R as the back-end.
  • Database querying and wrangling. StatPREP uses the dplyr query system for R, which directly interoperates with industry-standard “structured query language” (SQL).
  • Machine learning techniques for modeling. StatPREP uses the mosaicModel R framework, which unifies modeling from simple regression to modern techniques such as “random forest.”
  • Visualization technologies. StatPREP uses the “grammar of graphics” approach. The ggformula package (gg for “grammar of graphics”) provides immediate access to professional-level graphics as provided by the hugely popular ggplot2 system.
  • Collaboration support. This includes both the industry-standard git and GitHub collaboration platforms and more broadly familiar collaborative editing, as with Google Docs and spreadsheets. StatPREP is using this very technology, and helps participants get started and make productive use of it.
  • Application programming interface basics such as RESTful APIs and JSON. StatPREP is … well … not emphasizing programming at this level. You can’t have everything.

StatPREP is primarily about teaching data-centric statistics, not technology. StatPREP participants can decide for themselves the extent to which they want to dive into the sorts of core technical skills listed above. Still, be assured that should you decide to develop data-science skills, StatPREP professional development is right in tune with what data science requires.
