Contributing to the Tidyverse (dbplyr)
Or to describe from ‘top-down’:
R is a computer programming language used by statisticians and others who want to interpret data.
tidyverse is a collection of software packages for the R language which makes it easier for R users to manipulate and process data. So much easier that R is now taught to liberal arts post-graduate students to analyze data, e.g. for environmental studies at Harvard Extension School. These students often have no prior experience in computer programming.
tidyverse was largely the creation of a New Zealander, Hadley Wickham, and it looks like he is the chief maintainer of the
tidyverse is ‘open source’, freely available for use and modification, and contributed to by many enthusiasts in the data science community.
dplyr is a software package in the tidyverse collection which does many of the common data manipulation tasks, such as filtering, changing, sorting, summarizing and selection.
dbplyr allows dplyr to interact with database backends.
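As a rough illustration of the two packages described above, the sketch below runs one small pipeline twice: first on an ordinary in-memory data frame, then on a simulated Microsoft SQL Server table so that the generated SQL can be inspected without a real database. The `patients` data and the helper `older_patients()` are invented for this example. `tbl_lazy()`, `simulate_mssql()` and `show_query()` are dbplyr functions, but the exact SQL printed will depend on the installed dbplyr version.

```r
library(dplyr)
library(dbplyr)

# Hypothetical example data: a few patients and their blood pressure readings.
patients <- data.frame(
  name     = c("Ann", "Bob", "Cat", "Dan"),
  age      = c(34, 61, 47, 72),
  systolic = c(118, 142, 131, 155)
)

# One pipeline of common dplyr verbs, wrapped in a (hypothetical) helper
# so the same code can be applied to a local table and to a database table.
older_patients <- function(tbl) {
  tbl %>%
    filter(age >= 40) %>%                   # filtering: keep matching rows
    mutate(high_bp = systolic >= 140) %>%   # changing: add a derived column
    select(name, systolic, high_bp) %>%     # selection: keep some columns
    arrange(desc(systolic))                 # sorting: highest reading first
}

# 1. dplyr on an ordinary in-memory data frame.
local_result <- older_patients(patients)

# Summarizing: collapse the whole table to a single row.
local_summary <- patients %>% summarise(mean_systolic = mean(systolic))

# 2. dbplyr: the same pipeline against a simulated SQL Server table.
#    No database connection is made; dbplyr only builds the SQL text.
lazy_patients <- tbl_lazy(patients, con = simulate_mssql())
lazy_result   <- older_patients(lazy_patients)

# Print the SQL that dbplyr would send to the database.
show_query(lazy_result)
```

With a real database, the only change would be how the table is obtained (for example through a DBI connection); the pipeline itself stays the same.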
My contributions to the free and open-source dbplyr are (ironically) related to dbplyr operation with Microsoft SQL Server (‘MSSQL’).
To Microsoft’s credit, the basic versions of Microsoft SQL Server are freely
available, as are client libraries (for use in Linux), and Microsoft also provides
extensive freely available documentation.
As of 21st December 2020, my two accepted contributions (‘pull requests’) are:
NUMERIC converts floating point numbers to integers, which is not what is intended for
try_cast allows more elegant handling of invalid entries. try_cast returns NA (not available) in situations where cast will return an error.
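The difference between the two SQL functions can be seen by asking dbplyr how it would translate R's `as.numeric()` for different SQL Server versions. This is only a sketch: `x` is a hypothetical column name, and the version strings are illustrative choices for an older and a newer server. In SQL Server 2012 and later, `TRY_CAST` returns NULL (seen as NA in R) when a conversion fails, where `CAST` raises an error. Which of the two dbplyr emits depends on the installed dbplyr release and the server version it is told about.

```r
library(dbplyr)

# simulate_mssql() stands in for a real connection. The version strings are
# assumptions chosen to represent an older and a newer SQL Server release.
old_server <- simulate_mssql(version = "10.0")
new_server <- simulate_mssql(version = "15.0")

# Ask dbplyr how it would translate as.numeric() for each server.
# `x` is a hypothetical column name.
sql_old <- translate_sql(as.numeric(x), con = old_server)
sql_new <- translate_sql(as.numeric(x), con = new_server)

# Compare the two translations side by side.
print(sql_old)
print(sql_new)
```

Printing both translations makes it easy to check, on your own installation, whether the newer server gets the more forgiving `TRY_CAST` form.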
As of 21st December 2020, I also have a currently open contribution (‘pull request’) to fix an error in my second contribution.
What I really would like to say is just how friendly Hadley Wickham and others
have been in helping me contribute to and improve dbplyr.
Both in initial discussion and in the process of doing a ‘pull request’, Hadley and Kirill Müller have answered the simplest of queries, amended my super-clumsy code and really encouraged me along! Hadley is an adjunct professor and something of a data science legend. I have not attended a formal computer programming class at high school, university or trade school, so I’m really humbled to feel like a valued contributor to the data science world.
(And why am I so interested in improving the operation of dbplyr?
It is because I use dplyr to interrogate the Best Practice
electronic medical record patient information database with my ‘near future’
patient care quality improvement tool GPstat!.)