I’ve been a proponent of homework my entire teaching career. Actually sitting down and solving a problem on your own will lead you to discover intricacies in methods that a teacher cannot always fully convey. However, since I personally haven’t done homework in a while, I completely forgot about this neat little factoid. That is, until I decided to write a function in R.
Recently, I ran a survey about board games on Reddit. I used Google Forms, which offers several types of questions. One type is the “check marks” question, where you check every option that applies from a list. Google Forms generates a CSV file of your results, and in that file a “check mark” question is stored as a single column containing all the options a person selected, separated by commas, like so:
Option A, Option B, Option C
Option A, Option C
Option A, Option B, Option D, Option E
etc.
To analyze these results, I needed each option in its own column, with an entry of 1 if the respondent selected that option and 0 if they did not. The column above would become:
| Option A | Option B | Option C | Option D | Option E |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 |
I had been thinking about how to accomplish this for a while. From DataCamp courses, I knew there was a way to turn a column of single entries into several columns of zeroes and ones, and also a way to split a column of comma-separated entries into multiple columns. Sure, if I played with those two tools enough, I could probably have figured it out eventually. Instead, I decided to try writing my own function for my specific purpose. I thought about it for about a day, followed some classic advice from some very good programmers, and wrote a working prototype in about 30 minutes. It didn’t have many options and it used nested loops (a pretty deadly sin in R), but it did what it set out to do.
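For readers who want to see the idea in code, here is a minimal sketch of what a nested-loop prototype could look like. It is not the original function: the name `split_checkboxes_loop`, its arguments, and the `survey` data frame with a `games` column are all hypothetical. It assumes every respondent checked at least one option (no empty or missing answers) and that options are separated by a comma followed by optional spaces.

```r
# Hypothetical sketch of a nested-loop prototype (not the author's original code).
# Turns one column of comma-separated "check mark" answers into one 0/1 column
# per option. Assumes no empty or missing answers.
split_checkboxes_loop <- function(df, col) {
  # Split each response into its options: "Option A, Option C" -> c("Option A", "Option C")
  answers <- strsplit(as.character(df[[col]]), ",\\s*")
  options <- sort(unique(unlist(answers)))

  # Create one column per option, filled with 0
  for (opt in options) {
    df[[opt]] <- 0
  }

  # Outer loop over respondents, inner loop over the options each one checked
  for (i in seq_along(answers)) {
    for (opt in answers[[i]]) {
      df[i, opt] <- 1
    }
  }
  df
}

survey <- data.frame(
  games = c("Option A, Option B, Option C",
            "Option A, Option C",
            "Option A, Option B, Option D, Option E"),
  stringsAsFactors = FALSE
)
split_checkboxes_loop(survey, "games")
```

Each row of the result gets a 1 in the column of every option that respondent checked, matching the table above.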
The next day, I took a dive into the apply family of functions in R and managed to eliminate both for loops from my function. I also added some extra options: choosing a separator other than a comma, and choosing whether to delete the original column.
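A loop-free variant using `lapply()` and `vapply()` might look like the sketch below. Again, this rests on assumptions: `split_checkboxes`, `sep`, and `remove` are hypothetical names chosen to mirror the two options described above, the separator is matched literally rather than as a regular expression, and empty or missing answers are not handled.

```r
# Hypothetical loop-free sketch using the apply family (not the author's code).
# sep:    separator between options, matched literally (default ",")
# remove: if TRUE, drop the original column after creating the new ones
split_checkboxes <- function(df, col, sep = ",", remove = TRUE) {
  # Split each answer on the separator, then trim surrounding whitespace
  answers <- lapply(strsplit(as.character(df[[col]]), sep, fixed = TRUE), trimws)
  options <- sort(unique(unlist(answers)))

  # For each option, an integer vector with 1 where the respondent checked it
  dummies <- lapply(options, function(opt) {
    vapply(answers, function(a) as.integer(opt %in% a), integer(1))
  })

  # Assigning a list to new column names appends them and keeps names like "Option A"
  df[options] <- dummies
  if (remove) {
    df[[col]] <- NULL
  }
  df
}

survey <- data.frame(
  id = 1:3,
  games = c("Option A, Option B, Option C",
            "Option A, Option C",
            "Option A, Option B, Option D, Option E"),
  stringsAsFactors = FALSE
)
split_checkboxes(survey, "games")
split_checkboxes(survey, "games", remove = FALSE)
```

Passing `sep = ";"` would handle semicolon-separated answers, and `remove = FALSE` keeps the original column alongside the new ones.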
Lately I’ve been brainstorming a bit more functionality. Among a couple of other options, I am considering letting the user choose where to insert the new columns (in place of the original or at the very end of the data frame). The function could also use better readability and more comments. I’ll probably use it to figure out GitHub later and make a separate post about that 🙂
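One way such a placement option could work is to append the new columns first and then reorder. The helper `insert_columns` and its `where` argument are hypothetical; the sketch assumes `new_cols` is a data frame with the same number of rows as `df`, and that `"in_place"` means the new block replaces the original column at its position.

```r
# Hypothetical helper: place a block of new columns either at the end of the
# data frame or where the original column was (replacing that column).
insert_columns <- function(df, col, new_cols, where = c("end", "in_place")) {
  where <- match.arg(where)
  out <- cbind(df, new_cols)  # new columns appended at the end
  if (where == "in_place") {
    pos <- match(col, names(df))
    stopifnot(!is.na(pos))
    new_idx <- ncol(df) + seq_len(ncol(new_cols))
    # Columns before the original, then the new block, then the remaining
    # columns; the original column (at pos) is left out
    keep <- c(seq_len(pos - 1), new_idx, seq_len(ncol(df))[-seq_len(pos)])
    out <- out[keep]
  }
  out
}

survey <- data.frame(id = 1:2, games = c("A, B", "B"), score = c(5, 6))
dummies <- data.frame(A = c(1L, 0L), B = c(1L, 1L))
names(insert_columns(survey, "games", dummies))              # "id" "games" "score" "A" "B"
names(insert_columns(survey, "games", dummies, "in_place"))  # "id" "A" "B" "score"
```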
Writing this function taught me a few lessons that reading someone else’s code or listening to a lecture would never have driven home. For example, I now clearly understand the difference between single- and double-bracket subsetting on data frames. It’s just not something I would have remembered if someone had told me, unless I had experienced it on my own through a simple homework assignment.
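As a generic illustration (not necessarily the exact situation I ran into), single brackets return a data frame, while double brackets return the column's underlying vector. The `survey` data frame here is made up for the example.

```r
survey <- data.frame(
  id = 1:3,
  games = c("Option A, Option B", "Option A", "Option C"),
  stringsAsFactors = FALSE
)

one <- survey["games"]    # single brackets: a data frame with one column
two <- survey[["games"]]  # double brackets: the column itself, a character vector

class(one)  # "data.frame"
class(two)  # "character"

# strsplit() expects a character vector, so it works on `two`...
strsplit(two, ", ")
# ...while strsplit(one, ", ") stops with a "non-character argument" error
```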
So the process seems fairly simple! Come up with something that works, no matter how inefficient, then improve its efficiency and add options. I guess that applies to solving any problem, though, doesn’t it?
Vazgen Zakaryan