It’s been a while...

Well, I had promised myself that I would try to blog every couple of weeks, but life got in the way, so here’s the usual “sorry I haven’t updated in a while” post.

To be fair, I have a good number of excuses: I’ve been keeping quite busy with Shiny app development. So far I’ve built four complete applications. Here’s a list!

  1. World Explor-R: This is my first Shiny app ever, and it was quite a massive undertaking. I did not know what I was getting myself into, and I dove really, really deep. I collected a ton of data on every country from the CIA World Factbook, and this app visualizes that data in several different ways. Here’s the link! https://vazgenzakaryan.shinyapps.io/world_explor_r/
  2. Song Sentiment Analyzer: This was a pretty interesting experience. This app fetches the lyrics of up to five songs from the Internet, analyzes them for emotions, and displays the results on a radar chart. I ran into some problems along the way that I solved on my own, and I learned quite a bit from building this! https://vazgenzakaryan.shinyapps.io/song_sentiment/
  3. Text Sentiment Analyzer: A poet friend of mine, Taylor Collier, saw the Song Sentiment Analyzer app and asked if I could modify it to take any text and analyze its sentiment so he could play around with his poems. I obliged, simplifying the code from Song Sentiment Analyzer to turn it into this app. It proved very handy for the next app...
    https://vazgenzakaryan.shinyapps.io/poem_analyzer/
  4. State of the Union Sentiment Analyzer: I took every State of the Union address transcript and used the code from Text Sentiment Analyzer to extract the sentiment from those addresses. This app explores the sentiment in every State of the Union address and lets you compare presidents or political parties. Again, I ran into a few problems I hadn’t seen before, and solving them proved to be a valuable learning experience.
    https://vazgenzakaryan.shinyapps.io/SotU_sentiments/

Aside from these, I also participated in some Kaggle competitions. I did reasonably well, but I found they weren’t the best use of my time. In most real data science jobs, you won’t spend days trying to squeeze 0.5% better performance out of a model; instead, you’ll spend much more of your time cleaning data, which Kaggle doesn’t let you practice because its data is already (mostly) clean. However, along my Kaggle journey I found and read a fantastic book on machine learning, Applied Predictive Modeling by Max Kuhn and Kjell Johnson. I highly recommend it if you’re interested in machine learning with R.

So that’s what I’ve been doing the past few months! I have a couple more Shiny projects in mind that I will be working on alongside job applications, so maybe I’ll update this soon and let you know how it goes!

Vazgen Zakaryan

You learn more by doing

I’ve been a proponent of homework my entire teaching career. Actually sitting down and solving a problem on your own leads you to discover intricacies in methods that a teacher cannot always fully convey. However, since I haven’t done any homework myself in a while, I had completely forgotten this neat little fact. That is, until I decided to write a function in R.

Recently, I ran a survey about board games on Reddit. I used Google Forms, which offers several types of questions. One of them is the “check marks” question, where you check every option that applies from a list. Google Forms can export your results as a CSV file, and in that file a “check mark” question is stored as a single column containing all the options a person selected, separated by commas, like this:

Option A, Option B, Option C
Option A, Option C
Option A, Option B, Option D, Option E
etc.

To analyze these results, I needed each option in its own column, with an entry of 1 if the person selected that option and 0 if they didn’t. So the column above would turn into:

Option A   Option B   Option C   Option D   Option E
1          1          1          0          0
1          0          1          0          0
1          1          0          1          1
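To make the transformation concrete, here is a minimal sketch of it in Python. This is not the R function from this post (which isn't shown here); `split_checkmarks` is a hypothetical name, and the sketch assumes each response is one string with options separated by commas. Splitting on commas will break if an option’s own text contains a comma.

```python
def split_checkmarks(responses, sep=","):
    """Turn check-mark answers into 0/1 indicator rows (hypothetical helper)."""
    # Split each response into the set of options it contains, trimming spaces
    selections = [
        {opt.strip() for opt in response.split(sep) if opt.strip()}
        for response in responses
    ]
    # Every option anyone selected becomes a column, sorted for a stable order
    options = sorted(set().union(*selections))
    # 1 if the respondent checked the option, 0 otherwise
    rows = [[int(opt in chosen) for opt in options] for chosen in selections]
    return options, rows


responses = [
    "Option A, Option B, Option C",
    "Option A, Option C",
    "Option A, Option B, Option D, Option E",
]
header, table = split_checkmarks(responses)
print(header)
for row in table:
    print(row)
```

Because the columns are built only from the answers themselves, an option that nobody checked won’t get a column at all.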

I had been thinking about how to accomplish this for a little while. From the courses on DataCamp, I knew there was a way to turn a column of single entries into several columns of zeroes and ones, and there was also a way to split a column of comma-separated entries into multiple columns. Sure, if I combined those two tools and fiddled with them enough, I could probably have figured it out eventually. Instead, I decided to try writing my own function for my specific purpose. I thought about it for about a day, followed some classic advice from some very good programmers, and wrote a working prototype in about 30 minutes. It didn’t have many options, and it used nested loops (a pretty deadly sin in R), but it did what it set out to do.
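For comparison, here is what a deliberately simple “just make it work” version could look like, using nested loops like the prototype mentioned above. It’s a Python sketch, not the original R code (which isn’t shown in this post); `checkmarks_prototype` is a hypothetical name, and commas are assumed as the only separator.

```python
def checkmarks_prototype(responses):
    """Nested-loop version: easy to check by hand, not built for speed."""
    # First pass: collect distinct options in the order they first appear
    options = []
    for response in responses:
        for opt in response.split(","):
            opt = opt.strip()
            if opt and opt not in options:
                options.append(opt)

    # Second pass: for every respondent and every option, record 1 or 0
    rows = []
    for response in responses:
        chosen = [opt.strip() for opt in response.split(",")]
        row = []
        for opt in options:
            row.append(1 if opt in chosen else 0)
        rows.append(row)
    return options, rows


header, table = checkmarks_prototype([
    "Option A, Option B, Option C",
    "Option A, Option C",
    "Option A, Option B, Option D, Option E",
])
print(header)
print(table)
```

Keeping the columns in first-appearance order makes it easy to compare the output against the raw CSV by eye, which is most of what a first prototype needs.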

The next day, I took a dive into the *apply family of functions in R (apply, lapply, sapply and friends) and managed to remove both for loops from my function. I also added some extra functionality, such as letting the user choose a separator other than a comma and decide whether to delete the original column.
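The two options mentioned above (a custom separator and whether to keep the original column) map naturally onto function parameters. Here is a hedged Python sketch that works on a table stored as a list of row dictionaries; the function name and the `sep` and `drop_original` parameters are hypothetical, and Python comprehensions stand in for R’s *apply functions.

```python
def expand_checkmarks(records, column, sep=",", drop_original=True):
    """Return new rows with one 0/1 column per option found in `column`."""
    # Parse each row's answer into a set of trimmed options (no explicit loops)
    parsed = [
        {opt.strip() for opt in str(rec.get(column) or "").split(sep) if opt.strip()}
        for rec in records
    ]
    options = sorted(set().union(*parsed))

    def expand(rec, chosen):
        # Copy the row (input is not mutated), optionally dropping the raw column
        new = {k: v for k, v in rec.items() if not (drop_original and k == column)}
        # Add one indicator per option; an option named like an existing column
        # would overwrite it, which this sketch does not guard against
        new.update({opt: int(opt in chosen) for opt in options})
        return new

    return [expand(rec, chosen) for rec, chosen in zip(records, parsed)]


survey = [
    {"id": 1, "games": "Chess; Go"},
    {"id": 2, "games": "Go"},
]
print(expand_checkmarks(survey, "games", sep=";"))
print(expand_checkmarks(survey, "games", sep=";", drop_original=False))
```

Defaulting `drop_original` to `True` matches the common case of replacing the raw column, while still letting you keep it around for checking.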

Lately I’ve been brainstorming a bit more functionality. I’m considering letting the user choose where to insert the new columns (in place of the original column or at the very end of the data frame), among a couple of other options. The function could also use better readability and more comments. I’ll probably use it as a project for figuring out GitHub later and make a separate post about that 🙂
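One way the “where to insert the new columns” idea could work is to compute the final column order separately from the data itself. This is a hypothetical Python helper; its name and the `position` values `"replace"` and `"end"` are assumptions for illustration, not an existing API.

```python
def order_columns(columns, original, new_columns, position="replace", keep_original=False):
    """Return a column order with `new_columns` placed according to `position`."""
    columns, new_columns = list(columns), list(new_columns)
    i = columns.index(original)  # raises ValueError if `original` is missing
    before, after = columns[:i], columns[i + 1:]
    middle = [original] if keep_original else []
    if position == "replace":
        # Splice the new columns in right where the original column sat
        return before + middle + new_columns + after
    if position == "end":
        # Leave the existing columns where they are and append the new ones
        return before + middle + after + new_columns
    raise ValueError("position must be 'replace' or 'end'")


cols = ["id", "games", "age"]
opts = ["Chess", "Go"]
print(order_columns(cols, "games", opts))                  # ['id', 'Chess', 'Go', 'age']
print(order_columns(cols, "games", opts, position="end"))  # ['id', 'age', 'Chess', 'Go']
```

Keeping the ordering logic apart from the 0/1 logic keeps each piece small and easy to test on its own.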

Writing this function taught me a few lessons that reading someone else’s code or listening to a lecture would never have driven home. For example, I now very clearly understand the difference between single- and double-bracket subsetting on data frames: df["col"] returns a one-column data frame, while df[["col"]] returns the column itself as a vector. It’s just not something I would have remembered if someone had simply told me; I had to experience it on my own through a simple homework assignment.

So the process seems fairly simple! Come up with a solution that works, no matter how inefficient, then improve its efficiency and add options. I guess that applies to solving any problem, though, doesn’t it?

Vazgen Zakaryan