I wanted to let people know that my new book Handbook of Regression Modeling in People Analytics is now available.
Author Archives: Keith McNulty
Five Tidyverse Tricks You May Not Know About
With recent major updates in two core packages, the tidyverse has substantially improved in the flexible options it offers for data wrangling. Here are five examples of what I mean.
Data Scientists Should Learn Through Play
I first heard of Learning Through Play when I sent my kids to pre-school, but now I realize it’s how all Data Scientists should learn.
Visualizing How Networks Change Over Time
Watching phenomena change over time is a big component of modern data science techniques and is the basis for time series methodologies. However, when it comes to networks, whether of people or something else, I don’t see a lot of work being done on understanding how they change over time. In this article – the last in my project based on the Friends TV series – I look at ways that you can create visualizations of changing networks using R (for more basic methods) and the JavaScript D3 library (for more advanced methods).
Community Detection in R Using Communities of Friends Characters
In this article I will use the community detection capabilities in the igraph package in R to show how to detect communities in a network. By the end of the article we will be able to see how the Louvain community detection algorithm breaks up the Friends characters into distinct communities (ignoring the obvious community of the six main characters), and if you are a fan of the show you can decide if this analysis makes sense to you.
What you need to know about dplyr 1.0.0 – Part 3: Working row-wise
dplyr is now much more friendly to row-by-row operations.
What you need to know about dplyr 1.0.0 – Part 2: more flexible summarise()
Summarise – the original workhorse of dplyr – has been made even more flexible in the new release.
What you need to know about dplyr 1.0.0 – Part 1: The across() adverb
In this article I want to highlight one of the key developments of this release – the across() function.
Simple iterative programming and error handling in R
As you develop as a programmer, there are common situations you will find yourself in. One of those situations is where you need to run your code over a number of iterations of one or more loops, and where you know that your code may fail for at least one iteration. You don’t want your code to stop completely, but you do want to know that it failed and log where it happened. I am going to show a simple example of how to do this here.
Scraping Structured Data From Semi-Structured Documents
One of the most powerful capabilities that data science tools bring to the table is the capacity to deal with unstructured data and to turn it into something that can be structured and analyzed. Any data scientist worth their salt should be able to ‘scrape’ data from documents, whether from the web, locally or any other type of text-based asset.