Great R packages for data import, wrangling, and visualization

The table below shows my favorite go-to R packages for data import, wrangling, visualization and analysis, plus a few miscellaneous tasks tossed in. The package names in the table are clickable if you want more information. To find out more about a package once you’ve installed it, type help(package = "packagename") in your R console (substituting the actual package name, of course).
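As a minimal sketch of that install-then-get-help workflow: ensure_package() is a hypothetical helper name used only for illustration, and dplyr stands in for whichever package you are curious about. It assumes the package is on CRAN and that a CRAN mirror is available in your session.

```r
# Hypothetical helper (not from any package): install a package only if it
# is missing, then attach it for the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from your configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # pkg is a string, so character.only is needed
  invisible(TRUE)
}

# Only run the example in an interactive console session
if (interactive()) {
  ensure_package("dplyr")   # example package; substitute your own
  help(package = "dplyr")   # opens the package's documentation index
}
```

Checking with requireNamespace() first avoids re-downloading a package that is already installed.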

My favorite R packages for data visualization and munging

Package
Category
Description
Sample Use
Author

dplyr
data wrangling, data analysis
The essential data-munging R package when working with data frames. Especially useful for operating on data by categories. CRAN.
See the intro vignette
Hadley Wickham

purrr
data wrangling
purrr makes it easy to apply a function to each item in a list and return results in the format of your choice. It’s more complex to learn than the older plyr package, but also more robust. Its functions are more standardized than base R’s apply family, and it has functions for tasks like error-checking. CRAN.
map_df(mylist, myfunction)
More: Charlotte Wickham’s purrr tutorial video, the purrr cheat sheet PDF download, easy error checking with purrr’s possibly.
Hadley Wickham

readxl
data import
Fast way to read Excel files in R, without dependencies such as Java. CRAN.
read_excel("my-spreadsheet.xls", sheet = 1)
Hadley Wickham

readr and vroom
data import
Base R handles most of these tasks, but if you have huge files, these packages offer a faster and more standardized way to read CSVs and similar files into R. readr has been around for a while; vroom is a speedier alternative, useful for larger data sets. Eventually the packages may merge. data.table’s fread() is another useful alternative. CRAN.
read_csv("myfile.csv") or vroom("myfile.csv")
Hadley Wickham (readr), Jim Hester (vroom)

rio
data import, data export
rio has a good idea: Pull a lot of separate data-reading packages into one, so you just need to remember 2 functions: import and export. CRAN.
import("myfile")
Thomas J. Leeper & others

tidyxl
data import, data wrangling
If you’ve ever wanted to tear your hair out over an Excel file with merged cells, data in column headers, headers mixed in data, and key information in color coding, this is the package for you. Each cell is imported in its own row, with information about data type, position, and color, not just value, allowing you to reshape the data from there. Super time saver for messy data. CRAN.
xlsx_cells("my_nightmare_file.xlsx")
Duncan Garmonsway

Hmisc
data analysis
There are a number of useful functions in here. Two of my favorites: describe, a more robust summary function, and Cs, which creates a vector of quoted character strings from unquoted comma-separated text. Cs(so, it, goes) creates c("so", "it", "goes"). CRAN.
describe(mydf)
Cs(so, it, goes)
Frank E Harrell Jr & others

datapasta
data import
Data copy and paste: Meet reproducible research. If you’ve copied data from the Web, a spreadsheet, or other source into your clipboard, datapasta lets you paste it into R as an R object, with the code to reproduce it. It includes RStudio add-ins as well as command-line functions for transposing data, turning it into markdown format, and more. CRAN.
df_paste() to create a data frame, vector_paste() to create a vector.
Miles McBain

sqldf
data wrangling, data analysis
Do you know a great SQL query you’d use if your R data frame were in a SQL database? Run SQL queries on your data frame with sqldf. CRAN.
sqldf("select * from mydf where mycol > 4")
G. Grothendieck

jsonlite
data import, data wrangling
Parse JSON within R or turn R data frames into JSON. CRAN.
myjson <- toJSON(mydf)
Jeroen Ooms & others

magrittr
data wrangling
This package gave us the %>% symbol for chaining R operations, but it’s got other useful operators such as %<>% for mutating a data frame in place and . as a placeholder for the original object being operated upon. CRAN.
mydf %<>% mutate(newcol = myfun(colname))
Stefan Milton Bache & Hadley Wickham

validate
data wrangling
Intuitive data validation based on rules you can define, save and re-use. CRAN.
See the introductory vignette.
Mark van der Loo & Edwin de Jonge

testthat
programming
Package that makes it easy to write unit tests for your R code. CRAN.
See the testing chapter of Hadley Wickham’s book on R packages.
Hadley Wickham

data.table
data wrangling, data analysis
Popular package for heavy-duty data wrangling and computation. While I often prefer dplyr for basic analysis, data.table has become my go-to for large data sets or when speed is critical (such as in Shiny apps). CRAN.
data.table in 5 minutes video, The ultimate data.table cheat sheet, Intro vignette
Matt Dowle & others

stringr
data wrangling
Numerous functions for text manipulation. Some are similar to existing base R functions but in a more standard format, including working with regular expressions. Some of my favorites: str_pad and str_trim. CRAN.
str_pad(myzipcodevector, 5, "left", "0")
Hadley Wickham

lubridate
data wrangling
Everything you ever wanted to do with date arithmetic, although understanding & using available functionality can be somewhat complex. CRAN.
mdy("05/06/2015") + months(1)
More examples in the package vignette
Garrett Grolemund, Hadley Wickham & others

DataExplorer
data analysis
Not sure where to get started looking at a data set? Want to get a basic handle on that data without running multiple commands like str() and plot()? DataExplorer attempts to offer one-click report generation to show and visualize basics about a data set, such as distributions and missing data. CRAN.
create_report(mydataframe)
Boxuan Cui

zoo
data wrangling, data analysis
Robust package with a slew of functions for dealing with time series data; I like the handy rollmean function with its align = "right" and fill = NA options for calculating moving averages. CRAN.
rollmean(mydf, 7)
Achim Zeileis & others

tsbox
data wrangling, data analysis
Super easy way to convert data between different R time-series data formats: xts, data frame, zoo, tsibble, and more. Plus some basic analysis functions. CRAN.
ts_zoo(mydf)
Christoph Sax

knitr and rmarkdown
data display
Add R to a markdown document and easily generate reports in HTML, Word and other formats. A must-have if you’re interested in reproducible research and automating the journey from data analysis to report creation. CRAN.
See the Minimal Examples knitr page and RStudio’s R Markdown page.
Yihui Xie & others (knitr), RStudio (rmarkdown)

remedy
data display
RStudio add-in offers a menu for R Markdown formatting commands, so you no longer need to remember and/or type code for things like making an HTML list or embedding a YouTube video. While WYSIWYG editing is now available for R Markdown, this add-in still has benefits: Its commands can be assigned custom keyboard shortcuts, so you can create your own shortcuts for tasks like bolding text. GitHub.
See the package website.
Colin Fay & others

ymlthis
data display
Another useful RStudio add-in for R Markdown, this helps you generate YAML headers with proper format. GitHub.
See the package website.
Malcolm Barrett & Richard Iannone

officer
data display
Import and edit Microsoft Word and PowerPoint documents, making it easy to add R-generated analysis and visualizations to existing as well as new reports and presentations. CRAN.
my_doc %>%
body_add_img(src = myplot)
The package website has many more examples.
David Gohel

listviewer
data display, data wrangling
While RStudio has since added a list-viewing option, this HTML widget still offers an elegant way to view complex nested lists within R. GitHub timelyportfolio/listviewer.
jsonedit(mylist)
Kent Russell

DT and reactable
data display
Create a sortable, searchable table in one line of code with either of these R packages. CRAN.
DT::datatable(mydf)
reactable::reactable(mydf): Quick interactive HTML tables
reactable: Create tables with expandable rows
DT: RStudio
reactable: Greg Lin

ggplot2
data visualization
Powerful, flexible and well-thought-out dataviz package following ‘grammar of graphics’ syntax to create static graphics, but be prepared for a steep learning curve. CRAN.
qplot(factor(myfactor), data=mydf, geom="bar", fill=factor(myfactor))
See my searchable ggplot2 cheat sheet and time-saving code snippets.
Hadley Wickham

patchwork
data visualization
Easily combine ggplot2 plots and keep the new, merged plot a ggplot2 object. plot_layout() adds ability to set columns, rows, and relative sizes of each component graphic. GitHub.
plot1 + plot2 + plot_layout(ncol=1)
Thomas Lin Pedersen

ggforce
data visualization
Adds some design functionality to base ggplot2 including easy labeling of plot groups. CRAN.
See this blog post by RStudio’s Edgar Ruiz for several useful examples.
Thomas Lin Pedersen

plotly
data visualization
R interface to the Plotly JavaScript library that was open-sourced in late 2015. Basic graphs have a distinctive look which may not be for everyone, but it’s full-featured, relatively easy to learn (especially if you know ggplot2) and includes a ggplotly() function to turn graphs created with ggplot2 interactive. CRAN.
d

Source: Computerworld