Thursday, June 21, 2012

R and the web (for beginners), Part I: How is the local nuclear plant doing?



One of the things I especially like about R is how easily it can access and process data from the web. If you are new to R, or have never used it to access data from the Internet, this is the first in a short series of posts with examples to get you started. This first post gives a very simple example of how to access a data set that is stored online.

This is particularly useful if the data set is updated frequently and you want to regenerate a statistical report based on it. With the analysis and the link to the data in one R script, you only have to rerun the script whenever you want to update your report or somebody else wants to reproduce it. The data I'm using for this example is exactly of that type: a file published by the United States Nuclear Regulatory Commission (U.S. NRC) that reports the power reactor status of U.S. nuclear power plants for the last 365 days and is therefore updated every day.

How do you access that data directly from R?

# First: save the URL of the data file as a character string

url.npower <- "http://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/PowerReactorStatusForLast365Days.txt"


# then read the data file into R (in this case the data file is a text file with "|" separating the columns) 

npower <- read.table(url.npower, sep="|", header=TRUE)

# and format the date column

npower$ReportDt <- as.Date(npower$ReportDt, format="%m/%d/%Y")
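To see what these steps do without depending on the network, here is a minimal sketch that runs the same read.table and as.Date calls on a few made-up lines in the same "|"-separated layout. The column names ReportDt, Unit, and Power are the ones used in this post; the plant names and values are invented for illustration, and `demo` is just a placeholder name.

```r
# Made-up sample in the same "|"-separated layout (values are illustrative only)
txt <- "ReportDt|Unit|Power
06/20/2012|Plant A 1|100
06/20/2012|Plant B 1|0
06/19/2012|Plant A 1|98"

# read.table() can parse a string via its text= argument instead of a URL or file
demo <- read.table(text = txt, sep = "|", header = TRUE, stringsAsFactors = FALSE)

# convert the month/day/year strings into proper Date objects
demo$ReportDt <- as.Date(demo$ReportDt, format = "%m/%d/%Y")

str(demo)             # inspect column types after import
range(demo$ReportDt)  # earliest and latest report date
```

Running str() on the real npower data frame in the same way is a quick check that the date column was converted as intended.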


The data set is now ready for analysis. For example, here is a graphical overview of the recent power reactor status of some of the nuclear power plants:

# load the necessary lattice package
# (if it isn't installed yet, run: install.packages("lattice"))
library(lattice)

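# take a subset of the data: all rows belonging to the first 24 units
# (%in% tests membership; == would recycle the 24 names element-wise and drop rows)
npower.sample <- npower[npower$Unit %in% unique(npower$Unit)[1:24], ]

# get a graphical overview
xyplot(Power ~ ReportDt | Unit, data=npower.sample, type="l", col.line="black", xlab="Time", ylab="Power")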
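A note on selecting rows for a set of units: comparing a column against a vector of several names with == recycles the shorter vector element by element, whereas %in% tests whether each value belongs to the set. The following small sketch, with invented one-letter unit names, shows the difference.

```r
# Invented unit names, for illustration only
units <- c("A", "B", "C", "A", "B", "C")
keep  <- c("A", "B")

# == compares position by position, recycling keep as A, B, A, B, A, B
eq.match <- units == keep

# %in% asks, for each element of units, whether it occurs anywhere in keep
in.match <- units %in% keep

sum(eq.match)  # 2 rows selected
sum(in.match)  # 4 rows selected
```

For picking all observations of a chosen set of plants, the membership test is usually the one you want.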



Save the code above in an R script, rerun it some days later, and your graphical analysis will be up to date.
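If you plan to rerun the import regularly, one option is to wrap the reading and date formatting in a small function. The sketch below is only one way to do it: load_npower is a hypothetical helper name (not part of the original script), and it is demonstrated on a temporary local file with invented values so that it runs without network access.

```r
# Hypothetical helper (not from the original script): read and format a status file.
# src can be a local file path or a URL, since read.table() accepts both.
load_npower <- function(src) {
  dat <- read.table(src, sep = "|", header = TRUE, stringsAsFactors = FALSE)
  dat$ReportDt <- as.Date(dat$ReportDt, format = "%m/%d/%Y")
  dat
}

# Demonstration on a temporary local file with invented values
tmp <- tempfile(fileext = ".txt")
writeLines(c("ReportDt|Unit|Power",
             "06/20/2012|Plant A 1|100",
             "06/21/2012|Plant A 1|97"), tmp)

status <- load_npower(tmp)
max(status$ReportDt)  # most recent report date in the file
```

In the script above you would call load_npower(url.npower) instead; checking max() of the date column afterwards tells you how current the downloaded data is.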
