If you haven’t already done so, read in the demographics information from the last section and create the height column:
To read in the data, you use the read.csv() function.
If you have the file on your local disk, you can give the path name in the
parentheses. Or, in this case, you can read it straight from the website.
(The commands below don't print the data. To look at just the first few
rows, type head(student).)
> student <- read.csv("http://evc-cit.info/psych018/r_intro/demographics.csv")
> student$height <- student$feet * 12 + student$inches   # total height in inches
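You can try the height calculation without downloading anything. This minimal sketch builds a small data frame with the same column names this tutorial uses (gender, feet, inches, weight). The values are invented for illustration. Only the height formula comes from the commands above.

```r
# Hypothetical stand-in for the demographics file: same column names
# as in the tutorial, made-up values.
student <- data.frame(
  gender = c("F", "M", "F", "M", "F"),
  feet   = c(5, 5, 5, 6, 5),
  inches = c(4, 10, 2, 1, 6),
  weight = c(120, 165, 130, 180, 125)
)

# Column arithmetic is vectorized: R combines each row's feet and
# inches into one height in inches, with no loop needed.
student$height <- student$feet * 12 + student$inches
print(student$height)
```

Because the calculation works on whole columns at once, the same line works unchanged whether the data frame has five rows or five thousand.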
To get a histogram of the heights, just do this:
> hist(student$height)
R draws the histogram in a graphics window. Note that the default title ("Histogram of student$height") and x-axis label ("student$height") leave something to be desired.
To fix this, add two extra parameters to the call to hist: main sets the
plot title and xlab sets the x-axis label. The command is split across two
lines. Don't type the plus sign; R puts it there for you because the first
line isn't a complete R command.
> hist(student$height, main="Distribution of Heights",
+   xlab="Height in Inches")
Now the labels are much nicer.
Notice that R chose the x-axis range and tick marks on its own, and they may
not cover the range you want. You can specify the lower and upper limits for
the x-axis with the xlim parameter. It requires two numbers, so you need to
use c() to combine them into a vector. In this example,
the plot will give the x-axis a range from 50 to 80 inches.
> hist(student$height, main="Distribution of Heights",
+   xlab="Height in Inches", xlim=c(50,80))
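The x-axis limits and the histogram bins are separate settings. This sketch uses made-up heights and calls hist() with plot=FALSE, so R returns the computed bins instead of drawing them. The breaks parameter is not used in the commands above. It is one way to choose the bin edges yourself.

```r
# Made-up heights in inches, for illustration only.
heights <- c(64, 70, 62, 73, 66, 61, 68, 65, 71, 63)

# plot = FALSE computes the histogram without drawing it. By default the
# number of bins comes from Sturges' rule, and the edges are rounded to
# "pretty" values.
h_default <- hist(heights, plot = FALSE)
print(h_default$breaks)

# xlim only changes the visible axis range. To control the bins
# themselves, pass explicit bin edges (here every 2 inches) via breaks.
h_custom <- hist(heights, breaks = seq(50, 80, by = 2), plot = FALSE)
print(h_custom$breaks)
print(h_custom$counts)
```

The same breaks argument can be combined with main, xlab, and xlim in a call that draws the plot.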
This shows the frequency (count) of each height range. Sometimes, though,
you care about each range's share of the total rather than a raw count. To
see that, add freq=F to the command. In R, F
is an abbreviation for FALSE (you may type either one), and
T is short for TRUE. By typing
freq=F, you are telling R that you do not
want frequencies but probability densities instead. A bar's density is its
share of the total divided by the bar's width, so the bar areas, not the
bar heights, add up to 1. Try this:
> hist(student$height, main="Distribution of Heights",
+   xlab="Height in Inches", xlim=c(50,80),
+   freq=F)
Note the y-axis: it is now labeled Density instead of Frequency.
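To see how density relates to the share of the total, this sketch multiplies each bar's density by its bin width. It uses made-up heights and hand-picked 2-inch bins. The products are the fraction of observations in each bin, and they add up to 1.

```r
heights <- c(64, 70, 62, 73, 66, 61, 68, 65, 71, 63)  # made-up values
h <- hist(heights, breaks = seq(50, 80, by = 2), plot = FALSE)

bin_width <- diff(h$breaks)        # every bin here is 2 inches wide
share <- h$density * bin_width     # fraction of observations in each bin
print(sum(share))                  # bar areas add up to 1
print(100 * share[h$counts > 0])   # percentage in each non-empty bin
```

With equal-width bins, density is just the share divided by a constant. The density plot therefore has the same shape as the frequency plot, and only the y-axis scale changes.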
Let’s create bar plots for the mean height and weight for males and females. First, separate out the males and females into two new data frames:
> males <- student[student$gender=="M",]
> females <- student[student$gender=="F",]
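The comparison inside the brackets produces one TRUE or FALSE value per row. This self-contained sketch uses a tiny invented data frame to show that vector. It also compares the bracket form with R's subset() function, which performs the same row selection.

```r
# Invented example data with the columns used in this section.
student <- data.frame(
  gender = c("F", "M", "F", "M", "F"),
  height = c(64, 70, 62, 73, 66),
  weight = c(120, 165, 130, 180, 125)
)

# One TRUE/FALSE per row; TRUE rows are kept. The empty spot after the
# comma means "keep every column".
is_male <- student$gender == "M"
print(is_male)
males   <- student[is_male, ]
females <- student[student$gender == "F", ]

# subset() selects the same rows, and lets you name the column directly.
males2 <- subset(student, gender == "M")
```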
Now create a data frame with the numbers you want. This time, rather than reading in a data frame from an external file, you are creating it one column at a time. The first column is gender, the second is height, and the third is weight.
> hwmean <- data.frame(gender=c("F", "M"),
+   height=c(mean(females$height), mean(males$height)),
+   weight=c(mean(females$weight), mean(males$weight)))
> hwmean
  gender   height weight
1      F 62.76000 126.42
2      M 68.46154 161.00
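Typing out one mean() call per group gets tedious when there are more groups. As an alternative to the approach above, aggregate() with a formula computes the same per-gender means in one call. The data frame here is invented, so the numbers will differ from the real output.

```r
# Invented example data with the same column names as the tutorial.
student <- data.frame(
  gender = c("F", "M", "F", "M", "F"),
  height = c(64, 70, 62, 73, 66),
  weight = c(120, 165, 130, 180, 125)
)

# cbind(height, weight) ~ gender means: for each gender, apply FUN
# to both the height and weight columns.
hwmean2 <- aggregate(cbind(height, weight) ~ gender, data = student, FUN = mean)
print(hwmean2)
```

The result has the same gender/height/weight layout as hwmean, with one row per gender sorted alphabetically.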
You can then do a barplot of the heights, with axes labeled properly:
> barplot(hwmean$height, xlab="Gender", ylab="Mean Height in Inches", names.arg=hwmean$gender)
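You can plot the weights the same way. This sketch copies the means from the printed hwmean output above. It draws to a null pdf device so it runs without a graphics window; in an interactive session, leave out the pdf(NULL) and dev.off() lines. It spells out names.arg, the argument that supplies the bar labels (R also accepts the abbreviation names=). barplot() also invisibly returns the x position of each bar's midpoint, which helps if you want to add text above the bars.

```r
# Means copied from the hwmean output shown in this section.
hwmean <- data.frame(gender = c("F", "M"),
                     height = c(62.76000, 68.46154),
                     weight = c(126.42, 161.00))

pdf(NULL)  # null device: nothing is displayed or saved
mids <- barplot(hwmean$weight, names.arg = hwmean$gender,
                xlab = "Gender", ylab = "Mean Weight")
invisible(dev.off())
print(mids)  # x coordinates of the bar midpoints
```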