These examples use the heart attack data, which comes with the following description of its variables:
Heart Attack Patients
This data set contains all hospital discharges in New York State in 1993 for patients admitted with a diagnosis of Acute Myocardial Infarction (AMI), also called a heart attack, who did not have surgery. There are 12,844 cases.
AGE gives age in years.
SEX is coded M for males and F for females.
DIAGNOSIS is an International Classification of Diseases, 9th Edition, Clinical Modification (ICD-9-CM) code. These codes tell which part of the heart was affected.
DRG is the Diagnosis Related Group, which groups together patients with similar management. This data set has just three different DRGs:
121 for AMIs with cardiovascular complications where the patient did not die.
122 for AMIs without cardiovascular complications where the patient did not die.
123 for AMIs where the patient died.
LOS gives the hospital length of stay in days.
DIED is 1 for patients who died in hospital and 0 otherwise.
CHARGES gives the total hospital charges in dollars.
Data provided by Health Process Management of Doylestown, PA.
This is a very large data set, so it is provided as zip files (you may need a program such as WinZip to unzip them). Plain-text (tab-separated) and Excel versions of the data are available.
Getting a table with several variables into R is a bit complicated, so use this file, which contains only the data for the DIED variable. Save it in R's working directory (the command getwd() shows which directory that is). If you name the file DIED4R.txt, you can read in the data with this R command:
> died = scan(file="DIED4R.txt")
Read 12844 items
This puts the data into a variable called died. Use table() on this variable to get the counts if you do not already have them.
> table(died)
died
0 1
11434 1410
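Because DIED is coded 0/1, the proportion who died is just the mean of the vector. The sketch below rebuilds a vector with the same counts as the table (an assumption made only so the code runs without DIED4R.txt); with the real file you would use the died vector created by scan() instead. The names n, x and p_hat are illustrative.

```r
# Rebuild a 0/1 vector with the same counts reported by table(died).
# Assumption: only done so this snippet runs without DIED4R.txt.
died <- c(rep(0, 11434), rep(1, 1410))

n <- length(died)    # total number of patients
x <- sum(died)       # number who died (DIED == 1)
p_hat <- x / n       # sample proportion, identical to mean(died)

counts <- table(died)          # counts of 0s and 1s
props <- prop.table(counts)    # the same counts as proportions of n
print(counts)
print(round(props, 4))
print(p_hat)
```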
So 1410 of the 12,844 patients died. A single command, prop.test(), gives a confidence interval for the proportion and tests it against any hypothesized value p0. Here we compare this proportion to a (hypothetical) usual mortality rate of 10%. Ignore the X-squared value and use the p-value for the hypothesis test.
> prop.test(1410,12844,p=0.1)
1-sample proportions test with continuity correction
data: 1410 out of 12844, null probability 0.1
X-squared = 13.5385, df = 1, p-value = 0.0002337
alternative hypothesis: true p is not equal to 0.1
95 percent confidence interval:
0.1044507 0.1153421
sample estimates:
p
0.1097789
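The X-squared value and the p-value in this output are linked: the p-value is the upper-tail area of a chi-squared distribution with 1 df beyond X-squared. The sketch below reproduces both by hand, assuming the continuity-corrected one-sample statistic that prop.test() describes in its help page (a correction of 0.5, capped at |x - n*p0|). The variable names x, n, p0, yates, x2 and p_value are illustrative.

```r
# Inputs taken from the prop.test() call above.
x  <- 1410    # deaths
n  <- 12844   # patients
p0 <- 0.1     # hypothesized mortality rate

# Continuity-corrected chi-squared statistic for one proportion.
# The 0.5 correction is capped at |x - n*p0|, as prop.test() does.
yates <- min(0.5, abs(x - n * p0))
x2 <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))

# Two-sided p-value: upper tail of chi-squared with 1 degree of freedom.
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

# prop.test() returns an "htest" object whose parts can be extracted.
res <- prop.test(x, n, p = p0)
print(c(manual = x2, prop.test = unname(res$statistic)))
print(c(manual = p_value, prop.test = res$p.value))
print(res$conf.int)   # 95% confidence interval
print(res$estimate)   # sample proportion
```

Extracting res$p.value or res$conf.int this way lets later calculations use the results directly rather than copying numbers from the printout.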
Getting R to read a table containing more than one variable is more complicated, but you will need to learn how to do this eventually.
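As a first step, here is a minimal sketch using read.delim(), which is read.table() with header = TRUE and sep = "\t", on the same tab-separated layout as the plain-text version of the data. The three rows are invented for illustration and are not records from the data set, and the assumption that the file's first line holds the variable names should be checked against the actual file. With the real file you would pass its file name in place of text = sample_text.

```r
# Invented rows in a tab-separated layout, for illustration only;
# these are NOT records from the heart attack data set.
# Assumption: the first line holds the variable names.
sample_text <- paste(
  "AGE\tSEX\tDIAGNOSIS\tDRG\tLOS\tDIED\tCHARGES",
  "70\tF\t41041\t122\t5\t0\t5000",
  "65\tM\t41091\t121\t8\t0\t9000",
  "80\tF\t41011\t123\t2\t1\t3000",
  sep = "\n"
)

# read.delim() reads tab-separated data with a header row into a data frame.
heart <- read.delim(text = sample_text)

str(heart)                  # one column per variable
print(table(heart$DIED))    # a single column works like the died vector
```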
©2006-2007 Robert W. Hayden