STAT 4520/7520 – Homework 2
Spring 2021
Due: February 22, 2021
1) NMES1988 data
The NMES1988 dataset in the AER package contains information on the demand for medical care in the
US in 1987 and 1988. Specifically, it has the following columns:
• visits: Number of physician office visits.
• nvisits: Number of non-physician office visits.
• ovisits: Number of physician hospital outpatient visits.
• novisits: Number of non-physician hospital outpatient visits.
• emergency: Emergency room visits.
• hospital: Number of hospital stays.
• health: Factor indicating self-perceived health status, levels are “poor”, “average” (reference category),
“excellent”.
• chronic: Number of chronic conditions.
• adl: Factor indicating whether the individual has a condition that limits activities of daily living
(“limited”) or not (“normal”).
• region: Factor indicating region, levels are northeast, midwest, west, other (reference category).
• age: Age in years (divided by 10).
• afam: Factor. Is the individual African-American?
• gender: Factor indicating gender.
• married: Factor. Is the individual married?
• school: Number of years of education.
• income: Family income in USD 10,000.
• employed: Factor. Is the individual employed?
• insurance: Factor. Is the individual covered by private insurance?
• medicaid: Factor. Is the individual covered by Medicaid?
The dataset can be loaded with the command
data(NMES1988, package = "AER")
We are interested in determining if the number of times patients seek medical care depends on the various
patient demographics. For now, we will focus on the number of normal physician office visits, and remove all
other types of visits.
library(dplyr)
NMES1988 <- NMES1988 %>%
  dplyr::select(-c(nvisits, ovisits, novisits, emergency, hospital))
a. Create a histogram of the number of physician office visits. Adjust the number of bins with the breaks
argument to obtain a plot which describes the variable well. Briefly describe your findings.
b. Make some exploratory plots to show the relationship between the response, visits, and insurance, with
and without applying the log function to visits. What do you find? Which plot makes it easier to see
the potential relationship?
c. Build a Poisson regression model with visits as the response and all other variables as possible predictor
variables. Examine the summary output and comment on the significance of the variables as well as the
quality of the model fit via the deviance.
d. Interpret the value of the coefficient for the employed variable.
e. Compute the mean and variance of the visits variable within each value of the school variable. Comment
on the relationship you observe and the viability of a Poisson model for this data.
f. Fit a negative binomial model with visits as the response and all other variables as possible predictor
variables. Examine the summary output and comment on the significance of the variables as well as the
quality of the model fit via the deviance. Is it better than the Poisson model? Overall, do the directions of
the significant relationships make sense?
g. Plot the residuals against the fitted values. Why are there lines of observations on the plot?
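The mean-versus-variance check in part (e) and the Poisson versus negative binomial comparison in part (f) can be illustrated on a different data set. The following is a minimal sketch that assumes the MASS package is installed and uses its quine data (days absent from school) rather than NMES1988; the object names mv, fit_pois, fit_nb and disp are hypothetical and chosen only for this illustration.

```r
# Illustration on MASS::quine (days absent from school), not on NMES1988.
library(MASS)  # provides the quine data and glm.nb()

# Mean and variance of the count within each level of a grouping factor.
# For Poisson data the two columns would be roughly equal.
mv <- aggregate(Days ~ Age, data = quine,
                FUN = function(v) c(mean = mean(v), var = var(v)))
print(mv)

# Same linear predictor under both count models
fit_pois <- glm(Days ~ Age + Sex + Eth + Lrn, family = poisson, data = quine)
fit_nb   <- glm.nb(Days ~ Age + Sex + Eth + Lrn, data = quine)

# Residual deviance per residual degree of freedom; values far above 1
# point to overdispersion relative to the fitted model.
disp <- c(poisson = deviance(fit_pois) / df.residual(fit_pois),
          negbin  = deviance(fit_nb)   / df.residual(fit_nb))
print(round(disp, 2))

# exp(coefficient) is the multiplicative change in the expected count
# relative to the reference level of that factor.
print(round(exp(coef(fit_nb)), 3))
```

The same pattern should carry over to the homework data by swapping in visits as the response and the grouping variable of interest, although the conclusions there have to be drawn from that output.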
2)
The ccancer dataset in the GLMsData package contains the estimated number of deaths from cancer in
three regions of Canada by cancer site and gender.
• Count: the estimated number of deaths by the given cancer; a numeric vector
• Gender: gender; a factor with levels either F (female) or M (male)
• Region: the region; a factor with levels Ontario, Newfoundland or Quebec
• Site: the cancer site; a factor with levels Lung, Colorectal, Breast, Prostate or Pancreas
• Population: the estimated population of the region in 2000/2001; a numeric vector
a. Investigate the data and note the 0 values. Are these counts obtained at random, or are they structural
(impossible to be nonzero)? Remove these observations.
b. Create a variable storing the rate of number of deaths by the given cancer per 10,000 inhabitants. Plot
this against each of the three potential predictors and comment on the relationships.
c. Fit a Poisson rate model for number of deaths using the same variables as predictors. Does this model
fit the data well?
d. Interpret the coefficients of the Region variable.
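A Poisson rate model, as in parts (b) to (d), puts the log of the exposure in the linear predictor as an offset, so that the remaining coefficients describe the log rate. The following is a minimal sketch on a different data set, assuming the MASS package is installed: it uses MASS::Insurance (car insurance claims, with Holders as the exposure) in place of ccancer, and the names ins and fit are hypothetical.

```r
# Illustration on MASS::Insurance, not on ccancer.
library(MASS)  # provides the Insurance data

ins <- Insurance
# Observed rate: claims per 1,000 policyholders
ins$rate <- 1000 * ins$Claims / ins$Holders

# log(E[Claims]) = log(Holders) + linear predictor.
# The offset fixes the coefficient of log(Holders) at 1, so the model
# describes the claim rate rather than the raw count.
fit <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
           family = poisson, data = ins)
print(summary(fit))

# For the unordered factor District, exp(coefficient) is the rate ratio
# of that district relative to the reference district.
print(round(exp(coef(fit)[grep("^District", names(coef(fit)))]), 3))

# Rough goodness-of-fit check: residual deviance against a chi-square
# distribution on the residual degrees of freedom.
print(pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE))
```

For the homework, Population would play the role of Holders, and the Region coefficients would be read as log rate ratios relative to the reference region.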
3)
The kstones dataset in the GLMsData package contains a table summarizing treatment of kidney stones. It
has the variables:
• Counts: the number of subjects in the given classification; a numeric vector
• Size: whether the subject has kidney stones with mean diameter less than 2cm (coded as Small) or
greater than or equal to 2cm (coded as Large); a factor with levels Large and Small
• Method: the treatment method; a factor with levels A (open surgery) or B (percutaneous nephrolithotomy)
• Outcome: the outcome of the stated treatment; a factor with levels Failure and Success
a. Print a three-way table of the data using the xtabs function. Describe any patterns you see with
respect to how the variables or relationships between variables affect the counts.
b. Fit a partial independence model where Size and Method interact. Does the model fit well?
c. Fit a conditional independence model where Outcome and Method are independent, given the Size of
the kidney stones. Compare this model to the partial independence model.
d. Fit a uniform association model. Compare this to the conditional independence model. Which model
should be used? Why?
e. (7520 only) Interpret the dependence structure in the conditional independence model. In other words,
what does this mean about the relationships between variables and how they determine the counts?
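The sequence of log-linear models in parts (a) to (d) can be illustrated on a different three-way table. The following is a minimal sketch using R's built-in UCBAdmissions data instead of kstones; the names ucb, tab, m_joint, m_cond and m_unif are hypothetical. As a loose analogy only, Dept plays the role of the conditioning variable (Size in the homework), and Admit and Gender play the roles of the other two factors.

```r
# Illustration on the built-in UCBAdmissions table, not on kstones.
ucb <- as.data.frame(UCBAdmissions)  # columns: Admit, Gender, Dept, Freq

# Three-way table of counts; ftable() flattens it for printing
tab <- xtabs(Freq ~ Gender + Admit + Dept, data = ucb)
print(ftable(tab))

# Partial (joint) independence: Gender and Dept interact,
# Admit is independent of both.
m_joint <- glm(Freq ~ Gender * Dept + Admit, family = poisson, data = ucb)

# Conditional independence: Admit and Gender are independent given Dept.
m_cond <- glm(Freq ~ Gender * Dept + Admit * Dept, family = poisson, data = ucb)

# Uniform association: all two-way interactions, no three-way term.
m_unif <- glm(Freq ~ (Gender + Admit + Dept)^2, family = poisson, data = ucb)

# The models are nested, so successive drops in deviance can be
# compared with chi-square (likelihood ratio) tests.
print(anova(m_joint, m_cond, m_unif, test = "Chisq"))

# Residual deviance and degrees of freedom for each model
print(sapply(list(joint = m_joint, cond = m_cond, unif = m_unif),
             function(m) c(deviance = deviance(m), df = df.residual(m))))
```

A small residual deviance relative to its degrees of freedom suggests the simpler dependence structure is adequate; a large one suggests the omitted interaction is needed.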