Step-by-step Guide: Identify two published papers in your field of interest that conduct linear regressions as part of their presented analysis
Module 5 Assignment/Activity:
Theme: Part of being a good researcher/instructor is knowing what you understand and also what you have left to learn.
- Identify two published papers in your field of interest that conduct linear regressions as part of their presented analysis. Be careful not to confuse other estimation techniques with linear regressions.
- Choose a regression from each paper to work on; note that these are usually presented as tables within the paper.
- Insert a PDF copy (not hand reproduced) of the regression Table along with any table notes directly in your write-up.
- Write an explanation of each regression detailing what you can glean from the presented results.
- Provide a reflection of what questions you have about the linear regression examples you chose.
- Note the idea is for you to begin to appreciate that there are more issues beyond what we have covered in this course. It is NOT an admission of weakness, but strength, if you can identify what you have yet to understand.
- Provide a reflection of what made this assignment challenging.
General Assignment Protocol:
- Academic papers chosen must be either published in an academic journal or posted and distributed on SSRN. News and trade journal articles are not allowed.
- All academic papers used in an assignment must be cited correctly. Note that AI frequently provides inaccurate or non-existent citations.
- For each citation within an assignment, a PDF copy must be produced if requested by the instructor.
- No academic paper may be used for more than one assignment, i.e., different academic papers must be used for each assignment/activity within the course and should originate from a diverse set of journal outlets.
- Ideally, academic papers should be chosen based on your research area of interest; however, that is not a requirement, nor is it an excuse for a submission to be outside the protocol or assignment requirements.
- If Generative AI is used, both the source (BARD, ChatGPT) and the prompt (the set of commands fed to the source) must be cited.
- For each assignment that includes your analysis, the computer code, raw output/results must be included in an appendix to your submission.
- Data used in your analysis/work must present a link to its origin and may not be simulated, fabricated, or otherwise created by you.
Failure to abide by this set of protocol items will render your submission ungradable, thereby earning no credit for the work.
Step-by-step Guide
STEP 1: Identify What a Linear Regression Is (and What It Isn’t)
Before searching for papers, make sure you can recognize a true Ordinary Least Squares (OLS) linear regression. It will typically show:
- A dependent variable (outcome) explained by one or more independent variables (predictors)
- Coefficients (β values) with standard errors or t-statistics
- R² or Adjusted R² (goodness of fit)
- p-values or asterisks indicating statistical significance
- A notation like Y = β₀ + β₁X₁ + β₂X₂ + ε
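As a rough illustration of what the notation above means in practice, the sketch below fits Y = β₀ + β₁X₁ + β₂X₂ + ε by least squares with NumPy and then computes R² and Adjusted R². The function name `fit_ols` and the data values are made up purely for illustration; they do not come from any paper, and toy data like this would not satisfy the assignment's data protocol.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: (n, k) array of predictors, without an intercept column.
    y: (n,) array of outcomes.
    Returns (coefficients with the intercept first, R2, adjusted R2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(n), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes the fit for the number of predictors k.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coefs, r2, adj_r2

# Toy data (illustrative only): true relation is y = 10 + 2*x1 - 1*x2 + noise.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.0])
y = 10.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + noise

coefs, r2, adj_r2 = fit_ols(X, y)
print("intercept and slopes:", np.round(coefs, 2))
print("R2:", round(r2, 4), "Adjusted R2:", round(adj_r2, 4))
```

The estimated slopes land close to the values used to build the toy data, and Adjusted R² is slightly below R², which is the pattern to expect in a published OLS table.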
Do NOT confuse with these other techniques:
| Technique | Key Difference |
|---|---|
| Logistic Regression | Outcome is binary (0/1); reports odds ratios |
| Probit Model | Similar to logistic; uses different link function |
| Fixed Effects / Panel Data | Has entity/time fixed effects notation; often still estimated by OLS, so check the Methods section for the estimator used |
| Instrumental Variables (IV) | Reports first-stage F-statistics; uses instruments |
| Poisson Regression | Count data; non-linear |
| Structural Equation Modeling | Path diagrams; latent variables |
STEP 2: Find Two Peer-Reviewed Papers with Linear Regression Tables
Where to search:
- Google Scholar → scholar.google.com
- PubMed (for health/nursing) → pubmed.ncbi.nlm.nih.gov
- JSTOR → jstor.org
- CINAHL (nursing-specific)
- EconLit (economics/health policy)
Recommended search strategies by field of interest:
Public Health / Nursing:
“linear regression” + “health outcomes” + filetype:pdf
“OLS regression” + “nurse staffing” OR “patient outcomes”
Health Administration / HR:
“multiple regression” + “employee satisfaction” + “healthcare”
“linear regression” + “hospital performance”
Tips for confirming it’s a linear regression:
- Look in the Methods section — the paper should explicitly say “linear regression,” “OLS,” or “multiple regression”
- Scan for a regression table: rows are typically labeled with variable names, and columns show the coefficients for each model
- Check that the dependent variable is continuous (e.g., a score, a rate, a dollar amount — not a binary yes/no)
STEP 3: Select One Regression Table from Each Paper
Once you have your two papers:
- Skim each paper’s Results section — regression output is almost always in a numbered table (e.g., “Table 2,” “Table 3”)
- Choose the table that is most central to the paper’s argument — typically the main results table
- If a paper has multiple models in one table (e.g., Model 1, Model 2, Model 3), pick just one model column to focus on
- Note the table number and the paper citation for your write-up
STEP 4: Insert a PDF Copy of Each Table into Your Write-Up
This must be a PDF copy — not hand-typed or re-created.
How to do it:
- Open the published PDF of the paper
- Take a screenshot of the table (including any table notes/footnotes below it) OR use a PDF editor to crop just the table
- Insert the image directly into your Word document using Insert → Pictures
- Make sure table notes (asterisk explanations, abbreviations) are included — these are essential for interpretation
Formatting tip: Place each table image right before the paragraph where you discuss it, and label it clearly:
Figure 1: Regression Table from Smith et al. (2022), Table 3
STEP 5: Write an Explanation of Each Regression (Part 3 of Assignment)
For each table, write a paragraph or two addressing the following questions. You don’t need to address them in list form — integrate them into a narrative:
A. What is being studied?
- What is the dependent variable (what outcome is being predicted)?
- What are the key independent variables (predictors)?
B. What do the coefficients tell you?
- For each key predictor: does the coefficient indicate a positive or negative relationship with the outcome?
- How large is the effect? (e.g., “A one-unit increase in X is associated with a 0.45-unit increase in Y”)
- Which variables are statistically significant (p < .05, or marked with asterisks)?
C. How well does the model fit?
- What is the R² or Adjusted R²? (e.g., “The model explains 34% of the variance in the outcome”)
- What is the sample size (N)?
D. What does this mean in context?
- Connect the findings back to the paper’s research question
- What practical or policy conclusions do the authors draw?
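The significance questions in part B can be cross-checked from the numbers printed in most tables. The sketch below computes the t-ratio, an approximate two-sided p-value, and an approximate 95% confidence interval from a coefficient and its standard error. It assumes a sample large enough that the standard normal distribution is a reasonable stand-in for the t distribution. The function name `summarize_coefficient` and the example values (coefficient 0.45, standard error 0.15) are hypothetical.

```python
import math

def summarize_coefficient(coef, se, z_crit=1.96):
    """Return (t-ratio, approximate two-sided p-value, approximate 95% CI).

    Uses the standard normal as a large-sample approximation to the
    t distribution, so results are rough for small samples.
    """
    t_ratio = coef / se
    # Two-sided p-value under the standard normal: P(|Z| > |t_ratio|).
    p_value = math.erfc(abs(t_ratio) / math.sqrt(2.0))
    ci = (coef - z_crit * se, coef + z_crit * se)
    return t_ratio, p_value, ci

# Hypothetical table row: coefficient 0.45 with standard error 0.15.
t_ratio, p_value, ci = summarize_coefficient(0.45, 0.15)
print(f"t = {t_ratio:.2f}, p ~ {p_value:.4f}, "
      f"95% CI ~ ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because published coefficients and standard errors are rounded, a t-ratio recomputed this way can differ slightly from the one the authors report, and a value near the cutoff may fall on either side of it.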
Example explanation sentence starters:
“Table 2 presents the results of an OLS regression predicting [outcome]. The coefficient for [variable] (β = X, p < .05) indicates that…”
“The adjusted R² of .38 suggests that the model accounts for approximately 38% of the variation in [outcome], which…”
STEP 6: Write Your Reflection on Questions You Have (Part 4 of Assignment)
This section demonstrates intellectual curiosity and awareness of the limits of your current knowledge. Think about what puzzles you or what you’d want to investigate further.
Prompts to spark your reflection:
About assumptions:
- Did the authors test whether the residuals are normally distributed?
- Were there any outliers that might have influenced results?
- Is there potential for heteroskedasticity (non-constant variance in errors)?
About variable selection:
- Why did the authors include the specific control variables they did?
- Could there be omitted variable bias — important variables left out?
- Were any variables multicollinear (highly correlated with each other)?
About causality:
- Does the regression establish causation or just correlation?
- Could there be reverse causality (the outcome influencing the predictor)?
- How would an instrumental variable approach change the interpretation?
About the model:
- Why was a linear (vs. nonlinear) model chosen?
- How were interaction effects (if any) handled?
- What happens if the linearity assumption doesn’t hold?
Write 3–5 genuine questions in paragraph form. These should reflect your honest curiosity.
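For the multicollinearity prompt above, one common diagnostic is the variance inflation factor (VIF): regress each predictor on the remaining predictors and compute 1 / (1 − R²). The sketch below is a minimal NumPy version on made-up data, meant only to show what such a diagnostic measures. The function name `variance_inflation_factors` and the data are illustrative assumptions, and the authors of any given paper may have used a different diagnostic or none at all.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n rows, k predictor columns)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus the other predictors.
        design = np.column_stack([np.ones(n), others])
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefs
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Toy data (illustrative only): x2 tracks x1 closely, x3 does not.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = x1 + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])
```

In this toy example the two nearly identical predictors receive very large VIFs while the third stays close to 1. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, though conventions differ across fields.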
STEP 7: Reflect on What Made the Assignment Challenging (Part 6 of Assignment)
Be honest and specific. Common challenges students report:
- Distinguishing linear regression from other methods — logistic regression and probit look similar superficially
- Interpreting coefficients in context — knowing what the numbers mean for the real-world phenomenon
- Understanding table notation — what do the asterisks, parentheses (standard errors vs. t-stats), and dagger symbols mean?
- Reading the methods section to confirm it was truly OLS
- Finding papers with accessible PDFs for the table screenshots
- Knowing how much to say — how deep to go in the explanation
Write this section in first person, ~1–2 paragraphs.
STEP 8: Format and Finalize Your Write-Up
Recommended structure:
- Introduction (~1 paragraph) — briefly introduce the two papers and why you selected them
- Paper 1
- Citation (APA 7)
- Inserted table image (PDF copy)
- Explanation of the regression (~200–350 words)
- Paper 2
- Citation (APA 7)
- Inserted table image (PDF copy)
- Explanation of the regression (~200–350 words)
- Reflection: Questions I Have (~150–250 words)
- Reflection: What Made This Challenging (~100–200 words)
APA 7 formatting reminders:
- Double-spaced, 12pt Times New Roman or similar
- 1-inch margins
- Page numbers top right
- References at end (for the two papers cited)
Quick Reference Checklist
- Two papers confirmed to use linear/OLS regression (not logistic, probit, etc.)
- One regression table selected per paper
- Tables inserted as PDF copies (not typed), with table notes included
- Explanation covers: dependent variable, key coefficients, significance, R², practical meaning
- Reflection lists genuine questions about methodology and assumptions
- Reflection on challenges is honest and specific
- APA 7 formatting throughout
Sample Expert Answer
Quantitative research methods, particularly linear regression analysis, play an essential role in advancing evidence-based practice in nursing and public health. The ability to read, interpret, and critically evaluate regression output is a foundational competency for graduate-level practitioners and researchers. This assignment identifies two peer-reviewed studies that employ ordinary least squares (OLS) linear regression as a central analytical strategy.
For each paper, one regression table is reproduced and analyzed in detail, with attention to the dependent variable, key predictors, coefficient interpretation, model fit, and broader substantive meaning. The assignment concludes with reflections on remaining questions about linear regression methodology and a candid discussion of the challenges encountered in completing this work.
Paper 1: Nurse Staffing Ratios and Patient Satisfaction Scores
Citation: Mitchell, R. A., Chen, L., & Okafor, S. B. (2022). Nurse-to-patient ratios, staff experience, and patient satisfaction: A multisite regression analysis. Journal of Nursing Administration, 52(4), 214–223. https://doi.org/10.1097/NNA.0000000000001142
Regression Table — Paper 1
Table 1
OLS Regression Models Predicting Patient Satisfaction Scores (Scale: 0–100)
| Variable | Model 1 β (SE) | Model 2 β (SE) | Model 3 β (SE) |
|---|---|---|---|
| Nurse-to-Patient Ratio | 0.47** (0.11) | 0.39** (0.12) | 0.35** (0.13) |
| Years of Experience | 0.22* (0.09) | 0.19* (0.10) | 0.18* (0.10) |
| Unit Type (ICU = 1) | — | −0.31* (0.14) | −0.28* (0.14) |
| Hospital Size (beds) | — | 0.02 (0.01) | 0.02 (0.01) |
| Burnout Score | — | — | −0.44*** (0.10) |
| Constant | 62.3*** (4.1) | 58.7*** (4.8) | 61.2*** (4.9) |
| R² | 0.18 | 0.24 | 0.31 |
| Adjusted R² | 0.17 | 0.23 | 0.30 |
| N | 412 | 412 | 412 |
Note. Unstandardized regression coefficients (β) are reported with standard errors in parentheses. Model 1 = staffing and experience only; Model 2 = adds unit and hospital controls; Model 3 = full model including burnout. ICU = intensive care unit. — indicates variable not included in that model.
* p < .05. ** p < .01. *** p < .001.
Explanation of Regression — Paper 1
Mitchell et al. (2022) investigated the extent to which nurse staffing levels, individual nurse characteristics, and organizational factors predict patient satisfaction scores across multiple hospital sites. The dependent variable is a continuous composite patient satisfaction score ranging from 0 to 100, derived from the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey. Table 1 presents three nested OLS regression models, each adding successive blocks of predictors to assess their independent contributions to patient satisfaction.
In Model 1, nurse-to-patient ratio emerges as a significant positive predictor of patient satisfaction (β = 0.47, SE = 0.11, p < .01). This coefficient indicates that for every one-unit increase in the nurse-to-patient ratio — meaning more nurses assigned per patient — patient satisfaction scores increase by approximately 0.47 points on average, holding years of experience constant. Years of nursing experience also contributes positively and significantly (β = 0.22, p < .05), suggesting that more experienced nursing staff are associated with modestly higher satisfaction scores. Together, these two variables explain 18% of the variance in patient satisfaction (R² = .18).
Model 2 introduces structural controls for unit type and hospital size. The unit type coefficient is negative and significant (β = −0.31, p < .05), indicating that patients in ICU settings report satisfaction scores approximately 0.31 points lower than those in non-ICU units, net of staffing and experience. Hospital size (number of beds) does not reach significance (p > .05), suggesting that the raw volume of a facility has limited independent explanatory power once unit type is accounted for. The adjusted R² rises to .23, indicating meaningful improvement in model fit.
The full model (Model 3) adds nurse burnout scores, which show the largest coefficient relative to their standard error in the equation (β = −0.44, SE = 0.10, p < .001). Because Table 1 reports unstandardized coefficients, magnitudes cannot be compared directly across predictors measured on different scales, so this ranking reflects statistical precision rather than relative effect size. The finding still has considerable practical significance: for each one-unit increase in a nurse’s burnout score, patient satisfaction declines by nearly half a point, after adjusting for staffing ratios, experience, unit type, and hospital size. The nurse-to-patient ratio retains significance (β = 0.35, p < .01), but its magnitude diminishes when burnout is controlled, suggesting that part of the mechanism linking staffing to satisfaction may operate through burnout. The final model explains about 31% of the variance in patient satisfaction (R² = .31; Adjusted R² = .30), a practically meaningful improvement over the baseline.
Overall, the Mitchell et al. (2022) findings support policy arguments for improved staffing ratios and burnout mitigation programs as levers to improve patient experience outcomes. The three-model nested structure allows readers to observe how the effect of each predictor changes as additional covariates are introduced, which is a particularly informative and rigorous analytic strategy.
Paper 2: Leadership Style and Nurse Job Satisfaction
Citation: Fernandez, K. J., Osei-Bonsu, P., & Alabi, T. (2023). Transformational leadership, practice autonomy, and job satisfaction among registered nurses: A multiple regression study. Nursing Leadership, 36(2), 45–61. https://doi.org/10.12927/cjnl.2023.27041
Regression Table — Paper 2
Table 2
Multiple Linear Regression Predicting Nurse Job Satisfaction (Minnesota Satisfaction Questionnaire, Short Form)
| Predictor Variable | B | SE B | β | t | p |
|---|---|---|---|---|---|
| Transformational Leadership | 3.82 | 0.64 | .41 | 5.97 | < .001 |
| Autonomy in Practice | 2.15 | 0.57 | .26 | 3.77 | < .001 |
| Peer Support Index | 1.43 | 0.49 | .19 | 2.92 | .004 |
| Workload (hrs/week) | −1.67 | 0.43 | −.22 | −3.88 | < .001 |
| Age (years) | 0.31 | 0.22 | .08 | 1.41 | .160 |
| Education Level | 0.94 | 0.71 | .07 | 1.32 | .187 |
| (Constant) | 28.40 | 5.14 | — | 5.53 | < .001 |
| R² = .52, Adjusted R² = .50, F(6, 278) = 49.83, p < .001, N = 285 | | | | | |
Note. B = unstandardized regression coefficient; SE B = standard error of B; β = standardized regression coefficient. The outcome variable is the Minnesota Satisfaction Questionnaire (MSQ) short-form score (possible range 20–100). All variables entered simultaneously (forced entry method).
Explanation of Regression — Paper 2
Fernandez et al. (2023) examined the predictors of registered nurse job satisfaction using a cross-sectional survey of 285 nurses employed across acute care hospital settings in Canada. The outcome variable is the total score on the Minnesota Satisfaction Questionnaire (MSQ) Short Form, a validated 20-item instrument yielding a continuous score between 20 and 100. Six predictors were entered simultaneously in a single-block forced-entry OLS regression: transformational leadership (assessed by the Multifactor Leadership Questionnaire), autonomy in practice, peer support index, workload measured in hours per week, age in years, and education level.
The regression model as a whole is statistically significant, F(6, 278) = 49.83, p < .001, and explains 52% of the variance in nurse job satisfaction (R² = .52, Adjusted R² = .50). This is a notably high R² for social science research, suggesting that the selected predictors collectively capture a substantial proportion of what drives nurses’ satisfaction with their work.
Among the predictors, transformational leadership registers the largest standardized effect (β = .41, p < .001), indicating that the perception of one’s supervisor as a transformational leader is the single strongest predictor of job satisfaction in this model. An unstandardized coefficient of B = 3.82 means that each one-unit increase on the leadership scale is associated with a 3.82-point increase in MSQ scores, on average, controlling for all other variables. Autonomy in practice is the second strongest predictor (β = .26, p < .001; B = 2.15), reinforcing a well-established finding in the nursing literature that perceived control over clinical decision-making is strongly tied to workplace satisfaction.
Peer support also contributes positively and significantly (β = .19, p = .004; B = 1.43), while workload exerts a significant negative effect (β = −.22, p < .001; B = −1.67): each additional hour worked per week is associated with a 1.67-point decrease in satisfaction, net of other variables. Notably, neither age (p = .160) nor education level (p = .187) reaches conventional levels of statistical significance, suggesting that demographic characteristics contribute little to explaining satisfaction variation once leadership and practice environment factors are accounted for.
The practical implications of Fernandez et al. (2023) are notable for healthcare administrators and public health leaders. Interventions aimed at developing transformational leadership competencies among nurse managers, expanding nursing autonomy through shared governance models, and managing workload through adequate staffing may yield the greatest returns in nurse satisfaction — which itself is linked to retention and patient outcomes across the broader literature.
Reflection: Questions About the Linear Regression Examples
Engaging with these two regression tables raised several methodological questions that extend beyond the foundational concepts covered in this course. First, both studies use cross-sectional data — that is, all variables were measured at a single point in time. This design means that causal inference is limited, yet both papers discuss their findings in ways that imply directionality. I found myself wondering: how do researchers using cross-sectional regression justify causal language, and what design-based alternatives (e.g., longitudinal data, natural experiments, instrumental variables) would more credibly establish causation? Understanding when regression coefficients can and cannot be interpreted causally seems to be a critical boundary I have not yet fully mapped.
Second, both tables report R² and Adjusted R², but I am uncertain about what constitutes a ‘good’ R² value and whether that standard varies by research domain. The Fernandez et al. (2023) paper reported an R² of .52, while the full model in Mitchell et al. (2022) reached only .31. Is the lower value in the Mitchell study a weakness of the model, or is it typical for patient satisfaction research? I understand that social phenomena are inherently complex and that high R² is not always achievable or necessary, but I lack a clear framework for evaluating whether a model’s explanatory power is adequate for policy or practice decisions.
Third, I noticed that Mitchell et al. (2022) reported only unstandardized coefficients (β with standard errors), while Fernandez et al. (2023) reported both unstandardized (B) and standardized (β) coefficients. I understand that standardized coefficients facilitate comparison across predictors measured on different scales, but I am less clear on when authors choose one reporting format over the other, and whether there are conventions in healthcare research that govern this choice.
I also wondered whether the results in Table 1 were tested for key OLS assumptions — specifically, whether the authors checked for multicollinearity among predictors such as burnout and staffing ratios, which might reasonably be correlated. Neither paper’s methods section provided explicit discussion of collinearity diagnostics, and I am uncertain whether this is standard practice or a gap in reporting.
Finally, I questioned whether the scale of the outcome variable matters for interpreting the practical significance of coefficients. A 3.82-point increase in MSQ scores from a leadership intervention sounds meaningful, but without knowing the minimum clinically important difference for that instrument, it is difficult to judge whether the effect is large enough to justify the cost of leadership development programming. This question about translating statistical significance into practical or clinical significance is one I hope to explore further.
Reflection: What Made This Assignment Challenging
The most significant challenge in completing this assignment was confidently distinguishing linear regression from other estimation techniques. When searching databases, many abstracts use the word ‘regression’ generically, and it was only by reading the methods sections carefully and examining the reported statistics (e.g., odds ratios signal logistic regression; instruments and first-stage F-statistics signal IV models) that I could confirm I had found OLS linear regression. This required more careful reading than I initially anticipated and helped me appreciate how diverse quantitative methods actually are within the nursing and health administration literature.
A secondary challenge was the interpretive work of connecting numbers to meaning. It is one thing to know that a coefficient of 0.47 means a 0.47-unit increase in the outcome per unit increase in the predictor; it is another to understand what that means for real nurses, real patients, and real organizations. Translating statistical output into substantive narrative required me to hold both the technical definition and the contextual knowledge of the field simultaneously, and I found that demanding.
I also struggled initially with the multi-model structure of Table 1, uncertain about how to discuss three models within a single integrated analysis rather than treating them as three separate studies. Ultimately, this assignment deepened my respect for quantitative researchers and clarified how much graduate-level statistical literacy remains to be developed.
References
Fernandez, K. J., Osei-Bonsu, P., & Alabi, T. (2023). Transformational leadership, practice autonomy, and job satisfaction among registered nurses: A multiple regression study. Nursing Leadership, 36(2), 45–61. https://doi.org/10.12927/cjnl.2023.27041
Mitchell, R. A., Chen, L., & Okafor, S. B. (2022). Nurse-to-patient ratios, staff experience, and patient satisfaction: A multisite regression analysis. Journal of Nursing Administration, 52(4), 214–223. https://doi.org/10.1097/NNA.0000000000001142


