# Usage: curl -sSL https://seed.show/research.literature.review | bash -s <install-path>
# <install-path> is the directory where the file should land.

set -euo pipefail
[ -z "${1:-}" ] && {
  echo "install requires a path: curl -sSL https://seed.show/research.literature.review | bash -s <install-path>" >&2
  exit 1
}
TARGET="$1"
mkdir -p "$TARGET"
DEST="$TARGET/seed-fold.3gxHZO.folded.md"

cat > "$DEST" <<'PORTDOWN_6C3762FE'
<!--
This is a `.folded.md` archive — a directory packed into one markdown
file. The markers below are load-bearing; don't edit them directly.

To unpack (requires bash — if you have no shell, tell the user):
  1. curl -sSLf https://fold.dom.vin/skill | bash -s <INSTALL_DIR>
  2. <INSTALL_DIR>/fold/scripts/unfold <this-file>
     (or: unfold <this-file>  if fold/scripts is on your PATH)
-->

---
fold: true
marker: e7edcd
at: 2026-05-07T16:16:23Z
root: seed-pack.hQIKia
---

<!--fold:e7edcd@file path="README.md" mode="644"-->
# research.literature.review

Literature review and research synthesis context for agents. The difference between finding papers and constructing a rigorous argument from evidence — and why that gap is where most AI-assisted literature review fails.

## Before you start: what this context cannot do for you

**AI agents cannot reliably search bibliographic databases.** PubMed, Embase, Cochrane CENTRAL, PsycINFO, and Web of Science all require human-operated search interfaces or authenticated institutional API access. An agent that claims to have searched these databases has almost certainly hallucinated results, returned an incomplete subset, or silently failed. The search stage of a systematic review must be executed by a human using actual database interfaces, with the resulting records exported and handed to the agent.

What agents can support: protocol drafting, search string construction (for human execution), title/abstract screening of exported records, data extraction from full texts, synthesis drafting, and PRISMA documentation. The search itself is not in this list.

---

## Mental model: a literature review is a structured argument

A literature review is not a citation collection. It is a **map of the state of evidence** on a question — what is known, to what degree of certainty, under what conditions, with what gaps. The output is a structured argument about what the evidence, taken together, supports or does not support.

This distinction changes what the agent is optimizing for. Citation collection optimizes for coverage: find everything plausible. Evidence mapping optimizes for epistemic validity: determine what can be concluded and with what confidence. These produce different work.

**The protocol is the pre-commitment that protects against confirmation bias.** Pre-specifying the research question, databases, inclusion/exclusion criteria, and quality assessment approach before reading any papers removes the degree of freedom that allows unconscious cherry-picking. A protocol written after the papers are read is not a protocol — it is a post-hoc rationalization. This is why PROSPERO pre-registration exists and why journals increasingly require it.

The three properties that make a literature review credible:

**Reproducibility.** A second researcher following the same protocol — same databases, same search terms, same inclusion/exclusion criteria — would arrive at the same corpus. This is the demarcation between a systematic review and an opinion piece with footnotes.

**Transparency.** Every decision that shaped the corpus is documented: why certain databases were searched, why certain studies were excluded, what quality thresholds were applied. A reader can evaluate the methodology, not just the conclusions.

**Explicitness about uncertainty.** The review distinguishes between what the evidence establishes (high-certainty findings replicated across multiple high-quality studies), what it suggests (weaker or more limited evidence), and what it fails to address (gaps, unexplored populations, methodological limitations). Conclusions that outrun the evidence are a research integrity problem.

---

## Review type spectrum

**Narrative review:** An expert synthesizes the literature based on their knowledge and judgment. The search strategy is implicit. Inclusion criteria are not pre-specified. The author exercises discretion about which studies to discuss. Strengths: readable, can integrate broad context, useful for complex or emergent topics. Weaknesses: susceptible to confirmation bias, not reproducible, cannot reliably estimate the magnitude of effects.

**Scoping review:** Maps the extent of a literature — what has been studied, what study designs exist, what gaps are present — without synthesizing effect estimates or applying quality assessment. Uses PRISMA-ScR reporting. Appropriate when the field is too heterogeneous or emergent for a systematic review, or when the goal is to identify what a systematic review would need to address.

**Systematic review:** A structured protocol governs every step — research question, search strategy, database selection, inclusion/exclusion criteria, quality assessment, synthesis. The protocol is pre-registered before any papers are read. Strengths: reproducible, transparent, minimizes selection bias, the gold standard for policy and clinical decisions. Weaknesses: resource-intensive, can become outdated quickly, rigid protocol may miss contextually important literature.

**Meta-analysis:** A statistical synthesis layer applied to a systematic review when included studies share the same population, intervention, and outcome measured in compatible formats. Produces a pooled effect estimate. Not always possible or appropriate — clinical and methodological heterogeneity must be low enough to justify pooling.

**Umbrella review / overview of reviews:** Synthesizes findings across multiple systematic reviews on the same topic. Uses AMSTAR 2 to assess the quality of included reviews. Appropriate when many systematic reviews already exist and the question is what the review-level evidence shows in aggregate.

**Why it matters for credibility:** Systematic reviews carry substantially more evidential weight in scientific and policy contexts because their conclusions are traceable to a method. A narrative review that reaches the same conclusion cannot be distinguished from one that cherry-picked supporting evidence. If the context requires defensible conclusions — clinical guidelines, regulatory submissions, meta-analytic inputs — the methodology must be systematic.

---

## What agents get wrong

**Confusing narrative review with systematic review.** Summarizing papers found via a web search is a narrative review, not a systematic review, regardless of how many papers are cited. Systematic reviews require pre-specified protocols, documented search strategies across multiple databases, dual screening, and quality assessment. Using systematic-sounding language for what is actually an unsystematic search misleads about the evidential basis of the conclusions.

**Missing gray literature.** Published literature skews toward positive results. A search restricted to indexed databases misses trial registries (ClinicalTrials.gov, WHO ICTRP), regulatory documents, conference proceedings, dissertations, and unpublished studies. Systematic reviews addressing this gap show consistently larger effect sizes in published literature than in combined published + unpublished evidence — the gap is the publication bias signal. Gray literature searching is not optional in a rigorous systematic review.

**Skipping PROSPERO pre-registration.** Pre-registration at PROSPERO (health reviews) or OSF (social science, education) creates a timestamped record that the protocol preceded the data. Without it, a reader cannot determine whether the question was specified before or after the results were visible. Journals in medicine and public health increasingly require a PROSPERO number for submission; reviewers often treat unregistered systematic reviews as lower in evidential quality.

**Treating all studies as equal weight.** A case report and a well-powered randomized controlled trial both exist in the literature. Counting sources rather than weighting them by study design, sample size, and methodological quality produces conclusions that misrepresent the evidential landscape. Evidence quality hierarchies (GRADE, Oxford CEBM) exist precisely to prevent this.

**Confusing GRADE certainty with study quality.** Study quality (risk of bias, assessed per study using RoB 2 or ROBINS-I) and GRADE certainty of evidence (assessed per outcome across the body of evidence) are different constructs. A body of evidence from methodologically sound studies can still receive a "Low" GRADE certainty rating if the evidence is highly inconsistent or indirect. A single high-quality study does not constitute high-certainty evidence. GRADE is applied to the outcome-level body of evidence, not to individual studies.

**Cherry-picking supporting evidence.** Confirmation bias in literature review means surfacing studies that support a hypothesis while underweighting or omitting contradictory evidence. A rigorous synthesis requires active search for disconfirming evidence, explicit reconciliation of conflicting findings, and documentation of the reconciliation process.

**Failing to distinguish primary from secondary literature.** A primary study reports original data. A secondary source (review, meta-analysis, commentary) synthesizes or interprets primary studies. Citing a review article as evidence for a claim — when the review itself may have methodological problems — stacks uncertainty on uncertainty. Primary sources are the evidentiary base; secondary sources are signposts to them, not substitutes.

**Conflating statistical significance with clinical or practical significance.** A study with 10,000 participants may find a statistically significant effect that is too small to matter in practice. P-values depend on sample size as well as effect size, so a large enough study can make a trivially small effect statistically significant. Reporting "studies show X works" without attending to effect sizes, confidence intervals, and absolute vs. relative risk misrepresents what the evidence establishes.
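
A minimal sketch of the sample-size point, assuming a two-arm comparison of means with a known common standard deviation and a two-sided z-test. The function name `two_sided_p` and all numbers are illustrative and are not drawn from any study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff: float, sd: float, n_per_arm: int) -> float:
    """Two-sided z-test p-value for a difference in means between two equal arms."""
    se = sd * sqrt(2 / n_per_arm)  # standard error of the difference in means
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))


# The same small effect (0.05 standard deviations) at two sample sizes.
small_trial = two_sided_p(mean_diff=0.05, sd=1.0, n_per_arm=100)
large_trial = two_sided_p(mean_diff=0.05, sd=1.0, n_per_arm=5000)

print(f"n=100 per arm:  p = {small_trial:.3f}")
print(f"n=5000 per arm: p = {large_trial:.3f}")
```

The effect size is identical in both calls; only the sample size changes. The p-value moves from roughly 0.72 to roughly 0.012, which is why the effect size and its confidence interval, not the p-value alone, should be carried into the synthesis.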

**Skipping the quality assessment layer.** Every included study must be assessed for risk of bias using a validated, design-appropriate tool (RoB 2 for RCTs, ROBINS-I for observational studies, AMSTAR 2 for systematic reviews). Synthesizing findings from high-risk-of-bias studies alongside low-risk studies without flagging the difference produces conclusions that cannot be trusted and cannot be reproduced.

---

## What AI is changing

AI is actively reshaping the labor profile of literature review — compressing some stages, leaving others unchanged.

**Abstract screening.** Rayyan and Covidence use ML to rank abstracts by predicted relevance, substantially reducing the volume of records that need human attention. Human screening is still required for borderline cases and for validating that the tool's decisions match the eligibility criteria. The tool reduces the queue; it does not replace the judgment.

**Data extraction.** LLMs can extract structured data from full-text articles — sample size, interventions, outcome measures, effect sizes, confidence intervals — at scale. Accuracy is improving but requires spot-check validation against source documents. The extraction template still must be human-designed against the protocol.

**Synthesis assistance.** Elicit queries the literature and summarizes claims from individual papers against a research question; it is best used for exploratory scoping, not as a substitute for a systematic search. Consensus surfaces consensus-level claims across papers with citation support. Semantic Scholar provides citation graphs and author disambiguation useful for snowball searching. None of these tools enforce a synthesis framework, assess study quality, or distinguish evidence strength — they assist retrieval and orientation, not judgment.

**Protocol drafting.** LLMs can draft PICO/SPIDER research questions, eligibility criteria, search string skeletons (for human refinement and execution), and PRISMA documentation. This is a genuine productivity gain at the protocol development stage.

**What stays human.** Protocol design and pre-registration (the pre-commitment that protects the review's integrity). Judgment calls on inclusion/exclusion borderline cases. Risk of bias assessment using RoB 2, ROBINS-I, or AMSTAR 2 (requires judgment that is domain-specific and context-sensitive). GRADE certainty assessment (requires evaluating inconsistency, indirectness, and imprecision across the body of evidence). Conclusions drawn from uncertain or conflicting evidence. These are not automation targets — they are where the epistemology lives.

The agent's role is to support the systematic methodology, not to substitute for it. Finding papers is partially solved. Constructing a rigorous synthesis argument from them is not.
<!--fold:e7edcd@file path="methodology.md" mode="644"-->
# methodology

The systematic review process as a decision framework. Eight sequential stages, each with a defined output that gates the next. A protocol that skips or shortcuts any stage produces a review with traceable integrity problems at the corresponding stage.

---

## 1. Research question formulation

**What it is.** Before searching anything, the review question must be specified in a structured format that makes the inclusion/exclusion criteria derivable. A vague question produces a vague corpus and makes pre-registration meaningless.

**PICO (for intervention questions):**
- **P**opulation — who is the population of interest? (age, condition, setting, geography)
- **I**ntervention — what is being done, tested, or applied?
- **C**omparison — compared to what? (placebo, alternative intervention, no treatment)
- **O**utcome — what is being measured, and over what timeframe?

**SPIDER (for qualitative/experience questions):**
- **S**ample — who are the participants?
- **P**henomenon of **I**nterest — what experience, behavior, or perspective?
- **D**esign — what study designs are eligible?
- **E**valuation — what outcomes or themes?
- **R**esearch type — qualitative, quantitative, or mixed?

**PECO (for exposure/etiology questions):** Population, Exposure, Comparator, Outcome. Used when studying the effect of an exposure (environmental, occupational, behavioral) rather than a deliberate intervention.

**Decision criterion:** If the research question cannot be translated into PICO/SPIDER/PECO without ambiguity, the question is too vague to support a reproducible review. Respecify before proceeding.

**Output:** A written research question in structured format, with each element explicitly defined. This becomes the anchor every subsequent inclusion/exclusion decision is tested against.
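
One way to make the decision criterion mechanical is to hold the question as a structured record and refuse to proceed while any element is blank. The sketch below assumes a PICO question; `PicoQuestion`, `missing_elements`, and the example topic are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PicoQuestion:
    population: str
    intervention: str
    comparison: str
    outcome: str

    def missing_elements(self) -> list[str]:
        """Return the names of PICO elements that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical draft question with the comparator not yet specified.
question = PicoQuestion(
    population="adults aged 18-65 with chronic insomnia",
    intervention="digital cognitive behavioural therapy for insomnia",
    comparison="",
    outcome="insomnia severity at 8 weeks",
)
print(question.missing_elements())
```

A non-empty result means the question should be respecified before the protocol is drafted. A blank-field check only catches missing elements; whether a filled-in element is unambiguous remains a human judgment.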

---

## 2. Protocol development and pre-registration

**What it is.** The full review method is documented *before* any papers are read. Pre-registration prevents question-drifting once results are visible — a form of confirmation bias at the protocol level that is otherwise undetectable.

**Protocol specifies:**
- Research question (PICO/SPIDER/PECO)
- Eligibility criteria: inclusion and exclusion, stated explicitly enough that two reviewers applying them independently reach the same decisions
- Databases to search (named specifically, not "major databases")
- Search term strategy (including synonyms, abbreviations, and controlled vocabulary terms)
- Date range and language restrictions (with justification for any restriction)
- Study design eligibility (and which designs are excluded, with rationale)
- Primary and secondary outcomes (distinguish these before seeing data — changing from secondary to primary post-hoc is outcome-switching)
- Data extraction fields
- Quality assessment tool to be used (matched to study designs eligible)
- Synthesis approach (meta-analytic or narrative, and the criterion for choosing between them)
- Handling of duplicate publications and companion papers from the same cohort

**Register at:** PROSPERO (https://www.crd.york.ac.uk/prospero/) for health and health-related reviews; OSF (https://osf.io) for social science, education, and psychology. PROSPERO requires the search to not yet have begun. Registration creates a timestamped, public record that the protocol preceded data extraction.

**Decision criterion:** Any deviation from the registered protocol during the review must be documented in the methods section as a protocol amendment, with the date of the change and the reason. Undocumented deviations are integrity problems.

**Output:** A registered protocol with a PROSPERO or OSF ID. If pre-registration is not possible (retrospective review), document why and note this as a limitation.

---

## 3. Search strategy design

**What it is.** A comprehensive, documented search strategy executed across multiple databases. The goal is high sensitivity (find everything relevant) balanced against specificity (not drowning in irrelevant results). An information specialist or medical librarian should design or peer-review the strategy for high-stakes reviews.

**Database selection principles:**
- Health/clinical: MEDLINE (PubMed), Embase, Cochrane CENTRAL, CINAHL
- Mental health and psychology: PsycINFO, PsycARTICLES
- Social science and education: ERIC, Sociological Abstracts, Social Sciences Citation Index
- Interdisciplinary: Web of Science, Scopus, Semantic Scholar
- Grey literature (required, not optional): trial registries (ClinicalTrials.gov, WHO ICTRP), regulatory databases, dissertation repositories (ProQuest, EThOS), conference proceedings, Google Scholar (for grey literature scouting, not primary searching)

**Search term construction:**
- Use controlled vocabulary (MeSH for MEDLINE, Emtree for Embase) AND free-text terms to capture both indexed and unindexed records
- Construct concept blocks for each PICO/SPIDER element, combine blocks with Boolean AND; within each block, combine synonyms and variants with Boolean OR
- Include spelling variants (US/UK), abbreviations, and obsolete terminology
- Use database-specific syntax: truncation (`*` in most databases), wildcards (`?` or `#`), phrase searching (`"..."`)
- Apply filters for study design, date range, language only if specified in the protocol
- Have a second person peer-review the search strategy before execution (Peer Review of Electronic Search Strategies — PRESS checklist)
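
The concept-block rule above (OR within a block, AND between blocks) can be sketched as a small string builder. This is an illustration only: `build_query` and the example terms are hypothetical, and the output is a generic Boolean skeleton that a human searcher would still translate into each database's field tags and controlled vocabulary before running it.

```python
def build_query(concept_blocks: dict[str, list[str]]) -> str:
    """OR the terms within each concept block, then AND the blocks together."""
    blocks = []
    for terms in concept_blocks.values():
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical concept blocks, including a US/UK spelling variant and an abbreviation.
blocks = {
    "population": ["insomnia", "sleeplessness"],
    "intervention": [
        "cognitive behavioural therapy",
        "cognitive behavioral therapy",
        "CBT-I",
    ],
}
print(build_query(blocks))
```

Keeping the blocks as data rather than as one hand-typed string makes it easier to log exactly which synonyms were used and to have a second person review them block by block.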

**Document for each database searched:** date of search, database name and platform, complete search string exactly as run, number of results returned, any filters applied.

**Supplementary search methods:** Snowball search (check reference lists of included studies); forward citation search (find papers that cite included studies, using Semantic Scholar or Google Scholar); contact corresponding authors of included studies for unpublished data or linked studies.

**Decision criterion:** A search strategy that covers fewer than three indexed databases plus at least one grey literature source is insufficient for a systematic review. Document the rationale for every database included and excluded.

**Output:** A documented search log — date, database, search string, number of results — for every source searched. This log is a required PRISMA reporting item.

---

## 4. Screening

**What it is.** A two-stage process to reduce the full search result to the included study set, applying the eligibility criteria defined in the protocol. All decisions at this stage are governed by pre-specified criteria, not judgment about the study's quality or conclusions.

**Stage 1 — Title and abstract screening:**
- Deduplicate records before screening (Rayyan, Covidence, or Zotero)
- Screen each record against inclusion/exclusion criteria using only the title and abstract
- Decision rule: when in doubt, retain — err toward inclusion at this stage; false exclusions cannot be recovered; false inclusions are caught at Stage 2
- Dual screening (two independent reviewers) is the standard; for large-scale reviews, a validated subset approach (dual-screen 20%, resolve disagreements, single-screen the remainder if kappa ≥ 0.8) is acceptable if justified in the protocol
- AI screening tools (Rayyan ML ranking, Covidence) can reduce queue volume by prioritizing likely-relevant records; human judgment is still required for borderline cases and for any record the tool deprioritizes that a reviewer would have retained

**Stage 2 — Full-text screening:**
- Retrieve full text for all records passing Stage 1; document any records where full text could not be retrieved (contact authors; note as limitation if unresolved)
- Assess each full text against all eligibility criteria
- Record the reason for exclusion for every excluded full text using a standardized exclusion code (e.g., wrong population, wrong intervention, wrong outcome, wrong study design, not a primary study, duplicate publication)
- Dual screening; measure inter-rater reliability with Cohen's kappa or percentage agreement; resolve disagreements by discussion, and by third reviewer arbitration if unresolved
- Kappa interpretation: < 0.4 poor, 0.4–0.6 moderate, 0.6–0.8 substantial, > 0.8 near-perfect. Kappa < 0.6 indicates the eligibility criteria need clarification before proceeding.
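
For reference, Cohen's kappa compares observed agreement with the agreement expected by chance from each reviewer's marginal decision rates. A minimal sketch for two reviewers follows; the function name and the ten example decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for ten records.
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "include"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this example the reviewers agree on 8 of 10 records, yet kappa is about 0.58 because half of that agreement is expected by chance. Under the threshold above, the eligibility criteria would need clarification before screening continues.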

**Decision criterion:** Every exclusion at the full-text stage requires a documented reason. "Not relevant" is not an acceptable exclusion code — it must map to a specific criterion from the protocol.

**Output:** A PRISMA flow diagram tracking records at each stage: identified → deduplicated → screened (title/abstract) → excluded (title/abstract) → full-text assessed → excluded (with reasons by category) → included in synthesis. An inter-rater reliability statistic for each screening stage.
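
The flow diagram is only useful if its counts reconcile from one stage to the next. A minimal sketch of that bookkeeping, assuming the stages listed above plus a count for full texts that could not be retrieved; the class name, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrismaFlow:
    identified: int
    duplicates_removed: int
    excluded_title_abstract: int
    full_text_not_retrieved: int
    # Full-text exclusions keyed by standardized exclusion code.
    excluded_full_text: dict[str, int] = field(default_factory=dict)

    @property
    def screened(self) -> int:
        return self.identified - self.duplicates_removed

    @property
    def full_text_assessed(self) -> int:
        return (self.screened - self.excluded_title_abstract
                - self.full_text_not_retrieved)

    @property
    def included(self) -> int:
        return self.full_text_assessed - sum(self.excluded_full_text.values())

    def check(self) -> None:
        """Raise if any derived count is negative, i.e. the log does not reconcile."""
        for name in ("screened", "full_text_assessed", "included"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} is negative; recheck the screening log")


# Hypothetical counts.
flow = PrismaFlow(
    identified=1250,
    duplicates_removed=310,
    excluded_title_abstract=850,
    full_text_not_retrieved=5,
    excluded_full_text={"wrong population": 30, "wrong outcome": 25,
                        "wrong study design": 12},
)
flow.check()
print(flow.screened, flow.full_text_assessed, flow.included)
```

Deriving the downstream counts from the logged exclusions, rather than typing them into the diagram by hand, surfaces records that were dropped without a documented reason.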

---

## 5. Data extraction

**What it is.** Structured extraction of information from each included study into a pre-defined template. Extraction happens once per study, into a format designed for the planned synthesis.

**Standard extraction fields:**
- Study identifiers: author(s), year, title, journal, country, funding source (note industry funding — a risk of bias consideration), trial registration number
- Population: sample size (total, per arm), demographics (age, sex, condition severity), setting, recruitment method, inclusion/exclusion criteria used by the study
- Intervention/exposure: description, dose, duration, frequency, delivery mode, comparator description
- Outcomes: which outcomes measured (primary, secondary), measurement tools and instruments, timepoints of measurement
- Results: effect sizes, confidence intervals, p-values, event rates (for dichotomous outcomes: report both absolute and relative measures)
- Study design: RCT, cluster RCT, quasi-RCT, prospective cohort, retrospective cohort, case-control, cross-sectional, before-after
- Risk of bias fields: populated in the next stage using the appropriate tool
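
Reporting both absolute and relative measures for a dichotomous outcome can be done directly from the extracted event counts. The sketch below assumes a two-arm study and uses the standard log-scale approximation for the risk ratio confidence interval; `dichotomous_summary` and the counts are hypothetical.

```python
from math import exp, log, sqrt


def dichotomous_summary(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int,
                        z: float = 1.96) -> dict:
    """Relative and absolute effect measures from two-arm event counts."""
    risk_tx = events_tx / n_tx
    risk_ctl = events_ctl / n_ctl
    rr = risk_tx / risk_ctl
    # Standard error of log(RR); requires at least one event in each arm.
    se_log_rr = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    ci = (exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr))
    arr = risk_ctl - risk_tx  # absolute risk reduction
    return {
        "risk_ratio": rr,
        "rr_ci": ci,
        "absolute_risk_reduction": arr,
        "nnt": 1 / arr if arr > 0 else None,  # number needed to treat
    }


# Hypothetical trial: 10/1000 events with treatment vs 20/1000 with control.
result = dichotomous_summary(10, 1000, 20, 1000)
print(result)
```

In this example the treatment halves the relative risk (RR 0.5), but the absolute reduction is one percentage point, or about 100 people treated to prevent one event. Extracting only the relative measure would hide that.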

**Quality controls:**
- Dual extraction for all quantitative results fields (effect sizes, CIs, sample sizes); discrepancies resolved by consensus or source document check
- Single extraction with verification acceptable for descriptive fields (study identifiers, design description)
- Contact corresponding authors for missing data before concluding data is unavailable; allow 4–6 weeks for response; document attempts and outcomes
- Record every instance of missing or imputed data, and the reason

**Handling multiple publications from the same study:** Companion papers and follow-up reports from the same cohort or trial are linked and treated as one study to avoid double-counting. Extract data from the publication most relevant to each outcome.

**Output:** A completed extraction table for all included studies, structured to feed directly into the synthesis stage. A separate log of data queries sent to authors and their resolution.

---

## 6. Quality assessment (risk of bias)

**What it is.** Each included study is assessed for risk of bias using a validated, design-appropriate tool. Risk of bias assessment determines the weight and confidence assigned to each study's findings in the synthesis — and feeds directly into the GRADE certainty assessment.

**Tool selection by study design:**

| Design | Tool | Domains assessed |
|---|---|---|
| Randomized controlled trials | RoB 2 (Cochrane) | Randomization process; deviations from intended interventions; missing outcome data; measurement of the outcome; selection of the reported result |
| Non-randomized / observational studies | ROBINS-I | Confounding; participant selection; classification of interventions; deviations from intended interventions; missing data; measurement of outcomes; selection of the reported result |
| Diagnostic test accuracy studies | QUADAS-2 | Patient selection; index test; reference standard; flow and timing |
| Systematic reviews (for umbrella reviews) | AMSTAR 2 | Protocol registration; search comprehensiveness; study design justification; risk of bias assessment; appropriateness of meta-analysis; publication bias; source of funding |
| Qualitative studies | CASP Qualitative Checklist | Research design clarity; recruitment strategy; data collection; reflexivity; ethics; rigor of analysis; value of research |

**RoB 2 decision rules (for RCTs):**
- Each domain receives a judgment: Low, Some concerns, High risk of bias
- The overall study judgment is at least as severe as the most severe domain judgment: all domains "Low" gives "Low" overall, any "Some concerns" with no "High" gives "Some concerns", and a single "High" domain produces a "High" overall judgment
- A study with "Some concerns" in multiple domains may be upgraded to "High" overall if the concerns are likely to substantially bias the result
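
The overall-judgment rules can be sketched as a worst-domain lookup with an explicit reviewer override. This is an illustration, not the official RoB 2 algorithm: `rob2_overall` and the `escalate_multiple_concerns` flag are hypothetical, and the flag stands in for the reviewer's judgment that several "Some concerns" ratings together are likely to bias the result substantially.

```python
SEVERITY = {"Low": 0, "Some concerns": 1, "High": 2}


def rob2_overall(domain_judgments: dict[str, str],
                 escalate_multiple_concerns: bool = False) -> str:
    """Overall risk-of-bias judgment from per-domain judgments."""
    judgments = list(domain_judgments.values())
    # A KeyError here means a domain used a label outside the three allowed values.
    worst = max(judgments, key=SEVERITY.__getitem__)
    concerns = sum(j == "Some concerns" for j in judgments)
    if worst == "Some concerns" and concerns > 1 and escalate_multiple_concerns:
        return "High"
    return worst


# Hypothetical trial with concerns in two of the five domains.
trial = {
    "randomization": "Low",
    "deviations": "Some concerns",
    "missing_data": "Some concerns",
    "measurement": "Low",
    "reporting": "Low",
}
print(rob2_overall(trial))
print(rob2_overall(trial, escalate_multiple_concerns=True))
```

The flag is deliberately a manual input. Whether multiple concerns warrant escalation is a documented reviewer decision, not something the code should infer.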

**ROBINS-I decision rules (for observational studies):**
- Overall judgment follows the same worst-domain logic as RoB 2, but ROBINS-I uses its own scale: Low, Moderate, Serious, or Critical risk of bias. A "Critical" rating indicates the study is so susceptible to bias, most often confounding, that it provides no usable evidence
- Confounding (Domain 1) requires identifying and assessing whether all important confounders were measured and controlled; this is the most consequential domain for observational evidence

**GRADE evidence profiling** (applied to the body of evidence per outcome, not to individual studies):
- Start at: High for RCT evidence, Low for observational evidence (the starting point reflects study design, not methodological quality)
- Downgrade one level for each of: serious risk of bias (majority of evidence from high-risk studies), serious inconsistency (I² > 50% with no explanation), serious indirectness (population, intervention, or outcome differs from the review question), serious imprecision (wide CIs that cross the minimal important difference or cross 1.0 for ratio measures; as a rule of thumb, fewer than 300 events for dichotomous outcomes or fewer than 400 participants for continuous outcomes), suspected publication bias (strong funnel plot asymmetry or evidence that unpublished trials exist)
- Downgrade two levels for very serious concerns in any domain
- Upgrade one level for: large effect (RR > 2 or < 0.5 with no major confounding), dose-response gradient present, all plausible residual confounding would reduce the effect (for observational evidence only); upgrade two levels for a very large effect (RR > 5 or < 0.2)
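- Final rating: High (further research is very unlikely to change confidence in the estimate), Moderate (further research is likely to have an important impact on confidence in the estimate and may change it), Low (further research is very likely to have an important impact on confidence in the estimate and is likely to change it), Very Low (any estimate of effect is very uncertain)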
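
The start-then-adjust arithmetic can be sketched as below. The number of levels to move in each domain is a reviewer judgment that the code takes as input; `grade_certainty`, the domain keys, and the example profiles are hypothetical.

```python
from typing import Optional

LEVELS = ["Very Low", "Low", "Moderate", "High"]
START = {"rct": "High", "observational": "Low"}
DOWNGRADE_DOMAINS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}


def grade_certainty(design: str, downgrades: dict[str, int],
                    upgrades: Optional[dict[str, int]] = None) -> str:
    """GRADE certainty for one outcome from design and reviewer-assigned adjustments."""
    unknown = set(downgrades) - DOWNGRADE_DOMAINS
    if unknown:
        raise ValueError(f"unknown downgrade domains: {sorted(unknown)}")
    level = LEVELS.index(START[design])
    level -= sum(downgrades.values())         # 1 = serious, 2 = very serious
    level += sum((upgrades or {}).values())
    # The rating cannot fall below Very Low or rise above High.
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


# RCT evidence with no risk-of-bias concerns but inconsistent, imprecise results.
print(grade_certainty("rct", {"inconsistency": 1, "imprecision": 1}))
# Observational evidence upgraded one level for a large effect.
print(grade_certainty("observational", {}, {"large_effect": 1}))
```

The first call shows why certainty is not a summary of study quality: well-conducted trials still yield a "Low" rating when the body of evidence is inconsistent and imprecise.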

**Critical distinction — study quality vs. GRADE certainty:** A body of evidence from methodologically sound studies can still receive a Low or Very Low GRADE rating if the evidence is inconsistent, indirect, or imprecise. GRADE is a judgment about the entire body of evidence for an outcome, not a summary of individual study quality scores.

**Output:** A completed risk of bias table for each included study, using the domain-level judgments from the appropriate tool, with a brief justification for each domain rating. A GRADE evidence profile table for each pre-specified outcome, with the downgrade/upgrade decisions and justifications documented.

---

## 7. Synthesis

**What it is.** The formal combination of findings across included studies. The synthesis approach — quantitative or narrative — must be pre-specified in the protocol, and the decision criterion for choosing between them must be stated.

**Decision criterion for meta-analysis:** Pool quantitatively when studies share the same population (as defined by PICO), the same intervention, and the same outcome measured at the same timepoint, and results are reported in compatible formats. Do not pool if clinical or methodological heterogeneity is high, even if statistical heterogeneity is low — I² reflects statistical variation, not clinical meaningfulness.

**Meta-analysis (quantitative synthesis):**
- Effect measure selection: odds ratio or risk ratio for dichotomous outcomes; mean difference (MD) for continuous outcomes measured on the same scale; standardized mean difference (SMD) for continuous outcomes measured on different scales; hazard ratio for time-to-event outcomes
- Model selection: fixed-effect model assumes a single true effect size (appropriate when studies are close replications); random-effects model assumes a distribution of true effects (appropriate when populations, interventions, or settings vary — the more common choice in practice, but does not solve the heterogeneity problem, only models it)
- Statistical heterogeneity: I² < 25% low, 25–50% moderate, 50–75% substantial, > 75% considerable; Tau² (between-study variance) provides a scale-specific measure of heterogeneity; Chi² test for heterogeneity (note: underpowered with few studies)
- When I² > 50%: investigate sources before pooling; pre-specified subgroup analyses (by population characteristic, study design, intervention intensity, risk of bias) may explain heterogeneity; if sources cannot be explained, narrative synthesis is more appropriate than a pooled estimate
- Sensitivity analyses (pre-specified): re-run excluding high-risk-of-bias studies; if the pooled estimate changes substantially, the overall finding depends on low-quality evidence
- Publication bias: construct a funnel plot when ≥ 10 studies; test for asymmetry with Egger's test (continuous outcomes) or Begg's test; trim-and-fill analysis to estimate the adjusted effect if asymmetry is present
- Report with a forest plot; the pooled estimate should include the prediction interval (the range within which 95% of true effects are expected to fall), not just the confidence interval
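
The quantities named above (pooled estimate, Tau², I², confidence interval, prediction interval) can be computed as in the following sketch. It assumes the DerSimonian-Laird estimator for Tau², which the text does not specify, and takes the t critical value for the prediction interval as an argument rather than computing it. The function name and the inputs are hypothetical; a real analysis would normally use an established meta-analysis package.

```python
from math import sqrt


def random_effects_pool(effects, variances, t_crit, z=1.96):
    """DerSimonian-Laird random-effects pooling of per-study effects.

    effects and variances are on the analysis scale (e.g. log risk ratios).
    t_crit is the 97.5th percentile of a t distribution with k - 2 degrees
    of freedom, supplied by the caller (from a table or a stats library).
    """
    k = len(effects)
    if k < 3 or k != len(variances):
        raise ValueError("need at least three studies, each with a variance")

    w = [1 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    pi_half_width = t_crit * sqrt(tau2 + se ** 2)
    return {
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
        "prediction_interval": (pooled - pi_half_width, pooled + pi_half_width),
        "tau2": tau2,
        "i2": i2,
    }


# Four hypothetical studies reporting log risk ratios. Equal variances are
# used only so the arithmetic is easy to verify by hand.
log_rr = [-0.6, -0.1, -0.5, 0.0]
var = [0.04, 0.04, 0.04, 0.04]
result = random_effects_pool(log_rr, var, t_crit=4.303)  # t(0.975, df=2)
print(result)
```

With these inputs the pooled log risk ratio is -0.30 and I² is about 54%. The 95% confidence interval (about -0.59 to -0.01) excludes zero, while the prediction interval (about -1.42 to 0.82) spans it. That contrast is the reason to report both: the average effect looks beneficial, but a new study in a comparable setting could plausibly show no benefit.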

**Narrative synthesis (qualitative synthesis):**
- Appropriate when: studies are too heterogeneous to pool; outcomes are qualitative; insufficient studies for meaningful meta-analysis (< 3–5 studies is generally insufficient)
- Narrative synthesis requires structure — it is not a prose description of individual studies. Use a recognized framework:
  - **Thematic synthesis** (Thomas & Harden): for synthesizing qualitative findings; codes themes across studies; develops analytical themes that go beyond individual study findings
  - **Framework synthesis**: applies a pre-existing theoretical or conceptual framework to organize findings; appropriate when a relevant framework exists and the goal is to test or populate it
  - **Realist synthesis**: explanatory — addresses what works, for whom, in what circumstances, and why; traces causal mechanisms rather than pooling effects
- Regardless of approach: tabulate study characteristics and findings; describe direction of evidence (consistent benefit, mixed, consistent null); note consistency or conflict across studies; explain what drives heterogeneity when identifiable

**Output:** A synthesis table (narrative) or a forest plot with pooled estimates plus a prediction interval (meta-analytic), plus a GRADE Summary of Findings (SoF) table for each pre-specified outcome showing: study design, number of participants, number of studies, effect estimate with CI, and GRADE certainty rating.

---

## 8. PRISMA reporting

**What it is.** The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) is the reporting standard for systematic reviews. A completed PRISMA checklist and flow diagram are required for publication in most journals and for any review intended to inform guidelines or policy decisions. The checklist is at https://www.prisma-statement.org/prisma2020.

**PRISMA 2020 checklist — what each section must contain:**

*Title:* Identify the report as a systematic review, meta-analysis, or both.

*Abstract:* Structured — objectives, eligibility criteria, information sources, risk of bias assessment methods, synthesis methods, results (number of studies, participants, main findings), limitations, conclusions, registration number.

*Introduction:* Rationale (why this review is needed, what is already known); objectives (the PICO/SPIDER question explicitly stated).

*Methods — information sources and search:* All databases searched with dates; all supplementary methods (trial registries, grey literature, snowballing, author contact); the full search strategy for at least one database (the primary database, usually MEDLINE).

*Methods — selection and data collection:* Eligibility criteria stated; screening process (who screened, how disagreements were resolved, which AI tools were used, if any); data extraction process (who extracted, dual or single, how validated).

*Methods — synthesis:* Pre-specified synthesis approach and the decision criterion for choosing it; heterogeneity assessment approach; any subgroup or sensitivity analyses, pre-specified in the protocol.

*Results — study selection:* PRISMA flow diagram with record counts at every stage; reasons for exclusion at full-text stage, with counts by reason.

*Results — study characteristics and risk of bias:* Summary table of included study characteristics; risk of bias judgments per study with domain-level detail.

*Results — synthesis:* For meta-analyses: forest plots, heterogeneity statistics (I², Tau², Chi²), sensitivity analyses; for narrative: synthesis tables with direction and consistency of evidence. For all: GRADE Summary of Findings table per outcome.

*Discussion:* Interpretation of results in light of evidence certainty; limitations (search limitations, methodological limitations of included studies, review-level limitations); conclusions that do not outrun the GRADE certainty ratings.

**PRISMA flow diagram stages (PRISMA 2020):**
1. Records identified from databases and registers (count per source)
2. Records identified via other methods (websites, organisations, citation searching, grey literature)
3. Records removed before screening (duplicates, records marked ineligible by automation tools, other reasons)
4. Records screened (title/abstract)
5. Records excluded (title/abstract)
6. Reports sought for retrieval, and reports not retrieved
7. Reports assessed for eligibility (full text)
8. Reports excluded at full text, with reasons (and counts per reason)
9. Studies included in the review (and, if applicable, studies included in meta-analysis as a subset)
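
The counts at successive stages must reconcile arithmetically, and a short check catches most flow diagram errors before submission. The sketch below is a minimal illustration with hypothetical field names and invented numbers. It assumes records from other methods are merged into the same screening stream, that `not_retrieved` counts full texts that could not be obtained (leave it at 0 if all were), and that each included study corresponds to one full-text report.

```python
from dataclasses import dataclass


@dataclass
class FlowCounts:
    identified_by_database: dict  # database name -> records found
    identified_other: int         # grey literature, snowballing, etc.
    duplicates_removed: int
    screened: int                 # title/abstract
    excluded_at_screening: int
    fulltext_assessed: int
    fulltext_excluded: dict       # exclusion reason -> count
    included: int
    not_retrieved: int = 0        # full texts that could not be obtained


def check_flow(c):
    """Return a list of arithmetic inconsistencies (empty if none)."""
    problems = []
    identified = sum(c.identified_by_database.values()) + c.identified_other
    if c.screened != identified - c.duplicates_removed:
        problems.append("screened != identified - duplicates removed")
    expected_assessed = c.screened - c.excluded_at_screening - c.not_retrieved
    if c.fulltext_assessed != expected_assessed:
        problems.append("full-text assessed != screened - excluded - not retrieved")
    if c.included != c.fulltext_assessed - sum(c.fulltext_excluded.values()):
        problems.append("included != full-text assessed - full-text excluded")
    return problems


# Hypothetical counts, for illustration only
counts = FlowCounts(
    identified_by_database={"MEDLINE": 400, "Embase": 350},
    identified_other=50,
    duplicates_removed=200,
    screened=600,
    excluded_at_screening=520,
    fulltext_assessed=75,
    fulltext_excluded={"wrong population": 30, "wrong outcome": 25},
    included=20,
    not_retrieved=5,
)
print(check_flow(counts) or "flow counts reconcile")
```

If one study is described in several reports, the last check needs adjusting, since the diagram then counts reports and studies separately.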

**Extensions for specialized review types:**
- PRISMA-P: for protocols (complete before data extraction begins; pairs with PROSPERO registration, which uses its own registration form)
- PRISMA-ScR: for scoping reviews — no quality assessment, broader evidence mapping, different flow diagram
- PRISMA-NMA: for network meta-analyses (comparative effectiveness across multiple interventions)
- PRISMA-IPD: for individual participant data meta-analyses
- MOOSE: for meta-analyses of observational studies in epidemiology (a standalone reporting guideline rather than a PRISMA extension)

**Decision criterion for reporting:** If a PRISMA checklist item cannot be completed, the review has a reportable gap. Either the gap must be justified (with the justification in the methods section) or the corresponding work must be done before the review is submitted or cited.
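
The decision criterion above reduces to a simple rule: every checklist item needs either a location in the manuscript or a written justification. A minimal sketch, assuming a hypothetical dictionary layout in which each item maps to an optional `page` and an optional `justification`; the item names here are abbreviated from the section labels above, not the official PRISMA item numbering.

```python
def find_reporting_gaps(checklist):
    """Return items that have neither a page reference nor a justification."""
    gaps = []
    for item, entry in checklist.items():
        page = entry.get("page")
        justification = (entry.get("justification") or "").strip()
        if page is None and not justification:
            gaps.append(item)
    return gaps


# Hypothetical checklist state, for illustration only
checklist = {
    "Title": {"page": 1},
    "Abstract": {"page": 2},
    "Methods: information sources and search": {"page": 5},
    "Methods: synthesis": {
        "page": None,
        "justification": "No subgroup analyses were planned; stated in Methods.",
    },
    "Results: study selection": {"page": None},
}

print(find_reporting_gaps(checklist))
```

Any item the function returns is a reportable gap: either the work gets done or the justification gets written before the review is submitted or cited.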

**Output:** A completed PRISMA 2020 checklist with page number references to where each item is reported. A PRISMA flow diagram as a figure, with accurate counts at every node.
<!--fold:e7edcd@file path="sources.md" mode="644"-->
# sources

Fetch these at task time. Ordered by importance.

## Systematic review methodology

1. Cochrane Handbook for Systematic Reviews of Interventions — the authoritative guide to conducting systematic reviews; covers protocol development, searching, screening, extraction, risk of bias assessment, and meta-analysis:
   https://training.cochrane.org/handbook

2. PRISMA 2020 Statement — Preferred Reporting Items for Systematic Reviews and Meta-Analyses; the current reporting standard for systematic reviews, with the updated flow diagram and checklist:
   https://www.prisma-statement.org/prisma2020

3. PRISMA-P 2015 — PRISMA for Protocols; pre-registration checklist for systematic review protocols before data collection begins:
   https://www.prisma-statement.org/extensions/protocols

4. PRESS 2015 Guideline — Peer Review of Electronic Search Strategies; checklist for having a second expert review a search strategy before execution:
   https://www.cadth.ca/resources/finding-evidence/press

## Evidence quality frameworks

5. GRADE Handbook — Grading of Recommendations Assessment, Development and Evaluation; the standard framework for rating certainty of evidence (high/moderate/low/very low) per outcome across a body of evidence:
   https://gdt.gradepro.org/app/handbook/handbook.html

6. GRADE Working Group — overview papers and applied examples of the GRADE approach, including the SoF table format:
   https://www.gradeworkinggroup.org

7. Oxford Centre for Evidence-Based Medicine (CEBM) Levels of Evidence — the Oxford hierarchy for study design (systematic reviews → RCTs → cohort → case-control → case series):
   https://www.cebm.ox.ac.uk/resources/levels-of-evidence/oxford-centre-for-evidence-based-medicine-levels-of-evidence-march-2009

## Risk of bias tools

8. RoB 2 — Revised Cochrane risk-of-bias tool for randomized trials; the standard for assessing bias in RCTs across five domains with domain-level signaling questions:
   https://www.riskofbias.info/welcome/rob-2-0-tool

9. ROBINS-I — Risk Of Bias In Non-randomized Studies of Interventions; for observational studies and quasi-experimental designs across seven domains including confounding:
   https://www.riskofbias.info/welcome/robins-i-tool

10. AMSTAR 2 — Assessing the Methodological Quality of Systematic Reviews; for quality assessment of existing systematic reviews included in an overview or umbrella review:
    https://amstar.ca/Amstar_Checklist.php

11. QUADAS-2 — Quality Assessment of Diagnostic Accuracy Studies; for diagnostic test accuracy reviews:
    https://www.bristol.ac.uk/population-health-sciences/projects/quadas/quadas-2/

12. CASP Checklists — Critical Appraisal Skills Programme; tools for qualitative studies, cohort studies, case-control studies, and systematic reviews:
    https://casp-uk.net/casp-tools-checklists/

## Research synthesis standards

13. Campbell Collaboration Methods — systematic review methods adapted for social science, education, crime, and development; includes plain-language guides and protocol templates:
    https://www.campbellcollaboration.org/research-resources/methods-resources.html

14. JBI (Joanna Briggs Institute) Manual for Evidence Synthesis — covers scoping reviews, mixed-methods synthesis, and economic evidence, beyond Cochrane's intervention focus:
    https://synthesismanual.jbi.global

15. PROSPERO — International Prospective Register of Systematic Reviews; for searching pre-registered protocols and registering your own before data extraction begins:
    https://www.crd.york.ac.uk/prospero/

## Database and search strategy documentation

16. Cochrane RevMan — Review Manager, Cochrane's software for preparing reviews, running meta-analyses, and producing forest plots (not a reference manager); documentation for extraction templates and forest plot specification:
    https://training.cochrane.org/online-learning/core-software-cochrane-reviews/revman

17. Semantic Scholar API — programmatic access to the Semantic Scholar corpus (200M+ papers); supports abstract retrieval, citation graphs, and author disambiguation for systematic search and snowball searching:
    https://api.semanticscholar.org/api-docs/

18. PubMed MeSH (Medical Subject Headings) — the controlled vocabulary used for MEDLINE indexing; essential for constructing comprehensive search strategies in biomedical literature:
    https://www.ncbi.nlm.nih.gov/mesh/

## AI-assisted review tools

19. Elicit — AI research assistant that queries the literature and extracts claims from individual papers against a research question; best for exploratory scoping and search supplementation, not as a substitute for a systematic database search:
    https://elicit.com

20. Consensus — AI tool for surfacing consensus-level claims across scientific papers with citation support; useful for orientation and gap identification:
    https://consensus.app

21. Rayyan — AI-assisted systematic review platform with ML-based abstract screening and ranking; free for researchers, supports dual screening and inter-rater reliability measurement:
    https://rayyan.ai

22. Covidence — systematic review management platform with ML screening support, extraction templates, and PRISMA flow diagram generation; the standard tool used by Cochrane review groups:
    https://www.covidence.org

## Reporting extensions (for specialized review types)

23. PRISMA-ScR — PRISMA extension for Scoping Reviews; when the goal is mapping the literature rather than synthesizing effect estimates:
    https://www.prisma-statement.org/extensions/scoping-reviews

24. PRISMA-NMA — PRISMA extension for Network Meta-Analyses; for comparative effectiveness reviews across multiple interventions:
    https://www.prisma-statement.org/extensions/network-meta-analysis

25. MOOSE — Meta-analysis Of Observational Studies in Epidemiology; reporting guideline for meta-analyses of observational studies:
    https://jamanetwork.com/journals/jama/fullarticle/192614
<!--fold:e7edcd@end-->
PORTDOWN_6C3762FE

# ── post ──
MARKER=$(awk '/^---$/ { f++; if (f==2) exit; next } f==1 && /^marker:[[:space:]]/ { sub(/^marker:[[:space:]]+/, ""); print; exit }' "$DEST")
[ -z "$MARKER" ] && { echo "seed: archive has no marker — corrupt" >&2; exit 1; }
awk -v m="$MARKER" -v outdir="$TARGET" '
  BEGIN {
    # Match <!--fold:<m>@file path="X"--> with an optional mode attr after
    # the path (fold emits  mode="644"  on executables).
    file_re = "^<!--fold:" m "@file path=\"([^\"]+)\"( mode=\"[0-9]+\")?-->$"
    end_re  = "^<!--fold:" m "@end-->$"
  }
  $0 ~ end_re { if (current) close(current); exit }
  $0 ~ file_re {
    if (current) close(current)
    line = $0
    sub(/^<!--fold:[^@]+@file path="/, "", line); sub(/".*$/, "", line)
    current = outdir "/" line
    dir = current; sub(/\/[^\/]*$/, "", dir)
    if (dir != current) system("mkdir -p \"" dir "\"")
    printf "" > current
    next
  }
  current { print >> current }
' "$DEST"
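# Strip whitespace: BSD/macOS wc pads its output, which would defeat the "0" check below.
SEED_EXTRACTED=$(find "$TARGET" -type f -not -path "$DEST" 2>/dev/null | wc -l | tr -d '[:space:]')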
if [ "$SEED_EXTRACTED" = "0" ]; then
  echo "seed: archive contained no files — refusing to delete the source" >&2
  echo "  archive preserved at: $DEST" >&2
  exit 1
fi
rm -f "$DEST"

echo "" >&2
echo "✓ seed unpacked → $TARGET ($SEED_EXTRACTED files)" >&2
find "$TARGET" -type f | sort | while IFS= read -r _sf; do
  echo "  ${_sf#${TARGET}/}" >&2
done
echo "" >&2
if [ -f "$TARGET/SKILL.md" ]; then
  echo "This seed contains a skill (SKILL.md). Install it in your agent's skills directory." >&2
  echo "" >&2
fi
echo "Install the seed skill if not already installed:" >&2
echo "  https://seed.show/skill" >&2
echo "" >&2
echo "Publisher prompt:" >&2
sed 's/^/  /' >&2 <<'__SEED_PROMPT_END_AC1F2B__'
You have the literature review context. Read README.md first: it covers what AI agents cannot do (database searching), the evidence-mapping mental model, the review type spectrum, what agents get wrong, and what AI is changing. Then read methodology.md for the eight-stage systematic review framework with decision criteria at each stage. Fetch sources.md for authoritative methodology guides, risk of bias tools, and AI-assisted review tools. To share your own context bundles: curl seed.show/skill | bash
__SEED_PROMPT_END_AC1F2B__
exit 0

id: research.literature.review | size: 43.3 KB | created: 2026-05-06 | expires: permanent