GynContraception — Research Platform


Conditions

How conditions are counted: Posts are scraped from 12 gyn-focused subreddits, using both the /new and /hot sort orders of each for broader coverage:
  • r/endometriosis, r/PCOS, r/TwoXChromosomes, r/AskDocs: 200 /new + 50 /hot each
  • r/Endo, r/Adenomyosis, r/Fibroids, r/Healthyhooha, r/obgyn, r/WomensHealth, r/Periods, r/PMDD: 100 /new + 50 /hot each
Each post's title and body text is scanned against 15 regex patterns, one per gynecologic condition (endometriosis, PCOS, dysmenorrhea, adenomyosis, fibroids, PMDD, PMS, menorrhagia, amenorrhea, ovarian cysts, pelvic pain, hormonal acne, hirsutism, irregular cycles, vulvodynia). Conditions are counted at the post level only; comments do not affect a post's condition attribution. Cross-posted content is detected and deduplicated, so each count is the number of unique posts mentioning that condition. A post that discusses multiple conditions is counted once for each condition it mentions.
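The post-level counting described above can be sketched as follows. This is a minimal illustration, not the platform's code: the three patterns, the post dictionary fields (`id`, `title`, `body`, `crosspost_of`), and the deduplication key are all assumptions.

```python
import re
from collections import Counter

# Hypothetical patterns; the platform uses 15, one per condition.
CONDITION_PATTERNS = {
    "endometriosis": re.compile(r"\bendo(?:metriosis)?\b", re.IGNORECASE),
    "PCOS": re.compile(r"\bpcos\b|\bpolycystic ovar", re.IGNORECASE),
    "fibroids": re.compile(r"\bfibroids?\b", re.IGNORECASE),
}

def count_conditions(posts):
    """Count unique posts per condition, skipping cross-posted duplicates."""
    counts = Counter()
    seen = set()
    for post in posts:
        # Assumed dedup key: the original post's id when this is a cross-post.
        key = post.get("crosspost_of") or post["id"]
        if key in seen:
            continue
        seen.add(key)
        text = f"{post['title']}\n{post['body']}"
        for name, pattern in CONDITION_PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1  # once per post, however many matches
    return counts

posts = [
    {"id": "a1", "title": "Endo and PCOS together?", "body": "Diagnosed with both."},
    {"id": "b2", "title": "Same post", "body": "endo endo endo", "crosspost_of": "a1"},
    {"id": "c3", "title": "Fibroids surgery", "body": "Also have endometriosis."},
]
result = count_conditions(posts)
```

In this example the cross-post `b2` is skipped, and post `a1` adds one to both endometriosis and PCOS.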

Sentiment by Condition

Keyword-based sentiment scoring: Each post and comment (across all 12 subreddits) is scored from -1.0 (very negative) to +1.0 (very positive) using curated word lists:
  • ~40 positive words (love, recommend, effective, relief, helped, works, etc.)
  • ~50 negative words (hate, pain, nightmare, frustrated, scared, bleeding, etc.)
  • Negators (not, never, don't, etc.) flip the next word's polarity — "not painful" counts as positive
  • Intensifiers (very, extremely, so) multiply the next word's weight by 1.5x
Score formula: (positive - negative) / total_sentiment_words, clamped to [-1, 1]; the clamp matters because intensifier weights can push the raw ratio past ±1. The chart shows the average score across all posts mentioning each condition. Posts with no sentiment words receive a NULL score and are excluded from the average. Use the subreddit filter to compare sentiment across communities. The score is best suited to comparing trends between conditions, not to interpreting individual posts: it cannot detect sarcasm, context, or complex phrasing.
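A minimal sketch of a scorer that follows the rules above. The word lists are tiny illustrative subsets, the tokenizer is an assumption, and so is the choice to count each sentiment word once in the denominator regardless of its weight.

```python
import re

# Illustrative subsets; the real lists hold roughly 40 and 50 words.
POSITIVE = {"love", "recommend", "effective", "relief", "helped", "works"}
NEGATIVE = {"hate", "pain", "painful", "nightmare", "frustrated", "scared", "bleeding"}
NEGATORS = {"not", "never", "don't"}
INTENSIFIERS = {"very", "extremely", "so"}

def score_sentiment(text):
    """Return a score in [-1, 1], or None when no sentiment word is found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = negative = 0.0
    hits = 0
    flip, weight = False, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
            continue
        if tok in INTENSIFIERS:
            weight = 1.5
            continue
        if tok in POSITIVE or tok in NEGATIVE:
            is_positive = (tok in POSITIVE) != flip  # a negator flips polarity
            if is_positive:
                positive += weight
            else:
                negative += weight
            hits += 1
        # Modifiers apply to the next word only, so reset after any other token.
        flip, weight = False, 1.0
    if hits == 0:
        return None
    return max(-1.0, min(1.0, (positive - negative) / hits))
```

Under these assumptions "not painful" scores 1.0, "very effective" has a raw ratio of 1.5 that is clamped to 1.0, and a text with no listed words returns None, which corresponds to the NULL case.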

Conditions Over Time

Daily condition counts: Shows the top 5 most-mentioned conditions over time, aggregated across all 12 subreddits (or filtered by one). Each data point is the number of unique posts from that day containing a regex match for that condition. Days with zero posts are not shown. The time and subreddit filters control the data displayed.
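The daily aggregation can be sketched as below. The input shape (tuples of post id, UTC timestamp in seconds, and condition name) and the use of UTC calendar days are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

def daily_top_conditions(mentions, top_n=5):
    """mentions: iterable of (post_id, created_utc_seconds, condition)."""
    per_day = defaultdict(set)  # (day, condition) -> set of unique post ids
    for post_id, created_utc, condition in mentions:
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date().isoformat()
        per_day[(day, condition)].add(post_id)
    totals = Counter()
    for (_, condition), ids in per_day.items():
        totals[condition] += len(ids)
    top = {name for name, _ in totals.most_common(top_n)}
    # Days with no matching posts never get a key, mirroring the chart.
    return {key: len(ids) for key, ids in sorted(per_day.items()) if key[1] in top}

mentions = [
    ("p1", 0, "PCOS"),
    ("p1", 0, "PCOS"),       # duplicate row for the same post
    ("p2", 86400, "PCOS"),
    ("p3", 0, "endometriosis"),
]
series = daily_top_conditions(mentions)
```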

Treatments

How treatments are counted: Posts and comments are scanned against ~34 regex patterns — one per treatment type, including brand names and common abbreviations. Treatments are counted at both post and comment level. Grouped into 8 categories: Hormonal IUDs, Oral Hormonal, Long-Acting, GnRH Agonists/Antagonists, Surgical, Pain Management, Complementary, and Other Medications.

Sentiment by Treatment

Average sentiment per treatment: Same keyword-based scoring as Sentiment by Condition, but grouped by treatment rather than condition.

Condition × Treatment Heatmap

Core Aim 1 visualization: Each cell shows the number of posts where a condition (row) co-occurs with a treatment (column). This is the primary two-axis analysis of the research platform: which treatments are discussed in the context of which gynecologic conditions. Color intensity scales linearly from transparent (0) to blue (max value). Conditions are post-level; treatments are post+comment level.
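A rough sketch of how the cell counts and the linear color scale could be computed. It assumes each post has already been reduced to two sets: conditions found in the post's own text, and treatments found in the post or any of its comments. The function names are hypothetical.

```python
from collections import Counter

def cooccurrence(posts):
    """posts: iterable of (conditions, treatments) set pairs, one per post."""
    cells = Counter()
    for conditions, treatments in posts:
        for condition in conditions:
            for treatment in treatments:
                cells[(condition, treatment)] += 1
    return cells

def cell_alpha(value, max_value):
    """Linear opacity: 0 is transparent, max_value is fully opaque."""
    return 0.0 if max_value == 0 else value / max_value

posts = [
    ({"endometriosis"}, {"Mirena", "Lupron"}),
    ({"endometriosis", "PCOS"}, {"Mirena"}),
]
cells = cooccurrence(posts)
peak = max(cells.values())
```

Here the endometriosis × Mirena cell holds 2 and is drawn at full intensity, while the cells holding 1 are drawn at half intensity.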

Condition × Symptom Heatmap

Condition × symptom matrix: Data is sourced from 12 subreddits. Each cell shows the number of posts and comments that mention both a condition and a symptom. Symptoms are detected using ~30 regex patterns matching symptoms like "pelvic pain," "heavy bleeding," "fatigue," "mood swings," etc. Color intensity scales linearly from transparent (0) to red (max value in the table). Cross-posted content is deduplicated. Limitation: A post saying "I'm worried about weight gain" and one saying "I had no weight gain" both count — the regex detects the mention, not the context. Top 12 conditions and top 15 symptoms are shown.

Top Symptoms

Ranked symptom mentions: Counts the number of unique posts and comments (across all 12 subreddits) that match each of the ~30 symptom regex patterns. A single post mentioning "pelvic pain" and "heavy bleeding" counts once for each category. Patterns include variations (e.g., "headache" and "migraine" both map to "Headaches"). This tracks what people are talking about, not necessarily what they experienced — questions, fears, and reports all count equally.

Knowledge Gaps

Knowledge gap detection: Posts and comments are scanned against 13 proximity-bounded regex patterns, one per misconception. The patterns are sourced from ACOG Practice Bulletins, WHO, Endocrine Society, Cochrane, Rotterdam Criteria, and FDA, and fall into 3 categories:
  • Treatment role: BC only prevents pregnancy, diet cures endo/PCOS, BC masks symptoms, natural always better
  • Safety: hormones always bad, BC ruins fertility, continuous BC unsafe, Lupron is dangerous, BC causes endo
  • Disease understanding: endo goes away with pregnancy, hysterectomy cures endo, painful periods normal, PCOS is just weight
Each pattern requires co-occurrence of two concepts within a bounded character window (typically 30–80 characters) to prevent false positives from distant, unrelated mentions.
Stance detection: Each match is classified as asserting, questioning, debunking, or unclear by analyzing linguistic cues in a 150-character context window and the sentence boundaries around the regex match. Bar chart segments are colored by stance when available.
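To illustrate the two steps, here is a sketch for a single claim ("BC ruins fertility"). The regex, the 60-character gap, and the cue lists are hypothetical stand-ins. The stance rules are deliberately crude and ignore sentence boundaries.

```python
import re

# Hypothetical pattern: two concepts, in either order, at most 60 characters apart.
BC = r"(?:birth control|the pill|\bbc\b)"
FERT = r"(?:infertil\w*|ruin\w* (?:my |your )?fertility)"
CLAIM = re.compile(rf"{BC}.{{0,60}}?{FERT}|{FERT}.{{0,60}}?{BC}", re.IGNORECASE)

# Assumed cue lists; the platform's actual cues are not documented here.
DEBUNK_CUES = ("myth", "not true", "no evidence", "misconception")
ASSERT_CUES = (" will ", " always ", "definitely", " caused ")

def classify(text, window=150):
    """Return None if the claim is absent, else a stance label."""
    match = CLAIM.search(text)
    if match is None:
        return None
    start = max(0, match.start() - window)
    context = text[start: match.end() + window].lower()
    if any(cue in context for cue in DEBUNK_CUES):
        return "debunking"
    if "?" in context:
        return "questioning"
    if any(cue in context for cue in ASSERT_CUES):
        return "asserting"
    return "unclear"
```

The bounded gap is what keeps a post that mentions the pill in one paragraph and infertility several sentences later from being counted as a match.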

Knowledge Gaps × Condition Heatmap

Condition × knowledge gaps matrix: Each cell shows the number of posts and comments that mention both a condition and a knowledge gap. Color intensity scales linearly from transparent (0) to orange (max value). Top conditions and all claims with data are shown.

Questions Being Asked

Question detection: Post titles are scanned for question indicators: a title counts as a question if it contains "?" or starts with question words like "has anyone", "is it", "does", "how", "what", etc. Detected questions are categorized into 6 types by matching the title + body against keyword patterns:
  • Symptoms/Safety: symptoms, safety, side effects
  • Effectiveness: effectiveness, treatment outcomes
  • Usage: how to use, dosing, procedures
  • Access/Cost: cost, insurance, prescriptions, clinics
  • Experience: personal experiences, recommendations, advice
  • Switching: changing treatments, alternatives
A question can belong to multiple categories. Limitation: Only post titles are checked for question detection; body text is used for categorization only.
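A small sketch of the title-only detection followed by title + body categorization. Only two of the six categories are shown, and their keyword patterns are assumptions.

```python
import re

QUESTION_START = re.compile(
    r"^\s*(?:has anyone|is it|does|how|what)\b", re.IGNORECASE
)
# Hypothetical keyword patterns for two of the six categories.
CATEGORIES = {
    "Access/Cost": re.compile(r"\b(?:cost|insurance|prescription|clinic)s?\b", re.IGNORECASE),
    "Switching": re.compile(r"\b(?:switch\w*|alternatives?)\b", re.IGNORECASE),
}

def categorize_question(title, body=""):
    """Return None for non-questions, else the list of matching categories."""
    if "?" not in title and not QUESTION_START.search(title):
        return None
    text = f"{title}\n{body}"
    return [name for name, pattern in CATEGORIES.items() if pattern.search(text)]
```

For example, `categorize_question("My surgery story", "How much did it cost?")` returns None because the question appears only in the body, which is the limitation noted above.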

Questions × Condition Heatmap

Condition × question category matrix: Each cell shows the number of posts that are questions mentioning both a condition and a question category. Color intensity scales linearly from transparent (0) to teal (max value). Top conditions and all question categories are shown.

User Demographics

Self-reported age and gender extraction: Reddit users commonly self-report demographics in posts (e.g., "I'm 23F", "(25F)", "26 year old woman"). The system scans the first 500 characters of each post through 7 ordered regex patterns (most specific first):
  1. "I'm 23F" / "I am 23F"
  2. "(23F)" / "[23F]"
  3. "23F here"
  4. "23 year old female/woman/man"
  5. "I'm a 23 year old" (age only)
  6. "I'm female" / "I'm a woman" (gender only)
  7. "I'm 23 years old" (age only)
A third-person filter skips matches preceded by "my girlfriend", "my partner", etc. within 30 characters. Age range: 13–65. Gender maps to female/male/null. Posts only (not comments). Limitation: Non-binary genders map to null.
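The ordered matching and the third-person filter might look roughly like this. Only three of the seven patterns are sketched, and the exact regexes and the third-person cue list are assumptions.

```python
import re

# Three of the seven patterns, most specific first; exact regexes are assumed.
PATTERNS = [
    re.compile(r"\bI(?:'m| am) (?P<age>\d{2})(?P<g>[FM])\b"),
    re.compile(r"[(\[](?P<age>\d{2})(?P<g>[FM])[)\]]"),
    re.compile(r"\b(?P<age>\d{2}) year old (?P<g>female|woman|man)\b", re.IGNORECASE),
]
# Assumed cue list; the source names only "my girlfriend" and "my partner".
THIRD_PERSON = re.compile(r"\bmy (?:girlfriend|partner)\b", re.IGNORECASE)
GENDER = {"f": "female", "female": "female", "woman": "female",
          "m": "male", "man": "male"}

def extract_demographics(text):
    """Return (age, gender) from the first 500 characters, or (None, None)."""
    head = text[:500]
    for pattern in PATTERNS:
        for match in pattern.finditer(head):
            # Skip matches that follow a third-person cue within 30 characters.
            before = head[max(0, match.start() - 30): match.start()]
            if THIRD_PERSON.search(before):
                continue
            age = int(match.group("age"))
            if 13 <= age <= 65:
                return age, GENDER.get(match.group("g").lower())
    return None, None
```

With these patterns, "My girlfriend (25F) has PCOS" yields no demographics because the match is attributed to a third person.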

Age × Condition Heatmap

Age band × condition matrix: Each cell shows the number of posts that mention a condition and where the poster self-reported an age in that band. Color intensity scales linearly from transparent (0) to purple (max value). 7 age bands: 13–17, 18–22, 23–27, 28–32, 33–37, 38–42, 43+.
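Since the first six bands are each five years wide and start at 13, the band can be computed arithmetically. This sketch assumes ages outside the 13–65 extraction range are left unbanded.

```python
def age_band(age):
    """Map an age in 13..65 to one of the seven bands; None otherwise."""
    if age is None or not 13 <= age <= 65:
        return None
    if age >= 43:
        return "43+"
    lower = 13 + 5 * ((age - 13) // 5)
    return f"{lower}–{lower + 4}"
```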

Treatment Categories

Grouped treatment categories: Individual treatments are grouped into 8 categories:
  • Hormonal IUDs: Mirena, Kyleena, Liletta, Skyla
  • Oral Hormonal: Combined pill, Mini pill, The pill (general), Yaz, Lo Loestrin, Slynd, Norethindrone, Dienogest
  • Long-Acting: Nexplanon, Depo-Provera, NuvaRing, Xulane patch
  • GnRH Agonists/Antagonists: Lupron, Orilissa, Myfembree
  • Surgical: Laparoscopy, Excision surgery, Ablation, Hysterectomy, Myomectomy
  • Pain Management: NSAIDs, Opioids
  • Complementary: Pelvic floor PT, Diet/lifestyle, Supplements
  • Other Medications: Metformin, Spironolactone, Letrozole/Clomid
The chart sums treatment mention counts within each group.
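A sketch of the per-category summation, using a subset of the mapping. The "Uncategorized" fallback is a hypothetical addition for treatments missing from the map.

```python
from collections import Counter

# Subset of the treatment-to-category mapping described above.
CATEGORY_OF = {
    "Mirena": "Hormonal IUDs",
    "Kyleena": "Hormonal IUDs",
    "Lupron": "GnRH Agonists/Antagonists",
    "Orilissa": "GnRH Agonists/Antagonists",
    "NSAIDs": "Pain Management",
}

def category_totals(treatment_counts):
    """Sum per-treatment mention counts within each category."""
    totals = Counter()
    for treatment, count in treatment_counts.items():
        # "Uncategorized" is an assumed fallback, not a platform category.
        totals[CATEGORY_OF.get(treatment, "Uncategorized")] += count
    return totals

totals = category_totals({"Mirena": 10, "Kyleena": 4, "Lupron": 7})
```

Because mention counts are summed, a post that names both Mirena and Kyleena would contribute two to Hormonal IUDs under this reading.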

Post Explorer

Browse raw posts and comments: Select a condition or treatment to see the top posts (sorted by engagement score) whose own title or body text mentions it. Posts are sourced from all 12 subreddits. Each post shows:
  • Subreddit badge: cyan pill showing which subreddit the post came from
  • Sentiment badge: green (+), red (-), or gray (neutral/none), based on the keyword scorer
  • Engagement score: composite metric, log2(upvotes) + 1.5 × log2(comments), which weights discussion higher than votes
  • Symptom pills: yellow tags for each symptom detected in that post's text
  • View comments: expands to show scraped Reddit comments with their own sentiment scores
Post text is the original Reddit selftext (body). Comments are fetched up to 200 per post, walking the full reply tree. Only posts with condition mentions in their own text have their comments scraped. Cross-posts are stored but their mentions are not double-counted. Comments are analyzed for symptoms and sentiment but do not affect the post's condition attribution.
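The engagement score can be computed as in this sketch. Flooring both counts at 1 is an assumption, since the formula as stated is undefined for zero upvotes or zero comments and the handling of those cases is not described.

```python
import math

def engagement_score(upvotes, comments):
    """log2(upvotes) + 1.5 * log2(comments), with counts floored at 1."""
    # Assumed floor: keeps log2 defined for zero or negative counts.
    return math.log2(max(upvotes, 1)) + 1.5 * math.log2(max(comments, 1))
```

For a post with 64 upvotes and 16 comments this gives 6 + 1.5 × 4 = 12.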

Scrape Log


Human Validation of Health Literacy Gap Detection

Rate posts for common health misconceptions and knowledge gaps. Your votes are compared against the system's regex-based detection to compute precision, recall, and agreement metrics.
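One way the comparison between votes and regex detection could be computed is sketched below. It treats the human vote as ground truth and defines agreement as the plain share of matching labels. Both choices are assumptions, and the platform may use a different agreement statistic.

```python
def validation_metrics(pairs):
    """pairs: iterable of (human_says_gap, system_says_gap) booleans."""
    tp = fp = fn = tn = 0
    for human, system in pairs:
        if system and human:
            tp += 1
        elif system and not human:
            fp += 1
        elif human and not system:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        # None marks a metric that is undefined for the votes seen so far.
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        "agreement": (tp + tn) / total if total else None,
    }

votes = [(True, True), (True, True), (False, True), (True, False), (False, False)]
metrics = validation_metrics(votes)
```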


Literacy Gap Validation Statistics

Submit votes to see statistics

Suggestion Box

Suggest features or improvements. Upvote ideas you like — the most popular rise to the top.
