Design Philosophy

This document records the design decisions behind the Emergency Contraception Pharmacy Finder and the rationale for each. It is intended for developers who want to understand "why things are the way they are," and as a reference for those building similar projects.

For technical specifications, see the Feature Spec and the Hours Parser Design.


1. This Site Is a "Search Tool," Not a "Medical Information Site"

This is the most fundamental design decision. The site does not cover medical information such as the efficacy, side effects, or administration methods of emergency contraception.

Reasons:

For this reason, medical information and lengthy explanatory text are intentionally omitted from the header and filter areas. We should not add noise to the navigation path of users in a hurry. Questions like "What is the difference between pharmacies and medical institutions?" are addressed with a single FAQ entry, as they relate to usage decisions.


2. Filter Design Principles

This site has two types of filters, each with different criteria for implementation.

Default Exclusion (Options Hidden from the User)

Emergency contraception is time-critical, so hiding options by default requires careful consideration. Each candidate for default exclusion is evaluated along three axes:

| Axis | Question |
| --- | --- |
| Source of information | Is it self-reported by the facility (primary source), or parsed/estimated by us (secondary processing)? |
| Availability of alternatives | Can the purpose of the excluded option be achieved through another route? |
| Harm of not excluding | Is it a risk of wasted trips (noise), or a loss of useful options? |

Example: Stock availability filter for medical institutions (医療機関) → Exclusion OK

The medical institution data includes a "stock availability" (常時在庫の有無) column from the MHLW PDF. Facilities marked "No" or "Unknown" are hidden by default.

Primary source + alternatives available + practical benefit of noise removal. Exclusion is justified.

Example: "Currently Available" filter → ~~Not implemented~~ Implemented as user-initiated filter

Initially decided against implementation. However, parser improvements (pharmacy 98.2%, clinic 88.3%) and the "unknown badge" design changed the 3-axis evaluation:

Hides only "confirmed closed (no after-hours)" facilities. Uncertain records (unparseable) and after-hours pharmacies remain visible. Default OFF (user-initiated). The former "After-Hours Available" checkbox was merged into this filter. Badge system: green (Open) / blue (After-Hours Available) / gray (Closed) / amber (Status Unknown). See NOW_OPEN_FILTER.md for full design rationale.
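The badge and filter rules above can be sketched as follows. This is an illustration in Python rather than the site's JavaScript. The names `Badge`, `badge_for`, and `visible` are hypothetical, and giving "Status Unknown" precedence over the after-hours flag is an assumption.

```python
from enum import Enum
from typing import Optional


class Badge(Enum):
    OPEN = "green"
    AFTER_HOURS = "blue"
    CLOSED = "gray"
    UNKNOWN = "amber"


def badge_for(open_now: Optional[bool], has_after_hours: bool) -> Badge:
    """Pick a badge; open_now is None when the hours field could not be parsed."""
    if open_now is None:
        return Badge.UNKNOWN  # assumed precedence: uncertainty wins
    if open_now:
        return Badge.OPEN
    return Badge.AFTER_HOURS if has_after_hours else Badge.CLOSED


def visible(badge: Badge, filter_on: bool) -> bool:
    """With the filter ON, hide only confirmed-closed facilities."""
    return not (filter_on and badge is Badge.CLOSED)
```

With the filter OFF every facility stays visible. With it ON, only gray (Closed) cards disappear, so an unparseable record is never hidden.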

User-Initiated Filters (User Turns Them ON)

Because everything is visible when the filter is OFF, the criteria are more relaxed than for default exclusion:

  1. Data must be from a primary source: based on the facility's self-reporting
  2. The filter rate must be meaningful: a filter with too high a pass rate does not function

Example: Private Room Available (個室あり) → Implemented

Filters to pharmacies whose privacy field contains "private room" (個室).

Example: Partition Available (衝立あり) → Not implemented


3. Empty Record Exclusion and Two-Layer Count Display

The MHLW dataset contains records where address, phone number, and hours are all empty (177 as of March 2026). Their notes field typically reads "Consolidated into #XXX", which indicates defunct or merged facilities. They are of no use to users (there is no way to call or locate them), so they are excluded from data.json.

Basis for Exclusion

Evaluated against the 3-axis framework from Section 2:

→ This is not even a filter; it is a data quality issue. There is no reason to display these.

Count Display Design

Exclusion reduces data.json to fewer records than MHLW's published total. These two numbers have different meanings (as of 2026-03-25):

| Number | Meaning | Displayed where |
| --- | --- | --- |
| 11,931 | Facilities published by MHLW (data coverage) | Highlight card: "💊 11,931 nationwide" |
| 11,734 | Actually searchable facilities (what users can interact with) | Status bar: "Loaded 11,734 pharmacy records" |

The highlight card states "Complete official MHLW data." The number shown there should be the MHLW dataset size, not our filtered count. Hence it uses meta.totalPublished.

The status bar is an operational report from the search engine: it accurately shows the number of records loaded as search targets.

Implementation

The title/meta approximate count (e.g., "over 11,000 nationwide") references the MHLW dataset size and is unaffected by the exclusion.
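A minimal sketch of how the exclusion and the two counts could fit together. Only `meta.totalPublished` is named in this document. The record field names (`address`, `phone`, `hours`), the `data` key, and both function names are assumptions for illustration.

```python
def is_empty_record(record: dict) -> bool:
    """True when address, phone, and hours are all blank (field names assumed)."""
    return not any(
        (record.get(key) or "").strip() for key in ("address", "phone", "hours")
    )


def build_dataset(records: list) -> dict:
    """Drop empty records but keep the published total for the highlight card."""
    searchable = [r for r in records if not is_empty_record(r)]
    return {
        "meta": {"totalPublished": len(records)},  # MHLW dataset size
        "data": searchable,  # what the status bar counts
    }


sample = [
    {"name": "A Pharmacy", "address": "1-2-3 Example", "phone": "03-0000-0000",
     "hours": "9:00-18:00"},
    {"name": "B Pharmacy", "address": "", "phone": None, "hours": " ",
     "notes": "Consolidated into #XXX"},
]
result = build_dataset(sample)
print(result["meta"]["totalPublished"], len(result["data"]))  # 2 1
```

Keeping the published total in `meta` lets the front end show both numbers without re-deriving either one.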


4. Pharmacy-Default with Medical Institution Toggle

By default, only pharmacies (薬局) are displayed. Turning on the "Also show medical institutions with stock" toggle lazy-loads clinics.json and integrates the results.

Rejected Alternatives

| Approach | Reason for Rejection |
| --- | --- |
| Tab separation (Pharmacy tab / Medical Institution tab) | Cannot cross-sort pharmacies and medical institutions by "nearest." Users may not notice if the nearest option is a medical institution |
| Full integration (always show both) | Filters become complex. Fields unique to pharmacies (female pharmacist, private room, etc.) and fields unique to medical institutions (OB/GYN department, stock status) would be mixed together |
| Separate page | An extra navigation step during an emergency. Everything should be on one page |

Why Pharmacies Are the Default

Visual Distinction of Cards

When pharmacies and medical institutions appear in mixed search results, users must be able to distinguish them at a glance. Medical institution cards feature a red left border + a "Medical Institution" (医療機関) label. On the map, pharmacies use blue pins and medical institutions use red pins.


5. Geolocation UX

Sorting by "nearest" requires the browser's geolocation, but we do not use the browser's standard confirm() dialog.

Problem

In the context of emergency contraception, a system dialog feels like a "warning" and amplifies anxiety. The standard OS wording, "This site wants to use your location," can be frightening in a privacy-sensitive situation.

Solution: Inline Privacy Panel

Geolocation is used solely for in-browser distance calculations and is never sent to a server. As a static site, there is no server to send it to.


6. Hours Parser Philosophy

The MHLW data's hours field contains thousands of distinct format variations. Rather than aiming for 100% parsing, we adopted a design of high-coverage accurate parsing + graceful fallback. Current coverage: pharmacies 98.2%, medical institutions 88.3%.

Why Not Aim for 100%?

Fallback Strategy

Unparseable business hours are displayed as raw data (exactly as recorded by MHLW). "Showing raw data" is safer than "pretending to have parsed it and displaying inaccurate information." With emergency contraception, believing incorrect business hours and arriving at a closed facility can have serious consequences.

Holiday Handling

Holiday detection is implemented as pure computation with zero external API dependencies (getJapaneseHolidays(), approximately 70 lines). It covers fixed-date holidays, Happy Monday holidays, vernal and autumnal equinoxes (astronomical formulas), substitute holidays (振替休日), and citizens' holidays (国民の休日), valid through approximately 2099.

Reasons for not depending on external holiday APIs:

  - We do not want to add network-dependent failure points to an emergency tool
  - Japanese holidays are determined by law and are computable (temporary changes due to special legislation are handled through annual checks)
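To illustrate why these dates are computable offline, here is a sketch of two of the rules, written in Python rather than the site's JavaScript. The equinox constants are the widely cited approximation for 1980 through 2099. Whether getJapaneseHolidays() uses these exact constants is an assumption, and the function names are hypothetical.

```python
import datetime


def vernal_equinox_day(year: int) -> int:
    """Day of March for the vernal equinox (approximation, 1980-2099)."""
    return int(20.8431 + 0.242194 * (year - 1980) - (year - 1980) // 4)


def autumnal_equinox_day(year: int) -> int:
    """Day of September for the autumnal equinox (approximation, 1980-2099)."""
    return int(23.2488 + 0.242194 * (year - 1980) - (year - 1980) // 4)


def nth_monday(year: int, month: int, n: int) -> datetime.date:
    """Date of the n-th Monday of a month, as used by Happy Monday holidays."""
    first = datetime.date(year, month, 1)
    days_to_monday = (0 - first.weekday()) % 7  # Monday is weekday 0
    return first + datetime.timedelta(days=days_to_monday + 7 * (n - 1))


print(vernal_equinox_day(2025), autumnal_equinox_day(2025))  # 20 23
print(nth_monday(2025, 1, 2))  # 2025-01-13 (second Monday of January)
```

Substitute holidays and citizens' holidays can then be derived from the resulting date set, still without any network access.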


7. Technology Selection Principles

The consistent principle is free, no API key required, static hosting.

| Choice | Reason |
| --- | --- |
| Leaflet.js + OpenStreetMap | Free. Google Maps Platform is paid |
| University of Tokyo CSIS Geocoding | Free, no API key required, address-level precision |
| GitHub Pages | Free static hosting. No server operations required |
| Vanilla JS (no framework) | No build step. Files in docs/ are the production build as-is. Fewer dependencies = easier long-term maintenance |
| GitHub Actions | Automated daily data updates. The free tier is sufficient |

For details on the rationale for each technology choice and rejected alternatives, see the Feature Spec.

Why a Static Site?


8. Data Pipeline Cache Design

update_data.py fetches and processes MHLW's XLSX to generate data.json. When a cache file (data/data_*.json) already exists for the same as-of date, the script skips regeneration to avoid unnecessary downloads and noisy git diffs.

Problem: Incomplete Cache Key

Whether a cache is "valid" should depend on two inputs combined:

  1. Source data (MHLW XLSX): identified by the as_of date
  2. Processing logic (update_data.py itself): if the script changes, output changes

Originally, only the source data match was checked. This meant that when the processing logic was modified (e.g., adding empty record exclusion, adding meta fields), the existing cache was still judged "valid," and the new logic was never applied.

This has the same structure as a classic build system problem: if you modify source code but do not recompile the object files, the old binary continues to be used.

Solution: Input-Hashed Cache Invalidation

meta.scriptHash stores the SHA-256 hash (first 16 characters) of the script itself. During cache validation, this hash is compared against the current script's hash. A mismatch triggers a cache miss and regeneration from the XLSX.

Cache key = as_of date + scriptHash
  → Same source data + different script → cache miss
  → Same script + different source data → cache miss

This technique follows the same principle as Docker layer caching, Webpack's contenthash, and Nix/Bazel build hashes: never use the cache unless all inputs match.
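A minimal sketch of this validation, assuming the cache file is JSON with a top-level `meta` object. `meta.scriptHash` is named above. The `asOf` key and both function names are assumptions for illustration, not the actual code in update_data.py.

```python
import hashlib
import json
from pathlib import Path


def script_hash(script_path: Path) -> str:
    """First 16 hex characters of the SHA-256 of the script's own bytes."""
    return hashlib.sha256(script_path.read_bytes()).hexdigest()[:16]


def cache_is_valid(cache_path: Path, as_of: str, current_hash: str) -> bool:
    """A cache hit requires both inputs to match: source date and script hash."""
    if not cache_path.exists():
        return False
    meta = json.loads(cache_path.read_text(encoding="utf-8")).get("meta", {})
    # "asOf" is an assumed key name; "scriptHash" comes from the text above.
    return meta.get("asOf") == as_of and meta.get("scriptHash") == current_hash
```

A script could pass its own path, for example `script_hash(Path(__file__))`, so that any edit to the processing logic changes the key automatically.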

Why a Hash Instead of a Version Number?

| Approach | Advantage | Disadvantage |
| --- | --- | --- |
| Manual version number | Explicit | Can be forgotten, which reproduces the same structural problem as this bug |
| Script hash | Fully automatic. Impossible to forget | Comment-only changes also trigger regeneration (no practical impact: regeneration takes seconds) |

Safety mechanisms that depend on manual procedures break the moment the procedure is forgotten. What can be automated, should be automated.

Implementation


9. Stable IDs and Partial-Failure Protection for Medical Institutions

Medical institution data (clinics.json) is generated by parsing PDFs from 47 prefectures. Unlike pharmacy data, the PDFs contain no stable facility identifier.

Problem: Sequential ID Fragility

The original design used sequential IDs (c1, c2, ...). This had two critical problems:

  1. ID shift: Adding or removing a single record shifts all subsequent IDs, breaking geocode_cache.json (lat/lng mappings). Map markers for all medical institutions become incorrect
  2. Partial failure: If some prefectures fail to download, clinics.json is overwritten with incomplete data. This actually occurred on 2026-03-20 when only 361 records (Tokyo only) were written

Solution 1: Hash-Based Stable IDs

A SHA-256 hash is computed from (facility name, address), and the ID is c- + first 8 hex characters.

ID = c- + SHA256("name\taddr")[:8]
Example: c-a3f8b2e1
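The formula can be sketched in a few lines. UTF-8 encoding of the key and the function name `stable_clinic_id` are assumptions. Any normalization of the name or address before hashing is not described here and is left out.

```python
import hashlib


def stable_clinic_id(name: str, address: str) -> str:
    """Return 'c-' plus the first 8 hex characters of SHA-256(name, TAB, address)."""
    key = f"{name}\t{address}".encode("utf-8")  # encoding is an assumption
    return "c-" + hashlib.sha256(key).hexdigest()[:8]
```

Because the ID depends only on the record's own name and address, adding or removing other records cannot shift it.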

Solution 2: Partial-Failure Protection (Two-Layer Check)

| Check | Condition | Purpose |
| --- | --- | --- |
| Prefecture coverage | Any previously present prefecture is missing | Detects single-prefecture download failure |
| Global threshold | New record count < 80% of previous | Detects large-scale data loss |

When a check fails, clinics.json is NOT written and the script exits 0 (the GitHub Actions workflow continues, allowing pharmacy geocoding to proceed normally). Override with --force-write.
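The two checks might look like the following sketch. The `prefecture` field name and the function name `safe_to_write` are assumptions. The 80% threshold comes from the table above.

```python
def safe_to_write(previous: list, new: list, threshold: float = 0.8):
    """Return (ok, reason) after applying both checks to the new record list."""
    prev_prefs = {r["prefecture"] for r in previous}
    new_prefs = {r["prefecture"] for r in new}

    # Check 1: every prefecture present last time must be present now.
    missing = sorted(prev_prefs - new_prefs)
    if missing:
        return False, "missing prefectures: " + ", ".join(missing)

    # Check 2: the total must not fall below 80% of the previous count.
    if previous and len(new) < threshold * len(previous):
        return False, f"record count dropped to {len(new)} from {len(previous)}"

    return True, "ok"
```

A caller would skip writing clinics.json and exit with status 0 when `ok` is false, unless --force-write was passed.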

Geocache Migration

Migration from old-format IDs (c1, c2, ...) to new-format IDs (c-XXXXXXXX) runs automatically inside update_clinics.py. It fires only when old-format IDs exist in geocode_cache.json, rewrites the keys, and deletes old entries. Automatically skipped after first run.
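A sketch of such a one-time key rewrite. How update_clinics.py maps an old sequential ID to its new hash ID is not described here, so the `old_to_new` mapping is a hypothetical input. The regular expression for old-format keys and the function name are also assumptions.

```python
import re

OLD_ID = re.compile(r"^c\d+$")  # matches c1, c2, ... but not c-a3f8b2e1


def migrate_geocache(cache: dict, old_to_new: dict) -> dict:
    """Rewrite old-format keys to new-format keys; no-op when none exist."""
    if not any(OLD_ID.match(key) for key in cache):
        return cache  # already migrated, so later runs skip the work

    migrated = {}
    for key, value in cache.items():
        if OLD_ID.match(key):
            new_key = old_to_new.get(key)
            if new_key is not None:
                migrated[new_key] = value
            # Old-format entries are dropped either way (assumed behavior).
        else:
            migrated[key] = value
    return migrated
```

Because the function checks for old-format keys first, running it again on an already migrated cache returns the input unchanged.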