# Job Alert CLI -- AI-Based Job Newsletter

WP-CLI command suite for dispatching AI-powered job alert newsletters. Uses inverted matching (find new jobs first, then match users against them in memory) to scale to 1000+ subscribers without issuing a database query per user and title.

## Requirements

- WP-CLI installed and accessible
- The `zrm-job-alert` plugin activated
- The `zrm-generalizer` plugin configured with API URL/token (for auto-generalization)

## Quick Start

```bash
# Check system health
wp job-alert status

# Preview what would happen (no writes)
wp job-alert dispatch --dry-run --verbose

# Run for real
wp job-alert dispatch

# Schedule via system cron (every 30 minutes)
# Add to crontab: crontab -e
*/30 * * * * cd /path/to/wordpress && wp job-alert dispatch
```

---

## Commands

### `wp job-alert dispatch`

Runs the full 5-step dispatch pipeline: find new jobs, generalize ungeneralized ones, build the title index, match users (logging/sending the results), and update the global timestamp.

```
wp job-alert dispatch [--dry-run] [--verbose] [--user=<email>] [--since=<datetime>] [--limit=<n>] [--export=<path>]
```

| Flag | Description |
|------|-------------|
| `--dry-run` | Run the full pipeline without writing any data (no sent_jobs updates, no timestamp changes, no generalization writes) |
| `--verbose` | Log every decision point: each job found, each title comparison, each skip reason |
| `--user=<email>` | Only process a single user. Useful for debugging a specific subscription |
| `--since=<datetime>` | Override the "new since" timestamp. Accepts any `strtotime()`-compatible value (e.g., `"-7 days"`, `"2026-02-01"`, `"yesterday"`) |
| `--limit=<n>` | Cap the number of new jobs processed |
| `--export=<path>` | Export match results to a CSV file. Semicolon-delimited with UTF-8 BOM for Excel compatibility. Columns: Email, Interests, Matched Jobs, Match Count, Status |
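
The export format can be reproduced outside the plugin, for example to test a spreadsheet import. The Python sketch below is illustrative only (the plugin itself is PHP): `render_csv`, the row dictionary keys, the comma-joined cell contents, and the `matched` status value are assumptions, not the plugin's actual code. Only the delimiter, the BOM, and the column names come from the flag description.

```python
import csv
import io

COLUMNS = ["Email", "Interests", "Matched Jobs", "Match Count", "Status"]


def render_csv(rows):
    """Return the report as bytes: UTF-8 with BOM, semicolon-delimited."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=";")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([
            row["email"],
            ", ".join(row["interests"]),                      # assumed cell format
            ", ".join(str(job_id) for job_id in row["matched_jobs"]),
            len(row["matched_jobs"]),
            row["status"],                                    # hypothetical value
        ])
    # The "utf-8-sig" codec prepends the byte order mark Excel uses to detect UTF-8.
    return buffer.getvalue().encode("utf-8-sig")


report = render_csv([
    {"email": "test@example.com", "interests": ["koch", "küchenchef"],
     "matched_jobs": [801, 834], "status": "matched"},
])
```

Semicolons are the field separator Excel expects in locales that use a decimal comma, which is presumably why the export does not use commas.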

**Pipeline steps:**

1. **Find new jobs** -- single `WP_Query` for jobs published after the global last-run timestamp (or earliest user registration on first run). Uses `post_date_gmt` for timezone safety.
2. **Generalize ungeneralized jobs** -- jobs missing `job_title_generalised` are batch-sent to the ai-generalizer API. If the API is down, those jobs are skipped and retried next run.
3. **Build title index** -- parses comma-separated generalized titles, normalizes them (lowercase, trim), builds an in-memory map: `{title => [job_ids]}`.
4. **Match users** -- iterates users in batches of 100. For each user: resolves generalized titles from their saved job IDs, intersects with the title index in memory, filters out already-sent and pre-registration jobs.
5. **Update global timestamp** -- stores current time so next run only processes newer jobs.

**Examples:**

```bash
# Full dry-run with verbose logging
wp job-alert dispatch --dry-run --verbose

# Process only one user, looking back 7 days
wp job-alert dispatch --user=john@example.com --since="-7 days" --verbose

# Production run, all users
wp job-alert dispatch

# Dry-run with CSV export for analysis in Excel
wp job-alert dispatch --dry-run --since="-7 days" --export=/tmp/job-alert-report.csv
```

---

### `wp job-alert find-new-jobs`

Runs step 1 only. Shows which jobs would be processed without modifying any state.

```
wp job-alert find-new-jobs [--since=<datetime>] [--limit=<n>]
```

**Output:** Table with columns: ID, Title, Post Date, Generalized (yes/no).

```bash
wp job-alert find-new-jobs --since="-7 days"
```
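
As a rough mental model of step 1, the selection is a UTC date filter with an optional cap. This Python sketch is not the plugin's query: `find_new_jobs`, the dictionary shape, the strict "after" comparison, and the oldest-first ordering are assumptions made for illustration.

```python
from datetime import datetime, timezone


def find_new_jobs(jobs, since, limit=None):
    """Return jobs published (in UTC) strictly after `since`, oldest first."""
    new_jobs = sorted(
        (job for job in jobs if job["post_date_gmt"] > since),
        key=lambda job: job["post_date_gmt"],
    )
    return new_jobs if limit is None else new_jobs[:limit]


JOBS = [
    {"id": 801, "post_date_gmt": datetime(2026, 2, 10, tzinfo=timezone.utc)},
    {"id": 802, "post_date_gmt": datetime(2026, 1, 15, tzinfo=timezone.utc)},
    {"id": 834, "post_date_gmt": datetime(2026, 2, 5, tzinfo=timezone.utc)},
]
SINCE = datetime(2026, 2, 1, tzinfo=timezone.utc)

new_ids = [job["id"] for job in find_new_jobs(JOBS, SINCE)]
capped_ids = [job["id"] for job in find_new_jobs(JOBS, SINCE, limit=1)]
```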

---

### `wp job-alert generalize-jobs`

Runs step 2 only. Finds new jobs missing generalized titles and sends them to the AI generalizer API.

```
wp job-alert generalize-jobs [--since=<datetime>] [--limit=<n>] [--dry-run]
```

```bash
# Preview what would be generalized
wp job-alert generalize-jobs --since="-30 days" --dry-run

# Actually generalize
wp job-alert generalize-jobs --since="-30 days"
```
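
The skip-and-retry behavior of step 2 can be pictured as follows. This is a minimal Python sketch under stated assumptions: `generalize_missing`, the `generalize_batch` callable, its `{job_id: titles}` response shape, and the use of `ConnectionError` to signal an unreachable API are all hypothetical stand-ins for the real API client.

```python
def generalize_missing(jobs, generalize_batch, dry_run=False):
    """Fill in missing generalized titles; return IDs that still lack them."""
    missing = [job for job in jobs if not job.get("job_title_generalised")]
    if not missing or dry_run:
        # Dry-run reports what would be sent but writes nothing.
        return [job["id"] for job in missing]
    try:
        # One batch call for all missing jobs (assumed response: {job_id: titles}).
        results = generalize_batch([(job["id"], job["job_title"]) for job in missing])
    except ConnectionError:
        # API unreachable: leave the jobs untouched so the next run retries them.
        return [job["id"] for job in missing]
    still_missing = []
    for job in missing:
        titles = results.get(job["id"])
        if titles:
            job["job_title_generalised"] = titles
        else:
            still_missing.append(job["id"])
    return still_missing


def broken_api(batch):
    raise ConnectionError("API unreachable")


def fake_api(batch):
    return {job_id: title.lower() for job_id, title in batch}


jobs = [
    {"id": 801, "job_title": "Koch", "job_title_generalised": ""},
    {"id": 802, "job_title": "Barkeeper", "job_title_generalised": "barkeeper"},
]
skipped = generalize_missing(jobs, broken_api)  # API down: job 801 is skipped
retried = generalize_missing(jobs, fake_api)    # next run: job 801 is filled in
```

Because a failed run writes nothing, no separate retry queue is needed: the job simply shows up as ungeneralized again on the next run.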

---

### `wp job-alert build-index`

Runs step 3 only. Builds and displays the title-to-job-ID mapping that would be used for matching.

```
wp job-alert build-index [--since=<datetime>] [--limit=<n>]
```

**Output:** Table with columns: Title (normalized), Job IDs, Count.

```bash
wp job-alert build-index --since="-7 days"
# Example output:
# +----------------+---------+-------+
# | Title          | Job IDs | Count |
# +----------------+---------+-------+
# | koch           | 801,834 | 2     |
# | küchenchef     | 801     | 1     |
# | barkeeper      | 802     | 1     |
# +----------------+---------+-------+
```
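
The index in the example output could be built along these lines. The Python sketch below only illustrates the parse, normalize, and group steps; `build_title_index` and the input dictionaries are assumptions, and the real implementation is PHP.

```python
from collections import defaultdict


def build_title_index(jobs):
    """Map each normalized generalized title to the job IDs that carry it."""
    index = defaultdict(list)
    for job in jobs:
        raw = job.get("job_title_generalised") or ""
        # Split on commas, trim, lowercase; the set drops repeats within one job.
        titles = {part.strip().lower() for part in raw.split(",") if part.strip()}
        for title in titles:
            index[title].append(job["id"])
    return dict(index)


index = build_title_index([
    {"id": 801, "job_title_generalised": "Koch, Küchenchef"},
    {"id": 802, "job_title_generalised": " Barkeeper "},
    {"id": 834, "job_title_generalised": "koch"},
])
```

With this input, `index` contains the same three rows as the table above: `koch` maps to jobs 801 and 834, while `küchenchef` and `barkeeper` map to one job each.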

---

### `wp job-alert match-users`

Runs step 4 only. Always read-only (implicit dry-run). Shows which users would match which jobs.

```
wp job-alert match-users [--since=<datetime>] [--user=<email>] [--limit=<n>] [--verbose] [--export=<path>]
```

```bash
wp job-alert match-users --user=test@example.com --since="-7 days" --verbose

# Export all matches to CSV
wp job-alert match-users --since="-7 days" --export=/tmp/matches.csv
```

---

### `wp job-alert inspect-user <email>`

Dumps the full stored state of a job alert subscriber.

```
wp job-alert inspect-user test@example.com
```

**Output includes:**
- Post ID, email, DOI status, active status
- Saved job IDs with their current titles and generalization status
- Stored title entries (legacy `jobalertuser_titles`)
- Sent jobs list (last 20 shown)
- Last execution timestamp
- Token (masked)

---

### `wp job-alert inspect-job <id>`

Dumps a job's alert-relevant data and which users reference it.

```
wp job-alert inspect-job 801
```

**Output includes:**
- Job ID, title, post status, publish date
- Raw and parsed generalized titles
- Reverse lookup: which subscribers have this job in their saved IDs

---

### `wp job-alert reset`

Resets the global last-run timestamp. The next dispatch will use the earliest user registration timestamp as the boundary (first-run behavior).

```
wp job-alert reset --confirm
```

The `--confirm` flag is required to prevent accidental resets.

---

### `wp job-alert status`

Shows a system health overview.

```
wp job-alert status
```

**Output:**
```
Last run:     2026-02-23 10:00:00 UTC (3 hours ago)
Users:        1234 active, 56 pending DOI, 12 inactive
Jobs:         5432 total, 5410 with generalized titles, 22 without
API URL:      https://ai-generalizer.zeitraum.dev/api/generalize
API Status:   HTTP 200 (OK)
```

---

## How It Works

### User Registration Flow

1. User visits a single job page and enters their email in the job alert form.
2. Frontend sends `POST /wp-json/job-alert/register` with `{email, post_id}`.
3. If user exists: the job's post ID is appended to `jobalertuser_job_ids` (no duplicates). Generalized titles are also stored in `jobalertuser_titles` for backward compatibility.
4. If user is new: a new `post_type_jobalrtusr` post is created with the job ID, and a double opt-in email is sent.
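
The existing-user and new-user branches of the flow above can be sketched in a few lines of Python. This is a simplified model, not the REST handler: `register`, the in-memory `users` dictionary, and the `doi` starting value are assumptions, and the legacy `jobalertuser_titles` write and the opt-in email are left out.

```python
def register(users, email, post_id):
    """Apply one registration; return "created" or "updated"."""
    user = users.get(email)
    if user is None:
        # New subscriber: assumed to stay unconfirmed until the opt-in link is used.
        users[email] = {"job_ids": [post_id], "doi": False}
        return "created"
    if post_id not in user["job_ids"]:
        # Existing subscriber: append the job ID, never a duplicate.
        user["job_ids"].append(post_id)
    return "updated"


users = {}
outcomes = [register(users, "test@ex.com", post_id) for post_id in (123, 456, 123)]
```

After the three calls, the stored IDs are `[123, 456]`, which is the state that registration scenarios 1 to 3 in the test table expect.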

### Dispatch Algorithm (Inverted Matching)

Traditional approach (slow): for each user, query the database for matching jobs. With 1000 users and 10 titles each, that's 10,000 database queries.

Our approach (fast): find new jobs first (1 query, ~50 results), then match all users against them in memory.

```
Step 1: Get new jobs since last run           -> ~50 jobs (1 DB query)
Step 2: Generalize any missing titles         -> 1 batch API call
Step 3: Build {title => [job_ids]} index      -> in-memory, microseconds
Step 4: For each user (batched):
          Read saved job IDs                  -> 1 meta read
          Get generalized titles (cached)     -> 0 DB queries (cached)
          Set intersection with title index   -> in-memory, microseconds
          Filter: already sent? before reg?   -> in-memory checks
Step 5: Update global timestamp               -> 1 option write
```

**Result:** ~300 simple DB queries for 1000 users. Completes in seconds.
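
The per-user part of step 4 is plain set arithmetic once the title index exists. The following Python sketch shows the idea under stated assumptions: `match_user`, the `titles_by_job` cache, the `job_dates` lookup, and the field names are hypothetical, and timestamps are simplified to small integers.

```python
def match_user(user, title_index, titles_by_job, job_dates):
    """Return new job IDs whose titles overlap those of the user's saved jobs."""
    # Titles the user cares about, resolved from their saved job IDs (cached lookup).
    wanted = set()
    for job_id in user["job_ids"]:
        wanted.update(titles_by_job.get(job_id, ()))
    # Intersect with the index, which only contains titles of new jobs.
    candidates = set()
    for title in wanted.intersection(title_index):
        candidates.update(title_index[title])
    sent = set(user["sent_jobs"])
    return sorted(
        job_id for job_id in candidates
        if job_id not in sent                                      # already sent
        and job_dates[job_id] > user["last_execution_timestamp"]  # pre-registration
    )


title_index = {"koch": [801, 834], "barkeeper": [802]}
titles_by_job = {123: {"koch", "küchenchef"}}   # saved job 123 generalizes to these
job_dates = {801: 100, 802: 300, 834: 300}      # publish times as simple integers

new_user = {"job_ids": [123], "sent_jobs": [], "last_execution_timestamp": 200}
old_user = {"job_ids": [123], "sent_jobs": [834], "last_execution_timestamp": 50}

new_user_matches = match_user(new_user, title_index, titles_by_job, job_dates)
old_user_matches = match_user(old_user, title_index, titles_by_job, job_dates)
```

No database access happens inside `match_user`: every input is already in memory, so the cost per user depends on the number of saved titles rather than on the number of jobs in the database.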

### Data Model

**Per user (`post_type_jobalrtusr`):**

| Meta Key | Type | Description |
|----------|------|-------------|
| `jobalertuser_email` | email | Subscriber email |
| `jobalertuser_email_hash` | text | Email used for lookup (currently plain email) |
| `jobalertuser_doi` | boolean | Double opt-in confirmed (0/1) |
| `jobalertuser_active` | boolean | Active subscription (0/1) |
| `jobalertuser_job_ids` | JSON text | Array of job post IDs the user subscribed from |
| `jobalertuser_titles` | JSON text | Legacy: generalized title entries from registration |
| `jobalertuser_sent_jobs` | JSON text | Array of job IDs already sent (capped at 500) |
| `jobalertuser_last_execution_timestamp` | text | Unix timestamp of last alert or registration |
| `jobalertuser_token` | text | Secure token for preferences/unsubscribe links |

**Per job (`post_type_job`):**

| Meta Key | Type | Description |
|----------|------|-------------|
| `job_title` | text | The job title |
| `job_title_generalised` | text | Comma-separated AI-generalized title variations |

**Global:**

| Option Key | Description |
|------------|-------------|
| `zrm_job_alert_last_cron_run` | Unix timestamp of last successful dispatch |
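
To make the storage types concrete, here is a Python sketch that decodes a subscriber's raw meta values into typed fields. The `Subscriber` class and `subscriber_from_meta` are illustrative names, and the sketch assumes that meta values arrive as strings with booleans stored as `"0"`/`"1"`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    email: str
    doi: bool
    active: bool
    job_ids: list = field(default_factory=list)
    sent_jobs: list = field(default_factory=list)
    last_execution_timestamp: int = 0


def subscriber_from_meta(meta):
    """Decode raw post meta strings into typed fields."""
    return Subscriber(
        email=meta["jobalertuser_email"],
        doi=meta.get("jobalertuser_doi") == "1",
        active=meta.get("jobalertuser_active") == "1",
        # The ID lists are stored as JSON text; a missing key means an empty list.
        job_ids=json.loads(meta.get("jobalertuser_job_ids") or "[]"),
        sent_jobs=json.loads(meta.get("jobalertuser_sent_jobs") or "[]"),
        last_execution_timestamp=int(
            meta.get("jobalertuser_last_execution_timestamp") or 0
        ),
    )


subscriber = subscriber_from_meta({
    "jobalertuser_email": "test@example.com",
    "jobalertuser_doi": "1",
    "jobalertuser_active": "1",
    "jobalertuser_job_ids": "[123, 456]",
    "jobalertuser_last_execution_timestamp": "1771581600",
})
```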

### Safety Features

- **First-run protection:** When no global timestamp exists, uses the earliest user registration timestamp (not epoch). Prevents processing all historical jobs.
- **Per-user date filter:** Users only receive jobs published after their `jobalertuser_last_execution_timestamp`. A user who registered on Feb 20 won't get jobs from Feb 10.
- **Deleted job resilience:** Saved job IDs are validated at dispatch time. Deleted/unpublished jobs are skipped. Falls back to stored `jobalertuser_titles` if live lookup fails.
- **Sent jobs pruning:** The `jobalertuser_sent_jobs` array is capped at 500 entries to prevent unbounded post meta growth.
- **Timezone consistency:** All date queries use `post_date_gmt` column and UTC timestamps.
- **Memory management:** `wp_cache_flush()` is called between user batches so the object cache does not keep growing with every batch and exhaust memory.
- **API failure handling:** If the ai-generalizer API is unreachable, ungeneralized jobs are skipped with a warning and retried on the next run.
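
Two of these safeguards, first-run protection and sent-jobs pruning, are simple enough to sketch in Python. The function names are hypothetical, and the sketch assumes that pruning drops the oldest entries first, which the description above does not state.

```python
SENT_JOBS_CAP = 500


def dispatch_boundary(last_run, user_timestamps):
    """Pick the "new since" timestamp for a run."""
    if last_run is not None:
        return last_run
    # First run: start at the earliest registration instead of epoch 0.
    return min(user_timestamps, default=None)


def record_sent(sent_jobs, new_ids, cap=SENT_JOBS_CAP):
    """Append newly sent job IDs, then keep only the newest `cap` entries."""
    merged = sent_jobs + [job_id for job_id in new_ids if job_id not in sent_jobs]
    return merged[-cap:]


first_run_boundary = dispatch_boundary(None, [300, 100, 200])
later_run_boundary = dispatch_boundary(250, [300, 100, 200])
pruned = record_sent(list(range(500)), [500, 501])
```

In this example the first run starts at timestamp 100 rather than 0, and the pruned list still holds 500 entries, now ending with the two new IDs.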

### Configuration

The CLI reads AI generalizer API settings from the same sources as the `zrm-generalizer` plugin:

1. PHP constants (highest priority): `ZRM_GENERALIZER_API_URL`, `ZRM_GENERALIZER_API_TOKEN`
2. WordPress options: `zrm_generalizer_api_url`, `zrm_generalizer_api_token`
3. Default: `https://ai-generalizer.zeitraum.dev/api/generalize`
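
The lookup order above amounts to a first-match-wins chain. A minimal Python sketch follows, assuming that empty values fall through to the next source; `resolve_setting` and the two dictionaries standing in for PHP constants and WordPress options are illustrative, and the `example.test` URLs are made up.

```python
DEFAULT_API_URL = "https://ai-generalizer.zeitraum.dev/api/generalize"


def resolve_setting(name, constants, options, default=None):
    """Return the first non-empty value: constant, then option, then default."""
    constant = constants.get(f"ZRM_GENERALIZER_{name.upper()}")
    if constant:
        return constant
    option = options.get(f"zrm_generalizer_{name.lower()}")
    if option:
        return option
    return default


constants = {"ZRM_GENERALIZER_API_URL": "https://constant.example.test/api"}
options = {"zrm_generalizer_api_url": "https://option.example.test/api"}

from_constant = resolve_setting("api_url", constants, options, DEFAULT_API_URL)
from_option = resolve_setting("api_url", {}, options, DEFAULT_API_URL)
from_default = resolve_setting("api_url", {}, {}, DEFAULT_API_URL)
missing_token = resolve_setting("api_token", {}, {})  # the token has no default
```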

---

## Test Scenarios

### Registration

| # | Scenario | Command | Expected |
|---|----------|---------|----------|
| 1 | Fresh registration | `curl -X POST .../register -H 'Content-Type: application/json' -d '{"email":"test@ex.com","post_id":123}'` then `wp job-alert inspect-user test@ex.com` | `jobalertuser_job_ids = [123]` |
| 2 | Second job registration | Same curl with `post_id: 456` | `jobalertuser_job_ids = [123, 456]` |
| 3 | Duplicate same job | Same curl with `post_id: 123` again | `jobalertuser_job_ids = [123, 456]` (no duplicate) |

### Job Discovery

| # | Scenario | Command | Expected |
|---|----------|---------|----------|
| 4 | Find new jobs | `wp job-alert find-new-jobs --since="2026-02-01"` | Table of jobs after Feb 1 |
| 5 | Generalize (dry-run) | `wp job-alert generalize-jobs --since="-30 days" --dry-run` | Shows what would be sent |
| 6 | Generalize (live) | `wp job-alert generalize-jobs --since="-30 days"` | Calls API, stores results |
| 7 | Build index | `wp job-alert build-index --since="-7 days"` | Title-to-job mapping table |

### Dispatch

| # | Scenario | Command | Expected |
|---|----------|---------|----------|
| 8 | Single user dry-run | `wp job-alert dispatch --dry-run --user=test@ex.com --verbose` | Full pipeline output, no writes |
| 9 | Full live dispatch | `wp job-alert dispatch --verbose` | Processes all users, updates state |
| 10 | No new jobs | `wp job-alert dispatch` | "No new jobs... Nothing to do." |
| 11 | No saved job IDs | `wp job-alert dispatch --user=legacy@ex.com --verbose` | "has no saved job IDs, skipping" |
| 12 | API down | `wp job-alert generalize-jobs --since="-7 days"` | Warning about unreachable API |
| 13 | Already sent (dedup) | `wp job-alert dispatch --user=test@ex.com --verbose` | "already in sent_jobs, skipping" |

### Edge Cases

| # | Scenario | Command | Expected |
|---|----------|---------|----------|
| 14 | Reset + reprocess | `wp job-alert reset --confirm` then `wp job-alert dispatch --dry-run --verbose` | Uses earliest user timestamp |
| 15 | System status | `wp job-alert status` | Health overview with counts |
| 16 | Deleted saved job | `wp job-alert dispatch --user=test@ex.com --verbose` | "no longer published, skipping" |
| 17 | Per-user date filter | `wp job-alert dispatch --dry-run --verbose` | User B skips pre-registration jobs |
| 18 | Sent jobs pruning | `wp job-alert dispatch --user=test@ex.com --verbose` | Pruned to 500 entries |
| 19 | First run | `wp job-alert dispatch --dry-run --verbose` | Uses earliest user timestamp |

---

## Coexistence with Existing System

The CLI dispatch system runs **in parallel** with the existing event-driven MeiliSearch-based alerts. Neither system interferes with the other:

- **Existing system:** Triggered on job publish, after WP All Import, after MeiliSearch rebuild. Uses MeiliSearch for search. Sends via `ZRM_MailingGateway`.
- **CLI system:** Triggered via system cron / manual WP-CLI. Uses in-memory set intersection. Currently only logs matches via `WP_CLI::log()` and `error_log()` instead of sending email (dummy send).

When ready for production email sending, replace the `WP_CLI::log("[MATCH]...")` and `error_log()` calls in `find_and_record_matches()` with `ZRM_MailingGateway` integration.
