17 Commits
v1 ... v1.2

Author SHA1 Message Date
8ab07b83ab Update docs, changelog, examples, and add ADRs for v1.2
- Add v1.1.0 and v1.2.0 changelog entries
- Add exclude field to config reference and example config
- Add ADRs documenting all major design decisions
- Fix step numbering in reverse_sync()
- Fix action.yml to copy VERSION file
- Add dist/ and .env to .gitignore
- Use refs/tags/ format for Nix flake tag refs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 21:28:40 +03:00
95b83bd538 Fix PR body newlines rendering as literal \n
Bash double-quoted strings don't interpret \n as newlines.
Use actual newlines in the pr_body strings instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 16:13:13 +03:00
ce53d3c1d2 Fix reconciliation parent order and add reverse sync tree check
- Swap parent order in reconcile_filter_change(): josh-filtered must
  be first parent so josh can follow first-parent traversal to map
  history back to the monorepo. Old subrepo history on parent 2.
- Add tree comparison in reverse_sync() before commit detection:
  if subrepo tree matches josh-filtered tree, skip immediately.
  Prevents false positive PRs after reconciliation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 15:11:31 +03:00
16257f25d7 Fix reverse sync false positive after filter reconciliation
Add --ancestry-path to git log in reverse_sync() to prevent old
subrepo history from leaking through reconciliation merge parents.
Without this, every old subrepo commit appears as a "human commit"
triggering a spurious 0-commit PR on the monorepo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 14:19:56 +03:00
c0ddb887ff Fix filter reconciliation for pre-v1.2 state and unrelated histories
Three bugs found during first CI run after enabling :exclude:

- Derive old filter (:/subfolder) when state has no josh_filter stored
  (pre-v1.2 upgrade path)
- Detect unrelated histories in forward_sync() and fall back to
  reconcile_filter_change() instead of creating a useless conflict PR
- Skip state update on conflict result (prevents storing wrong filter
  and mono SHA that blocks retries)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 13:30:24 +03:00
22bd59a9d7 Auto-reconcile subrepo history when josh filter changes
When the exclude list changes, josh-proxy recomputes filtered history
with new SHAs, breaking common ancestry with the subrepo. Instead of
requiring a manual reset (force-push), forward sync now detects the
filter change and creates a reconciliation merge commit that connects
the old and new histories — no force-push, no re-clone needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:40:08 +03:00
d7f8618b38 Use inline :exclude in josh-proxy URL instead of stored filter files
The :+ stored filter syntax doesn't work in josh-proxy URLs.
Inline :exclude[::p1,::p2] works directly — no files to generate
or commit, no extra dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:19:41 +03:00
5929585d6c Fix josh-proxy rejecting stored filter path with slash
Josh-proxy's parser treats "/" in :+ paths as a filter separator,
so :+.josh-filters/backend fails. Use flat naming at repo root:
.josh-filter-<target>.josh referenced as :+.josh-filter-<target>.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:47:16 +03:00
187a9ead14 Add file exclusion via josh stored filters (v1.2.0)
New `exclude` config field per target generates .josh-filters/<name>.josh
files with josh :exclude clauses. Josh-proxy applies exclusions at the
transport layer — excluded files never appear in the subrepo.

Preflight checks that generated filter files are committed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:45:13 +03:00
401d0e87a4 Show [migrated] marker and summary in migrate-pr
Interactive picker now marks already-migrated PRs. All modes (--all,
explicit numbers, interactive) track and display success/fail/skip
counts at the end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:08:23 +03:00
fbacec7f6f Improve PR migration: fetch branches locally + 3-way merge
Instead of fetching the API diff (which has context-sensitive patches
that break after josh-filtered reset), fetch the archived repo's
branches directly as a second remote and compute the diff locally.
Apply with git apply --3way for resilience against context mismatches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:51:22 +03:00
553f006174 Fix onboard import cloning from empty new repo instead of archived repo
initial_import() now accepts an optional clone URL override parameter.
onboard_flow() passes the archived repo URL so content is cloned from
the right source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:48:46 +03:00
cb14cf9bd4 Add docs for updating josh-sync version in Nix devenv
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:38:44 +03:00
0363b0ee77 Fix VERSION not included in Nix package and Makefile bundle
- flake.nix: copy VERSION file to $out/ so josh_sync_version() finds it
- Makefile: add lib/onboard.sh to the bundle loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:38:07 +03:00
72430714af Update docs for onboard and migrate-pr commands
- README: add onboard and migrate-pr to CLI reference
- Guide Step 5: add onboard as recommended Option A, move manual
  import/reset to Option B, document migrate-pr usage
- Guide "Adding a New Target": mention onboard as preferred path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:33:53 +03:00
105216a27e Add onboard and migrate-pr commands (v1.1.0)
New commands for safely onboarding existing subrepos into the monorepo
without losing open PRs:

- josh-sync onboard <target>: interactive, resumable 5-step flow
  (import → wait for merge → reset to new repo)
- josh-sync migrate-pr <target> [PR#...] [--all]: migrate PRs from
  archived repo to new repo via patch application

Also refactors create_pr() to wrap create_pr_number(), eliminating
duplicated curl/jq logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 12:41:44 +03:00
405e5f4535 Update guide.md
2026-02-13 09:31:41 +03:00
30 changed files with 1620 additions and 62 deletions

3
.gitignore vendored

@@ -1,3 +1,4 @@
.claude/*local*
dist/
.env
result


@@ -1,5 +1,32 @@
# Changelog
## 1.2.0
### Features
- **File exclusion**: `exclude` config field removes files/directories from the subrepo at the josh-proxy transport layer. Patterns are embedded inline in the josh-proxy URL using `:exclude[::pattern,...]` syntax — no extra files to generate or commit.
- **Filter change reconciliation**: When the josh filter changes (e.g., adding/removing exclude patterns), josh-sync automatically creates a reconciliation merge commit that connects old and new histories. No manual reset or force-push required.
- **Tree comparison guard**: Reverse sync now compares subrepo tree to josh-filtered tree before checking commit log. Skips immediately when trees are identical, avoiding false positives from reconciliation merge history.
- **Unrelated histories detection**: Forward sync detects when histories are unrelated (no common ancestor) and falls back to reconciliation instead of creating a useless conflict PR.
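
As an illustration of the tree comparison guard, the check reduces to comparing two tree object IDs. The sketch below is not taken from josh-sync: `trees_match` is a hypothetical helper name, and the only git feature it relies on is the standard `<rev>^{tree}` revision syntax.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not the josh-sync implementation): succeed when two
# commits point at the same tree object, i.e. their file contents are
# identical regardless of how their histories differ.
# Returns 0 on match, 1 on mismatch, 2 when either revision cannot be resolved.
trees_match() {
  local a="$1" b="$2" tree_a tree_b
  tree_a=$(git rev-parse --verify --quiet "${a}^{tree}") || return 2
  tree_b=$(git rev-parse --verify --quiet "${b}^{tree}") || return 2
  [ "$tree_a" = "$tree_b" ]
}
```

A caller could run `if trees_match "$subrepo_ref" "$filtered_ref"; then echo skip; fi` before walking the commit log; both variable names are placeholders.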
### Fixes
- Pre-v1.2 state compatibility: When upgrading from v1.0/v1.1 (no `josh_filter` stored in state), the old filter is derived from `subfolder` so reconciliation triggers correctly.
- Reconciliation merge parent order: Josh-filtered history is always first parent so josh-proxy can follow first-parent traversal back to the monorepo.
- Reverse sync `--ancestry-path` flag prevents old subrepo history from leaking through reconciliation merge parents.
- PR body `\n` now renders as actual newlines instead of literal text.
- Conflict result no longer updates sync state (added `continue` to skip state write).
- `action.yml` now copies VERSION file for correct `--version` output in CI.
- `.gitignore` now includes `dist/` and `.env`.
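
To see why `--ancestry-path` matters after a reconciliation merge, consider the sketch below. It is an illustration under assumptions: `commits_since_sync` is a hypothetical name, and josh-sync may pass further options to `git log` that are not shown here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list commit subjects between the last synced commit and
# the branch tip, keeping only commits that descend from the last synced
# commit. Without --ancestry-path, everything reachable through the second
# parent of a reconciliation merge (the old subrepo history) is listed too.
commits_since_sync() {
  local last_synced="$1" tip="$2"
  git log --ancestry-path --format='%s' "${last_synced}..${tip}"
}
```

With a merge that joins an unrelated old history, plain `git log A..B` reports every old commit, while the restricted form reports only the merge and what follows it.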
## 1.1.0
### Features
- **`onboard` command**: Interactive, resumable workflow for importing existing subrepos into the monorepo. Walks through: prerequisites check, import (creates PRs), wait for merge, reset (pushes josh-filtered history). Checkpoint/resume at every step.
- **`migrate-pr` command**: Migrates open PRs from an archived subrepo to the new one. Supports interactive selection, `--all` flag, and specific PR numbers. Uses `git apply --3way` for resilient patch application.
- **Onboard state tracking**: Stored on the `josh-sync-state` branch at `<target>/onboard.json`. Tracks step progress, import PR numbers, reset branches, and migrated PRs.
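
The local-diff approach used by `migrate-pr` could look roughly like the sketch below. This is a sketch under stated assumptions, not the shipped code: the remote name `archived`, the function name, and its arguments are hypothetical, and PR creation through the API is left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replay the changes of one archived-repo branch onto the
# current checkout.
#   $1 = URL or path of the archived repo
#   $2 = PR base branch in the archived repo
#   $3 = PR head branch in the archived repo
apply_archived_branch() {
  local archived_url="$1" base_ref="$2" head_ref="$3"
  # Add the archived repo as a second remote once, then fetch its branches.
  git remote get-url archived >/dev/null 2>&1 || git remote add archived "$archived_url"
  git fetch --quiet archived
  # Three-dot diff: changes on head_ref since it forked from base_ref.
  # --3way lets git fall back to a three-way merge when context lines differ.
  git diff --binary "archived/${base_ref}...archived/${head_ref}" | git apply --3way
}
```

Because the archived objects are fetched into the local object store, `git apply --3way` can find the original blobs it needs for the merge fallback.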
## 1.0.0
Initial release. Extracted from [private-monorepo-example](https://code.itkan.io/pe/private-monorepo-example) into a standalone reusable library.


@@ -23,7 +23,7 @@ dist/josh-sync: bin/josh-sync lib/*.sh VERSION
@echo '# Generated by: make build' >> dist/josh-sync
@echo '' >> dist/josh-sync
@# Inline all library modules (strip shebangs and source directives)
-@for f in lib/core.sh lib/config.sh lib/auth.sh lib/state.sh lib/sync.sh; do \
+@for f in lib/core.sh lib/config.sh lib/auth.sh lib/state.sh lib/sync.sh lib/onboard.sh; do \
echo "# --- $$f ---" >> dist/josh-sync; \
grep -v '^#!/' "$$f" | grep -v '^# shellcheck source=' >> dist/josh-sync; \
echo '' >> dist/josh-sync; \


@@ -16,12 +16,12 @@ josh:
targets:
- name: "billing"
subfolder: "services/billing"
josh_filter: ":/services/billing"
subrepo_url: "git@gitea.example.com:ext/billing.git"
subrepo_auth: "ssh"
branches:
main: main
forward_only: []
exclude: # files excluded from subrepo (optional)
- ".monorepo/"
bot:
name: "josh-sync-bot"
@@ -58,8 +58,10 @@ Run `josh-sync preflight` to validate your setup.
## Documentation
-- **[Setup Guide](docs/guide.md)** — Step-by-step: prerequisites, importing existing subrepos, CI workflows, and troubleshooting
+- **[Setup Guide](docs/guide.md)** — Step-by-step: prerequisites, importing existing subrepos, CI workflows, file exclusion, and troubleshooting
- **[Configuration Reference](docs/config-reference.md)** — Full `.josh-sync.yml` field documentation
- **[Architecture Decision Records](docs/adr/)** — Design rationale and trade-offs
- **[Changelog](CHANGELOG.md)** — Version history
## CLI
@@ -68,6 +70,8 @@ josh-sync sync [--forward|--reverse] [--target NAME[,NAME]] [--branch BRANCH]
josh-sync preflight
josh-sync import <target>
josh-sync reset <target>
josh-sync onboard <target> [--restart]
josh-sync migrate-pr <target> [PR#...] [--all]
josh-sync status
josh-sync state show <target> [branch]
josh-sync state reset <target> [branch]
@@ -77,12 +81,16 @@ josh-sync state reset <target> [branch]
- **Forward sync** (mono → subrepo): pushes directly if clean, creates conflict PR if not. Uses `--force-with-lease` for safety.
- **Reverse sync** (subrepo → mono): always creates a PR, never pushes directly.
- **File exclusion**: `exclude` patterns are embedded inline in the josh-proxy URL. Excluded files exist only in the monorepo.
- **Filter reconciliation**: Changing the exclude list auto-creates a merge commit that connects old and new histories — no force-push needed.
- **Loop prevention**: `Josh-Sync-Origin:` git trailer filters out bot commits.
- **State tracking**: orphan branch `josh-sync-state` stores JSON per target/branch.
## Dependencies
-`bash >=4`, `git`, `curl`, `jq`, `yq` ([mikefarah/yq](https://github.com/mikefarah/yq) v4+), `openssh`
+`bash >=4`, `git`, `curl`, `jq`, `yq` ([mikefarah/yq](https://github.com/mikefarah/yq) v4+), `openssh`, `rsync`
> The Nix flake bundles all dependencies automatically.
## License


@@ -1 +1 @@
-1.0.0
+1.2.0


@@ -26,6 +26,7 @@ runs:
run: |
JOSH_DIR="$(mktemp -d)"
cp -r "${{ github.action_path }}/bin" "${{ github.action_path }}/lib" "${JOSH_DIR}/"
cp "${{ github.action_path }}/VERSION" "${JOSH_DIR}/" 2>/dev/null || true
chmod +x "${JOSH_DIR}/bin/josh-sync"
echo "${JOSH_DIR}/bin" >> "$GITHUB_PATH"
echo "JOSH_SYNC_ROOT=${JOSH_DIR}" >> "$GITHUB_ENV"


@@ -9,6 +9,8 @@
# preflight Validate config, connectivity, auth
# import <target> Initial import: pull subrepo into monorepo
# reset <target> Reset subrepo to josh-filtered view
# onboard <target> Import existing subrepo into monorepo (interactive)
# migrate-pr <target> [PR#...] [--all] Move PRs from archived to new subrepo
# status Show target config and sync state
# state show|reset Manage sync state directly
#
@@ -39,6 +41,8 @@ source "${JOSH_LIB_DIR}/auth.sh"
source "${JOSH_LIB_DIR}/state.sh"
# shellcheck source=../lib/sync.sh
source "${JOSH_LIB_DIR}/sync.sh"
# shellcheck source=../lib/onboard.sh
source "${JOSH_LIB_DIR}/onboard.sh"
# ─── Version ────────────────────────────────────────────────────────
@@ -69,6 +73,8 @@ Commands:
preflight Validate config, connectivity, auth, workflow coverage
import <target> Initial import: pull existing subrepo into monorepo (creates PR)
reset <target> Reset subrepo to josh-filtered view (after merging import PR)
onboard <target> Import existing subrepo into monorepo (interactive, resumable)
migrate-pr <target> [PR#...] [--all] Move PRs from archived to new subrepo
status Show target config and sync state
state show <target> [branch] Show sync state JSON
state reset <target> [branch] Reset sync state to {}
@@ -202,13 +208,42 @@ _sync_direction() {
fi
fi
# Run sync
# Check for filter change (forward only — reverse uses same filter)
local result
if [ "$direction" = "forward" ]; then
local prev_filter
prev_filter=$(echo "$state" | jq -r '.last_forward.josh_filter // empty')
# If no filter stored (pre-v1.2 state) but a previous sync exists,
# the old filter was the simple :/subfolder (before exclude was added)
if [ -z "$prev_filter" ]; then
local prev_mono_sha
prev_mono_sha=$(echo "$state" | jq -r '.last_forward.mono_sha // empty')
if [ -n "$prev_mono_sha" ]; then
local subfolder
subfolder=$(echo "$TARGET_JSON" | jq -r '.subfolder')
prev_filter=":/${subfolder}"
fi
fi
if [ -n "$prev_filter" ] && [ "$prev_filter" != "$JOSH_FILTER" ]; then
log "WARN" "Josh filter changed — reconciling histories"
log "INFO" "Old: ${prev_filter}"
log "INFO" "New: ${JOSH_FILTER}"
result=$(reconcile_filter_change)
else
result=$(forward_sync)
fi
else
result=$(reverse_sync)
fi
# If forward sync hit unrelated histories, fall back to reconciliation
if [ "$result" = "unrelated" ]; then
log "WARN" "Unrelated histories detected — falling back to filter reconciliation"
result=$(reconcile_filter_change)
log "INFO" "Reconciliation result: ${result}"
fi
log "INFO" "Result: ${result}"
# Handle warnings
@@ -218,6 +253,7 @@ _sync_direction() {
fi
if [ "$result" = "conflict" ]; then
echo "::warning::Target ${target_name}, branch ${branch}: merge conflict — PR created on subrepo"
continue
fi
if [ "$result" = "josh-rejected" ]; then
echo "::error::Target ${target_name}, branch ${branch}: josh rejected push — check proxy logs"
@@ -234,8 +270,9 @@ _sync_direction() {
--arg s_sha "${subrepo_sha_now:-}" \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--arg status "$result" \
--arg filter "$JOSH_FILTER" \
--argjson prev "$state" \
-'$prev + {last_forward: {mono_sha:$m_sha, subrepo_sha:$s_sha, timestamp:$ts, status:$status}}')
+'$prev + {last_forward: {mono_sha:$m_sha, subrepo_sha:$s_sha, timestamp:$ts, status:$status, josh_filter:$filter}}')
else
local mono_sha_now
mono_sha_now=$(git rev-parse "origin/${branch}" 2>/dev/null || echo "")
@@ -643,6 +680,173 @@ cmd_state() {
esac
}
# ─── Onboard Command ──────────────────────────────────────────────
cmd_onboard() {
local config_file=".josh-sync.yml"
local target_name=""
local restart=false
while [ $# -gt 0 ]; do
case "$1" in
--config) config_file="$2"; shift 2 ;;
--debug) export JOSH_SYNC_DEBUG=1; shift ;;
--restart) restart=true; shift ;;
-*) die "Unknown flag: $1" ;;
*) target_name="$1"; shift ;;
esac
done
if [ -z "$target_name" ]; then
echo "Usage: josh-sync onboard <target> [--restart]" >&2
parse_config "$config_file"
echo "Available targets:" >&2
echo "$JOSH_SYNC_TARGETS" | jq -r '.[].name' | sed 's/^/ /' >&2
exit 1
fi
parse_config "$config_file"
local target_json
target_json=$(echo "$JOSH_SYNC_TARGETS" | jq -c --arg n "$target_name" '.[] | select(.name == $n)')
[ -n "$target_json" ] || die "Target '${target_name}' not found in config"
log "INFO" "══════ Onboard target: ${target_name} ══════"
load_target "$target_json"
onboard_flow "$target_json" "$restart"
}
# ─── Migrate PR Command ──────────────────────────────────────────
cmd_migrate_pr() {
local config_file=".josh-sync.yml"
local target_name=""
local all=false
local pr_numbers=()
while [ $# -gt 0 ]; do
case "$1" in
--config) config_file="$2"; shift 2 ;;
--debug) export JOSH_SYNC_DEBUG=1; shift ;;
--all) all=true; shift ;;
-*) die "Unknown flag: $1" ;;
*)
if [ -z "$target_name" ]; then
target_name="$1"
else
pr_numbers+=("$1")
fi
shift ;;
esac
done
if [ -z "$target_name" ]; then
echo "Usage: josh-sync migrate-pr <target> [PR#...] [--all]" >&2
parse_config "$config_file"
echo "Available targets:" >&2
echo "$JOSH_SYNC_TARGETS" | jq -r '.[].name' | sed 's/^/ /' >&2
exit 1
fi
parse_config "$config_file"
local target_json
target_json=$(echo "$JOSH_SYNC_TARGETS" | jq -c --arg n "$target_name" '.[] | select(.name == $n)')
[ -n "$target_json" ] || die "Target '${target_name}' not found in config"
load_target "$target_json"
# Load archived repo info from onboard state
local onboard_state archived_api
onboard_state=$(read_onboard_state "$target_name")
archived_api=$(echo "$onboard_state" | jq -r '.archived_api')
if [ -z "$archived_api" ] || [ "$archived_api" = "null" ]; then
die "No archived repo info found. Run 'josh-sync onboard ${target_name}' first."
fi
log "INFO" "Archived repo: ${archived_api}"
# Load already-migrated PR numbers for skip detection and display
local migrated_numbers
migrated_numbers=$(echo "$onboard_state" | jq -r '[.migrated_prs // [] | .[].old_number] | map(tostring) | .[]')
# Counters for summary
local migrated=0 failed=0 skipped=0
# Helper: attempt migration of one PR with counting
_try_migrate() {
local num="$1"
if echo "$migrated_numbers" | grep -qxF -- "$num"; then
log "INFO" "PR #${num} already migrated — skipping"
skipped=$((skipped + 1))
elif migrate_one_pr "$num"; then
migrated=$((migrated + 1))
else
failed=$((failed + 1))
fi
}
if [ "$all" = true ]; then
# Migrate all open PRs from archived repo
local prs
prs=$(list_open_prs "$archived_api" "$SUBREPO_TOKEN") \
|| die "Failed to list PRs on archived repo"
local count
count=$(echo "$prs" | jq 'length')
log "INFO" "Found ${count} open PR(s) on archived repo"
while read -r num; do
_try_migrate "$num"
done < <(echo "$prs" | jq -r '.[] | .number')
elif [ ${#pr_numbers[@]} -gt 0 ]; then
# Migrate specific PR numbers
for num in "${pr_numbers[@]}"; do
_try_migrate "$num"
done
else
# Interactive: list open PRs, let user pick
local prs
prs=$(list_open_prs "$archived_api" "$SUBREPO_TOKEN") \
|| die "Failed to list PRs on archived repo"
local count
count=$(echo "$prs" | jq 'length')
if [ "$count" -eq 0 ]; then
log "INFO" "No open PRs on archived repo"
return
fi
# Display PRs with [migrated] marker for already-processed ones
echo "" >&2
echo "Open PRs on archived repo:" >&2
while IFS=$'\t' read -r num title base_ref head_ref; do
if echo "$migrated_numbers" | grep -qxF -- "$num"; then
echo " #${num}: ${title} (${base_ref} <- ${head_ref}) [migrated]" >&2
else
echo " #${num}: ${title} (${base_ref} <- ${head_ref})" >&2
fi
done < <(echo "$prs" | jq -r '.[] | "\(.number)\t\(.title)\t\(.base.ref)\t\(.head.ref)"')
echo "" >&2
echo "Enter PR numbers to migrate (space-separated), or 'all':" >&2
local selection
read -r selection
if [ "$selection" = "all" ]; then
while read -r num; do
_try_migrate "$num"
done < <(echo "$prs" | jq -r '.[] | .number')
else
for num in $selection; do
_try_migrate "$num"
done
fi
fi
log "INFO" "Migration complete: ${migrated} migrated, ${failed} failed, ${skipped} skipped"
}
# ─── Main ───────────────────────────────────────────────────────────
main() {
@@ -666,6 +870,8 @@ main() {
preflight) cmd_preflight "$@" ;;
import) cmd_import "$@" ;;
reset) cmd_reset "$@" ;;
onboard) cmd_onboard "$@" ;;
migrate-pr) cmd_migrate_pr "$@" ;;
status) cmd_status "$@" ;;
state) cmd_state "$@" ;;
*)


@@ -0,0 +1,42 @@
# ADR-001: Josh-proxy for Bidirectional Sync
**Status:** Accepted
**Date:** 2026-01
## Context
We need bidirectional sync between a monorepo and N external subrepos. Each subrepo corresponds to a subfolder in the monorepo. Developers on both sides should see a clean, complete git history — not synthetic commits or squashed blobs.
### Alternatives considered
1. **git subtree**: Built into git. `git subtree split` extracts a subfolder into a standalone history. However, without `--rejoin` it re-walks the full history on every run (O(n) in total commits), and its output SHAs stay stable only while the split options are unchanged. Bidirectional sync requires manual `git subtree merge` with conflict-prone history grafting. No transport-layer filtering: all content must be fetched.
2. **git submodule**: Tracks external repos via `.gitmodules` pointer commits. Does not provide content-level integration — monorepo commits don't contain subrepo files directly. Developers must run `git submodule update`. Bidirectional sync is not a supported workflow.
3. **Custom diff-and-patch scripts**: Compute diffs between monorepo subfolder and subrepo, apply patches in both directions. Fragile with renames, binary files, and merge conflicts. Loses authorship and commit granularity.
4. **josh-proxy**: A git proxy that computes filtered views of repositories in real-time. Clients `git clone` through josh and receive a repo containing only the specified subfolder, with history rewritten to match. Josh maintains a persistent SHA mapping, so the same monorepo commit always produces the same filtered SHA. Bidirectional: pushing back through josh maps filtered commits to monorepo commits.
## Decision
Use josh-proxy as the transport layer for all sync operations.
## Consequences
**Positive:**
- Clean git history in both directions — no synthetic commits
- Deterministic SHA mapping — same monorepo state always produces same filtered SHA
- Bidirectional by design — push through josh maps back to monorepo
- Transport-layer filtering — content exclusion happens at clone/push time, not via generated files
- Supports any git hosting platform (Gitea, GitHub, GitLab) since it's a proxy
**Negative:**
- Requires running a josh-proxy instance (operational overhead)
- Josh-proxy is a Rust project with a smaller community than git-native tools
- Proxy must have network access to the monorepo's git server
- Josh's SHA mapping is opaque — debugging requires understanding josh internals
- First-parent traversal behavior must be respected in merge commits (see ADR-008)
**Risks:**
- Josh-proxy downtime blocks all sync operations
- Josh-proxy bugs could corrupt history mapping (mitigated by force-with-lease on forward, always-PR on reverse)


@@ -0,0 +1,50 @@
# ADR-002: State Storage on Orphan Git Branch
**Status:** Accepted
**Date:** 2026-01
## Context
Josh-sync needs persistent state to track what has already been synced (last-synced commit SHAs, timestamps, status). This prevents re-syncing unchanged content and enables incremental operation. The state must survive CI runner teardown — runners are ephemeral containers.
### Alternatives considered
1. **File in the repo**: Commit a state JSON file to the monorepo. Every sync run creates a commit, polluting history. Race conditions when multiple sync jobs run concurrently.
2. **External database/KV store**: Redis, SQLite, or a cloud KV service. Adds an infrastructure dependency. Credentials and connectivity to manage.
3. **CI artifacts/cache**: Platform-specific (GitHub Actions cache, Gitea cache). Not portable across CI platforms. Expiry policies vary.
4. **Orphan git branch**: A branch with no parent relationship to the main history. Stores JSON files in a simple `<target>/<branch>.json` layout. Pushed to origin, so it survives runner teardown. No external dependencies — uses git itself.
## Decision
Store sync state as JSON files on an orphan branch (`josh-sync-state`) in the monorepo.
### Storage layout
```
origin/josh-sync-state/
<target>/<branch>.json # sync state per target/branch
<target>/onboard.json # onboard workflow state (v1.1+)
```
### Implementation
- `read_state()`: `git fetch origin josh-sync-state && git show origin/josh-sync-state:<key>.json`
- `write_state()`: Uses `git worktree` to check out the orphan branch in a temp directory, writes JSON, commits, and pushes. This avoids touching the main working tree.
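
The read and write paths described above could be sketched as follows. This is a minimal illustration, not the project's `read_state()`/`write_state()`: the `_sketch` function names, the bot identity, and the commit message are assumptions, and error handling (for example the non-fatal push failure) is left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

STATE_BRANCH="josh-sync-state"
STATE_REF="refs/remotes/origin/${STATE_BRANCH}"

# Print the JSON stored at <key>.json on the state branch, or {} when the
# branch or the file does not exist yet.
read_state_sketch() {
  local key="$1"
  git fetch --quiet origin "+refs/heads/${STATE_BRANCH}:${STATE_REF}" 2>/dev/null || true
  git show "${STATE_REF}:${key}.json" 2>/dev/null || echo '{}'
}

# Write JSON to <key>.json on the state branch through a temporary worktree,
# so the main working tree is never touched.
write_state_sketch() {
  local key="$1" json="$2" tmp
  tmp="$(mktemp -d)"
  if git fetch --quiet origin "+refs/heads/${STATE_BRANCH}:${STATE_REF}" 2>/dev/null; then
    git worktree add --detach "$tmp" "$STATE_REF" >/dev/null 2>&1
  else
    # First write: start an orphan history with an empty tree.
    git worktree add --detach "$tmp" HEAD >/dev/null 2>&1
    git -C "$tmp" checkout --quiet --orphan "$STATE_BRANCH"
    git -C "$tmp" rm -rf --quiet --ignore-unmatch .
  fi
  mkdir -p "$tmp/$(dirname "$key")"
  printf '%s\n' "$json" > "$tmp/${key}.json"
  git -C "$tmp" add "${key}.json"
  # Commit and push only when the stored JSON actually changed.
  if ! git -C "$tmp" diff --cached --quiet; then
    git -C "$tmp" -c user.name="josh-sync-bot" -c user.email="bot@example.com" \
      commit --quiet -m "state: update ${key}"
    git -C "$tmp" push --quiet origin "HEAD:refs/heads/${STATE_BRANCH}"
  fi
  git worktree remove --force "$tmp"
}
```

Keys follow the `<target>/<branch>` layout, so a call might look like `write_state_sketch billing/main '{"status":"clean"}'`.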
## Consequences
**Positive:**
- Zero external dependencies — only git
- Portable across CI platforms (Gitea Actions, GitHub Actions, local)
- Human-readable JSON files — easy to inspect and debug
- Atomic updates via git commit + push
- Natural namespacing via directory structure
**Negative:**
- Concurrent writes can race (mitigated by concurrency groups in CI workflows)
- `git worktree` adds complexity to the write path
- State branch appears in `git branch -a` output (minor clutter)
- Push failures on the state branch are non-fatal (logged as warning, sync still succeeds)


@@ -0,0 +1,33 @@
# ADR-003: Force-with-Lease for Forward Sync
**Status:** Accepted
**Date:** 2026-01
## Context
Forward sync pushes monorepo changes to the subrepo. If someone pushes directly to the subrepo between when josh-sync reads its HEAD and when josh-sync pushes, a naive `git push` would overwrite their work. A `git push --force` would be worse — it would silently destroy concurrent changes.
## Decision
Use `git push --force-with-lease=refs/heads/<branch>:<expected-sha>` for all forward sync pushes. The expected SHA is recorded at the start of the sync operation (the "lease").
### How it works
1. Record subrepo HEAD SHA before any operations: `subrepo_sha=$(subrepo_ls_remote "$branch")`
2. Perform merge of monorepo changes onto subrepo state
3. Push with explicit lease: `--force-with-lease=refs/heads/main:<subrepo_sha>`
4. If the subrepo HEAD changed since step 1, git rejects the push
5. Josh-sync reports `lease-rejected` and retries on the next run
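
The steps above can be sketched as two small helpers. The names `remote_head_sha` and `push_with_lease` are hypothetical (the project's own `subrepo_ls_remote` is not shown in this document), and the sketch treats every push failure as a lease rejection, which a real implementation would want to distinguish.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the SHA that <branch> currently has on <remote>.
remote_head_sha() {
  local remote="$1" branch="$2"
  git ls-remote "$remote" "refs/heads/${branch}" | cut -f1
}

# Hypothetical helper: push local HEAD to <branch> on <remote>, but only if
# the remote branch still points at the SHA recorded before the merge began.
# Simplification: any push failure is reported as "lease-rejected".
push_with_lease() {
  local remote="$1" branch="$2" expected_sha="$3"
  if git push --quiet \
      --force-with-lease="refs/heads/${branch}:${expected_sha}" \
      "$remote" "HEAD:refs/heads/${branch}" 2>/dev/null; then
    echo "pushed"
  else
    echo "lease-rejected"
  fi
}
```

The lease is taken first (`lease=$(remote_head_sha "$url" main)`), the merge work happens next, and `push_with_lease "$url" main "$lease"` runs last, so any push that landed in between causes a rejection instead of an overwrite.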
## Consequences
**Positive:**
- Never overwrites concurrent changes — git atomically checks the expected SHA
- Explicit SHA lease (not just "current tracking ref") prevents stale-ref bugs
- Failed leases are retried on the next sync run — no data loss, just delay
- Works correctly with josh-proxy's SHA mapping
**Negative:**
- Lease-rejected means the sync run did work that gets discarded (clone, merge, etc.)
- Persistent lease failures indicate a concurrent push pattern that needs investigation
- Requires the explicit `--force-with-lease=<ref>:<expected-sha>` form. Forms without an expected value (bare `--force-with-lease` or `--force-with-lease=<ref>`) are unsafe here because they compare against the local remote-tracking ref, which may be stale


@@ -0,0 +1,41 @@
# ADR-004: Always-PR Policy for Reverse Sync
**Status:** Accepted
**Date:** 2026-01
## Context
Reverse sync brings subrepo changes back into the monorepo. The monorepo is the source of truth and typically has CI checks, code review requirements, and branch protection rules. Pushing directly to the monorepo's main branch would bypass these safeguards.
### Alternatives considered
1. **Direct push**: Fast, but bypasses all review and CI. A bad subrepo commit could break the entire monorepo with no review gate.
2. **Always create a PR**: Pushes to a staging branch (`auto-sync/subrepo-<branch>-<timestamp>`), then creates a PR via API. Humans review and merge.
3. **Configurable per-target**: Let users choose direct push vs PR. Adds complexity and a dangerous default.
## Decision
Reverse sync always creates a PR on the monorepo. Never pushes directly to the target branch.
### Implementation
1. Push subrepo HEAD through josh-proxy to a staging branch: `git push -o "base=main" josh://... HEAD:refs/heads/auto-sync/subrepo-main-<ts>`
2. Create PR via Gitea/GitHub API targeting the monorepo's main branch
3. PR includes a review checklist: scoped to subfolder, no leaked credentials, CI passes
The `-o "base=main"` option tells josh-proxy which monorepo branch to map the push against.
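
As a small illustration of the staging-branch naming and the push invocation, the sketch below only builds and prints the command, it does not run it. The helper names, the timestamp format, and the URL passed in are assumptions; the actual josh-proxy URL is elided in the text above and is not reconstructed here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the staging ref for a reverse-sync PR branch.
# The timestamp format is the caller's choice; this sketch does not fix one.
staging_ref() {
  local branch="$1" ts="$2"
  printf 'refs/heads/auto-sync/subrepo-%s-%s\n' "$branch" "$ts"
}

# Print (do not execute) the push command so it can be reviewed or logged.
#   $1 = josh-proxy URL (placeholder, supplied by the caller)
#   $2 = monorepo branch to map the push against
#   $3 = timestamp suffix for the staging branch
reverse_push_command() {
  local josh_url="$1" branch="$2" ts="$3"
  printf 'git push -o "base=%s" %s HEAD:%s\n' \
    "$branch" "$josh_url" "$(staging_ref "$branch" "$ts")"
}
```

Printing first and executing separately keeps the push option and the refspec visible in CI logs.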
## Consequences
**Positive:**
- All monorepo changes go through review — consistent with team workflow
- CI runs on the PR branch before merge
- Bad subrepo changes are caught before they affect the monorepo
- Audit trail via PR history
**Negative:**
- Reverse sync is not instant — requires human action to merge the PR
- Stale PRs accumulate if subrepo changes frequently but PRs aren't merged promptly
- Adds API dependency (needs token with PR creation scope)


@@ -0,0 +1,52 @@
# ADR-005: Git Trailer for Loop Prevention
**Status:** Accepted
**Date:** 2026-01
## Context
Bidirectional sync creates an infinite loop risk: forward sync pushes commit A to the subrepo, reverse sync sees commit A as "new" and creates a PR back to the monorepo, forward sync sees the merged PR as "new" and pushes again, etc.
### Alternatives considered
1. **SHA tracking only**: Compare SHAs to skip already-synced content. Breaks when josh-proxy rewrites SHAs (which it always does for filtered views). The monorepo commit SHA and the filtered/subrepo commit SHA are never the same.
2. **Commit message prefix**: Add `[sync]` to bot commit messages. Fragile — humans might use the same prefix. Requires string matching on message content.
3. **Git trailer**: A structured key-value pair in the commit message body (after a blank line), following the `git interpret-trailers` convention. Format: `Key: value`. Machine-parseable, unlikely to be used by humans, and supported by `git log --grep`.
## Decision
All bot commits include a git trailer with a configurable key (default: `Josh-Sync-Origin`). Both sync directions filter out commits containing this trailer.
### Format
```
Sync from monorepo 2026-02-12T10:30:00Z
Josh-Sync-Origin: forward/main/2026-02-12T10:30:00Z
```
The trailer value encodes: direction, branch, and timestamp. This aids debugging but is not parsed by the loop filter — only the trailer key presence matters.
### Filtering
- **Reverse sync**: `git log --invert-grep --grep="^${BOT_TRAILER}:"` excludes all commits with the trailer
- **CI loop guard**: The composite action checks if HEAD commit has the trailer before running sync at all
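
Both filters can be sketched with plain git. The function names `human_commits` and `is_bot_commit` are hypothetical, and how the composite action actually inspects HEAD is not shown in this document, so the second helper is one plausible reading rather than the shipped check.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Trailer key; defaults to the documented value when not set by the caller.
BOT_TRAILER="${BOT_TRAILER:-Josh-Sync-Origin}"

# List subjects of commits in <range> that do NOT carry the bot trailer.
human_commits() {
  local range="$1"
  git log --invert-grep --grep="^${BOT_TRAILER}:" --format='%s' "$range"
}

# Loop guard: succeed when the given commit (default HEAD) carries the trailer.
is_bot_commit() {
  local rev="${1:-HEAD}"
  grep -q "^${BOT_TRAILER}:" <<<"$(git log -1 --format='%B' "$rev")"
}
```

Only the presence of the key at the start of a message line is checked, which matches the note that the trailer value itself is not parsed.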
### Configuration
The trailer key is set in `.josh-sync.yml` under `bot.trailer`. This allows multiple josh-sync instances (with different bots) to operate on the same repos without interfering.
## Consequences
**Positive:**
- Reliable loop prevention — trailer is part of the immutable commit object
- Configurable key avoids conflicts between multiple sync bots
- Human-readable — `git log` shows the trailer in commit messages
- CI loop guard prevents unnecessary sync runs entirely
**Negative:**
- Commits with manually-added trailers matching the key would be incorrectly filtered
- Trailer must be in the commit body (after blank line), not the subject line
- Squash-and-merge on PRs may lose the trailer if the platform doesn't preserve commit message body


@@ -0,0 +1,55 @@
# ADR-006: Inline Exclude in Josh-Proxy URL
**Status:** Accepted
**Date:** 2026-02
## Context
Some files in a monorepo subfolder should not appear in the subrepo (e.g., monorepo-specific CI configs, internal tooling, secrets templates). We need a mechanism to exclude these files from sync.
### Alternatives considered
1. **`.josh-sync-exclude` file committed to the repo**: A gitignore-style file listing patterns. Requires generating and committing a file. Changes to the exclude list create commits. The file itself would need to be excluded from the subrepo (circular dependency).
2. **Post-clone file deletion**: Clone through josh, then `rm -rf` excluded paths before pushing. Fragile — deletions create diff noise. Doesn't work for reverse sync (excluded files would appear as "deleted" in the subrepo).
3. **Josh `:exclude` filter inline in the URL**: Josh-proxy supports `:exclude[::pattern1,::pattern2]` appended to the filter path. The exclusion happens at the transport layer — git objects for excluded files are never transferred. Works identically for clone (forward) and push (reverse).
4. **Separate josh filter file**: Generate a josh filter expression and store it somewhere. Adds state management complexity.
## Decision
Embed exclusion patterns inline in the josh-proxy URL using josh's native `:exclude` syntax. The `exclude` config field in `.josh-sync.yml` is transformed at config parse time into the josh filter string.
### Example
Config:
```yaml
exclude:
- ".monorepo/"
- "**/internal/"
```
Produces josh filter:
```
:/services/billing:exclude[::.monorepo/,::**/internal/]
```
### Implementation
The `parse_config()` function in `lib/config.sh` uses jq to conditionally append `:exclude[...]` to the josh filter when the `exclude` array is non-empty. The enriched filter is stored in `JOSH_SYNC_TARGETS` JSON and used everywhere via `$JOSH_FILTER`.
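As a rough illustration of the transformation, the same string can be built in plain bash. The real implementation does this in jq inside `parse_config()`; the function name `build_josh_filter` below is hypothetical and exists only for this sketch.

```bash
# Hypothetical helper: prints the josh filter for a subfolder, appending
# :exclude[::p1,::p2,...] only when at least one pattern is given.
build_josh_filter() {
  local subfolder="$1"; shift
  local filter=":/${subfolder}"
  if [ "$#" -gt 0 ]; then
    local joined="" pattern
    for pattern in "$@"; do
      # Prefix each pattern with "::" and separate entries with commas.
      joined+="${joined:+,}::${pattern}"
    done
    filter+=":exclude[${joined}]"
  fi
  printf '%s\n' "$filter"
}

# Quote the patterns so the shell does not expand the globs.
build_josh_filter "services/billing" ".monorepo/" "**/internal/"
```

With no patterns the function falls back to the plain `:/<subfolder>` filter, which mirrors the "non-empty array" condition described above.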
## Consequences
**Positive:**
- Zero committed files — exclusion is purely in the URL
- Transport-layer filtering — excluded content never leaves the git server
- Works identically for forward sync (clone), reverse sync (push), and reset
- Tree comparison (`skip` detection) works correctly since excluded files aren't in the filtered view
- Standard josh syntax — no custom invention
**Negative:**
- Josh's `:exclude` pattern syntax is limited (no negation, no regex — only glob-style patterns with `::` prefix)
- Long exclude lists make the URL unwieldy (though this is cosmetic — git handles long URLs fine)
- Changing the exclude list changes the josh filter, which changes all filtered SHAs (see ADR-007 for how this is handled)
- Debugging requires understanding josh's filter composition syntax

@@ -0,0 +1,53 @@
# ADR-007: Reconciliation Merge for Filter Changes
**Status:** Accepted
**Date:** 2026-02
## Context
When the josh filter changes (e.g., adding exclude patterns), josh-proxy recomputes the entire filtered history with new SHAs. The subrepo's existing history (based on the old filter) shares no common ancestor with the new filtered history. A naive forward sync would see "unrelated histories" and fail.
### Alternatives considered
1. **Force-push to subrepo**: Replace subrepo history with the new filtered view (same as `josh-sync reset`). Destructive — all local clones become invalid, open PRs are orphaned, developers must re-clone.
2. **Cherry-pick new commits**: Identify commits that exist in the new filtered history but not the old, cherry-pick them onto the subrepo. Complex — the "same" commit has different SHAs in old vs new filtered history. No reliable way to match them.
3. **Reconciliation merge commit**: Create a merge commit on the subrepo that has both the new filtered HEAD and the old subrepo HEAD as parents, using the new filtered tree. This establishes shared ancestry without rewriting history.
## Decision
When josh-sync detects a filter change (stored filter in state differs from current `$JOSH_FILTER`), create a reconciliation merge commit using `git commit-tree`.
### How it works
1. Clone subrepo (has old history)
2. Fetch josh-proxy filtered view (has new history)
3. If trees are identical → skip (filter change had no effect on content)
4. Create merge commit: `git commit-tree <josh-tree> -p <josh-head> -p <subrepo-head>`
5. Push with `--force-with-lease`
The merge commit uses the josh-filtered tree (new content) and has two parents:
- **Parent 1**: josh-filtered HEAD (new filter history) — must be first (see ADR-008)
- **Parent 2**: subrepo HEAD (old filter history) — preserves old history as a side branch
### Detection
Filter change is detected by comparing the stored `josh_filter` in sync state with the current `$JOSH_FILTER`. For pre-v1.2 state (no filter stored), the old filter is derived as `:/<subfolder>`.
As a reactive fallback, `forward_sync()` also detects unrelated histories via `git merge-base` and falls back to reconciliation.
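The proactive detection rule can be sketched as a pure string comparison. This is an assumption-laden sketch: `effective_old_filter` and `filter_changed` are hypothetical names, and the stored filter is passed in as a plain argument rather than read from the state branch.

```bash
# Hypothetical helper: prints the filter the existing subrepo history was
# built with. Pre-v1.2 state has no stored filter, so fall back to ":/<subfolder>".
effective_old_filter() {
  local stored_filter="$1" subfolder="$2"
  if [ -n "$stored_filter" ]; then
    printf '%s\n' "$stored_filter"
  else
    printf ':/%s\n' "$subfolder"
  fi
}

# Succeeds (exit 0) when the filter changed and reconciliation is needed.
filter_changed() {
  local stored_filter="$1" subfolder="$2" current_filter="$3"
  [ "$(effective_old_filter "$stored_filter" "$subfolder")" != "$current_filter" ]
}
```

Deriving the old filter for empty state means an upgrade from a pre-v1.2 install only triggers reconciliation when an `exclude` list (or another filter change) was actually introduced.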
## Consequences
**Positive:**
- Non-destructive — old history is preserved as parent 2 of the merge
- Developers don't need to re-clone the subrepo
- Open PRs on the subrepo remain valid (they're based on commits that are still ancestors)
- Automatic — no manual intervention needed when changing exclude patterns
- Force-with-lease protects against concurrent changes during reconciliation
**Negative:**
- The merge commit is synthetic (created by bot, not a real merge of concurrent work)
- Parent ordering is critical — wrong order breaks josh's reverse mapping (see ADR-008)
- The reconciliation merge contains a bot trailer, so reverse sync correctly ignores it
- If the subrepo has diverged significantly (manual commits during filter change), the reconciliation merge may produce unexpected tree content (uses josh-filtered tree unconditionally)

@@ -0,0 +1,42 @@
# ADR-008: First-Parent Ordering in Reconciliation Merges
**Status:** Accepted
**Date:** 2026-02
## Context
Josh-proxy uses **first-parent traversal** when mapping subrepo history back to the monorepo. When you push a commit through josh-proxy, josh walks the first-parent chain to find a commit it can map to a monorepo commit. If the first parent leads to unmappable history, josh cannot reconstruct the monorepo-side branch correctly.
This became critical when the reconciliation merge (ADR-007) initially had the wrong parent order: old subrepo history as parent 1, josh-filtered as parent 2. Josh followed parent 1, couldn't find any mappable commit, and created a monorepo branch containing only the subrepo subfolder content — effectively deleting 1280 files from the rest of the monorepo.
## Decision
In reconciliation merge commits, the josh-filtered HEAD **must be parent 1** (first parent). The old subrepo HEAD is parent 2.
```bash
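# Parent 1: josh-filtered HEAD (josh follows this first-parent chain).
# Parent 2: old subrepo HEAD (side branch, ignored by josh).
# A comment cannot follow a trailing backslash, so the notes sit above the command.
git commit-tree "$josh_tree" \
  -p "$josh_head" \
  -p "$subrepo_head" \
  -m "..."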
```
### Why this is safe
- The old subrepo HEAD (`subrepo_head`) is still an ancestor of the merge commit regardless of parent order — push succeeds either way
- `--ancestry-path` in reverse sync still follows `B → M → C` regardless of parent order (it traces all paths, not just first-parent)
- Josh follows first-parent and finds the josh-filtered commit, which maps cleanly back to the monorepo
## Consequences
**Positive:**
- Josh can map the reconciliation merge back to the monorepo correctly
- Reverse sync through josh produces correct diffs (only subrepo-scoped changes)
- `git log --first-parent` on the subrepo shows the clean josh-filtered lineage
**Negative:**
- This is a subtle invariant — future changes to merge commit creation must preserve parent order
- The constraint is undocumented in josh-proxy's own documentation (discovered empirically)
- No automated test can verify this without a running josh-proxy instance
**Lesson learned:**
Parent order in `git commit-tree -p` is not cosmetic. For tools that rely on first-parent traversal (josh-proxy, `git log --first-parent`), parent 1 must be the "mainline" that the tool should follow.
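Although josh's mapping behavior cannot be tested without a proxy, the commit-level invariant itself can be asserted with plain git. The sketch below assumes a hypothetical helper name, `assert_first_parent`; it only checks which commit is parent 1 and says nothing about how josh-proxy will treat the merge.

```bash
# Hypothetical helper: succeeds when the first parent of <merge> is <expected>.
# Returns 2 if either revision cannot be resolved.
assert_first_parent() {
  local merge="$1" expected="$2"
  local actual wanted
  actual=$(git rev-parse --verify "${merge}^1" 2>/dev/null) || return 2
  wanted=$(git rev-parse --verify "${expected}^{commit}" 2>/dev/null) || return 2
  [ "$actual" = "$wanted" ]
}

# Assumed usage right after creating the reconciliation merge:
#   assert_first_parent "$merge_sha" "$josh_head" || echo "wrong parent order" >&2
```

A guard like this could run before the push, so a regression in parent ordering fails fast instead of surfacing later as a broken monorepo branch.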

@@ -0,0 +1,53 @@
# ADR-009: Tree Comparison as Sync Skip Guard
**Status:** Accepted
**Date:** 2026-02
## Context
Both forward and reverse sync need to detect "nothing to do" quickly. The primary mechanism is SHA comparison against stored state (last-synced SHA). However, this misses cases where:
- State is reset or lost
- Reconciliation merges change SHAs without changing content
- Multiple sync runs overlap
Additionally, reverse sync originally relied on `git log <base>..HEAD` to find new commits. After a reconciliation merge, the `..` range can leak old subrepo history through the merge's second parent, creating false positives.
## Decision
Add tree-level comparison as an early skip guard in both forward and reverse sync. Compare the git tree objects (which represent directory content, not commit history) to determine if there's actually any content difference.
### Forward sync
```bash
mono_tree=$(git rev-parse 'HEAD^{tree}')
subrepo_tree=$(git rev-parse "subrepo/${branch}^{tree}")
[ "$mono_tree" = "$subrepo_tree" ] && echo "skip"
```
### Reverse sync
```bash
subrepo_tree=$(git rev-parse "HEAD^{tree}")
josh_tree=$(git rev-parse "mono-filtered/${branch}^{tree}")
[ "$subrepo_tree" = "$josh_tree" ] && echo "skip"
```
Tree comparison happens **before** commit log analysis. If trees are identical, there is definitionally nothing to sync, regardless of what the commit history looks like.
### Combined with `--ancestry-path`
For reverse sync, even when trees differ, `git log --ancestry-path` restricts the commit range to the direct lineage between the two endpoints. This prevents old history from leaking through reconciliation merge parents.
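Combined with the trailer filter, the commit detection step might look like the following sketch. The function name `list_human_commits` and its argument order are assumptions made for illustration; only the `git log` options are taken from the text.

```bash
# Hypothetical helper: prints the subjects of non-bot commits on the direct
# lineage between <base> and <head>.
#   --ancestry-path  keeps only commits that descend from <base>
#   --invert-grep    drops commits whose message matches the trailer pattern
list_human_commits() {
  local base="$1" head="$2" trailer_key="$3"
  git log --ancestry-path --invert-grep --grep="^${trailer_key}:" \
    --format='%s' "${base}..${head}"
}

# Assumed usage:
#   list_human_commits "$last_synced_sha" HEAD "$BOT_TRAILER"
```

An empty result means every commit since the last sync was made by the bot, so no PR needs to be opened.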
## Consequences
**Positive:**
- Eliminates false positives from reconciliation merges (trees are identical after reconciliation)
- Fast — tree SHA comparison is O(1), no content traversal
- Correct by definition — if trees match, content is identical
- Defense in depth — works even when state tracking has gaps
**Negative:**
- Tree comparison alone doesn't tell you *which* commits are new (still need `git log` for PR descriptions)
- Adds an extra `git rev-parse` call per sync direction (negligible cost)
- Cannot detect file-mode-only changes if josh normalizes modes (theoretical edge case)

@@ -0,0 +1,76 @@
# ADR-010: Onboard Workflow with Checkpoint/Resume
**Status:** Accepted
**Date:** 2026-02
## Context
Onboarding an existing subrepo into the monorepo is a multi-step process that involves human interaction (renaming repos, merging PRs). The full flow is:
1. Prerequisites: rename existing repo, create new empty repo
2. Import: copy subrepo content into monorepo, create import PR(s)
3. Wait: human merges the import PR(s)
4. Reset: force-push josh-filtered history to the new empty repo
5. (Optional) Migrate open PRs from archived repo
Each step can fail or be interrupted. The process may span hours or days (waiting for PR review). If interrupted, restarting from scratch wastes work and can create duplicate PRs.
### Alternatives considered
1. **Single-shot script**: Run all steps in sequence. If interrupted, must restart from scratch. Duplicate PRs if import step is re-run.
2. **Manual step-by-step commands**: `import`, then manually run `reset`. Simple but error-prone — users may forget steps or run them out of order.
3. **Checkpoint/resume with persistent state**: Track the current step and intermediate results (PR numbers, reset branches) in persistent state. On re-run, resume from the last completed step.
## Decision
Implement `josh-sync onboard` as a checkpoint/resume workflow with state stored on the `josh-sync-state` branch at `<target>/onboard.json`.
### State machine
```
start → importing → waiting-for-merge → resetting → complete
```
Each transition is persisted before proceeding. Re-running `josh-sync onboard <target>` reads the current step and resumes.
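The linear transitions can be sketched as a single `case` statement. This is an illustrative sketch only; `next_onboard_step` is a hypothetical name, and the real workflow also persists state and performs work between steps.

```bash
# Hypothetical helper: prints the step that follows the given onboard step.
# "complete" is terminal; unknown steps are rejected so corrupted state is noticed.
next_onboard_step() {
  case "$1" in
    start)             echo "importing" ;;
    importing)         echo "waiting-for-merge" ;;
    waiting-for-merge) echo "resetting" ;;
    resetting)         echo "complete" ;;
    complete)          echo "complete" ;;
    *) echo "unknown onboard step: $1" >&2; return 1 ;;
  esac
}

# Walk the whole machine from the beginning.
step="start"
while [ "$step" != "complete" ]; do
  step=$(next_onboard_step "$step")
  echo "$step"
done
```

Rejecting unknown step names gives the `--restart` flag a clear trigger: if the stored step is not one of the five known values, the state is not safe to resume from.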
### State schema
```json
{
  "step": "waiting-for-merge",
  "archived_api": "https://host/api/v1/repos/org/repo-archived",
  "archived_url": "git@host:org/repo-archived.git",
  "archived_auth": "ssh",
  "import_prs": { "main": 42 },
  "reset_branches": ["main"],
  "migrated_prs": [
    { "old_number": 5, "new_number": 12, "title": "Fix login" }
  ],
  "timestamp": "2026-02-10T14:30:00Z"
}
```
### Per-branch progress
Import and reset both iterate over branches. Progress is saved after each branch, so interruption mid-iteration resumes at the next unprocessed branch.
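Resuming mid-iteration amounts to skipping branches that are already recorded. The sketch below assumes the completed branches are available as a space-separated string (for example, extracted from `reset_branches`); the helper name `pending_branches` is hypothetical.

```bash
# Hypothetical helper: prints each configured branch that is not yet recorded
# as processed.
#   $1       space-separated list of branches already done
#   $2...    all branches configured for the target
pending_branches() {
  local done_list=" $1 "; shift
  local branch
  for branch in "$@"; do
    case "$done_list" in
      *" ${branch} "*) ;;               # already processed, skip on resume
      *) printf '%s\n' "$branch" ;;
    esac
  done
}

# After an interruption with "main" already reset:
pending_branches "main" main develop stage
```

Padding the list with spaces keeps a branch named `main` from matching `main-v2`.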
### PR migration
`josh-sync migrate-pr` is a separate command that reads onboard state (for the archived repo URL) and tracks migrated PRs. It uses `git apply --3way` for resilient patch application — the subrepo's content is identical after reset, so patches apply cleanly.
## Consequences
**Positive:**
- Safe to interrupt at any point — no duplicate work on resume
- Per-branch tracking prevents duplicate import PRs or redundant resets
- Archived repo URL stored in state — `migrate-pr` can operate independently
- `--restart` flag allows starting over if state is corrupted
- Human-friendly — prints instructions at each step
**Negative:**
- State management adds complexity (read/write onboard state, step validation)
- Interactive steps (`read -r`) are not suitable for fully automated pipelines
- Onboard state persists on the state branch even after completion (minor clutter)
- The step machine is linear — cannot skip steps or run them out of order

docs/adr/README.md Normal file
@@ -0,0 +1,18 @@
# Architecture Decision Records
This directory contains Architecture Decision Records (ADRs) for josh-sync. Each ADR documents a significant design decision, its context, the alternatives considered, and the rationale for the chosen approach.
## Index
| ADR | Title | Status |
|-----|-------|--------|
| [001](001-josh-proxy-for-sync.md) | Josh-proxy for bidirectional sync | Accepted |
| [002](002-state-on-orphan-branch.md) | State storage on orphan git branch | Accepted |
| [003](003-force-with-lease-forward.md) | Force-with-lease for forward sync | Accepted |
| [004](004-always-pr-reverse.md) | Always-PR policy for reverse sync | Accepted |
| [005](005-git-trailer-loop-prevention.md) | Git trailer for loop prevention | Accepted |
| [006](006-inline-exclude-filter.md) | Inline exclude in josh-proxy URL | Accepted |
| [007](007-reconciliation-merge.md) | Reconciliation merge for filter changes | Accepted |
| [008](008-first-parent-ordering.md) | First-parent ordering in reconciliation merges | Accepted |
| [009](009-tree-comparison-guard.md) | Tree comparison as sync skip guard | Accepted |
| [010](010-onboard-checkpoint-resume.md) | Onboard workflow with checkpoint/resume | Accepted |

@@ -32,6 +32,7 @@ Each target maps a monorepo subfolder to an external subrepo.
| `subrepo_ssh_key_var` | string | No | `"SUBREPO_SSH_KEY"` | Name of the env var holding the SSH private key for this target. |
| `branches` | object | Yes | — | Branch mapping: `mono_branch: subrepo_branch`. Each key-value pair syncs those branches bidirectionally. |
| `forward_only` | string[] | No | `[]` | Branches that only sync mono → subrepo, never reverse. |
| `exclude` | string[] | No | `[]` | File/directory patterns to exclude from sync via josh `:exclude` filter. Excluded files exist only in the monorepo, never in the subrepo. See [Excluding Files](guide.md#excluding-files-from-sync). |
## `bot` Section

@@ -91,6 +91,9 @@ targets:
branches:
main: main # mono_branch: subrepo_branch
forward_only: []
exclude: # files excluded from subrepo (optional)
- ".monorepo/" # monorepo-only config dir
- "**/internal/" # internal dirs at any depth
- name: "auth"
subfolder: "services/auth"
@@ -165,6 +168,34 @@ SUBREPO_SSH_KEY="-----BEGIN OPENSSH PRIVATE KEY-----
# AUTH_REPO_TOKEN=<auth-specific-token>
```
### Updating josh-sync in devenv
To update to the latest version:
```bash
devenv update josh-sync
```
Or with plain Nix flakes:
```bash
nix flake lock --update-input josh-sync
```
To pin to a specific version, use a tag ref in `devenv.yaml`:
```yaml
josh-sync:
  url: git+https://your-gitea.example.com/org/josh-sync?ref=refs/tags/v1.2
  flake: true
```
After updating, verify the version:
```bash
josh-sync --version
```
### Option B: Manual installation
Install the required tools, then either:
@@ -189,11 +220,61 @@ For a new monorepo before import, preflight may warn that subfolders don't exist
## Step 5: Import Existing Subrepos
This is the critical onboarding step. For each existing subrepo, you run a three-step cycle: **import → merge → reset**.
This is the critical onboarding step. There are two approaches:
- **`josh-sync onboard`** (recommended) — interactive, resumable, preserves open PRs
- **Manual `import` → merge → `reset`** — lower-level, for automation or when there are no open PRs to preserve
### Option A: Onboard (recommended)
The `onboard` command walks you through the entire process interactively, with checkpoint/resume at every step.
**Before you start:**
1. **Rename** the existing subrepo on your Git server (e.g., `stores/storefront` → `stores/storefront-archived`)
2. **Create a new empty repo** at the original path (e.g., a new `stores/storefront` with no commits)
The rename preserves the archived repo with all its history and open PRs. The new empty repo will receive josh-filtered history.
**Run onboard:**
```bash
josh-sync onboard billing
```
The command will:
1. **Verify prerequisites** — checks the new empty repo is reachable, asks for the archived repo URL
2. **Import** — copies subrepo content into monorepo and creates import PRs (one per branch)
3. **Wait for merge** — shows PR numbers and waits for you to merge them
4. **Reset** — pushes josh-filtered history to the new subrepo (per-branch, with resume)
5. **Done** — prints instructions for developers and PR migration
If the process is interrupted at any point, re-run `josh-sync onboard billing` to resume from where it left off. Use `--restart` to start over.
**Migrate open PRs:**
After onboard completes, migrate PRs from the archived repo to the new one:
```bash
# Interactive — lists open PRs and lets you pick
josh-sync migrate-pr billing
# Migrate all open PRs at once
josh-sync migrate-pr billing --all
# Migrate specific PRs by number
josh-sync migrate-pr billing 5 8 12
```
PR migration works by fetching the diff from the archived repo's PR, applying it to the new repo, and creating a new PR. File content is identical after reset, so patches apply cleanly.
### Option B: Manual import → merge → reset
Use this when the subrepo has no open PRs to preserve, or for scripted automation.
> Do this **one target at a time** to keep PRs reviewable.
### 5a. Import
#### 5b-1. Import
```bash
josh-sync import billing
@@ -208,13 +289,13 @@ This:
Review the import PR — check for leaked credentials, environment-specific config, or files that shouldn't be in the monorepo.
### 5b. Merge the import PR
#### 5b-2. Merge the import PR
Merge the PR using your Git platform's UI. This lands the subrepo content into the monorepo's main branch.
> At this point, the monorepo has the content but the histories are disconnected. Sync will **not** work until you complete the reset step.
### 5c. Reset
#### 5b-3. Reset
```bash
josh-sync reset billing
@@ -228,9 +309,20 @@ This:
This establishes **shared commit ancestry** between josh's filtered view and the subrepo. Without this, josh-proxy can't compute diffs between the two.
> **Warning:** This is a destructive force-push that replaces the subrepo's history. Back up any important branches or tags in the subrepo beforehand.
> **Warning:** This is a destructive force-push that replaces the subrepo's history. Back up any important branches or tags in the subrepo beforehand. Merge or close all open pull requests on the subrepo first — they will be invalidated.
### 5d. Repeat for each target
After reset, **every developer with a local clone of the subrepo** must update their local copy to match the new history:
```bash
cd /path/to/local-subrepo
git fetch origin
git checkout main && git reset --hard origin/main
git checkout stage && git reset --hard origin/stage # repeat for each branch
```
Or simply delete and re-clone the subrepo. Local-only branches (not pushed to the remote) will be lost either way.
#### 5b-4. Repeat for each target
```
For each target:
@@ -239,9 +331,9 @@ For each target:
3. josh-sync reset <target>
```
### 5e. Verify
### Verify
After all targets are imported and reset:
After all targets are imported and reset (whichever option you used):
```bash
# Check all targets show state
@@ -426,6 +518,65 @@ Bot commits include a git trailer like `Josh-Sync-Origin: forward/main/2024-02-1
Sync state is stored as JSON files on an orphan branch (`josh-sync-state`), one file per target/branch. This tracks the last-synced commit SHAs and timestamps to avoid re-syncing the same changes.
## Excluding Files from Sync
Some files in the monorepo subfolder may not belong in the subrepo (e.g., monorepo-specific CI configs, internal tooling). The `exclude` config field removes these at the josh-proxy layer — excluded files never appear in the subrepo.
### Configuration
Add an `exclude` list to any target:
```yaml
targets:
  - name: "billing"
    subfolder: "services/billing"
    subrepo_url: "git@host:org/billing.git"
    exclude:
      - ".monorepo/"       # directory at subfolder root
      - "**/internal/"     # directory at any depth
      - "*.secret"         # files by extension
    branches:
      main: main
```
### How it works
When `exclude` is present, josh-sync appends an inline `:exclude` filter to the josh-proxy URL. For the example above, the josh filter becomes:
```
:/services/billing:exclude[::.monorepo/,::**/internal/,::*.secret]
```
Josh-proxy applies this filter at the transport layer — no extra files to generate or commit. This means:
- **Forward sync**: the filtered clone already excludes the files
- **Reverse sync**: pushes through josh also respect the exclusion
- **Reset**: the subrepo history never contains excluded files
- **Tree comparison**: `skip` detection works correctly (excluded files are not in the diff)
### Pattern syntax
Josh uses `::` patterns inside `:exclude[...]`:
| Pattern | Matches |
|---------|---------|
| `dir/` | Directory at subfolder root |
| `file` | File at subfolder root |
| `**/dir/` | Directory at any depth |
| `**/file` | File at any depth |
| `*.ext` | Glob pattern (single `*` only) |
### Setup
1. Add `exclude` to the target in `.josh-sync.yml`
2. Run `josh-sync preflight` to verify the filter works
3. Forward sync will now exclude the specified files
No extra files to generate or commit — the exclusion is embedded directly in the josh-proxy URL.
### Changing the exclude list
You can safely add or remove patterns from `exclude` at any time. When josh-sync detects that the filter has changed since the last sync, it automatically creates a reconciliation merge commit on the subrepo that connects the old and new histories — no manual reset or force-push required. Developers do not need to re-clone the subrepo.
## Adding a New Target
To add a new subrepo after initial setup:
@@ -433,8 +584,12 @@ To add a new subrepo after initial setup:
1. Add the target to `.josh-sync.yml`
2. Update the forward workflow's `paths:` list to include the new subfolder
3. Commit and push
4. Run the import-merge-reset cycle for the new target:
4. Import the target:
```bash
# Recommended: interactive onboard (preserves open PRs)
josh-sync onboard new-target
# Or manual: import → merge PR → reset
josh-sync import new-target
# merge the PR
josh-sync reset new-target
@@ -471,6 +626,24 @@ The subfolder already contains the same content as the subrepo. This is fine —
Verify `bot.trailer` in config matches what's in commit messages. Check the loop guard in the CI workflow is active.
### "cannot lock ref" or "expected X but got Y"
**After reset (subrepo):** The subrepo's history was replaced by force-push. Local clones still have the old history:
```bash
cd /path/to/subrepo
git fetch origin
git checkout main && git reset --hard origin/main
```
Or simply delete and re-clone.
**After import/reset cycle (monorepo):** The import and reset steps create and update branches rapidly (`auto-sync/import-*`, `josh-sync-state`). If your local clone fetched partway through, tracking refs go stale:
```bash
git remote prune origin && git pull
```
### State issues
```bash

@@ -5,12 +5,12 @@
# In devenv.yaml:
# inputs:
# josh-sync:
# url: github:org/josh-sync/v1.0.0
# url: git+https://your-gitea.example.com/org/josh-sync?ref=refs/tags/v1.2
# flake: true
#
# Or in flake.nix:
# inputs.josh-sync = {
# url = "github:org/josh-sync/v1.0.0";
# url = "git+https://your-gitea.example.com/org/josh-sync?ref=refs/tags/v1.2";
# inputs.nixpkgs.follows = "nixpkgs";
# };
@@ -26,6 +26,8 @@
# josh-sync preflight Validate config and connectivity
# josh-sync import <target> Initial import from subrepo
# josh-sync reset <target> Reset subrepo to josh-filtered view
# josh-sync onboard <target> Interactive import + reset workflow
# josh-sync migrate-pr <target> Migrate PRs from archived repo
# josh-sync status Show target config and sync state
# josh-sync state show <t> [b] Show state JSON
# josh-sync state reset <t> [b] Reset state

@@ -62,7 +62,7 @@ jobs:
done | sort -u | paste -sd ',' -)
echo "targets=${TARGETS}" >> "$GITHUB_OUTPUT"
- uses: https://your-gitea.example.com/org/josh-sync@v1
- uses: https://your-gitea.example.com/org/josh-sync@v1.2
with:
direction: forward
target: ${{ github.event.inputs.target || steps.detect.outputs.targets }}

@@ -10,17 +10,19 @@ josh:
targets:
- name: "billing"
subfolder: "services/billing"
josh_filter: ":/services/billing"
# josh_filter auto-derived as ":/services/billing" if omitted
subrepo_url: "https://gitea.example.com/ext/billing.git"
subrepo_auth: "https"
branches:
main: main
develop: develop
forward_only: []
exclude: # files excluded from subrepo (optional)
- ".monorepo/" # directory at subfolder root
- "**/internal/" # directory at any depth
- name: "auth"
subfolder: "services/auth"
josh_filter: ":/services/auth"
subrepo_url: "git@gitea.example.com:ext/auth.git"
subrepo_auth: "ssh"
# Per-target credential override (reads from $AUTH_SSH_KEY instead of $SUBREPO_SSH_KEY)
@@ -31,7 +33,6 @@ targets:
- name: "shared-lib"
subfolder: "libs/shared"
josh_filter: ":/libs/shared"
subrepo_url: "https://gitea.example.com/ext/shared-lib.git"
branches:
main: main

@@ -40,7 +40,7 @@ jobs:
curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
-o /usr/local/bin/yq && chmod +x /usr/local/bin/yq
- uses: https://your-gitea.example.com/org/josh-sync@v1
- uses: https://your-gitea.example.com/org/josh-sync@v1.2
with:
direction: reverse
target: ${{ github.event.inputs.target || '' }}

@@ -28,6 +28,7 @@
installPhase = ''
mkdir -p $out/{bin,lib}
cp VERSION $out/
cp lib/*.sh $out/lib/
cp bin/josh-sync $out/bin/
chmod +x $out/bin/josh-sync

@@ -39,16 +39,15 @@ subrepo_ls_remote() {
}
# ─── PR Creation ────────────────────────────────────────────────────
# Shared helper for creating PRs on Gitea/GitHub API.
# Shared helpers for creating PRs on Gitea/GitHub API.
# Usage: create_pr <api_url> <token> <base> <head> <title> <body>
# number=$(create_pr_number <api_url> <token> <base> <head> <title> <body>)
#
# create_pr — fire-and-forget (stdout suppressed, safe inside sync functions)
# create_pr_number — returns the new PR number via stdout
create_pr() {
local api_url="$1"
local token="$2"
local base="$3"
local head="$4"
local title="$5"
local body="$6"
create_pr_number() {
local api_url="$1" token="$2" base="$3" head="$4" title="$5" body="$6"
curl -sf -X POST \
-H "Authorization: token ${token}" \
@@ -59,5 +58,36 @@ create_pr() {
--arg title "$title" \
--arg body "$body" \
'{base:$base, head:$head, title:$title, body:$body}')" \
"${api_url}/pulls" >/dev/null
"${api_url}/pulls" | jq -r '.number'
}
create_pr() {
create_pr_number "$@" >/dev/null
}
# ─── PR API Helpers ──────────────────────────────────────────────
# Used by onboard and migrate-pr commands.
# List open PRs on a repo. Returns JSON array.
# Usage: list_open_prs <api_url> <token>
list_open_prs() {
local api_url="$1" token="$2"
curl -sf -H "Authorization: token ${token}" \
"${api_url}/pulls?state=open&limit=50"
}
# Get PR diff as plain text.
# Usage: get_pr_diff <api_url> <token> <pr_number>
get_pr_diff() {
local api_url="$1" token="$2" pr_number="$3"
curl -sf -H "Authorization: token ${token}" \
"${api_url}/pulls/${pr_number}.diff"
}
# Get single PR as JSON (for checking merge status, metadata, etc.).
# Usage: get_pr <api_url> <token> <pr_number>
get_pr() {
local api_url="$1" token="$2" pr_number="$3"
curl -sf -H "Authorization: token ${token}" \
"${api_url}/pulls/${pr_number}"
}

@@ -36,7 +36,10 @@ parse_config() {
export JOSH_SYNC_TARGETS
JOSH_SYNC_TARGETS=$(echo "$config_json" | jq '[.targets[] | . +
# Auto-derive josh_filter from subfolder if not set
(if (.josh_filter // "") == "" then
# When exclude patterns are present, append inline :exclude[::p1,::p2,...] to the filter
(if (.exclude // [] | length) > 0 then
{josh_filter: (":/" + .subfolder + ":exclude[" + (.exclude | map("::" + .) | join(",")) + "]")}
elif (.josh_filter // "") == "" then
{josh_filter: (":/" + .subfolder)}
else {} end) +
# Derive gitea_host and subrepo_repo_path from subrepo_url

lib/onboard.sh Normal file
@@ -0,0 +1,451 @@
#!/usr/bin/env bash
# lib/onboard.sh — Onboard orchestration and PR migration
#
# Provides:
# onboard_flow() — Interactive: import → wait for merge → reset to new repo
# migrate_one_pr() — Migrate a single PR from archived repo to new repo
#
# Onboard state is stored on the josh-sync-state branch at <target>/onboard.json.
# Steps: start → importing → waiting-for-merge → resetting → complete
#
# Requires: lib/core.sh, lib/config.sh, lib/auth.sh, lib/state.sh, lib/sync.sh sourced
# Expects: JOSH_SYNC_TARGET_NAME, BOT_NAME, BOT_EMAIL, SUBREPO_API, SUBREPO_TOKEN, etc.
# ─── Onboard State Helpers ────────────────────────────────────────
# Follow the same pattern as read_state()/write_state() in lib/state.sh.
read_onboard_state() {
local target_name="${1:-$JOSH_SYNC_TARGET_NAME}"
git fetch origin "$STATE_BRANCH" 2>/dev/null || true
git show "origin/${STATE_BRANCH}:${target_name}/onboard.json" 2>/dev/null || echo '{}'
}
write_onboard_state() {
local target_name="${1:-$JOSH_SYNC_TARGET_NAME}"
local state_json="$2"
local key="${target_name}/onboard"
local tmp_dir
tmp_dir=$(mktemp -d)
if git rev-parse "origin/${STATE_BRANCH}" >/dev/null 2>&1; then
git worktree add "$tmp_dir" "origin/${STATE_BRANCH}" 2>/dev/null
else
git worktree add --detach "$tmp_dir" 2>/dev/null
(cd "$tmp_dir" && git checkout --orphan "$STATE_BRANCH" && { git rm -rf . 2>/dev/null || true; })
fi
mkdir -p "$(dirname "${tmp_dir}/${key}.json")"
echo "$state_json" | jq '.' > "${tmp_dir}/${key}.json"
(
cd "$tmp_dir" || exit
git add -A
if ! git diff --cached --quiet 2>/dev/null; then
git -c user.name="$BOT_NAME" -c user.email="$BOT_EMAIL" \
commit -m "onboard: update ${target_name}"
git push origin "HEAD:${STATE_BRANCH}" || log "WARN" "Failed to push onboard state"
fi
)
git worktree remove "$tmp_dir" 2>/dev/null || rm -rf "$tmp_dir"
}
# ─── Derive Archived API URL ─────────────────────────────────────
# Given a URL like "git@host:org/repo-archived.git" or
# "https://host/org/repo-archived.git", derive the Gitea API URL.
_archived_api_from_url() {
local url="$1"
# Strip .git suffix first — avoids non-greedy regex issues in POSIX ERE
url="${url%.git}"
local host repo_path
if echo "$url" | grep -qE '^(ssh://|git@)'; then
# SSH URL
if echo "$url" | grep -q '^ssh://'; then
host=$(echo "$url" | sed -E 's|ssh://[^@]*@([^/]+)/.*|\1|')
repo_path=$(echo "$url" | sed -E 's|ssh://[^@]*@[^/]+/(.+)$|\1|')
else
host=$(echo "$url" | sed -E 's|git@([^:/]+)[:/].*|\1|')
repo_path=$(echo "$url" | sed -E 's|git@[^:/]+[:/](.+)$|\1|')
fi
else
# HTTPS URL
host=$(echo "$url" | sed -E 's|https?://([^/]+)/.*|\1|')
repo_path=$(echo "$url" | sed -E 's|https?://[^/]+/(.+)$|\1|')
fi
echo "https://${host}/api/v1/repos/${repo_path}"
}
# ─── Onboard Flow ────────────────────────────────────────────────
# Interactive orchestrator with checkpoint/resume.
# Usage: onboard_flow <target_json> <restart>
onboard_flow() {
local target_json="$1"
local restart="${2:-false}"
local target_name="$JOSH_SYNC_TARGET_NAME"
# Load existing onboard state (or empty)
local onboard_state
onboard_state=$(read_onboard_state "$target_name")
local current_step
current_step=$(echo "$onboard_state" | jq -r '.step // "start"')
if [ "$restart" = true ]; then
log "INFO" "Restarting onboard from scratch"
current_step="start"
onboard_state='{}'
fi
log "INFO" "Onboard step: ${current_step}"
# ── Step 1: Prerequisites + archived repo info ──
if [ "$current_step" = "start" ]; then
echo "" >&2
echo "=== Onboarding ${target_name} ===" >&2
echo "" >&2
echo "Before proceeding, you should have:" >&2
echo " 1. Renamed the existing subrepo (e.g., storefront → storefront-archived)" >&2
echo " 2. Created a new EMPTY repo at the original URL" >&2
echo "" >&2
# Verify the new (empty) subrepo is reachable (no HEAD ref — works on empty repos)
if git ls-remote "$(subrepo_auth_url)" >/dev/null 2>&1; then
# shellcheck disable=SC2001 # sed is clearer for URL pattern replacement
log "INFO" "New subrepo is reachable at $(echo "$SUBREPO_URL" | sed 's|://[^@]*@|://***@|')"
else
log "WARN" "New subrepo is not reachable — make sure you created the new empty repo"
fi
echo "Enter the archived repo URL (e.g., git@host:org/repo-archived.git):" >&2
local archived_url
read -r archived_url
[ -n "$archived_url" ] || die "Archived URL is required"
# Determine auth type for archived repo (same as current subrepo)
local archived_auth="${SUBREPO_AUTH:-https}"
# Derive API URL
local archived_api
archived_api=$(_archived_api_from_url "$archived_url")
# Verify archived repo is reachable via API
if curl -sf -H "Authorization: token ${SUBREPO_TOKEN}" \
"${archived_api}" >/dev/null 2>&1; then
log "INFO" "Archived repo reachable: ${archived_api}"
else
log "WARN" "Cannot reach archived repo API — check URL and token"
echo "Continue anyway? (y/N):" >&2
local confirm
read -r confirm
[ "$confirm" = "y" ] || [ "$confirm" = "Y" ] || die "Aborted"
fi
# Save state
onboard_state=$(jq -n \
--arg step "importing" \
--arg archived_api "$archived_api" \
--arg archived_url "$archived_url" \
--arg archived_auth "$archived_auth" \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
'{step:$step, archived_api:$archived_api, archived_url:$archived_url,
archived_auth:$archived_auth, import_prs:{}, reset_branches:[],
migrated_prs:[], timestamp:$ts}')
write_onboard_state "$target_name" "$onboard_state"
current_step="importing"
fi
# ── Step 2: Import (reuses initial_import()) ──
if [ "$current_step" = "importing" ]; then
echo "" >&2
log "INFO" "Step 2: Importing subrepo content into monorepo..."
local branches
branches=$(echo "$target_json" | jq -r '.branches | keys[]')
# Load existing import_prs from state (resume support)
local import_prs
import_prs=$(echo "$onboard_state" | jq -r '.import_prs // {}')
# Build the archived repo clone URL for initial_import().
# The content lives in the archived repo — the new repo at SUBREPO_URL is empty.
local archived_url archived_clone_url
archived_url=$(echo "$onboard_state" | jq -r '.archived_url')
if [ "${SUBREPO_AUTH:-https}" = "ssh" ]; then
archived_clone_url="$archived_url"
else
# shellcheck disable=SC2001
archived_clone_url=$(echo "$archived_url" | sed "s|https://|https://${BOT_USER}:${SUBREPO_TOKEN}@|")
fi
for branch in $branches; do
local mapped
mapped=$(echo "$target_json" | jq -r --arg b "$branch" '.branches[$b] // empty')
[ -z "$mapped" ] && continue
# Skip branches that already have an import PR recorded
if echo "$import_prs" | jq -e --arg b "$branch" 'has($b)' >/dev/null 2>&1; then
log "INFO" "Import PR already recorded for ${branch} — skipping"
continue
fi
export SYNC_BRANCH_MONO="$branch"
export SYNC_BRANCH_SUBREPO="$mapped"
log "INFO" "Importing branch: ${branch} (subrepo: ${mapped})"
local result
result=$(initial_import "$archived_clone_url")
log "INFO" "Import result for ${branch}: ${result}"
if [ "$result" = "pr-created" ]; then
# Find the import PR number via API
local prs pr_number
prs=$(list_open_prs "$MONOREPO_API" "$GITEA_TOKEN")
pr_number=$(echo "$prs" | jq -r --arg t "$target_name" --arg b "$branch" \
'[.[] | select(.title | test("\\[Import\\] " + $t + ":")) | select(.base.ref == $b)] | .[0].number // empty')
if [ -n "$pr_number" ]; then
import_prs=$(echo "$import_prs" | jq --arg b "$branch" --arg n "$pr_number" '. + {($b): ($n | tonumber)}')
log "INFO" "Import PR for ${branch}: #${pr_number}"
else
log "WARN" "Could not find import PR number for ${branch} — check monorepo PRs"
fi
fi
# Save progress after each branch (resume support)
onboard_state=$(echo "$onboard_state" | jq --argjson prs "$import_prs" '.import_prs = $prs')
write_onboard_state "$target_name" "$onboard_state"
done
# Update state
onboard_state=$(echo "$onboard_state" | jq \
--arg step "waiting-for-merge" \
--argjson prs "$import_prs" \
'.step = $step | .import_prs = $prs')
write_onboard_state "$target_name" "$onboard_state"
current_step="waiting-for-merge"
fi
# ── Step 3: Wait for merge ──
if [ "$current_step" = "waiting-for-merge" ]; then
echo "" >&2
log "INFO" "Step 3: Waiting for import PR(s) to be merged..."
local import_prs
import_prs=$(echo "$onboard_state" | jq -r '.import_prs')
local pr_count
pr_count=$(echo "$import_prs" | jq 'length')
if [ "$pr_count" -eq 0 ]; then
log "WARN" "No import PRs recorded — skipping merge check"
else
echo "" >&2
echo "Import PRs to merge:" >&2
echo "$import_prs" | jq -r 'to_entries[] | " \(.key): PR #\(.value)"' >&2
echo "" >&2
echo "Merge the import PR(s) on the monorepo, then press Enter..." >&2
read -r
# Verify each PR is merged
local all_merged=true
for branch in $(echo "$import_prs" | jq -r 'keys[]'); do
local pr_number
pr_number=$(echo "$import_prs" | jq -r --arg b "$branch" '.[$b]')
local pr_json merged
pr_json=$(get_pr "$MONOREPO_API" "$GITEA_TOKEN" "$pr_number")
merged=$(echo "$pr_json" | jq -r '.merged // false')
if [ "$merged" = "true" ]; then
log "INFO" "PR #${pr_number} (${branch}): merged"
else
log "ERROR" "PR #${pr_number} (${branch}): NOT merged — merge it first"
all_merged=false
fi
done
if [ "$all_merged" = false ]; then
die "Not all import PRs are merged. Re-run 'josh-sync onboard ${target_name}' after merging."
fi
fi
# Update state
onboard_state=$(echo "$onboard_state" | jq '.step = "resetting"')
write_onboard_state "$target_name" "$onboard_state"
current_step="resetting"
fi
# ── Step 4: Reset (pushes josh-filtered history to new repo) ──
if [ "$current_step" = "resetting" ]; then
echo "" >&2
log "INFO" "Step 4: Pushing josh-filtered history to new subrepo..."
local branches
branches=$(echo "$target_json" | jq -r '.branches | keys[]')
local already_reset
already_reset=$(echo "$onboard_state" | jq -r '.reset_branches // []')
for branch in $branches; do
# Skip branches already reset (resume support)
if echo "$already_reset" | jq -e --arg b "$branch" 'index($b) != null' >/dev/null 2>&1; then
log "INFO" "Branch ${branch} already reset — skipping"
continue
fi
local mapped
mapped=$(echo "$target_json" | jq -r --arg b "$branch" '.branches[$b] // empty')
[ -z "$mapped" ] && continue
export SYNC_BRANCH_MONO="$branch"
export SYNC_BRANCH_SUBREPO="$mapped"
local result
result=$(subrepo_reset)
log "INFO" "Reset result for ${branch}: ${result}"
# Track progress
onboard_state=$(echo "$onboard_state" | jq --arg b "$branch" \
'.reset_branches += [$b]')
write_onboard_state "$target_name" "$onboard_state"
done
# Update state
onboard_state=$(echo "$onboard_state" | jq \
--arg step "complete" \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
'.step = $step | .timestamp = $ts')
write_onboard_state "$target_name" "$onboard_state"
current_step="complete"
fi
# ── Step 5: Done ──
if [ "$current_step" = "complete" ]; then
echo "" >&2
echo "=== Onboarding complete! ===" >&2
echo "" >&2
echo "The new subrepo now has josh-filtered history." >&2
echo "Developers should re-clone or reset their local copies:" >&2
echo " git fetch origin && git reset --hard origin/main" >&2
echo "" >&2
echo "To migrate open PRs from the archived repo:" >&2
echo " josh-sync migrate-pr ${target_name} # interactive picker" >&2
echo " josh-sync migrate-pr ${target_name} --all # migrate all" >&2
echo " josh-sync migrate-pr ${target_name} 5 8 12 # specific PRs" >&2
fi
}
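# Sketch only: the checkpoint/resume shape used by onboard_flow, reduced to
# a plain-text state file. example_flow and the step bodies are hypothetical;
# the real flow stores JSON on the state branch via write_onboard_state.

```shell
# Each step runs only if the saved step name matches, then records the next
# step before falling through, so an interrupted run resumes where it stopped.
example_flow() {
  local state_file="$1" step
  step=$(cat "$state_file" 2>/dev/null) || step="start"
  [ -n "$step" ] || step="start"

  if [ "$step" = "start" ]; then
    echo "ran: start"
    step="importing"
    echo "$step" > "$state_file"   # checkpoint
  fi
  if [ "$step" = "importing" ]; then
    echo "ran: importing"
    step="complete"
    echo "$step" > "$state_file"   # checkpoint
  fi
  if [ "$step" = "complete" ]; then
    echo "ran: complete"
  fi
}
```

# Usage: example_flow /path/to/state runs every step on a fresh file and only
# the remaining steps when the file already names a later step.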
# ─── Migrate One PR ──────────────────────────────────────────────
# Fetches the PR's branch from the archived repo, computes a local diff,
# and applies it to the new subrepo with --3way for resilience.
# Usage: migrate_one_pr <pr_number>
#
# Expects: JOSH_SYNC_TARGET_NAME, SUBREPO_API, SUBREPO_TOKEN, BOT_NAME, BOT_EMAIL loaded
migrate_one_pr() {
local pr_number="$1"
local target_name="$JOSH_SYNC_TARGET_NAME"
# Read archived repo info from onboard state
local onboard_state archived_api
onboard_state=$(read_onboard_state "$target_name")
archived_api=$(echo "$onboard_state" | jq -r '.archived_api')
if [ -z "$archived_api" ] || [ "$archived_api" = "null" ]; then
die "No archived repo info found. Run 'josh-sync onboard ${target_name}' first."
fi
# Check if this PR was already migrated
local already_migrated
already_migrated=$(echo "$onboard_state" | jq -r \
--argjson num "$pr_number" '.migrated_prs // [] | map(select(.old_number == $num)) | length')
if [ "$already_migrated" -gt 0 ]; then
log "INFO" "PR #${pr_number} already migrated — skipping"
return 0
fi
# Same credentials — the repo was just renamed
local archived_token="$SUBREPO_TOKEN"
# 1. Get PR metadata from archived repo
local pr_json title base head body
pr_json=$(get_pr "$archived_api" "$archived_token" "$pr_number") \
|| die "Failed to fetch PR #${pr_number} from archived repo"
title=$(echo "$pr_json" | jq -r '.title')
base=$(echo "$pr_json" | jq -r '.base.ref')
head=$(echo "$pr_json" | jq -r '.head.ref')
body=$(echo "$pr_json" | jq -r '.body // ""')
log "INFO" "Migrating PR #${pr_number}: \"${title}\" (${base} <- ${head})"
# 2. Clone new subrepo, add archived repo as second remote
# Save cwd so we can restore it (function runs in caller's shell, not subshell)
local original_dir
original_dir=$(pwd)
local work_dir
work_dir=$(mktemp -d)
# shellcheck disable=SC2064 # Intentional early expansion
trap "cd '$original_dir' 2>/dev/null; rm -rf '$work_dir'" RETURN
git clone "$(subrepo_auth_url)" --branch "$base" --single-branch \
"${work_dir}/subrepo" 2>&1 || die "Failed to clone new subrepo (branch: ${base})"
cd "${work_dir}/subrepo" || exit
git config user.name "$BOT_NAME"
git config user.email "$BOT_EMAIL"
# Build authenticated URL for the archived repo
local archived_url archived_clone_url
archived_url=$(echo "$onboard_state" | jq -r '.archived_url')
if [ "${SUBREPO_AUTH:-https}" = "ssh" ]; then
archived_clone_url="$archived_url"
else
# shellcheck disable=SC2001
archived_clone_url=$(echo "$archived_url" | sed "s|https://|https://${BOT_USER}:${SUBREPO_TOKEN}@|")
fi
# Fetch the PR's head and base branches from the archived repo
git remote add archived "$archived_clone_url"
git fetch archived "$head" "$base" 2>&1 \
|| die "Failed to fetch branches from archived repo"
# 3. Compute diff locally and apply with --3way
git checkout -B "$head" >&2
local diff
diff=$(git diff "archived/${base}..archived/${head}")
if [ -z "$diff" ]; then
log "WARN" "Empty diff for PR #${pr_number} — skipping"
return 1
fi
if echo "$diff" | git apply --3way 2>&1; then
git add -A
git commit -m "${title}

Migrated from archived repo PR #${pr_number}" >&2
git push "$(subrepo_auth_url)" "$head" >&2 \
|| die "Failed to push branch ${head}"
# 4. Create PR on new repo
local new_number
new_number=$(create_pr_number "$SUBREPO_API" "$SUBREPO_TOKEN" \
"$base" "$head" "$title" "$body")
log "INFO" "Migrated PR #${pr_number} -> #${new_number}: \"${title}\""
# 5. Record in onboard state
cd "$original_dir" || true
onboard_state=$(read_onboard_state "$target_name")
onboard_state=$(echo "$onboard_state" | jq \
--argjson old "$pr_number" \
--argjson new_num "${new_number}" \
--arg title "$title" \
'.migrated_prs += [{"old_number":$old, "new_number":$new_num, "title":$title}]')
write_onboard_state "$target_name" "$onboard_state"
else
log "ERROR" "Could not apply changes for PR #${pr_number} even with 3-way merge"
log "ERROR" "Manual migration needed: branch '${head}' from archived repo"
return 1
fi
}
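# Sketch only: the "diff old base..old head, apply with --3way" step from
# migrate_one_pr, run in a throwaway repository. example_apply_3way and the
# branch names old-base, old-head, and new-base are hypothetical.

```shell
example_apply_3way() {
  local repo rc
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Example"
    git config user.email "example@example.invalid"

    printf 'a\nb\nc\n' > f.txt
    git add f.txt && git commit -q -m "base"
    git branch old-base                 # stands in for archived/<base>
    git checkout -q -b old-head         # stands in for archived/<head>
    printf 'a\nb\nc\nd\n' > f.txt
    git commit -q -a -m "feature"

    git checkout -q -b new-base old-base   # stands in for the new subrepo
    # Apply the PR's changes as a patch; --3way can fall back to a 3-way
    # merge because the blobs named in the diff exist in this repository.
    git diff "old-base..old-head" | git apply --3way || exit 1
    cat f.txt
  )
  rc=$?
  rm -rf "$repo"
  return "$rc"
}
```

# On success the patched file is left staged in the index, which is why the
# migration code can commit right after applying.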


@@ -11,7 +11,7 @@
# ─── Forward Sync: Monorepo → Subrepo ──────────────────────────────
#
# Returns: fresh | skip | clean | lease-rejected | conflict
# Returns: fresh | skip | clean | lease-rejected | conflict | unrelated
forward_sync() {
local mono_branch="$SYNC_BRANCH_MONO"
@@ -97,7 +97,14 @@ ${BOT_TRAILER}: forward/${mono_branch}/$(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
fi
else
# Conflict!
# Check: unrelated histories (filter change) vs normal merge conflict
if ! git merge-base "subrepo/${subrepo_branch}" "$mono_head" >/dev/null 2>&1; then
log "INFO" "No common ancestor — histories are unrelated (filter change?)"
echo "unrelated"
return
fi
# Normal merge conflict
local conflicted
conflicted=$(git diff --name-only --diff-filter=U 2>/dev/null || echo "(unknown)")
git merge --abort
@@ -115,7 +122,14 @@ ${BOT_TRAILER}: forward/${mono_branch}/$(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
local pr_body conflicted_list
# shellcheck disable=SC2001
conflicted_list=$(echo "$conflicted" | sed 's/^/- /')
pr_body="## Sync Conflict\n\nMonorepo \`${mono_branch}\` has changes that conflict with \`${subrepo_branch}\`.\n\n**Conflicted files:**\n${conflicted_list}\n\nPlease resolve and merge this PR to complete the sync."
pr_body="## Sync Conflict

Monorepo \`${mono_branch}\` has changes that conflict with \`${subrepo_branch}\`.

**Conflicted files:**
${conflicted_list}

Please resolve and merge this PR to complete the sync."
create_pr "${SUBREPO_API}" "${SUBREPO_TOKEN}" \
"$subrepo_branch" "$conflict_branch" \
@@ -128,6 +142,87 @@ ${BOT_TRAILER}: forward/${mono_branch}/$(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
fi
}
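# Sketch only: how the "unrelated histories" check in forward_sync behaves.
# git merge-base exits non-zero when two commits share no ancestor.
# example_unrelated_check and the branch name "other" are hypothetical.

```shell
example_unrelated_check() {
  local repo rc
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Example"
    git config user.email "example@example.invalid"

    echo one > a.txt && git add a.txt && git commit -q -m "history A"
    first=$(git rev-parse HEAD)

    git checkout -q --orphan other      # new root commit, no shared ancestor
    git rm -r -q -f .
    echo two > b.txt && git add b.txt && git commit -q -m "history B"

    if git merge-base "$first" HEAD >/dev/null 2>&1; then
      echo "related"
    else
      echo "unrelated"
    fi
  )
  rc=$?
  rm -rf "$repo"
  return "$rc"
}

example_unrelated_check   # prints: unrelated
```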
# ─── Filter Change Reconciliation ─────────────────────────────────
# When the josh filter changes (e.g., exclude patterns added/removed),
# josh-proxy recomputes filtered history with new SHAs. This creates a
# merge commit on the subrepo that connects old and new histories,
# re-establishing shared ancestry without a destructive force-push.
# Returns: skip | reconciled | lease-rejected
reconcile_filter_change() {
local mono_branch="$SYNC_BRANCH_MONO"
local subrepo_branch="$SYNC_BRANCH_SUBREPO"
local work_dir
work_dir=$(mktemp -d)
# shellcheck disable=SC2064 # Intentional early expansion — work_dir is local
trap "rm -rf '$work_dir'" EXIT
log "INFO" "=== Filter change reconciliation: ${mono_branch} ==="
# 1. Clone subrepo
git clone "$(subrepo_auth_url)" \
--branch "$subrepo_branch" --single-branch \
"${work_dir}/subrepo" || die "Failed to clone subrepo"
cd "${work_dir}/subrepo" || exit
git config user.name "$BOT_NAME"
git config user.email "$BOT_EMAIL"
local subrepo_head
subrepo_head=$(git rev-parse HEAD)
log "INFO" "Subrepo HEAD: ${subrepo_head:0:12}"
# 2. Fetch josh-proxy filtered view (new filter)
git remote add josh-filtered "$(josh_auth_url)"
git fetch josh-filtered "$mono_branch" || die "Failed to fetch from josh-proxy"
local josh_head josh_tree
josh_head=$(git rev-parse "josh-filtered/${mono_branch}")
# shellcheck disable=SC1083 # {tree} is git syntax, not shell brace expansion
josh_tree=$(git rev-parse "josh-filtered/${mono_branch}^{tree}")
log "INFO" "Josh-proxy HEAD (new filter): ${josh_head:0:12}"
# 3. Check if trees are already identical (filter change had no effect)
local subrepo_tree
# shellcheck disable=SC1083
subrepo_tree=$(git rev-parse "HEAD^{tree}")
if [ "$josh_tree" = "$subrepo_tree" ]; then
log "INFO" "Trees identical after filter change — no reconciliation needed"
echo "skip"
return
fi
# 4. Create merge commit: josh-proxy HEAD (first parent) + subrepo HEAD, with josh-proxy's tree
# Josh follows first-parent traversal — josh-filtered MUST be first so josh can map
# the history back to the monorepo. Old subrepo history hangs off parent 2.
local merge_commit
merge_commit=$(git commit-tree "$josh_tree" \
-p "$josh_head" \
-p "$subrepo_head" \
-m "Sync: filter configuration updated

${BOT_TRAILER}: filter-change/${mono_branch}/$(date -u +%Y-%m-%dT%H:%M:%SZ)")
git reset --hard "$merge_commit" >&2
log "INFO" "Created reconciliation merge: ${merge_commit:0:12}"
# 5. Record lease and push
local subrepo_sha
subrepo_sha=$(subrepo_ls_remote "$subrepo_branch")
if git push \
--force-with-lease="refs/heads/${subrepo_branch}:${subrepo_sha}" \
"$(subrepo_auth_url)" \
"HEAD:refs/heads/${subrepo_branch}"; then
log "INFO" "Filter change reconciled — shared ancestry re-established"
echo "reconciled"
else
log "WARN" "Force-with-lease rejected — subrepo changed during reconciliation"
echo "lease-rejected"
fi
}
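# Sketch only: the reconciliation merge from reconcile_filter_change, built
# with git commit-tree in a throwaway repository. example_reconcile and the
# branch name "refiltered" are hypothetical; no josh-proxy is involved here.

```shell
example_reconcile() {
  local repo rc
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Example"
    git config user.email "example@example.invalid"

    # Old history (stands in for the current subrepo branch)
    echo old > f.txt && git add f.txt && git commit -q -m "old filtered history"
    old_head=$(git rev-parse HEAD)

    # New, unrelated history (stands in for the re-filtered josh view)
    git checkout -q --orphan refiltered
    git rm -r -q -f .
    echo new > f.txt && git add f.txt && git commit -q -m "new filtered history"
    new_head=$(git rev-parse HEAD)
    new_tree=$(git rev-parse "HEAD^{tree}")

    # Merge commit: new history as first parent, old history as second,
    # carrying the new tree unchanged.
    merge=$(git commit-tree "$new_tree" -p "$new_head" -p "$old_head" \
      -m "Reconcile") || exit 1

    [ "$(git rev-parse "${merge}^1")" = "$new_head" ] || exit 1
    [ "$(git rev-parse "${merge}^{tree}")" = "$new_tree" ] || exit 1
    git merge-base --is-ancestor "$old_head" "$merge" || exit 1
    echo "reconciled"
  )
  rc=$?
  rm -rf "$repo"
  return "$rc"
}
```

# Because the old head is an ancestor of the merge, pushing it is a
# fast-forward for existing clones rather than a history rewrite.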
# ─── Reverse Sync: Subrepo → Monorepo ──────────────────────────────
#
# Always creates a PR on the monorepo — never pushes directly.
@@ -156,9 +251,24 @@ reverse_sync() {
git remote add mono-filtered "$(josh_auth_url)"
git fetch mono-filtered "$mono_branch" || die "Failed to fetch from josh-proxy"
# 3. Find new human commits (excludes bot commits from forward sync)
# 3. Compare trees — skip if subrepo matches josh-filtered view
local subrepo_tree josh_tree
# shellcheck disable=SC1083 # {tree} is git syntax, not shell brace expansion
subrepo_tree=$(git rev-parse "HEAD^{tree}")
# shellcheck disable=SC1083
josh_tree=$(git rev-parse "mono-filtered/${mono_branch}^{tree}")
if [ "$subrepo_tree" = "$josh_tree" ]; then
log "INFO" "Subrepo tree matches josh-filtered view — nothing to sync"
echo "skip"
return
fi
# 4. Find new human commits (excludes bot commits from forward sync)
# Uses --ancestry-path to restrict to the direct lineage and avoid
# leaking old history through reconciliation merge parents.
local human_commits
human_commits=$(git log "mono-filtered/${mono_branch}..HEAD" \
human_commits=$(git log --ancestry-path "mono-filtered/${mono_branch}..HEAD" \
--oneline --invert-grep --grep="^${BOT_TRAILER}:" 2>/dev/null || echo "")
if [ -z "$human_commits" ]; then
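# Sketch only: how --invert-grep with an anchored --grep drops bot commits,
# as in the human-commit detection of reverse_sync. example_human_commits
# and the trailer name Example-Bot-Trailer are hypothetical stand-ins for
# the real BOT_TRAILER value, which is not shown in this diff.

```shell
example_human_commits() {
  local repo rc
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Example"
    git config user.email "example@example.invalid"

    git commit -q --allow-empty -m "human one"
    git commit -q --allow-empty -m "Sync from monorepo

Example-Bot-Trailer: forward/main/2026-01-01T00:00:00Z"
    git commit -q --allow-empty -m "human two"

    # List only commits whose message has no line starting with the trailer.
    git log --format=%s --invert-grep --grep="^Example-Bot-Trailer:"
  )
  rc=$?
  rm -rf "$repo"
  return "$rc"
}
```

# The sync commit is filtered out; the two human commits are listed newest
# first.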
@@ -170,7 +280,7 @@ reverse_sync() {
log "INFO" "New human commits to sync:"
echo "$human_commits" >&2
# 4. Push through josh to a staging branch
# 5. Push through josh to a staging branch
local ts
ts=$(date +%Y%m%d-%H%M%S)
local staging_branch="auto-sync/subrepo-${subrepo_branch}-${ts}"
@@ -178,9 +288,20 @@ reverse_sync() {
if git push -o "base=${mono_branch}" "$(josh_auth_url)" "HEAD:refs/heads/${staging_branch}"; then
log "INFO" "Pushed to staging branch via josh: ${staging_branch}"
# 5. Create PR on monorepo (NEVER direct push)
# 6. Create PR on monorepo (NEVER direct push)
local pr_body
pr_body="## Subrepo changes\n\nNew commits from subrepo \`${subrepo_branch}\`:\n\n\`\`\`\n${human_commits}\n\`\`\`\n\n**Review checklist:**\n- [ ] Changes scoped to synced subfolder\n- [ ] No leaked credentials or environment-specific config\n- [ ] CI passes"
pr_body="## Subrepo changes

New commits from subrepo \`${subrepo_branch}\`:

\`\`\`
${human_commits}
\`\`\`

**Review checklist:**
- [ ] Changes scoped to synced subfolder
- [ ] No leaked credentials or environment-specific config
- [ ] CI passes"
create_pr "${MONOREPO_API}" "${GITEA_TOKEN}" \
"$mono_branch" "$staging_branch" \
@@ -200,9 +321,13 @@ reverse_sync() {
#
# Used when a subrepo already has content and you're adding it to the
# monorepo for the first time. Creates a PR.
# Usage: initial_import [clone_url_override]
# clone_url_override — if set, clone from this URL instead of subrepo_auth_url()
# (used by onboard to clone from the archived repo)
# Returns: skip | pr-created
initial_import() {
local clone_url="${1:-$(subrepo_auth_url)}"
local mono_branch="$SYNC_BRANCH_MONO"
local subrepo_branch="$SYNC_BRANCH_SUBREPO"
local subfolder
@@ -225,8 +350,8 @@ initial_import() {
--branch "$mono_branch" --single-branch \
"${work_dir}/monorepo" || die "Failed to clone monorepo"
# 2. Clone subrepo
git clone "$(subrepo_auth_url)" \
# 2. Clone subrepo (or archived repo when clone_url is overridden)
git clone "$clone_url" \
--branch "$subrepo_branch" --single-branch \
"${work_dir}/subrepo" || die "Failed to clone subrepo"
@@ -264,7 +389,14 @@ ${BOT_TRAILER}: import/${JOSH_SYNC_TARGET_NAME}/${ts}" >&2
# 5. Create PR on monorepo
local pr_body
pr_body="## Initial import\n\nImporting existing subrepo \`${subrepo_branch}\` (${file_count} files) into \`${subfolder}/\`.\n\n**Review checklist:**\n- [ ] Content looks correct\n- [ ] No leaked credentials or environment-specific config\n- [ ] CI passes"
pr_body="## Initial import

Importing existing subrepo \`${subrepo_branch}\` (${file_count} files) into \`${subfolder}/\`.

**Review checklist:**
- [ ] Content looks correct
- [ ] No leaked credentials or environment-specific config
- [ ] CI passes"
create_pr "${MONOREPO_API}" "${GITEA_TOKEN}" \
"$mono_branch" "$staging_branch" \


@@ -70,6 +70,12 @@
"items": { "type": "string" },
"default": [],
"description": "Branches that only sync mono → subrepo (never reverse)"
},
"exclude": {
"type": "array",
"items": { "type": "string" },
"default": [],
"description": "File/directory patterns to exclude from sync via josh :exclude filter. Josh pattern syntax: 'dir/' for directories, '*.ext' for globs, '**/dir/' for nested matches. Patterns are embedded inline in the josh-proxy URL."
}
}
}