# Setup Guide

Step-by-step guide to setting up josh-sync for a new monorepo with existing subrepos.

## Overview

josh-sync provides bidirectional sync between a monorepo and N external subrepos via [josh-proxy](https://josh-project.github.io/josh/):

```
MONOREPO                                      SUBREPOS
├── services/billing/   ──── forward ────►    billing-repo/
├── services/auth/       (push or cron)       auth-repo/
└── libs/shared/        ◄──── reverse ─────   shared-lib-repo/
                       (cron → always PR)
              via josh-proxy (filtered git views)
```

**Key safety properties:**

- Forward sync (mono → subrepo) uses `--force-with-lease`, so it never overwrites concurrent changes
- Reverse sync (subrepo → mono) always creates a PR and never pushes directly
- Git trailers (`Josh-Sync-Origin:`) prevent infinite sync loops
- State is tracked on an orphan branch (`josh-sync-state`), so it survives CI runner teardown

## Prerequisites

Before you begin, you need:

### josh-proxy instance

A running [josh-proxy](https://josh-project.github.io/josh/) that can access your monorepo's Git server. Verify connectivity:

```bash
git ls-remote https://josh.example.com/org/monorepo.git HEAD
```

### Bot account

A dedicated Git user (e.g., `josh-sync-bot`) with:

- Write access to the monorepo
- Write access to all subrepos
- Ability to create PRs on both monorepo and subrepo platforms

### Credentials

| Variable | Purpose | Required |
|----------|---------|----------|
| `SYNC_BOT_USER` | Bot's Git username | Yes |
| `SYNC_BOT_TOKEN` | API token with repo scope (monorepo + josh-proxy auth) | Yes |
| `SUBREPO_SSH_KEY` | SSH private key for subrepo access (if using SSH auth) | If SSH |
| `SUBREPO_TOKEN` | HTTPS token for subrepo access (defaults to `SYNC_BOT_TOKEN`) | No |

Per-target credential overrides are supported; see [Configuration Reference](config-reference.md).

### Tool dependencies

`bash >=4`, `git`, `curl`, `jq`, `yq` ([mikefarah/yq](https://github.com/mikefarah/yq) v4+), `openssh`, `rsync`

> The Nix flake bundles all dependencies automatically.
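
For a manual install, the dependency list above can be checked before going further. The following is a minimal sketch, not part of josh-sync: the function names `missing_tools` and `bash_is_recent` are hypothetical, and it only tests that each command is on `PATH` (it does not verify the yq flavor or version).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; josh-sync does not ship these.

# Print the name of every listed command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only when running under bash 4 or newer.
# BASH_VERSINFO (element 0) holds the major version; it is unset outside bash.
bash_is_recent() {
  [ "${BASH_VERSINFO:-0}" -ge 4 ]
}

# Usage (the openssh package provides the `ssh` binary):
#   missing_tools git curl jq yq ssh rsync
#   bash_is_recent || echo "bash >= 4 is required"
```

An empty result from `missing_tools` means every listed command was found.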

## Step 1: Create the Monorepo

Create a new repository on your Git server (e.g., `org/monorepo`).

Create subdirectories for each subrepo you want to sync:

```bash
mkdir -p services/billing services/auth libs/shared
```

These directories will be populated during the import step. They can be empty for now; because Git does not track empty directories, add a `.gitkeep` file to each one if you want them committed before the import.

Verify josh-proxy can see the monorepo:

```bash
git ls-remote https://josh.example.com/org/monorepo.git HEAD
```

## Step 2: Configure `.josh-sync.yml`

Create `.josh-sync.yml` at the monorepo root. Each target maps a monorepo subfolder to an external subrepo:

```yaml
josh:
  proxy_url: "https://josh.example.com"   # josh-proxy URL (no trailing slash)
  monorepo_path: "org/monorepo"           # repo path as josh sees it

targets:
  - name: "billing"                       # unique identifier
    subfolder: "services/billing"         # monorepo subfolder
    # josh_filter auto-derived as ":/services/billing" if omitted
    subrepo_url: "git@gitea.example.com:ext/billing.git"
    subrepo_auth: "ssh"                   # "https" (default) or "ssh"
    branches:
      main: main                          # mono_branch: subrepo_branch
    forward_only: []

  - name: "auth"
    subfolder: "services/auth"
    subrepo_url: "https://gitea.example.com/ext/auth.git"
    subrepo_auth: "https"
    subrepo_token_var: "AUTH_REPO_TOKEN"  # per-target credential override
    branches:
      main: main
      develop: develop                    # multiple branches supported
    forward_only: []

  - name: "shared-lib"
    subfolder: "libs/shared"
    subrepo_url: "https://gitea.example.com/ext/shared-lib.git"
    branches:
      main: main
    forward_only: [main]                  # one-way: mono → subrepo only

bot:
  name: "josh-sync-bot"
  email: "josh-sync-bot@example.com"
  trailer: "Josh-Sync-Origin"             # git trailer for loop prevention
```

For the full field reference, see [Configuration Reference](config-reference.md).

## Step 3: Set Up Local Dev Environment

### Option A: Nix + devenv (recommended)

**`devenv.yaml`**: declare josh-sync as a flake input:

```yaml
inputs:
  nixpkgs:
    url: github:cachix/devenv-nixpkgs/rolling
  josh-sync:
    url: git+https://your-gitea.example.com/org/josh-sync?ref=main
    flake: true
```

**`devenv.nix`**: import the josh-sync module:

```nix
{ inputs, ... }:

{
  imports = [ inputs.josh-sync.devenvModules.default ];

  name = "my-monorepo";

  # .env contains secrets, not devenv config
  dotenv.disableHint = true;
}
```

**`.envrc`**: activate devenv automatically:

```bash
DEVENV_WARN_TIMEOUT=20
use devenv
```

**`.env`**: local credentials (add to `.gitignore`):

```bash
SYNC_BOT_USER=sync-bot
SYNC_BOT_TOKEN=
SUBREPO_SSH_KEY="-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----"
# Per-target overrides:
# AUTH_REPO_TOKEN=
```

### Updating josh-sync in devenv

To update to the latest version:

```bash
devenv update josh-sync
```

Or with plain Nix flakes:

```bash
nix flake lock --update-input josh-sync
```

To pin to a specific version, use a tag ref in `devenv.yaml`:

```yaml
josh-sync:
  url: git+https://your-gitea.example.com/org/josh-sync?ref=v1.1
  flake: true
```

After updating, verify the version:

```bash
josh-sync --version
```

### Option B: Manual installation

Install the required tools, then either:

- Clone the josh-sync repo and add `bin/` to your `PATH`
- Run `make build` to create a single bundled script at `dist/josh-sync`

## Step 4: Validate with Preflight

```bash
josh-sync preflight
```

This validates:

- Config syntax and required fields
- josh-proxy connectivity (via `git ls-remote` through josh)
- Subrepo connectivity and authentication
- Branch mappings
- CI workflow path coverage (checks if `.gitea/workflows/josh-sync-forward.yml` paths match target subfolders)

For a new monorepo before import, preflight may warn that subfolders don't exist yet; that is expected.

## Step 5: Import Existing Subrepos

This is the critical onboarding step.
There are two approaches:

- **`josh-sync onboard`** (recommended): interactive, resumable, preserves open PRs
- **Manual `import` → merge → `reset`**: lower-level, for automation or when there are no open PRs to preserve

### Option A: Onboard (recommended)

The `onboard` command walks you through the entire process interactively, with checkpoint/resume at every step.

**Before you start:**

1. **Rename** the existing subrepo on your Git server (e.g., `stores/storefront` → `stores/storefront-archived`)
2. **Create a new empty repo** at the original path (e.g., a new `stores/storefront` with no commits)

The rename preserves the archived repo with all its history and open PRs. The new empty repo will receive josh-filtered history.

**Run onboard:**

```bash
josh-sync onboard billing
```

The command will:

1. **Verify prerequisites**: checks the new empty repo is reachable, asks for the archived repo URL
2. **Import**: copies subrepo content into monorepo and creates import PRs (one per branch)
3. **Wait for merge**: shows PR numbers and waits for you to merge them
4. **Reset**: pushes josh-filtered history to the new subrepo (per-branch, with resume)
5. **Done**: prints instructions for developers and PR migration

If the process is interrupted at any point, re-run `josh-sync onboard billing` to resume from where it left off. Use `--restart` to start over.

**Migrate open PRs:**

After onboard completes, migrate PRs from the archived repo to the new one:

```bash
# Interactive: lists open PRs and lets you pick
josh-sync migrate-pr billing

# Migrate all open PRs at once
josh-sync migrate-pr billing --all

# Migrate specific PRs by number
josh-sync migrate-pr billing 5 8 12
```

PR migration works by fetching the diff from the archived repo's PR, applying it to the new repo, and creating a new PR. File content is identical after reset, so patches apply cleanly.
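
The reason a patch survives the history rewrite is that a diff depends on file content, not on commit IDs. The sketch below illustrates that idea with plain Git; it is not necessarily how `migrate-pr` is implemented, and the function name `port_branch_diff` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not the migrate-pr implementation.

# Carry the changes of one branch over to a clone with unrelated history.
# Arguments: source clone, base ref, head ref, destination clone.
port_branch_diff() {
  local src="$1" base="$2" head="$3" dst="$4"
  # "base...head" diffs from the merge base, which matches what a PR shows.
  # The patch is applied to the destination's index and working tree.
  git -C "$src" diff --binary "${base}...${head}" \
    | git -C "$dst" apply --index
}

# Usage (paths and branch names are placeholders):
#   port_branch_diff /path/to/archived-clone main feature /path/to/new-clone
#   git -C /path/to/new-clone commit -m "Migrated change"
```

The destination clone should have a clean working tree on the matching branch, since `git apply --index` refuses to run when the index and working tree disagree.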

### Option B: Manual import → merge → reset

Use this when the subrepo has no open PRs to preserve, or for scripted automation.

> Do this **one target at a time** to keep PRs reviewable.

#### 5b-1. Import

```bash
josh-sync import billing
```

This:

1. Clones the monorepo directly (not through josh)
2. Clones the subrepo
3. Copies subrepo content into the monorepo subfolder via `rsync`
4. Creates a branch `auto-sync/import-billing-`
5. Pushes it and creates a PR on the monorepo

Review the import PR: check for leaked credentials, environment-specific config, or files that shouldn't be in the monorepo.

#### 5b-2. Merge the import PR

Merge the PR using your Git platform's UI. This lands the subrepo content into the monorepo's main branch.

> At this point, the monorepo has the content but the histories are disconnected. Sync will **not** work until you complete the reset step.

#### 5b-3. Reset

```bash
josh-sync reset billing
```

> **You do NOT need to `git pull` locally before running reset.** The reset command clones fresh from josh-proxy; it never uses your local working copy.

This:

1. Clones the monorepo through josh-proxy with the josh filter (the "filtered view")
2. Force-pushes that filtered view to the subrepo, replacing its history

This establishes **shared commit ancestry** between josh's filtered view and the subrepo. Without this, josh-proxy can't compute diffs between the two.

> **Warning:** This is a destructive force-push that replaces the subrepo's history. Back up any important branches or tags in the subrepo beforehand. Merge or close all open pull requests on the subrepo first, because they will be invalidated.

After reset, **every developer with a local clone of the subrepo** must update their local copy to match the new history:

```bash
cd /path/to/local-subrepo
git fetch origin
git checkout main && git reset --hard origin/main
git checkout stage && git reset --hard origin/stage
# repeat for each branch
```

Or simply delete and re-clone the subrepo.
Local-only branches (not pushed to the remote) will be lost either way.

#### 5b-4. Repeat for each target

```
For each target:
  1. josh-sync import
  2. Review and merge the import PR on the monorepo
  3. josh-sync reset
```

### Verify

After all targets are imported and reset (whichever option you used):

```bash
# Check all targets show state
josh-sync status

# Test forward sync: should return "skip" (trees are identical after reset)
josh-sync sync --forward --target billing

# Test reverse sync: should return "skip" (no new human commits)
josh-sync sync --reverse --target billing
```

## Step 6: Set Up CI Workflows

### Forward sync (mono → subrepo)

Create `.gitea/workflows/josh-sync-forward.yml`:

```yaml
name: "Josh Sync → Subrepo"

on:
  push:
    branches: [main]
    paths:
      # List ALL target subfolders:
      - "services/billing/**"
      - "services/auth/**"
      - "libs/shared/**"
  schedule:
    - cron: "0 */6 * * *"  # every 6 hours as fallback
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = detect from push or all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = triggered branch or all)"
        required: false
        default: ""

concurrency:
  group: josh-sync-fwd-${{ github.ref_name }}
  cancel-in-progress: false

jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # needed for target detection

      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq

      - name: Detect changed target
        if: github.event_name == 'push'
        id: detect
        run: |
          CHANGED=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || echo "")
          TARGETS=$(yq -o json '.targets' .josh-sync.yml \
            | jq -r '.[] | "\(.name):\(.subfolder)"' \
            | while IFS=: read -r name prefix; do
                echo "$CHANGED" | grep -q "^${prefix}/" && echo "$name"
              done | sort -u | paste -sd ',' -)
          echo "targets=${TARGETS}" >> "$GITHUB_OUTPUT"

      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: forward
          target: ${{ github.event.inputs.target || steps.detect.outputs.targets }}
          branch: ${{ github.event.inputs.branch || github.ref_name }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}
```

### Reverse sync (subrepo → mono)

Create `.gitea/workflows/josh-sync-reverse.yml`:

```yaml
name: "Josh Sync ← Subrepo"

on:
  schedule:
    - cron: "0 1,7,13,19 * * *"  # every 6h, offset from forward
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = all eligible)"
        required: false
        default: ""

concurrency:
  group: josh-sync-rev-${{ github.event.inputs.target || 'all' }}
  cancel-in-progress: false

jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq

      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: reverse
          target: ${{ github.event.inputs.target || '' }}
          branch: ${{ github.event.inputs.branch || '' }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}
```

### Required CI secrets

| Secret | Purpose |
|--------|---------|
| `SYNC_BOT_USER` | Bot username |
| `SYNC_BOT_TOKEN` | Bot API token (monorepo access + josh-proxy auth) |
| `SUBREPO_SSH_KEY` | SSH private key for subrepo push (if using SSH auth) |
| `SUBREPO_TOKEN` | Optional separate subrepo token (defaults to `SYNC_BOT_TOKEN`) |

> **GitHub Actions note:** These examples target Gitea Actions. For GitHub Actions, change the `uses:` reference to a GitHub repo (e.g., `org/josh-sync@v1`) and `runs-on:` to a GitHub runner (e.g., `ubuntu-latest`).

## How Ongoing Sync Works

Once set up, sync runs automatically:

### Forward sync (mono → subrepo)

Triggered by pushes to target subfolders or on a cron schedule:

1. Clones the monorepo through josh-proxy (filtered view of the subfolder)
2. Fetches the subrepo branch for comparison
3. If trees are identical → skip
4. If subrepo branch doesn't exist → fresh push
5. Merges mono changes on top of subrepo state
6. If clean merge → pushes with `--force-with-lease` (protects against concurrent changes)
7. If lease rejected → retries on next run (subrepo changed during sync)
8. If merge conflict → creates a conflict PR on the subrepo

### Reverse sync (subrepo → mono)

Runs on a cron schedule (never triggered by subrepo pushes):

1. Clones the subrepo
2. Fetches the monorepo's josh-filtered view for comparison
3. Finds new human commits (filters out bot commits by checking for the `Josh-Sync-Origin:` trailer)
4. If no new human commits → skip
5. Pushes through josh-proxy to a staging branch
6. Creates a PR on the monorepo; it **never pushes directly**

### Loop prevention

Bot commits include a git trailer like `Josh-Sync-Origin: forward/main/2024-02-12T10:30:00Z`. Each sync direction filters out commits with this trailer, preventing changes from bouncing back and forth. The CI action also has a loop guard that skips entirely if the HEAD commit has the trailer.

### State tracking

Sync state is stored as JSON files on an orphan branch (`josh-sync-state`), one file per target/branch. This tracks the last-synced commit SHAs and timestamps to avoid re-syncing the same changes.

## Adding a New Target

To add a new subrepo after initial setup:

1. Add the target to `.josh-sync.yml`
2. Update the forward workflow's `paths:` list to include the new subfolder
3. Commit and push
4. Import the target:

   ```bash
   # Recommended: interactive onboard (preserves open PRs)
   josh-sync onboard new-target

   # Or manual: import → merge PR → reset
   josh-sync import new-target
   # merge the PR
   josh-sync reset new-target
   ```

5. Verify with `josh-sync status`

## Troubleshooting

### "Failed to clone through josh-proxy"

- Check josh-proxy is running and accessible
- Verify `monorepo_path` matches what josh-proxy expects
- Test manually: `git ls-remote https://:@josh.example.com/org/repo.git:/services/app.git`

### SSH authentication failures

- `SUBREPO_SSH_KEY` must contain the actual key content, not a file path
- For per-target keys, ensure `subrepo_ssh_key_var` in config matches the env var name
- Check the key has write access to the subrepo

### "Force-with-lease rejected"

Normal: the subrepo changed while sync was running. The next sync run will pick it up. If persistent, check for another process pushing to the subrepo simultaneously.

### "Josh rejected push" (reverse sync)

Josh-proxy couldn't map the push back to the monorepo. Check the josh-proxy logs and verify the josh filter is correct. This may indicate a history divergence; consider running `josh-sync reset `.

### Import PR shows "No changes"

The subfolder already contains the same content as the subrepo. This is fine; the import is a no-op.

### Duplicate/looping commits

Verify `bot.trailer` in config matches what's in commit messages. Check the loop guard in the CI workflow is active.

### "cannot lock ref" or "expected X but got Y"

**After reset (subrepo):** The subrepo's history was replaced by force-push. Local clones still have the old history:

```bash
cd /path/to/subrepo
git fetch origin
git checkout main && git reset --hard origin/main
```

Or simply delete and re-clone.

**After import/reset cycle (monorepo):** The import and reset steps create and update branches rapidly (`auto-sync/import-*`, `josh-sync-state`).
If your local clone fetched partway through, tracking refs go stale:

```bash
git remote prune origin && git pull
```

### State issues

```bash
# View current state
josh-sync state show [branch]

# Reset state (forces next sync to run regardless of SHA comparison)
josh-sync state reset [branch]
```
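
Because state lives on an ordinary Git branch, it can also be inspected with plain Git when the CLI is not at hand. This is a rough sketch using standard `git ls-tree` and `git show`; the function names are hypothetical, and the file layout on `josh-sync-state` is not specified in this guide, so the sketch lists whatever files exist rather than assuming paths.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; josh-sync does not ship these.

# List the files stored on a branch without checking it out.
# Arguments: path to a clone, ref name.
list_state_files() {
  local repo="$1" ref="$2"
  git -C "$repo" ls-tree -r --name-only "$ref"
}

# Print one file from that branch.
# Arguments: path to a clone, ref name, file path within the branch.
show_state_file() {
  local repo="$1" ref="$2" path="$3"
  git -C "$repo" show "${ref}:${path}"
}

# Usage, from inside a monorepo clone:
#   git fetch origin josh-sync-state
#   list_state_files . origin/josh-sync-state
#   show_state_file . origin/josh-sync-state "<one of the listed paths>"
```

Reading the branch this way is read-only; use `josh-sync state reset` rather than editing the files by hand.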