Slim B 22bd59a9d7 Auto-reconcile subrepo history when josh filter changes
When the exclude list changes, josh-proxy recomputes filtered history
with new SHAs, breaking common ancestry with the subrepo. Instead of
requiring a manual reset (force-push), forward sync now detects the
filter change and creates a reconciliation merge commit that connects
the old and new histories — no force-push, no re-clone needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:40:08 +03:00

Setup Guide

Step-by-step guide to setting up josh-sync for a new monorepo with existing subrepos.

Overview

josh-sync provides bidirectional sync between a monorepo and N external subrepos via josh-proxy:

MONOREPO                                     SUBREPOS
├── services/billing/  ──── forward ────►    billing-repo/
├── services/auth/     (push or cron)        auth-repo/
└── libs/shared/       ◄──── reverse ─────   shared-lib-repo/
                       (cron → always PR)
                  via josh-proxy (filtered git views)

Key safety properties:

  • Forward sync (mono → subrepo) uses --force-with-lease — never overwrites concurrent changes
  • Reverse sync (subrepo → mono) always creates a PR — never pushes directly
  • Git trailers (Josh-Sync-Origin:) prevent infinite sync loops
  • State tracked on an orphan branch (josh-sync-state) — survives CI runner teardown

Prerequisites

Before you begin, you need:

josh-proxy instance

A running josh-proxy that can access your monorepo's Git server. Verify connectivity:

git ls-remote https://josh.example.com/org/monorepo.git HEAD

Bot account

A dedicated Git user (e.g., josh-sync-bot) with:

  • Write access to the monorepo
  • Write access to all subrepos
  • Ability to create PRs on both monorepo and subrepo platforms

Credentials

| Variable | Purpose | Required |
| --- | --- | --- |
| SYNC_BOT_USER | Bot's Git username | Yes |
| SYNC_BOT_TOKEN | API token with repo scope (monorepo + josh-proxy auth) | Yes |
| SUBREPO_SSH_KEY | SSH private key for subrepo access (if using SSH auth) | If SSH |
| SUBREPO_TOKEN | HTTPS token for subrepo access (defaults to SYNC_BOT_TOKEN) | No |

Per-target credential overrides are supported — see Configuration Reference.
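
The fallback rules in the table can be sketched as a small shell function. This is only an illustration, not josh-sync's actual code: the function name resolve_subrepo_token is hypothetical, it assumes the credentials are exported environment variables, and it assumes that an unset per-target variable falls back to the shared defaults.

```shell
# Hypothetical helper: pick the HTTPS token to use for a subrepo.
# Usage: resolve_subrepo_token [PER_TARGET_VAR_NAME]
resolve_subrepo_token() {
  local token=""
  # 1. Per-target override: the variable named by subrepo_token_var, if any.
  if [ -n "${1:-}" ]; then
    token=$(printenv "$1") || token=""
  fi
  # 2. Shared subrepo token, then 3. the bot token (assumed fallback order).
  if [ -z "$token" ]; then
    token=${SUBREPO_TOKEN:-${SYNC_BOT_TOKEN:-}}
  fi
  printf '%s\n' "$token"
}
```

For the auth target in the configuration example, the call would be resolve_subrepo_token AUTH_REPO_TOKEN.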

Tool dependencies

bash >=4, git, curl, jq, yq (mikefarah/yq v4+), openssh, rsync

The Nix flake bundles all dependencies automatically.
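
For a manual installation, a quick check along these lines can confirm the tools are on PATH. It is a minimal sketch: missing_tools is a hypothetical helper, and it assumes the openssh dependency is satisfied by the ssh client binary. It does not check versions (bash >=4, yq v4+).

```shell
# Print the name of every listed command that is not found on PATH.
# Usage: missing_tools cmd...
# Example: missing_tools bash git curl jq yq ssh rsync
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Empty output means every listed command was found.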

Step 1: Create the Monorepo

Create a new repository on your Git server (e.g., org/monorepo). Create subdirectories for each subrepo you want to sync:

mkdir -p services/billing services/auth libs/shared

These directories will be populated during the import step. They can be empty or contain .gitkeep files for now.

Verify josh-proxy can see the monorepo:

git ls-remote https://josh.example.com/org/monorepo.git HEAD

Step 2: Configure .josh-sync.yml

Create .josh-sync.yml at the monorepo root. Each target maps a monorepo subfolder to an external subrepo:

josh:
  proxy_url: "https://josh.example.com"     # josh-proxy URL (no trailing slash)
  monorepo_path: "org/monorepo"              # repo path as josh sees it

targets:
  - name: "billing"                          # unique identifier
    subfolder: "services/billing"            # monorepo subfolder
    # josh_filter auto-derived as ":/services/billing" if omitted
    subrepo_url: "git@gitea.example.com:ext/billing.git"
    subrepo_auth: "ssh"                      # "https" (default) or "ssh"
    branches:
      main: main                             # mono_branch: subrepo_branch
    forward_only: []
    exclude:                                 # files excluded from subrepo (optional)
      - ".monorepo/"                         # monorepo-only config dir
      - "**/internal/"                       # internal dirs at any depth

  - name: "auth"
    subfolder: "services/auth"
    subrepo_url: "https://gitea.example.com/ext/auth.git"
    subrepo_auth: "https"
    subrepo_token_var: "AUTH_REPO_TOKEN"      # per-target credential override
    branches:
      main: main
      develop: develop                        # multiple branches supported
    forward_only: []

  - name: "shared-lib"
    subfolder: "libs/shared"
    subrepo_url: "https://gitea.example.com/ext/shared-lib.git"
    branches:
      main: main
    forward_only: [main]                      # one-way: mono → subrepo only

bot:
  name: "josh-sync-bot"
  email: "josh-sync-bot@example.com"
  trailer: "Josh-Sync-Origin"                # git trailer for loop prevention

For the full field reference, see Configuration Reference.

Step 3: Set Up Local Dev Environment

Option A: devenv (Nix)

devenv.yaml — declare josh-sync as a flake input:

inputs:
  nixpkgs:
    url: github:cachix/devenv-nixpkgs/rolling
  josh-sync:
    url: git+https://your-gitea.example.com/org/josh-sync?ref=main
    flake: true

devenv.nix — import the josh-sync module:

{ inputs, ... }:
{
  imports = [ inputs.josh-sync.devenvModules.default ];

  name = "my-monorepo";

  # .env contains secrets, not devenv config
  dotenv.disableHint = true;
}

.envrc — activate devenv automatically:

DEVENV_WARN_TIMEOUT=20
use devenv

.env — local credentials (add to .gitignore):

SYNC_BOT_USER=josh-sync-bot
SYNC_BOT_TOKEN=<your-api-token>
SUBREPO_SSH_KEY="-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----"
# Per-target overrides:
# AUTH_REPO_TOKEN=<auth-specific-token>

Updating josh-sync in devenv

To update to the latest version:

devenv update josh-sync

Or with plain Nix flakes (on Nix 2.19 and later, the equivalent is nix flake update josh-sync):

nix flake lock --update-input josh-sync

To pin to a specific version, use a tag ref in devenv.yaml:

josh-sync:
  url: git+https://your-gitea.example.com/org/josh-sync?ref=v1.1
  flake: true

After updating, verify the version:

josh-sync --version

Option B: Manual installation

Install the required tools, then either:

  • Clone the josh-sync repo and add bin/ to your PATH
  • Run make build to create a single bundled script at dist/josh-sync

Step 4: Validate with Preflight

josh-sync preflight

This validates:

  • Config syntax and required fields
  • josh-proxy connectivity (via git ls-remote through josh)
  • Subrepo connectivity and authentication
  • Branch mappings
  • CI workflow path coverage (checks if .gitea/workflows/josh-sync-forward.yml paths match target subfolders)

For a new monorepo before import, preflight may warn that subfolders don't exist yet — that's expected.
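
The CI path coverage check can be approximated by hand with a plain text search. The sketch below is not how preflight is implemented: uncovered_subfolders is a hypothetical helper that only looks for a literal "<subfolder>/**" string in the workflow file, so it does not parse the YAML.

```shell
# Print each subfolder that has no "<subfolder>/**" entry in the workflow file.
# Usage: uncovered_subfolders WORKFLOW_FILE subfolder...
uncovered_subfolders() {
  local workflow="$1" subfolder
  shift
  for subfolder in "$@"; do
    # -F: match the path glob as a fixed string, not as a regex
    grep -qF -- "${subfolder}/**" "$workflow" || printf '%s\n' "$subfolder"
  done
}
```

For example, uncovered_subfolders .gitea/workflows/josh-sync-forward.yml services/billing services/auth libs/shared prints nothing when all three subfolders are listed.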

Step 5: Import Existing Subrepos

This is the critical onboarding step. There are two approaches:

  • josh-sync onboard (recommended) — interactive, resumable, preserves open PRs
  • Manual import → merge → reset — lower-level, for automation or when there are no open PRs to preserve

Option A: josh-sync onboard (recommended)

The onboard command walks you through the entire process interactively, with checkpoint/resume at every step.

Before you start:

  1. Rename the existing subrepo on your Git server (e.g., stores/storefront → stores/storefront-archived)
  2. Create a new empty repo at the original path (e.g., a new stores/storefront with no commits)

The rename preserves the archived repo with all its history and open PRs. The new empty repo will receive josh-filtered history.

Run onboard:

josh-sync onboard billing

The command will:

  1. Verify prerequisites — checks the new empty repo is reachable, asks for the archived repo URL
  2. Import — copies subrepo content into monorepo and creates import PRs (one per branch)
  3. Wait for merge — shows PR numbers and waits for you to merge them
  4. Reset — pushes josh-filtered history to the new subrepo (per-branch, with resume)
  5. Done — prints instructions for developers and PR migration

If the process is interrupted at any point, re-run josh-sync onboard billing to resume from where it left off. Use --restart to start over.

Migrate open PRs:

After onboard completes, migrate PRs from the archived repo to the new one:

# Interactive — lists open PRs and lets you pick
josh-sync migrate-pr billing

# Migrate all open PRs at once
josh-sync migrate-pr billing --all

# Migrate specific PRs by number
josh-sync migrate-pr billing 5 8 12

PR migration works by fetching the diff from the archived repo's PR, applying it to the new repo, and creating a new PR. File content is identical after reset, so patches apply cleanly.

Option B: Manual import → merge → reset

Use this when the subrepo has no open PRs to preserve, or for scripted automation.

Do this one target at a time to keep PRs reviewable.

5b-1. Import

josh-sync import billing

This:

  1. Clones the monorepo directly (not through josh)
  2. Clones the subrepo
  3. Copies subrepo content into the monorepo subfolder via rsync
  4. Creates a branch auto-sync/import-billing-<timestamp>
  5. Pushes it and creates a PR on the monorepo
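
Step 3 of the list above, the rsync copy, might look roughly like this. The exact flags josh-sync uses are not documented here, so --delete and the .git exclusion are assumptions, and copy_subrepo_into_subfolder is a hypothetical name.

```shell
# Mirror a subrepo checkout into a monorepo subfolder, leaving out its .git dir.
# Usage: copy_subrepo_into_subfolder SUBREPO_DIR TARGET_DIR
copy_subrepo_into_subfolder() {
  local subrepo_dir="$1" target_dir="$2"
  mkdir -p "$target_dir"
  # Trailing slashes copy the directory contents rather than the directory itself.
  # --delete removes files in the target that are absent from the subrepo (assumed).
  rsync -a --delete --exclude='.git/' "$subrepo_dir"/ "$target_dir"/
}
```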

Review the import PR — check for leaked credentials, environment-specific config, or files that shouldn't be in the monorepo.

5b-2. Merge the import PR

Merge the PR using your Git platform's UI. This lands the subrepo content into the monorepo's main branch.

At this point, the monorepo has the content but the histories are disconnected. Sync will not work until you complete the reset step.

5b-3. Reset

josh-sync reset billing

You do NOT need to git pull locally before running reset. The reset command clones fresh from josh-proxy — it never uses your local working copy.

This:

  1. Clones the monorepo through josh-proxy with the josh filter (the "filtered view")
  2. Force-pushes that filtered view to the subrepo, replacing its history

This establishes shared commit ancestry between josh's filtered view and the subrepo. Without this, josh-proxy can't compute diffs between the two.

Warning: This is a destructive force-push that replaces the subrepo's history. Back up any important branches or tags in the subrepo beforehand. Merge or close all open pull requests on the subrepo first — they will be invalidated.

After reset, every developer with a local clone of the subrepo must update their local copy to match the new history:

cd /path/to/local-subrepo
git fetch origin
git checkout main && git reset --hard origin/main
git checkout stage && git reset --hard origin/stage   # repeat for each branch

Or simply delete and re-clone the subrepo. Local-only branches (not pushed to the remote) will be lost either way.

5b-4. Repeat for each target

For each target:
  1. josh-sync import <target>
  2. Review and merge the import PR on the monorepo
  3. josh-sync reset <target>

Verify

After all targets are imported and reset (whichever option you used):

# Check all targets show state
josh-sync status

# Test forward sync — should return "skip" (trees are identical after reset)
josh-sync sync --forward --target billing

# Test reverse sync — should return "skip" (no new human commits)
josh-sync sync --reverse --target billing

Step 6: Set Up CI Workflows

Forward sync (mono → subrepo)

Create .gitea/workflows/josh-sync-forward.yml:

name: "Josh Sync → Subrepo"

on:
  push:
    branches: [main]
    paths:
      # List ALL target subfolders:
      - "services/billing/**"
      - "services/auth/**"
      - "libs/shared/**"
  schedule:
    - cron: "0 */6 * * *"         # every 6 hours as fallback
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = detect from push or all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = triggered branch or all)"
        required: false
        default: ""

concurrency:
  group: josh-sync-fwd-${{ github.ref_name }}
  cancel-in-progress: false

jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2               # needed for target detection

      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq

      - name: Detect changed target
        if: github.event_name == 'push'
        id: detect
        run: |
          CHANGED=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || echo "")
          TARGETS=$(yq -o json '.targets' .josh-sync.yml \
            | jq -r '.[] | "\(.name):\(.subfolder)"' \
            | while IFS=: read -r name prefix; do
                echo "$CHANGED" | grep -q "^${prefix}/" && echo "$name"
              done | sort -u | paste -sd ',' -)
          echo "targets=${TARGETS}" >> "$GITHUB_OUTPUT"

      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: forward
          target: ${{ github.event.inputs.target || steps.detect.outputs.targets }}
          branch: ${{ github.event.inputs.branch || github.ref_name }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}
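
To try the target detection outside CI, the logic of the "Detect changed target" step can be restated as a standalone function. This is a sketch rather than the workflow's exact code: detect_targets is a hypothetical name, it takes name:subfolder pairs as arguments instead of reading .josh-sync.yml with yq, and it matches path prefixes with a shell pattern instead of grep.

```shell
# Read changed file paths on stdin; print a comma-separated list of targets
# whose subfolder contains at least one changed file.
# Usage: git diff --name-only HEAD~1 HEAD | detect_targets name:subfolder...
detect_targets() {
  local file pair name prefix
  while IFS= read -r file; do
    for pair in "$@"; do
      name=${pair%%:*}     # text before the first colon
      prefix=${pair#*:}    # text after the first colon
      case $file in
        "$prefix"/*) printf '%s\n' "$name" ;;
      esac
    done
  done | sort -u | paste -sd ',' -
}
```

Like the workflow step, this only sees the files passed to it, so a push containing several commits is covered only if the diff range includes all of them.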

Reverse sync (subrepo → mono)

Create .gitea/workflows/josh-sync-reverse.yml:

name: "Josh Sync ← Subrepo"

on:
  schedule:
    - cron: "0 1,7,13,19 * * *"   # every 6h, offset from forward
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = all eligible)"
        required: false
        default: ""

concurrency:
  group: josh-sync-rev-${{ github.event.inputs.target || 'all' }}
  cancel-in-progress: false

jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq

      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: reverse
          target: ${{ github.event.inputs.target || '' }}
          branch: ${{ github.event.inputs.branch || '' }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}

Required CI secrets

| Secret | Purpose |
| --- | --- |
| SYNC_BOT_USER | Bot username |
| SYNC_BOT_TOKEN | Bot API token (monorepo access + josh-proxy auth) |
| SUBREPO_SSH_KEY | SSH private key for subrepo push (if using SSH auth) |
| SUBREPO_TOKEN | Optional separate subrepo token (defaults to SYNC_BOT_TOKEN) |

GitHub Actions note: These examples target Gitea Actions. For GitHub Actions, change the uses: reference to a GitHub repo (e.g., org/josh-sync@v1) and runs-on: to a GitHub runner (e.g., ubuntu-latest).

How Ongoing Sync Works

Once set up, sync runs automatically:

Forward sync (mono → subrepo)

Triggered by pushes to target subfolders or on a cron schedule:

  1. Clones the monorepo through josh-proxy (filtered view of the subfolder)
  2. Fetches the subrepo branch for comparison
  3. If trees are identical → skip
  4. If subrepo branch doesn't exist → fresh push
  5. Merges mono changes on top of subrepo state
  6. If clean merge → pushes with --force-with-lease (protects against concurrent changes)
  7. If lease rejected → retries on next run (subrepo changed during sync)
  8. If merge conflict → creates a conflict PR on the subrepo
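
The branching in the steps above can be summarized as a decision function. This is purely illustrative: forward_sync_action and every outcome name except "skip" are hypothetical, and the inputs are assumed to be computed beforehand (tree SHAs can be obtained with git rev-parse "<ref>^{tree}").

```shell
# Hypothetical decision helper mirroring the numbered steps above.
#   $1  tree SHA of the josh-filtered monorepo view
#   $2  tree SHA of the subrepo branch ("" if the branch does not exist)
#   $3  merge result: "clean" (default) or "conflict"
#   $4  push result:  "ok" (default) or "lease-rejected"
forward_sync_action() {
  local mono_tree="$1" subrepo_tree="$2" merge_result="${3:-clean}" push_result="${4:-ok}"
  if [ -z "$subrepo_tree" ]; then
    echo "fresh-push"        # subrepo branch missing
  elif [ "$mono_tree" = "$subrepo_tree" ]; then
    echo "skip"              # trees identical, nothing to sync
  elif [ "$merge_result" = "conflict" ]; then
    echo "conflict-pr"       # open a conflict PR on the subrepo
  elif [ "$push_result" = "lease-rejected" ]; then
    echo "retry-next-run"    # subrepo changed during sync
  else
    echo "pushed"            # clean merge, lease held
  fi
}
```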

Reverse sync (subrepo → mono)

Runs on a cron schedule (never triggered by subrepo pushes):

  1. Clones the subrepo
  2. Fetches the monorepo's josh-filtered view for comparison
  3. Finds new human commits (filters out bot commits by checking for the Josh-Sync-Origin: trailer)
  4. If no new human commits → skip
  5. Pushes through josh-proxy to a staging branch
  6. Creates a PR on the monorepo — never pushes directly

Loop prevention

Bot commits include a git trailer like Josh-Sync-Origin: forward/main/2024-02-12T10:30:00Z. Each sync direction filters out commits with this trailer, preventing changes from bouncing back and forth. The CI action also has a loop guard that skips entirely if the HEAD commit has the trailer.
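
A loop guard of this kind can be sketched as a check on the commit message. The helper below is an assumption about the general approach, not josh-sync's code: has_sync_trailer is a hypothetical name, and it uses a simple line match. Piping the message through git interpret-trailers --parse first would be the stricter option.

```shell
# Return 0 if the commit message on stdin has a line starting with the trailer key.
# Usage: git log -1 --format=%B | has_sync_trailer [TRAILER_KEY]
has_sync_trailer() {
  local key="${1:-Josh-Sync-Origin}"
  grep -q "^${key}: "
}
```

A guard built on it could read: if git log -1 --format=%B | has_sync_trailer; then echo "bot commit, skipping"; fi.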

State tracking

Sync state is stored as JSON files on an orphan branch (josh-sync-state), one file per target/branch. This tracks the last-synced commit SHAs and timestamps to avoid re-syncing the same changes.

Excluding Files from Sync

Some files in the monorepo subfolder may not belong in the subrepo (e.g., monorepo-specific CI configs, internal tooling). The exclude config field removes these at the josh-proxy layer — excluded files never appear in the subrepo.

Configuration

Add an exclude list to any target:

targets:
  - name: "billing"
    subfolder: "services/billing"
    subrepo_url: "git@host:org/billing.git"
    exclude:
      - ".monorepo/"          # directory at subfolder root
      - "**/internal/"        # directory at any depth
      - "*.secret"            # files by extension
    branches:
      main: main

How it works

When exclude is present, josh-sync appends an inline :exclude filter to the josh-proxy URL. For the example above, the josh filter becomes:

:/services/billing:exclude[::.monorepo/,::**/internal/,::*.secret]

Josh-proxy applies this filter at the transport layer — no extra files to generate or commit. This means:

  • Forward sync: the filtered clone already excludes the files
  • Reverse sync: pushes through josh also respect the exclusion
  • Reset: the subrepo history never contains excluded files
  • Tree comparison: skip detection works correctly (excluded files are not in the diff)
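
The way the filter string is assembled can be reproduced with a few lines of shell. This is a sketch based only on the example filter shown above: build_josh_filter is a hypothetical name, and the real tool may handle edge cases differently.

```shell
# Build a josh filter from a subfolder and optional exclude patterns.
# Usage: build_josh_filter SUBFOLDER [PATTERN...]
build_josh_filter() {
  local subfolder="$1" joined="" pattern
  shift
  for pattern in "$@"; do
    # Prefix each pattern with "::" and join the results with commas.
    joined="${joined}${joined:+,}::${pattern}"
  done
  if [ -n "$joined" ]; then
    printf ':/%s:exclude[%s]\n' "$subfolder" "$joined"
  else
    printf ':/%s\n' "$subfolder"
  fi
}
```

Quote glob patterns when calling it so the shell does not expand them, e.g. build_josh_filter services/billing .monorepo/ '**/internal/' '*.secret'.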

Pattern syntax

Josh uses :: patterns inside :exclude[...]:

| Pattern | Matches |
| --- | --- |
| dir/ | Directory at subfolder root |
| file | File at subfolder root |
| **/dir/ | Directory at any depth |
| **/file | File at any depth |
| *.ext | Glob pattern (single * only) |

Setup

  1. Add exclude to the target in .josh-sync.yml
  2. Run josh-sync preflight to verify the filter works
  3. Forward sync will now exclude the specified files

No extra files to generate or commit — the exclusion is embedded directly in the josh-proxy URL.

Changing the exclude list

You can safely add or remove patterns from exclude at any time. When josh-sync detects that the filter has changed since the last sync, it automatically creates a reconciliation merge commit on the subrepo that connects the old and new histories — no manual reset or force-push required. Developers do not need to re-clone the subrepo.

Adding a New Target

To add a new subrepo after initial setup:

  1. Add the target to .josh-sync.yml
  2. Update the forward workflow's paths: list to include the new subfolder
  3. Commit and push
  4. Import the target:
    # Recommended: interactive onboard (preserves open PRs)
    josh-sync onboard new-target
    
    # Or manual: import → merge PR → reset
    josh-sync import new-target
    # merge the PR
    josh-sync reset new-target
    
  5. Verify with josh-sync status

Troubleshooting

"Failed to clone through josh-proxy"

  • Check josh-proxy is running and accessible
  • Verify monorepo_path matches what josh-proxy expects
  • Test manually: git ls-remote https://<user>:<token>@josh.example.com/org/repo.git:/services/app.git
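
The URL in the manual test follows the pattern proxy URL, repo path with .git, then the josh filter with .git. A small helper makes that explicit. josh_filtered_url is a hypothetical name, and credentials are left out on the assumption that a Git credential helper or the user:token@ form is added separately.

```shell
# Compose a josh-proxy URL for a filtered view of the monorepo.
# Usage: josh_filtered_url PROXY_URL MONOREPO_PATH FILTER
josh_filtered_url() {
  local proxy_url="${1%/}" monorepo_path="$2" filter="$3"   # drop one trailing slash
  printf '%s/%s.git%s.git\n' "$proxy_url" "$monorepo_path" "$filter"
}
```

For example, git ls-remote "$(josh_filtered_url https://josh.example.com org/monorepo :/services/billing)" HEAD queries the filtered view for the billing target.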

SSH authentication failures

  • SUBREPO_SSH_KEY must contain the actual key content, not a file path
  • For per-target keys, ensure subrepo_ssh_key_var in config matches the env var name
  • Check the key has write access to the subrepo
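
To rule out the first cause and test the key by hand, something like the following can help. Both helper names are hypothetical, and the key-content check is only a heuristic that looks for the "PRIVATE KEY-----" marker.

```shell
# Return 0 if the value looks like key content rather than a file path.
looks_like_private_key() {
  case $1 in
    *"PRIVATE KEY-----"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Write the key to a private temp file and print the file's path.
# Usage: key_file=$(write_ssh_key_file "$SUBREPO_SSH_KEY")
write_ssh_key_file() {
  local key_file
  key_file=$(mktemp) || return 1
  chmod 600 "$key_file"              # ssh refuses keys readable by others
  printf '%s\n' "$1" > "$key_file"
  printf '%s\n' "$key_file"
}
```

With the file in place, GIT_SSH_COMMAND="ssh -i $key_file -o IdentitiesOnly=yes" git ls-remote <subrepo_url> checks read access using only that key. Delete the file afterwards.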

"Force-with-lease rejected"

This is normal: the subrepo changed while sync was running, and the next sync run will pick it up. If it persists, check whether another process is pushing to the subrepo at the same time.

"Josh rejected push" (reverse sync)

Josh-proxy couldn't map the push back to the monorepo. Check the josh-proxy logs and verify that the josh filter is correct. This may indicate a history divergence; if so, consider running josh-sync reset <target>.

Import PR shows "No changes"

The subfolder already contains the same content as the subrepo. This is fine — the import is a no-op.

Duplicate/looping commits

Verify that bot.trailer in the config matches the trailer key in commit messages, and check that the loop guard in the CI workflow is active.

"cannot lock ref" or "expected X but got Y"

After reset (subrepo): The subrepo's history was replaced by force-push. Local clones still have the old history:

cd /path/to/subrepo
git fetch origin
git checkout main && git reset --hard origin/main

Or simply delete and re-clone.

After import/reset cycle (monorepo): The import and reset steps create and update branches rapidly (auto-sync/import-*, josh-sync-state). If your local clone fetched partway through, tracking refs go stale:

git remote prune origin && git pull

State issues

# View current state
josh-sync state show <target> [branch]

# Reset state (forces next sync to run regardless of SHA comparison)
josh-sync state reset <target> [branch]