Setup Guide
Step-by-step guide to setting up josh-sync for a new monorepo with existing subrepos.
Overview
josh-sync provides bidirectional sync between a monorepo and N external subrepos via josh-proxy:
MONOREPO                                     SUBREPOS
├── services/billing/   ──── forward ────►   billing-repo/
├── services/auth/      (push or cron)       auth-repo/
└── libs/shared/        ◄──── reverse ─────  shared-lib-repo/
                        (cron → always PR)
          via josh-proxy (filtered git views)
Key safety properties:
- Forward sync (mono → subrepo) uses `--force-with-lease`, so it never overwrites concurrent changes
- Reverse sync (subrepo → mono) always creates a PR and never pushes directly
- Git trailers (`Josh-Sync-Origin:`) prevent infinite sync loops
- State is tracked on an orphan branch (`josh-sync-state`), so it survives CI runner teardown
Prerequisites
Before you begin, you need:
josh-proxy instance
A running josh-proxy that can access your monorepo's Git server. Verify connectivity:
git ls-remote https://josh.example.com/org/monorepo.git HEAD
Bot account
A dedicated Git user (e.g., josh-sync-bot) with:
- Write access to the monorepo
- Write access to all subrepos
- Ability to create PRs on both monorepo and subrepo platforms
Credentials
| Variable | Purpose | Required |
|---|---|---|
| `SYNC_BOT_USER` | Bot's Git username | Yes |
| `SYNC_BOT_TOKEN` | API token with repo scope (monorepo + josh-proxy auth) | Yes |
| `SUBREPO_SSH_KEY` | SSH private key for subrepo access (if using SSH auth) | If SSH |
| `SUBREPO_TOKEN` | HTTPS token for subrepo access (defaults to `SYNC_BOT_TOKEN`) | No |
Per-target credential overrides are supported — see Configuration Reference.
Tool dependencies
bash >=4, git, curl, jq, yq (mikefarah/yq v4+), openssh, rsync
The Nix flake bundles all dependencies automatically.
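For a manual setup, a quick check that the listed tools are on `PATH` can save a failed first run. The sketch below is a hypothetical helper, not part of josh-sync; it assumes `openssh` is detected through its `ssh` binary, and the function names are made up for illustration.

```shell
# Tools named in this guide (openssh is checked via its `ssh` binary).
REQUIRED_TOOLS="bash git curl jq yq ssh rsync"

# Print each tool from the argument list that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if a "major.minor..." version string has a major version >= 4.
bash_major_ok() {
  major=${1%%.*}
  [ "$major" -ge 4 ] 2>/dev/null
}
```

Typical use would be `missing_tools $REQUIRED_TOOLS` (unquoted on purpose, so the list splits into words) followed by `bash_major_ok "$(bash -c 'echo "$BASH_VERSION"')"`.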
Step 1: Create the Monorepo
Create a new repository on your Git server (e.g., org/monorepo). Create subdirectories for each subrepo you want to sync:
mkdir -p services/billing services/auth libs/shared
These directories will be populated during the import step. They can be empty or contain .gitkeep files for now.
Verify josh-proxy can see the monorepo:
git ls-remote https://josh.example.com/org/monorepo.git HEAD
Step 2: Configure .josh-sync.yml
Create .josh-sync.yml at the monorepo root. Each target maps a monorepo subfolder to an external subrepo:
josh:
  proxy_url: "https://josh.example.com"   # josh-proxy URL (no trailing slash)
  monorepo_path: "org/monorepo"           # repo path as josh sees it
targets:
  - name: "billing"                       # unique identifier
    subfolder: "services/billing"         # monorepo subfolder
    # josh_filter auto-derived as ":/services/billing" if omitted
    subrepo_url: "git@gitea.example.com:ext/billing.git"
    subrepo_auth: "ssh"                   # "https" (default) or "ssh"
    branches:
      main: main                          # mono_branch: subrepo_branch
    forward_only: []
  - name: "auth"
    subfolder: "services/auth"
    subrepo_url: "https://gitea.example.com/ext/auth.git"
    subrepo_auth: "https"
    subrepo_token_var: "AUTH_REPO_TOKEN"  # per-target credential override
    branches:
      main: main
      develop: develop                    # multiple branches supported
    forward_only: []
  - name: "shared-lib"
    subfolder: "libs/shared"
    subrepo_url: "https://gitea.example.com/ext/shared-lib.git"
    branches:
      main: main
    forward_only: [main]                  # one-way: mono → subrepo only
bot:
  name: "josh-sync-bot"
  email: "josh-sync-bot@example.com"
  trailer: "Josh-Sync-Origin"             # git trailer for loop prevention
For the full field reference, see Configuration Reference.
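To see how these fields fit together, the sketch below derives the default filter from a subfolder and builds the filtered-view URL that josh-proxy would serve. It assumes the URL layout used in the troubleshooting command later in this guide (`<proxy_url>/<monorepo_path>.git<filter>.git`). The function names are hypothetical and do not come from josh-sync.

```shell
# Default filter for a target: ":/" followed by the subfolder, matching the
# "auto-derived" comment in the sample config.
derive_josh_filter() {
  printf ':/%s\n' "$1"
}

# Build the josh-proxy URL for a filtered view.
# Assumed layout: <proxy_url>/<monorepo_path>.git<filter>.git
josh_filtered_url() {
  proxy_url=${1%/}   # tolerate a trailing slash on the proxy URL
  monorepo_path=$2
  filter=$3
  printf '%s/%s.git%s.git\n' "$proxy_url" "$monorepo_path" "$filter"
}
```

For the `billing` target above, `josh_filtered_url https://josh.example.com org/monorepo "$(derive_josh_filter services/billing)"` yields a URL that can be passed to `git ls-remote` to confirm the filtered view resolves.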
Step 3: Set Up Local Dev Environment
Option A: Nix + devenv (recommended)
devenv.yaml — declare josh-sync as a flake input:
inputs:
  nixpkgs:
    url: github:cachix/devenv-nixpkgs/rolling
  josh-sync:
    url: git+https://your-gitea.example.com/org/josh-sync?ref=main
    flake: true
devenv.nix — import the josh-sync module:
{ inputs, ... }:
{
  imports = [ inputs.josh-sync.devenvModules.default ];
  name = "my-monorepo";
  # .env contains secrets, not devenv config
  dotenv.disableHint = true;
}
.envrc — activate devenv automatically:
DEVENV_WARN_TIMEOUT=20
use devenv
.env — local credentials (add to .gitignore):
SYNC_BOT_USER=sync-bot
SYNC_BOT_TOKEN=<your-api-token>
SUBREPO_SSH_KEY="-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----"
# Per-target overrides:
# AUTH_REPO_TOKEN=<auth-specific-token>
Option B: Manual installation
Install the required tools, then either:
- Clone the josh-sync repo and add `bin/` to your `PATH`
- Run `make build` to create a single bundled script at `dist/josh-sync`
Step 4: Validate with Preflight
josh-sync preflight
This validates:
- Config syntax and required fields
- josh-proxy connectivity (via `git ls-remote` through josh)
- Subrepo connectivity and authentication
- Branch mappings
- CI workflow path coverage (checks if `.gitea/workflows/josh-sync-forward.yml` paths match target subfolders)
For a new monorepo before import, preflight may warn that subfolders don't exist yet — that's expected.
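The path-coverage check can be approximated by hand. The sketch below is not the actual preflight implementation; it assumes each target should appear in the workflow as exactly `<subfolder>/**`, and the function name is hypothetical.

```shell
# Print every subfolder that has no "<subfolder>/**" entry in the path list.
# $1: newline-separated target subfolders
# $2: newline-separated workflow path globs
uncovered_subfolders() {
  printf '%s\n' "$1" | while IFS= read -r subfolder; do
    [ -n "$subfolder" ] || continue
    # -x: whole-line match, -F: treat the glob as a literal string
    printf '%s\n' "$2" | grep -qxF "${subfolder}/**" || printf '%s\n' "$subfolder"
  done
}
```

An empty result means every target subfolder is listed; any printed subfolder would not trigger forward sync on push.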
Step 5: Import Existing Subrepos
This is the critical onboarding step. For each existing subrepo, you run a three-step cycle: import → merge → reset.
Do this one target at a time to keep PRs reviewable.
5a. Import
josh-sync import billing
This:
- Clones the monorepo directly (not through josh)
- Clones the subrepo
- Copies subrepo content into the monorepo subfolder via `rsync`
- Creates a branch `auto-sync/import-billing-<timestamp>`
- Pushes it and creates a PR on the monorepo
Review the import PR — check for leaked credentials, environment-specific config, or files that shouldn't be in the monorepo.
5b. Merge the import PR
Merge the PR using your Git platform's UI. This lands the subrepo content into the monorepo's main branch.
At this point, the monorepo has the content but the histories are disconnected. Sync will not work until you complete the reset step.
5c. Reset
josh-sync reset billing
You do NOT need to `git pull` locally before running reset. The reset command clones fresh from josh-proxy and never uses your local working copy.
This:
- Clones the monorepo through josh-proxy with the josh filter (the "filtered view")
- Force-pushes that filtered view to the subrepo, replacing its history
This establishes shared commit ancestry between josh's filtered view and the subrepo. Without this, josh-proxy can't compute diffs between the two.
Warning: This is a destructive force-push that replaces the subrepo's history. Back up any important branches or tags in the subrepo beforehand. Merge or close all open pull requests on the subrepo first — they will be invalidated.
After reset, every developer with a local clone of the subrepo must update their local copy to match the new history:
cd /path/to/local-subrepo
git fetch origin
git checkout main && git reset --hard origin/main
git checkout stage && git reset --hard origin/stage # repeat for each branch
Or simply delete and re-clone the subrepo. Local-only branches (not pushed to the remote) will be lost either way.
5d. Repeat for each target
For each target:
1. josh-sync import <target>
2. Review and merge the import PR on the monorepo
3. josh-sync reset <target>
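The cycle can be wrapped in a small loop that pauses for the manual merge. This is a hypothetical convenience wrapper, not a josh-sync feature; it only calls the `import` and `reset` subcommands shown above, and the `JOSH_SYNC` override variable is an assumption added so the loop can be dry-run.

```shell
# Command to invoke; override for a dry run, e.g. JOSH_SYNC=echo.
JOSH_SYNC=${JOSH_SYNC:-josh-sync}

# Walk each named target through import -> (manual merge) -> reset.
# Stops at the first failing step so a broken import is never reset.
onboard_targets() {
  for target in "$@"; do
    "$JOSH_SYNC" import "$target" || return 1
    printf 'Merge the import PR for %s, then press Enter: ' "$target" >&2
    read -r _answer || return 1
    "$JOSH_SYNC" reset "$target" || return 1
  done
}
```

Running `onboard_targets billing auth shared-lib` processes one target at a time, which keeps each import PR reviewable on its own.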
5e. Verify
After all targets are imported and reset:
# Check all targets show state
josh-sync status
# Test forward sync — should return "skip" (trees are identical after reset)
josh-sync sync --forward --target billing
# Test reverse sync — should return "skip" (no new human commits)
josh-sync sync --reverse --target billing
Step 6: Set Up CI Workflows
Forward sync (mono → subrepo)
Create .gitea/workflows/josh-sync-forward.yml:
name: "Josh Sync → Subrepo"
on:
  push:
    branches: [main]
    paths:
      # List ALL target subfolders:
      - "services/billing/**"
      - "services/auth/**"
      - "libs/shared/**"
  schedule:
    - cron: "0 */6 * * *"  # every 6 hours as fallback
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = detect from push or all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = triggered branch or all)"
        required: false
        default: ""
concurrency:
  group: josh-sync-fwd-${{ github.ref_name }}
  cancel-in-progress: false
jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # needed for target detection
      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq
      - name: Detect changed target
        if: github.event_name == 'push'
        id: detect
        run: |
          CHANGED=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || echo "")
          TARGETS=$(yq -o json '.targets' .josh-sync.yml \
            | jq -r '.[] | "\(.name):\(.subfolder)"' \
            | while IFS=: read -r name prefix; do
                echo "$CHANGED" | grep -q "^${prefix}/" && echo "$name"
              done | sort -u | paste -sd ',' -)
          echo "targets=${TARGETS}" >> "$GITHUB_OUTPUT"
      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: forward
          target: ${{ github.event.inputs.target || steps.detect.outputs.targets }}
          branch: ${{ github.event.inputs.branch || github.ref_name }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}
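The "Detect changed target" step can be exercised locally without `yq` or a checkout. The sketch below reimplements the same idea as a standalone function under two assumptions: the changed files and the `name:subfolder` pairs are passed in as plain strings, and prefix matching is done literally with `awk` rather than with a `grep` regex. The function name is hypothetical.

```shell
# $1: newline-separated changed file paths (e.g. from `git diff --name-only`)
# $2: newline-separated "name:subfolder" pairs
# Prints a sorted, comma-separated list of targets whose subfolder changed.
detect_targets() {
  printf '%s\n' "$2" | while IFS=: read -r name prefix; do
    [ -n "$name" ] || continue
    # index(...) == 1 means the path starts with "<subfolder>/" literally.
    printf '%s\n' "$1" \
      | awk -v p="${prefix}/" 'index($0, p) == 1 { found = 1 } END { exit !found }' \
      && printf '%s\n' "$name"
  done | sort -u | paste -sd ',' -
}
```

A change touching `services/billing/` and `libs/shared/` would produce `billing,shared-lib`, which is the shape the workflow writes to the `targets` output.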
Reverse sync (subrepo → mono)
Create .gitea/workflows/josh-sync-reverse.yml:
name: "Josh Sync ← Subrepo"
on:
  schedule:
    - cron: "0 1,7,13,19 * * *"  # every 6h, offset from forward
  workflow_dispatch:
    inputs:
      target:
        description: "Target to sync (empty = all)"
        required: false
        default: ""
      branch:
        description: "Branch to sync (empty = all eligible)"
        required: false
        default: ""
concurrency:
  group: josh-sync-rev-${{ github.event.inputs.target || 'all' }}
  cancel-in-progress: false
jobs:
  sync:
    runs-on: docker
    container: node:20-bookworm
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Install tools
        run: |
          apt-get update -qq && apt-get install -y -qq jq curl git openssh-client >/dev/null 2>&1
          curl -sL "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64" \
            -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq
      - uses: https://your-gitea.example.com/org/josh-sync@v1
        with:
          direction: reverse
          target: ${{ github.event.inputs.target || '' }}
          branch: ${{ github.event.inputs.branch || '' }}
        env:
          SYNC_BOT_USER: ${{ secrets.SYNC_BOT_USER }}
          SYNC_BOT_TOKEN: ${{ secrets.SYNC_BOT_TOKEN }}
          SUBREPO_TOKEN: ${{ secrets.SUBREPO_TOKEN || secrets.SYNC_BOT_TOKEN }}
          SUBREPO_SSH_KEY: ${{ secrets.SUBREPO_SSH_KEY }}
Required CI secrets
| Secret | Purpose |
|---|---|
| `SYNC_BOT_USER` | Bot username |
| `SYNC_BOT_TOKEN` | Bot API token (monorepo access + josh-proxy auth) |
| `SUBREPO_SSH_KEY` | SSH private key for subrepo push (if using SSH auth) |
| `SUBREPO_TOKEN` | Optional separate subrepo token (defaults to `SYNC_BOT_TOKEN`) |
GitHub Actions note: These examples target Gitea Actions. For GitHub Actions, change the `uses:` reference to a GitHub repo (e.g., `org/josh-sync@v1`) and `runs-on:` to a GitHub runner (e.g., `ubuntu-latest`).
How Ongoing Sync Works
Once set up, sync runs automatically:
Forward sync (mono → subrepo)
Triggered by pushes to target subfolders or on a cron schedule:
- Clones the monorepo through josh-proxy (filtered view of the subfolder)
- Fetches the subrepo branch for comparison
- If trees are identical → skip
- If subrepo branch doesn't exist → fresh push
- Merges mono changes on top of subrepo state
- If clean merge → pushes with `--force-with-lease` (protects against concurrent changes)
- If lease rejected → retries on next run (subrepo changed during sync)
- If merge conflict → creates a conflict PR on the subrepo
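The branching in the steps above can be summarized as a small decision function. This is an illustrative model only, not josh-sync's code: the yes/no inputs stand in for the real git checks, the outcome labels are made up, and it assumes branch existence is tested before tree comparison (the two cases are mutually exclusive either way).

```shell
# Arguments are "yes"/"no" answers to the checks in the list above:
#   $1 subrepo branch exists?   $2 trees identical?
#   $3 merge clean?             $4 lease accepted on push?
forward_action() {
  if [ "$1" = no ]; then echo fresh-push        # nothing to compare against
  elif [ "$2" = yes ]; then echo skip           # already in sync
  elif [ "$3" = no ]; then echo conflict-pr     # needs human resolution
  elif [ "$4" = no ]; then echo retry-next-run  # subrepo moved during sync
  else echo push
  fi
}
```

Only the final `push` outcome changes the subrepo branch directly; every other path either does nothing or defers to a later run or a PR.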
Reverse sync (subrepo → mono)
Runs on a cron schedule (never triggered by subrepo pushes):
- Clones the subrepo
- Fetches the monorepo's josh-filtered view for comparison
- Finds new human commits (filters out bot commits by checking for the `Josh-Sync-Origin:` trailer)
- If no new human commits → skip
- Pushes through josh-proxy to a staging branch
- Creates a PR on the monorepo, never pushes directly
Loop prevention
Bot commits include a git trailer like Josh-Sync-Origin: forward/main/2024-02-12T10:30:00Z. Each sync direction filters out commits with this trailer, preventing changes from bouncing back and forth. The CI action also has a loop guard that skips entirely if the HEAD commit has the trailer.
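A minimal sketch of such a guard follows. It assumes a simple line-prefix match on the commit message is enough to spot the trailer; the real tool may parse trailers more strictly. The function and variable names are hypothetical.

```shell
# Trailer key, matching `bot.trailer` in the sample config.
TRAILER_KEY=${TRAILER_KEY:-Josh-Sync-Origin}

# Succeed if the commit message on stdin has a line starting with "<key>:".
has_sync_trailer() {
  grep -q "^${TRAILER_KEY}:"
}

# Print "skip" for bot-authored sync commits and "sync" for everything else.
sync_decision() {
  if printf '%s\n' "$1" | has_sync_trailer; then
    echo skip
  else
    echo sync
  fi
}
```

Against a real checkout, the HEAD commit can be tested with `git log -1 --format=%B | has_sync_trailer`.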
State tracking
Sync state is stored as JSON files on an orphan branch (josh-sync-state), one file per target/branch. This tracks the last-synced commit SHAs and timestamps to avoid re-syncing the same changes.
Adding a New Target
To add a new subrepo after initial setup:
- Add the target to `.josh-sync.yml`
- Update the forward workflow's `paths:` list to include the new subfolder
- Commit and push
- Run the import-merge-reset cycle for the new target:
    josh-sync import new-target
    # merge the PR
    josh-sync reset new-target
- Verify with `josh-sync status`
Troubleshooting
"Failed to clone through josh-proxy"
- Check josh-proxy is running and accessible
- Verify `monorepo_path` matches what josh-proxy expects
- Test manually:
    git ls-remote https://<user>:<token>@josh.example.com/org/repo.git:/services/app.git
SSH authentication failures
- `SUBREPO_SSH_KEY` must contain the actual key content, not a file path
- For per-target keys, ensure `subrepo_ssh_key_var` in config matches the env var name
- Check the key has write access to the subrepo
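The first point is easy to get wrong when copying secrets into CI. The sketch below is a hypothetical sanity check, not something josh-sync provides; it assumes key content always carries the usual `-----BEGIN ... PRIVATE KEY-----` and `-----END ...` markers.

```shell
# Succeed if the value looks like private-key content rather than a file path.
looks_like_key_content() {
  case $1 in
    "-----BEGIN "*"PRIVATE KEY-----"*"-----END "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `looks_like_key_content "$SUBREPO_SSH_KEY" || echo "SUBREPO_SSH_KEY does not look like key content" >&2` flags a path such as `~/.ssh/id_ed25519` before a sync run fails on authentication.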
"Force-with-lease rejected"
Normal: the subrepo changed while sync was running. The next sync run will pick it up. If persistent, check for another process pushing to the subrepo simultaneously.
"Josh rejected push" (reverse sync)
Josh-proxy couldn't map the push back to the monorepo. Check the josh-proxy logs and verify that the josh filter is correct. This may indicate a history divergence; if so, consider running josh-sync reset <target>.
Import PR shows "No changes"
The subfolder already contains the same content as the subrepo. This is fine — the import is a no-op.
Duplicate/looping commits
Verify bot.trailer in config matches what's in commit messages. Check the loop guard in the CI workflow is active.
"cannot lock ref" or "expected X but got Y"
After reset (subrepo): The subrepo's history was replaced by force-push. Local clones still have the old history:
cd /path/to/subrepo
git fetch origin
git checkout main && git reset --hard origin/main
Or simply delete and re-clone.
After import/reset cycle (monorepo): The import and reset steps create and update branches rapidly (auto-sync/import-*, josh-sync-state). If your local clone fetched partway through, tracking refs go stale:
git remote prune origin && git pull
State issues
# View current state
josh-sync state show <target> [branch]
# Reset state (forces next sync to run regardless of SHA comparison)
josh-sync state reset <target> [branch]