The monorepo rewrite story

Martin Adámek edited this page Jun 9, 2021 · 4 revisions

A short summary of what changed in v1

Lerna with NPM 7 workspaces

  • we could also go with yarn (better support for workspaces and lock files, e.g. local packages are not part of the lock file)
  • lerna also has its own way of handling workspaces (lerna bootstrap) but that results in each package having its own lock file
  • this means NPM 7 is required for installing
  • root package.json is marked as private, and is used mostly for dev dependencies - as those are shared across all packages
  • child packages are in packages folder, each having its own package.json and TS configs
  • lerna will handle running commands in topological order (based on how the child packages depend on each other)
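
To illustrate what "topological order" means here, the sketch below sorts a small set of packages so that each one comes after the local packages it depends on. The `topoSort` helper and the dependency relations are made up for illustration; this is not lerna's actual implementation.

```typescript
// Map of package name -> names of local packages it depends on (illustrative data only).
type DependencyGraph = Record<string, string[]>;

// Returns package names ordered so that every package comes after its dependencies.
function topoSort(graph: DependencyGraph): string[] {
    const sorted: string[] = [];
    const state = new Map<string, 'visiting' | 'done'>();

    const visit = (name: string): void => {
        if (state.get(name) === 'done') return;
        if (state.get(name) === 'visiting') {
            throw new Error(`Dependency cycle detected at ${name}`);
        }
        state.set(name, 'visiting');
        // Dependencies are placed first, so they get built before the dependent package.
        for (const dep of graph[name] ?? []) visit(dep);
        state.set(name, 'done');
        sorted.push(name);
    };

    for (const name of Object.keys(graph)) visit(name);
    return sorted;
}

// Hypothetical packages: "app" depends on "utils" and "consts", "utils" depends on "consts".
const order = topoSort({
    app: ['consts', 'utils'],
    utils: ['consts'],
    consts: [],
});
console.log(order); // [ 'consts', 'utils', 'app' ]
```

A command such as a build would then run for `consts` first and for `app` last, so every package finds its local dependencies already built.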

TypeScript support

  • to allow requiring local packages without the need to compile, we need to set up paths mapping - this is done in the root tsconfig.json, which is then extended in all the packages
  • we have two TS configs for each package (and two of them at the root level), one for the general usage/development (e.g. IDE support), one for building
    {
      "extends": "./tsconfig.build.json",
      "compilerOptions": {
        "baseUrl": ".",
        "paths": {
          "@apify/*": ["packages/*/src"] // <== here we let TS know that requires to `@apify/...` should be mapped to local files in `packages/*/src`
        }
      }
    }
    
  • this is the main difference between the general tsconfig.json and tsconfig.build.json - in the build context we want to maintain the requires to @apify/... packages, only during development we want to use the paths mapping

Jest setup

  • similarly to how we need to let the TS compiler know about the paths mapping, we need to adjust how jest is configured too:
module.exports = {
    // ...
    preset: 'ts-jest',
    moduleNameMapper: {
        '@apify/(.*)': '<rootDir>/packages/$1/src',
    },
    globals: {
        'ts-jest': {
            tsconfig: 'test/tsconfig.json',
        },
    },
};
  • we use a custom tsconfig.json for tests, as the files inside the test folder do not belong to the root one
  • if any of the packages use some nonstandard compiler option (like in our case input_schema package using resolveJsonModule: true), we need to enable those options in the test TS config too
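
As a rough illustration of what the moduleNameMapper entry above does, the sketch below applies the same pattern and replacement template to an import specifier. `mapModuleName` is a hypothetical helper, not part of Jest's API, and `/repo` merely stands in for `<rootDir>`. Jest's real module resolution involves more steps than this.

```typescript
// Simplified model of a single moduleNameMapper entry: a regex plus a replacement
// template in which $1 refers to the first capture group.
function mapModuleName(specifier: string, rootDir: string): string | undefined {
    const pattern = /@apify\/(.*)/;
    const template = '<rootDir>/packages/$1/src';

    const match = pattern.exec(specifier);
    if (!match) return undefined; // not a local package, so it is resolved the usual way

    const packageFolder = match[1] ?? '';
    return template
        .replace('<rootDir>', () => rootDir)
        .replace('$1', () => packageFolder);
}

console.log(mapModuleName('@apify/input_schema', '/repo')); // /repo/packages/input_schema/src
console.log(mapModuleName('lodash', '/repo')); // undefined
```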

Lock files

  • due to some issues in NPM and GH actions, we needed to use lock files to be able to install in the CI environment
  • this allows for proper node_modules caching, which results in a fast install step if no dependencies changed
  • it also allows splitting the pipeline into multiple steps (e.g. build/test/lint) that share the installed dependencies

Reworked CI pipeline and Publishing

  • previously there were two very similar workflows, one for PRs and the other for commits to master that were automatically publishing beta releases
  • given the nature of this repository, we wanted to ship stable releases right away
  • to be able to generate changelogs and create GH releases automatically, we use conventional commits, so each commit message tells us what type of change it contains (e.g. fix -> patch bump, feat -> minor bump, breaking change -> major bump)
  • we use lerna publish to handle the orchestration of the release
    • checks for changes in packages
    • decides what version bump to use
    • computes changelogs
    • creates the GH releases
    • publishes the packages
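
The mapping from commit messages to version bumps can be sketched roughly as follows. This is a simplified model with hypothetical helper names (`inferBump`, `highestBump`). Lerna delegates the real decision to conventional-changelog tooling, whose rules may differ, for example in how other commit types are treated.

```typescript
type Bump = 'major' | 'minor' | 'patch' | 'none';

// Infers the version bump implied by a single conventional commit message.
function inferBump(message: string): Bump {
    const [header = ''] = message.split('\n');
    // type, optional "(scope)", optional "!" marking a breaking change, then ": subject"
    const match = /^(\w+)(\([^)]*\))?(!)?: .+/.exec(header);
    if (!match) return 'none';

    // A breaking change is marked either by "!" in the header or by a footer line.
    const breaking = match[3] === '!' || /^BREAKING[ -]CHANGE: /m.test(message);
    if (breaking) return 'major';
    if (match[1] === 'feat') return 'minor';
    if (match[1] === 'fix') return 'patch';
    return 'none';
}

// The bump for a package is the highest bump among the commits that touched it.
function highestBump(messages: string[]): Bump {
    const rank: Bump[] = ['none', 'patch', 'minor', 'major'];
    return messages
        .map(inferBump)
        .reduce<Bump>((max, bump) => (rank.indexOf(bump) > rank.indexOf(max) ? bump : max), 'none');
}

console.log(inferBump('fix: handle empty input')); // patch
console.log(highestBump(['fix: handle empty input', 'feat(utils): add helper'])); // minor
```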

Build step

  • we have an NPM script in the root package.json: npm run release
  • it will first build the app via lerna run build, which calls npm run build in each package, in topological order
  • each package builds the TS files from the src folder to the dist folder
  • afterwards the copy.ts script is executed; it copies package.json and other metafiles into the dist folder and fixes the paths inside them
  • we then publish only the contents of the dist folder
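
The actual copy.ts script is not shown on this page. As a purely illustrative sketch of what "fixing paths" could mean, the code below assumes that package.json entries such as main and types point into dist/ and must be made relative to the dist folder once the manifest is copied there. The helper names and the choice of fields are assumptions, not the real script.

```typescript
// Subset of package.json fields relevant to this sketch (assumed, not taken from copy.ts).
interface PackageManifest {
    name: string;
    main?: string;
    types?: string;
    [key: string]: unknown;
}

// Strips a leading "dist/" or "./dist/" so the path is valid relative to the dist folder itself.
function stripDistPrefix(path: string): string {
    return path.replace(/^(\.\/)?dist\//, './');
}

// Returns a copy of the manifest with entry points rewritten for publishing from dist.
function rewriteManifestForDist(manifest: PackageManifest): PackageManifest {
    const result: PackageManifest = { ...manifest };
    if (result.main !== undefined) result.main = stripDistPrefix(result.main);
    if (result.types !== undefined) result.types = stripDistPrefix(result.types);
    return result;
}

const rewritten = rewriteManifestForDist({
    name: '@apify/example', // hypothetical package name
    main: 'dist/index.js',
    types: 'dist/index.d.ts',
});
console.log(rewritten.main, rewritten.types); // ./index.js ./index.d.ts
```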

Canary builds vs stable builds

  • usually the CI is used for shipping canary (dev) builds, which only requires publishing new versions to NPM
  • here we wanted to ship stable builds, which also involves committing to the repository from CI
  • to allow pushing new commits, we need to use a GH personal access token (plus, obviously, an NPM publishing token)
    • the token needs to belong to a user with admin rights to the repository if we want to push to a protected branch (especially if we have required reviews enforced)
  • to allow lerna to compute correct changelogs, we need to fetch the whole repository with all tags:
    -   uses: actions/checkout@v2
        with:
            token: ${{ secrets.GH_TOKEN }}
            fetch-depth: 0 # we need to pull everything to allow lerna to detect what packages changed
            ref: master
    
  • we need to make sure that the commit from CI does not trigger another CI build, otherwise we would end up in an infinite loop
    • done by checking the commit message for the [skip ci] fragment

Shared workflow

  • instead of two very similar workflows, we now have a single one that handles all of build/test/lint/publish jobs
  • the publish job is conditional: it runs only for the master branch and depends on all the previous jobs, which otherwise run in parallel
  • we fetch the whole git repository (fetch-depth: 0) only in the publish job

Root changelog

  • we decided to use independent versioning mode in lerna, which means that each package has its own independent version, and version bumps are calculated separately for each package
  • because of this, there is no single changelog that covers everything, and there is not even a shared version to track
  • each package has its own changelog in its package folder
  • the shared root changelog is now generated automatically after each successful publish
  • it contains a list of all packages, their versions, and links to their changelogs
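
Generating such a root changelog could look roughly like the sketch below. The output layout, the `renderRootChangelog` helper, and the sample data are assumptions made for illustration. Only the idea of listing each package with its version and a link to its own changelog comes from the text above.

```typescript
interface PackageInfo {
    name: string;    // package name from its package.json
    version: string; // current version from its package.json
    folder: string;  // folder name under packages/
}

// Renders a root changelog that only links to the per-package changelogs.
function renderRootChangelog(packages: PackageInfo[]): string {
    const lines = packages
        .slice() // do not mutate the caller's array
        .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
        .map((p) => `- ${p.name} ${p.version}: [changelog](packages/${p.folder}/CHANGELOG.md)`);
    return ['# Changelog', '', ...lines, ''].join('\n');
}

// Hypothetical packages and versions.
console.log(renderRootChangelog([
    { name: '@apify/utilities', version: '1.2.0', folder: 'utilities' },
    { name: '@apify/input_schema', version: '2.0.1', folder: 'input_schema' },
]));
```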

Root package-lock.json

  • unlike with yarn, NPM includes the local packages in its lock file
  • unfortunately lerna won't update the root lock file when publishing
  • we need to run npm install again after a successful publish and commit those changes, otherwise we would end up with an outdated lock file
  • we handle this in the very same commit as the one that updates the root changelog file

PR title check

  • we have merge commits disallowed and linear commit history enforced
  • PRs should be squash merged, resulting in a single commit in master
  • the commit message (which is used for inferring version bumps and changelogs) is taken from the PR title
  • to validate the commit message format, we need to validate the PR title, which is done via the action-semantic-pull-request action in a separate workflow

Commit hooks to enforce conventional commits

  • we use husky to set up commit hooks
  • one hook checks the commit message format via the commitlint package
  • one hook runs the linter, but only on staged files/changes (git add)