diff --git a/docs/development/inference_performance_optimization.md b/docs/development/inference_performance_optimization.md
index e60b4728a9f..74d2aa78f74 100644
--- a/docs/development/inference_performance_optimization.md
+++ b/docs/development/inference_performance_optimization.md
@@ -12,7 +12,7 @@ memory consumption compare to Python.
 DJL `Predictor` is not designed to be thread-safe (although some implementation is),
 we recommend creating a new [Predictor](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) for each thread.
 
-For a reference implementation, see [Multi-threaded Benchmark](https://github.com/deepjavalibrary/djl/blob/master/extensions/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java).
+For a reference implementation, see [Multi-threaded Benchmark](https://github.com/deepjavalibrary/djl-serving/blob/master/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java).
 
 you need to set corresponding configuration based on the engine you want to use.
 
@@ -111,10 +111,11 @@ This should only be disabled when you do not have the time to "warmup" a model w
 #### Multithreading Inference
 You can follow the same steps as other engines for running multithreading inference using TensorFlow engine.
 It's recommended to use one `Predictor` for each thread and avoid using a new `Predictor` for each inference call.
-You can refer to our [Multithreading Benchmark](https://github.com/deepjavalibrary/djl/blob/master/extensions/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java) as an example,
+You can refer to our [Multithreading Benchmark](https://github.com/deepjavalibrary/djl-serving/blob/master/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java) as an example;
 here is how to run it using TensorFlow engine.
 
 ```bash
+cd djl-serving
 ./gradlew benchmark --args='-e TensorFlow -c 100 -t -1 -u djl://ai.djl.tensorflow/resnet/0.0.1/resnet50 -s 1,224,224,3'
 ```
 
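The guidance in the hunk above says to use one `Predictor` per thread and to reuse it across inference calls rather than creating a new one per call. Below is a minimal sketch of that pattern using only `java.util.concurrent`. It is not DJL's benchmark code. `FakePredictor`, `run`, and the counters are hypothetical names chosen for illustration. `FakePredictor` stands in for a non-thread-safe `ai.djl.inference.Predictor`, which real code would get from `model.newPredictor()` and close when finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: one predictor per worker thread, reused for every inference call on that thread. */
class PerThreadPredictorSketch {

    // Counters used only to make the pattern observable.
    static final AtomicInteger CREATED = new AtomicInteger();
    static final AtomicInteger CLOSED = new AtomicInteger();

    /** Hypothetical, non-thread-safe stand-in for ai.djl.inference.Predictor. */
    static final class FakePredictor implements AutoCloseable {
        private long calls; // mutable per-instance state: unsafe to share across threads

        FakePredictor() {
            CREATED.incrementAndGet();
        }

        float predict(float input) {
            calls++;
            return input * 2f; // placeholder for real model inference
        }

        long calls() {
            return calls;
        }

        @Override
        public void close() {
            CLOSED.incrementAndGet(); // a real Predictor would release native resources here
        }
    }

    /** Splits {@code iterations} across {@code threads} workers and returns total predictions. */
    static long run(int threads, int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Spread the remainder so every iteration is assigned to some worker.
                int share = iterations / threads + (t < iterations % threads ? 1 : 0);
                workers.add(() -> {
                    // Created once per worker, not once per predict() call.
                    try (FakePredictor predictor = new FakePredictor()) {
                        for (int i = 0; i < share; i++) {
                            predictor.predict(i);
                        }
                        return predictor.calls();
                    }
                });
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(workers)) {
                total += result.get(); // rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long total = run(4, 100);
        System.out.println(
                "predictions=" + total + ", predictors=" + CREATED.get() + ", closed=" + CLOSED.get());
    }
}
```

Opening each predictor inside the worker's try-with-resources block ties its lifetime to that worker's batch of calls and ensures it is closed. This matters for engines that hold native memory. With `run(4, 100)`, each of the four workers creates exactly one predictor and performs 25 predictions. `main` prints `predictions=100, predictors=4, closed=4`.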
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index c54afb223bf..fdaedd33340 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -81,7 +81,7 @@ nav:
         - 'docs/development/configure_logging.md'
         - 'docs/how_to_collect_metrics.md'
         - 'docs/development/inference_performance_optimization.md'
-        - 'extensions/benchmark/README.md'
+        - 'docs/serving/benchmark/README.md'
         - 'docs/development/profiler.md'
       - 'docs/development/cache_management.md'
       - 'docs/development/memory_management.md'
diff --git a/extensions/benchmark/README.md b/extensions/benchmark/README.md
index ab3a8ab0f2f..9e32483c872 100644
--- a/extensions/benchmark/README.md
+++ b/extensions/benchmark/README.md
@@ -13,310 +13,5 @@ With djl-bench, you can easily compare your model's behavior in different use ca
 - running with different engines
 - running with different version of the engine
 
-djl-bench currently support benchmark the following type of models:
 
-- PyTorch TorchScript model
-- TensorFlow SavedModel bundle
-- Apache MXNet model
-- ONNX model
-- PaddlePaddle model
-- TFLite model
-- TensorRT model
-- XGBoost model
-- Python script model
-- Neo DLR (TVM) model
-
-You can build djl-bench from source if you need to benchmark fastText/BlazingText/Sentencepiece models.
-
-## Installation
-
-For macOS
-
-```
-brew install cask djl-bench
-```
-
-For Ubuntu
-
-- Install using snap
-
-```
-sudo snap install djlbench --classic
-sudo snap alias djlbench djl-bench
-```
-
-- Or download .deb package from S3
-
-```
-curl -O https://publish.djl.ai/djl-bench/0.17.0/djl-bench_0.17.0-1_all.deb
-sudo dpkg -i djl-bench_0.17.0-1_all.deb
-```
-
-For centOS or Amazon Linux 2
-
-You can download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.17.0/benchmark-0.17.0.zip).
-
-```
-curl -O https://publish.djl.ai/djl-bench/0.17.0/benchmark-0.17.0.zip
-unzip benchmark-0.17.0.zip
-rm benchmark-0.17.0.zip
-sudo ln -s $PWD/benchmark-0.17.0/bin/benchmark /usr/bin/djl-bench
-```
-
-For Windows
-
-We are considering to create a `chocolatey` package for Windows. For the time being, you can
-download djl-bench zip file from [here](https://publish.djl.ai/djl-bench/0.17.0/benchmark-0.17.0.zip).
-
-Or you can run benchmark using gradle:
-
-```
-cd djl
-
-gradlew benchmark --args="--help"
-```
-
-## Prerequisite
-
-Please ensure Java 8+ is installed and you are using an OS that DJL supported with.
-
-After that, you need to clone the djl project and `cd` into the folder.
-
-DJL supported OS:
-
-- Ubuntu 18.04 and above
-- Amazon Linux 2 and above
-- MacOS latest version
-- Windows 10 (Windows Server 2016+)
-
-If you are trying to use GPU, please ensure the CUDA driver is installed. You can verify that through:
-
-```
-nvcc -V
-```
-
-to checkout the version. For different Deep Learning engine you are trying to run the benchmark,
-they have different CUDA version to support. Please check the individual Engine documentation to ensure your CUDA version is supported.
-
-## Sample benchmark script
-
-Here is a few sample benchmark script for you to refer. You can also skip this and directly follow
-the 4-step instructions for your own model.
-
-Benchmark on a Tensorflow model from [tfhub](https://tfhub.dev/) url with all-zeros NDArray input for 10 times:
-
-```
-djl-bench -e TensorFlow -u https://tfhub.dev/tensorflow/resnet_50/classification/1 -c 10 -s 1,224,224,3
-```
-
-Similarly, this is for PyTorch
-
-```
-djl-bench -e PyTorch -u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip -n traced_resnet18 -c 10 -s 1,3,224,224
-```
-
-Benchmark a model from [ONNX Model Zoo](https://github.com/onnx/models)
-
-```
-djl-bench -e OnnxRuntime -u https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.tar.gz -s 1,3,224,224 -n resnet18v1/resnet18v1 -c 10
-```
-
-### Benchmark from ModelZoo
-
-#### MXNet
-
-Resnet50 image classification model:
-
-```
-djl-bench -c 2 -s 1,3,224,224 -u djl://ai.djl.mxnet/resnet/0.0.1/resnet50_v2
-```
-
-#### PyTorch
-
-SSD object detection model:
-
-```
-djl-bench -e PyTorch -c 2 -s 1,3,300,300 -u djl://ai.djl.pytorch/ssd/0.0.1/ssd_300_resnet50
-```
-
-## Configuration of Benchmark script
-
-To start your benchmarking, we need to make sure we provide the following information.
-
-- The Deep Learning Engine
-- The source of the model
-- How many runs you would like to make
-- Sample input for the model
-- (Optional) Multi-thread benchmark
-
-The benchmark script located [here](https://github.com/deepjavalibrary/djl/blob/master/benchmark/src/main/java/ai/djl/benchmark/Benchmark.java).
-
-Just do the following:
-
-```
-djl-bench --help
-```
-
-This will print out the possible arguments to pass in:
-
-```
-usage: djl-bench [-p MODEL-PATH] -s INPUT-SHAPES [OPTIONS]
- -c,--iteration <ITERATION>               Number of total iterations.
- -d,--duration <DURATION>                 Duration of the test in minutes.
- -e,--engine <ENGINE-NAME>                Choose an Engine for the benchmark.
- -g,--gpus <NUMBER_GPUS>                  Number of GPUS to run multithreading inference.
- -h,--help                                Print this help.
- -l,--delay <DELAY>                       Delay of incremental threads.
-    --model-arguments <MODEL-ARGUMENTS>   Specify model loading arguments.
-    --model-options <MODEL-OPTIONS>       Specify model loading options.
- -n,--model-name <MODEL-NAME>             Specify model file name.
-    --neuron-cores <NEURON-CORES>         Number of neuron cores to run multithreading inference, See
-                                          https://awsdocs-neuron.readthedocs-hosted.com.
- -o,--output-dir <OUTPUT-DIR>             Directory for output logs.
- -p,--model-path <MODEL-PATH>             Model directory file path.
- -s,--input-shapes <INPUT-SHAPES>         Input data shapes for the model.
- -t,--threads <NUMBER_THREADS>            Number of inference threads.
- -u,--model-url <MODEL-URL>               Model archive file URL.
-```
-
-### Step 1: Pick your deep engine
-
-By default, the above script will use MXNet as the default Engine, but you can always change that by adding the followings:
-
-```
--e TensorFlow # TensorFlow
--e PyTorch # PyTorch
--e MXNet # Apache MXNet
--e PaddlePaddle # PaddlePaddle
--e OnnxRuntime # pytorch
--e TFLite # TFLite
--e TensorRT # TensorRT
--e DLR # Neo DLR
--e XGBoost # XGBoost
--e Python # Python script
-```
-
-### Step 2: Identify the source of your model
-
-DJL accept variety of models came from different places.
-
-#### Remote location
-
-Use `--model-url` option to load a model from a URL. The URL must point to an archive file.
-
-The following is a pytorch model
-
-```
--u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip
-```
-We would recommend to make model files in a zip for better file tracking.
-
-#### Local directory
-
-Use `--model-path` option to load model from a local directory or an archive file.
-
-Mac/Linux
-
-```
--p /home/ubuntu/models/pytorch_resnet18
-or
--p /home/ubuntu/models/pytorch_resnet18.zip
-```
-
-Windows
-
-```
--p C:\models\pytorch_resnet18
-or
--p C:\models\pytorch_resnet18.zip
-```
-
-If the model file name is different from the parent folder name (or the archive file name), you need
-to specify `--model-name` in the `--args`:
-
-```
--n traced_resnet18
-```
-
-### Step 3: Define how many runs you would like to make
-
-add `-c` inside with a number
-
-```
--c 1000
-```
-
-This will run 1000 times inference.
-
-### Step 4: Define your model inputs
-
-The benchmark script uses dummy NDArray inputs.
-It will make fake NDArrays (like `NDArray.ones`) to feed in the model for inference.
-
-If we would like to fake an image:
-
-```
--s 1,3,224,224
-```
-
-This will create a NDArray (DataType FLOAT32) of shape(1, 3, 224, 224).
-
-If your model requires multiple inputs like three NDArrays with shape 1, 384 and 384. You can do the followings:
-
-```
--s (1),(384),(384)
-```
-
-If you input `DataType` is not FLOAT32, you can specify the data type with suffix:
-
-- f: FLOAT32, this is default and is optional
-- s: FLOAT16 (short float)
-- d: FLOAT64 (double)
-- u: UINT8 (unsigned byte)
-- b: INT8 (byte)
-- i: INT32 (int)
-- l: INT64 (long)
-- B: BOOLEAN (boolean)
-
-For example:
-
-```
--s (1)i,(384)f,(384)
-```
-
-### Optional Step: multithreading inference
-
-You can also do multi-threading inference with DJL. For example, if you would like to run the inference with 10 threads:
-
-```
--t 10
-```
-
-Best thread number for your system: The same number of cores your system have or double of the total cores.
-
-You can also add `-l` to simulate the increment load for your inference server. It will add threads with the delay of time.
-
-```
--t 10 -l 100
-```
-
-The above code will create 10 threads with the wait time of 100ms.
-
-## Advanced use cases
-
-For different purposes, we designed different mode you can play with. Such as the following arg:
-
-```
--d 86400
-```
-
-This will ask the benchmark script repeatedly running the designed task for 86400 seconds (24 hour).
-If you would like to make sure DJL is stable in the long run, you can do that.
-
-You can also keep monitoring the DJL memory usages by enable the following flag:
-
-```
-export BENCHMARK_OPTS="-Dcollect-memory=true"
-```
-
-The memory report will be made available in `build/memory.log`.
+**This module has been moved to [deepjavalibrary/djl-serving/benchmark](https://github.com/deepjavalibrary/djl-serving/tree/master/benchmark).**
diff --git a/extensions/benchmark/build.gradle b/extensions/benchmark/build.gradle
deleted file mode 100644
index fe3a54aa332..00000000000
--- a/extensions/benchmark/build.gradle
+++ /dev/null
@@ -1,151 +0,0 @@
-plugins {
-    id 'application'
-    id "nebula.ospackage" version "9.0.0"
-}
-
-boolean isRelease = project.hasProperty("release") || project.hasProperty("staging")
-
-dependencies {
-    implementation "commons-cli:commons-cli:${commons_cli_version}"
-    implementation "org.apache.logging.log4j:log4j-slf4j-impl:${log4j_slf4j_version}"
-    if (isRelease) {
-        implementation platform("ai.djl:bom:${djl_version}")
-
-        implementation "ai.djl:model-zoo"
-        runtimeOnly "ai.djl.pytorch:pytorch-model-zoo"
-        runtimeOnly "ai.djl.tensorflow:tensorflow-model-zoo"
-        runtimeOnly "ai.djl.mxnet:mxnet-model-zoo"
-        runtimeOnly "ai.djl.paddlepaddle:paddlepaddle-model-zoo"
-        runtimeOnly "ai.djl.onnxruntime:onnxruntime-engine"
-        runtimeOnly "ai.djl.tflite:tflite-engine"
-        runtimeOnly "ai.djl.dlr:dlr-engine"
-        runtimeOnly "ai.djl.ml.xgboost:xgboost"
-        runtimeOnly "ai.djl.python:python"
-        runtimeOnly "ai.djl.tensorrt:tensorrt"
-    } else {
-        implementation project(":model-zoo")
-
-        runtimeOnly project(":engines:pytorch:pytorch-model-zoo")
-        runtimeOnly project(":engines:tensorflow:tensorflow-model-zoo")
-        runtimeOnly project(":engines:mxnet:mxnet-model-zoo")
-        runtimeOnly project(":engines:paddlepaddle:paddlepaddle-model-zoo")
-
-        runtimeOnly project(":engines:tflite:tflite-engine")
-        runtimeOnly project(":engines:tensorrt")
-        ProcessBuilder pb = new ProcessBuilder("nvidia-smi", "-L")
-        def hasGPU = false;
-        try {
-            Process process = pb.start()
-            hasGPU = process.waitFor() == 0
-        } catch (IOException ignore) {
-        }
-
-        if (hasGPU) {
-            runtimeOnly(project(":engines:onnxruntime:onnxruntime-engine")) {
-                exclude group: "com.microsoft.onnxruntime", module: "onnxruntime"
-            }
-            runtimeOnly "com.microsoft.onnxruntime:onnxruntime_gpu:${onnxruntime_version}"
-        } else {
-            runtimeOnly project(":engines:onnxruntime:onnxruntime-engine")
-        }
-
-        runtimeOnly project(":engines:dlr:dlr-engine")
-        runtimeOnly(project(":engines:ml:xgboost")) {
-            exclude group: "ml.dmlc", module: "xgboost4j_2.12"
-        }
-    }
-
-    testImplementation("org.testng:testng:${testng_version}") {
-        exclude group: "junit", module: "junit"
-    }
-}
-
-application {
-    mainClass = System.getProperty("main", "ai.djl.benchmark.Benchmark")
-}
-
-run {
-    environment("TF_CPP_MIN_LOG_LEVEL", "1") // turn off TensorFlow print out
-    systemProperties System.getProperties()
-    systemProperties.remove("user.dir")
-    systemProperty("file.encoding", "UTF-8")
-}
-
-task benchmark(type: JavaExec) {
-    environment("TF_CPP_MIN_LOG_LEVEL", "1") // turn off TensorFlow print out
-    List<String> arguments = gradle.startParameter["taskRequests"]["args"].getAt(0)
-    for (String argument : arguments) {
-        if (argument.trim().startsWith("--args")) {
-            String[] line = argument.split("=", 2)
-            if (line.length == 2) {
-                line = line[1].split(" ")
-                if (line.contains("-t")) {
-                    if (System.getProperty("ai.djl.default_engine") == "TensorFlow") {
-                        environment("OMP_NUM_THREADS", "1")
-                        environment("TF_NUM_INTRAOP_THREADS", "1")
-                    } else {
-                        environment("MXNET_ENGINE_TYPE", "NaiveEngine")
-                        environment("OMP_NUM_THREADS", "1")
-                    }
-                }
-                break
-            }
-        }
-    }
-
-    systemProperties System.getProperties()
-    systemProperties.remove("user.dir")
-    systemProperty("file.encoding", "UTF-8")
-    classpath = sourceSets.main.runtimeClasspath
-    // restrict the jvm heap size for better monitoring benchmark
-    jvmArgs = ["-Xmx2g"]
-    if (Boolean.getBoolean("loggc")) {
-        if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
-            jvmArgs += ["-XX:+PrintGCTimeStamps", "-Xloggc:build/gc.log"]
-        } else {
-            jvmArgs += ["-Xlog:gc*=debug:file=build/gc.log"]
-        }
-    }
-    mainClass = "ai.djl.benchmark.Benchmark"
-}
-
-task createDeb(type: Deb, dependsOn: distTar) {
-    doFirst {
-        exec {
-            commandLine "tar", "xvf", "${project.buildDir}/distributions/benchmark-${project.version}.tar", "-C", "${project.buildDir}"
-        }
-    }
-
-    packageName = "djl-bench"
-    archiveVersion = "${djl_version}"
-    release = 1
-    maintainer = "Deep Java Library <djl-dev@amazon.com>"
-    summary = "djl-bench is a command line tool that allows you to benchmark the\n" +
-            "  model on all different platforms for single-thread/multi-thread\n" +
-            "  inference performance."
-
-    from("${project.buildDir}/benchmark-${project.version}") {
-        into "/usr/local/djl-bench-${djl_version}"
-    }
-    link("/usr/bin/djl-bench", "/usr/local/djl-bench-${djl_version}/bin/benchmark")
-}
-
-startScripts {
-    defaultJvmOpts = []
-    doLast {
-        String replacement = 'CLASSPATH=\\$APP_HOME/lib/*\n\n' +
-                'if [[ "\\$*" == *-t* || "\\$*" == *--threads* ]]\n' +
-                'then\n' +
-                '    export TF_CPP_MIN_LOG_LEVEL=1\n' +
-                '    export MXNET_ENGINE_TYPE=NaiveEngine\n' +
-                '    export OMP_NUM_THREADS=1\n' +
-                '    export TF_NUM_INTRAOP_THREADS=1\n' +
-                'fi'
-
-        String text = unixScript.text.replaceAll('CLASSPATH=\\$APP_HOME/lib/.*', replacement)
-        text = text.replaceAll("/usr/bin/env sh", "/usr/bin/env bash")
-        text = text.replaceAll("#!/bin/sh", "#!/bin/bash")
-
-        unixScript.text = text
-    }
-}
diff --git a/extensions/benchmark/gradle b/extensions/benchmark/gradle
deleted file mode 120000
index 1ce6c4c1ed0..00000000000
--- a/extensions/benchmark/gradle
+++ /dev/null
@@ -1 +0,0 @@
-../../gradle
\ No newline at end of file
diff --git a/extensions/benchmark/gradlew b/extensions/benchmark/gradlew
deleted file mode 120000
index 343e0d2caa4..00000000000
--- a/extensions/benchmark/gradlew
+++ /dev/null
@@ -1 +0,0 @@
-../../gradlew
\ No newline at end of file
diff --git a/extensions/benchmark/snapcraft/snapcraft.yaml b/extensions/benchmark/snapcraft/snapcraft.yaml
deleted file mode 100644
index afa85c747db..00000000000
--- a/extensions/benchmark/snapcraft/snapcraft.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-name: djlbench
-version: '0.17.0'
-title: DJL Benhmark
-license: Apache-2.0
-summary: A machine learning benchmarking toolkit
-description: |
-  djlbench is a command line tool that allows you to benchmark the
-  model on all different platforms for single-thread/multi-thread
-  inference performance.
-
-  Currently djlbench support the models from the following framework:
-  - PyTorch
-  - TensorFlow
-  - Apachmark MXNet
-  - PaddlePaddle
-  - ONNXRuntime
-  - TensorRT
-  - TensorFlow Lite
-  - Neo DLR
-  - XGBoost
-  - Python
-
-base: core18
-grade: stable
-confinement: classic
-
-apps:
-  djlbench:
-    command: benchmark-$SNAPCRAFT_PROJECT_VERSION/bin/benchmark
-    environment:
-      JAVA_HOME: "$SNAP/usr/lib/jvm/java-11-openjdk-amd64"
-      PATH: "$SNAP/bin:$PATH:$SNAP/usr/lib/jvm/java-11-openjdk-amd64/bin"
-
-parts:
-  djlbench:
-    plugin: gradle
-    source: https://github.com/deepjavalibrary/djl.git
-    source-tag: v$SNAPCRAFT_PROJECT_VERSION
-    gradle-output-dir: extensions/benchmark/build/libs
-    gradle-options: [ -Pstaging, ':extensions:benchmark:dT' ]
-    override-build: |
-      snapcraftctl build
-      tar xvf $SNAPCRAFT_PART_BUILD/extensions/benchmark/build/distributions/benchmark-*.tar -C $SNAPCRAFT_PART_INSTALL/
-      rm -rf $SNAPCRAFT_PART_INSTALL/jar
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/AbstractBenchmark.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/AbstractBenchmark.java
deleted file mode 100644
index 8ef27cb1fe7..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/AbstractBenchmark.java
+++ /dev/null
@@ -1,299 +0,0 @@
-/*
- * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.Device;
-import ai.djl.ModelException;
-import ai.djl.engine.Engine;
-import ai.djl.metric.Metrics;
-import ai.djl.metric.Unit;
-import ai.djl.ndarray.NDList;
-import ai.djl.ndarray.types.DataType;
-import ai.djl.ndarray.types.Shape;
-import ai.djl.repository.zoo.Criteria;
-import ai.djl.repository.zoo.ZooModel;
-import ai.djl.training.listener.MemoryTrainingListener;
-import ai.djl.training.util.ProgressBar;
-import ai.djl.translate.NoBatchifyTranslator;
-import ai.djl.translate.TranslateException;
-import ai.djl.translate.TranslatorContext;
-import ai.djl.util.Pair;
-import ai.djl.util.PairList;
-
-import org.apache.commons.cli.CommandLine;
-import org.apache.commons.cli.DefaultParser;
-import org.apache.commons.cli.Options;
-import org.apache.commons.cli.ParseException;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.nio.FloatBuffer;
-import java.time.Duration;
-
-/** Abstract benchmark class. */
-public abstract class AbstractBenchmark {
-
-    private static final Logger logger = LoggerFactory.getLogger(AbstractBenchmark.class);
-
-    protected ProgressBar progressBar;
-
-    /**
-     * Abstract predict method that must be implemented by sub class.
-     *
-     * @param arguments command line arguments
-     * @param metrics {@link Metrics} to collect statistic information
-     * @param iteration number of prediction iteration to run
-     * @return prediction result
-     * @throws IOException if io error occurs when loading model.
-     * @throws ModelException if specified model not found or there is a parameter error
-     * @throws TranslateException if error occurs when processing input or output
-     * @throws ClassNotFoundException if input or output class cannot be loaded
-     */
-    protected abstract float[] predict(Arguments arguments, Metrics metrics, int iteration)
-            throws IOException, ModelException, TranslateException, ClassNotFoundException;
-
-    /**
-     * Execute benchmark.
-     *
-     * @param args input raw arguments
-     * @return if example execution complete successfully
-     */
-    public final boolean runBenchmark(String[] args) {
-        Options options = Arguments.getOptions();
-        try {
-            if (Arguments.hasHelp(args)) {
-                Arguments.printHelp(
-                        "usage: djl-bench [-p MODEL-PATH] -s INPUT-SHAPES [OPTIONS]", options);
-                return true;
-            }
-            DefaultParser parser = new DefaultParser();
-            CommandLine cmd = parser.parse(options, args, null, false);
-            Arguments arguments = new Arguments(cmd);
-            String engineName = arguments.getEngine();
-            Engine engine = Engine.getEngine(engineName);
-
-            long init = System.nanoTime();
-            String version = engine.getVersion();
-            long loaded = System.nanoTime();
-            logger.info(
-                    String.format(
-                            "Load %s (%s) in %.3f ms.",
-                            engineName, version, (loaded - init) / 1_000_000f));
-            Duration duration = Duration.ofSeconds(arguments.getDuration());
-            Object devices;
-            if (this instanceof MultithreadedBenchmark) {
-                devices = engine.getDevices(arguments.getMaxGpus());
-            } else {
-                devices = engine.defaultDevice();
-            }
-
-            if (arguments.getDuration() != 0) {
-                logger.info(
-                        "Running {} on: {}, duration: {} minutes.",
-                        getClass().getSimpleName(),
-                        devices,
-                        duration.toMinutes());
-            } else {
-                logger.info("Running {} on: {}.", getClass().getSimpleName(), devices);
-            }
-            int numOfThreads = arguments.getThreads();
-            int iteration = arguments.getIteration();
-            if (this instanceof MultithreadedBenchmark) {
-                int expected = 10 * numOfThreads;
-                if (iteration < expected) {
-                    iteration = expected;
-                    logger.info(
-                            "Iteration is too small for multi-threading benchmark. Adjust to: {}",
-                            iteration);
-                }
-            }
-            while (!duration.isNegative()) {
-                Metrics metrics = new Metrics(); // Reset Metrics for each test loop.
-                progressBar = new ProgressBar("Iteration", iteration);
-                float[] lastResult = predict(arguments, metrics, iteration);
-                if (lastResult == null) {
-                    return false;
-                }
-
-                long begin = metrics.getMetric("start").get(0).getValue().longValue();
-                long end = metrics.getMetric("end").get(0).getValue().longValue();
-                long totalTime = end - begin;
-
-                if (lastResult.length > 3) {
-                    logger.info(
-                            "Inference result: [{}, {}, {} ...]",
-                            lastResult[0],
-                            lastResult[1],
-                            lastResult[2]);
-                } else {
-                    logger.info("Inference result: {}", lastResult);
-                }
-
-                String throughput = String.format("%.2f", iteration * 1000d / totalTime);
-                logger.info(
-                        "Throughput: {}, completed {} iteration in {} ms.",
-                        throughput,
-                        iteration,
-                        totalTime);
-
-                if (metrics.hasMetric("LoadModel")) {
-                    long loadModelTime =
-                            metrics.getMetric("LoadModel").get(0).getValue().longValue();
-                    logger.info(
-                            "Model loading time: {} ms.",
-                            String.format("%.3f", loadModelTime / 1000f));
-                }
-
-                if (metrics.hasMetric("Inference") && iteration > 1) {
-                    float totalP50 = metrics.percentile("Total", 50).getValue().longValue() / 1000f;
-                    float totalP90 = metrics.percentile("Total", 90).getValue().longValue() / 1000f;
-                    float totalP99 = metrics.percentile("Total", 99).getValue().longValue() / 1000f;
-                    float p50 = metrics.percentile("Inference", 50).getValue().longValue() / 1000f;
-                    float p90 = metrics.percentile("Inference", 90).getValue().longValue() / 1000f;
-                    float p99 = metrics.percentile("Inference", 99).getValue().longValue() / 1000f;
-                    float preP50 =
-                            metrics.percentile("Preprocess", 50).getValue().longValue() / 1000f;
-                    float preP90 =
-                            metrics.percentile("Preprocess", 90).getValue().longValue() / 1000f;
-                    float preP99 =
-                            metrics.percentile("Preprocess", 99).getValue().longValue() / 1000f;
-                    float postP50 =
-                            metrics.percentile("Postprocess", 50).getValue().longValue() / 1000f;
-                    float postP90 =
-                            metrics.percentile("Postprocess", 90).getValue().longValue() / 1000f;
-                    float postP99 =
-                            metrics.percentile("Postprocess", 99).getValue().longValue() / 1000f;
-                    logger.info(
-                            String.format(
-                                    "total P50: %.3f ms, P90: %.3f ms, P99: %.3f ms",
-                                    totalP50, totalP90, totalP99));
-                    logger.info(
-                            String.format(
-                                    "inference P50: %.3f ms, P90: %.3f ms, P99: %.3f ms",
-                                    p50, p90, p99));
-                    logger.info(
-                            String.format(
-                                    "preprocess P50: %.3f ms, P90: %.3f ms, P99: %.3f ms",
-                                    preP50, preP90, preP99));
-                    logger.info(
-                            String.format(
-                                    "postprocess P50: %.3f ms, P90: %.3f ms, P99: %.3f ms",
-                                    postP50, postP90, postP99));
-
-                    if (Boolean.getBoolean("collect-memory")) {
-                        float heapBeforeModel =
-                                metrics.getMetric("Heap").get(0).getValue().longValue();
-                        float heapBeforeInference =
-                                metrics.getMetric("Heap").get(1).getValue().longValue();
-                        float heap = metrics.percentile("Heap", 90).getValue().longValue();
-                        float nonHeap = metrics.percentile("NonHeap", 90).getValue().longValue();
-                        int mb = 1024 * 1024;
-                        logger.info(String.format("heap (base): %.3f MB", heapBeforeModel / mb));
-                        logger.info(
-                                String.format("heap (model): %.3f MB", heapBeforeInference / mb));
-                        logger.info(String.format("heap P90: %.3f MB", heap / mb));
-                        logger.info(String.format("nonHeap P90: %.3f MB", nonHeap / mb));
-
-                        if (!System.getProperty("os.name").startsWith("Win")) {
-                            float rssBeforeModel =
-                                    metrics.getMetric("rss").get(0).getValue().longValue();
-                            float rssBeforeInference =
-                                    metrics.getMetric("rss").get(1).getValue().longValue();
-                            float rss = metrics.percentile("rss", 90).getValue().longValue();
-                            float cpu = metrics.percentile("cpu", 90).getValue().longValue();
-                            logger.info(String.format("cpu P90: %.3f %%", cpu));
-                            logger.info(String.format("rss (base): %.3f MB", rssBeforeModel / mb));
-                            logger.info(
-                                    String.format("rss (model): %.3f MB", rssBeforeInference / mb));
-                            logger.info(String.format("rss P90: %.3f MB", rss / mb));
-                        }
-                    }
-                }
-                MemoryTrainingListener.dumpMemoryInfo(metrics, arguments.getOutputDir());
-                long delta = System.currentTimeMillis() - begin;
-                duration = duration.minus(Duration.ofMillis(delta));
-                if (!duration.isNegative()) {
-                    logger.info(duration.toMinutes() + " minutes left");
-                }
-            }
-            return true;
-        } catch (ParseException e) {
-            Arguments.printHelp(e.getMessage(), options);
-        } catch (TranslateException | ModelException | IOException | ClassNotFoundException t) {
-            logger.error("Unexpected error", t);
-        }
-        return false;
-    }
-
-    protected ZooModel<Void, float[]> loadModel(Arguments arguments, Metrics metrics, Device device)
-            throws ModelException, IOException {
-        long begin = System.nanoTime();
-        PairList<DataType, Shape> shapes = arguments.getInputShapes();
-        BenchmarkTranslator translator = new BenchmarkTranslator(shapes);
-
-        Criteria<Void, float[]> criteria =
-                Criteria.builder()
-                        .setTypes(Void.class, float[].class)
-                        .optModelUrls(arguments.getModelUrl())
-                        .optModelName(arguments.getModelName())
-                        .optEngine(arguments.getEngine())
-                        .optOptions(arguments.getModelOptions())
-                        .optArguments(arguments.getModelArguments())
-                        .optDevice(device)
-                        .optTranslator(translator)
-                        .optProgress(new ProgressBar())
-                        .build();
-
-        ZooModel<Void, float[]> model = criteria.loadModel();
-        if (device == Device.cpu() || device == Device.gpu()) {
-            long delta = (System.nanoTime() - begin) / 1000;
-            logger.info(
-                    "Model {} loaded in: {} ms.",
-                    model.getName(),
-                    String.format("%.3f", delta / 1000f));
-            metrics.addMetric("LoadModel", delta, Unit.MICROSECONDS);
-        }
-        return model;
-    }
-
-    private static final class BenchmarkTranslator implements NoBatchifyTranslator<Void, float[]> {
-
-        private PairList<DataType, Shape> shapes;
-
-        public BenchmarkTranslator(PairList<DataType, Shape> shapes) {
-            this.shapes = shapes;
-        }
-
-        /** {@inheritDoc} */
-        @Override
-        public NDList processInput(TranslatorContext ctx, Void input) {
-            NDList list = new NDList();
-            for (Pair<DataType, Shape> pair : shapes) {
-                DataType dataType = pair.getKey();
-                Shape shape = pair.getValue();
-                list.add(ctx.getNDManager().zeros(shape, dataType));
-            }
-            return list;
-        }
-
-        /** {@inheritDoc} */
-        @Override
-        public float[] processOutput(TranslatorContext ctx, NDList list) {
-            FloatBuffer fb = list.get(0).toByteBuffer().asFloatBuffer();
-            float[] ret = new float[fb.remaining()];
-            fb.get(ret);
-            return ret;
-        }
-    }
-}
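For reference, `BenchmarkTranslator.processOutput` above copies the first output tensor into a `float[]` by viewing its bytes through `ByteBuffer.asFloatBuffer()`. The plain-JDK sketch below isolates that copy with no DJL dependency. The class name `FloatCopy` is hypothetical. The native byte order used in `main` is an assumption about how the engine lays out tensor bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

/** Hypothetical helper mirroring BenchmarkTranslator.processOutput without DJL. */
final class FloatCopy {

    private FloatCopy() {}

    /** Copies every float between the buffer's position and limit into a new array. */
    static float[] toFloatArray(ByteBuffer bytes) {
        // The view shares content with the source and inherits its current byte order.
        FloatBuffer fb = bytes.asFloatBuffer();
        float[] ret = new float[fb.remaining()];
        fb.get(ret); // bulk copy advances the view only, not the source buffer
        return ret;
    }

    public static void main(String[] args) {
        // Assumption: tensor bytes are stored in native byte order.
        ByteBuffer bb = ByteBuffer.allocate(3 * Float.BYTES).order(ByteOrder.nativeOrder());
        bb.putFloat(1.5f).putFloat(-2f).putFloat(0f);
        bb.flip(); // switch from writing to reading
        System.out.println(Arrays.toString(toFloatArray(bb))); // [1.5, -2.0, 0.0]
    }
}
```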
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/Arguments.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/Arguments.java
deleted file mode 100644
index 4a96e934fe5..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/Arguments.java
+++ /dev/null
@@ -1,340 +0,0 @@
-/*
- * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.Device;
-import ai.djl.engine.Engine;
-import ai.djl.ndarray.types.DataType;
-import ai.djl.ndarray.types.Shape;
-import ai.djl.util.PairList;
-
-import org.apache.commons.cli.CommandLine;
-import org.apache.commons.cli.HelpFormatter;
-import org.apache.commons.cli.Option;
-import org.apache.commons.cli.OptionGroup;
-import org.apache.commons.cli.Options;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.nio.file.Path;
-import java.nio.file.Paths;
-import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
-
-/** A class represents parsed command line arguments. */
-public class Arguments {
-
-    private static final Logger logger = LoggerFactory.getLogger(Arguments.class);
-
-    private String modelUrl;
-    private String modelName;
-    private String engine;
-    private String modelOptions;
-    private String modelArguments;
-    private String outputDir;
-    private int duration;
-    private int iteration;
-    private int threads;
-    private int maxGpus;
-    private int neuronCores;
-    private int delay;
-    private PairList<DataType, Shape> inputShapes;
-
-    /**
-     * Constructs a {@code Arguments} instance.
-     *
-     * @param cmd command line options
-     */
-    Arguments(CommandLine cmd) {
-        if (cmd.hasOption("model-path")) {
-            String modelPath = cmd.getOptionValue("model-path");
-            Path path = Paths.get(modelPath);
-            try {
-                modelUrl = path.toUri().toURL().toExternalForm();
-            } catch (IOException e) {
-                throw new IllegalArgumentException("Invalid model-path: " + modelPath, e);
-            }
-        } else if (cmd.hasOption("model-url")) {
-            modelUrl = cmd.getOptionValue("model-url");
-        }
-
-        modelName = cmd.getOptionValue("model-name");
-        modelOptions = cmd.getOptionValue("model-options");
-        modelArguments = cmd.getOptionValue("model-arguments");
-        outputDir = cmd.getOptionValue("output-dir");
-
-        if (cmd.hasOption("engine")) {
-            engine = cmd.getOptionValue("engine");
-        } else {
-            engine = Engine.getDefaultEngineName();
-        }
-
-        if (cmd.hasOption("duration")) {
-            duration = Integer.parseInt(cmd.getOptionValue("duration"));
-        }
-        iteration = 1;
-        if (cmd.hasOption("iteration")) {
-            iteration = Integer.parseInt(cmd.getOptionValue("iteration"));
-        }
-        if (cmd.hasOption("gpus")) {
-            maxGpus = Integer.parseInt(cmd.getOptionValue("gpus"));
-            if (maxGpus < 0) {
-                maxGpus = Integer.MAX_VALUE;
-            }
-        } else {
-            maxGpus = Integer.MAX_VALUE;
-        }
-        if (cmd.hasOption("neuron-cores")) {
-            neuronCores = Integer.parseInt(cmd.getOptionValue("neuron-cores"));
-        }
-        if (cmd.hasOption("threads")) {
-            threads = Integer.parseInt(cmd.getOptionValue("threads"));
-            Engine eng = Engine.getEngine(engine);
-            Device[] devices = eng.getDevices(maxGpus);
-            if (devices[0].isGpu()) {
-                // one thread per GPU
-                if (threads <= 0) {
-                    threads = devices.length;
-                } else if (threads < devices.length) {
-                    threads = devices.length;
-                    logger.warn(
-                            "Number of threads is less than GPU count, adjust to: {}",
-                            devices.length);
-                } else if ("MXNet".equals(engine) && threads > devices.length) {
-                    threads = devices.length;
-                    logger.warn("MXNet inference can only have one worker per GPU.");
-                } else if (threads % devices.length != 0) {
-                    threads = threads / devices.length * devices.length;
-                    logger.warn("threads should be multiple of GPU count, change to: {}", threads);
-                }
-            } else if (threads <= 0) {
-                threads = Runtime.getRuntime().availableProcessors();
-            }
-        }
-        if (cmd.hasOption("delay")) {
-            delay = Integer.parseInt(cmd.getOptionValue("delay"));
-        }
-
-        String shape = cmd.getOptionValue("input-shapes");
-        inputShapes = NDListGenerator.parseShape(shape);
-    }
-
-    static Options getOptions() {
-        Options options = new Options();
-        options.addOption(
-                Option.builder("h").longOpt("help").hasArg(false).desc("Print this help.").build());
-        OptionGroup artifactGroup = new OptionGroup();
-        artifactGroup.setRequired(true);
-        artifactGroup.addOption(
-                Option.builder("p")
-                        .longOpt("model-path")
-                        .hasArg()
-                        .argName("MODEL-PATH")
-                        .desc("Model directory file path.")
-                        .build());
-        artifactGroup.addOption(
-                Option.builder("u")
-                        .longOpt("model-url")
-                        .hasArg()
-                        .argName("MODEL-URL")
-                        .desc("Model archive file URL.")
-                        .build());
-        options.addOptionGroup(artifactGroup);
-        options.addOption(
-                Option.builder("n")
-                        .longOpt("model-name")
-                        .hasArg()
-                        .argName("MODEL-NAME")
-                        .desc("Specify model file name.")
-                        .build());
-        options.addOption(
-                Option.builder()
-                        .longOpt("model-options")
-                        .hasArg()
-                        .argName("MODEL-OPTIONS")
-                        .desc("Specify model loading options.")
-                        .build());
-        options.addOption(
-                Option.builder()
-                        .longOpt("model-arguments")
-                        .hasArg()
-                        .argName("MODEL-ARGUMENTS")
-                        .desc("Specify model loading arguments.")
-                        .build());
-        options.addOption(
-                Option.builder("e")
-                        .longOpt("engine")
-                        .hasArg()
-                        .argName("ENGINE-NAME")
-                        .desc("Choose an Engine for the benchmark.")
-                        .build());
-        options.addOption(
-                Option.builder("s")
-                        .required()
-                        .longOpt("input-shapes")
-                        .hasArg()
-                        .argName("INPUT-SHAPES")
-                        .desc("Input data shapes for the model.")
-                        .build());
-        options.addOption(
-                Option.builder("d")
-                        .longOpt("duration")
-                        .hasArg()
-                        .argName("DURATION")
-                        .desc("Duration of the test in minutes.")
-                        .build());
-        options.addOption(
-                Option.builder("c")
-                        .longOpt("iteration")
-                        .hasArg()
-                        .argName("ITERATION")
-                        .desc("Number of total iterations.")
-                        .build());
-        options.addOption(
-                Option.builder("t")
-                        .longOpt("threads")
-                        .hasArg()
-                        .argName("NUMBER_THREADS")
-                        .desc("Number of inference threads.")
-                        .build());
-        OptionGroup deviceGroup = new OptionGroup();
-        deviceGroup.addOption(
-                Option.builder("g")
-                        .longOpt("gpus")
-                        .hasArg()
-                        .argName("NUMBER_GPUS")
-                        .desc("Number of GPUS to run multithreading inference.")
-                        .build());
-        deviceGroup.addOption(
-                Option.builder()
-                        .longOpt("neuron-cores")
-                        .hasArg()
-                        .argName("NEURON-CORES")
-                        .desc(
-                                "Number of neuron cores to run multithreading inference, See"
-                                        + " https://awsdocs-neuron.readthedocs-hosted.com.")
-                        .build());
-        options.addOptionGroup(deviceGroup);
-        options.addOption(
-                Option.builder("l")
-                        .longOpt("delay")
-                        .hasArg()
-                        .argName("DELAY")
-                        .desc("Delay of incremental threads.")
-                        .build());
-        options.addOption(
-                Option.builder("o")
-                        .longOpt("output-dir")
-                        .hasArg()
-                        .argName("OUTPUT-DIR")
-                        .desc("Directory for output logs.")
-                        .build());
-        return options;
-    }
-
-    static boolean hasHelp(String[] args) {
-        List<String> list = Arrays.asList(args);
-        return list.contains("-h") || list.contains("--help");
-    }
-
-    static void printHelp(String msg, Options options) {
-        HelpFormatter formatter = new HelpFormatter();
-        formatter.setSyntaxPrefix("");
-        formatter.setLeftPadding(1);
-        formatter.setWidth(120);
-        formatter.printHelp(msg, options);
-    }
-
-    int getDuration() {
-        return duration;
-    }
-
-    String getEngine() {
-        return engine;
-    }
-
-    String getModelUrl() {
-        return modelUrl;
-    }
-
-    String getModelName() {
-        return modelName;
-    }
-
-    Map<String, String> getModelOptions() {
-        if (modelOptions == null) {
-            return null;
-        }
-        Map<String, String> map = new ConcurrentHashMap<>();
-        for (String option : modelOptions.split(",")) {
-            String[] tokens = option.split("=", 2);
-            if (tokens.length == 2) {
-                map.put(tokens[0].trim(), tokens[1].trim());
-            } else {
-                map.put(tokens[0].trim(), "");
-            }
-        }
-        return map;
-    }
-
-    Map<String, Object> getModelArguments() {
-        if (modelArguments == null) {
-            return null;
-        }
-
-        Map<String, Object> map = new ConcurrentHashMap<>();
-        for (String option : modelArguments.split(",")) {
-            String[] tokens = option.split("=", 2);
-            if (tokens.length == 2) {
-                map.put(tokens[0].trim(), tokens[1].trim());
-            } else {
-                map.put(tokens[0].trim(), "");
-            }
-        }
-        return map;
-    }
-
-    int getIteration() {
-        return iteration;
-    }
-
-    int getThreads() {
-        return threads;
-    }
-
-    int getMaxGpus() {
-        return maxGpus;
-    }
-
-    int getNeuronCores() {
-        return neuronCores;
-    }
-
-    String getOutputDir() {
-        if (outputDir == null) {
-            outputDir = "build";
-        }
-        return outputDir;
-    }
-
-    int getDelay() {
-        return delay;
-    }
-
-    PairList<DataType, Shape> getInputShapes() {
-        return inputShapes;
-    }
-}
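The `Arguments` constructor above adjusts the requested thread count when the first device is a GPU. `getModelOptions` and `getModelArguments` split comma-separated `key=value` pairs. The minimal sketch below restates both rules as pure functions so they can be checked without loading an engine. The class and method names are hypothetical, and `HashMap` stands in for the original `ConcurrentHashMap`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pure-function version of two rules from the deleted Arguments class. */
final class BenchArgsSketch {

    private BenchArgsSketch() {}

    /**
     * Adjusts the requested thread count the way the deleted constructor does.
     *
     * @param requested value of --threads (<= 0 means "pick automatically")
     * @param gpuCount number of GPU devices found
     * @param onGpu whether the first device is a GPU
     * @param mxnet whether the engine is MXNet (one worker per GPU)
     * @param cpuCount available processors, used when running on CPU
     */
    static int adjustThreads(
            int requested, int gpuCount, boolean onGpu, boolean mxnet, int cpuCount) {
        if (onGpu) {
            if (requested <= 0 || requested < gpuCount) {
                return gpuCount; // at least one thread per GPU
            }
            if (mxnet && requested > gpuCount) {
                return gpuCount; // MXNet: exactly one worker per GPU
            }
            // Round down to a multiple of the GPU count (unchanged if already a multiple).
            return requested / gpuCount * gpuCount;
        }
        return requested <= 0 ? cpuCount : requested;
    }

    /** Parses "k1=v1,k2,k3=v3" into a map; a key without '=' maps to "". */
    static Map<String, String> parseOptions(String options) {
        if (options == null) {
            return null;
        }
        Map<String, String> map = new HashMap<>();
        for (String option : options.split(",")) {
            String[] tokens = option.split("=", 2); // split once, so values may contain '='
            map.put(tokens[0].trim(), tokens.length == 2 ? tokens[1].trim() : "");
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(adjustThreads(10, 4, true, false, 16)); // 8
        System.out.println(parseOptions("a=1, b ,c=x=y").get("c")); // x=y
    }
}
```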
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/Benchmark.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/Benchmark.java
deleted file mode 100644
index 3fad6c37798..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/Benchmark.java
+++ /dev/null
@@ -1,119 +0,0 @@
-/*
- * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.Device;
-import ai.djl.ModelException;
-import ai.djl.engine.Engine;
-import ai.djl.engine.EngineException;
-import ai.djl.inference.Predictor;
-import ai.djl.metric.Metrics;
-import ai.djl.metric.Unit;
-import ai.djl.repository.zoo.ZooModel;
-import ai.djl.training.listener.MemoryTrainingListener;
-import ai.djl.translate.TranslateException;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.util.Arrays;
-import java.util.List;
-
-/** A class runs single threaded benchmark. */
-public final class Benchmark extends AbstractBenchmark {
-
-    private static final Logger logger = LoggerFactory.getLogger(Benchmark.class);
-
-    /**
-     * Main entry point.
-     *
-     * @param args command line arguments
-     */
-    public static void main(String[] args) {
-        List<String> list = Arrays.asList(args);
-        try {
-            boolean success;
-            if (!list.isEmpty() && "ndlist-gen".equals(list.get(0))) {
-                success = NDListGenerator.generate(Arrays.copyOfRange(args, 1, args.length));
-            } else {
-                boolean multithreading = list.contains("-t") || list.contains("--threads");
-                configEngines(multithreading);
-                if (multithreading) {
-                    success = new MultithreadedBenchmark().runBenchmark(args);
-                } else {
-                    success = new Benchmark().runBenchmark(args);
-                }
-            }
-            if (!success) {
-                System.exit(-1); // NOPMD
-            }
-        } catch (EngineException e) {
-            String osName = System.getProperty("os.name");
-            String arch = System.getProperty("os.arch");
-            logger.warn("Engine is not supported on {}:{}.", osName, arch);
-            logger.debug("Failed to load engine", e);
-        }
-    }
-
-    /** {@inheritDoc} */
-    @Override
-    public float[] predict(Arguments arguments, Metrics metrics, int iteration)
-            throws IOException, ModelException, TranslateException {
-        Device device = Engine.getEngine(arguments.getEngine()).defaultDevice();
-        try (ZooModel<Void, float[]> model = loadModel(arguments, metrics, device)) {
-            float[] predictResult = null;
-
-            try (Predictor<Void, float[]> predictor = model.newPredictor()) {
-                predictor.predict(null); // warmup
-
-                predictor.setMetrics(metrics); // Let predictor collect metrics
-                metrics.addMetric("start", System.currentTimeMillis(), Unit.MILLISECONDS);
-                for (int i = 0; i < iteration; ++i) {
-                    predictResult = predictor.predict(null);
-
-                    progressBar.update(i);
-                    MemoryTrainingListener.collectMemoryInfo(metrics);
-                }
-                metrics.addMetric("end", System.currentTimeMillis(), Unit.MILLISECONDS);
-            }
-            return predictResult;
-        }
-    }
-
-    private static void configEngines(boolean multithreading) {
-        if (multithreading) {
-            if (System.getProperty("ai.djl.pytorch.num_interop_threads") == null) {
-                System.setProperty("ai.djl.pytorch.num_interop_threads", "1");
-            }
-            if (System.getProperty("ai.djl.pytorch.num_threads") == null) {
-                System.setProperty("ai.djl.pytorch.num_threads", "1");
-            }
-        }
-        if (System.getProperty("ai.djl.tflite.disable_alternative") == null) {
-            System.setProperty("ai.djl.tflite.disable_alternative", "true");
-        }
-        if (System.getProperty("ai.djl.dlr.disable_alternative") == null) {
-            System.setProperty("ai.djl.dlr.disable_alternative", "true");
-        }
-        if (System.getProperty("ai.djl.paddlepaddle.disable_alternative") == null) {
-            System.setProperty("ai.djl.paddlepaddle.disable_alternative", "true");
-        }
-        if (System.getProperty("ai.djl.onnx.disable_alternative") == null) {
-            System.setProperty("ai.djl.onnx.disable_alternative", "true");
-        }
-        if (System.getProperty("ai.djl.tensorrt.disable_alternative") == null) {
-            System.setProperty("ai.djl.tensorrt.disable_alternative", "true");
-        }
-    }
-}
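`configEngines` above fills in a system property only when the user has not already supplied it with `-D`, so command-line overrides take precedence. The sketch below factors that pattern into a hypothetical `setIfAbsent` helper. The property keys are the ones listed in the deleted code, and the class name is illustrative.

```java
import java.util.Arrays;

/** Hypothetical restatement of the deleted Benchmark.configEngines defaults. */
final class EngineDefaults {

    private EngineDefaults() {}

    /** Sets a system property only if it has not been set already (e.g. via -D). */
    static void setIfAbsent(String key, String value) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }

    static void configure(boolean multithreading) {
        if (multithreading) {
            // One intra-op and one inter-op thread per PyTorch worker when many workers run.
            setIfAbsent("ai.djl.pytorch.num_interop_threads", "1");
            setIfAbsent("ai.djl.pytorch.num_threads", "1");
        }
        for (String engine : Arrays.asList("tflite", "dlr", "paddlepaddle", "onnx", "tensorrt")) {
            setIfAbsent("ai.djl." + engine + ".disable_alternative", "true");
        }
    }

    public static void main(String[] args) {
        configure(true);
        System.out.println(System.getProperty("ai.djl.pytorch.num_threads"));
    }
}
```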
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java
deleted file mode 100644
index 438b176d4ef..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/MultithreadedBenchmark.java
+++ /dev/null
@@ -1,194 +0,0 @@
-/*
- * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.Device;
-import ai.djl.ModelException;
-import ai.djl.engine.Engine;
-import ai.djl.inference.Predictor;
-import ai.djl.metric.Metrics;
-import ai.djl.metric.Unit;
-import ai.djl.repository.zoo.ZooModel;
-import ai.djl.training.listener.MemoryTrainingListener;
-import ai.djl.translate.TranslateException;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.concurrent.Callable;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
-import java.util.concurrent.Future;
-import java.util.concurrent.atomic.AtomicInteger;
-
-/** A class runs single threaded benchmark. */
-public class MultithreadedBenchmark extends AbstractBenchmark {
-
-    private static final Logger logger = LoggerFactory.getLogger(MultithreadedBenchmark.class);
-
-    /** {@inheritDoc} */
-    @Override
-    public float[] predict(Arguments arguments, Metrics metrics, int iteration)
-            throws IOException, ModelException, TranslateException {
-
-        MemoryTrainingListener.collectMemoryInfo(metrics); // Measure memory before loading model
-
-        Engine engine = Engine.getEngine(arguments.getEngine());
-        Device[] devices = engine.getDevices(arguments.getMaxGpus());
-        int numOfThreads = arguments.getThreads();
-        int neuronCores = arguments.getNeuronCores();
-        if (neuronCores > 0) {
-            devices = new Device[neuronCores];
-            Arrays.fill(devices, Device.cpu());
-            if (numOfThreads > 1) {
-                numOfThreads = 2 * neuronCores;
-            }
-        }
-
-        int delay = arguments.getDelay();
-        AtomicInteger counter = new AtomicInteger(iteration);
-        logger.info("Multithreading inference with {} threads.", numOfThreads);
-
-        List<ZooModel<Void, float[]>> models = new ArrayList<>(devices.length);
-        List<PredictorCallable> callables = new ArrayList<>(numOfThreads);
-        for (Device device : devices) {
-            ZooModel<Void, float[]> model = loadModel(arguments, metrics, device);
-            models.add(model);
-
-            for (int i = 0; i < numOfThreads / devices.length; ++i) {
-                callables.add(new PredictorCallable(model, metrics, counter, i, i == 0));
-            }
-        }
-
-        float[] result = null;
-        ExecutorService executorService = Executors.newFixedThreadPool(numOfThreads);
-
-        MemoryTrainingListener.collectMemoryInfo(metrics); // Measure memory before worker kickoff
-
-        int successThreads = 0;
-        try {
-            for (PredictorCallable callable : callables) {
-                callable.warmup();
-            }
-
-            metrics.addMetric("start", System.currentTimeMillis(), Unit.MILLISECONDS);
-            try {
-                List<Future<float[]>> futures;
-                if (delay > 0) {
-                    futures = new ArrayList<>();
-                    for (PredictorCallable callable : callables) {
-                        futures.add(executorService.submit(callable));
-                        Thread.sleep(delay);
-                    }
-                } else {
-                    futures = executorService.invokeAll(callables);
-                }
-
-                for (Future<float[]> future : futures) {
-                    result = future.get();
-                    if (result != null) {
-                        ++successThreads;
-                    }
-                }
-            } catch (InterruptedException | ExecutionException e) {
-                logger.error("", e);
-            }
-            metrics.addMetric("end", System.currentTimeMillis(), Unit.MILLISECONDS);
-            for (PredictorCallable callable : callables) {
-                callable.close();
-            }
-        } finally {
-            executorService.shutdown();
-        }
-
-        models.forEach(ZooModel::close);
-        if (successThreads != numOfThreads) {
-            logger.error("Only {}/{} threads finished.", successThreads, numOfThreads);
-            return null;
-        }
-
-        return result;
-    }
-
-    private static class PredictorCallable implements Callable<float[]> {
-
-        private Predictor<Void, float[]> predictor;
-
-        private Metrics metrics;
-        private String workerId;
-        private boolean collectMemory;
-        private AtomicInteger counter;
-        private int total;
-        private int steps;
-
-        public PredictorCallable(
-                ZooModel<Void, float[]> model,
-                Metrics metrics,
-                AtomicInteger counter,
-                int workerId,
-                boolean collectMemory) {
-            this.predictor = model.newPredictor();
-            this.metrics = metrics;
-            this.counter = counter;
-            this.workerId = String.format("%02d", workerId);
-            this.collectMemory = collectMemory;
-            predictor.setMetrics(metrics);
-            total = counter.get();
-            if (total < 10) {
-                steps = 1;
-            } else {
-                steps = (int) Math.pow(10, (int) Math.log10(total));
-            }
-        }
-
-        /** {@inheritDoc} */
-        @Override
-        public float[] call() throws Exception {
-            float[] result = null;
-            int count = 0;
-            int remaining;
-            while ((remaining = counter.decrementAndGet()) > 0 || result == null) {
-                try {
-                    result = predictor.predict(null);
-                } catch (Exception e) {
-                    // stop immediately when we find any exception
-                    counter.set(0);
-                    throw e;
-                }
-                if (collectMemory) {
-                    MemoryTrainingListener.collectMemoryInfo(metrics);
-                }
-                int processed = total - remaining + 1;
-                logger.trace("Worker-{}: {} iteration finished.", workerId, ++count);
-                if (processed % steps == 0 || processed == total) {
-                    logger.info("Completed {} requests", processed);
-                }
-            }
-            logger.debug("Worker-{}: finished.", workerId);
-            return result;
-        }
-
-        public void warmup() throws TranslateException {
-            predictor.predict(null);
-        }
-
-        public void close() {
-            predictor.close();
-        }
-    }
-}
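`MultithreadedBenchmark` above has every worker draw from one shared `AtomicInteger`, so the request budget is split dynamically across threads. It logs progress at the largest power of ten not exceeding the total. The sketch below reproduces that scheme with a stand-in workload in place of `Predictor.predict`. It is a variant, not the original loop. It uses `getAndDecrement() > 0`, so exactly `total` requests run overall and a worker may serve none. The deleted loop, by contrast, also forces at least one call per worker (`|| result == null`). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared-countdown worker pool, modeled on the deleted PredictorCallable. */
final class CountdownWorkers {

    private CountdownWorkers() {}

    /** Logging interval: 1 below 10, else the largest power of ten <= total. */
    static int progressStep(int total) {
        return total < 10 ? 1 : (int) Math.pow(10, (int) Math.log10(total));
    }

    /** Runs {@code total} simulated requests on {@code threads} workers; returns per-worker counts. */
    static List<Integer> run(int total, int threads)
            throws InterruptedException, ExecutionException {
        AtomicInteger counter = new AtomicInteger(total);
        List<Callable<Integer>> workers = new ArrayList<>();
        for (int t = 0; t < threads; ++t) {
            workers.add(
                    () -> {
                        int served = 0;
                        // Each positive ticket is claimed by exactly one worker.
                        while (counter.getAndDecrement() > 0) {
                            served++; // a real worker would call predictor.predict(...) here
                        }
                        return served;
                    });
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(workers)) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int sum = run(1000, 4).stream().mapToInt(Integer::intValue).sum();
        System.out.println("served " + sum + " requests, log every " + progressStep(1000));
    }
}
```

Running `main` prints `served 1000 requests, log every 1000`, however the 1000 requests happen to be distributed among the four workers.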
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/NDListGenerator.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/NDListGenerator.java
deleted file mode 100644
index eb52178b6b5..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/NDListGenerator.java
+++ /dev/null
@@ -1,171 +0,0 @@
-/*
- * Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.Device;
-import ai.djl.ndarray.NDList;
-import ai.djl.ndarray.NDManager;
-import ai.djl.ndarray.types.DataType;
-import ai.djl.ndarray.types.Shape;
-import ai.djl.util.Pair;
-import ai.djl.util.PairList;
-
-import org.apache.commons.cli.CommandLine;
-import org.apache.commons.cli.DefaultParser;
-import org.apache.commons.cli.Option;
-import org.apache.commons.cli.Options;
-import org.apache.commons.cli.ParseException;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.BufferedOutputStream;
-import java.io.OutputStream;
-import java.nio.file.Files;
-import java.nio.file.Path;
-import java.nio.file.Paths;
-import java.util.Arrays;
-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
-
-/** A class generates NDList files. */
-final class NDListGenerator {
-
-    private static final Logger logger = LoggerFactory.getLogger(NDListGenerator.class);
-
-    private NDListGenerator() {}
-
-    static boolean generate(String[] args) {
-        Options options = getOptions();
-        try {
-            if (Arguments.hasHelp(args)) {
-                Arguments.printHelp(
-                        "usage: djl-bench ndlist-gen -s INPUT-SHAPES -o OUTPUT_FILE", options);
-                return true;
-            }
-            DefaultParser parser = new DefaultParser();
-            CommandLine cmd = parser.parse(options, args, null, false);
-            String inputShapes = cmd.getOptionValue("input-shapes");
-            String output = cmd.getOptionValue("output-file");
-            boolean ones = cmd.hasOption("ones");
-            Path path = Paths.get(output);
-
-            try (NDManager manager = NDManager.newBaseManager(Device.cpu(), "PyTorch")) {
-                NDList list = new NDList();
-                for (Pair<DataType, Shape> pair : parseShape(inputShapes)) {
-                    DataType dataType = pair.getKey();
-                    Shape shape = pair.getValue();
-                    if (ones) {
-                        list.add(manager.ones(shape, dataType));
-                    } else {
-                        list.add(manager.zeros(shape, dataType));
-                    }
-                }
-                try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(path))) {
-                    list.encode(os);
-                }
-            }
-            logger.info("NDList file created: {}", path.toAbsolutePath());
-            return true;
-        } catch (ParseException e) {
-            Arguments.printHelp(e.getMessage(), options);
-        } catch (Throwable t) {
-            logger.error("Unexpected error", t);
-        }
-        return false;
-    }
-
-    static PairList<DataType, Shape> parseShape(String shape) {
-        PairList<DataType, Shape> inputShapes = new PairList<>();
-        if (shape != null) {
-            if (shape.contains("(")) {
-                Pattern pattern =
-                        Pattern.compile("\\((\\s*(\\d+)([,\\s]+\\d+)*\\s*)\\)([sdubilBfS]?)");
-                Matcher matcher = pattern.matcher(shape);
-                while (matcher.find()) {
-                    String[] tokens = matcher.group(1).split(",");
-                    long[] array = Arrays.stream(tokens).mapToLong(Long::parseLong).toArray();
-                    DataType dataType;
-                    String dataTypeStr = matcher.group(4);
-                    if (dataTypeStr == null || dataTypeStr.isEmpty()) {
-                        dataType = DataType.FLOAT32;
-                    } else {
-                        switch (dataTypeStr) {
-                            case "s":
-                                dataType = DataType.FLOAT16;
-                                break;
-                            case "d":
-                                dataType = DataType.FLOAT64;
-                                break;
-                            case "u":
-                                dataType = DataType.UINT8;
-                                break;
-                            case "b":
-                                dataType = DataType.INT8;
-                                break;
-                            case "i":
-                                dataType = DataType.INT32;
-                                break;
-                            case "l":
-                                dataType = DataType.INT64;
-                                break;
-                            case "B":
-                                dataType = DataType.BOOLEAN;
-                                break;
-                            case "f":
-                                dataType = DataType.FLOAT32;
-                                break;
-                            default:
-                                throw new IllegalArgumentException("Invalid input-shape: " + shape);
-                        }
-                    }
-                    inputShapes.add(dataType, new Shape(array));
-                }
-            } else {
-                String[] tokens = shape.split(",");
-                long[] shapes = Arrays.stream(tokens).mapToLong(Long::parseLong).toArray();
-                inputShapes.add(DataType.FLOAT32, new Shape(shapes));
-            }
-        }
-        return inputShapes;
-    }
-
-    private static Options getOptions() {
-        Options options = new Options();
-        options.addOption(
-                Option.builder("h").longOpt("help").hasArg(false).desc("Print this help.").build());
-        options.addOption(
-                Option.builder("s")
-                        .required()
-                        .longOpt("input-shapes")
-                        .hasArg()
-                        .argName("INPUT-SHAPES")
-                        .desc("Input data shapes for the model.")
-                        .build());
-        options.addOption(
-                Option.builder("o")
-                        .required()
-                        .longOpt("output-file")
-                        .hasArg()
-                        .argName("OUTPUT-FILE")
-                        .desc("Write output NDList to file.")
-                        .build());
-        options.addOption(
-                Option.builder("1")
-                        .longOpt("ones")
-                        .hasArg(false)
-                        .argName("ones")
-                        .desc("Use all ones instead of zeros.")
-                        .build());
-        return options;
-    }
-}
diff --git a/extensions/benchmark/src/main/java/ai/djl/benchmark/package-info.java b/extensions/benchmark/src/main/java/ai/djl/benchmark/package-info.java
deleted file mode 100644
index 6436a24fe15..00000000000
--- a/extensions/benchmark/src/main/java/ai/djl/benchmark/package-info.java
+++ /dev/null
@@ -1,15 +0,0 @@
-/*
- * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-
-/** Contains benchmarking utility classes. */
-package ai.djl.benchmark;
diff --git a/extensions/benchmark/src/main/resources/log4j2.xml b/extensions/benchmark/src/main/resources/log4j2.xml
deleted file mode 100644
index cb6d3c6fbee..00000000000
--- a/extensions/benchmark/src/main/resources/log4j2.xml
+++ /dev/null
@@ -1,20 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<Configuration status="INFO">
-  <Appenders>
-    <Console name="console" target="SYSTEM_OUT">
-      <PatternLayout
-          pattern="[%-5level] - %msg%n"/>
-    </Console>
-  </Appenders>
-  <Loggers>
-    <Root level="info" additivity="false">
-      <AppenderRef ref="console"/>
-    </Root>
-    <Logger name="ai.djl" level="${sys:ai.djl.logging.level:-info}" additivity="false">
-      <AppenderRef ref="console"/>
-    </Logger>
-    <Logger name="ai.djl.repository.zoo" level="${sys:ai.djl.modelzoo.logging.level:-info}" additivity="false">
-      <AppenderRef ref="console"/>
-    </Logger>
-  </Loggers>
-</Configuration>
diff --git a/extensions/benchmark/src/test/java/ai/djl/benchmark/BenchmarkTest.java b/extensions/benchmark/src/test/java/ai/djl/benchmark/BenchmarkTest.java
deleted file mode 100644
index 3b53e6fc8dc..00000000000
--- a/extensions/benchmark/src/test/java/ai/djl/benchmark/BenchmarkTest.java
+++ /dev/null
@@ -1,123 +0,0 @@
-/*
- * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import ai.djl.ndarray.types.DataType;
-
-import org.apache.commons.cli.CommandLine;
-import org.apache.commons.cli.DefaultParser;
-import org.apache.commons.cli.Options;
-import org.apache.commons.cli.ParseException;
-import org.testng.Assert;
-import org.testng.annotations.Test;
-
-import java.net.MalformedURLException;
-import java.nio.file.Paths;
-import java.util.Map;
-
-public class BenchmarkTest {
-
-    @Test
-    public void testHelp() {
-        String[] args = {"-h"};
-        Benchmark.main(args);
-    }
-
-    @Test
-    public void testArguments() throws ParseException, MalformedURLException {
-        Options options = Arguments.getOptions();
-        DefaultParser parser = new DefaultParser();
-
-        String[] args = {
-            "-p",
-            "/opt/ml/resnet18_v1",
-            "-s",
-            "(1)s,(1)d,(1)u,(1)b,(1)i,(1)l,(1)B,(1)",
-            "--model-options",
-            "fp16,dlaCore=1",
-            "--model-arguments",
-            "width=28"
-        };
-        CommandLine cmd = parser.parse(options, args, null, false);
-        Arguments arguments = new Arguments(cmd);
-        String expected = Paths.get("/opt/ml/resnet18_v1").toUri().toURL().toString();
-        Assert.assertEquals(arguments.getModelUrl(), expected);
-        DataType[] types = arguments.getInputShapes().keyArray(new DataType[0]);
-        Assert.assertEquals(types[0], DataType.FLOAT16);
-        Assert.assertEquals(types[1], DataType.FLOAT64);
-        Assert.assertEquals(types[2], DataType.UINT8);
-        Assert.assertEquals(types[3], DataType.INT8);
-        Assert.assertEquals(types[4], DataType.INT32);
-        Assert.assertEquals(types[5], DataType.INT64);
-        Assert.assertEquals(types[6], DataType.BOOLEAN);
-        Assert.assertEquals(types[7], DataType.FLOAT32);
-
-        Assert.assertThrows(
-                IllegalArgumentException.class,
-                () -> {
-                    String[] arg = {"-p", "/opt/ml/resnet18_v1", "-s", "(1)S"};
-                    CommandLine commandLine = parser.parse(options, arg, null, false);
-                    new Arguments(commandLine);
-                });
-
-        Map<String, String> map = arguments.getModelOptions();
-        Assert.assertEquals(map.get("dlaCore"), "1");
-        Assert.assertTrue(map.containsKey("fp16"));
-
-        Map<String, Object> modelArguments = arguments.getModelArguments();
-        Assert.assertEquals(modelArguments.get("width"), "28");
-    }
-
-    @Test
-    public void testBenchmark() {
-        String[] args = {
-            "-e",
-            "PyTorch",
-            "-u",
-            "djl://ai.djl.pytorch/resnet/0.0.1/traced_resnet18",
-            "-s",
-            "1,3,224,224",
-            "-c",
-            "2"
-        };
-        new Benchmark().runBenchmark(args);
-    }
-
-    @Test
-    public void testMultithreadedBenchmark() {
-        System.setProperty("collect-memory", "true");
-        try {
-            String[] args = {
-                "-e",
-                "PyTorch",
-                "-u",
-                "djl://ai.djl.pytorch/resnet/0.0.1/traced_resnet18",
-                "-s",
-                "(1,3,224,224)f",
-                "-d",
-                "1",
-                "-l",
-                "1",
-                "-c",
-                "2",
-                "-t",
-                "-1",
-                "-g",
-                "-1"
-            };
-            Benchmark.main(args);
-        } finally {
-            System.clearProperty("collect-memory");
-        }
-    }
-}
diff --git a/extensions/benchmark/src/test/java/ai/djl/benchmark/NDListGeneratorTest.java b/extensions/benchmark/src/test/java/ai/djl/benchmark/NDListGeneratorTest.java
deleted file mode 100644
index 50964a9a1f5..00000000000
--- a/extensions/benchmark/src/test/java/ai/djl/benchmark/NDListGeneratorTest.java
+++ /dev/null
@@ -1,46 +0,0 @@
-/*
- * Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-package ai.djl.benchmark;
-
-import org.testng.Assert;
-import org.testng.annotations.Test;
-
-public class NDListGeneratorTest {
-
-    @Test
-    public void testHelp() {
-        String[] args = {"ndlist-gen", "-h"};
-        Benchmark.main(args);
-    }
-
-    @Test
-    public void testMissingOptions() {
-        String[] args = {"ndlist-gen", "-s"};
-        boolean success = NDListGenerator.generate(args);
-        Assert.assertFalse(success);
-    }
-
-    @Test
-    public void testOnes() {
-        String[] args = {"ndlist-gen", "-s", "1", "-o", "build/ones.ndlist", "-1"};
-        boolean success = NDListGenerator.generate(args);
-        Assert.assertTrue(success);
-    }
-
-    @Test
-    public void testZeros() {
-        String[] args = {"ndlist-gen", "-s", "1", "-o", "build/ones.ndlist"};
-        boolean success = NDListGenerator.generate(args);
-        Assert.assertTrue(success);
-    }
-}
diff --git a/extensions/benchmark/src/test/java/ai/djl/benchmark/package-info.java b/extensions/benchmark/src/test/java/ai/djl/benchmark/package-info.java
deleted file mode 100644
index fd842219c53..00000000000
--- a/extensions/benchmark/src/test/java/ai/djl/benchmark/package-info.java
+++ /dev/null
@@ -1,15 +0,0 @@
-/*
- * Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
- *
- * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
- * with the License. A copy of the License is located at
- *
- * http://aws.amazon.com/apache2.0/
- *
- * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
- * and limitations under the License.
- */
-
-/** Contains tests for the benchmark module. */
-package ai.djl.benchmark;
diff --git a/settings.gradle b/settings.gradle
index 859c5362eff..412daf4fee3 100644
--- a/settings.gradle
+++ b/settings.gradle
@@ -27,7 +27,6 @@ include ':engines:tflite:tflite-native'
 include ':examples'
 include 'extensions:audio'
 include ':extensions:aws-ai'
-include ':extensions:benchmark'
 include ':extensions:fasttext'
 include ':extensions:hadoop'
 include ':extensions:opencv'
diff --git a/tools/gradle/release.gradle b/tools/gradle/release.gradle
index 5645ad5fe0d..85662d8b1f1 100644
--- a/tools/gradle/release.gradle
+++ b/tools/gradle/release.gradle
@@ -59,7 +59,6 @@ task increaseFinalVersion {
         collection += fileTree(".").filter {
             it.name.endsWith(".md") || it.name.endsWith("overview.html")
         }
-        collection += file("extensions/benchmark/snapcraft/snapcraft.yaml")
 
         collection.each { File file ->
             file.text = file.text.replaceAll("/${previousVersion}/", "/${djl_version}/")