tf_cnn_benchmarks

Latest commit: Fix crash: "No module named 'tf_keras'". (5996abc · Sep 20, 2023)

Name	Last commit message	Last commit date
models	Update references from `variables.VariableV1` to its new location in …	Sep 19, 2023
platforms	Squash commit.	Mar 16, 2023
test_data	Squash commit.	Mar 16, 2023
README.md	Update README and fix a few TF2 compatibility issues.	Jan 16, 2020
all_reduce_benchmark.py	Squash commit.	Mar 16, 2023
all_reduce_benchmark_test.py	Squash commit.	Mar 16, 2023
allreduce.py	Squash commit.	Mar 16, 2023
allreduce_test.py	Squash commit.	Mar 16, 2023
batch_allreduce.py	Squash commit.	Mar 16, 2023
benchmark_cnn.py	Squash commit.	Mar 16, 2023
benchmark_cnn_distributed_test.py	Squash commit.	Mar 16, 2023
benchmark_cnn_distributed_test_runner.py	Squash commit.	Mar 16, 2023
benchmark_cnn_test.py	Internal Code Change	Sep 19, 2023
cnn_util.py	Squash commit.	Mar 16, 2023
cnn_util_test.py	Squash commit.	Mar 16, 2023
coco_metric.py	Squash commit.	Mar 16, 2023
constants.py	Squash commit.	Mar 16, 2023
convnet_builder.py	Fix crash: "No module named 'tf_keras'".	Sep 20, 2023
datasets.py	Squash commit.	Mar 16, 2023
flags.py	Squash commit.	Mar 16, 2023
leading_indicators_test.py	Squash commit.	Mar 16, 2023
mlperf.py	Squash commit.	Mar 16, 2023
mlperf_test.py	Squash commit.	Mar 16, 2023
preprocessing.py	Squash commit.	Mar 16, 2023
run_tests.py	Squash commit.	Mar 16, 2023
ssd_constants.py	Squash commit.	Mar 16, 2023
ssd_dataloader.py	Squash commit.	Mar 16, 2023
test_util.py	Squash commit.	Mar 16, 2023
tf_cnn_benchmarks.py	Squash commit.	Mar 16, 2023
variable_mgr.py	Squash commit.	Mar 16, 2023
variable_mgr_util.py	Update ops.Tensor references to //third_party/tensorflow/python/frame…	Sep 19, 2023
variable_mgr_util_test.py	Squash commit.	Mar 16, 2023

README.md

tf_cnn_benchmarks: High performance benchmarks

Note: tf_cnn_benchmarks is no longer maintained.

tf_cnn_benchmarks contains TensorFlow 1 implementations of several popular convolutional models, and is designed to be as fast as possible. It can run either on a single machine or in distributed mode across multiple hosts.

tf_cnn_benchmarks is no longer maintained. Although it will run with TensorFlow 2, it was written and optimized for TensorFlow 1, and has not been maintained since TensorFlow 2 was released. For clean and easy-to-read TensorFlow 2 models, please see the TensorFlow Official Models.

Getting Started

To run ResNet50 on a single GPU with synthetic data and no distortions, run:

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

Note that the master branch of tf_cnn_benchmarks occasionally requires the latest nightly version of TensorFlow. You can install the nightly version by running pip install tf-nightly-gpu in a clean environment, or by installing TensorFlow from source. We sometimes create a branch of tf_cnn_benchmarks, named cnn_tf_vX.Y_compatible, that is compatible with TensorFlow version X.Y. For example, branch cnn_tf_v1.9_compatible works with TensorFlow 1.9. However, because tf_cnn_benchmarks is no longer maintained, we will likely not create new branches.
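The branch naming scheme described above can be sketched as a small helper. The function name compatible_branch is hypothetical and not part of tf_cnn_benchmarks, and the helper only formats a name: a branch for a given TensorFlow version may not exist.

```python
def compatible_branch(tf_version: str) -> str:
    """Format the cnn_tf_vX.Y_compatible branch name for a version like '1.9.0'.

    Hypothetical helper for illustration; it does not check that the branch exists.
    """
    # Keep only the major and minor components; the patch level is ignored.
    major, minor = tf_version.split(".")[:2]
    return f"cnn_tf_v{major}.{minor}_compatible"


print(compatible_branch("1.9.0"))  # cnn_tf_v1.9_compatible
```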

Some important flags are:

model: Model to use, e.g. resnet50, inception3, vgg16, and alexnet.
num_gpus: Number of GPUs to use.
data_dir: Path to data to process. If not set, synthetic data is used. To use ImageNet data, use these instructions as a starting point.
batch_size: Batch size for each GPU.
variable_update: The method for managing variables: parameter_server, replicated, distributed_replicated, or independent.
local_parameter_device: Device to use as parameter server: cpu or gpu.

To see the full list of flags, run python tf_cnn_benchmarks.py --help.
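When launching many configurations from a script, it can help to assemble the command line from a dictionary of the flags listed above. The following is a minimal sketch: build_command is a hypothetical helper, not part of tf_cnn_benchmarks, and it assumes boolean flags follow the --flag / --noflag convention seen in this README (--use_fp16, --nodistortions).

```python
import shlex


def build_command(flags: dict) -> list:
    """Turn a {flag_name: value} mapping into an argv list (hypothetical helper)."""
    argv = ["python", "tf_cnn_benchmarks.py"]
    for name, value in flags.items():
        if isinstance(value, bool):
            # Assumed boolean convention: --name enables, --noname disables.
            argv.append(f"--{name}" if value else f"--no{name}")
        else:
            argv.append(f"--{name}={value}")
    return argv


command = build_command({
    "num_gpus": 1,
    "batch_size": 32,
    "model": "resnet50",
    "variable_update": "parameter_server",
})
# Print a shell-ready string; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Building an argv list rather than a single string avoids shell-quoting problems when a value such as data_dir contains spaces.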

To run ResNet50 with real data with 8 GPUs, run:

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=replicated \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

This trains a ResNet-50 model on ImageNet with a total batch size of 2048 (256 per GPU across 8 GPUs). The model should train to around 76% accuracy.
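The batch-size arithmetic can be checked with a few lines of Python. This is a rough sketch: the training-set size is an assumption (the 1,281,167 images of the ILSVRC-2012 training split), and the benchmark's own step count may round differently.

```python
import math

PER_GPU_BATCH_SIZE = 256  # --batch_size
NUM_GPUS = 8              # --num_gpus
NUM_EPOCHS = 90           # --num_epochs
# Assumption: ILSVRC-2012 training split size; not stated in this README.
NUM_TRAIN_IMAGES = 1_281_167

# batch_size is per GPU, so the effective batch is the product.
global_batch_size = PER_GPU_BATCH_SIZE * NUM_GPUS
steps_per_epoch = math.ceil(NUM_TRAIN_IMAGES / global_batch_size)
total_steps = steps_per_epoch * NUM_EPOCHS

print(global_batch_size)  # 2048
print(steps_per_epoch)    # 626
print(total_steps)        # 56340
```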

Running the tests

To run the tests, run:

pip install portpicker
python run_tests.py && python run_tests.py --run_distributed_tests

Note that the tests require portpicker, which the first command installs.

The command above runs a subset of tests that is both fast and fairly comprehensive. Alternatively, all the tests can be run, but this will take a long time:

python run_tests.py --full_tests && python run_tests.py --full_tests --run_distributed_tests

We will run all tests on every PR before merging it, so it is not necessary to pass --full_tests when running tests yourself.
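The two test invocations can also be driven from Python. The sketch below is not part of the repository: build_test_commands and run_all are hypothetical names, and only the --full_tests and --run_distributed_tests flags come from this README. It mirrors the && chaining by stopping at the first failing command.

```python
import subprocess


def build_test_commands(full: bool = False) -> list:
    """Return the local and distributed test commands as argv lists."""
    base = ["python", "run_tests.py"]
    if full:
        base.append("--full_tests")
    return [base, base + ["--run_distributed_tests"]]


def run_all(full: bool = False) -> bool:
    """Run the commands in order, stopping at the first failure (like `&&`)."""
    for argv in build_test_commands(full):
        if subprocess.run(argv).returncode != 0:
            return False
    return True


# Show the commands without executing them; call run_all() from the
# tf_cnn_benchmarks directory to actually run the tests.
for argv in build_test_commands():
    print(" ".join(argv))
```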

To run an individual test, such as method testParameterServer of test class TfCnnBenchmarksTest of module benchmark_cnn_test, run:

python -m unittest -v benchmark_cnn_test.TfCnnBenchmarksTest.testParameterServer
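The dotted name passed to unittest has the form module.TestClass.method. The same single-method selection can be done programmatically, as in this self-contained sketch. ExampleTest is a hypothetical stand-in so the snippet runs without TensorFlow; inside the repository the class would be benchmark_cnn_test.TfCnnBenchmarksTest and the method testParameterServer.

```python
import unittest


class ExampleTest(unittest.TestCase):
    """Hypothetical stand-in for a real test class."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_other(self):
        self.assertTrue(True)


# Instantiating a TestCase with a method name selects just that method,
# which is what the dotted name on the command line resolves to.
suite = unittest.TestSuite([ExampleTest("test_addition")])
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.testsRun)  # 1
```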
