Hi, I am trying to run the example notebook classification.ipynb on an Orin.
When I run the example (nothing changed), build_classification_graph fails with this error:
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
What should I do now? Should I change the ckpt file?
Details
Environment: Orin (HW), using the L4T-ML docker image (dustynv/l4t-ml:r35.4.1)
2023-11-03 08:19:47.515183: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.515480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: Orin major: 8 minor: 7 memoryClockRate(GHz): 1.3
pciBusID: 0000:00:00.0
2023-11-03 08:19:47.515618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-11-03 08:19:47.515738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-11-03 08:19:47.515801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-11-03 08:19:47.515854: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-11-03 08:19:47.515901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-11-03 08:19:47.515947: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-11-03 08:19:47.515990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-11-03 08:19:47.516230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.516488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.516645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2023-11-03 08:19:47.516764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-11-03 08:19:47.516819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2023-11-03 08:19:47.516865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2023-11-03 08:19:47.517035: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.517316: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.517584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20692 MB memory) -> physical GPU (device: 0, name: Orin, pci bus id: 0000:00:00.0, compute capability: 8.7)
INFO:tensorflow:Restoring parameters from data/inception_v2/inception_v2.ckpt
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1365, in BaseSession._do_call(self, fn, *args)
1364 try:
-> 1365 return fn(*args)
1366 except errors.OpError as e:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1349, in BaseSession._do_run.<locals>._run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1348 self._extend_graph()
-> 1349 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
1350 target_list, run_metadata)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1441, in BaseSession._call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1439 def _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list,
1440 run_metadata):
-> 1441 return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
1442 fetch_list, target_list,
1443 run_metadata)
NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[{{node save/RestoreV2}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1289, in Saver.restore(self, sess, save_path)
1288 else:
-> 1289 sess.run(self.saver_def.restore_op_name,
1290 {self.saver_def.filename_tensor_name: save_path})
1291 except errors.NotFoundError as err:
1292 # There are three common conditions that might cause this error:
1293 # 0. The file is missing. We ignore here, as this is checked above.
(...)
1297 # 1. The checkpoint would not be loaded successfully as is. Try to parse
1298 # it as an object-based checkpoint.
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:955, in BaseSession.run(self, fetches, feed_dict, options, run_metadata)
954 try:
--> 955 result = self._run(None, fetches, feed_dict, options_ptr,
956 run_metadata_ptr)
957 if run_metadata:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1179, in BaseSession._run(self, handle, fetches, feed_dict, options, run_metadata)
1178 if final_fetches or final_targets or (handle and feed_dict_tensor):
-> 1179 results = self._do_run(handle, final_targets, final_fetches,
1180 feed_dict_tensor, options, run_metadata)
1181 else:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1358, in BaseSession._do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1357 if handle is None:
-> 1358 return self._do_call(_run_fn, feeds, fetches, targets, options,
1359 run_metadata)
1360 else:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1384, in BaseSession._do_call(self, fn, *args)
1380 message += ('\nA possible workaround: Try disabling Grappler optimizer'
1381 '\nby modifying the config for creating the session eg.'
1382 '\nsession_config.graph_options.rewrite_options.'
1383 'disable_meta_optimizer = True')
-> 1384 raise type(e)(node_def, op, message)
NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 976, in launch_instance
app.start()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelapp.py", line 712, in start
self.io_loop.start()
File "usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
reply_content = await reply_content
File "usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
res = shell.run_cell(
File "usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
return super().run_cell(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_cell
result = self._run_cell(
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2936, in _run_cell
return runner(coro)
File "usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3135, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3338, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "tmp/ipykernel_8476/3250917203.py", line 1, in <cell line: 1>
frozen_graph, input_names, output_names = build_classification_graph(
File "home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py", line 208, in build_classification_graph
tf_saver = tf.train.Saver()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1300, in Saver.restore(self, sess, save_path)
1299 try:
-> 1300 names_to_keys = object_graph_key_mapping(save_path)
1301 except errors.NotFoundError:
1302 # 2. This is not an object-based checkpoint, which likely means there
1303 # is a graph mismatch. Re-raise the original error with
1304 # a helpful message (b/110263146)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1618, in object_graph_key_mapping(checkpoint_path)
1617 reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
-> 1618 object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
1619 object_graph_proto = (trackable_object_graph_pb2.TrackableObjectGraph())
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py:915, in CheckpointReader.get_tensor(self, tensor_str)
913 from tensorflow.python.util import compat
--> 915 return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
NotFoundError: _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint file
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 frozen_graph, input_names, output_names = build_classification_graph(
2 model=MODEL,
3 checkpoint=checkpoint_path,
4 num_classes=NUM_CLASSES
5 )
File /home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py:209, in build_classification_graph(model, checkpoint, num_classes)
207 # load checkpoint
208 tf_saver = tf.train.Saver()
--> 209 tf_saver.restore(save_path=checkpoint, sess=tf_sess)
211 # freeze graph
212 frozen_graph = tf.graph_util.convert_variables_to_constants(
213 tf_sess,
214 tf_sess.graph_def,
215 output_node_names=[output_name]
216 )
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1305, in Saver.restore(self, sess, save_path)
1300 names_to_keys = object_graph_key_mapping(save_path)
1301 except errors.NotFoundError:
1302 # 2. This is not an object-based checkpoint, which likely means there
1303 # is a graph mismatch. Re-raise the original error with
1304 # a helpful message (b/110263146)
-> 1305 raise _wrap_restore_error_with_msg(
1306 err, "a Variable name or other graph key that is missing")
1308 # This is an object-based checkpoint. We'll print a warning and then do
1309 # the restore.
1310 logging.warning(
1311 "Restoring an object-based checkpoint using a name-based saver. This "
1312 "may be somewhat fragile, and will re-build the Saver. Instead, "
1313 "consider loading object-based checkpoints using "
1314 "tf.train.Checkpoint().")
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 976, in launch_instance
app.start()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelapp.py", line 712, in start
self.io_loop.start()
File "usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
reply_content = await reply_content
File "usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
res = shell.run_cell(
File "usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
return super().run_cell(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_cell
result = self._run_cell(
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2936, in _run_cell
return runner(coro)
File "usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3135, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3338, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "tmp/ipykernel_8476/3250917203.py", line 1, in <cell line: 1>
frozen_graph, input_names, output_names = build_classification_graph(
File "home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py", line 208, in build_classification_graph
tf_saver = tf.train.Saver()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
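Since the error says a variable name expected by the graph is missing from the checkpoint, one way to narrow it down is to compare the two sets of names directly. The following is only a minimal sketch, not part of tf_trt_models: the helper names `diff_variable_names`, `checkpoint_variable_names`, and `graph_variable_names` are made up for illustration. It assumes that `tf.train.list_variables` and `tf.compat.v1.global_variables` are available in the installed TensorFlow, and that the graph has already been built in the default graph before `graph_variable_names()` is called.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Compare variable names expected by the graph with those in a checkpoint.

    Returns (missing, unused):
      missing: names the graph wants to restore but the checkpoint lacks
      unused:  names stored in the checkpoint that the graph never asks for
    """
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    missing = sorted(graph_set - ckpt_set)
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


def checkpoint_variable_names(checkpoint_path):
    """List the tensor names stored in a checkpoint (hypothetical helper)."""
    import tensorflow as tf  # imported lazily so the pure-Python diff works without TF
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


def graph_variable_names():
    """List the variable names in the default graph (hypothetical helper).

    Assumes the model graph was already constructed in the default graph.
    """
    import tensorflow as tf
    return [v.op.name for v in tf.compat.v1.global_variables()]
```

Assuming the same checkpoint path as in the log above, usage could look like `missing, unused = diff_variable_names(graph_variable_names(), checkpoint_variable_names("data/inception_v2/inception_v2.ckpt"))`. If `missing` lists `.../biases` entries while `unused` lists other variables under the same layer scopes, that would suggest the graph was built with different layer settings than the ones used to produce the checkpoint, rather than a corrupted ckpt file.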