
Commit 3fd492c

added faq to the aws doc (Unity-Technologies#1320)
* added faq to the aws doc
* added the link
* added some faq and updated the temp ami id
* resolved the comments, updated one of the faq along with the scriptable object update
* added one other cause raise in issues
* fixed line change
1 parent 8330926 commit 3fd492c

2 files changed

+106
-14
lines changed


docs/FAQ.md

Lines changed: 19 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -67,13 +67,20 @@ On Windows, you can find
6767
## Environment Connection Timeout
6868

6969
If you are able to launch the environment from `UnityEnvironment` but then
70-
receive a timeout error, there may be a number of possible causes.
70+
receive a timeout error like this:
7171

72-
* _Cause_: There may be no Brains the `Broadcast Hub` of the Academy.
73-
In this case, the environment will not attempt to communicate
74-
with python. _Solution_: Set the Brains(s) you wish to externally control
75-
through the Python API to `External` from the Unity Editor, and rebuild the
76-
environment.
72+
```
73+
UnityAgentsException: The Communicator was unable to connect. Please make sure the External process is ready to accept communication with Unity.
74+
```
75+
76+
There may be a number of possible causes:
77+
78+
* _Cause_: There may be no LearningBrain with the `Control` option checked in the
79+
`Broadcast Hub` of the Academy. In this case, the environment will not attempt
80+
to communicate with Python. _Solution_: Click `Add New` in your Academy's
81+
`Broadcast Hub`, and drag your LearningBrain asset into the `Brains` field,
82+
and check the `Control` toggle. You also need to assign this LearningBrain
83+
asset to every Agent you wish to train.
7784
* _Cause_: On OSX, the firewall may be preventing communication with the
7885
environment. _Solution_: Add the built environment binary to the list of
7986
exceptions on the firewall by following
@@ -82,6 +89,8 @@ receive a timeout error, there may be a number of possible causes.
8289
_Solution_: Look into the [log
8390
files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
8491
Environment to figure what error happened.
92+
* _Cause_: You have assigned HTTP_PROXY and HTTPS_PROXY values in your
93+
environment variables. _Solution_: Remove these values and try again.
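
If you cannot change the shell configuration, one option is to drop the proxy variables inside the Python process before the environment is launched. The following is a minimal sketch, not part of ML-Agents: `clear_proxy_vars` is a hypothetical helper, and the lowercase variable names are an assumption (the FAQ entry above only mentions `HTTP_PROXY` and `HTTPS_PROXY`).

```python
import os

# The uppercase names come from the FAQ entry above; the lowercase
# variants are an assumption, included because some tools read them too.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def clear_proxy_vars(environ=os.environ):
    """Delete proxy variables from `environ` and return the names removed."""
    removed = []
    for name in PROXY_VARS:
        if name in environ:
            del environ[name]
            removed.append(name)
    return removed
```

Calling `clear_proxy_vars()` before creating `UnityEnvironment` only affects the current Python process and the processes it starts; your shell settings stay untouched.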
8594

8695
## Communication port {} still in use
8796

@@ -101,3 +110,7 @@ terminating. In order to address this, set `Max Steps` for either the Academy or
101110
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
102111
is possible to manually set `done` conditions for episodes from within scripts
103112
for custom episode-terminating events.
113+
114+
## Problems with training on AWS
115+
116+
Please refer to [Training on Amazon Web Service FAQ](Training-on-Amazon-Web-Service.md#faq)

docs/Training-on-Amazon-Web-Service.md

Lines changed: 87 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@ Service for training ML-Agents environments.
55

66
## Preconfigured AMI
77

8-
We've prepared a preconfigured AMI for you with the ID: `ami-18642967` in the
8+
We've prepared a preconfigured AMI for you with the ID: `ami-016ff5559334f8619` in the
99
`us-east-1` region. It was created as a modification of [Deep Learning AMI
1010
(Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). The AMI has been
1111
tested with p2.xlarge instance. Furthermore, if you want to train without
@@ -86,7 +86,7 @@ can display the Unity environment in the virtual environment, and train as we
8686
would on a local machine. Ensure that `headless` mode is disabled when building
8787
linux executables which use visual observations.
8888

89-
1. Install and setup Xorg:
89+
#### Install and setup Xorg:
9090

9191
```console
9292
# Install Xorg
@@ -105,11 +105,12 @@ linux executables which use visual observations.
105105
$ sudo vim /etc/X11/xorg.conf
106106
```
107107

108-
2. Update and setup Nvidia driver:
108+
#### Update and setup Nvidia driver:
109109

110110
```console
111111
# Download and install the latest Nvidia driver for ubuntu
112-
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.67/NVIDIA-Linux-x86_64-390.67.run
112+
# Please refer to http://download.nvidia.com/XFree86/Linux-x86_64/latest.txt
113+
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.87/NVIDIA-Linux-x86_64-390.87.run
113114
$ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.87.run --accept-license --no-questions --ui=none
114115
115116
# Disable Nouveau as it will clash with the Nvidia driver
@@ -119,13 +120,13 @@ linux executables which use visual observations.
119120
$ sudo update-initramfs -u
120121
```
121122

122-
3. Restart the EC2 instance:
123+
#### Restart the EC2 instance:
123124

124125
```console
125126
sudo reboot now
126127
```
127128

128-
4. Make sure there are no Xorg processes running:
129+
#### Make sure there are no Xorg processes running:
129130

130131
```console
131132
# Kill any possible running Xorg processes
@@ -158,7 +159,7 @@ linux executables which use visual observations.
158159
159160
```
160161

161-
5. Start X Server and make the ubuntu use X Server for display:
162+
#### Start X Server and make Ubuntu use X Server for display:
162163

163164
```console
164165
# Start the X Server, press Enter to come back to the command line
@@ -172,7 +173,7 @@ linux executables which use visual observations.
172173
$ export DISPLAY=:0
173174
```
174175

175-
6. Ensure the Xorg is correctly configured:
176+
#### Ensure Xorg is correctly configured:
176177

177178
```console
178179
# For more information on glxgears, see ftp://www.x.org/pub/X11R6.8.1/doc/glxgears.1.html.
@@ -232,3 +233,81 @@ Headless Mode, you have to setup the X Server to enable training.)
232233
```console
233234
mlagents-learn <trainer-config-file> --env=<your_env> --train
234235
```
236+
237+
## FAQ
238+
239+
### The <Executable_Name>_Data folder hasn't been copied over
240+
241+
If you've built your Linux executable but forgot to copy over the corresponding <Executable_Name>_Data folder, you will see an error message like the following:
242+
243+
```console
244+
Set current directory to /home/ubuntu/ml-agents/ml-agents
245+
Found path: /home/ubuntu/ml-agents/ml-agents/3dball_linux.x86_64
246+
no boot config - using default values
247+
248+
(Filename: Line: 403)
249+
250+
There is no data folder
251+
```
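
This condition can be checked before launching. The sketch below assumes the data folder sits next to the executable and is named after it with the final extension replaced by `_Data`, as in the `<Executable_Name>_Data` pattern above. `expected_data_folder` and `has_data_folder` are hypothetical helper names, not ML-Agents functions.

```python
from pathlib import Path


def expected_data_folder(executable):
    """Return the path where the <Executable_Name>_Data folder should be."""
    exe = Path(executable)
    # "3dball_linux.x86_64" -> stem "3dball_linux" -> "3dball_linux_Data"
    return exe.with_name(exe.stem + "_Data")


def has_data_folder(executable):
    """True if the data folder exists alongside the executable."""
    return expected_data_folder(executable).is_dir()
```

For the executable in the log above, `expected_data_folder` yields `/home/ubuntu/ml-agents/ml-agents/3dball_linux_Data`; if `has_data_folder` returns `False`, copy that folder over before starting training.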
252+
253+
### Unity Environment not responding
254+
255+
If you didn't set up X Server or haven't launched it properly, didn't build your environment with an external brain, haven't run `chmod +x` on your Unity environment executable, or your environment crashes, the connection between Unity and Python will fail. You will then see something like this:
256+
257+
```console
258+
Logging to /home/ubuntu/.config/unity3d/<Some_Path>/Player.log
259+
Traceback (most recent call last):
260+
File "<stdin>", line 1, in <module>
261+
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/environment.py", line 63, in __init__
262+
aca_params = self.send_academy_parameters(rl_init_parameters_in)
263+
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/environment.py", line 489, in send_academy_parameters
264+
return self.communicator.initialize(inputs).rl_initialization_output
265+
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/rpc_communicator.py", line 60, in initialize
266+
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
267+
The environment does not need user interaction to launch
268+
The Academy and the External Brain(s) are attached to objects in the Scene
269+
The environment and the Python interface have compatible versions.
270+
```
271+
272+
It is also helpful to check your /home/ubuntu/.config/unity3d/<Some_Path>/Player.log to see what happened in your Unity environment.
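
When the timeout occurs inside a Python session, it can be convenient to read the end of that log without leaving the interpreter. This is a minimal sketch; `tail_log` is a hypothetical helper, and the default of 20 lines is an arbitrary choice.

```python
from collections import deque


def tail_log(path, n=20):
    """Return the last `n` lines of the text file at `path`."""
    # errors="replace" keeps the helper from failing on odd bytes in the log.
    with open(path, errors="replace") as f:
        # A bounded deque keeps only the most recent n lines in memory.
        return list(deque(f, maxlen=n))
```

Pass it the `Player.log` path printed in the `Logging to ...` line of the traceback, then print the returned lines to look for the first error reported by Unity.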
273+
274+
### Could not launch X Server
275+
276+
When you execute:
277+
278+
```console
279+
sudo /usr/bin/X :0 &
280+
```
281+
282+
You might see something like:
283+
284+
```console
285+
X.Org X Server 1.18.4
286+
...
287+
(==) Log file: "/var/log/Xorg.0.log", Time: Thu Oct 11 21:10:38 2018
288+
(==) Using config file: "/etc/X11/xorg.conf"
289+
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
290+
(EE)
291+
Fatal server error:
292+
(EE) no screens found(EE)
293+
(EE)
294+
Please consult the The X.Org Foundation support
295+
at http://wiki.x.org
296+
for help.
297+
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
298+
(EE)
299+
(EE) Server terminated with error (1). Closing log file.
300+
```
301+
302+
And when you execute:
303+
304+
```console
305+
nvidia-smi
306+
```
307+
308+
You might see something like:
309+
310+
```console
311+
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
312+
```
313+
This means the NVIDIA driver needs to be updated. Refer to [this section](Training-on-Amazon-Web-Service.md#update-and-setup-nvidia-driver) for more information.
