
Difficulties running using MPI in some systems #7

Open
luizfelippesr opened this issue Mar 12, 2021 · 1 comment
luizfelippesr commented Mar 12, 2021

@charles-jose has been encountering errors when trying to run the main program using MPI (with mpirun). Running the code in serial mode (i.e. without mpirun) works fine.

Part of the output that @charles-jose gets is shown below.

   |  \/  | __ _  __ _ _ __   ___| |_(_)_______ _ __  
   | |\/| |/ _` |/ _` | '_ \ / _ \ __| |_  / _ \ '__| 
   | |  | | (_| | (_| | | | |  __/ |_| |/ /  __/ |    
   |_|  |_|\__,_|\__, |_| |_|\___|\__|_/___\___|_|    
                |___/      
   Runnning on 36 processors
   Using global parameters file: example/example_global_parameters.in  
   Warm up run  
 0:  Galaxy 1 - Finished after  .704561 s  CPU time
   Warm up done  
   Total number of cycles 1 
   Cycle 1 
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 1:
  #000: ../../../src/H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: ../../../src/H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: ../../../src/H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 3:
  #000: ../../../src/H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1168 in H5F_open(): unable to
[....]
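The `flock` failure above (errno 11, "Resource temporarily unavailable") is typical of filesystems without working POSIX locks, such as some NFS mounts. A common workaround is to disable HDF5's file locking via an environment variable before launching the run. A minimal sketch, assuming a binary name and the processor count from the log above; note that `HDF5_USE_FILE_LOCKING` is only honoured from HDF5 1.10.1 onward, while this traceback shows 1.10.0-patch1, so upgrading HDF5 may be needed first:

```shell
# Disable HDF5's POSIX file locking; with HDF5 >= 1.10.1 the library
# checks this variable when opening files. The traceback above shows
# 1.10.0-patch1, so upgrading HDF5 may be required for this to work.
export HDF5_USE_FILE_LOCKING=FALSE

# Hypothetical invocation matching the log above (binary name assumed):
mpirun -np 36 ./magnetizer example/example_global_parameters.in
```

With Open MPI, `mpirun -x HDF5_USE_FILE_LOCKING ...` can be used to forward the variable explicitly to ranks on remote nodes.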
luizfelippesr (Owner, Author) commented

I did not manage to reproduce the bug above, but I believe I have just discovered another one: when I run in parallel with the setting p_master_works_too = .true., I get the following error:

HDF5-DIAG: Error detected in HDF5 (1.10.4) MPI-process 0:
  #000: ../../../src/H5Dio.c line 336 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: ../../../src/H5Dio.c line 730 in H5D__write(): src and dest dataspaces have different number of elements selected
    major: Invalid arguments to routine
    minor: Bad value

Thus, there is some error in the code that handles the output of the work done by the root process (this is probably in distributor.f90).
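For context on what the message means: H5Dwrite requires that the number of elements selected in the memory (source) dataspace equals the number selected in the file (destination) dataspace. A minimal sketch of the failing invariant, using hypothetical shapes that are not taken from the actual code:

```python
import numpy as np

# Hypothetical shapes illustrating the H5Dwrite precondition that fails above:
# the element count of the in-memory buffer (source) must match the element
# count of the hyperslab selected in the file dataset (destination).
mem_buffer = np.zeros((10, 5))   # what the root process tries to write
file_slab_count = (10, 4)        # hypothetical hyperslab selected in the file

src_elements = mem_buffer.size                # 50
dest_elements = int(np.prod(file_slab_count)) # 40

# HDF5 rejects the write when these differ, with exactly the
# "src and dest dataspaces have different number of elements selected" error.
print(src_elements == dest_elements)  # prints False: this pairing would fail
```

A fix in distributor.f90 would therefore make the root process's write use a hyperslab whose element count matches its result buffer, the same way the worker processes' writes presumably do.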

Normally, however, the best choice is p_master_works_too = .false., which keeps the root process busy only with disk IO operations. Therefore, I will change the default setting to p_master_works_too = .false. and include a warning in the comments of global_input_parameters.f90.
