
Hostname of the data node #3

Open

vsimko opened this issue Sep 4, 2017 · 1 comment
vsimko commented Sep 4, 2017

I tried to use an HDFS client to upload a local file to HDFS running in a Docker container.
I'm using the JavaScript library webhdfs.

The problem seems to be that the HDFS "data node" has a generated hostname which is not resolvable from my host Linux environment. This is what the HDFS web interface shows:
[screenshot: HDFS web interface listing the datanode under its generated container hostname]

Here is my setup:

mkdir mywebhdfs
cd mywebhdfs
yarn add webhdfs

# now run the code which is listed below
node webhdfs-test.js

Here is my simple script for uploading data to HDFS:

// this is the content of file: webhdfs-test.js
const fs = require('fs');
const WebHDFS = require('webhdfs');

// connect to the namenode's WebHDFS endpoint
const hdfs = WebHDFS.createClient({
  host: 'localhost',
  port: '50070'
});

const local = fs.createReadStream('/home/vlx/landuse-classification-pipeline.jpg');
const remote = hdfs.createWriteStream('/some.jpg');

// stream the local file into HDFS
local.pipe(remote);

remote.on('error', (err) => {
  console.error(err);
});

remote.on('finish', () => {
  console.log('on finish');
});

Output:

{ Error: getaddrinfo EAI_AGAIN 9d2064769c90:50075
    at Object.exports._errnoException (util.js:1050:11)
    at errnoException (dns.js:33:15)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:73:26)
  code: 'EAI_AGAIN',
  errno: 'EAI_AGAIN',
  syscall: 'getaddrinfo',
  hostname: '9d2064769c90',
  host: '9d2064769c90',
  port: '50075' }

When I add the hostname 9d2064769c90 to my /etc/hosts, the file is successfully uploaded.
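
For completeness, the workaround is a single line in /etc/hosts on the host machine (the IP below is an assumption; the actual container address comes from e.g. docker inspect):

# map the datanode container's generated hostname to its container IP
# (172.17.0.2 is a placeholder taken from a typical default bridge network)
172.17.0.2    9d2064769c90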

wipatrick commented Sep 5, 2017

This is not a bug; it's normal behaviour of how WebHDFS works. See here.
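
To illustrate the behaviour (an illustrative sketch; the path and hostname are taken from the report above): a WebHDFS write is a two-step exchange. The client first contacts the namenode, which answers with an HTTP 307 redirect whose Location header names a datanode, in this case the container's generated hostname:

curl -i -X PUT 'http://localhost:50070/webhdfs/v1/some.jpg?op=CREATE'
# HTTP/1.1 307 Temporary Redirect
# Location: http://9d2064769c90:50075/webhdfs/v1/some.jpg?op=CREATE&...

The webhdfs library follows that redirect internally, which is why the error above surfaces as a failed DNS lookup of 9d2064769c90.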

WebHDFS returns the datanode address using the hostname configured in the container's /etc/hosts. When using the standard HDFS client, there is a property in hdfs-site.xml whose default value is false; you can set it to true so that clients connect to datanodes by hostname (see "Clients use Hostnames when connecting to DataNodes"):

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>

I found no way to configure that when using WebHDFS. However, a workaround would be to deploy the Node script inside a Node container attached to the same Docker network, so that the hostname can be resolved via Docker's DNS. See #4
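
A minimal sketch of that workaround (hdfs_default is an assumed network name; check docker network ls for the one your HDFS containers actually join):

# run the upload script in a throwaway Node container on the same network,
# so the datanode hostname resolves via Docker's embedded DNS
docker run --rm --network hdfs_default \
  -v "$PWD":/app -w /app node:8 node webhdfs-test.js

Note that inside that network the script should also address the namenode by its service name instead of localhost.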

@vsimko: how do you need to upload files to HDFS? In any case, there are two options now, as presented in the README.md. Since option 2 uses Node, you could extend that app by adding a simple web server that serves e.g. Dropzone.js, as sketched below.
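
A rough sketch of such an endpoint (everything here is illustrative rather than part of this repo: express and multer are assumed to be installed next to webhdfs, and namenode is an assumed service name on the shared Docker network):

// hypothetical upload endpoint for the option-2 Node app;
// a Dropzone.js form on the client would POST files to /upload
const express = require('express');
const multer = require('multer');
const WebHDFS = require('webhdfs');
const { PassThrough } = require('stream');

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // buffer uploads in memory

// 'namenode' is an assumed hostname, resolvable on the shared Docker network
const hdfs = WebHDFS.createClient({ host: 'namenode', port: '50070' });

app.post('/upload', upload.single('file'), (req, res) => {
  const remote = hdfs.createWriteStream('/' + req.file.originalname);
  remote.on('finish', () => res.sendStatus(201));
  remote.on('error', (err) => res.status(500).send(err.message));

  // wrap the buffered upload in a stream and pipe it, mirroring the
  // local.pipe(remote) pattern from the script above
  const src = new PassThrough();
  src.end(req.file.buffer);
  src.pipe(remote);
});

app.listen(3000);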
