I tried to use an HDFS client to upload a local file to HDFS running in a Docker container. I'm using the JavaScript library webhdfs.

The problem seems to be that the HDFS datanode has a generated hostname which is not resolvable from my host Linux environment. This is what the HDFS web interface shows:

![image](https://user-images.githubusercontent.com/7080773/30029831-1f09c340-918b-11e7-8c40-a1313814eb2f.png)
Here is my setup:
```sh
mkdir mywebhdfs
cd mywebhdfs
yarn add webhdfs
# now run the code which is listed below
node webhdfs-test.js
```
Here is my simple script for uploading data to HDFS:
```js
// this is the content of file: webhdfs-test.js
const WebHDFS = require('webhdfs');
const fs = require('fs');

const hdfs = WebHDFS.createClient({ host: 'localhost', port: '50070' });

const local = fs.createReadStream('/home/vlx/landuse-classification-pipeline.jpg');
const remote = hdfs.createWriteStream('/some.jpg');

local.pipe(remote);
remote.on('error', (err) => { console.error(err); });
remote.on('finish', () => { console.log('on finish'); });
```
Output:
```
{ Error: getaddrinfo EAI_AGAIN 9d2064769c90:50075
    at Object.exports._errnoException (util.js:1050:11)
    at errnoException (dns.js:33:15)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:73:26)
  code: 'EAI_AGAIN',
  errno: 'EAI_AGAIN',
  syscall: 'getaddrinfo',
  hostname: '9d2064769c90',
  host: '9d2064769c90',
  port: '50075' }
```
When I add the hostname `9d2064769c90` to my `/etc/hosts`, the file is successfully uploaded.
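For reference, the workaround amounts to a single line in `/etc/hosts` that maps the container's generated hostname to the loopback address (this assumes the datanode port 50075 is published to the host):

```
# /etc/hosts -- map the datanode's Docker hostname to localhost
127.0.0.1   9d2064769c90
```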
This is not a bug; it's the normal behaviour of how WebHDFS works, see here.

WebHDFS returns the datanode address using the hostname configured in the container's /etc/hosts. When using the standard HDFS client, there is a property in hdfs-site.xml whose default is false, but you can set it to true (see "Clients use Hostnames when connecting to DataNodes"):
```xml
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>
```
I found no way to configure that using WebHDFS. However, a workaround would be to deploy the Node script inside a Node container in the same Docker network, so that the hostname can be resolved; a sketch of that idea follows below. See #4
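To illustrate the workaround (the network name `hadoop` and the Node image tag are assumptions, not from this repo), something along these lines lets Docker's embedded DNS resolve the datanode hostname:

```sh
# hypothetical: run the upload script on the same Docker network as the
# HDFS container, so generated hostnames like 9d2064769c90 resolve;
# "hadoop" is an assumed network name -- use whatever network HDFS runs on
docker run --rm --network hadoop -v "$PWD":/app -w /app node:8 node webhdfs-test.js
```

Note that the script's `host: 'localhost'` would then have to change to the HDFS container's name on that network.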
@vsimko: how do you intend to upload files to HDFS? In any case, there are two options now, as presented in the README.md. Since option 2 uses Node, you could extend that app by adding a simple webserver that serves e.g. Dropzone.js; a rough sketch follows below.
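A minimal sketch of that idea, assuming Express is added to the app (the route, the port, and the `namenode` host are assumptions; Dropzone.js posts multipart/form-data, so a real version would also need a multipart parser such as multer, while this sketch accepts a raw request body):

```js
// hypothetical upload server: pipes incoming request bodies into HDFS
// via webhdfs. The host "namenode" and port 3000 are assumptions.
const express = require('express');
const WebHDFS = require('webhdfs');

const hdfs = WebHDFS.createClient({ host: 'namenode', port: '50070' });
const app = express();

app.post('/upload/:name', (req, res) => {
  // stream the raw request body straight into HDFS
  const remote = hdfs.createWriteStream('/' + req.params.name);
  req.pipe(remote);
  remote.on('error', (err) => res.status(500).send(err.message));
  remote.on('finish', () => res.send('stored in HDFS\n'));
});

app.listen(3000, () => console.log('upload server listening on :3000'));
```

This can be exercised without a browser, e.g. `curl --data-binary @local.jpg http://localhost:3000/upload/some.jpg`.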