Overview

This example shows how to run the Spark WordCount example on the BigInsights cluster. The word count is performed on the text of the Apache License 2.0 (the ./LICENSE file).
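The sample output in this README includes an entry with an empty word (`: 1136`) and words that keep their punctuation (`Works,`, `APPENDIX:`). That is consistent with the stock Spark Python WordCount example, which splits each line on a single space. Assuming this example's wordcount.py follows that example, its counting logic can be sketched in plain Python without Spark. The function name `word_count` is hypothetical and is not taken from wordcount.py.

```python
from collections import Counter


def word_count(text):
    """Count words the way the stock Spark wordcount.py is assumed to:
    split every line on a single space and tally the pieces."""
    counts = Counter()
    for line in text.splitlines():
        # str.split(' '), unlike str.split(), keeps the empty strings that
        # appear between repeated spaces and for blank lines.
        counts.update(line.split(' '))
    return counts


if __name__ == "__main__":
    sample = "  Apache License\n\nVersion 2.0"
    for word, count in word_count(sample).items():
        # Same "word: count" format as the cluster output
        print("%s: %i" % (word, count))
```

Because leading spaces and blank lines each produce empty strings, an indented, multi-paragraph file like LICENSE yields a large count for the empty word.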

Developer experience

Developers will gain the most from these examples if they are:

  • Comfortable using Windows, OS X or *nix command prompts
  • Able to read code written in a high level language such as Groovy
  • Familiar with the Gradle build tool
  • Familiar with Spark concepts

Example Requirements

Run the example

To run the example, open a command prompt window and:

  • change into the directory containing this example and run the Gradle wrapper with the Example task
    • ./gradlew Example (OS X / *nix)
    • gradlew.bat Example (Windows)
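If you drive the run from a script (a CI job, for instance), the only platform difference in the step above is the wrapper's name. A minimal Python sketch of that choice follows; the function name `gradle_wrapper_command` is hypothetical, and the printed command is the one you would type in the example directory.

```python
import os


def gradle_wrapper_command(task="Example", os_name=os.name):
    """Return the argument list that runs a Gradle task via the wrapper.

    os.name is "nt" on Windows, where the wrapper is gradlew.bat; on OS X and
    other *nix systems it is the ./gradlew shell script.
    """
    wrapper = "gradlew.bat" if os_name == "nt" else "./gradlew"
    return [wrapper, task]


if __name__ == "__main__":
    # Prints the command for the current machine.
    print(" ".join(gradle_wrapper_command()))
```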

The command output contains the word counts:

...
bicluster#54|: 1136
bicluster#54|limited: 4
bicluster#54|all: 3
bicluster#54|code: 1
bicluster#54|managed: 1
bicluster#54|customary: 1
bicluster#54|Works,: 2
bicluster#54|APPENDIX:: 1
...
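Each result line above has the form `<prefix>|<word>: <count>`. The `bicluster#54` prefix appears to be added to remote output by the build's SSH tooling; that reading is an assumption. If you want to post-process the counts, a small parser can strip the prefix and split on the last `": "` so that words which themselves end in a colon (such as `APPENDIX:`) survive. The function name `parse_wordcounts` is hypothetical.

```python
def parse_wordcounts(lines):
    """Parse lines like 'bicluster#54|limited: 4' into {word: count}.

    Assumes the '<prefix>|<word>: <count>' layout shown in this README.
    Lines without a '|' or without a numeric count (such as '...') are skipped.
    """
    counts = {}
    for line in lines:
        _, sep, rest = line.partition("|")  # drop the host prefix
        if not sep:
            continue
        # Split on the LAST ': ' so 'APPENDIX:: 1' gives word 'APPENDIX:'
        word, sep, count = rest.rpartition(": ")
        if not sep or not count.strip().isdigit():
            continue
        counts[word] = int(count)
    return counts


if __name__ == "__main__":
    sample = """...
bicluster#54|: 1136
bicluster#54|limited: 4
bicluster#54|APPENDIX:: 1
..."""
    print(parse_wordcounts(sample.splitlines()))
    # prints {'': 1136, 'limited': 4, 'APPENDIX:': 1}
```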

Decomposition Instructions

The ./build.gradle script runs the example. It uses an SSH plugin to:

  • copy ./wordcount.py and ./LICENSE files to the BigInsights cluster
  • from the SSH session, use the hadoop fs command to add LICENSE to HDFS
  • from the SSH session, run the wordcount.py script with the pyspark command
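The three steps above can be approximated outside Gradle with plain scp and ssh. This is a sketch under stated assumptions, not a copy of ./build.gradle: the host string `user@bi-cluster.example.com`, the HDFS path, and the helper names `build_steps` and `run_steps` are hypothetical, and it assumes Hadoop 2.x or later (where `hadoop fs -put -f` overwrites an existing file). It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Hypothetical connection details: replace with your cluster's values.
HOST = "user@bi-cluster.example.com"


def build_steps(host=HOST, hdfs_path="LICENSE", launcher="pyspark"):
    """Return the three steps as argument lists.

    1. copy wordcount.py and LICENSE to the remote home directory
    2. put LICENSE into HDFS (-f overwrites it on reruns)
    3. run wordcount.py with the launcher, passing the HDFS path
    """
    target = shlex.quote(hdfs_path)
    return [
        ["scp", "wordcount.py", "LICENSE", host + ":"],
        ["ssh", host, "hadoop fs -put -f LICENSE " + target],
        ["ssh", host, launcher + " wordcount.py " + target],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_steps(build_steps())  # dry run: prints the commands only
```

The `launcher` parameter exists because Spark 2.0 and later no longer run script files through `pyspark`; on such clusters `spark-submit` is the usual replacement.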