Convert tab-delimited files to CSV format

Nate Williams edited this page Apr 18, 2017 · 6 revisions

User problem

Karen is a data analyst with a wide range of data files stored on Amazon S3 in different formats. Because some files are tab-delimited and others are CSV, she constantly has to update her scripts to handle both formats. She wants to convert all existing tab-delimited files to CSV and set up a process that automatically converts newly added files to CSV going forward.

Solution

Create a pipe to do batch processing of tab-delimited files. Select a folder in Amazon S3 that contains the source files and a separate folder in S3 as the destination. Add a file selection step that filters input files to only include the ones in a tab-delimited format. Add a convert step to change the file format to CSV, and output the converted files to the destination folder.
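As an illustration of what the convert step has to do, here is a minimal sketch in Python using only the standard `csv` module. The function name `tsv_to_csv` is hypothetical and is not part of the pipe itself; the sketch also assumes the input uses standard double-quote quoting and fits in memory.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to comma-separated text.

    Hypothetical helper for illustration only. Fields that contain
    commas are quoted on output so the column structure is preserved.
    """
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "name\tcity\nKaren\tAustin, TX\n"
    print(tsv_to_csv(sample), end="")
```

Note that a plain find-and-replace of tabs with commas would not be enough: a value such as `Austin, TX` has to be quoted in the CSV output, which is why the sketch goes through a CSV writer.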

Workflow

  • Select and authenticate input connection
  • Select and authenticate output connection
  • Select the source folder containing the files to process
  • Select destination folder where the converted files will be stored
  • Add a file filter step to select only files in a tab-delimited format
  • Add a convert step to change the file format from tab-delimited to CSV
  • Run a test of the pipe to verify the file conversion works
  • Schedule pipe to run periodically
  • Turn the pipe "on" to put it into production
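The file filter step in the workflow above could be sketched as follows. This is only one possible heuristic, not the pipe's actual selection logic: the extension set, the function names `looks_tab_delimited` and `select_tab_delimited`, and the idea of passing a short text sample per file are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed naming convention; adjust to match the files in the source folder.
TAB_EXTENSIONS = {".tsv", ".tab"}


def looks_tab_delimited(key: str, sample: str) -> bool:
    """Guess whether a file is tab-delimited from its name or a text sample."""
    if PurePosixPath(key).suffix.lower() in TAB_EXTENSIONS:
        return True
    lines = [ln for ln in sample.splitlines() if ln.strip()]
    if not lines:
        return False
    # Heuristic: every non-empty line has the same, non-zero number of tabs.
    tab_counts = {ln.count("\t") for ln in lines}
    return len(tab_counts) == 1 and tab_counts.pop() > 0


def select_tab_delimited(files: dict[str, str]) -> list[str]:
    """Return the keys (paths) of the files that look tab-delimited."""
    return sorted(k for k, s in files.items() if looks_tab_delimited(k, s))


if __name__ == "__main__":
    candidates = {
        "in/a.tsv": "x\ty\n",
        "in/b.csv": "x,y\n1,2\n",
        "in/c.txt": "x\ty\n1\t2\n",
    }
    print(select_tab_delimited(candidates))
```

The content check is deliberately simple: it would misclassify a file whose quoted fields contain tabs or line breaks, so a production filter would likely need a stricter test.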

Primary To-Do List

  • Add file selection filter functionality, if this doesn't currently exist
  • Add "Test" button to confirm file transfer will work
  • Add control to update pipe scheduling options
  • Provide command-level feedback on whether the pipe steps are running successfully
