Commit 2048111

Merge pull request #90 from HaripriyaB/split_csv_issue_63
Split csv in to multiple files
2 parents 450ba6e + a9e46cc commit 2048111

2 files changed: 69 additions and 0 deletions

Python/Split_CSV/README.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Split a CSV file into multiple smaller files
### Description:
The aim of this program is to take a large CSV file as input and break it into multiple smaller files, based on the number of rows per file given by the user. The header row of the input file is repeated at the top of every output file.

### Library used:
* Python `csv` module (standard library)

### Parameters used and their Significance:
* `row_limit`: The number of data rows you want in each output file. 10,000 by default.
* `filename`: The name of the raw input CSV file, given without the `.csv` extension (the script appends it).

### Usage:
**`>> python split_csv.py`**

#### I/O:
```
* Enter name of the csv file(csv file should be present in the pwd): $(file_name)

* Enter number of rows each split should contain: $(num_of_rows)

* Output files successfully saved!
```
***The output files are stored in the present working directory (pwd) and are named `<filename>output_<n>.csv`. A sample `dataset.csv` can be found [HERE](https://drive.google.com/file/d/1Q5dpNYAhfA3f_MTJE49sutBOH-pYVyEQ/view?usp=sharing).***
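
To make the effect of `row_limit` concrete, here is a small sketch of how many data rows each output file would receive. The helper name `chunk_sizes` is hypothetical and is not part of this commit; it only mirrors the arithmetic the script performs.

```python
def chunk_sizes(total_rows, row_limit=10000):
    """Return the number of data rows each output file would hold.

    Hypothetical helper for illustration only. The header row is not
    counted, because the script copies it into every output file.
    """
    if row_limit <= 0:
        raise ValueError("row_limit must be a positive integer")
    full, remainder = divmod(total_rows, row_limit)
    # Note: with zero data rows this returns an empty list, whereas the
    # script itself still creates one header-only output file.
    return [row_limit] * full + ([remainder] if remainder else [])


print(chunk_sizes(25000))  # prints [10000, 10000, 5000]
```

An input with 25,000 data rows and the default limit therefore yields three files, the last of which is only half full.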

Python/Split_CSV/split_csv.py

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
import os

def split(filename, delimiter=',', row_limit=10000):
    import csv
    output_name_template='output_%s.csv'
    output_path='.'
    keep_headers=True
    output_name_template = filename + output_name_template
    filehandler = open(filename+".csv", 'r', newline='')  # newline='' is what the csv module expects in Python 3
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    current_out_writer = csv.writer(open(current_out_path, 'w', newline=''), delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        headers = next(reader)  # reader.next() exists only in Python 2
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:  # current piece is full, start the next one
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_writer = csv.writer(open(current_out_path, 'w', newline=''), delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)

def main():
    file_name = input("Enter name of the csv file(csv file should be present in the pwd):")  # raw_input() in Python 2
    num_of_rows = input("Enter number of rows each split should contain:")
    try:
        split(str(file_name),row_limit = int(num_of_rows))
        print("Output files successfully saved!")

    except Exception as e:
        print("Something went wrong in importing given file...Check below:\n")
        print(e)

if __name__ == "__main__":
    main()
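
The script above never closes its input and output files explicitly and relies on the interpreter to do so. As an alternative sketch (not the code in this commit), the same split can be written for Python 3 with `with` blocks and `itertools.islice`, so each file is closed as soon as its piece is complete. The function name `split_csv_chunks`, the `output_dir` parameter, and the `<base>_output_<n>.csv` naming are assumptions made for this example. Unlike the script, it takes the full path including the extension and writes no file when the input has no data rows.

```python
import csv
import os
from itertools import islice


def split_csv_chunks(path, row_limit=10000, delimiter=",", output_dir="."):
    """Split the CSV at `path` into files of at most `row_limit` data rows.

    Hypothetical helper: the header row is repeated in every output file.
    Returns the list of output paths that were written.
    """
    if row_limit <= 0:
        raise ValueError("row_limit must be a positive integer")
    base = os.path.splitext(os.path.basename(path))[0]
    written = []
    with open(path, "r", newline="") as source:
        reader = csv.reader(source, delimiter=delimiter)
        headers = next(reader, None)
        if headers is None:
            return written  # empty input: nothing to split
        piece = 1
        while True:
            # Pull at most row_limit rows; only one piece is held in memory.
            rows = list(islice(reader, row_limit))
            if not rows:
                break
            out_path = os.path.join(output_dir, "%s_output_%d.csv" % (base, piece))
            with open(out_path, "w", newline="") as target:
                writer = csv.writer(target, delimiter=delimiter)
                writer.writerow(headers)
                writer.writerows(rows)
            written.append(out_path)
            piece += 1
    return written
```

Calling `split_csv_chunks("dataset.csv", row_limit=1000)` would write `dataset_output_1.csv`, `dataset_output_2.csv`, and so on into the current directory and return their paths.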
