Open
38 commits
5b55fb1
Add 'row' and 'col' values to foreign key errors
Nov 2, 2016
6939bf1
Fix column number for duplicate key too
Nov 2, 2016
97bd060
Allow leading zero on dateformat 'd'
Nov 7, 2016
87904c4
fix up dMMyyyy formatting
Nov 8, 2016
d24000a
Fix foreign key row reporting and empty table checks
Dec 9, 2016
acfb679
Merge branch 'master' into feature/allow-leading-zero-on-dateformat-d
May 12, 2017
af83491
Fix error on missing foreign key table
May 12, 2017
66902a5
return duplicate key error key as tuple not string
May 12, 2017
f47c2f3
add metadata to build process
May 12, 2017
bc76c9b
Merge remote-tracking branch 'upstream/master' into v0.4.0.1
veryrusty Oct 7, 2017
cfc88b0
Merge branch 'v0.4.0.1'
veryrusty Oct 7, 2017
926d28b
Don't revalidate values for FK checks
stephent-stratdat Jun 6, 2018
0c818af
Fix referencing_columns for array values
stephent-stratdat Jun 6, 2018
ed58c28
Fix crash when no csv file specified (only schema)
stephent-stratdat Jun 6, 2018
11911e7
Merge pull request #1 from strategicdata/feature/fks-on-array
sdt Jun 7, 2018
0ae2901
Return [] not nil for empty array values
stephent-stratdat Jun 19, 2018
869bb8f
Merge pull request #4 from strategicdata/bugfix/empty-array-values
stephent-stratdat Jun 19, 2018
f1041f8
Don't iterate array values if nil
stephent-stratdat Jun 27, 2018
5fdecde
Merge pull request #5 from strategicdata/bugfix/empty-line-with-array…
stephent-stratdat Jun 28, 2018
3315a87
Add script to build the gemfile using docker
stephent-stratdat Jun 28, 2018
5a264a0
Merge pull request #6 from strategicdata/docker-build-script
stephent-stratdat Jun 28, 2018
389ac51
Bump version to 0.4.0.2 (0.4.0 + SD patches)
stephent-stratdat Jun 28, 2018
91e97af
Update ruby.yml
adamc00 Sep 13, 2019
4063a04
Update ruby.yml
adamc00 Sep 13, 2019
1188311
Testing build changes
adamc00 Sep 13, 2019
f71ac46
load_from_json -> load_from_uri, tidy up
adamc00 Sep 13, 2019
d804c4f
load_from_json -> load_from_uri, tidy up
adamc00 Sep 13, 2019
5d27914
Merge branch 'master' into master
Floppy Feb 4, 2020
097b1c1
Merge remote-tracking branch 'upstream/master'
Oct 16, 2020
af325eb
Add custom time format H:mm
Nov 23, 2020
eac596c
Bump to 0.4.0.3 (0.4.0 + SD extensions)
Nov 23, 2020
423eceb
Use ruby 2.5 to build gem
Nov 23, 2020
5542105
Create FileUrl module
stephent-stratdat Aug 12, 2021
e70550d
Convert FileUrl.path to FileUrl.file
stephent-stratdat Aug 12, 2021
9ad71d4
Remove unneeded Csvlint:: namespace prefixes
stephent-stratdat Aug 15, 2021
f931c71
Apply URI encode/decode to path<->url conversions
stephent-stratdat Aug 15, 2021
fea4eef
Merge pull request #8 from sdt/fix-filename-urls
veryrusty Aug 16, 2021
c6f7d5f
Bump to 0.4.0.4 (0.4.0 + SD extensions)
Aug 16, 2021
20 changes: 20 additions & 0 deletions .github/workflows/ruby.yml
@@ -0,0 +1,20 @@
name: Ruby

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v1
    - name: Set up Ruby 2.4
      uses: actions/setup-ruby@v1
      with:
        ruby-version: 2.4.x
    - name: Build and test with Rake
      run: |
        gem install bundler
        bundle install --jobs 4 --retry 3
        bundle exec rake
3 changes: 3 additions & 0 deletions Dockerfile
@@ -0,0 +1,3 @@
FROM ruby:2.4

WORKDIR /code
85 changes: 44 additions & 41 deletions README.md
Expand Up @@ -13,7 +13,7 @@ A ruby gem to support validating CSV files to check their syntax and contents. Y
* Validation that checks the structural formatting of a CSV file
* Validation of a delimiter-separated values (dsv) file accessible via URL, File, or an IO-style object (e.g. StringIO)
* Validation against [CSV dialects](http://dataprotocols.org/csv-dialect/)
* Validation against multiple schema standards; [JSON Table Schema](https://github.com/theodi/csvlint.rb/blob/master/README.md#json-table-schema-support) and [CSV on the Web](https://github.com/theodi/csvlint.rb/blob/master/README.md#csv-on-the-web-validation-support)
* Validation against multiple schema standards; [JSON Table Schema](https://github.com/theodi/csvlint.rb/blob/master/README.md#json-table-schema-support) and [CSV on the Web](https://github.com/theodi/csvlint.rb/blob/master/README.md#csv-on-the-web-validation-support)

## Development

Expand Down Expand Up @@ -200,60 +200,63 @@ follows JSON Table Schema with some extensions and rudinmentary [CSV on the Web

An example JSON Table Schema schema file is:

{
"fields": [
```json
{
"fields": [
{
"name": "id",
"constraints": {
"required": true,
"type": "http://www.w3.org/TR/xmlschema-2/#integer"
}
},
{
"name": "price",
"constraints": {
"required": true,
"minLength": 1
}
},
{
"name": "postcode",
"constraints": {
"required": true,
"pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"
}
}
]
}
```

An equivalent CSV on the Web Metadata file is:
```json
{
"@context": "http://www.w3.org/ns/csvw",
"url": "http://example.com/example1.csv",
"tableSchema": {
"columns": [
{
"name": "id",
"constraints": {
"required": true,
"type": "http://www.w3.org/TR/xmlschema-2/#integer"
}
"required": true,
"datatype": { "base": "integer" }
},
{
"name": "price",
"constraints": {
"required": true,
"minLength": 1
}
"required": true,
"datatype": { "base": "string", "minLength": 1 }
},
{
"name": "postcode",
"constraints": {
"required": true,
"pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"
}
"required": true
}
]
}

An equivalent CSV on the Web Metadata file is:

{
"@context": "http://www.w3.org/ns/csvw",
"url": "http://example.com/example1.csv",
"tableSchema": {
"columns": [
{
"name": "id",
"required": true,
"datatype": { "base": "integer" }
},
{
"name": "price",
"required": true,
"datatype": { "base": "string", "minLength": 1 }
},
{
"name": "postcode",
"required": true
}
]
}
}
}
```

Parsing and validating with a schema (of either kind):

schema = Csvlint::Schema.load_from_json(uri)
schema = Csvlint::Schema.load_from_uri(uri)
validator = Csvlint::Validator.new( "http://example.org/data.csv", nil, schema )

### CSV on the Web Validation Support
Expand Down
10 changes: 10 additions & 0 deletions build-gemfile.sh
@@ -0,0 +1,10 @@
#!/bin/bash -e

docker run --rm -ti -u $( id -u ):$( id -g ) -v $PWD:/build -w /build ruby:2.5 \
bash -e -c '
echo Installing dependencies
gem install -g
echo Building gemfile
gem build csvlint.gemspec
'
ls -lF *.gem
5 changes: 5 additions & 0 deletions csvlint.gemspec
Expand Up @@ -13,6 +13,11 @@ Gem::Specification.new do |spec|
spec.homepage = "https://github.com/theodi/csvlint.rb"
spec.license = "MIT"

spec.metadata = {
"git-hash" => `git show -s --pretty=format:'%H'`.strip(),
"git-desc" => `git describe --dirty --tags`.strip()
}

spec.files = `git ls-files`.split($/)
spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
Expand Down
7 changes: 7 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,7 @@
version: "3"
services:
  csvlint.rb:
    build: .
    command: rake
    volumes:
      - $PWD:/code
1 change: 1 addition & 0 deletions lib/csvlint.rb
Expand Up @@ -12,6 +12,7 @@

require 'csvlint/error_message'
require 'csvlint/error_collector'
require 'csvlint/file_url'
require 'csvlint/validate'
require 'csvlint/field'

Expand Down
3 changes: 1 addition & 2 deletions lib/csvlint/cli.rb
Expand Up @@ -80,8 +80,7 @@ def fetch_schema_tables(schema, options)
end
schema.tables.keys.each do |source|
begin
source = source.sub("file:","")
source = File.new( source )
source = FileUrl.file(source)
rescue Errno::ENOENT
return_error "#{source} not found"
end unless source =~ /^http(s)?/
Expand Down
2 changes: 1 addition & 1 deletion lib/csvlint/csvw/column.rb
Expand Up @@ -95,7 +95,7 @@ def validate(string_value, row=nil)
string_value = string_value || @default
if null.include? string_value
validate_required(nil, row)
values = nil
values = @separator.nil? ? nil : []
return values
else
string_values = @separator.nil? ? [string_value] : string_value.split(@separator)
Expand Down
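Review note on the `column.rb` change: the sketch below illustrates why returning `[]` rather than `nil` for an empty array-valued cell is convenient for callers that iterate the result. `split_cell` and its keyword arguments are hypothetical stand-ins, not the real `Csvlint::Csvw::Column#validate` signature, and the per-value validation done by the real method is left out.

```ruby
# Hypothetical, simplified stand-in for the null-handling branch shown in the
# hunk above. Names and signature are assumptions made for illustration.
def split_cell(string_value, separator: nil, null: [""])
  if null.include?(string_value)
    # Columns with a separator are array-valued: an empty cell becomes an
    # empty list. Scalar columns (no separator) still yield nil.
    return separator.nil? ? nil : []
  end
  separator.nil? ? [string_value] : string_value.split(separator)
end

# Callers can iterate an array-valued column without a nil check.
split_cell("", separator: " ").each { |v| puts v }   # loop body never runs
puts split_cell("5 7 9", separator: " ").inspect
puts split_cell("", separator: nil).inspect
```

Scalar columns keep returning `nil`, so existing nil checks on non-array columns should be unaffected under these assumptions.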
9 changes: 6 additions & 3 deletions lib/csvlint/csvw/date_format.rb
Expand Up @@ -140,12 +140,13 @@ def parse(value)

private
FIELDS = {
"yyyy" => /(?<year>-?([1-9][0-9]{3,}|0[0-9]{3}))/,
"yyyy" => /(?<year>-?([1-9][0-9]{3}|0[0-9]{3}))/,
"MM" => /(?<month>0[1-9]|1[0-2])/,
"M" => /(?<month>[1-9]|1[0-2])/,
"dd" => /(?<day>0[1-9]|[12][0-9]|3[01])/,
"d" => /(?<day>[1-9]|[12][0-9]|3[01])/,
"d" => /(?<day>0?[1-9]|[12]?[0-9]|3[01])/,
"HH" => /(?<hour>[01][0-9]|2[0-3])/,
"H" => /(?<hour>0?[0-9]|1[0-9]|2[0-3])/,
"mm" => /(?<minute>[0-5][0-9])/,
"ss" => /([0-6][0-9])/,
"X" => /(?<timezone>Z|[-+]((0[0-9]|1[0-3])([0-5][0-9])?|14(00)?))/,
Expand All @@ -170,13 +171,15 @@ def parse(value)
"dd.MM.yyyy" => Regexp.new("^#{FIELDS["dd"]}.#{FIELDS["MM"]}.#{FIELDS["yyyy"]}$"),
"d.M.yyyy" => Regexp.new("^#{FIELDS["d"]}.#{FIELDS["M"]}.#{FIELDS["yyyy"]}$"),
"MM.dd.yyyy" => Regexp.new("^#{FIELDS["MM"]}.#{FIELDS["dd"]}.#{FIELDS["yyyy"]}$"),
"M.d.yyyy" => Regexp.new("^#{FIELDS["M"]}.#{FIELDS["d"]}.#{FIELDS["yyyy"]}$")
"M.d.yyyy" => Regexp.new("^#{FIELDS["M"]}.#{FIELDS["d"]}.#{FIELDS["yyyy"]}$"),
"dMMyyyy" => Regexp.new("^#{FIELDS["d"]}#{FIELDS["MM"]}#{FIELDS["yyyy"]}$"),
}

TIME_PATTERN_REGEXP = {
"HH:mm:ss" => Regexp.new("^#{FIELDS["HH"]}:#{FIELDS["mm"]}:(?<second>#{FIELDS["ss"]})$"),
"HHmmss" => Regexp.new("^#{FIELDS["HH"]}#{FIELDS["mm"]}(?<second>#{FIELDS["ss"]})$"),
"HH:mm" => Regexp.new("^#{FIELDS["HH"]}:#{FIELDS["mm"]}$"),
"H:mm" => Regexp.new("^#{FIELDS["H"]}:#{FIELDS["mm"]}$"),
"HHmm" => Regexp.new("^#{FIELDS["HH"]}#{FIELDS["mm"]}$")
}

Expand Down
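Review note on the `date_format.rb` change: the patterns below are copied from the hunk above and exercised in isolation, outside the gem. `OLD_D_MM_YYYY` is a hypothetical combination of the new `dMMyyyy` layout with the previous open-ended year pattern. It is included only to suggest why the year may have been tightened to exactly four digits, which is an inference and not something the PR states.

```ruby
# Regex fragments copied from the FIELDS hash in the diff above.
FIELDS = {
  "yyyy" => /(?<year>-?([1-9][0-9]{3}|0[0-9]{3}))/,
  "MM"   => /(?<month>0[1-9]|1[0-2])/,
  "d"    => /(?<day>0?[1-9]|[12]?[0-9]|3[01])/,
  "H"    => /(?<hour>0?[0-9]|1[0-9]|2[0-3])/,
  "mm"   => /(?<minute>[0-5][0-9])/
}

D_MM_YYYY = Regexp.new("^#{FIELDS["d"]}#{FIELDS["MM"]}#{FIELDS["yyyy"]}$")
H_MM      = Regexp.new("^#{FIELDS["H"]}:#{FIELDS["mm"]}$")

# Previous year pattern (three or more digits after the first), combined with
# the new layout purely for comparison. This combination is hypothetical.
OLD_YYYY      = /(?<year>-?([1-9][0-9]{3,}|0[0-9]{3}))/
OLD_D_MM_YYYY = Regexp.new("^#{FIELDS["d"]}#{FIELDS["MM"]}#{OLD_YYYY}$")

%w[7112016 07112016 11112016 0112016].each do |value|
  m = D_MM_YYYY.match(value)
  puts "#{value}: day=#{m[:day]} month=#{m[:month]} year=#{m[:year]}" if m
end

old = OLD_D_MM_YYYY.match("11112016")
puts "old year pattern: day=#{old[:day]} year=#{old[:year]}"

%w[9:05 09:05 23:59 24:00].each do |value|
  m = H_MM.match(value)
  puts "#{value}: #{m ? "hour=#{m[:hour]}" : "no match"}"
end
```

One thing worth checking in review: the `[12]?[0-9]` alternative of the new `d` pattern also accepts a day of `0` (for example `0112016`). Whether later parsing rejects such a value is not visible in this diff.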
2 changes: 1 addition & 1 deletion lib/csvlint/csvw/property_checker.rb
Expand Up @@ -454,7 +454,7 @@ def column_reference_property(type)
if value.instance_of? String
schema_url = URI.join(base_url, value).to_s
schema_base_url = schema_url
schema_ref = schema_url.start_with?("file:") ? File.new(schema_url[5..-1]) : schema_url
schema_ref = FileUrl.file(schema_url)
schema = JSON.parse( open(schema_ref).read )
schema["@id"] = schema["@id"] ? URI.join(schema_url, schema["@id"]).to_s : schema_url
if schema["@context"]
Expand Down
52 changes: 37 additions & 15 deletions lib/csvlint/csvw/table.rb
Expand Up @@ -63,26 +63,39 @@ def validate_row(values, row=nil, validate=false)
unless @primary_key.nil?
key = @primary_key.map { |column| column.validate(values[column.number - 1], row) }
colnum = if primary_key.length == 1 then primary_key[0].number else nil end
build_errors(:duplicate_key, :schema, row, colnum, key.join(","), @primary_key_values[key]) if @primary_key_values.include?(key)
build_errors(:duplicate_key, :schema, row, colnum, key, @primary_key_values[key]) if @primary_key_values.include?(key)
@primary_key_values[key] = row
end
# build a record of the unique values that are referenced by foreign keys from other tables
# so that later we can check whether those foreign keys reference these values
@foreign_key_references.each do |foreign_key|
referenced_columns = foreign_key["referenced_columns"]
key = referenced_columns.map{ |column| column.validate(values[column.number - 1], row) }
known_values = @foreign_key_reference_values[foreign_key] = @foreign_key_reference_values[foreign_key] || {}
known_values[key] = known_values[key] || []
known_values[key] << row
key = referenced_columns.map{ |column| values[column.number - 1] }
known_values = @foreign_key_reference_values[foreign_key] ||= {}
(known_values[key] ||= []) << row
end
# build a record of the references from this row to other tables
# we can't check yet whether these exist in the other tables because
# we might not have parsed those other tables
@foreign_keys.each do |foreign_key|
referencing_columns = foreign_key["referencing_columns"]
key = referencing_columns.map{ |column| column.validate(values[column.number - 1], row) }
known_values = @foreign_key_values[foreign_key] = @foreign_key_values[foreign_key] || []
known_values << key unless known_values.include?(key)
key = referencing_columns.map{ |column| values[column.number - 1] }
known_values = @foreign_key_values[foreign_key] ||= {}

if referencing_columns.length == 1 && !referencing_columns[0].separator.nil?
# This case is for an array-valued column, where each value is a
# FK. The data will look like this:
# [ [ "5", "7", "9" ] ]
# We want it like this:
# [ ["5"], ["7"], ["9"] ]
if key[0] != nil
key[0].each do |subkey|
(known_values[ [subkey] ] ||= []) << row
end
end
else
(known_values[key] ||= []) << row
end
end
end
return valid?
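Review note on the array-valued foreign key branch above: a minimal standalone sketch of the indexing step, assuming the parsed key has the shape described in the code comment. `record_foreign_key` and its `array_valued:` flag are hypothetical names introduced for illustration; the real code reads the separator from the column object.

```ruby
# Hypothetical helper mirroring the branch in validate_row.
# known_values maps a key tuple to the list of rows that reference it.
def record_foreign_key(known_values, key, row, array_valued:)
  if array_valued
    # key looks like [["5", "7", "9"]]: one column holding several values.
    # Each element is indexed as its own single-column key: ["5"], ["7"], ["9"].
    # A nil cell contributes nothing.
    (key[0] || []).each do |subkey|
      (known_values[[subkey]] ||= []) << row
    end
  else
    (known_values[key] ||= []) << row
  end
  known_values
end

index = {}
record_foreign_key(index, [["5", "7", "9"]], 2, array_valued: true)
record_foreign_key(index, [["7"]], 3, array_valued: true)
record_foreign_key(index, [nil], 4, array_valued: true)
record_foreign_key(index, ["GB", "SW1A"], 5, array_valued: false)

index.each { |key, rows| puts "#{key.inspect} referenced by rows #{rows.inspect}" }
```

Keeping the row numbers per key, rather than a flat list of keys, is what lets the later reference check report the original row of each offending value.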
Expand All @@ -92,6 +105,7 @@ def validate_foreign_keys
reset
@foreign_keys.each do |foreign_key|
local = @foreign_key_values[foreign_key]
next if local.nil?
remote_table = foreign_key["referenced_table"]
remote_table.validate_foreign_key_references(foreign_key, @url, local)
@errors += remote_table.errors unless remote_table == self
Expand All @@ -102,14 +116,22 @@ def validate_foreign_keys

def validate_foreign_key_references(foreign_key, remote_url, remote)
reset
local = @foreign_key_reference_values[foreign_key]
context = { "from" => { "url" => remote_url.to_s.split("/")[-1], "columns" => foreign_key["columnReference"] }, "to" => { "url" => @url.to_s.split("/")[-1], "columns" => foreign_key["reference"]["columnReference"] }}
local = @foreign_key_reference_values[foreign_key] || {}
context = {
"from" => { "url" => remote_url.to_s.split("/")[-1], "columns" => foreign_key["columnReference"] },
"to" => { "url" => @url.to_s.split("/")[-1], "columns" => foreign_key["reference"]["columnReference"] }
}
colnum = if foreign_key["referencing_columns"].length == 1 then foreign_key["referencing_columns"][0].number else nil end
remote.each_with_index do |r,i|
if local[r]
build_errors(:multiple_matched_rows, :schema, i+1, colnum, r, context) if local[r].length > 1
else
build_errors(:unmatched_foreign_key_reference, :schema, i+1, colnum, r, context)

remote.each do |key,rows|
if not local[key]
rows.each do |row|
build_errors(:unmatched_foreign_key_reference, :schema, row, colnum, key, context)
end
elsif local[key].length > 1
rows.each do |row|
build_errors(:multiple_matched_rows, :schema, row, colnum, key, context)
end
end
end
return valid?
Expand Down
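Review note on `validate_foreign_key_references`: the rewritten loop can be sketched outside the gem as follows. `check_references` is a hypothetical function that returns plain tuples where the real method calls `build_errors`. Both arguments are assumed to be hashes from key tuple to row numbers, matching the structures built in `validate_row`.

```ruby
# Hypothetical, simplified version of the reference check.
# referencing: key => rows in the referencing table that use the key
# referenced:  key => rows in the referenced table that define the key
def check_references(referencing, referenced)
  errors = []
  referencing.each do |key, rows|
    matches = referenced[key]
    if matches.nil?
      # No row in the referenced table carries this key.
      rows.each { |row| errors << [:unmatched_foreign_key_reference, row, key] }
    elsif matches.length > 1
      # The key is ambiguous: more than one referenced row carries it.
      rows.each { |row| errors << [:multiple_matched_rows, row, key] }
    end
  end
  errors
end

referencing = { ["5"] => [2], ["7"] => [2, 3], ["8"] => [6] }
referenced  = { ["5"] => [10], ["7"] => [11, 12] }

check_references(referencing, referenced).each { |e| puts e.inspect }
```

Because errors are emitted once per referencing row, the reported row is the row in the source data and not the position of the key in a de-duplicated list, which appears to be the point of the "Fix foreign key row reporting" commit.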
5 changes: 3 additions & 2 deletions lib/csvlint/csvw/table_group.rb
Expand Up @@ -22,7 +22,8 @@ def initialize(url, id: nil, tables: {}, notes: [], annotations: {}, warnings: [

def validate_header(header, table_url, strict)
reset
table_url = "file:#{File.absolute_path(table_url)}" if table_url.instance_of? File
table_url = FileUrl::url(table_url) if table_url.instance_of? File
@validated_tables[table_url] = true
table = tables[table_url]
table.validate_header(header, strict)
@errors += table.errors
Expand All @@ -32,7 +33,7 @@ def validate_header(header, table_url, strict)

def validate_row(values, row=nil, all_errors=[], table_url, validate)
reset
table_url = "file:#{File.absolute_path(table_url)}" if table_url.instance_of? File
table_url = FileUrl::url(table_url) if table_url.instance_of? File
@validated_tables[table_url] = true
table = tables[table_url]
table.validate_row(values, row, validate)
Expand Down
20 changes: 20 additions & 0 deletions lib/csvlint/file_url.rb
@@ -0,0 +1,20 @@
module Csvlint
module FileUrl

# Convert a path to an absolute file:// uri
def FileUrl.url(path)
URI.encode(File.expand_path(path).gsub(/^\/*/, "file:///"))
end

# Convert a file:// uri to a File
def FileUrl.file(uri)
if uri.start_with?("file:")
uri = URI.decode(uri)
uri = uri.gsub(/^file:\/*/, "/")
File.new(uri)
else
uri
end
end
end
end
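Review note on `file_url.rb`: `URI.encode` and `URI.decode` were removed in Ruby 3.0, so the module as written depends on the Ruby 2.x toolchain this PR builds with. The sketch below shows the same path to URL round trip using `URI::RFC2396_Parser`, which is available on both Ruby 2.x and 3.x. `FileUrlSketch` is a hypothetical stand-in, it returns a path string instead of opening a `File`, and it assumes a Unix-style filesystem.

```ruby
require 'uri'

# Hypothetical stand-in for Csvlint::FileUrl, for illustration only.
module FileUrlSketch
  PARSER = URI::RFC2396_Parser.new

  # Convert a path to an absolute, percent-encoded file:// URL.
  def self.url(path)
    PARSER.escape(File.expand_path(path).sub(/\A\/*/, "file:///"))
  end

  # Convert a file:// URL back to a filesystem path.
  # Anything that is not a file: URL is returned unchanged.
  def self.path(uri)
    return uri unless uri.start_with?("file:")
    PARSER.unescape(uri).sub(/\Afile:\/*/, "/")
  end
end

url = FileUrlSketch.url("/tmp/my data.csv")
puts url
puts FileUrlSketch.path(url)
puts FileUrlSketch.path("http://example.org/data.csv")
```

Returning a path instead of a `File` keeps the sketch free of filesystem access; the PR's `FileUrl.file` opens the file and can therefore raise `Errno::ENOENT`, which `cli.rb` rescues.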
8 changes: 6 additions & 2 deletions lib/csvlint/validate.rb
Expand Up @@ -106,6 +106,9 @@ def validate
end

def validate_stream
if (@source.nil?)
return
end
@current_line = 1
@source.each_line do |line|
break if line_limit_reached?
Expand Down Expand Up @@ -430,13 +433,14 @@ def check_foreign_keys

def locate_schema


@source_url = nil
warn_if_unsuccessful = false
case @source
when StringIO
return
when File
@source_url = "file:#{URI.encode(File.expand_path(@source))}"
@source_url = FileUrl.url(@source)
else
@source_url = @source
end
Expand All @@ -461,7 +465,7 @@ def locate_schema
template = URITemplate.new(template)
path = template.expand('url' => @source_url)
url = URI.join(@source_url, path)
url = File.new(url.to_s.sub(/^file:/, "")) if url.to_s =~ /^file:/
url = FileUrl.file(url)
schema = Schema.load_from_uri(url)
if schema.instance_of? Csvlint::Csvw::TableGroup
if schema.tables[@source_url]
Expand Down