
Duplicate data on restart when reading files with whitespaces in name #218

Open
@jpazsedano

Description

This looks like a simple error in how the sincedb file is written and read back. Whitespace in a file name is not escaped, so after a restart the entry for a file whose name contains whitespace is not read properly. The plugin then loads that file from the beginning again, as if it had no entry in the sincedb file. The fix is probably simple, but I haven't tried it myself because I have zero knowledge of Ruby.
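
To illustrate the suspected failure mode, here is a minimal Ruby sketch. It is an assumption about what goes wrong, not the plugin's actual code: it splits a sincedb line (the one quoted in this report) on every run of whitespace, so the stored path is cut off at the first space and no longer matches the watched path. All variable names are hypothetical.

```ruby
# Hypothetical illustration only; this is NOT the plugin's real parsing code.
record = "5629499534248064 0 113 72 1540468461.821136 /mnt/data/wrong file.csv"
watched_path = "/mnt/data/wrong file.csv"

# Naive parsing: split on every run of whitespace.
fields = record.split(" ")

# The path is assumed to be the sixth column, but the space inside
# the file name has produced a seventh field.
naive_path = fields[5]

# The truncated path does not match the watched file, so a reader
# doing this would treat the file as never seen before.
naive_match = (naive_path == watched_path)

puts "fields: #{fields.length}, path: #{naive_path.inspect}, match: #{naive_match}"
```

If the real reader behaves like this, the entry for "wrong file.csv" would never be matched on startup, which is consistent with the file being re-read with start_position => "beginning".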

  • Version:
    4.1.6 (Logstash 6.4.2)

  • Operating System:
    Linux (I'm using the official Docker images, which I think are based on CentOS 7).

  • Config File (if you have sensitive info, please remove it):

input {
  file {
    path => ["/mnt/data/wrong file.csv", "/mnt/data/ok-file.csv"]
    sincedb_path => "/mnt/data/test.sincedb"

    start_position => "beginning"
  }
}

output {
  file {
    path => "/mnt/data/output.dat"
  }
}
  • Sample Data:
    After a couple of restarts, the events from "wrong file.csv" show up again each time, while the events from "ok-file.csv" appear only once.
{"message":"field1,field2,field3\r","host":"77a43f13b227","path":"/mnt/data/wrong file.csv","@timestamp":"2018-10-25T11:40:10.127Z","@version":"1"}
{"message":"data2-1,data2-2,data2-3\r","host":"77a43f13b227","path":"/mnt/data/wrong file.csv","@timestamp":"2018-10-25T11:40:10.193Z","@version":"1"}
{"message":"data1-1,data1-2,data1-3\r","host":"77a43f13b227","path":"/mnt/data/wrong file.csv","@timestamp":"2018-10-25T11:40:10.187Z","@version":"1"}
{"message":"field1,field2,field3\r","host":"77a43f13b227","path":"/mnt/data/ok-file.csv","@timestamp":"2018-10-25T11:40:10.415Z","@version":"1"}
{"message":"data1-1,data1-2,data1-3\r","host":"77a43f13b227","path":"/mnt/data/ok-file.csv","@timestamp":"2018-10-25T11:40:10.415Z","@version":"1"}
{"message":"data2-1,data2-2,data2-3\r","host":"77a43f13b227","path":"/mnt/data/ok-file.csv","@timestamp":"2018-10-25T11:40:10.415Z","@version":"1"}
{"path":"/mnt/data/wrong file.csv","message":"data2-1,data2-2,data2-3\r","@timestamp":"2018-10-25T11:41:27.625Z","@version":"1","host":"77a43f13b227"}
{"path":"/mnt/data/wrong file.csv","message":"field1,field2,field3\r","@timestamp":"2018-10-25T11:41:27.367Z","@version":"1","host":"77a43f13b227"}
{"path":"/mnt/data/wrong file.csv","message":"data1-1,data1-2,data1-3\r","@timestamp":"2018-10-25T11:41:27.617Z","@version":"1","host":"77a43f13b227"}
{"path":"/mnt/data/wrong file.csv","host":"9ac74fb6972b","message":"field1,field2,field3\r","@version":"1","@timestamp":"2018-10-25T11:54:21.743Z"}
{"path":"/mnt/data/wrong file.csv","host":"9ac74fb6972b","message":"data2-1,data2-2,data2-3\r","@version":"1","@timestamp":"2018-10-25T11:54:21.820Z"}
{"path":"/mnt/data/wrong file.csv","host":"9ac74fb6972b","message":"data1-1,data1-2,data1-3\r","@version":"1","@timestamp":"2018-10-25T11:54:21.806Z"}
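
To confirm the duplication without reading the output by eye, a small Ruby sketch like the following could count repeated events. It assumes the output file holds one JSON event per line, as in the sample above, and that the source files contain no legitimately identical lines. `duplicate_counts` is a hypothetical helper name.

```ruby
require "json"

# Count how often each (path, message) pair occurs in JSON-lines output
# and return only the pairs that occur more than once.
def duplicate_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    next if line.strip.empty?          # skip blank lines
    event = JSON.parse(line)
    counts[[event["path"], event["message"]]] += 1
  end
  counts.select { |_key, n| n > 1 }
end

# Usage (path taken from the config in this report):
#   duplicate_counts(File.foreach("/mnt/data/output.dat")).each do |(path, msg), n|
#     puts "#{n}x #{path}: #{msg.inspect}"
#   end
```

Keying on path and message, rather than on the whole event, ignores the timestamp and host fields, which differ between restarts.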

Sincedb contents:

3940649673984149 0 113 72 1540468461.2128549 /mnt/data/ok-file.csv
5629499534248064 0 113 72 1540468461.821136 /mnt/data/wrong file.csv
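
One possible read-side fix, sketched under assumptions: if the five leading columns are always present (assumed here to be inode, major device, minor device, byte position, and last-active timestamp), the split can be limited to six parts so that everything after the fifth column is kept as the path. `SincedbRecord` and `parse_sincedb_line` are hypothetical names and not part of the plugin.

```ruby
# Hypothetical record type; the field names are assumptions based on
# the column order seen in the sincedb contents above.
SincedbRecord = Struct.new(:inode, :major, :minor, :position, :last_active, :path)

# Parse one sincedb line, assuming all six columns are present.
def parse_sincedb_line(line)
  # A limit of 6 stops splitting after the fifth separator, so the
  # last part keeps any spaces that belong to the file name.
  inode, major, minor, position, last_active, path = line.chomp.split(" ", 6)
  SincedbRecord.new(
    Integer(inode), Integer(major), Integer(minor),
    Integer(position), Float(last_active), path
  )
end

rec = parse_sincedb_line("5629499534248064 0 113 72 1540468461.821136 /mnt/data/wrong file.csv\n")
puts rec.path.inspect
```

This avoids changing the on-disk format, so existing sincedb files would stay readable. Escaping the path when writing would also work, but would need a matching unescape step and a decision about files written by older versions.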
  • Steps to Reproduce:
  1. Create a file with whitespace in its path.
  2. Load it with the file input plugin.
  3. Restart Logstash.
  4. Observe the duplicated data in the output.
