Commit c5f6ade

feat: Refactored structure, handled missing comma error and added doc
1 parent 1eaac76 commit c5f6ade

20 files changed: 11,694 additions, 117 deletions

Cargo.lock

-10
Some generated files are not rendered by default.

Cargo.toml

+18-5
@@ -1,6 +1,19 @@
-[workspace]
-members = ["crates/*"]
-resolver = "2"
+[package]
+name = "spanned_json_parser"
+version = "0.1.0"
+edition = "2021"
 
-[profile.release]
-debug = true
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[dependencies]
+nom = "7"
+memchr = "2.6.4"
+serde = "1.0.190"
+bytecount = "0.6.7"
+
+[dev-dependencies]
+criterion = { version = "0.5", features = ["html_reports"] }
+
+[[bench]]
+name = "parsing"
+harness = false

LICENSE

+7
@@ -0,0 +1,7 @@
+Copyright 2023 Jules Guesnon
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

+77-1
@@ -1,3 +1,79 @@
 # Spanned Json Parser
 
-WIP
+This crate is a JSON parser that returns span information for values, that is, their line and column numbers. It is also compatible with [serde](https://serde.rs/), so you can deserialize the result into any struct that implements [Deserialize](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html).
+
+## Why use it?
+
+One of the main use cases is validation after parsing. With the line and column numbers, you can tell the user precisely where a value is invalid.
+
+## How to use it?
+
+The crate exposes a `Value` that is similar to [serde_json's `Value`](https://docs.rs/serde_json/latest/serde_json/value/enum.Value.html) and wraps every value in a `SpannedValue`:
+
+```rust
+pub struct Position {
+    pub col: usize,
+    pub line: usize,
+}
+
+pub struct SpannedValue {
+    pub value: Value,
+    pub start: Position,
+    pub end: Position,
+}
+```
+
+### Parsing
+
+```rust
+use spanned_json_parser::parse;
+use std::fs;
+
+fn main() {
+    let json = fs::read_to_string("path").unwrap();
+
+    let parsed = parse(&json);
+
+    println!("Parsed: {:#?}", parsed);
+}
+```
+
+### Deserializing into a struct
+
+```rust
+use serde::Deserialize;
+use spanned_json_parser::parse;
+
+#[derive(Deserialize)]
+struct Test {
+    pub hello: String,
+}
+
+fn main() {
+    let json = r#"{"hello": "world"}"#;
+
+    let parsed = parse(json).unwrap();
+
+    let test: Test = serde_json::from_value(serde_json::to_value(parsed).unwrap()).unwrap();
+
+    println!("Test hello: {}", test.hello);
+}
+```
+
+## Performance
+
+Here are the benchmark outputs. Everything was run on a MacBook Pro M1, so these numbers only give a rough idea of the performance and may not be representative of other machines:
+
+```
+Parser ./benches/data/twitter.json
+    time:   [10.220 ms 10.279 ms 10.334 ms]
+    thrpt:  [58.280 MiB/s 58.589 MiB/s 58.932 MiB/s]
+
+Parser ./benches/data/citm_catalog.json
+    time:   [18.204 ms 18.281 ms 18.353 ms]
+    thrpt:  [89.752 MiB/s 90.102 MiB/s 90.486 MiB/s]
+
+Parser ./benches/data/canada.json
+    time:   [42.026 ms 42.188 ms 42.341 ms]
+    thrpt:  [50.702 MiB/s 50.886 MiB/s 51.082 MiB/s]
+```
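The validation use case described in the README can be illustrated with a minimal, self-contained sketch. It does not call the crate: `Position` is copied from the README, while the generic `Spanned<T>` wrapper and the `check_port` function are hypothetical names introduced only for this example (the crate's `SpannedValue` wraps its own `Value` enum instead).

```rust
// Stand-in for the `Position` struct shown in the README.
#[derive(Debug, Clone, PartialEq)]
pub struct Position {
    pub col: usize,
    pub line: usize,
}

// Hypothetical generic wrapper used only for this sketch; the crate's
// `SpannedValue` wraps its own `Value` enum rather than a type parameter.
#[derive(Debug, Clone, PartialEq)]
pub struct Spanned<T> {
    pub value: T,
    pub start: Position,
    pub end: Position,
}

/// Validates a port number and, on failure, reports where the value sits
/// in the source text as `line:col-line:col`.
pub fn check_port(port: &Spanned<i64>) -> Result<u16, String> {
    if (0..=65535).contains(&port.value) {
        // The range check above guarantees the cast is lossless.
        Ok(port.value as u16)
    } else {
        Err(format!(
            "{}:{}-{}:{}: port {} is out of range (0-65535)",
            port.start.line, port.start.col, port.end.line, port.end.col, port.value
        ))
    }
}
```

Because the span travels with the value, the check can run long after parsing has finished and still point at the exact location of the offending value.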
File renamed without changes.
File renamed without changes.

benches/data/largest.json

+11,352
Large diffs are not rendered by default.
File renamed without changes.
File renamed without changes.

crates/cli/Cargo.toml → cli/Cargo.toml

+1-3
@@ -6,6 +6,4 @@ edition = "2021"
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [dependencies]
-spanned_json_parser = { path = "../parser" }
-serde = "1.0.190"
-serde_json = "1.0"
+spanned_json_parser = { path = ".." }

crates/cli/src/main.rs → cli/src/main.rs

+1-6
@@ -1,4 +1,4 @@
-use std::{env, fs, process, time::Instant};
+use std::{env, fs, process};
 
 fn main() {
     let args: Vec<String> = env::args().collect();
@@ -13,13 +13,8 @@ fn main() {
         }
     };
 
-    let start = Instant::now();
-    println!("Starting parsing: {:?}", start);
-
-    // let parsed = serde_json::from_str::<serde_json::Value>(&json);
     let parsed = spanned_json_parser::parse(&json);
 
-    println!("Ended parsing: {:?}", start.elapsed());
     match parsed {
         Ok(_) => process::exit(0),
         Err(_) => process::exit(1),

crates/parser/Cargo.toml

-20
This file was deleted.

crates/parser/src/lib.rs

-12
This file was deleted.

crates/parser/src/error.rs → src/error.rs

+6-5
@@ -1,15 +1,16 @@
 use crate::{parser::Span, value::Position};
 use nom::error::{ErrorKind, FromExternalError, ParseError};
-use std::num::IntErrorKind;
 use std::num::ParseFloatError;
 use std::num::ParseIntError;
 
 #[derive(Debug, PartialEq)]
 pub enum Kind {
     MissingQuote,
     MissingArrayBracket,
+    MissingComma,
     MissingObjectBracket,
     InvalidKey(String),
+    MissingChar(char),
     MissingColon,
     CharsAfterRoot(String),
     NotAnHex(String),
@@ -53,14 +54,14 @@ impl Default for Error {
 }
 
 impl From<ParseIntError> for Error {
-    fn from(value: ParseIntError) -> Self {
+    fn from(_value: ParseIntError) -> Self {
         let position = Position::default();
         Self::new(position.clone(), position, Kind::NotANumber)
     }
 }
 
 impl From<ParseFloatError> for Error {
-    fn from(value: ParseFloatError) -> Self {
+    fn from(_value: ParseFloatError) -> Self {
         let position = Position::default();
 
         Self::new(position.clone(), position, Kind::NotANumber)
@@ -78,7 +79,7 @@ impl<'a> ParseError<Span<'a>> for Error {
         }
     }
 
-    fn append(input: Span<'a>, kind: ErrorKind, other: Self) -> Self {
+    fn append(input: Span<'a>, kind: ErrorKind, _other: Self) -> Self {
         let pos = Position::from(input);
 
         Self {
@@ -90,7 +91,7 @@ impl<'a> ParseError<Span<'a>> for Error {
 }
 
 impl<'a, T> FromExternalError<Span<'a>, T> for Error {
-    fn from_external_error(input: Span<'a>, kind: ErrorKind, e: T) -> Self {
+    fn from_external_error(input: Span<'a>, _kind: ErrorKind, _e: T) -> Self {
         let position = Position::from(input);
 
         Self::new(position.clone(), position, Kind::ToBeDefined)
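
The new `MissingComma` kind suggests the parser now distinguishes a missing separator from other array errors. The parser changes themselves are not visible in this diff, so the following is only a rough, hand-written sketch of the idea and not the crate's nom-based implementation. `Kind` and `Position` are reduced stand-ins, and `position_at` and `parse_int_array` are hypothetical helpers that only handle flat arrays of unsigned integers.

```rust
// Reduced stand-ins for the crate's `Kind` and `Position` types.
#[derive(Debug, PartialEq)]
pub enum Kind {
    MissingComma,
    MissingArrayBracket,
    NotANumber,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Position {
    pub col: usize,
    pub line: usize,
}

// Hypothetical helper: converts a byte offset into a 1-based line and column.
fn position_at(input: &str, byte_idx: usize) -> Position {
    let before = &input[..byte_idx];
    let line = before.matches('\n').count() + 1;
    let col = match before.rfind('\n') {
        Some(nl) => before[nl + 1..].chars().count() + 1,
        None => before.chars().count() + 1,
    };
    Position { col, line }
}

/// Hypothetical parser for arrays such as `[1, 2, 3]`. It reports
/// `MissingComma` when an element is followed by something other than
/// `,` or `]`.
pub fn parse_int_array(input: &str) -> Result<Vec<u64>, (Kind, Position)> {
    let bytes = input.as_bytes();
    // Returns the index of the first non-whitespace byte at or after `i`.
    let skip_ws = |mut i: usize| {
        while i < bytes.len() && bytes[i].is_ascii_whitespace() {
            i += 1;
        }
        i
    };

    let mut i = skip_ws(0);
    if bytes.get(i) != Some(&b'[') {
        return Err((Kind::MissingArrayBracket, position_at(input, i)));
    }
    i = skip_ws(i + 1);

    let mut out = Vec::new();
    if bytes.get(i) == Some(&b']') {
        return Ok(out);
    }

    loop {
        // Read one run of ASCII digits as the next element.
        let start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        let number = input[start..i]
            .parse::<u64>()
            .map_err(|_| (Kind::NotANumber, position_at(input, start)))?;
        out.push(number);

        // After an element, only `,` or `]` is valid.
        i = skip_ws(i);
        match bytes.get(i) {
            Some(b',') => i = skip_ws(i + 1),
            Some(b']') => return Ok(out),
            None => return Err((Kind::MissingArrayBracket, position_at(input, i))),
            Some(_) => return Err((Kind::MissingComma, position_at(input, i))),
        }
    }
}
```

In this sketch the error position points at the first character of the element that should have been preceded by a comma; where the crate itself places the span is not shown in the diff.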

crates/parser/src/input.rs → src/input.rs

+2-2
@@ -5,7 +5,7 @@ use nom::{
     InputTakeAtPosition, Offset, ParseTo, Slice,
 };
 use std::{
-    ops::{Range, RangeFrom, RangeFull, RangeTo},
+    ops::{Range, RangeFrom, RangeTo},
     str::{CharIndices, Chars, FromStr},
 };
 
@@ -149,7 +149,7 @@ impl<'a> InputTakeAtPosition for Input<'a> {
     fn split_at_position1<P, E: nom::error::ParseError<Self>>(
         &self,
         predicate: P,
-        e: nom::error::ErrorKind,
+        _e: nom::error::ErrorKind,
     ) -> nom::IResult<Self, Self, E>
     where
         P: Fn(Self::Item) -> bool,

src/lib.rs

+94
@@ -0,0 +1,94 @@
+#![forbid(unsafe_code)]
+#![warn(clippy::all)]
+#![allow(clippy::needless_doctest_main)]
+//! This crate is a JSON parser that returns span information for values, that is, their line and column numbers. It is also compatible with [serde](https://serde.rs/), so you can deserialize the result into any struct that implements [Deserialize](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html).
+//!
+//! ## Why use it?
+//!
+//! One of the main use cases is validation after parsing. With the line and column numbers, you can tell the user precisely where a value is invalid.
+//!
+//! ## How to use it?
+//!
+//! The crate exposes a `Value` that is similar to [serde_json's `Value`](https://docs.rs/serde_json/latest/serde_json/value/enum.Value.html) and wraps every value in a `SpannedValue`:
+//!
+//! ```rust
+//! pub struct Position {
+//!     pub col: usize,
+//!     pub line: usize,
+//! }
+//!
+//! pub struct SpannedValue {
+//!     pub value: Value,
+//!     pub start: Position,
+//!     pub end: Position,
+//! }
+//! ```
+//!
+//! ### Parsing
+//!
+//! ```rust
+//! use spanned_json_parser::parse;
+//! use std::fs;
+//!
+//! fn main() {
+//!     let json = fs::read_to_string("path").unwrap();
+//!
+//!     let parsed = parse(&json);
+//!
+//!     println!("Parsed: {:#?}", parsed);
+//! }
+//! ```
+//!
+//! ### Deserializing into a struct
+//!
+//! ```rust
+//! use serde::Deserialize;
+//! use spanned_json_parser::parse;
+//!
+//! #[derive(Deserialize)]
+//! struct Test {
+//!     pub hello: String,
+//! }
+//!
+//! fn main() {
+//!     let json = r#"{"hello": "world"}"#;
+//!
+//!     let parsed = parse(json).unwrap();
+//!
+//!     let test: Test = serde_json::from_value(serde_json::to_value(parsed).unwrap()).unwrap();
+//!
+//!     println!("Test hello: {}", test.hello);
+//! }
+//! ```
+//!
+//! ## Performance
+//!
+//! Here are the benchmark outputs. Everything was run on a MacBook Pro M1, so these numbers only give a rough idea of the performance and may not be representative of other machines:
+//!
+//! ```
+//! Parser ./benches/data/twitter.json
+//!     time:   [10.220 ms 10.279 ms 10.334 ms]
+//!     thrpt:  [58.280 MiB/s 58.589 MiB/s 58.932 MiB/s]
+//!
+//! Parser ./benches/data/citm_catalog.json
+//!     time:   [18.204 ms 18.281 ms 18.353 ms]
+//!     thrpt:  [89.752 MiB/s 90.102 MiB/s 90.486 MiB/s]
+//!
+//! Parser ./benches/data/canada.json
+//!     time:   [42.026 ms 42.188 ms 42.341 ms]
+//!     thrpt:  [50.702 MiB/s 50.886 MiB/s 51.082 MiB/s]
+//! ```
+
+extern crate bytecount;
+extern crate memchr;
+extern crate nom;
+extern crate serde;
+
+mod input;
+mod parser;
+mod ser;
+
+pub mod error;
+pub mod value;
+
+pub use parser::parse;
