28 commits
a13d02c
Create global_2015-08-18.jsonld
RalphTro Aug 18, 2025
a6d5e9b
Inserted experimantal files
RalphTro Aug 18, 2025
a7a3fc1
SW logic first draft
RalphTro Aug 18, 2025
b7cb32a
Updated comtext files
RalphTro Sep 26, 2025
470e8c7
Inserted 3 examples
RalphTro Sep 26, 2025
ac9e859
Updated TEMP script for discussion purposes
RalphTro Sep 26, 2025
76b4ea2
bare string
RalphTro Sep 26, 2025
fdee07c
added notes + context files for doc loader
Echsecutor Sep 26, 2025
a169c1b
Merge branch 'MDAF-Support' of github.com:RalphTro/epcis-event-hash-g…
Echsecutor Sep 26, 2025
0b1f7d7
Made 3 MDAF messages work in JSON-LD Playground
RalphTro Sep 29, 2025
e432635
Create masterDataAvailableForDeforestation.jsonld
RalphTro Oct 2, 2025
744e233
Create changeLog.md
RalphTro Oct 14, 2025
d5a164e
adding changelog since 1.9.3
Echsecutor Oct 17, 2025
387e96c
Added id/type (all options)
RalphTro Oct 17, 2025
0a52098
Merge branch 'MDAF-Support' of https://github.com/RalphTro/epcis-even…
RalphTro Oct 17, 2025
596cdcc
- EPCIS 2.1 change: always add implicit JSONLD context (namespace) `{…
Echsecutor Oct 17, 2025
1cda47a
Merge branch 'MDAF-Support' of github.com:RalphTro/epcis-event-hash-g…
Echsecutor Oct 17, 2025
23cbcad
fixes https://github.com/RalphTro/epcis-event-hash-generator/issues/117
Echsecutor Oct 17, 2025
5496f03
time stamp normalization difference for cbv2.0 / cbv2.1
Echsecutor Oct 17, 2025
b5c686c
removed old docker action. flake8 green
Echsecutor Oct 17, 2025
3e27c21
changing examples and expected hashes fixes https://github.com/RalphT…
Echsecutor Oct 17, 2025
739c7af
flake8
Echsecutor Oct 17, 2025
49c2564
python 3.7 and .8 no longer available for github actions
Echsecutor Oct 17, 2025
7598418
stupid yaml
Echsecutor Oct 17, 2025
11703f7
flake8 problems fixed in build pipeline. removed redundant extra flak…
Echsecutor Oct 17, 2025
f70424f
supported python versions
Echsecutor Oct 17, 2025
01eb8c7
status badge update (removed old actions)
Echsecutor Oct 17, 2025
67f457c
Adjusted propert name + recsalculated corresponding hash
RalphTro Nov 7, 2025
95 changes: 95 additions & 0 deletions .cursor/notes/algorithm.md
@@ -0,0 +1,95 @@
# EPCIS Event Hashing Algorithm

## Core Concept

The algorithm creates syntax-agnostic hash IDs for EPCIS events by:

1. Extracting event data in canonical property order
2. Normalizing values according to strict rules
3. Concatenating into a pre-hash string
4. Applying standard hash algorithms (SHA-256, etc.)

## Canonical Property Order

Defined in `__init__.py` as `PROP_ORDER` - a nested tuple structure specifying:

- Sequence of properties to process (1-25)
- Child element ordering for complex fields
- Special handling for lists and nested structures

Key sequences (canonical positions):

- 1: `eventType` (though not explicitly in `PROP_ORDER`)
- 2: `eventTime`
- 3: `eventTimeZoneOffset`
- 4: `epcList` → `epc`
- 5: `parentID`
- 6-12: Various input/output/quantity lists
- 13: `action`
- 14: `transformationID`
- 15-16: `bizStep`, `disposition`
- 17: `persistentDisposition`
- 18-19: `readPoint`, `bizLocation`
- 20-22: Transaction/source/destination lists
- 23: `sensorElementList` (complex nested structure)
- 24: `ilmd` elements
- 25: User extension elements
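The traversal implied by this ordering can be sketched as a recursive walk over `(name, children)` pairs. This is a minimal illustration under stated assumptions: `PROP_ORDER_EXCERPT`, `pre_hash` and `event_pre_hash` are hypothetical names, the excerpt covers only a handful of properties, and the real `PROP_ORDER` may be shaped differently. It does show the core idea: properties are visited in a fixed order, absent ones are skipped, list children are sorted, and everything is concatenated.

```python
# Hypothetical, heavily simplified excerpt in (name, children) form; the real
# PROP_ORDER in __init__.py is longer and its exact shape may differ.
PROP_ORDER_EXCERPT = (
    ("eventTime", None),
    ("eventTimeZoneOffset", None),
    ("epcList", (("epc", None),)),
    ("action", None),
    ("bizStep", None),
)


def pre_hash(obj, order, join_by=""):
    """Walk `order`, emitting name=value for leaves and recursing into children."""
    parts = []
    for name, children in order:
        if name not in obj:
            continue  # absent properties contribute nothing
        value = obj[name]
        if children is None:
            parts.append(f"{name}={str(value).strip()}")
        elif isinstance(value, list):
            # list of simple values (e.g. epcList -> epc): render, then sort lexically
            child_name = children[0][0]
            rendered = sorted(f"{child_name}={str(v).strip()}" for v in value)
            parts.append(join_by.join([name] + rendered))
        else:
            # nested object: emit its name, then its children in their own order
            parts.append(join_by.join([name, pre_hash(value, children, join_by)]))
    return join_by.join(parts)


def event_pre_hash(event, join_by=""):
    """eventType always comes first, even though it is not in the order table."""
    return join_by.join(
        [f"eventType={event['eventType']}", pre_hash(event, PROP_ORDER_EXCERPT, join_by)]
    )
```

With the default empty `join_by` the fragments run together; passing `join_by="\n"` puts one fragment per line, which is easier to inspect when debugging.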

## Key Normalization Rules

### Timestamps (`_fix_time_stamp_format` in `hash_generator.py`)

- Convert to UTC with 'Z' suffix
- Millisecond precision (pad with .000 if needed)
- If more than three fractional digits are present, round to milliseconds (a 4th decimal digit of 5-9 rounds up)
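A minimal standard-library sketch of these three rules. It is not the project's `_fix_time_stamp_format()`: `normalize_timestamp` is a hypothetical name, and it only accepts `YYYY-MM-DDThh:mm:ss[.fff…]` followed by `Z` or a `±hh:mm` offset. The commit history mentions a timestamp normalization difference between CBV 2.0 and CBV 2.1, so treat the exact rounding here as illustrative.

```python
import re
from datetime import datetime, timedelta, timezone
from decimal import ROUND_HALF_UP, Decimal

# date-time, optional fractional seconds, then 'Z' or a +hh:mm / -hh:mm offset
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})$")


def normalize_timestamp(ts: str) -> str:
    """Return `ts` in UTC, millisecond precision, with a 'Z' suffix."""
    match = _TS.match(ts.strip())
    if match is None:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base, frac, tz = match.groups()
    dt = datetime.fromisoformat(base + ("+00:00" if tz == "Z" else tz))
    # Round fractional seconds to whole milliseconds, half-up: a 4th digit of 5-9 rounds up
    millis = int((Decimal(frac or ".0") * 1000).quantize(Decimal(1), rounding=ROUND_HALF_UP))
    utc = (dt + timedelta(milliseconds=millis)).astimezone(timezone.utc)
    # Always emit exactly three fractional digits (pads with .000)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
```

Adding the rounded milliseconds through `timedelta` lets `datetime` handle the carry, so `.9995` correctly rolls over into the next second.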

### Numeric Values

- Remove trailing zeros
- No single quotes around numbers
- Preserve decimal precision as needed
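One way to express these numeric rules, sketched with `decimal` so that no binary floating-point rounding creeps in. `normalize_number` is a hypothetical helper, not a function from `hash_generator.py`.

```python
from decimal import Decimal, InvalidOperation


def normalize_number(text: str) -> str:
    """Drop trailing zeros and return the bare number (no quotes, no exponent)."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    # normalize() strips trailing zeros but may yield exponent form (100 -> 1E+2);
    # formatting with 'f' writes the digits out in plain positional notation again
    return format(value.normalize(), "f")
```

Using `Decimal` keeps every significant digit of the input, which matches the "preserve decimal precision" rule better than converting through `float`.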

### URI Normalization (`dl_normaliser.py`)

- Convert URN-based CBV values to GS1 Web URI format
- Expand CURIEs to full URIs
- Convert EPC URIs to canonical GS1 Digital Link
- Constrain GS1 Digital Link URIs (domain, query string, granularity)
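Three of these normalizations can be sketched in isolation. All names below are hypothetical and this is not the code in `dl_normaliser.py`. The sketch assumes the CBV 2.0 Web URI base `https://ref.gs1.org/cbv/` with `BizStep-`/`Disp-`/`BTT-`/`SDT-`/`ER-` term prefixes, only handles SGTIN EPC URIs, and ignores URI escaping in serial numbers and the Digital Link constraints in the last bullet.

```python
CBV_URN_PREFIX = "urn:epcglobal:cbv:"
WEB_BASE = "https://ref.gs1.org/cbv/"
# Assumed mapping from CBV URN vocabulary segments to Web URI term prefixes
CBV_WEB_PREFIX = {"bizstep": "BizStep-", "disp": "Disp-", "btt": "BTT-", "sdt": "SDT-", "er": "ER-"}
# Assumed CURIE prefixes (CBV and GS1 Web Vocabulary)
CURIE_PREFIXES = {"cbv": "https://ref.gs1.org/cbv/", "gs1": "https://gs1.org/voc/"}


def cbv_urn_to_web_uri(value: str) -> str:
    """urn:epcglobal:cbv:bizstep:shipping -> https://ref.gs1.org/cbv/BizStep-shipping"""
    if value.startswith(CBV_URN_PREFIX):
        vocab, _, term = value[len(CBV_URN_PREFIX):].partition(":")
        if vocab in CBV_WEB_PREFIX and term:
            return WEB_BASE + CBV_WEB_PREFIX[vocab] + term
    return value  # anything else is left untouched


def expand_curie(value: str) -> str:
    """cbv:BizStep-shipping -> https://ref.gs1.org/cbv/BizStep-shipping"""
    prefix, sep, local = value.partition(":")
    if sep and prefix in CURIE_PREFIXES:
        return CURIE_PREFIXES[prefix] + local
    return value


def gtin_check_digit(digits: str) -> str:
    """GS1 mod-10 check digit: weights 3,1,3,... starting from the rightmost digit."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(digits)))
    return str((10 - total % 10) % 10)


def sgtin_to_digital_link(epc: str) -> str:
    """urn:epc:id:sgtin:CompanyPrefix.IndicatorItemRef.Serial -> GS1 Digital Link URI"""
    prefix = "urn:epc:id:sgtin:"
    if not epc.startswith(prefix):
        return epc
    company, item_ref, serial = epc[len(prefix):].split(".", 2)
    # The indicator digit leads the item reference in the EPC but the GTIN in the DL
    gtin13 = item_ref[0] + company + item_ref[1:]
    return f"https://id.gs1.org/01/{gtin13}{gtin_check_digit(gtin13)}/21/{serial}"
```

For example, `sgtin_to_digital_link("urn:epc:id:sgtin:4012345.011111.987")` yields `https://id.gs1.org/01/04012345111118/21/987`.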

### String Processing

- Trim leading/trailing whitespace
- UTF-8/ASCII lexical ordering for sorting
- Case-sensitive sorting
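A small sketch of these string rules; `sort_values` is a hypothetical helper. Sorting by the UTF-8 encoded bytes is case-sensitive (all uppercase ASCII letters sort before lowercase ones), and because UTF-8 preserves code-point order it gives the same result as Python's default `sorted()` on `str`.

```python
def sort_values(values):
    """Trim each string, then sort case-sensitively in UTF-8 byte order."""
    trimmed = [v.strip() for v in values]
    # Sorting by encoded bytes makes the UTF-8 ordering explicit
    return sorted(trimmed, key=lambda s: s.encode("utf-8"))
```

For example, `sort_values([" b", "B ", "a", "é"])` returns `["B", "a", "b", "é"]`.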

### Lists and Collections

- Sort child elements lexically
- Special type-first ordering for business transactions, sources, destinations
- Concatenate without separators (unless debugging with `join_by`)
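The list rules can be illustrated for a typed list such as a source list. The exact fragment syntax below (`type=…` followed by `child=…`) is an assumption for illustration, not the precise pre-hash format, and `render_typed_list` is a hypothetical name. What the sketch does show is the sequence: render each entry type-first, sort the rendered children lexically, then concatenate, optionally with a debugging separator.

```python
def render_typed_list(list_name, child_name, entries, join_by=""):
    """Render (type, value) entries type-first, sort them, then concatenate."""
    children = []
    for entry_type, value in entries:
        # type-first: the entry's type attribute precedes its value
        children.append(join_by.join([f"type={entry_type.strip()}", f"{child_name}={value.strip()}"]))
    # sort the rendered child elements lexically, then join without separators by default
    return join_by.join([list_name] + sorted(children))
```

For example, `render_typed_list("sourceList", "source", [("b-type", "v2"), ("a-type", "v1")])` returns `"sourceListtype=a-typesource=v1type=b-typesource=v2"`.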

## Implementation Files

- **`hash_generator.py`**: Core algorithm implementation

  - `derive_prehashes_from_events()` - Main pre-hash generation
  - `calculate_hashes_from_pre_hashes()` - Apply hash algorithms
  - `_fix_time_stamp_format()` - Timestamp normalization
  - `_pre_hash_from_epcis_event()` - Single event processing

- **`dl_normaliser.py`**: URI/Digital Link normalization
- **`__init__.py`**: Property order configuration (`PROP_ORDER`)

## Output Format

Final hash embedded in 'ni' URI scheme (RFC 6920):

```
ni:///{digest algorithm};{digest value}?ver={CBV version}
```

Example: `ni:///sha-256;B64HASH?ver=CBV2.0`
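A sketch of wrapping a pre-hash in an `ni` URI, with hypothetical names throughout. RFC 6920 itself specifies unpadded base64url for the digest value, while some EPCIS tooling writes the digest in hex; the encoding is therefore a parameter here, and the expected `.hashes` files in `tests/examples/` are the place to confirm which one this project emits.

```python
import base64
import hashlib

# Assumed mapping from 'ni' algorithm names to hashlib constructor names
HASHLIB_NAMES = {"sha-256": "sha256", "sha-384": "sha384", "sha-512": "sha512"}


def ni_uri(pre_hash, algorithm="sha-256", cbv_version="CBV2.0", encoding="hex"):
    """Hash the UTF-8 pre-hash string and embed the digest in an ni URI."""
    raw = hashlib.new(HASHLIB_NAMES[algorithm], pre_hash.encode("utf-8")).digest()
    if encoding == "hex":
        value = raw.hex()
    else:
        # RFC 6920 style: base64url without '=' padding
        value = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return f"ni:///{algorithm};{value}?ver={cbv_version}"
```

For example, `ni_uri("abc")` returns `ni:///sha-256;ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad?ver=CBV2.0`.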

## Testing and Validation

Reference examples in `tests/examples/` with expected `.hashes` and `.prehashes` files for validation.
101 changes: 101 additions & 0 deletions .cursor/notes/architecture.md
@@ -0,0 +1,101 @@
# Code Architecture

## Data Flow

1. **File Input** → `events_from_file_reader.py`
2. **Format Detection** → Dispatch to XML or JSON parser
3. **Parsing** → `xml_to_py.py` or `json_to_py.py`
4. **Event Extraction** → Normalized Python objects
5. **Hash Generation** → `hash_generator.py`
6. **Output** → Hashes and optional pre-hashes
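Step 2 could be done by sniffing the first non-whitespace character of the file. This is an assumption for illustration: the notes do not say whether `event_list_from_file()` detects the format from the file extension or from the content, and `detect_format` is a hypothetical helper.

```python
from pathlib import Path


def detect_format(path: str) -> str:
    """Guess whether an EPCIS document is XML or JSON(-LD) from its content."""
    # utf-8-sig drops a leading byte-order mark if one is present
    text = Path(path).read_text(encoding="utf-8-sig").lstrip()
    if text.startswith("<"):
        return "xml"
    if text.startswith("{") or text.startswith("["):
        return "json"
    raise ValueError(f"cannot detect EPCIS document format of {path}")
```

Content sniffing keeps working when files have unusual extensions (for example `.jsonld` versus `.json`), at the cost of reading the file before dispatching.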

## Core Components

### Entry Points

- **`__main__.py`**: CLI interface with argparse
  - `main()` - Primary entry point
  - `command_line_parsing()` - Argument handling
  - `epcis_hash_from_file()` - High-level API

### File Parsing Layer

- **`events_from_file_reader.py`**: Unified file reading interface

  - `event_list_from_file()` - Main function, auto-detects format
  - `_event_list_from_epcis_document_xml/json()` - Format-specific readers

- **`xml_to_py.py`**: XML EPCIS parsing

  - Handles XML namespace resolution
  - Converts XML structure to Python objects

- **`json_to_py.py`**: JSON-LD EPCIS parsing
  - Uses PyLD for JSON-LD expansion
  - Handles context resolution via `file_document_loader.py`

### Processing Layer

- **`hash_generator.py`**: Core hashing logic

  - Property order traversal
  - Value normalization
  - Pre-hash string generation
  - Hash calculation

- **`dl_normaliser.py`**: URI normalization
  - GS1 Digital Link processing
  - CURIE expansion
  - CBV vocabulary mapping

### Support Modules

- **`context.py`**: Import path resolution for module vs script execution
- **`file_document_loader.py`**: Custom JSON-LD document loader for offline context files
- **`json_xml_model_mismatch_correction.py`**: Format compatibility fixes

## Key Design Patterns

### Modular Parsing

- Format-agnostic interface in `events_from_file_reader`
- Separate parsers for XML and JSON-LD maintain format-specific logic
- Common output format (Python objects) for downstream processing

### Configuration-Driven Processing

- `PROP_ORDER` in `__init__.py` defines canonical ordering
- Declarative structure allows easy maintenance of algorithm specification

### Error Handling

- Graceful handling of malformed timestamps
- Logging throughout for debugging
- File I/O error management

### Testing Strategy

- Extensive test data in `tests/examples/`
- Expected output files (`.hashes`, `.prehashes`) for validation
- Format equivalence testing in `tests/expected_equal/`

## Dependencies

### External Libraries

- **PyLD**: JSON-LD processing and expansion
- **python-dateutil**: Robust timestamp parsing
- **Flask**: Web interface capabilities (if needed)

### Internal Dependencies

- Tight coupling between `hash_generator.py` and property order configuration
- `dl_normaliser` used by hash generator for URI processing
- Context files in `gs1_web_voc_context_files/` for JSON-LD processing

## Extension Points

- **Custom hash algorithms**: Configurable in CLI and hash generator
- **Property order modifications**: Update `PROP_ORDER` in `__init__.py`
- **Additional normalizers**: Extend `dl_normaliser.py` or add new modules
- **Format support**: Add new parsers following existing patterns
150 changes: 150 additions & 0 deletions .cursor/notes/development.md
@@ -0,0 +1,150 @@
# Development Guide

## Development Setup

### Requirements

- Python 3.6+ (specified in `setup.py`)
- Dependencies from `requirements.txt`:
  - `python-dateutil>=2.8`
  - `Flask>=1.1`
  - `PyLD==2.0.3`
  - `requests>=2.25`

### Installation

```bash
# Development installation
pip install -e .

# Install dependencies
pip install -r requirements.txt
```

## Testing Strategy

### Test Structure (`tests/` directory)

- **Unit Tests**: `test_*.py` files for specific functionality
- **Integration Tests**: Full workflow testing with sample data
- **Reference Data**: `examples/` and `expected_equal/` directories

### Key Test Files

- **`test_explicit_hash_values.py`**: Validates against known hash values
- **`test_xml_to_py.py`**: XML parsing functionality
- **`test_bare_string_normalisation.py`**: String processing rules
- **`test_required_properties.py`**: Required property validation
- **`test_all_values_present.py`**: Comprehensive data coverage

### Test Data Organization

- **`examples/`**: Sample EPCIS documents with expected `.hashes` and `.prehashes` outputs
- **`expected_equal/`**: Different representations of same events (should yield identical hashes)

### Running Tests

```bash
# Run all tests
python -m pytest tests/

# Run specific test
python -m pytest tests/test_explicit_hash_values.py

# Run with verbose output
python -m pytest tests/ -v
```

## Code Style and Conventions

### Documentation Standards

- **Module docstrings**: Follow format in existing files
- **Function docstrings**: Describe parameters and return values
- **Copyright headers**: Include in all new files

### Import Patterns

```python
# Conditional imports for module vs script execution
try:
    from .context import epcis_event_hash_generator
except ImportError:
    from context import epcis_event_hash_generator  # noqa: F401
```

### Error Handling

- Use `logging` module for debug/warning messages
- Graceful handling of malformed data with appropriate warnings
- Preserve original data when normalization fails
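These three rules combine naturally in a small wrapper: try the normalizer, log a warning if it fails, and fall back to the unmodified value. `normalize_or_keep` is a hypothetical helper, shown only to illustrate the convention.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_or_keep(value, normalizer):
    """Apply `normalizer`; on failure, log a warning and return `value` unchanged."""
    try:
        return normalizer(value)
    except (ValueError, TypeError) as exc:
        logger.warning("could not normalize %r (%s); keeping original", value, exc)
        return value
```

Catching only `ValueError` and `TypeError` keeps genuine programming errors (such as a `NameError` inside the normalizer) visible instead of silently swallowing them.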

## Current Development Focus

### MDAF Support (Current Branch)

Work on the `MDAF-Support` branch adds MDAF support; related commits add MDAF example messages and a `masterDataAvailableForDeforestation.jsonld` context file.

### Key Development Areas

1. **Algorithm Compliance**: Ensure strict adherence to CBV 2.0 specification
2. **Format Support**: Maintain parity between XML and JSON-LD processing
3. **Performance**: Optimize for large EPCIS documents
4. **Testing**: Comprehensive coverage of edge cases

## Contribution Guidelines

### Code Changes

1. **Algorithm modifications**: Update `PROP_ORDER` in `__init__.py` if needed
2. **New normalizers**: Add to or extend `dl_normaliser.py`
3. **Parser enhancements**: Modify format-specific parsers (`xml_to_py.py`, `json_to_py.py`)

### Testing Requirements

- Add test cases for new functionality
- Update reference data if algorithm changes
- Validate against existing test suite

### Documentation Updates

- Update `README.md` for user-facing changes
- Add to `Changelog.md` following conventions in `.cursor/rules/changelog-conventions.mdc`
- Update algorithm documentation for specification changes

## Release Process

### Package Management

- **`setup.py`**: Package configuration and metadata
- **`pypi_release.sh`**: Automated release script
- **Version numbering**: Semantic versioning in `setup.py`

### Release Checklist

1. Update version in `setup.py`
2. Update `Changelog.md` with release notes
3. Run full test suite
4. Create release via `pypi_release.sh`
5. Tag release in Git

## Debugging and Troubleshooting

### Common Issues

- **Timestamp parsing**: Check `_fix_time_stamp_format()` in `hash_generator.py`
- **URI normalization**: Debug `dl_normaliser.py` functions
- **Property ordering**: Verify `PROP_ORDER` configuration
- **Context loading**: Check JSON-LD context files in `gs1_web_voc_context_files/`

### Debug Tools

- Use `-j "\n"` CLI flag to inspect pre-hash string structure
- Enable debug logging with `-l DEBUG`
- Compare `.prehashes` output with expected values

### Performance Profiling

- Monitor memory usage with large EPCIS documents
- Profile JSON-LD expansion performance
- Optimize property traversal for complex events
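A standard-library sketch covering both memory and time profiling. `profile_call` is a hypothetical helper, not part of the project: it runs any callable under `cProfile` and `tracemalloc`, then returns the callable's result, a report sorted by cumulative time, and the peak traced memory in bytes.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs); return (result, cumulative-time report, peak bytes)."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    try:
        result = profiler.runcall(func, *args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    stream = io.StringIO()
    # Print the `top` most expensive entries by cumulative time into the buffer
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue(), peak
```

Pass it whichever function processes a large EPCIS document; the report then shows whether time is dominated by JSON-LD expansion or by property traversal. Note that `tracemalloc` adds noticeable overhead, so timings taken together with memory tracing are best compared relative to each other.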