Compress a string or dataset using a generated dictionary

dcmox/tiny-string

tiny-string

Compress a string down to a smaller length using this library. Supplying training data that resembles the data you want to compress improves compression performance.

Dictionary slot length

When generating a dictionary, you can specify the slot length as the second parameter, e.g.:

let dict = generateDictionary(trainingData, 5) // slot length of 5

Note: The larger the slot length, the more computationally expensive dictionary generation becomes. If you choose a slot length larger than 6, it is recommended that you cache the generated dictionary for reuse.
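To make the cost and caching advice concrete, here is a minimal sketch of one plausible way to build such a dictionary. It is not tiny-string's implementation: `buildDictionarySketch`, `cachedDictionary`, and the frequency × (length - 1) scoring are hypothetical. It counts every substring of length 2 up to the slot length, so each extra unit of slot length adds another candidate per position in the training data, and it memoizes results in a `Map` keyed on all inputs.

```typescript
// Hypothetical sketch, NOT tiny-string's generateDictionary: frequency-based
// candidate selection, shown only to illustrate why slot length drives cost.
function buildDictionarySketch(training: string, slotLength: number, size = 896): string[] {
  const counts = new Map<string, number>()
  // Every start position yields up to (slotLength - 1) candidates of length
  // 2..slotLength, so a larger slot length means more substrings to count and store.
  for (let start = 0; start < training.length; start++) {
    for (let len = 2; len <= slotLength && start + len <= training.length; len++) {
      const sub = training.slice(start, start + len)
      counts.set(sub, (counts.get(sub) ?? 0) + 1)
    }
  }
  // Score = occurrences * (length - 1): roughly the characters saved if every
  // occurrence were replaced by a single code. Ties are broken by plain string order.
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1) // this sketch skips substrings seen only once
    .map(([sub, n]) => ({ sub, score: n * (sub.length - 1) }))
    .sort((a, b) => b.score - a.score || (a.sub < b.sub ? -1 : a.sub > b.sub ? 1 : 0))
    .slice(0, size)
    .map((e) => e.sub)
}

// Process-wide cache keyed on every input, so each configuration is built once.
const dictionaryCache = new Map<string, string[]>()

function cachedDictionary(training: string, slotLength: number, size = 896): string[] {
  const key = `${slotLength}:${size}:${training}`
  let dict = dictionaryCache.get(key)
  if (dict === undefined) {
    dict = buildDictionarySketch(training, slotLength, size)
    dictionaryCache.set(key, dict)
  }
  return dict
}
```

To reuse a dictionary across runs rather than within one process, the `string[]` can be written out with `JSON.stringify` and read back with `JSON.parse`.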

Dictionary size

You can adjust the dictionary size using the third parameter. The default has been increased from 128 to 896 entries. Codes 0-127 are reserved for the standard ASCII character set, so dictionary entries are encoded from code 128 upward. String length can be reduced by 40-65% with compression. The reduction in true size (the total byte size of the string) will be much smaller when the string is stored as UTF-8, because every character above code point 127 takes at least two bytes (a point that even the well-known Smaz algorithm fails to note).

Example:

let dict = generateDictionary(trainingData, 6, 2048) // slot length of 6 with 2048 entries, using codes 128-2175 (128 + 2048 - 1)
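The arithmetic behind the encoding range and the byte-size caveat can be checked directly. This sketch assumes, as described above, that dictionary entry `i` is written as code `128 + i`; `codeRange` and `utf8Length` are hypothetical helpers, and the UTF-8 widths follow the standard encoding rules (1 byte up to 127, 2 up to 2047, 3 up to 65535).

```typescript
// Sketch under an assumed layout: codes 0-127 are literal ASCII and
// dictionary entry i is written as code 128 + i (helper names are hypothetical).
const ASCII_RESERVED = 128

// Inclusive range of codes used by a dictionary with `size` entries.
function codeRange(size: number): [number, number] {
  return [ASCII_RESERVED, ASCII_RESERVED + size - 1]
}

// Bytes needed to store a string as UTF-8 (standard UTF-8 width rules).
function utf8Length(s: string): number {
  let bytes = 0
  for (const ch of s) {
    const cp = ch.codePointAt(0) as number
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4
  }
  return bytes
}

for (const size of [896, 2048]) {
  const [lo, hi] = codeRange(size)
  const widest = utf8Length(String.fromCharCode(hi))
  console.log(`${size} entries: codes ${lo}-${hi}, highest code needs ${widest} UTF-8 bytes`)
}

// Three characters, but more bytes once one of them is a dictionary code.
const mixed = 'a' + String.fromCharCode(200) + 'b'
console.log(mixed.length, utf8Length(mixed))
```

With the default 896 entries the highest code is 1023, which fits in two UTF-8 bytes; once the size exceeds 1920 (so codes pass 2047), the upper entries need three bytes each.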

Sample usage

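import { generateDictionary, tinyStringCompress, tinyStringDecompress } from 'tiny-string' // module name assumed; adjust to match your install

const jsonTrainingData: string = `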
{"id":1,"name":"Ryan Peterson","country":"Northern Mariana Islands","email":"[email protected]"},
{"id":2,"name":"Judith Mason","country":"Puerto Rico","email":"[email protected]"},
{"id":3,"name":"Kenneth Berry","country":"Pakistan","email":"[email protected]"},
{"id":4,"name":"Judith Ortiz","country":"Cuba","email":"[email protected]"},
{"id":5,"name":"Adam Lewis","country":"Poland","email":"[email protected]"},
{"id":6,"name":"Angela Spencer","country":"Poland","email":"[email protected]"},
{"id":7,"name":"Jason Snyder","country":"Cambodia","email":"[email protected]"},
{"id":8,"name":"Pamela Palmer","country":"Guinea-Bissau","email":"[email protected]"},
{"id":9,"name":"Mary Graham","country":"Niger","email":"[email protected]"},
{"id":10,"name":"Christopher Brooks","country":"Trinidad and Tobago","email":"[email protected]"},
{"id":11,"name":"Anna West","country":"Nepal","email":"[email protected]"},
{"id":12,"name":"Angela Watkins","country":"Iceland","email":"[email protected]"},
{"id":13,"name":"Gregory Coleman","country":"Oman","email":"[email protected]"},
{"id":14,"name":"Andrew Hamilton","country":"Ukraine","email":"[email protected]"},
{"id":15,"name":"James Patterson","country":"Poland","email":"[email protected]"},
{"id":16,"name":"Patricia Kelley","country":"Papua New Guinea","email":"[email protected]"},
{"id":17,"name":"Annie Burton","country":"Germany","email":"[email protected]"},
{"id":18,"name":"Margaret Wilson","country":"Saudi Arabia","email":"[email protected]"},
{"id":19,"name":"Louise Harper","country":"Poland","email":"[email protected]"},
{"id":20,"name":"Henry Hunt","country":"Martinique","email":"[email protected]"}
`

const jsonSample: string = `{"id":33,"name":"John Doe","country":"United States","email":"[email protected]"}`

const jsonTrainedDictionary: string[] = generateDictionary(jsonTrainingData)
console.time('JSON trainedDict')
const jsonTrainedDictCompression: string = tinyStringCompress(jsonSample, jsonTrainedDictionary)
console.timeEnd('JSON trainedDict')

console.time('JSON trainedDict decompress')
const jsonTrainedDictDecompression: string = tinyStringDecompress(jsonTrainedDictCompression, jsonTrainedDictionary)
console.timeEnd('JSON trainedDict decompress')

console.log('Original:', jsonSample + '\n')
console.log('Decompressed:', jsonTrainedDictDecompression + '\n')
console.log('Compressed:', jsonTrainedDictCompression,
    ((1 - jsonTrainedDictCompression.length / jsonSample.length) * 100).toFixed(2) + '% compressed\n')
console.log('Original length:', jsonSample.length + '\n')
console.log('Compressed length:', jsonTrainedDictCompression.length + '\n')
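For intuition about what a compress/decompress pair does with such a dictionary, here is a minimal greedy substitution sketch. It is an assumption, not tiny-string's algorithm: `compressSketch`, `decompressSketch`, and the longest-match-first strategy are hypothetical, and it only accepts ASCII input so that literal characters (codes below 128) never collide with dictionary codes (128 + index).

```typescript
// Hypothetical sketch of dictionary substitution; NOT tiny-string's implementation.
// Literal characters must be ASCII (< 128); dictionary entry i is written as code 128 + i.
const DICT_BASE = 128

function compressSketch(input: string, dict: string[]): string {
  // Try longer entries first, so the longest available match wins at each position.
  const ordered = dict
    .map((entry, index) => ({ entry, index }))
    .filter(({ entry }) => entry.length > 0)
    .sort((a, b) => b.entry.length - a.entry.length)

  let out = ''
  let i = 0
  while (i < input.length) {
    const hit = ordered.find(({ entry }) => input.startsWith(entry, i))
    if (hit !== undefined) {
      out += String.fromCharCode(DICT_BASE + hit.index) // one code replaces the whole entry
      i += hit.entry.length
    } else {
      if (input.charCodeAt(i) >= DICT_BASE) {
        throw new Error(`non-ASCII character at index ${i}`)
      }
      out += input[i] // no entry matches here: copy the literal character
      i += 1
    }
  }
  return out
}

function decompressSketch(compressed: string, dict: string[]): string {
  let out = ''
  for (let i = 0; i < compressed.length; i++) {
    const code = compressed.charCodeAt(i)
    if (code < DICT_BASE) {
      out += compressed[i] // literal ASCII character
      continue
    }
    const entry = dict[code - DICT_BASE]
    if (entry === undefined) {
      throw new Error(`code ${code} is outside the dictionary`)
    }
    out += entry
  }
  return out
}

// Usage with a tiny hand-written dictionary (illustrative only).
const demoDict = ['"id":', '"name":"', '"country":"', '"email":"']
const demoInput = '{"id":33,"name":"John Doe","country":"United States"}'
const demoPacked = compressSketch(demoInput, demoDict)
console.log(demoInput.length, '->', demoPacked.length)
console.log(decompressSketch(demoPacked, demoDict) === demoInput)
```

Greedy longest-match is simple but not always optimal: taking a shorter match at one position can sometimes allow a better split later in the string.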
