Releases: Qihoo360/tensornet
v0.2.1
Full Changelog: v0.2.0...v0.2.1
Changes:
- Addition of CVM Plugin:
- Based on the CVM plugin in PCTR-DNN, we integrated and tested its functionality in our TN environment. The plugin extends embeddings with per-feature exposure and click data during training. In the forward pass, while feature embeddings are pulled from the sparse table, each feature's exposure and click values are also fetched and appended as two additional output columns; in the backward pass, the batch's show/click counts are pushed back to the sparse table along with the gradients (see the sketch after this list).
- Normalization Enhancements with PCTRDNN Statistics Logic:
- We introduced new statistics logic in the normalization process, following PCTR-DNN: incremental counts and sums are accumulated, and the incremental variance is computed from the squared deviations, i.e. (data - mean).square().
- Removal of Environment Variable Control for Sparse Initialization:
- To reduce performance overhead, we have eliminated the use of environment variables for controlling the initialization of sparse data structures.
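For intuition, here is a minimal sketch of how exposure/click columns can be appended to the embedding output in the forward pass. The function name and the log/CTR transform are illustrative assumptions, not the CVM plugin's actual implementation.

```python
import tensorflow as tf

def append_cvm_columns(embedding, show, click):
    """Illustrative only: append per-feature exposure/click statistics as two extra
    columns on the embedding output. The log/ratio transform is an assumption,
    not the exact formula used by the CVM plugin."""
    show = tf.maximum(show, 1.0)          # guard against log(0) and division by zero
    log_show = tf.math.log(show)          # smoothed exposure column (assumed)
    ctr = click / show                    # click-through-rate column (assumed)
    return tf.concat([embedding, log_show, ctr], axis=1)

# usage: embedding [batch, dim], show/click [batch, 1] fetched from the sparse table
emb = tf.random.normal([4, 8])
show = tf.constant([[10.0], [3.0], [1.0], [50.0]])
click = tf.constant([[2.0], [0.0], [1.0], [5.0]])
out = append_cvm_columns(emb, show, click)   # shape [4, 10]
```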
v0.2.0
Full Changelog: v0.1.3...v0.2.0
Add Global Normalization
Code: normalization_layer.py
Changes:
- Use Cumulative Sum and Sum of Squares Across Samples:
- Instead of relying solely on the mean and variance within a single batch, normalization now uses the cumulative sum and sum of squares accumulated across all samples.
- Synchronize Across Nodes After Accumulating Partial Batches:
- After a portion of batches has been accumulated, the statistics are synchronized across all nodes to keep them consistent.
- Update BN Table Statistics Per Batch:
- For each batch, the statistics in the Batch Normalization (BN) table are updated; the overall mean and variance are then computed and output (a sketch of this bookkeeping follows the list).
- Store Mean and Variance in Checkpoint:
- The mean and variance are stored in the checkpoint and serve as parameters for prediction.
- The statistics are also stored in HDFS for future reference and use.
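A rough sketch of the bookkeeping described above: keep a cumulative count, sum, and sum of squares, merge them across nodes, and reduce them to an overall mean/variance. Class and method names are illustrative, not the actual normalization_layer.py API.

```python
import numpy as np

class RunningMoments:
    """Sketch of global-normalization statistics (illustrative names only)."""

    def __init__(self, dim):
        self.count = 0.0
        self.sum = np.zeros(dim)
        self.sum_sq = np.zeros(dim)

    def update(self, batch):
        # batch: [batch_size, dim]; accumulate count, sum, and sum of squares
        self.count += batch.shape[0]
        self.sum += batch.sum(axis=0)
        self.sum_sq += (batch ** 2).sum(axis=0)

    def merge(self, other):
        # cross-node synchronization reduces to summing the three accumulators
        self.count += other.count
        self.sum += other.sum
        self.sum_sq += other.sum_sq

    def moments(self):
        mean = self.sum / self.count
        var = self.sum_sq / self.count - mean ** 2   # E[x^2] - E[x]^2
        return mean, np.maximum(var, 0.0)

# usage sketch
stats = RunningMoments(dim=8)
stats.update(np.random.randn(32, 8))
mean, var = stats.moments()
```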
Ref
Local Testing
0.1.3.post2-tool
Full Changelog: v0.1.3.post1...0.1.3.post2-tool
Added the qihoo-tensornet-tool pip package, which mainly includes:
- Merge sparse: sparse files scattered across different directory levels are merged into one or more files under a single HDFS directory.
- Change sparse/dense parallelism: previously the parallelism could not be changed once fixed, so the cluster could not be scaled up or down (see the re-sharding sketch after this list).
- Merge external embeddings: by passing in a mapping between existing signs and new sign values, new embeddings can be added.
These features are currently invoked manually via scripts; embedding them into routine training is not yet supported.
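To illustrate the parallelism change conceptually: if sparse parameters are sharded by `sign % parallelism` (an assumption made for this sketch), re-partitioning to a new parallelism amounts to re-bucketing every entry. The function below is illustrative only, not the tool's code.

```python
from collections import defaultdict

def reshard(entries, new_parallelism):
    """Re-bucket (sign, value) pairs into new_parallelism shards.
    Assumes shard id = sign % parallelism, an illustrative rule only."""
    shards = defaultdict(list)
    for sign, value in entries:
        shards[sign % new_parallelism].append((sign, value))
    return shards

# example: redistribute entries from any old layout into 4 shards
shards = reshard([(12345, b"emb-a"), (67890, b"emb-b")], new_parallelism=4)
```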
v0.1.3.post1
Full Changelog: v0.1.3...v0.1.3.post1
v0.1.3
What's Changed
- Format tensornet build env
- Format release pipeline
- Add deleteByShow for longtail embeddings
- Compat input format and file pattern
- Add sequence_embedding_features.py
- Provide tool to merge sparse table
Full Changelog: 0.1.1...v0.1.3
tensornet-0.1.1
enhance:
- optimize parameter push and pull performance
- compatible with tf-2.3, tf-2.4
- support saving the sparse table with a name
- add a feature-drop show threshold and update show decay with a moving average (see the sketch below)
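A rough sketch of the show-decay and drop-threshold idea, assuming an exponential moving average and a simple cutoff; the constants and function names are illustrative, not tensornet's actual values.

```python
DECAY = 0.98           # illustrative decay factor
DROP_THRESHOLD = 0.1   # illustrative minimum decayed show count

def update_show(decayed_show, batch_show):
    """Exponential moving average of a feature's show counter (assumed formulation)."""
    return decayed_show * DECAY + batch_show

def should_drop(decayed_show):
    """Long-tail features whose decayed show count falls below the threshold
    become candidates for deletion from the sparse table."""
    return decayed_show < DROP_THRESHOLD
```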
add:
- add ftrl optimizer
- add deepfm demo.
delete:
- delete the `version` field in the sparse table
bug fix:
- fix model reload warning bug
tensornet-0.1.0
TensorNet V-0.1.0
First version of TensorNet.
This version publishes tensornet with asynchronous training support, which has been thoroughly tested.
The main APIs are:
- `tn.distribute.PsStrategy`: has the same interface as TensorFlow's distribution strategies and is used for cluster management.
- `tn.feature_column.category_column`: one of the most important tensornet APIs; it defines sparse feature columns whose dimension can approach 2**64 and has the same interface as `tf.feature_column`.
- `tn.layers.EmbeddingFeatures`: the second important API; it pulls sparse embedding vectors from the parameter server and pushes updates back.
- `tn.optimizer.Optimizer`: wraps a TensorFlow optimizer; it is mainly used in async train mode, where it intercepts TensorFlow's gradient-update logic and applies the gradients on the parameter server asynchronously.
- `tn.model.Model`: inherits from `tf.keras.Model` and overrides its save method to support saving and loading sparse features on the parameter server.
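A rough composition sketch of how the APIs above fit together. Argument names, keyword arguments, and shapes are assumptions, not verified tensornet signatures; see the demos in the repository (e.g. the deepfm demo) for real usage.

```python
import tensorflow as tf
import tensornet as tn

strategy = tn.distribute.PsStrategy()        # cluster management, tf-strategy-like interface

# sparse feature columns, each supporting up to ~2**64 distinct ids
# (the 'key' keyword is assumed for this sketch)
feature_names = ["user_id", "item_id"]
columns = [tn.feature_column.category_column(key=name) for name in feature_names]

inputs = {name: tf.keras.layers.Input(name=name, shape=(None,), dtype="int64", sparse=True)
          for name in feature_names}

# pulls embedding vectors from the parameter server in the forward pass and pushes
# gradients back (constructor arguments and output layout are assumed)
emb = tn.layers.EmbeddingFeatures(columns, dim=8)(inputs)
logit = tf.keras.layers.Dense(1, activation="sigmoid")(tf.keras.layers.Flatten()(emb))

# tn.model.Model handles saving/loading sparse parameters on the PS;
# tn.optimizer.Optimizer wraps a TF optimizer for asynchronous PS updates
model = tn.model.Model(inputs=inputs, outputs=logit)
model.compile(optimizer=tn.optimizer.Optimizer(tf.keras.optimizers.Adam()),
              loss="binary_crossentropy")
```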