Commit 2ff523d

Update README
1 parent b13d649 commit 2ff523d

File tree

1 file changed: +2 -2 lines changed


README.md

+2 -2
@@ -1,12 +1,12 @@
# Programming Language Translator

- This repository is build on the released Pytorch original implementation of TransCoder in [Unsupervised Translation of Programming Languages](https://arxiv.org/pdf/2006.03511.pdf) by Facebook Research in their [organization] (https://github.com/facebookresearch/TransCoder)
+ This repository is built on the original PyTorch implementation of TransCoder released with [Unsupervised Translation of Programming Languages](https://arxiv.org/pdf/2006.03511.pdf) by Facebook Research in their [organization](https://github.com/facebookresearch/TransCoder).

Facebook AI Research has announced TransCoder, a system that uses unsupervised deep learning to convert code from one programming language to another. TransCoder was trained on more than 2.8 million open source projects and outperforms existing code translation systems that use rule-based methods.

The team described the system in a paper published on arXiv. TransCoder is inspired by other neural machine translation (NMT) systems that use deep learning to translate text from one natural language to another, and it is trained only on monolingual source data. To compare the performance of the model, the Facebook team collected a validation set of 852 functions and associated unit tests in each of the system's target languages: Java, Python, and C++. On this validation set, TransCoder outperformed existing commercial solutions, beating j2py, a Java-to-Python translator, by up to 33 percentage points. Although the team restricted its work to only those three languages, they claim it can "easily be extended to most programming languages."

- TransCoder builds on advances in natural-language processing (NLP), in particular unsupervised NMT. The model uses a Transformer-based sequence-to-sequence architecture which consists of an attention-based encoder and decoder. Since obtaining a dataset for supervised learning would be difficult---it would require many pairs of equivalent code samples in both the source and target languages---the team opted to used monolingual datasets to do unsupervised learning, using three strategies. First, the model is trained on input sequences that have random tokens masked; the model must learn to predict the correct value for the masked tokens. Next, the model is trained on sequences that have been corrupted by randomly masking, shuffling, or removing tokens; the model must learn to output the corrected sequence. Finally, two version of these models are trained in parallel to do back-translation; one model learns to translate from the source to target language, and the other learns to translate back to the source.
+ TransCoder builds on advances in natural-language processing (NLP), in particular unsupervised NMT. The model uses a Transformer-based sequence-to-sequence architecture consisting of an attention-based encoder and decoder. Since obtaining a dataset for supervised learning would be difficult (it would require many pairs of equivalent code samples in both the source and target languages), the team opted to use monolingual datasets and unsupervised learning, using three strategies. First, the model is trained on input sequences with random tokens masked; it must learn to predict the correct value for the masked tokens. Next, the model is trained on sequences that have been corrupted by randomly masking, shuffling, or removing tokens; it must learn to output the corrected sequence. Finally, two versions of these models are trained in parallel to do back-translation: one model learns to translate from the source to the target language, and the other learns to translate back to the source. A minimal sketch of these three training objectives is shown below.

![Model](https://dl.fbaipublicfiles.com/transcoder/TransCoder_Schema.jpg)
Image Source: https://arxiv.org/abs/2006.03511
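
To make the three unsupervised training strategies described above concrete, here is a minimal, illustrative Python sketch of how each objective could pair a corrupted (or translated) input with its reconstruction target at the token level. The function names, the `<mask>` token, and the probabilities are assumptions chosen for illustration only; they are not taken from the TransCoder codebase.

```python
import random

MASK = "<mask>"  # illustrative mask token, not TransCoder's actual vocabulary entry

def mlm_example(tokens, mask_prob=0.15):
    """Masked language modeling: hide random tokens; the model predicts them."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)       # loss is computed only at masked positions
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets

def dae_example(tokens, drop_prob=0.1, mask_prob=0.1, shuffle_window=3):
    """Denoising auto-encoding: drop, mask, and locally shuffle tokens;
    the model must reconstruct the original sequence."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < drop_prob:
            continue                   # token removed
        noisy.append(MASK if r < drop_prob + mask_prob else tok)
    # local shuffle: each surviving token moves at most ~shuffle_window positions
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(noisy))]
    noisy = [tok for _, tok in sorted(zip(keys, noisy), key=lambda p: p[0])]
    return noisy, list(tokens)         # input: noisy sequence, target: original

def back_translation_step(python_fn, translate_py_to_cpp, train_cpp_to_py):
    """Back-translation: one model's (possibly imperfect) translation becomes
    the input of a supervised example for the model going the other way.
    Both callables are hypothetical placeholders for the two directions."""
    cpp_guess = translate_py_to_cpp(python_fn)      # noisy Python -> C++ translation
    train_cpp_to_py(src=cpp_guess, tgt=python_fn)   # the round trip provides the signal

if __name__ == "__main__":
    tokens = "def add ( a , b ) : return a + b".split()
    print(mlm_example(tokens))
    print(dae_example(tokens))
```

In the actual system these objectives drive training of a shared Transformer encoder-decoder; the sketch only shows how each objective turns monolingual code into input/target pairs without any parallel data.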
