CuBERT: BERT Pretrained on Programming Languages

CuBERT is a collection of BERT-based models trained on source code, specifically five popular programming languages: Python, Java, JavaScript, PHP, and Ruby. Developed by Google Research, CuBERT leverages the BERT architecture to understand and process source code for a variety of downstream software engineering tasks. It is designed to support classification and sequence prediction tasks on code, enabling powerful static analysis and code intelligence capabilities.

Key Features

  • BERT-based models pretrained on source code in five popular programming languages: Python, Java, JavaScript, PHP, and Ruby.
  • Developed by Google Research.
  • Supports classification and sequence prediction tasks on code.
  • Enables static analysis and code intelligence capabilities for downstream software engineering tasks.
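
To illustrate the BERT-style masked-token objective that CuBERT's pretraining builds on, here is a minimal sketch of how a masked-language-modeling training pair can be formed from a stream of code tokens. The `make_mlm_example` helper and the hand-written token list are illustrative assumptions for this sketch; CuBERT's actual pipeline uses its own program-derived tokenizer and vocabulary.

```python
def make_mlm_example(tokens, mask_index, mask_token="[MASK]"):
    """Build a masked-language-modeling pair: the input sequence with one
    position masked, and the original token at that position as the label."""
    inputs = list(tokens)
    target = inputs[mask_index]
    inputs[mask_index] = mask_token
    return inputs, target

# Illustrative token stream for a small Python snippet.
code_tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

masked, label = make_mlm_example(code_tokens, mask_index=8)
print(masked)  # the "return" keyword is replaced by [MASK]
print(label)   # → return
```

During pretraining, the model sees the masked sequence and learns to predict the original token, which is how a BERT-style model acquires a statistical understanding of source code.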