CuBERT: BERT Pretrained on Programming Languages

CuBERT is a collection of BERT-based models trained on source code, specifically five popular programming languages: Python, Java, JavaScript, PHP, and Ruby. Developed by Google Research, CuBERT leverages the BERT architecture to understand and process source code for a variety of downstream software engineering tasks. It is designed to support classification and sequence prediction tasks on code, enabling powerful static analysis and code intelligence capabilities.

Key Features

  • BERT-based models pretrained on source code in five popular programming languages: Python, Java, JavaScript, PHP, and Ruby.
  • Developed by Google Research.
  • Supports classification and sequence prediction tasks on code.
  • Enables static analysis and code intelligence capabilities for downstream software engineering tasks.
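
To illustrate the BERT-style masked-token objective that CuBERT's pretraining builds on, here is a minimal sketch of how a masked-language-modeling training pair can be formed from a stream of code tokens. The `make_mlm_example` helper and the hand-written token list are illustrative assumptions for this sketch; CuBERT's actual pipeline uses its own program-derived tokenizer and vocabulary.

```python
def make_mlm_example(tokens, mask_index, mask_token="[MASK]"):
    """Build a masked-language-modeling pair: the input sequence with one
    position masked, and the original token at that position as the label."""
    inputs = list(tokens)
    target = inputs[mask_index]
    inputs[mask_index] = mask_token
    return inputs, target

# Illustrative token stream for a small Python snippet.
code_tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

masked, label = make_mlm_example(code_tokens, mask_index=8)
print(masked)  # the "return" keyword is replaced by [MASK]
print(label)   # → return
```

During pretraining, the model sees the masked sequence and learns to predict the original token, which is how a BERT-style model acquires a statistical understanding of source code.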