CodeSearchNet dataset

Step 1: Download the raw CSN dataset (to ~/codesearchnet/raw)

bash ncc_dataset/codesearchnet/download.sh

Step 2: Flatten the attributes of the code snippets into separate files.

For instance, Ruby's code_tokens attribute is flattened into ~/codesearchnet/attributes/[train/valid/test].code_tokens.

python -m ncc_dataset.codesearchnet.attributes_cast
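
To make the idea of flattening concrete, here is a minimal sketch. It is not the actual attributes_cast implementation. It assumes the raw data is JSON Lines, with one record per snippet and fields such as code_tokens. The function names flatten_attributes and write_flattened are hypothetical, and the output naming simply mirrors the [train/valid/test].code_tokens pattern above.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def flatten_attributes(jsonl_lines: Iterable[str], attrs: List[str]) -> Dict[str, List[str]]:
    """Split JSONL records into one list of JSON-encoded values per attribute.

    Hypothetical helper: the record layout (one JSON object per line) is an
    assumption about the raw dataset, not something the module guarantees.
    """
    flat: Dict[str, List[str]] = {attr: [] for attr in attrs}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        for attr in attrs:
            # A missing attribute is stored as null, so line i of every
            # output file still refers to the same snippet.
            flat[attr].append(json.dumps(record.get(attr), ensure_ascii=False))
    return flat


def write_flattened(flat: Dict[str, List[str]], out_dir: Path, split: str) -> None:
    """Write each attribute to its own file, e.g. <out_dir>/train.code_tokens."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for attr, values in flat.items():
        text = "".join(value + "\n" for value in values)
        (out_dir / f"{split}.{attr}").write_text(text, encoding="utf-8")
```

Keeping one value per line in every attribute file means the files stay aligned by line number, so a later step can read only the attributes it needs.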

Step 3 (optional): Parse code

To extract additional features from the code, such as the AST or binary AST, run the following command.

python -m ncc_dataset.codesearchnet.feature_extract
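
For readers unfamiliar with these features, the sketch below illustrates what an AST and a binarized AST can look like. It rests on several assumptions: it uses Python's built-in ast module rather than whatever parser feature_extract uses, it only handles Python source, and it binarizes with the left-child right-sibling scheme, which may differ from the module's own notion of binary AST. The names to_tree and binarize are hypothetical.

```python
import ast
from typing import Any, Dict, List, Optional


def to_tree(node: ast.AST) -> Dict[str, Any]:
    """Convert a Python AST node into a nested dict of node type names."""
    return {
        "type": type(node).__name__,
        "children": [to_tree(child) for child in ast.iter_child_nodes(node)],
    }


def binarize(nodes: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Left-child right-sibling binarization (an assumed scheme).

    "left" holds the node's first child and "right" holds its next sibling,
    so every node ends up with at most two outgoing links.
    """
    if not nodes:
        return None
    head, rest = nodes[0], nodes[1:]
    return {
        "type": head["type"],
        "left": binarize(head["children"]),
        "right": binarize(rest),
    }


tree = to_tree(ast.parse("x = 1"))
binary_tree = binarize([tree])
```

The recursion depth grows with the number of siblings as well as the nesting depth, so very long code snippets may need an iterative version.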