I could not find any details in the paper or the code on how such files are handled: https://github.com/bigcode-project/the-stack-v2/blob/b274af7e5c1116dc99c80816944d0ef9a173abfa/the_stack/2_download_files/get_file_contents.py#L124