In StarCoder, the XML filter removes files that contain `<?xml version=` within their first 100 characters.

<img width="70%" alt="image" src="https://github.com/bigcode-project/the-stack-v2/assets/42370681/c0ac6592-4923-4ebf-97aa-258d2087e199">

However, this step is not included in the language-specific filters of the-stack-v2.

<img width="70%" alt="image" src="https://github.com/bigcode-project/the-stack-v2/assets/42370681/6a1f7a4a-f290-4a0f-a166-e270927a5b30">

**So, why was this filter removed?**
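For reference, the StarCoder XML filter as described here can be sketched as below. This is a minimal sketch, assuming the check is a plain substring search for `<?xml version=` over the first 100 characters of each file. The names `is_xml_like` and `xml_filter` are hypothetical and are not taken from the StarCoder or the-stack-v2 code.

```python
def is_xml_like(content: str, prefix_len: int = 100) -> bool:
    """Hypothetical helper: True if the XML declaration marker occurs
    entirely within the first `prefix_len` characters of the file."""
    return "<?xml version=" in content[:prefix_len]


def xml_filter(files: list[str]) -> list[str]:
    """Hypothetical helper: drop files that look like XML documents."""
    return [f for f in files if not is_xml_like(f)]


samples = [
    '<?xml version="1.0" encoding="UTF-8"?>\n<project/>',
    'print("hello")',
]
print(xml_filter(samples))  # ['print("hello")']
```

Because only a short prefix is inspected, a file whose XML declaration starts after the first 100 characters (or that has no declaration at all) passes this check.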