Support for Amazon Transcribe format for multiple channels (not the same as speakers)

Found your wonderful software, but had minor issue when loading an Amazon Transcribe transcript that had the variant format for independent audio channels as oppose to the typical speakers format.

Impressively, your software still loaded the rows of the transcript correctly, however, it made every speaker label have a unique number suffix, so it was impossible to relabel the speaker labels all at once and almost insurmountable task to track and correct by hand a very long transcript.

It's used when each speakers are each on a dedicated channel/track in the source audio file:
[https://docs.aws.amazon.com/transcribe/latest/dg/how-channel-id.html](https://docs.aws.amazon.com/transcribe/latest/dg/how-channel-id.html)
Excerpt from referred AWS doc showing the JSON format:
```json
{
  "jobName": "job id",
  "accountId": "account id",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try ... It seems to ..."
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
         ]
      },
      {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
        ]
    }
}
```

It has "channel_labels" (object) -> "channels" (array/list) ->"channel" (object) with each channel containing it's own "items" for words oppose to "items" being declared once in the other format and uses "channel_label" instead of "speaker_label" for speakers. 

Could you please accommodate the Amazon Transcribe channel format variant and at least have speaker ID labels be consistent per channel if not matching the "channel_label?"

Just for reference here's the doc for speaker identification format:
[https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html](https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support for Amazon Transcribe format for multiple channels (not the same as speakers) #235

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support for Amazon Transcribe format for multiple channels (not the same as speakers) #235

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions