Does not run in PyTorch 1.3.1 #14

I just cloned your repo, and when I launch this command:

`CUDA_VISIBLE_DEVICES=2,3,4,5 python imagenet.py -a mobilenetv2 -d /path/to/dataset/ImageNet2012/ --epochs 150 --lr-decay cos --lr 0.05 --wd 4e-5 -c checkpoints --width-mult 1 --input-size 224 -j 12`

It gets stuck at this point:

```
=> creating model 'mobilenetv2'

Epoch: [1 | 150]
Processing

<Ctrl+C pressed after 10 min of nothing happening:>

^CTraceback (most recent call last):
  File "imagenet.py", line 403, in <module>
    main()
  File "imagenet.py", line 224, in main
    train_loss, train_acc = train(train_loader, train_loader_len, model, criterion, optimizer, epoch)
  File "imagenet.py", line 271, in train
    for i, (input, target) in enumerate(train_loader):
  File "/home/michael/mobilenetv2.pytorch/utils/dataloaders.py", line 190, in prefetched_loader
    for next_input, next_target in loader:
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _get_data
    success, data = self._try_get_data()
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/threading.py", line 300, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
```

Nothing happens after this point. `nvidia-smi` shows a single GPU using ~500 MB of memory, and the CPU cores are ~60% busy, but it's not clear what they are doing. I waited 10 minutes before aborting. I also tried it on a single GPU, with the same result.
If I switch to `--data-backend dali-cpu` (using nvidia-dali version 0.16), it fails with the following error:

    => creating model 'mobilenetv2'
    Traceback (most recent call last):
      File "imagenet.py", line 403, in <module>
        main()
      File "imagenet.py", line 194, in main
        train_loader, train_loader_len = get_train_loader(args.data, args.batch_size, workers=args.workers, input_size=args.input_size)
    TypeError: gdtl() got an unexpected keyword argument 'input_size'
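
The `TypeError` suggests that `imagenet.py` passes `input_size` to a DALI loader function (`gdtl`) that does not declare that parameter. The sketch below reproduces that kind of mismatch with a stand-in function and shows one possible way to detect it before the call. The `gdtl` signature here is invented for illustration and is not the repo's actual code, and `call_with_supported_kwargs` is a hypothetical helper. Silently dropping `input_size` would change the crop size, so the helper reports what it dropped instead of hiding it.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, passing only the keyword arguments it declares.

    Returns (result, dropped), where dropped lists the ignored names.
    """
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_any:
        # func takes **kwargs, so nothing needs to be filtered.
        return func(*args, **kwargs), []
    supported = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(supported))
    return func(*args, **supported), dropped


# Hypothetical stand-in for the loader factory; the real signature may differ.
def gdtl(data_path, batch_size, workers=5):
    return ("loader", data_path, batch_size, workers)


result, dropped = call_with_supported_kwargs(
    gdtl, "/path/to/dataset", 256, workers=12, input_size=224
)
print("dropped keyword arguments:", dropped)
```

A proper fix would presumably be to add an `input_size` parameter to the DALI loader itself; the helper only makes the mismatch visible.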

I'm using PyTorch 1.3.1 with 4x Titan Xp cards. The only thing I had to change in your code was to replace `cuda(async=True)` with `cuda(non_blocking=True)`. Changing it to `non_blocking=False` does not help.
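
For context on why that replacement is needed: `async` became a reserved keyword in Python 3.7, so `cuda(async=True)` no longer parses at all, regardless of the PyTorch version. The minimal check below only compiles the two spellings as source text and never executes them, so it does not need PyTorch or a GPU. The `compiles` helper is a hypothetical name used for illustration.

```python
def compiles(source):
    """Return True if `source` parses as valid Python, without running it."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


old_call = "input = input.cuda(async=True)"
new_call = "input = input.cuda(non_blocking=True)"

print("async=True parses:       ", compiles(old_call))
print("non_blocking=True parses:", compiles(new_call))
```

On Python 3.7 or newer the first spelling is rejected and the second is accepted, which matches the change described above.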

Can you please try cloning your repo into a clean PyTorch 1.3.1 environment and see if it runs? Any idea what's going on?
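
One way to narrow this down: the first traceback shows the main process blocked in `self._data_queue.get(...)`, which suggests it is waiting for a batch that never arrives from the data loading side. The sketch below fetches a single batch from any iterable loader with a wall-clock limit, so a stall is reported instead of hanging the script. `first_batch_or_timeout` is a hypothetical helper that is not part of the repo or of PyTorch, and the 60-second default is arbitrary.

```python
import threading
import time


def first_batch_or_timeout(loader, timeout_s=60.0):
    """Try to fetch one batch from `loader` within `timeout_s` seconds.

    Returns (ok, batch, elapsed_seconds). ok is False if the fetch is
    still running when the time limit expires.
    """
    result = {}

    def fetch():
        try:
            result["batch"] = next(iter(loader))
        except Exception as exc:  # surface loader errors to the caller
            result["error"] = exc

    start = time.monotonic()
    # Daemon thread: a stuck fetch will not keep the process alive.
    worker = threading.Thread(target=fetch, daemon=True)
    worker.start()
    worker.join(timeout_s)
    elapsed = time.monotonic() - start

    if worker.is_alive():
        return False, None, elapsed
    if "error" in result:
        raise result["error"]
    return True, result["batch"], elapsed


# Usage with a stand-in loader; substitute the real train_loader.
ok, batch, elapsed = first_batch_or_timeout([("input", "target")], timeout_s=5.0)
print(ok, batch)
```

Running the same check with `num_workers=0` on the underlying `torch.utils.data.DataLoader`, or setting its `timeout` argument, may help show whether the stall is in the worker processes or in the dataset itself.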
