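DataLoader is the utility torch gives you for wrapping your data. You convert your own data (a numpy array or some other form) into Tensors and then put them into this wrapper. What do you gain from using DataLoader? It iterates over your data efficiently for you. For example: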
import torch
import torch.utils.data as Data

torch.manual_seed(1)    # reproducible

BATCH_SIZE = 5      # number of samples per mini-batch

x = torch.linspace(1, 10, 10)       # x data (torch tensor)
y = torch.linspace(10, 1, 10)       # y data (torch tensor)

# first wrap the tensors in a Dataset that torch understands
torch_dataset = Data.TensorDataset(x, y)

# put the dataset into a DataLoader
loader = Data.DataLoader(
    dataset=torch_dataset,      # torch TensorDataset format
    batch_size=BATCH_SIZE,      # mini batch size
    shuffle=True,               # whether to shuffle the data (shuffling is better)
    num_workers=2,              # load the data with 2 worker subprocesses
)

for epoch in range(3):   # train on the entire dataset 3 times
    for step, (batch_x, batch_y) in enumerate(loader):  # each step the loader releases one mini-batch for learning
        # assume this is where you train...

        # print some of the data
        print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
              batch_x.numpy(), '| batch y: ', batch_y.numpy())

"""
Epoch: 0 | Step: 0 | batch x: [ 6. 7. 2. 3. 1.] | batch y: [ 5. 4. 9. 8. 10.]
Epoch: 0 | Step: 1 | batch x: [ 9. 10. 4. 8. 5.] | batch y: [ 2. 1. 7. 3. 6.]
Epoch: 1 | Step: 0 | batch x: [ 3. 4. 2. 9. 10.] | batch y: [ 8. 7. 9. 2. 1.]
Epoch: 1 | Step: 1 | batch x: [ 1. 7. 8. 5. 6.] | batch y: [ 10. 4. 3. 6. 5.]
Epoch: 2 | Step: 0 | batch x: [ 3. 9. 2. 6. 7.] | batch y: [ 8. 2. 9. 5. 4.]
Epoch: 2 | Step: 1 | batch x: [ 10. 4. 8. 1. 5.] | batch y: [ 1. 7. 3. 10. 6.]
"""
As you can see, every step hands out 5 samples for learning, and in every epoch the data is shuffled before it is handed out.
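The introduction mentions converting your own data, such as a numpy array, into Tensors before wrapping it. Here is a minimal sketch of that step; the arrays `features` and `labels` are made-up stand-ins for real data and are not part of the tutorial.

```python
import numpy as np
import torch
import torch.utils.data as Data

# Hypothetical NumPy data standing in for "your own" data.
features = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 samples, 2 features each
labels = np.arange(10, dtype=np.int64)                      # one integer label per sample

# torch.from_numpy shares memory with the array instead of copying it.
x = torch.from_numpy(features)
y = torch.from_numpy(labels)

# Wrap the tensors, then hand the dataset to a DataLoader.
dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(dataset, batch_size=5, shuffle=False)

# Collect the shape of every mini-batch the loader yields.
batch_shapes = [(tuple(bx.shape), tuple(by.shape)) for bx, by in loader]
print(len(dataset), batch_shapes)
```

The first dimension of every tensor passed to `TensorDataset` has to match, because that dimension is the one the dataset indexes over.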
That is not even the most convenient part. Suppose we change to BATCH_SIZE = 8. Then step=0 hands out 8 samples, but at step=1 fewer than 8 samples are left in the dataset. What happens then?
BATCH_SIZE = 8      # number of samples per mini-batch

...

for ...:
    for ...:
        ...
        print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
              batch_x.numpy(), '| batch y: ', batch_y.numpy())
"""
Epoch: 0 | Step: 0 | batch x: [ 6. 7. 2. 3. 1. 9. 10. 4.] | batch y: [ 5. 4. 9. 8. 10. 2. 1. 7.]
Epoch: 0 | Step: 1 | batch x: [ 8. 5.] | batch y: [ 3. 6.]
Epoch: 1 | Step: 0 | batch x: [ 3. 4. 2. 9. 10. 1. 7. 8.] | batch y: [ 8. 7. 9. 2. 1. 10. 4. 3.]
Epoch: 1 | Step: 1 | batch x: [ 5. 6.] | batch y: [ 6. 5.]
Epoch: 2 | Step: 0 | batch x: [ 3. 9. 2. 6. 7. 10. 4. 8.] | batch y: [ 8. 2. 9. 5. 4. 1. 7. 3.]
Epoch: 2 | Step: 1 | batch x: [ 1. 5.] | batch y: [ 10. 6.]
"""
In that case, step=1 simply returns whatever data is left in this epoch.
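If a short final batch is not wanted, `DataLoader` also accepts a `drop_last` argument that discards it. The tutorial does not use this option; the sketch below only contrasts the two behaviours on the same 10 samples, and the helper name `batch_sizes` is made up for illustration.

```python
import torch
import torch.utils.data as Data

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
dataset = Data.TensorDataset(x, y)


def batch_sizes(drop_last):
    """Return the size of every mini-batch produced in one epoch."""
    loader = Data.DataLoader(dataset, batch_size=8, shuffle=True, drop_last=drop_last)
    return [len(batch_x) for batch_x, _ in loader]


keep_sizes = batch_sizes(drop_last=False)  # default: the leftover samples form a short batch
drop_sizes = batch_sizes(drop_last=True)   # the incomplete batch is skipped
print(keep_sizes, drop_sizes)
```

Dropping the remainder can be useful when a model or loss assumes a fixed batch size, at the cost of not seeing a few samples in each epoch.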
So that is what each step in my GitHub code is doing.
Source: 莫烦
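I'd like to ask a question. When training this way, the variables need to be converted to the Variable type. This is how I wrote my training code, and I'm not sure whether it is correct:
from torch.autograd import Variable

# train on the whole dataset 50 times
for epoch in range(50):
    for step, (batch_x, batch_y) in enumerate(loader):
        print('Epoch: ', epoch, 'step: ', step)
        batch_x, batch_y = Variable(batch_x), Variable(batch_y)
        prediction = net(batch_x)              # predicted values
        loss = loss_func(prediction, batch_y)  # compute the loss; note the order of prediction and y
        optimizer.zero_grad()                  # first reset all gradients to zero
        loss.backward()                        # backpropagation
        optimizer.step()                       # update the parameters
Copying and pasting seems to break the formatting. The main thing I'm unsure about is this step:
batch_x, batch_y = Variable(batch_x), Variable(batch_y)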
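On the Variable question: since PyTorch 0.4, Variable has been merged into Tensor, so the batches coming out of the loader can be passed to the network directly. Wrapping them in Variable is still accepted but should no longer be needed. A minimal sketch, assuming a hypothetical one-layer `net`, an MSE `loss_func`, an SGD `optimizer` and made-up linear data, none of which come from the comment above:

```python
import torch
import torch.nn as nn
import torch.utils.data as Data

torch.manual_seed(1)

# Made-up regression data: y = 2x + 1.
x = torch.linspace(-1, 1, 100).unsqueeze(1)  # shape (100, 1)
y = 2 * x + 1

loader = Data.DataLoader(Data.TensorDataset(x, y), batch_size=20, shuffle=True)

# Hypothetical model, loss and optimizer for illustration only.
net = nn.Linear(1, 1)
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

with torch.no_grad():
    initial_loss = loss_func(net(x), y).item()

for epoch in range(50):
    for step, (batch_x, batch_y) in enumerate(loader):
        # batch_x and batch_y are plain Tensors; no Variable wrapper is used.
        prediction = net(batch_x)
        loss = loss_func(prediction, batch_y)
        optimizer.zero_grad()   # clear gradients left over from the previous step
        loss.backward()         # backpropagate
        optimizer.step()        # update the parameters

with torch.no_grad():
    final_loss = loss_func(net(x), y).item()
print(initial_loss, final_loss)
```

The order of the three optimizer-related calls matches the loop in the question, so on older PyTorch versions that still require Variable the loop as posted looks reasonable too.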
I have a question:
Iterating at the line "for step, (batch_x, batch_y) in enumerate(loader):" raises the following error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "G:\ANACONDA\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "G:\ANACONDA\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "G:\ANACONDA\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "G:\code\torch\net\DataLoader.py", line 25, in <module>
    for step, (batch_x, batch_y) in enumerate(loader):  # each step the loader releases one mini-batch for learning
  File "G:\ANACONDA\lib\site-packages\torch\utils\data\dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "G:\ANACONDA\lib\site-packages\torch\utils\data\dataloader.py", line 167, in __init__
    w.start()
  File "G:\ANACONDA\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "G:\ANACONDA\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "G:\ANACONDA\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "G:\ANACONDA\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "G:\ANACONDA\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Did you manage to solve this?
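The RuntimeError above is what Windows reports when worker processes are started from module-level code: under the spawn start method each worker re-imports the script and reaches the DataLoader loop again. Below is a minimal sketch of the idiom the message asks for, using the tutorial's data. The function name `run_epochs` is made up for illustration. Setting `num_workers=0` is another way to avoid the error, at the cost of loading the data in the main process.

```python
import torch
import torch.utils.data as Data


def run_epochs(num_workers, epochs=1):
    """Build the loader, iterate over it, and return the number of batches seen."""
    x = torch.linspace(1, 10, 10)
    y = torch.linspace(10, 1, 10)
    loader = Data.DataLoader(
        Data.TensorDataset(x, y),
        batch_size=5,
        shuffle=True,
        num_workers=num_workers,  # 0 means "load in the main process"
    )
    seen = 0
    for epoch in range(epochs):
        for step, (batch_x, batch_y) in enumerate(loader):
            seen += 1  # training on the batch would go here
    return seen


if __name__ == '__main__':
    # Only the process that was launched directly runs this branch.
    # Spawned workers import the file under a different module name and skip it.
    print(run_epochs(num_workers=2))
```

Keeping everything that creates or iterates the loader inside a function, and calling it only under the guard, means an import of the file has no side effects.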
AttributeError: module 'torch.utils.data' has no attribute 'Dataloader'. I've only just started with pytorch; what is causing this? I searched a lot and couldn't find anyone else who ran into it.
Did you solve it? I ran into the same problem. Is the environment not installed properly?
Solved. I changed the code to
torch.utils.data.dataloader.DataLoader()
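The AttributeError quoted above looks like a capitalization issue rather than an installation problem: the class is spelled `DataLoader`, with a capital L, and Python attribute lookup is case-sensitive. A quick check, as a sketch that only assumes torch is installed:

```python
import torch.utils.data as Data
import torch.utils.data.dataloader as dataloader_module

# The correctly capitalized name exists; the misspelled one does not.
has_correct_name = hasattr(Data, 'DataLoader')
has_misspelled_name = hasattr(Data, 'Dataloader')

# The short and the long import path refer to the same class object.
same_class = dataloader_module.DataLoader is Data.DataLoader

print(has_correct_name, has_misspelled_name, same_class)
```

So `torch.utils.data.DataLoader(...)` should work as written in the tutorial, without going through the `dataloader` submodule.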