[DL4C] Ch6. Multi Categorical Classification
#hide
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
#hide
from fastbook import *
Other Computer Vision Problems
Multi-Label Classification
The Data
from fastai.vision.all import *
path = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')
df.head()
| | fname | labels | is_valid |
|---|---|---|---|
| 0 | 000005.jpg | chair | True |
| 1 | 000007.jpg | car | True |
| 2 | 000009.jpg | horse person | True |
| 3 | 000012.jpg | car | False |
| 4 | 000016.jpg | bicycle | True |
Sidebar: Pandas and DataFrames
df.iloc[:,0]
0 000005.jpg
1 000007.jpg
2 000009.jpg
3 000012.jpg
4 000016.jpg
...
5006 009954.jpg
5007 009955.jpg
5008 009958.jpg
5009 009959.jpg
5010 009961.jpg
Name: fname, Length: 5011, dtype: object
df.iloc[0,:]
# Trailing :s are always optional (in numpy, pytorch, pandas, etc.),
# so this is equivalent:
df.iloc[0]
fname 000005.jpg
labels chair
is_valid True
Name: 0, dtype: object
df['fname']
0 000005.jpg
1 000007.jpg
2 000009.jpg
3 000012.jpg
4 000016.jpg
...
5006 009954.jpg
5007 009955.jpg
5008 009958.jpg
5009 009959.jpg
5010 009961.jpg
Name: fname, Length: 5011, dtype: object
tmp_df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
tmp_df
| | a | b |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 4 |
tmp_df['c'] = tmp_df['a']+tmp_df['b']
tmp_df
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 3 | 4 |
| 1 | 2 | 4 | 6 |
End sidebar
Constructing a DataBlock
# a DataBlock created with no parameters at all
dblock = DataBlock()
# create a Datasets object from the source (here, the DataFrame)
dsets = dblock.datasets(df)
len(dsets.train),len(dsets.valid)
(4009, 1002)
dsets.train[0]
(fname 008663.jpg
labels car person
is_valid False
Name: 4346, dtype: object,
fname 008663.jpg
labels car person
is_valid False
Name: 4346, dtype: object)
dsets.valid[0]
(fname 002613.jpg
labels bottle person
is_valid True
Name: 1311, dtype: object,
fname 002613.jpg
labels bottle person
is_valid True
Name: 1311, dtype: object)
x,y = dsets.train[0]
x,y
(fname 008663.jpg
labels car person
is_valid False
Name: 4346, dtype: object,
fname 008663.jpg
labels car person
is_valid False
Name: 4346, dtype: object)
x['fname']
'008663.jpg'
# by default, a data block assumes we have two things: an input and a target
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
('009546.jpg', 'sofa person')
# identical to the approach above;
# BUT prefer named functions over lambdas if you plan to export your Learner after training,
# since lambdas can't be serialized (pickled)
def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
('005620.jpg', 'aeroplane')
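To see concretely why lambdas are a problem for export, here is a quick sketch (my addition, not from the chapter) using pickle, which Learner.export relies on under the hood:
import pickle
pickle.dumps(get_x)  # a module-level named function pickles fine
try:
    pickle.dumps(lambda r: r['fname'])  # a lambda does not
except Exception as e:
    print(type(e).__name__)  # PicklingError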
path
Path('/Users/ridealist/.fastai/data/pascal_2007')
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(Path('/Users/ridealist/.fastai/data/pascal_2007/train/002549.jpg'),
['tvmonitor'])
# MultiCategoryBlock -> One-Hot Encoding
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0], type(dsets.train[0])
((PILImage mode=RGB size=500x325,
TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.])),
tuple)
dblock.summary(df)
Setting-up type transforms pipelines
Collecting items from fname labels is_valid
0 000005.jpg chair True
1 000007.jpg car True
2 000009.jpg horse person True
3 000012.jpg car False
4 000016.jpg bicycle True
... ... ... ...
5006 009954.jpg horse person True
5007 009955.jpg boat True
5008 009958.jpg person bicycle True
5009 009959.jpg car False
5010 009961.jpg dog False
[5011 rows x 3 columns]
Found 5011 items
2 datasets of sizes 4009,1002
Setting up Pipeline: get_x -> PILBase.create
Setting up Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
Building one sample
Pipeline: get_x -> PILBase.create
starting from
fname 002546.jpg
labels dog
is_valid True
Name: 1277, dtype: object
applying get_x gives
/Users/ridealist/.fastai/data/pascal_2007/train/002546.jpg
applying PILBase.create gives
PILImage mode=RGB size=500x375
Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
starting from
fname 002546.jpg
labels dog
is_valid True
Name: 1277, dtype: object
applying get_y gives
[dog]
applying MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
TensorMultiCategory([11])
applying OneHotEncode -- {'c': None} gives
TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.])
Final sample: (PILImage mode=RGB size=500x375, TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]))
Collecting items from fname labels is_valid
0 000005.jpg chair True
1 000007.jpg car True
2 000009.jpg horse person True
3 000012.jpg car False
4 000016.jpg bicycle True
... ... ... ...
5006 009954.jpg horse person True
5007 009955.jpg boat True
5008 009958.jpg person bicycle True
5009 009959.jpg car False
5010 009961.jpg dog False
[5011 rows x 3 columns]
Found 5011 items
2 datasets of sizes 4009,1002
Setting up Pipeline: get_x -> PILBase.create
Setting up Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
Building one batch
Applying item_tfms to the first sample:
Pipeline: ToTensor
starting from
(PILImage mode=RGB size=500x375, TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]))
applying ToTensor gives
(TensorImage of size 3x375x500, TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]))
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 375, 500]),torch.Size([3, 333, 500]),torch.Size([3, 363, 500]),torch.Size([3, 334, 500])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 dblock.summary(df)
[... long traceback through fastai's fa_collate and torch.stack elided ...]
RuntimeError: Error when trying to collate the data into batches with fa_collate, at least two tensors in the batch are not the same size.
Mismatch found on axis 0 of the batch and is of type `TensorImage`:
Item at index 0 has shape: torch.Size([3, 375, 500])
Item at index 1 has shape: torch.Size([3, 333, 500])
Please include a transform in `after_item` that ensures all data of type TensorImage is the same size
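As the error message says, every image must be resized to a common size before it can be collated into a batch. A minimal sketch of the fix (hypothetical variable names; the chapter does this properly below with RandomResizedCrop), using DataBlock.new to tweak the existing block:
# give every image the same size before collation
dblock_fixed = dblock.new(item_tfms=Resize(128))
dls_fixed = dblock_fixed.dataloaders(df)
xb, yb = dls_fixed.train.one_batch()
xb.shape, yb.shape  # each image is now 3x128x128, so collation succeeds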
idxs = torch.where(dsets.train[0][1]==1.); idxs
(TensorMultiCategory([6]),)
idxs = torch.where(dsets.train[0][1]==1.)[0] ; idxs
# dsets.train.vocab[idxs]
TensorMultiCategory([6])
dsets.train.vocab
['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
dsets.train.vocab[idxs]
(#1) ['car']
len(dsets.train.vocab)
20
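Putting the two pieces together, a tiny helper (hypothetical, not part of fastai) makes the round trip from one-hot vector back to label names explicit:
# decode a one-hot target row back to label names via the vocab
def decode_onehot(onehot, vocab, thresh=0.5):
    return [vocab[i] for i, v in enumerate(onehot) if v > thresh]
decode_onehot(dsets.train[0][1], dsets.train.vocab)  # e.g. ['car']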
def splitter(df):
train = df.index[~df['is_valid']].tolist()
valid = df.index[df['is_valid']].tolist()
return train,valid
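fastai also ships a built-in splitter for exactly this pattern; assuming ColSplitter behaves as documented (rows where the named boolean column is True go to the validation set), this one-liner would be equivalent:
# equivalent built-in alternative (left commented so the code below still
# uses the hand-written splitter):
# splitter = ColSplitter('is_valid')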
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(PILImage mode=RGB size=500x333,
TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y,
item_tfms = RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
dls.show_batch(nrows=1, ncols=3)
Remember the “summary” method
dblock.summary(df)
Setting-up type transforms pipelines
Collecting items from fname labels is_valid
0 000005.jpg chair True
1 000007.jpg car True
2 000009.jpg horse person True
3 000012.jpg car False
4 000016.jpg bicycle True
... ... ... ...
5006 009954.jpg horse person True
5007 009955.jpg boat True
5008 009958.jpg person bicycle True
5009 009959.jpg car False
5010 009961.jpg dog False
[5011 rows x 3 columns]
Found 5011 items
2 datasets of sizes 2501,2510
Setting up Pipeline: get_x -> PILBase.create
Setting up Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
Building one sample
Pipeline: get_x -> PILBase.create
starting from
fname 000012.jpg
labels car
is_valid False
Name: 3, dtype: object
applying get_x gives
/Users/ridealist/.fastai/data/pascal_2007/train/000012.jpg
applying PILBase.create gives
PILImage mode=RGB size=500x333
Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
starting from
fname 000012.jpg
labels car
is_valid False
Name: 3, dtype: object
applying get_y gives
[car]
applying MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
TensorMultiCategory([6])
applying OneHotEncode -- {'c': None} gives
TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Final sample: (PILImage mode=RGB size=500x333, TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
Collecting items from fname labels is_valid
0 000005.jpg chair True
1 000007.jpg car True
2 000009.jpg horse person True
3 000012.jpg car False
4 000016.jpg bicycle True
... ... ... ...
5006 009954.jpg horse person True
5007 009955.jpg boat True
5008 009958.jpg person bicycle True
5009 009959.jpg car False
5010 009961.jpg dog False
[5011 rows x 3 columns]
Found 5011 items
2 datasets of sizes 2501,2510
Setting up Pipeline: get_x -> PILBase.create
Setting up Pipeline: get_y -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': False} -> OneHotEncode -- {'c': None}
Setting up after_item: Pipeline: RandomResizedCrop -- {'size': (128, 128), 'min_scale': 0.35, 'ratio': (0.75, 1.3333333333333333), 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'val_xtra': 0.14, 'max_scale': 1.0, 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
Building one batch
Applying item_tfms to the first sample:
Pipeline: RandomResizedCrop -- {'size': (128, 128), 'min_scale': 0.35, 'ratio': (0.75, 1.3333333333333333), 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'val_xtra': 0.14, 'max_scale': 1.0, 'p': 1.0} -> ToTensor
starting from
(PILImage mode=RGB size=500x333, TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
applying RandomResizedCrop -- {'size': (128, 128), 'min_scale': 0.35, 'ratio': (0.75, 1.3333333333333333), 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'val_xtra': 0.14, 'max_scale': 1.0, 'p': 1.0} gives
(PILImage mode=RGB size=128x128, TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
applying ToTensor gives
(TensorImage of size 3x128x128, TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Applying batch_tfms to the batch built
Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
starting from
(TensorImage of size 4x3x128x128, TensorMultiCategory of size 4x20)
applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
(TensorImage of size 4x3x128x128, TensorMultiCategory of size 4x20)
Binary Cross-Entropy
A Learner contains four main things:
- the model
- a DataLoaders object
- an optimizer
- the loss function
learn = vision_learner(dls, resnet18)
# model: resnet18 / data: dls / optimizer: the default / loss: inferred from the DataLoaders
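Because the targets come from a MultiCategoryBlock, fastai infers binary cross-entropy with logits as the loss; a quick check you can run to confirm (the output here is what I'd expect, not a recorded cell):
learn.loss_func
# expected: FlattenedLoss of BCEWithLogitsLoss()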
dls.train.one_batch()
(TensorImage([[[[-0.6281, -0.7650, -1.0904,  ..., -0.9534, -0.9363, -0.9192],
                ...]]],
             ..., device='mps:0'),
 TensorMultiCategory([[0., 0., 0.,  ..., 0., 0., 0.],
                      [0., 0., 0.,  ..., 0., 0., 0.],
                      [0., 0., 0.,  ..., 0., 0., 1.],
                      ...,
                      [0., 0., 0.,  ..., 0., 0., 0.]], device='mps:0'))
# (full 64x3x128x128 pixel dump elided)
## the model inside a Learner is an object of a class inheriting from nn.Module
## calling it with parentheses returns the activations of the model
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x)
# batch size of 64 / 20 categories
activs.shape
torch.Size([64, 20])
activs[0]
TensorImage([ 0.2109, 0.0866, -0.1316, 1.7263, -2.0373, 0.0544, -0.6262, -0.1172, -3.2036, -0.3143, 0.6041, -0.7967, -0.8520, 2.4630, 1.4149, 0.9663, -1.1261, -0.8106, 0.0800, -1.8106],
grad_fn=<AliasBackward0>)
def binary_cross_entropy(inputs, targets):
    # squash activations to (0,1), then take the probability assigned to the
    # correct answer: sigmoid(x) where the target is 1, 1-sigmoid(x) where it is 0
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
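As a sanity check (a sketch with made-up tensors, not dataset values), this hand-rolled version should agree with PyTorch's built-in to within floating-point tolerance:
import torch.nn.functional as F  # already in scope via fastai's star import
inp = torch.randn(4, 20)                 # fake activations
tgt = (torch.rand(4, 20) > 0.8).float()  # fake multi-hot targets
assert torch.allclose(binary_cross_entropy(inp, tgt),
                      F.binary_cross_entropy_with_logits(inp, tgt), atol=1e-6)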
activs.shape, y.shape
(torch.Size([64, 20]), torch.Size([64, 20]))
# cast away fastai's tensor subclasses so nn.BCEWithLogitsLoss sees plain tensors
# x = x.as_subclass(torch.Tensor)
y = y.as_subclass(torch.Tensor)
loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs, y)
loss
TensorImage(1.1001, grad_fn=<AliasBackward0>)
‘partial’ function
- binds a function with some arguments or keyword arguments,
- making a new version of that function that always includes those arguments
def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'), say_hello('Jeremy', 'Ahoy!')
('Hello Jeremy.', 'Ahoy! Jeremy.')
f = partial(say_hello, say_what="Bonjour")
f("Jeremy"), f("Sylvain")
('Bonjour Jeremy.', 'Bonjour Sylvain.')
accuracy_multi
def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    if sigmoid: inp = inp.sigmoid()
    ## int() only works on one-element tensors ("only one element tensors can be
    ## converted to Python scalars"), so this fails on a batch:
    ## return (int(inp > thresh) == targ).float().mean()
    return ((inp > thresh) == targ.bool()).float().mean()
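A tiny worked example (made-up logits and targets, not from PASCAL) shows how the threshold turns activations into per-label hits and misses:
# sigmoid(2)=0.88 -> 1 (correct), sigmoid(-2)=0.12 -> 0 (correct),
# sigmoid(0.5)=0.62 -> 1 (wrong), so accuracy is 2/3
inp  = tensor([[2., -2., 0.5]])
targ = tensor([[1.,  0., 0. ]])
accuracy_multi(inp, targ, thresh=0.5)  # tensor(0.6667)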
learn = vision_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.942904 | 0.708307 | 0.238287 | 00:34 |
| 1 | 0.825156 | 0.561708 | 0.293626 | 00:33 |
| 2 | 0.605840 | 0.204976 | 0.819681 | 00:33 |
| 3 | 0.360526 | 0.126902 | 0.939223 | 00:34 |

| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.132665 | 0.118781 | 0.942092 | 00:39 |
| 1 | 0.118319 | 0.107946 | 0.948327 | 00:38 |
| 2 | 0.099634 | 0.103022 | 0.952908 | 00:38 |
learn.metrics = partial(accuracy_multi, thresh=0.1)
learn.validate()
# valid_loss / accuracy -> change a lot depending on thresh!
(#2) [0.10986798256635666,1.0]
learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
## we can find the best threshold by trying a few levels and seeing what works best
preds,targs = learn.get_preds()
# by default "get_preds" applies the output activation function (sigmoid, in this case) for us,
# so we tell accuracy_multi not to apply it again
accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)
tensor(0.9571)
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
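To read the best threshold off the sweep rather than eyeballing the plot, a one-line addition (mine, not the book's code):
# pick the threshold with the highest accuracy in the sweep
best_thresh = xs[torch.stack(accs).argmax()]
best_thresh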
Theory vs. Practice
- In this case we're using the validation set to pick a hyperparameter (the threshold), which is exactly what the validation set is for.
- Concern about overfitting is unnecessary here,
- because the result is a smooth curve, so we're clearly not picking an inappropriate outlier.
Regression
Assemble the Data
path = untar_data(URLs.BIWI_HEAD_POSE)
#hide
Path.BASE_PATH = path
path.ls().sorted()
(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]
(path/'01').ls().sorted()
(#1000) [Path('01/depth.cal'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]
# get all image files recursively with "get_image_files"
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')
img2pose(img_files[0])
Path('03/frame_00393_pose.txt')
img_files
(#15678) [Path('03/frame_00393_rgb.jpg'),Path('03/frame_00383_rgb.jpg'),Path('03/frame_00619_rgb.jpg'),Path('03/frame_00609_rgb.jpg'),Path('03/frame_00134_rgb.jpg'),Path('03/frame_00124_rgb.jpg'),Path('03/frame_00252_rgb.jpg'),Path('03/frame_00242_rgb.jpg'),Path('03/frame_00407_rgb.jpg'),Path('03/frame_00417_rgb.jpg')...]
im = PILImage.create(img_files[0])
im.shape
(480, 640)
im.to_thumb(160)
# extract the head center point: return its image coordinates as a tensor of two items
# np.genfromtxt opens a text file and parses it column by column
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
def get_ctr(f):
    # skip the first 3 lines of the pose file; what remains is the head center's 3D coordinates
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    # project the 3D point onto the image plane: u = X*fx/Z + cx, v = Y*fy/Z + cy
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
cal
array([[517.679, 0. , 320. ],
[ 0. , 517.679, 240.5 ],
[ 0. , 0. , 1. ]])
get_ctr(img_files[0])
tensor([387.1024, 261.9126])
img_files[0], img_files[0].parent, img_files[0].parent.name
(Path('03/frame_00393_rgb.jpg'), Path('03'), '03')
biwi = DataBlock( # the label represents coordinates -> the same augmentation must be applied to the coordinates and the image
blocks=(ImageBlock, PointBlock),
get_items=get_image_files,
get_y=get_ctr,
    # create a splitter function that returns True for just one person,
    # resulting in a validation set containing only that person's images
splitter=FuncSplitter(lambda o: o.parent.name=='13'),
# batch_tfms=[*aug_transforms(size=(240,320)), Normalize.from_stats(*imagenet_stats)]
)
biwi.summary(path)
Setting-up type transforms pipelines
Collecting items from /Users/ridealist/.fastai/data/biwi_head_pose
Found 15678 items
2 datasets of sizes 15193,485
Setting up Pipeline: PILBase.create
Setting up Pipeline: get_ctr -> TensorPoint.create
Building one sample
Pipeline: PILBase.create
starting from
/Users/ridealist/.fastai/data/biwi_head_pose/03/frame_00393_rgb.jpg
applying PILBase.create gives
PILImage mode=RGB size=640x480
Pipeline: get_ctr -> TensorPoint.create
starting from
/Users/ridealist/.fastai/data/biwi_head_pose/03/frame_00393_rgb.jpg
applying get_ctr gives
tensor([387.1024, 261.9126])
applying TensorPoint.create gives
TensorPoint of size 1x2
Final sample: (PILImage mode=RGB size=640x480, TensorPoint([[387.1024, 261.9126]]))
Collecting items from /Users/ridealist/.fastai/data/biwi_head_pose
Found 15678 items
2 datasets of sizes 15193,485
Setting up Pipeline: PILBase.create
Setting up Pipeline: get_ctr -> TensorPoint.create
Setting up after_item: Pipeline: PointScaler -> ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
Building one batch
Applying item_tfms to the first sample:
Pipeline: PointScaler -> ToTensor
starting from
(PILImage mode=RGB size=640x480, TensorPoint of size 1x2)
applying PointScaler gives
(PILImage mode=RGB size=640x480, TensorPoint of size 1x2)
applying ToTensor gives
(TensorImage of size 3x480x640, TensorPoint of size 1x2)
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Applying batch_tfms to the batch built
Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
starting from
(TensorImage of size 4x3x480x640, TensorPoint of size 4x1x2)
applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
(TensorImage of size 4x3x480x640, TensorPoint of size 4x1x2)
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
# Make sure you understand WHY these are the shapes of our mini-batches:
# 64 images of 3x480x640, and 64 targets, each a single (x,y) point
xb, yb = dls.one_batch()
xb.shape, yb.shape
(torch.Size([64, 3, 480, 640]), torch.Size([64, 1, 2]))
yb[0]
TensorPoint([[0.2792, 0.0445]], device='mps:0')
Training a Model
## (coordinates in fastai and PyTorch are always rescaled between –1 and +1)
learn = vision_learner(dls, resnet18, y_range=(-1,1))
# sigmoid_range is set as the final layer of the model;
# it forces the model to output activations in the range (lo, hi)
def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
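A quick numeric check (values worked out by hand): sigmoid(0) = 0.5, so the output lands exactly at the midpoint of (lo, hi):
x = torch.tensor(0.)
sigmoid_range(x, -1, 1)  # tensor(0.)   -> midpoint of (-1, 1)
sigmoid_range(x, -2, 5)  # tensor(1.5)  -> midpoint of (-2, 5)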
# Basic sigmoid
plot_function(partial(sigmoid_range,lo=0,hi=1), min=-4, max=4)
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
plot_function(partial(sigmoid_range,lo=-2,hi=5), min=-4, max=4)
# see which loss function fastai chose as the default
dls.loss_func
FlattenedLoss of MSELoss()
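Since the targets are rescaled coordinates, MSE here is just the mean squared per-coordinate error; a toy check with made-up values (not dataset outputs):
# mean of squared errors: ((0.25-0.30)^2 + (0.10-0.00)^2) / 2 = 0.00625
preds = tensor([[0.25, 0.10]])
targs = tensor([[0.30, 0.00]])
F.mse_loss(preds, targs)  # tensor(0.0063)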
# pick a good learning rate with the "learning rate finder"
learn.lr_find()
lr = 1e-2
learn.fine_tune(3, lr)
# a validation MSE near 0.0001 corresponds to an average coordinate error (RMSE) of about 0.01
math.sqrt(0.0001)
## take a look at our results
# left : the actual coordinates / right : model's predictions
learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))
Conclusion
Questionnaire
- How could multi-label classification improve the usability of the bear classifier?
- How do we encode the dependent variable in a multi-label classification problem?
- How do you access the rows and columns of a DataFrame as if it was a matrix?
- How do you get a column by name from a DataFrame?
- What is the difference between a `Dataset` and a `DataLoader`?
- What does a `Datasets` object normally contain?
- What does a `DataLoaders` object normally contain?
- What does `lambda` do in Python?
- What are the methods to customize how the independent and dependent variables are created with the data block API?
- Why is softmax not an appropriate output activation function when using a one-hot-encoded target?
- Why is `nll_loss` not an appropriate loss function when using a one-hot-encoded target?
- What is the difference between `nn.BCELoss` and `nn.BCEWithLogitsLoss`?
- Why can't we use regular accuracy in a multi-label problem?
- When is it okay to tune a hyperparameter on the validation set?
- How is `y_range` implemented in fastai? (See if you can implement it yourself and test it without peeking!)
- What is a regression problem? What loss function should you use for such a problem?
- What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?
Further Research
- Read a tutorial about Pandas DataFrames and experiment with a few methods that look interesting to you. See the book’s website for recommended tutorials.
- Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don’t contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted using multi-label classification.