Utils
taskchain.cache
Cache
Bases: abc.ABC
Cache interface.
get(key)
abstractmethod
Get value for given key if cached.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
key | str | key under which value is cached | required |

Returns:

Type | Description |
---|---|
Any | cached value or NO_VALUE |

get_or_compute(key, computer, force=False)
abstractmethod
Get value for given key if cached or compute and cache it.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
key | str | key under which value is cached | required |
computer | Callable | function which returns value if not cached | required |
force | bool | recompute value even if it is in cache | False |

Returns:

Type | Description |
---|---|
Any | cached or computed value |

subcache(*args)
abstractmethod
Create separate sub-cache of this cache.
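A minimal sketch of how the interface above can be implemented, assuming a plain dictionary as storage. `NO_VALUE` here is a hypothetical stand-in for the library's sentinel, and `DictCache`, including the choice to implement `subcache` by namespacing keys in shared storage, is illustrative rather than taskchain's own code.

```python
import abc
from typing import Any, Callable

# Hypothetical stand-in for taskchain's NO_VALUE sentinel.
NO_VALUE = object()


class Cache(abc.ABC):
    """Mirror of the documented interface (a sketch, not taskchain's code)."""

    @abc.abstractmethod
    def get(self, key: str) -> Any:
        """Return the cached value or NO_VALUE."""

    @abc.abstractmethod
    def get_or_compute(self, key: str, computer: Callable[[], Any], force: bool = False) -> Any:
        """Return the cached value, computing and caching it if needed."""

    @abc.abstractmethod
    def subcache(self, *args) -> "Cache":
        """Return a separate cache whose keys do not clash with this one."""


class DictCache(Cache):
    """Minimal in-memory implementation in the spirit of InMemoryCache."""

    def __init__(self, prefix: tuple = ()):
        self._store: dict = {}
        self._prefix = prefix

    def _full_key(self, key: str) -> tuple:
        # Namespace the key with the sub-cache prefix.
        return self._prefix + (key,)

    def get(self, key):
        return self._store.get(self._full_key(key), NO_VALUE)

    def get_or_compute(self, key, computer, force=False):
        full_key = self._full_key(key)
        if force or full_key not in self._store:
            self._store[full_key] = computer()
        return self._store[full_key]

    def subcache(self, *args):
        # Assumption: a sub-cache shares storage but prefixes its keys with args.
        child = DictCache(self._prefix + tuple(args))
        child._store = self._store
        return child
```

With this sketch, the second `get_or_compute` call for the same key returns the stored value without calling `computer`, while `force=True` always recomputes.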
DummyCache
Bases: Cache
No caching.
InMemoryCache
Bases: Cache
Cache only in memory.
FileCache
Bases: Cache
General cache for saving values in files.
JsonCache
Bases: FileCache
Cache JSON-like objects in `.json` files.
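File-based caches such as JsonCache map each key to a file on disk. The sketch below shows that idea with the standard `json` module; the class name `JsonFileCache`, the one-file-per-key layout `<key>.json`, and nested directories for sub-caches are assumptions, not taskchain's actual file layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

NO_VALUE = object()  # hypothetical stand-in for taskchain's NO_VALUE


class JsonFileCache:
    """Sketch of a file-backed cache: one .json file per key under a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumes keys are safe to use as file names.
        return self.directory / f"{key}.json"

    def get(self, key: str) -> Any:
        path = self._path(key)
        if not path.exists():
            return NO_VALUE
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    def get_or_compute(self, key: str, computer: Callable[[], Any], force: bool = False) -> Any:
        value = NO_VALUE if force else self.get(key)
        if value is NO_VALUE:
            value = computer()
            with self._path(key).open("w", encoding="utf-8") as f:
                json.dump(value, f)
        return value

    def subcache(self, *args) -> "JsonFileCache":
        # Assumption: a sub-cache lives in a nested directory.
        return JsonFileCache(self.directory.joinpath(*map(str, args)))
```

A freshly computed value is returned as-is, whereas a later `get` returns whatever `json.load` produces, so, for example, tuples come back as lists.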
DataFrameCache
Bases: FileCache
Cache pandas DataFrame objects in `.pd` files.
NumpyArrayCache
Bases: FileCache
Cache numpy arrays in `.npy` files.
cached
Decorator for automatic caching of method results. For given arguments, the decorated method is called only once and its result is cached. The cache key is constructed automatically from the method arguments. The cache can be passed to the decorator or defined as an attribute of the object.
__init__(cache_object=None, key=None, cache_attr='cache', ignore_kwargs=None)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
cache_object | Cache | Cache used for caching. | None |
key | Callable | custom function for computing key from arguments | None |
cache_attr | str | name of the object attribute holding the cache, used if `cache_object` is not given | 'cache' |
ignore_kwargs | List[str] | kwargs to ignore in key construction | None |

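The decorator's behaviour can be sketched as below. This is not taskchain's implementation: the key format `name(arg=value,...)`, the assumption that a custom `key` function receives the method's arguments without `self`, and the helper classes `SimpleCache` and `Model` are all hypothetical.

```python
import functools
import inspect
from typing import Callable, List, Optional


class SimpleCache:
    """Hypothetical stand-in exposing get_or_compute like the Cache interface."""

    def __init__(self):
        self.store = {}

    def get_or_compute(self, key, computer, force=False):
        if force or key not in self.store:
            self.store[key] = computer()
        return self.store[key]


class cached:
    """Sketch of a method-caching decorator with the documented parameters."""

    def __init__(self, cache_object=None, key: Optional[Callable] = None,
                 cache_attr: str = "cache", ignore_kwargs: Optional[List[str]] = None):
        self.cache_object = cache_object
        self.key = key
        self.cache_attr = cache_attr
        self.ignore_kwargs = set(ignore_kwargs or [])

    def __call__(self, method):
        signature = inspect.signature(method)

        @functools.wraps(method)
        def wrapper(obj, *args, **kwargs):
            # Use the explicit cache, otherwise look it up on the instance.
            cache = self.cache_object if self.cache_object is not None else getattr(obj, self.cache_attr)
            if self.key is not None:
                cache_key = self.key(*args, **kwargs)
            else:
                bound = signature.bind(obj, *args, **kwargs)
                bound.apply_defaults()
                # Skip `self` (first argument) and any ignored names.
                parts = [f"{name}={value!r}"
                         for name, value in list(bound.arguments.items())[1:]
                         if name not in self.ignore_kwargs]
                cache_key = f"{method.__name__}({','.join(parts)})"
            return cache.get_or_compute(cache_key, lambda: method(obj, *args, **kwargs))

        return wrapper


class Model:
    def __init__(self):
        self.cache = SimpleCache()  # found via the default cache_attr='cache'
        self.calls = 0

    @cached(ignore_kwargs=["verbose"])
    def predict(self, x, verbose=False):
        self.calls += 1
        return x * 2
```

Binding the arguments with `inspect.signature` and applying defaults makes `predict(3)` and `predict(x=3)` produce the same key, and ignoring `verbose` lets calls that differ only in logging share one cache entry.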
taskchain.utils.clazz
persistent
Method decorator.
Has to be used on a method without arguments.
Saves the result in `self.__method_name` and on subsequent calls returns the saved value without calling the decorated method again.
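A minimal sketch of the `persistent` pattern, assuming the saved attribute is literally named `__<method name>` as the description suggests. Because the attribute is set with `setattr`, Python's private-name mangling does not apply to it. `Dataset` is a hypothetical example class.

```python
import functools


def persistent(method):
    """Sketch: compute once per instance, store on the instance, then reuse."""
    attr_name = f"__{method.__name__}"

    @functools.wraps(method)
    def wrapper(self):
        # hasattr (not a truthiness check) so that None results are kept too.
        if not hasattr(self, attr_name):
            setattr(self, attr_name, method(self))
        return getattr(self, attr_name)

    return wrapper


class Dataset:
    def __init__(self):
        self.loads = 0

    @persistent
    def data(self):
        self.loads += 1
        return [1, 2, 3]
```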
repeat_on_error
Method decorator which calls method again on error.
__init__(retries=10, waiting_time=1, wait_extension=1.0)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
retries | int | how many times to retry the call | 10 |
waiting_time | int | how many seconds to wait before the first retry | 1 |
wait_extension | float | factor by which the waiting time is multiplied after each retry | 1.0 |

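A sketch of the retry logic, assuming `retries` counts attempts after the first call, every `Exception` triggers a retry, and the wait is multiplied by `wait_extension` after each retry (so the default `1.0` keeps it constant). Once retries are exhausted, the last error is re-raised. This is an illustration, not taskchain's code.

```python
import functools
import time


class repeat_on_error:
    """Sketch of a retrying method decorator with multiplicative back-off."""

    def __init__(self, retries: int = 10, waiting_time: float = 1, wait_extension: float = 1.0):
        self.retries = retries
        self.waiting_time = waiting_time
        self.wait_extension = wait_extension

    def __call__(self, method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            wait = self.waiting_time
            for attempt in range(self.retries + 1):
                try:
                    return method(*args, **kwargs)
                except Exception:
                    if attempt == self.retries:
                        raise  # out of retries: propagate the last error
                    time.sleep(wait)
                    wait *= self.wait_extension

        return wrapper
```

For example, `@repeat_on_error(retries=3, waiting_time=1, wait_extension=2.0)` would wait 1, 2 and 4 seconds between the four attempts.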
taskchain.utils.io
write_jsons(jsons, filename, use_tqdm=True, overwrite=True, nan_to_null=True, **kwargs)
Write JSON-like objects to a `.jsonl` file (JSON lines).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
jsons | Iterable | Iterable of json-like objects. | required |
filename | Path \| str |  | required |
use_tqdm | bool | Show progress bar. | True |
overwrite | bool | Overwrite existing file. | True |
nan_to_null | bool | Change nan values to nulls. | True |
**kwargs |  | other arguments to tqdm. | {} |

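JSON lines stores one JSON document per line. The `nan_to_null` option matters because Python's `json.dumps(float("nan"))` emits `NaN`, which is not valid JSON. The sketch below shows the writing side; the names `write_jsonl` and `nan_to_none` are hypothetical, and the progress bar and `overwrite` handling are left out.

```python
import json
import math
from pathlib import Path
from typing import Any, Iterable, Union


def nan_to_none(obj: Any) -> Any:
    """Recursively replace float NaN with None so that json writes null."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {k: nan_to_none(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(v) for v in obj]
    return obj


def write_jsonl(jsons: Iterable[Any], filename: Union[Path, str], nan_to_null: bool = True) -> None:
    """Write one JSON document per line (JSON lines)."""
    with Path(filename).open("w", encoding="utf-8") as f:
        for obj in jsons:
            if nan_to_null:
                obj = nan_to_none(obj)
            f.write(json.dumps(obj) + "\n")
```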
iter_json_file(filename, use_tqdm=True, **kwargs)
Yield JSON objects loaded from a `.jsonl` file (JSON lines).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
filename | Path \| str |  | required |
use_tqdm | bool | Show progress bar. | True |
**kwargs |  | additional arguments to tqdm | {} |

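Reading JSON lines fits naturally into a generator, so a large file is never loaded whole. A minimal sketch with the hypothetical name `iter_jsonl`, without the progress bar; skipping blank lines is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Union


def iter_jsonl(filename: Union[Path, str]) -> Iterator[Any]:
    """Lazily yield one parsed JSON value per non-empty line."""
    with Path(filename).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```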
taskchain.utils.iter
list_or_str_to_list(value)
Helper function for cases where a list of strings is expected but a single string is also accepted.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
value | Union[None, List[str], str] |  | required |

Returns:

Type | Description |
---|---|
List[str] | the original list, or the original string wrapped in a list |

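The helper is typically used to normalise a parameter that accepts either one name or several. A minimal sketch; mapping `None` to an empty list is an assumption, since the description only covers lists and strings.

```python
from typing import List, Union


def list_or_str_to_list(value: Union[None, List[str], str]) -> List[str]:
    """Sketch: wrap a single string in a list and pass lists through unchanged."""
    if value is None:
        return []  # assumption: None means "no values"
    if isinstance(value, str):
        return [value]
    return value  # the original list object, not a copy
```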
taskchain.utils.migration
migrate_to_parameter_mode(config, target_dir, dry=True, verbose=True)
Migrate a chain to parameter mode.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
config | Config | config defining the chain | required |
target_dir |  | dir to migrate data to | required |
dry | bool | show only info, do not copy data | True |
verbose | bool |  | True |

taskchain.utils.threading
parallel_map(fun, iterable, threads=10, sort=True, use_tqdm=True, desc='Running tasks in parallel.', total=None, chunksize=1000)
Apply a function to each item of an iterable using multiple threads.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
fun | Callable | function to apply | required |
iterable | Iterable |  | required |
threads | int | number of threads | 10 |
sort | bool | return values in the same order as iterable | True |
use_tqdm | bool | show progress bar | True |
desc | str | text of progress bar | 'Running tasks in parallel.' |
total | int | size of iterable, used to show a better progress bar | None |

Returns:

Type | Description |
---|---|
list | values returned by fun |

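A minimal sketch of the same idea using the standard library's `ThreadPoolExecutor`. The name `parallel_map_sketch` is hypothetical, and the progress bar and `chunksize` batching are omitted. Threads help mainly with I/O-bound functions, since CPU-bound pure-Python code is limited by the GIL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, List


def parallel_map_sketch(fun: Callable[[Any], Any], iterable: Iterable[Any],
                        threads: int = 10, sort: bool = True) -> List[Any]:
    """Apply fun to every item using a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as executor:
        if sort:
            # executor.map yields results in input order.
            return list(executor.map(fun, iterable))
        futures = [executor.submit(fun, item) for item in iterable]
        # as_completed yields futures as they finish, so order may differ.
        return [future.result() for future in as_completed(futures)]
```

With `sort=False`, results can be consumed in completion order, which avoids waiting on a slow early item when the caller does not need the original order.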
parallel_starmap(fun, iterable, **kwargs)
Allows using `parallel_map` with a function of multiple arguments.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
fun | Callable | function with multiple arguments | required |
iterable | Iterable | lists or tuples of arguments | required |

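`parallel_starmap` relates to `parallel_map` as `itertools.starmap` relates to `map`: each item is unpacked into positional arguments. A sketch with the hypothetical name `parallel_starmap_sketch`, built directly on `ThreadPoolExecutor` rather than on taskchain's `parallel_map`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence


def parallel_starmap_sketch(fun: Callable[..., Any], iterable: Iterable[Sequence[Any]],
                            threads: int = 10) -> List[Any]:
    """Like itertools.starmap, but each call runs in a worker thread."""
    def call_with_unpacked(args: Sequence[Any]) -> Any:
        # Unpack one list/tuple of arguments into positional arguments.
        return fun(*args)

    with ThreadPoolExecutor(max_workers=threads) as executor:
        return list(executor.map(call_with_unpacked, iterable))
```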