docs for muutils v0.8.7

Contents


muutils, stylized as "μutils" or "muutils", is a collection of miscellaneous python utilities, meant to be small and with no dependencies outside of standard python.

installation

PyPI: muutils

pip install muutils

Note that for using mlutils, tensor_utils, nbutils.configure_notebook, or the array serialization features of json_serialize, you will need to install with optional array dependencies:

pip install muutils[array]

documentation

hosted html docs: https://miv.name/muutils

modules

statcounter

an extension of collections.Counter that provides “smart” computation of stats (mean, variance, median, other percentiles) from the counter object without using Counter.elements()
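The core idea can be sketched in plain stdlib Python (this is an illustration of the concept, not the StatCounter API itself): stats can be computed directly from a Counter's (value, count) pairs, without expanding every element via Counter.elements():

```python
from collections import Counter

def counter_mean(c: Counter) -> float:
    # weighted mean over (value, count) pairs -- no need to
    # materialize every element with Counter.elements()
    total = sum(c.values())
    return sum(value * count for value, count in c.items()) / total
```
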

dictmagic

has utilities for working with dictionaries, like:

kappa

Anonymous __getitem__, so you can do things like

>>> k = Kappa(lambda x: x**2)
>>> k[2]
4
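A minimal sketch of the idea behind such a wrapper (names here are illustrative, not the actual implementation):

```python
from typing import Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class KappaSketch:
    # wraps a function so that indexing (obj[key]) calls it --
    # an "anonymous __getitem__"
    def __init__(self, func: Callable[[K], V]) -> None:
        self._func = func

    def __getitem__(self, key: K) -> V:
        return self._func(key)
```
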

sysinfo

utility for getting a bunch of system information. useful for logging.

misc:

contains a few utilities:

- stable_hash() uses hashlib.sha256 to compute a hash of an object that is stable across runs of python
- list_join and list_split, which behave like str.join and str.split but for lists
- sanitize_fname and dict_to_filename, for simplifying the creation of unique filenames
- shorten_numerical_to_str() and str_to_numeric, which turn numbers like 123456789 into "123M" and back
- freeze, which prevents an object from being modified. Also see gelidum
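The motivation behind stable_hash can be shown with a short sketch (not the actual implementation): Python's builtin hash() is salted per process, so a hash that is stable across runs has to go through something like hashlib.sha256:

```python
import hashlib
import json

def stable_hash_sketch(obj) -> int:
    # serialize deterministically, then hash with sha256; unlike the
    # builtin hash(), the result does not change between Python runs
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
```
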

nbutils

contains utilities for working with jupyter notebooks, such as:

json_serialize

a tool for serializing and loading arbitrary python objects into json. plays nicely with ZANJ

tensor_utils

contains minor utilities for working with pytorch tensors and numpy arrays, mostly for making type conversions easier

group_equiv

groups elements from a sequence according to a given equivalence relation, without assuming that the equivalence relation obeys the transitive property

jsonlines

an extremely simple utility for reading/writing jsonl files

ZANJ

is a human-readable and simple format for ML models, datasets, and arbitrary objects. It's built around having a zip file with json and npy files, and has been spun off into its own project.

There are a couple work-in-progress utilities in _wip that aren’t ready for anything, but nothing in this repo is suitable for production. Use at your own risk!

Submodules

View Source on GitHub

muutils


View Source on GitHub


API Documentation

View Source on GitHub

muutils.console_unicode

View Source on GitHub

def get_console_safe_str

(default: str, fallback: str) -> str

View Source on GitHub

Determine a console-safe string based on the preferred encoding.

This function attempts to encode a given default string using the system’s preferred encoding. If encoding is successful, it returns the default string; otherwise, it returns a fallback string.

Parameters:

Returns:

Usage:

>>> get_console_safe_str("café", "cafe")
"café"  # This result may vary based on the system's preferred encoding.
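The logic amounts to a try/except around an encode call, roughly like this sketch:

```python
import locale

def get_console_safe_str_sketch(default: str, fallback: str) -> str:
    # return `default` if it survives the system's preferred encoding,
    # otherwise fall back to the plain-ASCII alternative
    try:
        default.encode(locale.getpreferredencoding())
        return default
    except UnicodeEncodeError:
        return fallback
```
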


Contents

this code is based on an implementation of the Rust builtin dbg! for Python, originally from https://github.com/tylerwince/pydbg/blob/master/pydbg.py although it has been significantly modified

licensed under MIT:

Copyright (c) 2019 Tyler Wince

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

API Documentation

View Source on GitHub

muutils.dbg


View Source on GitHub

def dbg

(
    exp: Union[~_ExpType, muutils.dbg._NoExpPassedSentinel] = <muutils.dbg._NoExpPassedSentinel object>,
    formatter: Optional[Callable[[Any], str]] = None,
    val_joiner: str = ' = '
) -> Union[~_ExpType, muutils.dbg._NoExpPassedSentinel]

View Source on GitHub

Call dbg with any variable or expression.

Calling dbg will print to stderr the current filename and lineno, as well as the passed expression and what the expression evaluates to:

    from muutils.dbg import dbg

    a = 2
    b = 5

    dbg(a+b)

    def square(x: int) -> int:
        return x * x

    dbg(square(a))
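A stripped-down sketch of what dbg does (the real version also recovers the source text of the passed expression):

```python
import inspect
import sys
from typing import TypeVar

T = TypeVar("T")

def dbg_sketch(exp: T) -> T:
    # print the caller's file and line plus the value to stderr, then
    # return the value unchanged so it can be dropped into any expression
    frame = inspect.currentframe().f_back
    print(f"[{frame.f_code.co_filename}:{frame.f_lineno}] {exp!r}", file=sys.stderr)
    return exp
```

Because the value is returned unchanged, `dbg(a+b)` can wrap any subexpression without altering program behavior.
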

def tensor_info

(tensor: Any) -> str

View Source on GitHub


Contents

making working with dictionaries easier

API Documentation

View Source on GitHub

muutils.dictmagic

making working with dictionaries easier

View Source on GitHub

class DefaulterDict(typing.Dict[~_KT, ~_VT], typing.Generic[~_KT, ~_VT]):

View Source on GitHub

like a defaultdict, but default_factory is passed the key as an argument

Inherited Members

def defaultdict_to_dict_recursive

(
    dd: Union[collections.defaultdict, muutils.dictmagic.DefaulterDict]
) -> dict

View Source on GitHub

Convert a defaultdict or DefaulterDict to a normal dict, recursively

def dotlist_to_nested_dict

(dot_dict: Dict[str, Any], sep: str = '.') -> Dict[str, Any]

View Source on GitHub

Convert a dict with dot-separated keys to a nested dict

Example:

>>> dotlist_to_nested_dict({'a.b.c': 1, 'a.b.d': 2, 'a.e': 3})
{'a': {'b': {'c': 1, 'd': 2}, 'e': 3}}
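A sketch of the conversion (illustrative, not the actual implementation):

```python
from typing import Any, Dict

def dotlist_to_nested_dict_sketch(dot_dict: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    # walk each dotted key, creating intermediate dicts as needed
    out: Dict[str, Any] = {}
    for key, value in dot_dict.items():
        *parents, last = key.split(sep)
        node = out
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value
    return out
```
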

def nested_dict_to_dotlist

(
    nested_dict: Dict[str, Any],
    sep: str = '.',
    allow_lists: bool = False
) -> dict[str, typing.Any]

View Source on GitHub

def update_with_nested_dict

(
    original: dict[str, typing.Any],
    update: dict[str, typing.Any]
) -> dict[str, typing.Any]

View Source on GitHub

Update a dict with a nested dict

Example:

>>> update_with_nested_dict({'a': {'b': 1}, 'c': -1}, {'a': {'b': 2}})
{'a': {'b': 2}, 'c': -1}

Arguments

Returns

def kwargs_to_nested_dict

(
    kwargs_dict: dict[str, typing.Any],
    sep: str = '.',
    strip_prefix: Optional[str] = None,
    when_unknown_prefix: Union[muutils.errormode.ErrorMode, str] = ErrorMode.Warn,
    transform_key: Optional[Callable[[str], str]] = None
) -> dict[str, typing.Any]

View Source on GitHub

given kwargs from fire, convert them to a nested dict

if strip_prefix is not None, then all keys must start with the prefix. By default, an unknown prefix triggers a warning; set when_unknown_prefix (an ErrorMode) to raise an error or ignore it instead.

Example:

def main(**kwargs):
    print(kwargs_to_nested_dict(kwargs))
fire.Fire(main)

running the above script will give:

$ python test.py --a.b.c=1 --a.b.d=2 --a.e=3
{'a': {'b': {'c': 1, 'd': 2}, 'e': 3}}

Arguments

def is_numeric_consecutive

(lst: list[str]) -> bool

View Source on GitHub

Check if the list of keys is numeric and consecutive.

def condense_nested_dicts_numeric_keys

(data: dict[str, typing.Any]) -> dict[str, typing.Any]

View Source on GitHub

condense a nested dict, by condensing numeric keys with matching values to ranges

Examples:

>>> condense_nested_dicts_numeric_keys({'1': 1, '2': 1, '3': 1, '4': 2, '5': 2, '6': 2})
{'[1-3]': 1, '[4-6]': 2}
>>> condense_nested_dicts_numeric_keys({'1': {'1': 'a', '2': 'a'}, '2': 'b'})
{'1': {'[1-2]': 'a'}, '2': 'b'}
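The core of the numeric-key condensing can be sketched for a flat dict (the real function also recurses into nested dicts):

```python
from typing import Any, Dict

def condense_numeric_keys_flat(data: Dict[str, Any]) -> Dict[str, Any]:
    # collapse runs of consecutive numeric keys with equal values
    # into a single '[lo-hi]' key
    items = sorted(data.items(), key=lambda kv: int(kv[0]))
    out: Dict[str, Any] = {}
    i = 0
    while i < len(items):
        j = i
        while (
            j + 1 < len(items)
            and items[j + 1][1] == items[i][1]
            and int(items[j + 1][0]) == int(items[j][0]) + 1
        ):
            j += 1
        if j > i:
            out[f"[{items[i][0]}-{items[j][0]}]"] = items[i][1]
        else:
            out[items[i][0]] = items[i][1]
        i = j + 1
    return out
```
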

def condense_nested_dicts_matching_values

(
    data: dict[str, typing.Any],
    val_condense_fallback_mapping: Optional[Callable[[Any], Hashable]] = None
) -> dict[str, typing.Any]

View Source on GitHub

condense a nested dict, by condensing keys with matching values

Examples: TODO

Parameters:

def condense_nested_dicts

(
    data: dict[str, typing.Any],
    condense_numeric_keys: bool = True,
    condense_matching_values: bool = True,
    val_condense_fallback_mapping: Optional[Callable[[Any], Hashable]] = None
) -> dict[str, typing.Any]

View Source on GitHub

condense a nested dict, by condensing numeric or matching keys with matching values to ranges

combines the functionality of condense_nested_dicts_numeric_keys() and condense_nested_dicts_matching_values()

NOTE: this process is not meant to be reversible, and is intended for pretty-printing and visualization purposes

it’s not reversible because types are lost to make the printing pretty

Parameters:

def tuple_dims_replace

(
    t: tuple[int, ...],
    dims_names_map: Optional[dict[int, str]] = None
) -> tuple[typing.Union[int, str], ...]

View Source on GitHub

def condense_tensor_dict

(
    data: 'TensorDict | TensorIterable',
    fmt: Literal['dict', 'json', 'yaml', 'yml'] = 'dict',
    *args,
    shapes_convert: Callable[[tuple], Any] = <function _default_shapes_convert>,
    drop_batch_dims: int = 0,
    sep: str = '.',
    dims_names_map: Optional[dict[int, str]] = None,
    condense_numeric_keys: bool = True,
    condense_matching_values: bool = True,
    val_condense_fallback_mapping: Optional[Callable[[Any], Hashable]] = None,
    return_format: Optional[Literal['dict', 'json', 'yaml', 'yml']] = None
) -> Union[str, dict[str, str | tuple[int, ...]]]

View Source on GitHub

Convert a dictionary of tensors to a dictionary of shapes.

by default, values are converted to strings of their shapes (for nice printing). If you want the actual shapes, set shapes_convert = lambda x: x or shapes_convert = None.

Parameters:

Returns:

Examples:

>>> model = transformer_lens.HookedTransformer.from_pretrained("gpt2")
>>> print(condense_tensor_dict(model.named_parameters(), return_format='yaml'))
embed:
  W_E: (50257, 768)
pos_embed:
  W_pos: (1024, 768)
blocks:
  '[0-11]':
    attn:
      '[W_Q, W_K, W_V]': (12, 768, 64)
      W_O: (12, 64, 768)
      '[b_Q, b_K, b_V]': (12, 64)
      b_O: (768,)
    mlp:
      W_in: (768, 3072)
      b_in: (3072,)
      W_out: (3072, 768)
      b_out: (768,)
unembed:
  W_U: (768, 50257)
  b_U: (50257,)

Raises:


Contents

provides ErrorMode enum for handling errors consistently

pass an error_mode: ErrorMode to a function to specify how to handle a certain kind of exception. That function then, instead of raising an exception or calling warnings.warn directly, calls error_mode.process with the message and the exception.

you can also specify the exception class to raise, the warning class to use, and the source of the exception/warning.
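The dispatch that process performs amounts to something like this sketch (mode names simplified; not the actual enum):

```python
import warnings

def process_sketch(
    mode: str,
    msg: str,
    except_cls: type = ValueError,
    warn_cls: type = UserWarning,
) -> None:
    # raise, warn, or ignore, depending on the selected mode
    if mode == "except":
        raise except_cls(msg)
    elif mode == "warn":
        warnings.warn(msg, warn_cls)
    elif mode == "ignore":
        pass
    else:
        raise KeyError(f"unknown error mode: {mode!r}")
```
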

API Documentation

View Source on GitHub

muutils.errormode


View Source on GitHub

class WarningFunc(typing.Protocol):

View Source on GitHub

Base class for protocol classes.

Protocol classes are defined as::

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).

For example::

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as::

class GenProto[T](Protocol):
    def meth(self) -> T:
        ...

WarningFunc

(*args, **kwargs)

View Source on GitHub

def GLOBAL_WARN_FUNC

(unknown)

Issue a warning, or maybe ignore it or raise an exception.

message: Text of the warning message.
category: The Warning category subclass. Defaults to UserWarning.
stacklevel: How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
source: If supplied, the destroyed object which emitted a ResourceWarning.
skip_file_prefixes: An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.

def GLOBAL_LOG_FUNC

(*args, sep=' ', end='\n', file=None, flush=False)

Prints the values to a stream, or to sys.stdout by default.

sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
file: a file-like object (stream); defaults to the current sys.stdout.
flush: whether to forcibly flush the stream.

def custom_showwarning

(
    message: Warning | str,
    category: Optional[Type[Warning]] = None,
    filename: str | None = None,
    lineno: int | None = None,
    file: Optional[TextIO] = None,
    line: Optional[str] = None
) -> None

View Source on GitHub

class ErrorMode(enum.Enum):

View Source on GitHub

Enum for handling errors consistently

pass one of the instances of this enum to a function to specify how to handle a certain kind of exception.

That function then, instead of raising an exception or calling warnings.warn directly, calls error_mode.process with the message and the exception.

def process

(
    self,
    msg: str,
    except_cls: Type[Exception] = <class 'ValueError'>,
    warn_cls: Type[Warning] = <class 'UserWarning'>,
    except_from: Optional[Exception] = None,
    warn_func: muutils.errormode.WarningFunc | None = None,
    log_func: Optional[Callable[[str], NoneType]] = None
)

View Source on GitHub

process an exception or warning according to the error mode

Parameters:

Raises:

def from_any

(
    cls,
    mode: str | muutils.errormode.ErrorMode,
    allow_aliases: bool = True,
    allow_prefix: bool = True
) -> muutils.errormode.ErrorMode

View Source on GitHub

initialize an ErrorMode from a string or an ErrorMode instance

def serialize

(self) -> str

View Source on GitHub

def load

(cls, data: str) -> muutils.errormode.ErrorMode

View Source on GitHub

Inherited Members

map of string aliases to ErrorMode instances


Contents

group items by assuming that eq_func defines an equivalence relation

API Documentation

View Source on GitHub

muutils.group_equiv


View Source on GitHub

def group_by_equivalence

(
    items_in: Sequence[~T],
    eq_func: Callable[[~T, ~T], bool]
) -> list[list[~T]]

View Source on GitHub

group items by assuming that eq_func implies an equivalence relation but might not be transitive

so, if f(a,b) and f(b,c) then f(a,c) might be false, but we still want to put [a,b,c] in the same class

note that lists are used to avoid the need for hashable items, and to allow for duplicates
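A sketch of the merging strategy: each new item is fused with every existing group it matches, so transitivity of eq_func is never assumed:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def group_by_equivalence_sketch(
    items: Sequence[T],
    eq_func: Callable[[T, T], bool],
) -> List[List[T]]:
    groups: List[List[T]] = []
    for item in items:
        # merge every group containing something equivalent to `item`
        merged: List[T] = [item]
        rest: List[List[T]] = []
        for group in groups:
            if any(eq_func(item, other) for other in group):
                merged.extend(group)
            else:
                rest.append(group)
        rest.append(merged)
        groups = rest
    return groups
```

With eq_func = lambda a, b: abs(a - b) <= 1, the items [1, 2, 3, 10] collapse to the groups [1, 2, 3] and [10], even though 1 and 3 are not directly equivalent.
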

Arguments


Contents

represents a mathematical Interval over the real numbers

API Documentation

View Source on GitHub

muutils.interval


View Source on GitHub

class Interval:

View Source on GitHub

Represents a mathematical interval, open by default.

The Interval class can represent both open and closed intervals, as well as half-open intervals. It supports various initialization methods and provides containment checks.

Examples:

>>> i1 = Interval(1, 5)  # Default open interval (1, 5)
>>> 3 in i1
True
>>> 1 in i1
False
>>> i2 = Interval([1, 5])  # Closed interval [1, 5]
>>> 1 in i2
True
>>> i3 = Interval(1, 5, closed_L=True)  # Half-open interval [1, 5)
>>> str(i3)
'[1, 5)'
>>> i4 = ClosedInterval(1, 5)  # Closed interval [1, 5]
>>> i5 = OpenInterval(1, 5)  # Open interval (1, 5)

Interval

(
    *args: Union[Sequence[Union[float, int]], float, int],
    is_closed: Optional[bool] = None,
    closed_L: Optional[bool] = None,
    closed_R: Optional[bool] = None
)

View Source on GitHub


def get_empty

() -> muutils.interval.Interval

View Source on GitHub

def get_singleton

(value: Union[float, int]) -> muutils.interval.Interval

View Source on GitHub

def numerical_contained

(self, item: Union[float, int]) -> bool

View Source on GitHub

def interval_contained

(self, item: muutils.interval.Interval) -> bool

View Source on GitHub

def from_str

(cls, input_str: str) -> muutils.interval.Interval

View Source on GitHub

def copy

(self) -> muutils.interval.Interval

View Source on GitHub

def size

(self) -> float

View Source on GitHub

Returns the size of the interval.

Returns:

def clamp

(self, value: Union[int, float], epsilon: float = 1e-10) -> float

View Source on GitHub

Clamp the given value to the interval bounds.

For open bounds, the clamped value will be slightly inside the interval (by epsilon).
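A sketch of that behavior, assuming plain numeric bounds lo and hi (the actual method works on the Interval's own bounds):

```python
def clamp_sketch(
    value: float,
    lo: float,
    hi: float,
    closed_L: bool = False,
    closed_R: bool = False,
    epsilon: float = 1e-10,
) -> float:
    # for an open bound, land epsilon inside the interval rather than
    # exactly on the excluded endpoint
    if value < lo or (value == lo and not closed_L):
        return lo if closed_L else lo + epsilon
    if value > hi or (value == hi and not closed_R):
        return hi if closed_R else hi - epsilon
    return float(value)
```
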

Parameters:

Returns:

Raises:

def intersection

(self, other: muutils.interval.Interval) -> muutils.interval.Interval

View Source on GitHub

def union

(self, other: muutils.interval.Interval) -> muutils.interval.Interval

View Source on GitHub

class ClosedInterval(Interval):

View Source on GitHub

A fully closed interval [a, b]: both endpoints are contained. See Interval above for initialization options and examples.

ClosedInterval

(*args: Union[Sequence[float], float], **kwargs: Any)

View Source on GitHub

Inherited Members

class OpenInterval(Interval):

View Source on GitHub

A fully open interval (a, b): neither endpoint is contained. See Interval above for initialization options and examples.

OpenInterval

(*args: Union[Sequence[float], float], **kwargs: Any)

View Source on GitHub

Inherited Members


Contents

submodule for serializing things to json in a recoverable way

you can throw any object into muutils.json_serialize.json_serialize and it will return a JSONitem, meaning a bool, int, float, str, None, list of JSONitems, or a dict mapping str to JSONitem.

The goal: if you just want to store something as relatively human-readable JSON and don't care much about recovering it, you can throw it into json_serialize and it will just work. If you want to do so in a recoverable way, check out ZANJ.

it will do so by looking in DEFAULT_HANDLERS, which will keep the object as-is if it's already valid, then try to find a .serialize() method on the object, and then fall back to a number of special cases. You can add handlers by initializing a JsonSerializer object and passing a sequence of them to handlers_pre

additionally, SerializableDataclass is a special kind of dataclass where you specify how to serialize each field, and a .serialize() method is automatically added to the class. This is done by using the serializable_dataclass decorator, inheriting from SerializableDataclass, and using serializable_field in place of dataclasses.field when defining non-standard fields.

This module plays nicely with and is a dependency of the ZANJ library, which extends this to support saving things to disk in a more efficient way than just plain json (arrays are saved as npy files, for example), and automatically detecting how to load saved objects into their original classes.
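The handler chain described above can be sketched like this miniature (illustrative only, not the actual DEFAULT_HANDLERS):

```python
import dataclasses
from pathlib import Path
from typing import Any

def json_serialize_sketch(obj: Any) -> Any:
    # pass through already-valid JSON values
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # prefer an object's own .serialize() method
    if hasattr(obj, "serialize") and callable(obj.serialize):
        return json_serialize_sketch(obj.serialize())
    # special cases: containers, dataclasses, paths
    if isinstance(obj, dict):
        return {str(k): json_serialize_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [json_serialize_sketch(x) for x in obj]
    if dataclasses.is_dataclass(obj):
        return json_serialize_sketch(dataclasses.asdict(obj))
    if isinstance(obj, Path):
        return obj.as_posix()
    # fallback: stringify
    return str(obj)
```
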

Submodules

API Documentation

View Source on GitHub

muutils.json_serialize


View Source on GitHub

def json_serialize

(
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = ()
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

View Source on GitHub

serialize object to json-serializable object with default config

def serializable_dataclass

(
    _cls=None,
    *,
    init: bool = True,
    repr: bool = True,
    eq: bool = True,
    order: bool = False,
    unsafe_hash: bool = False,
    frozen: bool = False,
    properties_to_serialize: Optional[list[str]] = None,
    register_handler: bool = True,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except,
    on_typecheck_mismatch: muutils.errormode.ErrorMode = ErrorMode.Warn,
    methods_no_override: list[str] | None = None,
    **kwargs
)

View Source on GitHub

decorator to make a dataclass serializable. must also make it inherit from SerializableDataclass!!

types will be validated (like pydantic) unless on_typecheck_mismatch is set to ErrorMode.IGNORE

behavior of most kwargs matches that of dataclasses.dataclass, but with some additional kwargs. any kwargs not listed here are passed to dataclasses.dataclass

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method function is added. If frozen is true, fields may not be assigned to after instance creation.

@serializable_dataclass(kw_only=True)
class Myclass(SerializableDataclass):
    a: int
    b: str
>>> Myclass(a=1, b="q").serialize()
{_FORMAT_KEY: 'Myclass(SerializableDataclass)', 'a': 1, 'b': 'q'}

Parameters:

Returns:

Raises:

def serializable_field

(
    *_args,
    default: Union[Any, dataclasses._MISSING_TYPE] = <dataclasses._MISSING_TYPE object>,
    default_factory: Union[Any, dataclasses._MISSING_TYPE] = <dataclasses._MISSING_TYPE object>,
    init: bool = True,
    repr: bool = True,
    hash: Optional[bool] = None,
    compare: bool = True,
    metadata: Optional[mappingproxy] = None,
    kw_only: Union[bool, dataclasses._MISSING_TYPE] = <dataclasses._MISSING_TYPE object>,
    serialize: bool = True,
    serialization_fn: Optional[Callable[[Any], Any]] = None,
    deserialize_fn: Optional[Callable[[Any], Any]] = None,
    assert_type: bool = True,
    custom_typecheck_fn: Optional[Callable[[type], bool]] = None,
    **kwargs: Any
) -> Any

View Source on GitHub

Create a new SerializableField

default: Sfield_T | dataclasses._MISSING_TYPE = dataclasses.MISSING,
default_factory: Callable[[], Sfield_T]
| dataclasses._MISSING_TYPE = dataclasses.MISSING,
init: bool = True,
repr: bool = True,
hash: Optional[bool] = None,
compare: bool = True,
metadata: types.MappingProxyType | None = None,
kw_only: bool | dataclasses._MISSING_TYPE = dataclasses.MISSING,
### ----------------------------------------------------------------------
### new in `SerializableField`, not in `dataclasses.Field`
serialize: bool = True,
serialization_fn: Optional[Callable[[Any], Any]] = None,
loading_fn: Optional[Callable[[Any], Any]] = None,
deserialize_fn: Optional[Callable[[Any], Any]] = None,
assert_type: bool = True,
custom_typecheck_fn: Optional[Callable[[type], bool]] = None,

new Parameters:

Gotchas:

class MyClass:
    my_field: int = serializable_field(
        serialization_fn=lambda x: str(x),
        loading_fn=lambda x: int(x["my_field"])
    )

using deserialize_fn instead:

class MyClass:
    my_field: int = serializable_field(
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: int(x)
    )

In the above code, my_field is an int but will be serialized as a string.

note that if not using ZANJ, and you have a class inside a container, you MUST provide serialization_fn and loading_fn to serialize and load the container. ZANJ will automatically do this for you.

TODO: custom_value_check_fn: function taking the value of the field and returning whether the value itself is valid. if not provided, any value is valid as long as it passes the type test

def arr_metadata

(arr) -> dict[str, list[int] | str | int]

View Source on GitHub

get metadata for a numpy array

def load_array

(
    arr: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    array_mode: Optional[Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim']] = None
) -> Any

View Source on GitHub

load a json-serialized array, infer the mode if not specified

class JsonSerializer:

View Source on GitHub

Json serialization class (holds configs)

Parameters:

Raises:

JsonSerializer

(
    *args,
    array_mode: Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim'] = 'array_list_meta',
    error_mode: muutils.errormode.ErrorMode = ErrorMode.Except,
    handlers_pre: None = (),
    handlers_default: None = (SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='base types', desc='base types (bool, int, float, str, None)'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='dictionaries', desc='dictionaries'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='(list, tuple) -> list', desc='lists and tuples as lists'), SerializerHandler(check=<function <lambda>>, serialize_func=<function _serialize_override_serialize_func>, uid='.serialize override', desc='objects with .serialize method'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='namedtuple -> dict', desc='namedtuples as dicts'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='dataclass -> dict', desc='dataclasses as dicts'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='path -> str', desc='Path objects as posix strings'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='obj -> str(obj)', desc='directly serialize objects in `SERIALIZE_DIRECT_AS_STR` to strings'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='numpy.ndarray', desc='numpy arrays'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='torch.Tensor', desc='pytorch tensors'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='pandas.DataFrame', desc='pandas DataFrames'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='(set, list, tuple, Iterable) -> list', desc='sets, lists, tuples, and Iterables as lists'), SerializerHandler(check=<function <lambda>>, serialize_func=<function <lambda>>, uid='fallback', desc='fallback handler -- serialize object attributes and special functions as strings')),
    write_only_format: bool = False
)

View Source on GitHub

def json_serialize

(
    self,
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = ()
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

View Source on GitHub

def hashify

(
    self,
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = (),
    force: bool = True
) -> Union[bool, int, float, str, tuple]

View Source on GitHub

try to turn any object into something hashable
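The approach can be sketched in plain Python (`hashify_sketch` is a hypothetical, simplified stand-in for the real implementation, which also routes through the serializer's handler chain):

```python
def hashify_sketch(obj):
    """Simplified stand-in for JsonSerializer.hashify: make obj hashable."""
    try:
        hash(obj)
        return obj  # already hashable (int, str, tuple of hashables, ...)
    except TypeError:
        pass
    if isinstance(obj, dict):
        # sort by key so equal dicts hashify to equal tuples
        return tuple(sorted((k, hashify_sketch(v)) for k, v in obj.items()))
    if isinstance(obj, (list, tuple, set)):
        return tuple(hashify_sketch(x) for x in obj)
    return str(obj)  # force=True behavior: fall back to the string repr

print(hashify_sketch({"a": [1, 2]}))  # → (('a', (1, 2)),)
```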

def try_catch

(func: Callable)

View Source on GitHub

wraps the function to catch exceptions, returns serialized error message on exception

returned func will return normal result on success, or error message on exception
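The behavior described can be sketched as a small decorator (an illustrative re-implementation; the real one returns a serialized error structure rather than a bare string):

```python
import functools

def try_catch_sketch(func):
    """Wrap func so exceptions become an error-message string instead of propagating."""
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            return f"{e.__class__.__name__}: {e}"
    return wrapped

@try_catch_sketch
def divide(a, b):
    return a / b

print(divide(6, 2))  # 3.0
print(divide(1, 0))  # ZeroDivisionError: division by zero
```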

def dc_eq

(
    dc1,
    dc2,
    except_when_class_mismatch: bool = False,
    false_when_class_mismatch: bool = True,
    except_when_field_mismatch: bool = False
) -> bool

View Source on GitHub

checks if two dataclasses which (might) hold numpy arrays are equal

Parameters:

Returns:

Raises:

TODO: after “except when class mismatch” is False, shouldn’t we then go to “field keys match”?

          [START]
             ▼
       ┌───────────┐  ┌─────────┐
       │dc1 is dc2?├─►│ classes │
       └──┬────────┘No│ match?  │
  ────    │           ├─────────┤
 (True)◄──┘Yes        │No       │Yes
  ────                ▼         ▼
      ┌────────────────┐ ┌────────────┐
      │ except when    │ │ field keys │
      │ class mismatch?│ │ match?     │
      ├───────────┬────┘ ├───────┬────┘
      │Yes        │No    │No     │Yes
      ▼           ▼      ▼       ▼
 ───────────  ┌──────────┐  ┌────────┐
{ raise     } │ except   │  │ field  │
{ TypeError } │ when     │  │ values │
 ───────────  │ field    │  │ match? │
              │ mismatch?│  ├────┬───┘
              ├───────┬──┘  │    │Yes
              │Yes    │No   │No  ▼
              ▼       ▼     │   ────
 ───────────────     ─────  │  (True)
{ raise         }   (False)◄┘   ────
{ AttributeError}    ─────
 ───────────────
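The decision flow above can be sketched for plain dataclasses (a simplified, hypothetical stand-in that skips the numpy/torch-aware value comparison and follows the TODO's reading of checking field keys after a class mismatch):

```python
import dataclasses

def dc_eq_sketch(
    dc1, dc2,
    except_when_class_mismatch: bool = False,
    false_when_class_mismatch: bool = True,
    except_when_field_mismatch: bool = False,
) -> bool:
    if dc1 is dc2:
        return True
    if dc1.__class__ is not dc2.__class__:
        if except_when_class_mismatch:
            raise TypeError(f"class mismatch: {type(dc1)} != {type(dc2)}")
        fields1 = {f.name for f in dataclasses.fields(dc1)}
        fields2 = {f.name for f in dataclasses.fields(dc2)}
        if fields1 != fields2:
            if except_when_field_mismatch:
                raise AttributeError(f"field mismatch: {fields1} != {fields2}")
            return False
        if false_when_class_mismatch:
            return False
    # classes (or at least field keys) match: compare field values
    return all(
        getattr(dc1, f.name) == getattr(dc2, f.name)
        for f in dataclasses.fields(dc1)
    )

@dataclasses.dataclass
class Point:
    x: int
    y: int

assert dc_eq_sketch(Point(1, 2), Point(1, 2))
assert not dc_eq_sketch(Point(1, 2), Point(1, 3))
```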

class SerializableDataclass(abc.ABC):

View Source on GitHub

Base class for serializable dataclasses

only for linting and type checking, still need to call serializable_dataclass decorator

Usage:

@serializable_dataclass
class MyClass(SerializableDataclass):
    a: int
    b: str

and then you can call my_obj.serialize() to get a dict that can be serialized to json. So, you can do:

>>> my_obj = MyClass(a=1, b="q")
>>> s = json.dumps(my_obj.serialize())
>>> s
'{_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}'
>>> read_obj = MyClass.load(json.loads(s))
>>> read_obj == my_obj
True

This isn’t too impressive on its own, but it gets more useful when you have nested classes, or fields that are not json-serializable by default:

@serializable_dataclass
class NestedClass(SerializableDataclass):
    x: str
    y: MyClass
    act_fun: torch.nn.Module = serializable_field(
        default=torch.nn.ReLU(),
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: getattr(torch.nn, x)(),
    )

which gives us:

>>> nc = NestedClass(x="q", y=MyClass(a=1, b="q"), act_fun=torch.nn.Sigmoid())
>>> s = json.dumps(nc.serialize())
>>> s
'{_FORMAT_KEY: "NestedClass(SerializableDataclass)", "x": "q", "y": {_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}, "act_fun": "Sigmoid"}'
>>> read_nc = NestedClass.load(json.loads(s))
>>> read_nc == nc
True

def serialize

(self) -> dict[str, typing.Any]

View Source on GitHub

returns the class as a dict, implemented by using @serializable_dataclass decorator

def load

(cls: Type[~T], data: Union[dict[str, Any], ~T]) -> ~T

View Source on GitHub

takes in an appropriately structured dict and returns an instance of the class, implemented by using @serializable_dataclass decorator

def validate_fields_types

(
    self,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

validate the types of all the fields on a SerializableDataclass. calls SerializableDataclass__validate_field_type for each field

def validate_field_type

(
    self,
    field: muutils.json_serialize.serializable_field.SerializableField | str,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

given a dataclass, check the field matches the type hint

def diff

(
    self,
    other: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    of_serialized: bool = False
) -> dict[str, typing.Any]

View Source on GitHub

get a rich and recursive diff between two instances of a serializable dataclass

>>> Myclass(a=1, b=2).diff(Myclass(a=1, b=3))
{'b': {'self': 2, 'other': 3}}
>>> NestedClass(x="q1", y=Myclass(a=1, b=2)).diff(NestedClass(x="q2", y=Myclass(a=1, b=3)))
{'x': {'self': 'q1', 'other': 'q2'}, 'y': {'b': {'self': 2, 'other': 3}}}

Parameters:

Returns:

Raises:

def update_from_nested_dict

(self, nested_dict: dict[str, typing.Any])

View Source on GitHub

update the instance from a nested dict, useful for configuration from command line args

Parameters:

- `nested_dict : dict[str, Any]`
    nested dict to update the instance with
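The nested-update behavior can be sketched as a recursive helper (`nested_update_sketch` is an illustrative stand-in operating on plain dicts rather than dataclass instances):

```python
def nested_update_sketch(target: dict, updates: dict) -> dict:
    """Recursively merge `updates` into `target`, descending into sub-dicts."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            nested_update_sketch(target[key], value)
        else:
            target[key] = value
    return target

cfg = {"model": {"layers": 4, "width": 128}, "lr": 1e-3}
nested_update_sketch(cfg, {"model": {"width": 256}})
print(cfg)  # → {'model': {'layers': 4, 'width': 256}, 'lr': 0.001}
```

This is why it is handy for command-line overrides: only the leaves you mention change, and sibling keys are left intact.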

docs for muutils v0.8.7

Contents

this utilities module handles serialization and loading of numpy and torch arrays as json

API Documentation

View Source on GitHub

muutils.json_serialize.array

this utilities module handles serialization and loading of numpy and torch arrays as json

View Source on GitHub

def array_n_elements

(arr) -> int

View Source on GitHub

get the number of elements in an array

def arr_metadata

(arr) -> dict[str, list[int] | str | int]

View Source on GitHub

get metadata for a numpy array

def serialize_array

(
    jser: "'JsonSerializer'",
    arr: numpy.ndarray,
    path: Union[str, Sequence[str | int]],
    array_mode: Optional[Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim']] = None
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

View Source on GitHub

serialize a numpy or pytorch array in one of several modes

if the object is zero-dimensional, simply get the unique item

array_mode: ArrayMode can be one of:

- `list`: serialize as a list of values, no metadata (equivalent to `arr.tolist()`)
- `array_list_meta`: serialize dict with metadata, actual list under the key `data`
- `array_hex_meta`: serialize dict with metadata, actual hex string under the key `data`
- `array_b64_meta`: serialize dict with metadata, actual base64 string under the key `data`

for array_list_meta, array_hex_meta, and array_b64_meta, the serialized object is:

{
    _FORMAT_KEY: <array_list_meta|array_hex_meta>,
    "shape": arr.shape,
    "dtype": str(arr.dtype),
    "data": <arr.tolist()|arr.tobytes().hex()|base64.b64encode(arr.tobytes()).decode()>,
}
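The difference between the `data` encodings can be shown with the stdlib alone, using `array.array` as a stand-in for a numpy array (illustrative only; the real serializer also records `shape` and `dtype` as above):

```python
import array
import base64

arr = array.array("f", [1.0, 2.0, 3.0])  # stand-in for a float32 numpy array
raw = arr.tobytes()

list_data = arr.tolist()                   # array_list_meta "data"
hex_data = raw.hex()                       # array_hex_meta "data"
b64_data = base64.b64encode(raw).decode()  # array_b64_meta "data"

# both binary encodings recover the identical bytes
assert bytes.fromhex(hex_data) == raw
assert base64.b64decode(b64_data) == raw
# base64 is ~2/3 the size of hex for the same payload
print(len(hex_data), len(b64_data))  # 24 16
```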

Parameters:

Returns:

Raises:

def infer_array_mode

(
    arr: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]
) -> Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim']

View Source on GitHub

given a serialized array, infer the mode

assumes the array was serialized via serialize_array()

def load_array

(
    arr: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    array_mode: Optional[Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim']] = None
) -> Any

View Source on GitHub

load a json-serialized array, infer the mode if not specified


Contents

provides the basic framework for json serialization of objects

notably:

API Documentation

View Source on GitHub

muutils.json_serialize.json_serialize

provides the basic framework for json serialization of objects

notably:

View Source on GitHub

class SerializerHandler:

View Source on GitHub

a handler for a specific type of object

Parameters:

- `check : Callable[[JsonSerializer, Any], bool]` takes a JsonSerializer and an object, returns whether to use this handler
- `serialize : Callable[[JsonSerializer, Any, ObjectPath], JSONitem]` takes a JsonSerializer, an object, and the current path, returns the serialized object
- `desc : str` description of the handler (optional)
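The handler mechanism amounts to a first-match dispatch over (check, serialize) pairs. A minimal, hypothetical stand-in (not the actual JsonSerializer, which also threads the serializer instance and object path through each call):

```python
# minimal first-match handler chain, mimicking SerializerHandler dispatch
handlers = [
    (lambda obj: isinstance(obj, (bool, int, float, str, type(None))),
     lambda obj: obj),
    (lambda obj: isinstance(obj, dict),
     lambda obj: {str(k): serialize_sketch(v) for k, v in obj.items()}),
    (lambda obj: isinstance(obj, (list, tuple, set)),
     lambda obj: [serialize_sketch(x) for x in obj]),
    (lambda obj: True,  # fallback: stringify anything else
     lambda obj: str(obj)),
]

def serialize_sketch(obj):
    """Return the result of the first handler whose check accepts obj."""
    for check, serialize_func in handlers:
        if check(obj):
            return serialize_func(obj)

print(serialize_sketch({"a": (1, 2), "b": None}))  # → {'a': [1, 2], 'b': None}
```

Because dispatch is first-match, `handlers_pre` entries (checked before the defaults) can override any default behavior.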

SerializerHandler

(
    check: Callable[[muutils.json_serialize.json_serialize.JsonSerializer, Any, tuple[Union[str, int], ...]], bool],
    serialize_func: Callable[[muutils.json_serialize.json_serialize.JsonSerializer, Any, tuple[Union[str, int], ...]], Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]],
    uid: str,
    desc: str
)

def serialize

(self) -> dict

View Source on GitHub

serialize the handler info

class JsonSerializer:

View Source on GitHub

Json serialization class (holds configs)

Parameters:

Raises:

JsonSerializer

(
    *args,
    array_mode: Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim'] = 'array_list_meta',
    error_mode: muutils.errormode.ErrorMode = ErrorMode.Except,
    handlers_pre: None = (),
    handlers_default: None = (
        SerializerHandler(uid='base types', desc='base types (bool, int, float, str, None)'),
        SerializerHandler(uid='dictionaries', desc='dictionaries'),
        SerializerHandler(uid='(list, tuple) -> list', desc='lists and tuples as lists'),
        SerializerHandler(uid='.serialize override', desc='objects with .serialize method'),
        SerializerHandler(uid='namedtuple -> dict', desc='namedtuples as dicts'),
        SerializerHandler(uid='dataclass -> dict', desc='dataclasses as dicts'),
        SerializerHandler(uid='path -> str', desc='Path objects as posix strings'),
        SerializerHandler(uid='obj -> str(obj)', desc='directly serialize objects in `SERIALIZE_DIRECT_AS_STR` to strings'),
        SerializerHandler(uid='numpy.ndarray', desc='numpy arrays'),
        SerializerHandler(uid='torch.Tensor', desc='pytorch tensors'),
        SerializerHandler(uid='pandas.DataFrame', desc='pandas DataFrames'),
        SerializerHandler(uid='(set, list, tuple, Iterable) -> list', desc='sets, lists, tuples, and Iterables as lists'),
        SerializerHandler(uid='fallback', desc='fallback handler -- serialize object attributes and special functions as strings'),
    ),
    write_only_format: bool = False
)

View Source on GitHub

def json_serialize

(
    self,
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = ()
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

View Source on GitHub

def hashify

(
    self,
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = (),
    force: bool = True
) -> Union[bool, int, float, str, tuple]

View Source on GitHub

try to turn any object into something hashable

def json_serialize

(
    obj: Any,
    path: tuple[typing.Union[str, int], ...] = ()
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

View Source on GitHub

serialize object to json-serializable object with default config


Contents

save and load objects to and from json or compatible formats in a recoverable way

d = dataclasses.asdict(my_obj) will give you a dict, but if some fields are not json-serializable, you will get an error when you call json.dumps(d). This module provides a way around that.

Instead, you define your class:

@serializable_dataclass
class MyClass(SerializableDataclass):
    a: int
    b: str

and then you can call my_obj.serialize() to get a dict that can be serialized to json. So, you can do:

>>> my_obj = MyClass(a=1, b="q")
>>> s = json.dumps(my_obj.serialize())
>>> s
'{_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}'
>>> read_obj = MyClass.load(json.loads(s))
>>> read_obj == my_obj
True

This isn’t too impressive on its own, but it gets more useful when you have nested classes, or fields that are not json-serializable by default:

@serializable_dataclass
class NestedClass(SerializableDataclass):
    x: str
    y: MyClass
    act_fun: torch.nn.Module = serializable_field(
        default=torch.nn.ReLU(),
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: getattr(torch.nn, x)(),
    )

which gives us:

>>> nc = NestedClass(x="q", y=MyClass(a=1, b="q"), act_fun=torch.nn.Sigmoid())
>>> s = json.dumps(nc.serialize())
>>> s
'{_FORMAT_KEY: "NestedClass(SerializableDataclass)", "x": "q", "y": {_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}, "act_fun": "Sigmoid"}'
>>> read_nc = NestedClass.load(json.loads(s))
>>> read_nc == nc
True

API Documentation

View Source on GitHub

muutils.json_serialize.serializable_dataclass

save and load objects to and from json or compatible formats in a recoverable way

d = dataclasses.asdict(my_obj) will give you a dict, but if some fields are not json-serializable, you will get an error when you call json.dumps(d). This module provides a way around that.

Instead, you define your class:

@serializable_dataclass
class MyClass(SerializableDataclass):
    a: int
    b: str

and then you can call my_obj.serialize() to get a dict that can be serialized to json. So, you can do:

>>> my_obj = MyClass(a=1, b="q")
>>> s = json.dumps(my_obj.serialize())
>>> s
'{_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}'
>>> read_obj = MyClass.load(json.loads(s))
>>> read_obj == my_obj
True

This isn’t too impressive on its own, but it gets more useful when you have nested classes, or fields that are not json-serializable by default:

@serializable_dataclass
class NestedClass(SerializableDataclass):
    x: str
    y: MyClass
    act_fun: torch.nn.Module = serializable_field(
        default=torch.nn.ReLU(),
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: getattr(torch.nn, x)(),
    )

which gives us:

>>> nc = NestedClass(x="q", y=MyClass(a=1, b="q"), act_fun=torch.nn.Sigmoid())
>>> s = json.dumps(nc.serialize())
>>> s
'{_FORMAT_KEY: "NestedClass(SerializableDataclass)", "x": "q", "y": {_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}, "act_fun": "Sigmoid"}'
>>> read_nc = NestedClass.load(json.loads(s))
>>> read_nc == nc
True

View Source on GitHub

class CantGetTypeHintsWarning(builtins.UserWarning):

View Source on GitHub

special warning for when we can’t get type hints

Inherited Members

class ZanjMissingWarning(builtins.UserWarning):

View Source on GitHub

special warning for when ZANJ is missing – register_loader_serializable_dataclass will not work

Inherited Members

def zanj_register_loader_serializable_dataclass

(cls: Type[~T])

View Source on GitHub

Register a serializable dataclass with the ZANJ import

this allows ZANJ().read() to load the class and not just return plain dicts

TODO: there is some duplication here with register_loader_handler

class FieldIsNotInitOrSerializeWarning(builtins.UserWarning):

View Source on GitHub

warning for when a field is neither `init` nor `serialize`

Inherited Members

def SerializableDataclass__validate_field_type

(
    self: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    field: muutils.json_serialize.serializable_field.SerializableField | str,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

given a dataclass, check the field matches the type hint

this function is used as <a href="#SerializableDataclass.validate_field_type">SerializableDataclass.validate_field_type</a>

Parameters:

Returns:

def SerializableDataclass__validate_fields_types__dict

(
    self: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> dict[str, bool]

View Source on GitHub

validate the types of all the fields on a SerializableDataclass. calls SerializableDataclass__validate_field_type for each field

returns a dict of field names to bools, where the bool is if the field type is valid

def SerializableDataclass__validate_fields_types

(
    self: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

validate the types of all the fields on a SerializableDataclass. calls SerializableDataclass__validate_field_type for each field

class SerializableDataclass(abc.ABC):

View Source on GitHub

Base class for serializable dataclasses

only for linting and type checking, still need to call serializable_dataclass decorator

Usage:

@serializable_dataclass
class MyClass(SerializableDataclass):
    a: int
    b: str

and then you can call my_obj.serialize() to get a dict that can be serialized to json. So, you can do:

>>> my_obj = MyClass(a=1, b="q")
>>> s = json.dumps(my_obj.serialize())
>>> s
'{_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}'
>>> read_obj = MyClass.load(json.loads(s))
>>> read_obj == my_obj
True

This isn’t too impressive on its own, but it gets more useful when you have nested classes, or fields that are not json-serializable by default:

@serializable_dataclass
class NestedClass(SerializableDataclass):
    x: str
    y: MyClass
    act_fun: torch.nn.Module = serializable_field(
        default=torch.nn.ReLU(),
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: getattr(torch.nn, x)(),
    )

which gives us:

>>> nc = NestedClass(x="q", y=MyClass(a=1, b="q"), act_fun=torch.nn.Sigmoid())
>>> s = json.dumps(nc.serialize())
>>> s
'{_FORMAT_KEY: "NestedClass(SerializableDataclass)", "x": "q", "y": {_FORMAT_KEY: "MyClass(SerializableDataclass)", "a": 1, "b": "q"}, "act_fun": "Sigmoid"}'
>>> read_nc = NestedClass.load(json.loads(s))
>>> read_nc == nc
True

def serialize

(self) -> dict[str, typing.Any]

View Source on GitHub

returns the class as a dict, implemented by using @serializable_dataclass decorator

def load

(cls: Type[~T], data: Union[dict[str, Any], ~T]) -> ~T

View Source on GitHub

takes in an appropriately structured dict and returns an instance of the class, implemented by using @serializable_dataclass decorator

def validate_fields_types

(
    self,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

validate the types of all the fields on a SerializableDataclass. calls SerializableDataclass__validate_field_type for each field

def validate_field_type

(
    self,
    field: muutils.json_serialize.serializable_field.SerializableField | str,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

View Source on GitHub

given a dataclass, check the field matches the type hint

def diff

(
    self,
    other: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    of_serialized: bool = False
) -> dict[str, typing.Any]

View Source on GitHub

get a rich and recursive diff between two instances of a serializable dataclass

>>> Myclass(a=1, b=2).diff(Myclass(a=1, b=3))
{'b': {'self': 2, 'other': 3}}
>>> NestedClass(x="q1", y=Myclass(a=1, b=2)).diff(NestedClass(x="q2", y=Myclass(a=1, b=3)))
{'x': {'self': 'q1', 'other': 'q2'}, 'y': {'b': {'self': 2, 'other': 3}}}

Parameters:

Returns:

Raises:

def update_from_nested_dict

(self, nested_dict: dict[str, typing.Any])

View Source on GitHub

update the instance from a nested dict, useful for configuration from command line args

Parameters:

- `nested_dict : dict[str, Any]`
    nested dict to update the instance with

def get_cls_type_hints_cached

(cls: Type[~T]) -> dict[str, typing.Any]

View Source on GitHub

cached typing.get_type_hints for a class

def get_cls_type_hints

(cls: Type[~T]) -> dict[str, typing.Any]

View Source on GitHub

helper function to get type hints for a class

class KWOnlyError(builtins.NotImplementedError):

View Source on GitHub

kw-only dataclasses are not supported in python < 3.10

Inherited Members

class FieldError(builtins.ValueError):

View Source on GitHub

base class for field errors

Inherited Members

class NotSerializableFieldException(FieldError):

View Source on GitHub

field is not a SerializableField

Inherited Members

class FieldSerializationError(FieldError):

View Source on GitHub

error while serializing a field

Inherited Members

class FieldLoadingError(FieldError):

View Source on GitHub

error while loading a field

Inherited Members

class FieldTypeMismatchError(FieldError, builtins.TypeError):

View Source on GitHub

error when a field type does not match the type hint

Inherited Members

def serializable_dataclass

(
    _cls=None,
    *,
    init: bool = True,
    repr: bool = True,
    eq: bool = True,
    order: bool = False,
    unsafe_hash: bool = False,
    frozen: bool = False,
    properties_to_serialize: Optional[list[str]] = None,
    register_handler: bool = True,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except,
    on_typecheck_mismatch: muutils.errormode.ErrorMode = ErrorMode.Warn,
    methods_no_override: list[str] | None = None,
    **kwargs
)

View Source on GitHub

decorator to make a dataclass serializable. must also make it inherit from SerializableDataclass!!

types will be validated (like pydantic) unless on_typecheck_mismatch is set to ErrorMode.IGNORE

behavior of most kwargs matches that of dataclasses.dataclass, but with some additional kwargs. any kwargs not listed here are passed to dataclasses.dataclass

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method function is added. If frozen is true, fields may not be assigned to after instance creation.

@serializable_dataclass(kw_only=True)
class Myclass(SerializableDataclass):
    a: int
    b: str
>>> Myclass(a=1, b="q").serialize()
{_FORMAT_KEY: 'Myclass(SerializableDataclass)', 'a': 1, 'b': 'q'}

Parameters:

Returns:

Raises:


Contents

extends dataclasses.Field for use with SerializableDataclass

In particular, instead of using dataclasses.field, use serializable_field to define fields in a SerializableDataclass. You provide information on how the field should be serialized and loaded (as well as anything that goes into dataclasses.field) when you define the field, and the SerializableDataclass will automatically use those functions.

API Documentation

View Source on GitHub

muutils.json_serialize.serializable_field

extends dataclasses.Field for use with SerializableDataclass

In particular, instead of using dataclasses.field, use serializable_field to define fields in a SerializableDataclass. You provide information on how the field should be serialized and loaded (as well as anything that goes into dataclasses.field) when you define the field, and the SerializableDataclass will automatically use those functions.

View Source on GitHub

class SerializableField(dataclasses.Field):

View Source on GitHub

extension of dataclasses.Field with additional serialization properties

SerializableField

(
    default: Union[Any, dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    default_factory: Union[Callable[[], Any], dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    init: bool = True,
    repr: bool = True,
    hash: Optional[bool] = None,
    compare: bool = True,
    metadata: Optional[mappingproxy] = None,
    kw_only: Union[bool, dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    serialize: bool = True,
    serialization_fn: Optional[Callable[[Any], Any]] = None,
    loading_fn: Optional[Callable[[Any], Any]] = None,
    deserialize_fn: Optional[Callable[[Any], Any]] = None,
    assert_type: bool = True,
    custom_typecheck_fn: Optional[Callable[[type], bool]] = None
)

View Source on GitHub

def from_Field

(
    cls,
    field: dataclasses.Field
) -> muutils.json_serialize.serializable_field.SerializableField

View Source on GitHub

copy all values from a dataclasses.Field to new SerializableField

def serializable_field

(
    *_args,
    default: Union[Any, dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    default_factory: Union[Any, dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    init: bool = True,
    repr: bool = True,
    hash: Optional[bool] = None,
    compare: bool = True,
    metadata: Optional[mappingproxy] = None,
    kw_only: Union[bool, dataclasses._MISSING_TYPE] = dataclasses.MISSING,
    serialize: bool = True,
    serialization_fn: Optional[Callable[[Any], Any]] = None,
    deserialize_fn: Optional[Callable[[Any], Any]] = None,
    assert_type: bool = True,
    custom_typecheck_fn: Optional[Callable[[type], bool]] = None,
    **kwargs: Any
) -> Any

View Source on GitHub

Create a new SerializableField

default: Sfield_T | dataclasses._MISSING_TYPE = dataclasses.MISSING,
default_factory: Callable[[], Sfield_T]
| dataclasses._MISSING_TYPE = dataclasses.MISSING,
init: bool = True,
repr: bool = True,
hash: Optional[bool] = None,
compare: bool = True,
metadata: types.MappingProxyType | None = None,
kw_only: bool | dataclasses._MISSING_TYPE = dataclasses.MISSING,
### ----------------------------------------------------------------------
### new in `SerializableField`, not in `dataclasses.Field`
serialize: bool = True,
serialization_fn: Optional[Callable[[Any], Any]] = None,
loading_fn: Optional[Callable[[Any], Any]] = None,
deserialize_fn: Optional[Callable[[Any], Any]] = None,
assert_type: bool = True,
custom_typecheck_fn: Optional[Callable[[type], bool]] = None,

new Parameters:

Gotchas:

`loading_fn` is passed the serialized dict for the whole class, not just this field's value, so it must index into that dict itself:

class MyClass:
    my_field: int = serializable_field(
        serialization_fn=lambda x: str(x),
        loading_fn=lambda x: int(x["my_field"])
    )

using deserialize_fn instead:

class MyClass:
    my_field: int = serializable_field(
        serialization_fn=lambda x: str(x),
        deserialize_fn=lambda x: int(x)
    )

In the above code, my_field is an int but will be serialized as a string.

note that if not using ZANJ, and you have a class inside a container, you MUST provide serialization_fn and loading_fn to serialize and load the container. ZANJ will automatically do this for you.

TODO: custom_value_check_fn: function taking the value of the field and returning whether the value itself is valid. if not provided, any value is valid as long as it passes the type test


Contents

utilities for json_serialize

API Documentation

View Source on GitHub

muutils.json_serialize.util

utilities for json_serialize

View Source on GitHub

class UniversalContainer:

View Source on GitHub

contains everything – x in UniversalContainer() is always True
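A class with this behavior needs only a `__contains__` that always succeeds (sketch of the idea, with a hypothetical name):

```python
class UniversalContainerSketch:
    """Membership test always succeeds: `x in c` is True for any x."""
    def __contains__(self, item) -> bool:
        return True

c = UniversalContainerSketch()
print(42 in c, "anything" in c, None in c)  # True True True
```

This is useful as a sentinel "allow everything" value wherever an API expects a container to test membership against.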

def isinstance_namedtuple

(x: Any) -> bool

View Source on GitHub

checks if x is a namedtuple

credit to https://stackoverflow.com/questions/2166818/how-to-check-if-an-object-is-an-instance-of-a-namedtuple
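The usual check, as in the linked Stack Overflow answer, is that a namedtuple is a tuple subclass carrying a `_fields` tuple of strings (sketch with a hypothetical name):

```python
from collections import namedtuple

def isinstance_namedtuple_sketch(x) -> bool:
    """True if x looks like a namedtuple instance."""
    t = type(x)
    if not (isinstance(x, tuple) and hasattr(t, "_fields")):
        return False
    return all(isinstance(name, str) for name in t._fields)

Point = namedtuple("Point", ["x", "y"])
print(isinstance_namedtuple_sketch(Point(1, 2)))  # True
print(isinstance_namedtuple_sketch((1, 2)))       # False
```

This is duck typing by necessity: namedtuples share no common base class beyond `tuple`, so `isinstance` alone cannot distinguish them.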

def try_catch

(func: Callable)

View Source on GitHub

wraps the function to catch exceptions, returns serialized error message on exception

returned func will return normal result on success, or error message on exception

class SerializationException(builtins.Exception):

View Source on GitHub

exception raised when serialization fails

Inherited Members

def string_as_lines

(s: str | None) -> list[str]

View Source on GitHub

for easier reading of long strings in json, split up by newlines

sort of like how jupyter notebooks do it

def safe_getsource

(func) -> list[str]

View Source on GitHub

def array_safe_eq

(a: Any, b: Any) -> bool

View Source on GitHub

check if two objects are equal, accounting for numpy arrays and torch tensors

def dc_eq

(
    dc1,
    dc2,
    except_when_class_mismatch: bool = False,
    false_when_class_mismatch: bool = True,
    except_when_field_mismatch: bool = False
) -> bool

View Source on GitHub

checks if two dataclasses which (might) hold numpy arrays are equal

Parameters:

Returns:

Raises:

TODO: after “except when class mismatch” is False, shouldn’t we then go to “field keys match”?

          [START]
             ▼
       ┌───────────┐  ┌─────────┐
       │dc1 is dc2?├─►│ classes │
       └──┬────────┘No│ match?  │
  ────    │           ├─────────┤
 (True)◄──┘Yes        │No       │Yes
  ────                ▼         ▼
      ┌────────────────┐ ┌────────────┐
      │ except when    │ │ field keys │
      │ class mismatch?│ │ match?     │
      ├───────────┬────┘ ├───────┬────┘
      │Yes        │No    │No     │Yes
      ▼           ▼      ▼       ▼
 ───────────  ┌──────────┐  ┌────────┐
{ raise     } │ except   │  │ field  │
{ TypeError } │ when     │  │ values │
 ───────────  │ field    │  │ match? │
              │ mismatch?│  ├────┬───┘
              ├───────┬──┘  │    │Yes
              │Yes    │No   │No  ▼
              ▼       ▼     │   ────
 ───────────────     ─────  │  (True)
{ raise         }   (False)◄┘   ────
{ AttributeError}    ─────
 ───────────────

class MonoTuple:

View Source on GitHub

tuple type hint, but for a tuple of any length with all the same type


Contents

utilities for reading and writing jsonlines files, including gzip support

API Documentation

View Source on GitHub

muutils.jsonlines

utilities for reading and writing jsonlines files, including gzip support

View Source on GitHub

def jsonl_load

(
    path: str,
    /,
    *,
    use_gzip: bool | None = None
) -> list[typing.Union[bool, int, float, str, NoneType, typing.List[typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]], typing.Dict[str, typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]]]]

View Source on GitHub

def jsonl_load_log

(path: str, /, *, use_gzip: bool | None = None) -> list[dict]

View Source on GitHub

def jsonl_write

(
    path: str,
    items: Sequence[Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]],
    use_gzip: bool | None = None,
    gzip_compresslevel: int = 2
) -> None

View Source on GitHub
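The read/write behavior these functions provide can be sketched with the stdlib alone (hypothetical stand-ins, including the gzip path):

```python
import gzip
import json
import tempfile
from pathlib import Path

def jsonl_write_sketch(path: str, items, use_gzip: bool = False) -> None:
    """Write one JSON object per line, optionally gzip-compressed."""
    opener = gzip.open if use_gzip else open
    with opener(path, "wt") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")

def jsonl_load_sketch(path: str, use_gzip: bool = False) -> list:
    """Read a jsonlines file back into a list of objects."""
    opener = gzip.open if use_gzip else open
    with opener(path, "rt") as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as d:
    path = str(Path(d) / "log.jsonl.gz")
    data = [{"step": 1, "loss": 0.5}, {"step": 2, "loss": 0.25}]
    jsonl_write_sketch(path, data, use_gzip=True)
    loaded = jsonl_load_sketch(path, use_gzip=True)

assert loaded == data
```

One object per line means files can be appended to and streamed line-by-line, which is why the format suits logs.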


Contents

anonymous getitem class

util for constructing a class which has a getitem method which just calls a function

a lambda is an anonymous function: kappa is the letter before lambda in the greek alphabet, hence the name of this class

API Documentation

View Source on GitHub

muutils.kappa

anonymous getitem class

util for constructing a class which has a getitem method which just calls a function

a lambda is an anonymous function: kappa is the letter before lambda in the greek alphabet, hence the name of this class

View Source on GitHub

class Kappa(typing.Mapping[~_kappa_K, ~_kappa_V]):

View Source on GitHub

A Mapping is a generic container for associating key/value pairs.

This class provides concrete generic implementations of all methods except for getitem, iter, and len.

Kappa

(func_getitem: Callable[[~_kappa_K], ~_kappa_V])

View Source on GitHub
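Since `Kappa` only needs `__getitem__` to defer to the wrapped function, the idea can be sketched as follows (a hypothetical minimal re-implementation, not the library's exact code; iteration and length are undefined because there is no concrete key set):

```python
from typing import Callable, Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class MiniKappa(Mapping[K, V]):
    # anonymous getitem: __getitem__ just calls the wrapped function
    def __init__(self, func_getitem: Callable[[K], V]) -> None:
        self.func_getitem = func_getitem

    def __getitem__(self, key: K) -> V:
        return self.func_getitem(key)

    def __iter__(self) -> Iterator[K]:
        raise NotImplementedError("Kappa-like objects have no concrete keys")

    def __len__(self) -> int:
        raise NotImplementedError("Kappa-like objects have no concrete keys")

k = MiniKappa(lambda x: x ** 2)
assert k[4] == 16
```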

Inherited Members

Contents

(deprecated) experimenting with logging utilities

Submodules

API Documentation

View Source on GitHub

muutils.logger

(deprecated) experimenting with logging utilities

View Source on GitHub

class Logger(muutils.logger.simplelogger.SimpleLogger):

View Source on GitHub

logger with more features, including log levels and streams

Parameters:

    - `log_path : str | None`
    default log file path
    (defaults to `None`)
    - `log_file : AnyIO | None`
    default log io, should have a `.write()` method (pass only this or `log_path`, not both)
    (defaults to `None`)
    - `timestamp : bool`
    whether to add timestamps to every log message (under the `_timestamp` key)
    (defaults to `True`)
    - `default_level : int`
    default log level for streams/messages that don't specify a level
    (defaults to `0`)
    - `console_print_threshold : int`
    log level at which to print to the console, anything greater will not be printed unless overridden by `console_print`
    (defaults to `50`)
    - `level_header : HeaderFunction`
    function for formatting log messages when printing to console
    (defaults to `HEADER_FUNCTIONS["md"]`)

Raises:

    - `ValueError` : if both `log_path` and `log_file` are given (pass only one)

Logger

(
    log_path: str | None = None,
    log_file: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    default_level: int = 0,
    console_print_threshold: int = 50,
    level_header: muutils.logger.headerfuncs.HeaderFunction = <function md_header_function>,
    streams: Union[dict[str | None, muutils.logger.loggingstream.LoggingStream], Sequence[muutils.logger.loggingstream.LoggingStream]] = (),
    keep_last_msg_time: bool = True,
    timestamp: bool = True,
    **kwargs
)

View Source on GitHub
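The interaction of `lvl`, `console_print_threshold`, and `console_print` described above can be sketched as follows (illustrative logic only, not the actual implementation; note that lower levels count as more important):

```python
import io
import json

def log_sketch(msg, lvl, log_file, console_print_threshold=50, console_print=False):
    # every message goes to the jsonl log file
    log_file.write(json.dumps({"msg": msg, "_lvl": lvl}) + "\n")
    # anything greater than the threshold is not printed unless forced
    if console_print or lvl <= console_print_threshold:
        print(f"[{lvl}] {msg}")

buf = io.StringIO()
log_sketch("starting run", lvl=10, log_file=buf)   # printed and logged
log_sketch("debug detail", lvl=100, log_file=buf)  # logged only
assert len(buf.getvalue().splitlines()) == 2
```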

def log

(
    self,
    msg: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]] = None,
    lvl: int | None = None,
    stream: str | None = None,
    console_print: bool = False,
    extra_indent: str = '',
    **kwargs
)

View Source on GitHub

logging function

Parameters:

def log_elapsed_last

(
    self,
    lvl: int | None = None,
    stream: str | None = None,
    console_print: bool = True,
    **kwargs
) -> float

View Source on GitHub

logs the time elapsed since the last message was printed to the console (in any stream)

def flush_all

(self)

View Source on GitHub

flush all streams

class LoggingStream:

View Source on GitHub

properties of a logging stream

LoggingStream

(
    name: str | None,
    aliases: set[str | None] = <factory>,
    file: Union[str, bool, TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    default_level: int | None = None,
    default_contents: dict[str, typing.Callable[[], typing.Any]] = <factory>,
    handler: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None
)

def make_handler

(self) -> Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType]

View Source on GitHub

class SimpleLogger:

View Source on GitHub

logs training data to a jsonl file

SimpleLogger

(
    log_path: str | None = None,
    log_file: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    timestamp: bool = True
)

View Source on GitHub

def log

(
    self,
    msg: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    console_print: bool = False,
    **kwargs
)

View Source on GitHub

log a message to the log file, and optionally to the console

class TimerContext:

View Source on GitHub

context manager for timing code

API Documentation

View Source on GitHub

muutils.logger.exception_context

View Source on GitHub

class ExceptionContext:

View Source on GitHub

context manager which catches all exceptions happening while the context is open, .write() the exception trace to the given stream, and then raises the exception

for example:

errorfile = open('error.log', 'w')

with ExceptionContext(errorfile):
    # do something that might throw an exception;
    # if it does, the exception trace will be written to errorfile
    # and then the exception will be re-raised
    ...

ExceptionContext

(stream)

View Source on GitHub
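The behavior above can be sketched with a plain context manager (hypothetical minimal version, not the library's exact code; returning `False` from `__exit__` is what lets the exception propagate after being recorded):

```python
import io
import traceback

class ExceptionContextSketch:
    # write the traceback to `stream`, then let the exception propagate
    def __init__(self, stream):
        self.stream = stream

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb) -> bool:
        if exc_type is not None:
            self.stream.write(
                "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
            )
        return False  # False -> the exception is re-raised

err = io.StringIO()
try:
    with ExceptionContextSketch(err):
        raise ValueError("boom")
except ValueError:
    pass
assert "ValueError: boom" in err.getvalue()
```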

API Documentation

View Source on GitHub

muutils.logger.headerfuncs

View Source on GitHub

class HeaderFunction(typing.Protocol):

View Source on GitHub

Base class for protocol classes.

Protocol classes are defined as::

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).

For example::

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as::

class GenProto[T](Protocol):
    def meth(self) -> T:
        ...

HeaderFunction

(*args, **kwargs)

View Source on GitHub

def md_header_function

(
    msg: Any,
    lvl: int,
    stream: str | None = None,
    indent_lvl: str = '  ',
    extra_indent: str = '',
    **kwargs
) -> str

View Source on GitHub

standard header function. will output

API Documentation

View Source on GitHub

muutils.logger.log_util

View Source on GitHub

def get_any_from_stream

(stream: list[dict], key: str) -> None

View Source on GitHub

get the first value of a key from a stream. errors if not found

def gather_log

(file: str) -> dict[str, list[dict]]

View Source on GitHub

gathers and sorts all streams from a log

def gather_stream

(file: str, stream: str) -> list[dict]

View Source on GitHub

gets all entries from a specific stream in a log file

def gather_val

(
    file: str,
    stream: str,
    keys: tuple[str],
    allow_skip: bool = True
) -> list[list]

View Source on GitHub

gather specific keys from a specific stream in a log file

example: if “log.jsonl” has contents:

{"a": 1, "b": 2, "c": 3, "_stream": "s1"}
{"a": 4, "b": 5, "c": 6, "_stream": "s1"}
{"a": 7, "b": 8, "c": 9, "_stream": "s2"}

then gather_val("log.jsonl", "s1", ("a", "b")) will return

[
    [1, 2],
    [4, 5]
]
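The example above can be reproduced with a stdlib-only sketch of the same filtering logic (hypothetical name; the real function also handles `allow_skip`):

```python
import json

def gather_val_sketch(lines, stream, keys):
    # pick the given keys from every entry belonging to one stream
    out = []
    for line in lines:
        entry = json.loads(line)
        if entry.get("_stream") == stream:
            out.append([entry[k] for k in keys])
    return out

log_lines = [
    '{"a": 1, "b": 2, "c": 3, "_stream": "s1"}',
    '{"a": 4, "b": 5, "c": 6, "_stream": "s1"}',
    '{"a": 7, "b": 8, "c": 9, "_stream": "s2"}',
]
assert gather_val_sketch(log_lines, "s1", ("a", "b")) == [[1, 2], [4, 5]]
```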

Contents

logger with streams & levels, and a timer context manager

API Documentation

View Source on GitHub

muutils.logger.logger

logger with streams & levels, and a timer context manager

View Source on GitHub

def decode_level

(level: int) -> str

View Source on GitHub

class Logger(muutils.logger.simplelogger.SimpleLogger):

View Source on GitHub

logger with more features, including log levels and streams

Parameters:

    - `log_path : str | None`
    default log file path
    (defaults to `None`)
    - `log_file : AnyIO | None`
    default log io, should have a `.write()` method (pass only this or `log_path`, not both)
    (defaults to `None`)
    - `timestamp : bool`
    whether to add timestamps to every log message (under the `_timestamp` key)
    (defaults to `True`)
    - `default_level : int`
    default log level for streams/messages that don't specify a level
    (defaults to `0`)
    - `console_print_threshold : int`
    log level at which to print to the console, anything greater will not be printed unless overridden by `console_print`
    (defaults to `50`)
    - `level_header : HeaderFunction`
    function for formatting log messages when printing to console
    (defaults to `HEADER_FUNCTIONS["md"]`)

Raises:

    - `ValueError` : if both `log_path` and `log_file` are given (pass only one)

Logger

(
    log_path: str | None = None,
    log_file: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    default_level: int = 0,
    console_print_threshold: int = 50,
    level_header: muutils.logger.headerfuncs.HeaderFunction = <function md_header_function>,
    streams: Union[dict[str | None, muutils.logger.loggingstream.LoggingStream], Sequence[muutils.logger.loggingstream.LoggingStream]] = (),
    keep_last_msg_time: bool = True,
    timestamp: bool = True,
    **kwargs
)

View Source on GitHub

def log

(
    self,
    msg: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]] = None,
    lvl: int | None = None,
    stream: str | None = None,
    console_print: bool = False,
    extra_indent: str = '',
    **kwargs
)

View Source on GitHub

logging function

Parameters:

def log_elapsed_last

(
    self,
    lvl: int | None = None,
    stream: str | None = None,
    console_print: bool = True,
    **kwargs
) -> float

View Source on GitHub

logs the time elapsed since the last message was printed to the console (in any stream)

def flush_all

(self)

View Source on GitHub

flush all streams

API Documentation

View Source on GitHub

muutils.logger.loggingstream

View Source on GitHub

class LoggingStream:

View Source on GitHub

properties of a logging stream

LoggingStream

(
    name: str | None,
    aliases: set[str | None] = <factory>,
    file: Union[str, bool, TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    default_level: int | None = None,
    default_contents: dict[str, typing.Callable[[], typing.Any]] = <factory>,
    handler: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None
)

def make_handler

(self) -> Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType]

View Source on GitHub

API Documentation

View Source on GitHub

muutils.logger.simplelogger

View Source on GitHub

class NullIO:

View Source on GitHub

null IO class

def write

(self, msg: str) -> int

View Source on GitHub

write to nothing! this throws away the message

def flush

(self) -> None

View Source on GitHub

flush nothing! this is a no-op

def close

(self) -> None

View Source on GitHub

close nothing! this is a no-op
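A stand-in with the same interface is tiny; the `int` return value here is an assumption for illustration (mimicking what a real file's `write` reports):

```python
class NullIOSketch:
    # file-like sink that discards everything written to it
    def write(self, msg: str) -> int:
        # assumption: report the message length, as a real write() would
        return len(msg)

    def flush(self) -> None:
        pass  # nothing buffered, nothing to flush

    def close(self) -> None:
        pass  # nothing open, nothing to close

sink = NullIOSketch()
assert sink.write("hello") == 5
```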

class SimpleLogger:

View Source on GitHub

logs training data to a jsonl file

SimpleLogger

(
    log_path: str | None = None,
    log_file: Union[TextIO, muutils.logger.simplelogger.NullIO, NoneType] = None,
    timestamp: bool = True
)

View Source on GitHub

def log

(
    self,
    msg: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    console_print: bool = False,
    **kwargs
)

View Source on GitHub

log a message to the log file, and optionally to the console

API Documentation

View Source on GitHub

muutils.logger.timing

View Source on GitHub

class TimerContext:

View Source on GitHub

context manager for timing code
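A minimal sketch of such a timing context manager (hypothetical, not the library's exact code) looks like this:

```python
import time

class TimerContextSketch:
    # record elapsed wall-clock seconds for the `with` block
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> bool:
        self.elapsed = time.perf_counter() - self.start
        return False  # never swallow exceptions

with TimerContextSketch() as timer:
    sum(range(10_000))
assert timer.elapsed >= 0.0
```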

def filter_time_str

(time: str) -> str

View Source on GitHub

assuming format h:mm:ss, clips off the hours if they are 0

class ProgressEstimator:

View Source on GitHub

estimates progress and can give a progress bar

ProgressEstimator

(
    n_total: int,
    pbar_fill: str = '█',
    pbar_empty: str = ' ',
    pbar_bounds: tuple[str, str] = ('|', '|')
)

View Source on GitHub

def get_timing_raw

(self, i: int) -> dict[str, float]

View Source on GitHub

returns dict(elapsed, per_iter, remaining, percent)

def get_pbar

(self, i: int, width: int = 30) -> str

View Source on GitHub

returns a progress bar
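The bar rendering can be sketched from the constructor parameters above (hypothetical standalone function; the real method works from the estimator's state):

```python
def pbar_sketch(i: int, n_total: int, width: int = 30,
                fill: str = "█", empty: str = " ",
                bounds: tuple = ("|", "|")) -> str:
    # render a fixed-width text progress bar at step i of n_total
    n_filled = int(width * i / n_total)
    return bounds[0] + fill * n_filled + empty * (width - n_filled) + bounds[1]

assert pbar_sketch(15, 30, width=10) == "|█████     |"
```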

def get_progress_default

(self, i: int) -> str

View Source on GitHub

returns a progress string

Contents

miscellaneous utilities

Submodules

API Documentation

View Source on GitHub

muutils.misc

miscellaneous utilities

View Source on GitHub

def stable_hash

(s: str | bytes) -> int

View Source on GitHub

Returns a stable hash of the given string. not cryptographically secure, but stable between runs
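The point is that builtin `hash()` is salted per-process while a sha256 digest is not. A sketch of the idea (the digest-to-int step here is an assumption for illustration; the real function also uses `hashlib.sha256` but may convert differently):

```python
import hashlib

def stable_hash_sketch(s) -> int:
    # sha256 digest -> int; stable across python runs, unlike builtin hash()
    if isinstance(s, str):
        s = s.encode("utf-8")
    return int.from_bytes(hashlib.sha256(s).digest()[:8], "big")

assert stable_hash_sketch("hello") == stable_hash_sketch("hello")
assert stable_hash_sketch("hello") == stable_hash_sketch(b"hello")
```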

def empty_sequence_if_attr_false

(itr: Iterable[Any], attr_owner: Any, attr_name: str) -> Iterable[Any]

View Source on GitHub

Returns itr if attr_owner has the attribute attr_name and it boolean casts to True. Returns an empty sequence otherwise.

Particularly useful for optionally inserting delimiters into a sequence depending on a TokenizerElement attribute.

Parameters:

Returns:

def flatten

(it: Iterable[Any], levels_to_flatten: int | None = None) -> Generator

View Source on GitHub

Flattens an arbitrarily nested iterable. Flattens all iterable data types except for str and bytes.

Returns

Generator over the flattened sequence.

Parameters
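The behavior above (recursive flattening with `str`/`bytes` treated as atoms, optionally limited by depth) can be sketched as:

```python
from typing import Any, Generator, Iterable, Optional

def flatten_sketch(it: Iterable[Any], levels: Optional[int] = None) -> Generator:
    # yield leaves; str and bytes count as atoms, not iterables
    for x in it:
        is_nested = hasattr(x, "__iter__") and not isinstance(x, (str, bytes))
        if is_nested and (levels is None or levels > 0):
            yield from flatten_sketch(x, None if levels is None else levels - 1)
        else:
            yield x

assert list(flatten_sketch([1, [2, [3, "ab"]]])) == [1, 2, 3, "ab"]
assert list(flatten_sketch([[1], [[2]]], levels=1)) == [1, [2]]
```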

def list_split

(lst: list, val: Any) -> list[list]

View Source on GitHub

split a list into sublists by val. similar to "a_b_c".split("_")

>>> list_split([1,2,3,0,4,5,0,6], 0)
[[1, 2, 3], [4, 5], [6]]
>>> list_split([0,1,2,3], 0)
[[], [1, 2, 3]]
>>> list_split([1,2,3], 0)
[[1, 2, 3]]
>>> list_split([], 0)
[[]]
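The doctests above can be reproduced with a short stdlib-only sketch of the same semantics (hypothetical name):

```python
from typing import Any

def list_split_sketch(lst: list, val: Any) -> list:
    # start a new sublist at each occurrence of val, like str.split
    out = [[]]
    for x in lst:
        if x == val:
            out.append([])
        else:
            out[-1].append(x)
    return out

assert list_split_sketch([1, 2, 3, 0, 4, 5, 0, 6], 0) == [[1, 2, 3], [4, 5], [6]]
assert list_split_sketch([], 0) == [[]]
```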

def list_join

(lst: list, factory: Callable) -> list

View Source on GitHub

add a new instance of factory() between each element of lst

>>> list_join([1,2,3], lambda : 0)
[1, 0, 2, 0, 3]
>>> list_join([1,2,3], lambda: [time.sleep(0.1), time.time()][1])
[1, 1600000000.0, 2, 1600000000.1, 3]

def apply_mapping

(
    mapping: Mapping[~_AM_K, ~_AM_V],
    iter: Iterable[~_AM_K],
    when_missing: Literal['except', 'skip', 'include'] = 'skip'
) -> list[typing.Union[~_AM_K, ~_AM_V]]

View Source on GitHub

Given an iterable and a mapping, apply the mapping to the iterable with certain options

Gotcha: an invalid when_missing value is accepted silently; an error is only raised once a missing key is actually encountered.

Note: you can use this with muutils.kappa.Kappa if you want to pass a function instead of a dict

Parameters:

Returns:

return type is one of:

- `list[_AM_V]` if `when_missing` is `"skip"` or `"except"`
- `list[Union[_AM_K, _AM_V]]` if `when_missing` is `"include"`

Raises:
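The three `when_missing` behaviors, and the gotcha above, can be sketched with plain dict lookups (hypothetical stdlib version of the semantics described, not the library's code):

```python
def apply_mapping_sketch(mapping, it, when_missing="skip"):
    out = []
    for item in it:
        if item in mapping:
            out.append(mapping[item])
        elif when_missing == "include":
            out.append(item)
        elif when_missing == "except":
            raise KeyError(item)
        elif when_missing != "skip":
            # the gotcha: an invalid value is only noticed here
            raise ValueError(f"invalid when_missing: {when_missing!r}")
    return out

assert apply_mapping_sketch({1: "a", 2: "b"}, [1, 2, 3]) == ["a", "b"]
assert apply_mapping_sketch({1: "a"}, [1, 3], when_missing="include") == ["a", 3]
assert apply_mapping_sketch({1: "a"}, [1], when_missing="bogus") == ["a"]  # gotcha: no error
```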

def apply_mapping_chain

(
    mapping: Mapping[~_AM_K, Iterable[~_AM_V]],
    iter: Iterable[~_AM_K],
    when_missing: Literal['except', 'skip', 'include'] = 'skip'
) -> list[typing.Union[~_AM_K, ~_AM_V]]

View Source on GitHub

Given an iterable and a mapping, chain the mappings together

Gotcha: an invalid when_missing value is accepted silently; an error is only raised once a missing key is actually encountered.

Note: you can use this with muutils.kappa.Kappa if you want to pass a function instead of a dict

Parameters:

Returns:

return type is one of:

- `list[_AM_V]` if `when_missing` is `"skip"` or `"except"`
- `list[Union[_AM_K, _AM_V]]` if `when_missing` is `"include"`

Raises:

def sanitize_name

(
    name: str | None,
    additional_allowed_chars: str = '',
    replace_invalid: str = '',
    when_none: str | None = '_None_',
    leading_digit_prefix: str = ''
) -> str

View Source on GitHub

sanitize a string, leaving only alphanumerics and additional_allowed_chars

Parameters:

Returns:
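The core of this is a character filter, sketchable with one regex (hypothetical minimal version; it omits the `when_none` and `leading_digit_prefix` handling listed above):

```python
import re

def sanitize_name_sketch(name: str, additional_allowed_chars: str = "",
                         replace_invalid: str = "") -> str:
    # keep alphanumerics plus any extra allowed characters; replace the rest
    allowed = re.escape(additional_allowed_chars)
    return re.sub(rf"[^A-Za-z0-9{allowed}]", replace_invalid, name)

assert sanitize_name_sketch("my file:v2") == "myfilev2"
assert sanitize_name_sketch("my file:v2", additional_allowed_chars="_-",
                            replace_invalid="_") == "my_file_v2"
```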

def sanitize_fname

(fname: str | None, **kwargs) -> str

View Source on GitHub

sanitize a filename to posix standards

def sanitize_identifier

(fname: str | None, **kwargs) -> str

View Source on GitHub

sanitize an identifier (variable or function name)

def dict_to_filename

(
    data: dict,
    format_str: str = '{key}_{val}',
    separator: str = '.',
    max_length: int = 255
)

View Source on GitHub

def dynamic_docstring

(**doc_params)

View Source on GitHub

def shorten_numerical_to_str

(
    num: int | float,
    small_as_decimal: bool = True,
    precision: int = 1
) -> str

View Source on GitHub

shorten a large numerical value to a string, e.g. 1234 -> "1K"

precision guaranteed to 1 part in 10, but can be higher. reverse of str_to_numeric
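The suffix logic can be sketched as follows (hypothetical illustration; rounding and suffix details differ from the real function):

```python
def shorten_num_sketch(num: float, precision: int = 1) -> str:
    # divide by the largest matching scale and attach its suffix
    for suffix, scale in (("B", 1e9), ("M", 1e6), ("K", 1e3)):
        if abs(num) >= scale:
            shortened = f"{num / scale:.{precision}f}".rstrip("0").rstrip(".")
            return shortened + suffix
    return str(num)

assert shorten_num_sketch(1234) == "1.2K"
assert shorten_num_sketch(1500000) == "1.5M"
```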

def str_to_numeric

(
    quantity: str,
    mapping: None | bool | dict[str, int | float] = True
) -> int | float

View Source on GitHub

Convert a string representing a quantity to a numeric value.

The string can represent an integer, python float, fraction, or shortened via shorten_numerical_to_str.

Examples:

>>> str_to_numeric("5")
5
>>> str_to_numeric("0.1")
0.1
>>> str_to_numeric("1/5")
0.2
>>> str_to_numeric("-1K")
-1000.0
>>> str_to_numeric("1.5M")
1500000.0
>>> str_to_numeric("1.2e2")
120.0

class FrozenDict(builtins.dict):

View Source on GitHub

Inherited Members

class FrozenList(builtins.list):

View Source on GitHub

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

def append

(self, value)

View Source on GitHub

Append object to the end of the list.

def extend

(self, iterable)

View Source on GitHub

Extend list by appending elements from the iterable.

def insert

(self, index, value)

View Source on GitHub

Insert object before index.

def remove

(self, value)

View Source on GitHub

Remove first occurrence of value.

Raises ValueError if the value is not present.

def pop

(self, index=-1)

View Source on GitHub

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

def clear

(self)

View Source on GitHub

Remove all items from list.

Inherited Members

def freeze

(instance: Any) -> Any

View Source on GitHub

recursively freeze an object in-place so that its attributes and elements cannot be changed

messy in the sense that sometimes the object is modified in place, but you can’t rely on that. always use the return value.

the gelidum package is a more complete implementation of this idea

def is_abstract

(cls: type) -> bool

View Source on GitHub

Returns whether a class is abstract.

def get_all_subclasses

(class_: type, include_self=False) -> set[type]

View Source on GitHub

Returns a set containing all child classes in the subclass graph of class_. I.e., includes subclasses of subclasses, etc.

Parameters

Development

Since most class hierarchies are small, the inefficiencies of the existing recursive implementation aren’t problematic. It might be valuable to refactor with memoization if the need arises to use this function on a very large class hierarchy.
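The recursive implementation described above amounts to walking `__subclasses__()` (hypothetical sketch, not the library's exact code):

```python
def get_all_subclasses_sketch(class_: type, include_self: bool = False) -> set:
    # recursive walk over __subclasses__(), covering subclasses of subclasses
    result = set()
    if include_self:
        result.add(class_)
    for sub in class_.__subclasses__():
        result |= get_all_subclasses_sketch(sub, include_self=True)
    return result

class A: pass
class B(A): pass
class C(B): pass

assert get_all_subclasses_sketch(A) == {B, C}
assert get_all_subclasses_sketch(A, include_self=True) == {A, B, C}
```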

def isinstance_by_type_name

(o: object, type_name: str)

View Source on GitHub

Behaves like stdlib isinstance except it accepts a string representation of the type rather than the type itself. This is a hacky function intended to circumvent the need to import a type into a module. It is susceptible to type name collisions.

Parameters

o: Object (not the type itself) whose type to interrogate
type_name: The string returned by type_.__name__. Generic types are not supported, only types that would appear in type_.__mro__.

class IsDataclass(typing.Protocol):

View Source on GitHub

Base class for protocol classes.

Protocol classes are defined as::

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).

For example::

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as::

class GenProto[T](Protocol):
    def meth(self) -> T:
        ...

IsDataclass

(*args, **kwargs)

View Source on GitHub

def get_hashable_eq_attrs

(dc: muutils.misc.classes.IsDataclass) -> tuple[typing.Any]

View Source on GitHub

Returns a tuple of all fields used for equality comparison, including the type of the dataclass itself. The type is included to preserve the unequal equality behavior of instances of different dataclasses whose fields are identical. Essentially used to generate a hashable dataclass representation for equality comparison even if it’s not frozen.

def dataclass_set_equals

(
    coll1: Iterable[muutils.misc.classes.IsDataclass],
    coll2: Iterable[muutils.misc.classes.IsDataclass]
) -> bool

View Source on GitHub

Compares 2 collections of dataclass instances as if they were sets. Duplicates are ignored in the same manner as a set. Unfrozen dataclasses can’t be placed in sets since they’re not hashable. Collections of them may be compared using this function.

API Documentation

View Source on GitHub

muutils.misc.classes

View Source on GitHub

def is_abstract

(cls: type) -> bool

View Source on GitHub

Returns whether a class is abstract.

def get_all_subclasses

(class_: type, include_self=False) -> set[type]

View Source on GitHub

Returns a set containing all child classes in the subclass graph of class_. I.e., includes subclasses of subclasses, etc.

Parameters

Development

Since most class hierarchies are small, the inefficiencies of the existing recursive implementation aren’t problematic. It might be valuable to refactor with memoization if the need arises to use this function on a very large class hierarchy.

def isinstance_by_type_name

(o: object, type_name: str)

View Source on GitHub

Behaves like stdlib isinstance except it accepts a string representation of the type rather than the type itself. This is a hacky function intended to circumvent the need to import a type into a module. It is susceptible to type name collisions.

Parameters

o: Object (not the type itself) whose type to interrogate
type_name: The string returned by type_.__name__. Generic types are not supported, only types that would appear in type_.__mro__.

class IsDataclass(typing.Protocol):

View Source on GitHub

Base class for protocol classes.

Protocol classes are defined as::

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).

For example::

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as::

class GenProto[T](Protocol):
    def meth(self) -> T:
        ...

IsDataclass

(*args, **kwargs)

View Source on GitHub

def get_hashable_eq_attrs

(dc: muutils.misc.classes.IsDataclass) -> tuple[typing.Any]

View Source on GitHub

Returns a tuple of all fields used for equality comparison, including the type of the dataclass itself. The type is included to preserve the unequal equality behavior of instances of different dataclasses whose fields are identical. Essentially used to generate a hashable dataclass representation for equality comparison even if it’s not frozen.

def dataclass_set_equals

(
    coll1: Iterable[muutils.misc.classes.IsDataclass],
    coll2: Iterable[muutils.misc.classes.IsDataclass]
) -> bool

View Source on GitHub

Compares 2 collections of dataclass instances as if they were sets. Duplicates are ignored in the same manner as a set. Unfrozen dataclasses can’t be placed in sets since they’re not hashable. Collections of them may be compared using this function.

API Documentation

View Source on GitHub

muutils.misc.freezing

View Source on GitHub

class FrozenDict(builtins.dict):

View Source on GitHub

Inherited Members

class FrozenList(builtins.list):

View Source on GitHub

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

def append

(self, value)

View Source on GitHub

Append object to the end of the list.

def extend

(self, iterable)

View Source on GitHub

Extend list by appending elements from the iterable.

def insert

(self, index, value)

View Source on GitHub

Insert object before index.

def remove

(self, value)

View Source on GitHub

Remove first occurrence of value.

Raises ValueError if the value is not present.

def pop

(self, index=-1)

View Source on GitHub

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

def clear

(self)

View Source on GitHub

Remove all items from list.

Inherited Members

def freeze

(instance: Any) -> Any

View Source on GitHub

recursively freeze an object in-place so that its attributes and elements cannot be changed

messy in the sense that sometimes the object is modified in place, but you can’t rely on that. always use the return value.

the gelidum package is a more complete implementation of this idea

API Documentation

View Source on GitHub

muutils.misc.func

View Source on GitHub

def process_kwarg

(
    kwarg_name: str,
    processor: Callable[[~T_process_in], ~T_process_out]
) -> Callable[[Callable[~FuncParamsPreWrap, ~ReturnType]], Callable[~FuncParams, ~ReturnType]]

View Source on GitHub

Decorator that applies a processor to a keyword argument.

The underlying function is expected to have a keyword argument (with name kwarg_name) of type T_out, but the caller provides a value of type T_in that is converted via processor.

Parameters:

Returns:
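The conversion step can be sketched with a plain decorator (hypothetical stdlib version of the described behavior, not the library's code):

```python
import functools

def process_kwarg_sketch(kwarg_name, processor):
    # run `processor` on one keyword argument before calling the function
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if kwarg_name in kwargs:
                kwargs[kwarg_name] = processor(kwargs[kwarg_name])
            return func(*args, **kwargs)
        return wrapper
    return decorator

@process_kwarg_sketch("n", int)
def repeat(s: str, *, n: int) -> str:
    # caller may pass n as a str (T_in); the body sees an int (T_out)
    return s * n

assert repeat("ab", n="3") == "ababab"
```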

def validate_kwarg

(
    kwarg_name: str,
    validator: Callable[[~T_kwarg], bool],
    description: str | None = None,
    action: muutils.errormode.ErrorMode = ErrorMode.Except
) -> Callable[[Callable[~FuncParams, ~ReturnType]], Callable[~FuncParams, ~ReturnType]]

View Source on GitHub

Decorator that validates a specific keyword argument.

Parameters:

Returns:

Modifies:

Usage:

@validate_kwarg("x", lambda val: val > 0, "Invalid {kwarg_name}: {value}")
def my_func(x: int) -> int:
    return x

assert my_func(x=1) == 1

Raises:

def replace_kwarg

(
    kwarg_name: str,
    check: Callable[[~T_kwarg], bool],
    replacement_value: ~T_kwarg,
    replace_if_missing: bool = False
) -> Callable[[Callable[~FuncParams, ~ReturnType]], Callable[~FuncParams, ~ReturnType]]

View Source on GitHub

Decorator that replaces a specific keyword argument value by identity comparison.

Parameters:

Returns:

Modifies:

Usage:

@replace_kwarg("x", None, "default_string")
def my_func(*, x: str | None = None) -> str:
    return x

assert my_func(x=None) == "default_string"

def is_none

(value: Any) -> bool

View Source on GitHub

def always_true

(value: Any) -> bool

View Source on GitHub

def always_false

(value: Any) -> bool

View Source on GitHub

def format_docstring

(
    **fmt_kwargs: Any
) -> Callable[[Callable[~FuncParams, ~ReturnType]], Callable[~FuncParams, ~ReturnType]]

View Source on GitHub

Decorator that formats a function’s docstring with the provided keyword arguments.

def typed_lambda

(
    fn: Callable[[Unpack[LambdaArgs]], ~ReturnType],
    in_types: ~LambdaArgsTypes,
    out_type: type[~ReturnType]
) -> Callable[[Unpack[LambdaArgs]], ~ReturnType]

View Source on GitHub

Wraps a lambda function with type hints.

Parameters:

Returns:

Usage:

add = typed_lambda(lambda x, y: x + y, (int, int), int)
assert add(1, 2) == 3

Raises:

API Documentation

View Source on GitHub

muutils.misc.hashing

View Source on GitHub

def stable_hash

(s: str | bytes) -> int

View Source on GitHub

Returns a stable hash of the given string. not cryptographically secure, but stable between runs

def stable_json_dumps

(d) -> str

View Source on GitHub

def base64_hash

(s: str | bytes) -> str

View Source on GitHub

Returns a base64 representation of the hash of the given string. not cryptographically secure

API Documentation

View Source on GitHub

muutils.misc.numerical

View Source on GitHub

def shorten_numerical_to_str

(
    num: int | float,
    small_as_decimal: bool = True,
    precision: int = 1
) -> str

View Source on GitHub

shorten a large numerical value to a string, e.g. 1234 -> "1K"

precision guaranteed to 1 part in 10, but can be higher. reverse of str_to_numeric

def str_to_numeric

(
    quantity: str,
    mapping: None | bool | dict[str, int | float] = True
) -> int | float

View Source on GitHub

Convert a string representing a quantity to a numeric value.

The string can represent an integer, python float, fraction, or shortened via shorten_numerical_to_str.

Examples:

>>> str_to_numeric("5")
5
>>> str_to_numeric("0.1")
0.1
>>> str_to_numeric("1/5")
0.2
>>> str_to_numeric("-1K")
-1000.0
>>> str_to_numeric("1.5M")
1500000.0
>>> str_to_numeric("1.2e2")
120.0

API Documentation

View Source on GitHub

muutils.misc.sequence

View Source on GitHub

def empty_sequence_if_attr_false

(itr: Iterable[Any], attr_owner: Any, attr_name: str) -> Iterable[Any]

View Source on GitHub

Returns itr if attr_owner has the attribute attr_name and it boolean casts to True. Returns an empty sequence otherwise.

Particularly useful for optionally inserting delimiters into a sequence depending on a TokenizerElement attribute.

Parameters:

Returns:

def flatten

(it: Iterable[Any], levels_to_flatten: int | None = None) -> Generator

View Source on GitHub

Flattens an arbitrarily nested iterable. Flattens all iterable data types except for str and bytes.

Returns

Generator over the flattened sequence.

Parameters

def list_split

(lst: list, val: Any) -> list[list]

View Source on GitHub

split a list into sublists by val. similar to "a_b_c".split("_")

>>> list_split([1,2,3,0,4,5,0,6], 0)
[[1, 2, 3], [4, 5], [6]]
>>> list_split([0,1,2,3], 0)
[[], [1, 2, 3]]
>>> list_split([1,2,3], 0)
[[1, 2, 3]]
>>> list_split([], 0)
[[]]

def list_join

(lst: list, factory: Callable) -> list

View Source on GitHub

add a new instance of factory() between each element of lst

>>> list_join([1,2,3], lambda : 0)
[1, 0, 2, 0, 3]
>>> list_join([1,2,3], lambda: [time.sleep(0.1), time.time()][1])
[1, 1600000000.0, 2, 1600000000.1, 3]

def apply_mapping

(
    mapping: Mapping[~_AM_K, ~_AM_V],
    iter: Iterable[~_AM_K],
    when_missing: Literal['except', 'skip', 'include'] = 'skip'
) -> list[typing.Union[~_AM_K, ~_AM_V]]

View Source on GitHub

Given an iterable and a mapping, apply the mapping to the iterable with certain options

Gotcha: an invalid when_missing value is accepted silently; an error is only raised once a missing key is actually encountered.

Note: you can use this with muutils.kappa.Kappa if you want to pass a function instead of a dict

Parameters:

Returns:

return type is one of:

- `list[_AM_V]` if `when_missing` is `"skip"` or `"except"`
- `list[Union[_AM_K, _AM_V]]` if `when_missing` is `"include"`

Raises:

def apply_mapping_chain

(
    mapping: Mapping[~_AM_K, Iterable[~_AM_V]],
    iter: Iterable[~_AM_K],
    when_missing: Literal['except', 'skip', 'include'] = 'skip'
) -> list[typing.Union[~_AM_K, ~_AM_V]]

View Source on GitHub

Given an iterable and a mapping, chain the mappings together

Gotcha: an invalid when_missing value is accepted silently; an error is only raised once a missing key is actually encountered.

Note: you can use this with muutils.kappa.Kappa if you want to pass a function instead of a dict

Parameters:

Returns:

return type is one of:

- `list[_AM_V]` if `when_missing` is `"skip"` or `"except"`
- `list[Union[_AM_K, _AM_V]]` if `when_missing` is `"include"`

Raises:


API Documentation

View Source on GitHub

muutils.misc.string

View Source on GitHub

def sanitize_name

(
    name: str | None,
    additional_allowed_chars: str = '',
    replace_invalid: str = '',
    when_none: str | None = '_None_',
    leading_digit_prefix: str = ''
) -> str

View Source on GitHub

sanitize a string, leaving only alphanumerics and additional_allowed_chars

Parameters:

Returns:
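A rough sketch of the documented behavior (the real function's allowed-character rules may differ; the ascii-alnum check here is an assumption):

```python
import string

def sanitize_name(
    name,
    additional_allowed_chars: str = "",
    replace_invalid: str = "",
    when_none="_None_",
    leading_digit_prefix: str = "",
) -> str:
    # None gets the when_none placeholder (or an error if that is also None)
    if name is None:
        if when_none is None:
            raise ValueError("name is None and when_none is None")
        return when_none
    # keep alphanumerics plus explicitly allowed chars, replace everything else
    allowed = set(string.ascii_letters + string.digits) | set(additional_allowed_chars)
    out = "".join(c if c in allowed else replace_invalid for c in name)
    if out and out[0].isdigit():
        out = leading_digit_prefix + out
    return out

print(sanitize_name("my file (1).txt", additional_allowed_chars="._-"))  # myfile1.txt
```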

def sanitize_fname

(fname: str | None, **kwargs) -> str

View Source on GitHub

sanitize a filename to posix standards

def sanitize_identifier

(fname: str | None, **kwargs) -> str

View Source on GitHub

sanitize an identifier (variable or function name)

def dict_to_filename

(
    data: dict,
    format_str: str = '{key}_{val}',
    separator: str = '.',
    max_length: int = 255
)

View Source on GitHub
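The intent can be sketched as follows; this is a guess at the behavior from the signature alone (in particular, how the real function enforces max_length, e.g. by hashing, is an assumption left out here):

```python
def dict_to_filename(
    data: dict,
    format_str: str = "{key}_{val}",
    separator: str = ".",
    max_length: int = 255,
) -> str:
    # join one formatted "{key}_{val}" chunk per dict entry;
    # naively truncate to max_length (the real fallback may differ)
    name = separator.join(format_str.format(key=k, val=v) for k, v in data.items())
    return name[:max_length]

print(dict_to_filename({"lr": 0.1, "layers": 3}))  # lr_0.1.layers_3
```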

def dynamic_docstring

(**doc_params)

View Source on GitHub


Contents

miscellaneous utilities for ML pipelines

API Documentation

View Source on GitHub

muutils.mlutils

miscellaneous utilities for ML pipelines

View Source on GitHub

def get_device

(device: Union[str, torch.device, NoneType] = None) -> torch.device

View Source on GitHub

Get the torch.device instance on which torch.Tensors should be allocated.

def set_reproducibility

(seed: int = 42)

View Source on GitHub

Improve model reproducibility. See https://github.com/NVIDIA/framework-determinism for more information.

Deterministic operations tend to have worse performance than nondeterministic operations, so this method trades off performance for reproducibility. Set use_deterministic_algorithms to False to trade reproducibility back for performance.

def chunks

(it, chunk_size)

View Source on GitHub

Yield successive chunks from an iterator.
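A self-contained sketch of this kind of chunking helper, using itertools.islice so it works on any iterator, not just sequences:

```python
from itertools import islice

def chunks(it, chunk_size):
    # yield successive lists of up to chunk_size items from any iterator
    it = iter(it)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

print(list(chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```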

def get_checkpoint_paths_for_run

(
    run_path: pathlib.Path,
    extension: Literal['pt', 'zanj'],
    checkpoints_format: str = 'checkpoints/model.iter_*.{extension}'
) -> list[tuple[int, pathlib.Path]]

View Source on GitHub

get checkpoints of the format from the run_path

note that checkpoints_format should contain a glob pattern with:

- an unresolved "{extension}" format term for the extension
- a wildcard for the iteration number

def register_method

(
    method_dict: dict[str, typing.Callable[..., typing.Any]],
    custom_name: Optional[str] = None
) -> Callable[[~F], ~F]

View Source on GitHub

Decorator to add a method to the method_dict
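The registration pattern is simple enough to sketch in a few lines (illustrative, not the library source):

```python
from typing import Any, Callable, Dict, Optional

def register_method(
    method_dict: Dict[str, Callable[..., Any]],
    custom_name: Optional[str] = None,
):
    # decorator factory: store the function in method_dict under its own
    # name (or custom_name), then return the function unchanged
    def decorator(fn):
        method_dict[custom_name if custom_name is not None else fn.__name__] = fn
        return fn
    return decorator

METHODS: Dict[str, Callable[..., Any]] = {}

@register_method(METHODS)
def greet() -> str:
    return "hi"

print(sorted(METHODS))  # ['greet']
```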

def pprint_summary

(summary: dict)

View Source on GitHub


Contents

utilities for working with notebooks

Submodules

API Documentation

View Source on GitHub

muutils.nbutils

utilities for working with notebooks

View Source on GitHub

def mm

(graph)

View Source on GitHub

for plotting mermaid.js diagrams

docs for muutils v0.8.7

Contents

shared utilities for setting up a notebook

API Documentation

View Source on GitHub

muutils.nbutils.configure_notebook

shared utilities for setting up a notebook

View Source on GitHub

class PlotlyNotInstalledWarning(builtins.UserWarning):

View Source on GitHub

Base class for warnings generated by user code.

Inherited Members

class UnknownFigureFormatWarning(builtins.UserWarning):

View Source on GitHub

Base class for warnings generated by user code.

Inherited Members

def universal_savefig

(fname: str, fmt: str | None = None) -> None

View Source on GitHub

def setup_plots

(
    plot_mode: Literal['ignore', 'inline', 'widget', 'save'] = 'inline',
    fig_output_fmt: str | None = 'pdf',
    fig_numbered_fname: str = 'figure-{num}',
    fig_config: dict | None = None,
    fig_basepath: str | None = None,
    close_after_plotshow: bool = False
) -> None

View Source on GitHub

Set up plot saving/rendering options

def configure_notebook

(
    *args,
    seed: int = 42,
    device: Any = None,
    dark_mode: bool = True,
    plot_mode: Literal['ignore', 'inline', 'widget', 'save'] = 'inline',
    fig_output_fmt: str | None = 'pdf',
    fig_numbered_fname: str = 'figure-{num}',
    fig_config: dict | None = None,
    fig_basepath: str | None = None,
    close_after_plotshow: bool = False
) -> torch.device | None

View Source on GitHub

Shared Jupyter notebook setup steps

Parameters:

Returns:

def plotshow

(
    fname: str | None = None,
    plot_mode: Optional[Literal['ignore', 'inline', 'widget', 'save']] = None,
    fmt: str | None = None
)

View Source on GitHub

Show the active plot, depending on global configs


Contents

fast conversion of Jupyter Notebooks to scripts, with some basic and hacky filtering and formatting.

API Documentation

View Source on GitHub

muutils.nbutils.convert_ipynb_to_script

fast conversion of Jupyter Notebooks to scripts, with some basic and hacky filtering and formatting.

View Source on GitHub

def disable_plots_in_script

(script_lines: list[str]) -> list[str]

View Source on GitHub

Disable plots in a script by adding cursed things after the import statements

def convert_ipynb

(
    notebook: dict,
    strip_md_cells: bool = False,
    header_comment: str = '#%%',
    disable_plots: bool = False,
    filter_out_lines: Union[str, Sequence[str]] = ('%', '!')
) -> str

View Source on GitHub

Convert Jupyter Notebook to a script, doing some basic filtering and formatting.

Arguments

- `notebook: dict`: Jupyter Notebook loaded as json.
- `strip_md_cells: bool = False`: Remove markdown cells from the output script.
- `header_comment: str = r'#%%'`: Comment string to separate cells in the output script.
- `disable_plots: bool = False`: Disable plots in the output script.
- `filter_out_lines: str|typing.Sequence[str] = ('%', '!')`: comment out lines starting with these strings (in code blocks).
    if a string is passed, it is split into characters and each character is treated as a separate filter.

Returns

- `str`: Converted script.
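A stdlib-only sketch of the documented behavior (not the library implementation): markdown cells become comments or are dropped, and code lines starting with a filter prefix are commented out.

```python
def convert_ipynb(
    notebook: dict,
    strip_md_cells: bool = False,
    header_comment: str = "#%%",
    filter_out_lines=("%", "!"),
) -> str:
    cells_out = []
    for cell in notebook["cells"]:
        lines = "".join(cell["source"]).splitlines()
        if cell["cell_type"] == "markdown":
            if strip_md_cells:
                continue
            lines = ["# " + ln for ln in lines]
        else:
            # comment out magics / shell commands, keep everything else
            lines = [
                ("# " + ln) if ln.lstrip().startswith(tuple(filter_out_lines)) else ln
                for ln in lines
            ]
        cells_out.append("\n".join([header_comment] + lines))
    return "\n\n".join(cells_out) + "\n"

nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["%load_ext autoreload\n", "x = 1\n"]},
]}
print(convert_ipynb(nb))
```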

def process_file

(
    in_file: str,
    out_file: str | None = None,
    strip_md_cells: bool = False,
    header_comment: str = '#%%',
    disable_plots: bool = False,
    filter_out_lines: Union[str, Sequence[str]] = ('%', '!')
)

View Source on GitHub

def process_dir

(
    input_dir: Union[str, pathlib.Path],
    output_dir: Union[str, pathlib.Path],
    strip_md_cells: bool = False,
    header_comment: str = '#%%',
    disable_plots: bool = False,
    filter_out_lines: Union[str, Sequence[str]] = ('%', '!')
)

View Source on GitHub

Convert all Jupyter Notebooks in a directory to scripts.

Arguments

- `input_dir: str`: Input directory.
- `output_dir: str`: Output directory.
- `strip_md_cells: bool = False`: Remove markdown cells from the output script.
- `header_comment: str = r'#%%'`: Comment string to separate cells in the output script.
- `disable_plots: bool = False`: Disable plots in the output script.
- `filter_out_lines: str|typing.Sequence[str] = ('%', '!')`: comment out lines starting with these strings (in code blocks).
    if a string is passed, it is split into characters and each character is treated as a separate filter.


Contents

display mermaid.js diagrams in jupyter notebooks via the mermaid.ink/img service

API Documentation

View Source on GitHub

muutils.nbutils.mermaid

display mermaid.js diagrams in jupyter notebooks via the mermaid.ink/img service

View Source on GitHub

def mm

(graph)

View Source on GitHub

for plotting mermaid.js diagrams
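The core trick can be sketched like this: mermaid.ink serves a rendered image of a base64-encoded diagram, so building the URL is a one-liner (in a notebook, the real function would wrap this URL in an IPython display object; this sketch just returns the URL string):

```python
import base64

def mermaid_ink_url(graph: str) -> str:
    # mermaid.ink/img/<base64-of-diagram> returns a rendered PNG
    encoded = base64.b64encode(graph.encode("utf-8")).decode("ascii")
    return "https://mermaid.ink/img/" + encoded

print(mermaid_ink_url("graph LR; A-->B"))
```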


Contents

quickly print a sympy expression in latex

API Documentation

View Source on GitHub

muutils.nbutils.print_tex

quickly print a sympy expression in latex

View Source on GitHub

def print_tex

(
    expr: sympy.core.expr.Expr,
    name: str | None = None,
    plain: bool = False,
    rendered: bool = True
)

View Source on GitHub

function for easily rendering a sympy expression in latex


Contents

turn a folder of notebooks into scripts, run them, and make sure they work.

made to be called as

python -m muutils.nbutils.run_notebook_tests --notebooks-dir <notebooks_dir> --converted-notebooks-temp-dir <converted_notebooks_temp_dir>

API Documentation

View Source on GitHub

muutils.nbutils.run_notebook_tests

turn a folder of notebooks into scripts, run them, and make sure they work.

made to be called as

python -m muutils.nbutils.run_notebook_tests --notebooks-dir <notebooks_dir> --converted-notebooks-temp-dir <converted_notebooks_temp_dir>

View Source on GitHub

class NotebookTestError(builtins.Exception):

View Source on GitHub

Common base class for all non-exit exceptions.

Inherited Members

def run_notebook_tests

(
    notebooks_dir: pathlib.Path,
    converted_notebooks_temp_dir: pathlib.Path,
    CI_output_suffix: str = '.CI-output.txt',
    run_python_cmd: Optional[str] = None,
    run_python_cmd_fmt: str = '{python_tool} run python',
    python_tool: str = 'poetry',
    exit_on_first_fail: bool = False
)

View Source on GitHub

Run converted Jupyter notebooks as Python scripts and verify they execute successfully.

Takes a directory of notebooks and their corresponding converted Python scripts, executes each script, and captures the output. Failures are collected and reported, with optional early exit on first failure.

Parameters:

Returns:

Modifies:

Raises:

Usage:

>>> run_notebook_tests(
...     notebooks_dir=Path("notebooks"),
...     converted_notebooks_temp_dir=Path("temp/converted"),
...     python_tool="poetry"
... )
### testing notebooks in 'notebooks'
### reading converted notebooks from 'temp/converted'
Running 1/2: temp/converted/notebook1.py
    Output in temp/converted/notebook1.CI-output.txt
    {SUCCESS_STR} Run completed with return code 0


API Documentation

View Source on GitHub

muutils.parallel

View Source on GitHub

class ProgressBarFunction(typing.Protocol):

View Source on GitHub

a protocol for a progress bar function

ProgressBarFunction

(*args, **kwargs)

View Source on GitHub

def spinner_fn_wrap

(x: Iterable, **kwargs) -> List

View Source on GitHub

spinner wrapper

def map_kwargs_for_tqdm

(kwargs: dict) -> dict

View Source on GitHub

map kwargs for tqdm; we can't wrap it directly, because the progress bar disappears

def no_progress_fn_wrap

(x: Iterable, **kwargs) -> Iterable

View Source on GitHub

fallback to no progress bar

def set_up_progress_bar_fn

(
    pbar: Union[muutils.parallel.ProgressBarFunction, Literal['tqdm', 'spinner', 'none', None]],
    pbar_kwargs: Optional[Dict[str, Any]] = None,
    **extra_kwargs
) -> Tuple[muutils.parallel.ProgressBarFunction, dict]

View Source on GitHub

set up the progress bar function and its kwargs

Parameters:

Returns:

Raises:

def run_maybe_parallel

(
    func: Callable[[~InputType], ~OutputType],
    iterable: Iterable[~InputType],
    parallel: Union[bool, int],
    pbar_kwargs: Optional[Dict[str, Any]] = None,
    chunksize: Optional[int] = None,
    keep_ordered: bool = True,
    use_multiprocess: bool = False,
    pbar: Union[muutils.parallel.ProgressBarFunction, Literal['tqdm', 'spinner', 'none', None]] = 'tqdm'
) -> List[~OutputType]

View Source on GitHub

a function to make it easier to sometimes parallelize an operation

the maximum number of processes is min(len(iterable), multiprocessing.cpu_count())

Parameters:

Returns:

Raises:
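The control flow can be sketched as follows. Note the deliberate simplifications: the real function uses process pools (multiprocessing or multiprocess) and progress bars; this self-contained sketch uses threads and omits the pbar handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar, Union

InputType = TypeVar("InputType")
OutputType = TypeVar("OutputType")

def run_maybe_parallel(
    func: Callable[[InputType], OutputType],
    iterable: Iterable[InputType],
    parallel: Union[bool, int],
) -> List[OutputType]:
    # parallel=False -> plain loop; parallel=True -> one worker per item;
    # parallel=<int> -> that many workers. map() preserves input order.
    items = list(iterable)
    if parallel is False or not items:
        return [func(x) for x in items]
    n_workers = len(items) if parallel is True else int(parallel)
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        return list(pool.map(func, items))

print(run_maybe_parallel(lambda x: x * x, range(5), parallel=2))  # [0, 1, 4, 9, 16]
```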


Contents

decorator spinner_decorator and context manager SpinnerContext to display a spinner using the base Spinner class while some code is running.

API Documentation

View Source on GitHub

muutils.spinner

decorator spinner_decorator and context manager SpinnerContext to display a spinner using the base Spinner class while some code is running.

View Source on GitHub

Define a generic type for the decorated function

class SpinnerConfig:

View Source on GitHub

SpinnerConfig

(working: List[str] = <factory>, success: str = '✔️', fail: str = '❌')

def is_ascii

(self) -> bool

View Source on GitHub

whether all characters are ascii

def eq_lens

(self) -> bool

View Source on GitHub

whether all working characters are the same length

def is_valid

(self) -> bool

View Source on GitHub

whether the spinner config is valid

def from_any

(
    cls,
    arg: Union[str, List[str], muutils.spinner.SpinnerConfig, dict]
) -> muutils.spinner.SpinnerConfig

View Source on GitHub

class Spinner:

View Source on GitHub

displays a spinner, and optionally elapsed time and a mutable value while a function is running.

Parameters:

Deprecated Parameters:

Methods:

Usage:

As a context manager:

with SpinnerContext() as sp:
    for i in range(1):
        time.sleep(0.1)
        sp.update_value(f"Step {i+1}")

As a decorator:

@spinner_decorator(mutable_kwarg_key="update_status")
def long_running_function(update_status):
    for i in range(1):
        time.sleep(0.1)
        update_status(f"Step {i+1}")
    return "Function completed"

Spinner

(
    *args,
    config: Union[str, List[str], muutils.spinner.SpinnerConfig, dict] = 'default',
    update_interval: float = 0.1,
    initial_value: str = '',
    message: str = '',
    format_string: str = '\r{spinner} ({elapsed_time:.2f}s) {message}{value}',
    output_stream: <class 'TextIO'> = <_io.StringIO object>,
    format_string_when_updated: Union[str, bool] = False,
    spinner_chars: Union[str, Sequence[str], NoneType] = None,
    spinner_complete: Optional[str] = None,
    **kwargs: Any
)

View Source on GitHub

format string to use when the value is updated

for measuring elapsed time

to stop the spinner

the thread running the spinner

whether the value has been updated since the last display

width of the terminal, for padding with spaces

def spin

(self) -> None

View Source on GitHub

Function to run in a separate thread, displaying the spinner and optional information

def update_value

(self, value: Any) -> None

View Source on GitHub

Update the current value displayed by the spinner

def start

(self) -> None

View Source on GitHub

Start the spinner

def stop

(self, failed: bool = False) -> None

View Source on GitHub

Stop the spinner

class NoOpContextManager(typing.ContextManager):

View Source on GitHub

A context manager that does nothing.

NoOpContextManager

(*args, **kwargs)

View Source on GitHub

class SpinnerContext(Spinner, typing.ContextManager):

View Source on GitHub

displays a spinner, and optionally elapsed time and a mutable value while a function is running.

Parameters:

Deprecated Parameters:

Methods:

Usage:

As a context manager:

with SpinnerContext() as sp:
    for i in range(1):
        time.sleep(0.1)
        sp.update_value(f"Step {i+1}")

As a decorator:

@spinner_decorator(mutable_kwarg_key="update_status")
def long_running_function(update_status):
    for i in range(1):
        time.sleep(0.1)
        update_status(f"Step {i+1}")
    return "Function completed"

Inherited Members

def spinner_decorator

(
    *args,
    config: Union[str, List[str], muutils.spinner.SpinnerConfig, dict] = 'default',
    update_interval: float = 0.1,
    initial_value: str = '',
    message: str = '',
    format_string: str = '{spinner} ({elapsed_time:.2f}s) {message}{value}',
    output_stream: <class 'TextIO'> = <_io.StringIO object>,
    mutable_kwarg_key: Optional[str] = None,
    spinner_chars: Union[str, Sequence[str], NoneType] = None,
    spinner_complete: Optional[str] = None,
    **kwargs
) -> Callable[[~DecoratedFunction], ~DecoratedFunction]

View Source on GitHub

displays a spinner, and optionally elapsed time and a mutable value while a function is running.

Parameters:

Deprecated Parameters:

Methods:

Usage:

As a context manager:

with SpinnerContext() as sp:
    for i in range(1):
        time.sleep(0.1)
        sp.update_value(f"Step {i+1}")

As a decorator:

@spinner_decorator(mutable_kwarg_key="update_status")
def long_running_function(update_status):
    for i in range(1):
        time.sleep(0.1)
        update_status(f"Step {i+1}")
    return "Function completed"


Contents

StatCounter class for counting and calculating statistics on numbers

cleaner and more efficient than just using a Counter or array

API Documentation

View Source on GitHub

muutils.statcounter

StatCounter class for counting and calculating statistics on numbers

cleaner and more efficient than just using a Counter or array

View Source on GitHub

def universal_flatten

(
    arr: Union[Sequence[Union[float, int, Sequence[Union[float, int, ForwardRef('NumericSequence')]]]], float, int],
    require_rectangular: bool = True
) -> Sequence[Union[float, int, ForwardRef('NumericSequence')]]

View Source on GitHub

flattens any iterable

class StatCounter(collections.Counter):

View Source on GitHub

Counter, but with some stat calculation methods which assume the keys are numerical

works best when the keys are ints
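The core idea can be sketched with a plain Counter: statistics fall out of the (value, count) pairs directly, with no need to materialize Counter.elements(). The nearest-rank percentile rule below is a simplification (the real class may interpolate between values):

```python
from collections import Counter

counts = Counter([1, 1, 2, 3, 3, 3])
total = sum(counts.values())

# mean from (value, count) pairs
mean = sum(value * count for value, count in counts.items()) / total

def percentile(counts: Counter, p: float) -> float:
    # walk the sorted keys, accumulating counts, until we pass rank p*(n-1)
    target = p * (sum(counts.values()) - 1)
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative > target:
            return value
    raise ValueError("empty counter")

print(round(mean, 3))           # 2.167
print(percentile(counts, 0.5))  # 2
```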

def validate

(self) -> bool

View Source on GitHub

validate the counter as being all floats or ints

def min

(self)

View Source on GitHub

minimum value

def max

(self)

View Source on GitHub

maximum value

def total

(self)

View Source on GitHub

Sum of the counts

View Source on GitHub

return the keys

def percentile

(self, p: float)

View Source on GitHub

return the value at the given percentile

this could be log time if we did binary search, but that would be a lot of added complexity

def median

(self) -> float

View Source on GitHub

def mean

(self) -> float

View Source on GitHub

return the mean of the values

def mode

(self) -> float

View Source on GitHub

def std

(self) -> float

View Source on GitHub

return the standard deviation of the values

def summary

(
    self,
    typecast: Callable = <function StatCounter.<lambda>>,
    *,
    extra_percentiles: Optional[list[float]] = None
) -> dict[str, typing.Union[float, int]]

View Source on GitHub

return a summary of the stats, without the raw data. human readable and small

def serialize

(
    self,
    typecast: Callable = <function StatCounter.<lambda>>,
    *,
    extra_percentiles: Optional[list[float]] = None
) -> dict

View Source on GitHub

return a json-serializable version of the counter

includes both the output of summary and the raw data:

{
    "StatCounter": { <keys, values from raw data> },
    "summary": self.summary(typecast, extra_percentiles=extra_percentiles),
}


def load

(cls, data: dict) -> muutils.statcounter.StatCounter

View Source on GitHub

load from the output of StatCounter.serialize

def from_list_arrays

(
    cls,
    arr,
    map_func: Callable = <class 'float'>
) -> muutils.statcounter.StatCounter

View Source on GitHub

calls map_func on each element of universal_flatten(arr)

Inherited Members


Contents

utilities for getting information about the system, see SysInfo class

API Documentation

View Source on GitHub

muutils.sysinfo

utilities for getting information about the system, see SysInfo class

View Source on GitHub

class SysInfo:

View Source on GitHub

getters for various information about the system

def python

() -> dict

View Source on GitHub

details about python version

def pip

() -> dict

View Source on GitHub

installed packages info

def pytorch

() -> dict

View Source on GitHub

pytorch and cuda information

def platform

() -> dict

View Source on GitHub

def git_info

(with_log: bool = False) -> dict

View Source on GitHub

def get_all

(
    cls,
    include: Optional[tuple[str, ...]] = None,
    exclude: tuple[str, ...] = ()
) -> dict

View Source on GitHub


API Documentation

View Source on GitHub

muutils.tensor_info

View Source on GitHub

Symbols for different formats

characters for sparklines in different formats

def array_info

(A: Any, hist_bins: int = 5) -> Dict[str, Any]

View Source on GitHub

Extract statistical information from an array-like object.

Parameters:

Returns:

def generate_sparkline

(
    histogram: numpy.ndarray,
    format: Literal['unicode', 'latex', 'ascii'] = 'unicode',
    log_y: bool = False
) -> str

View Source on GitHub

Generate a sparkline visualization of the histogram.

Parameters:

Returns:

def array_summary

(
    array,
    fmt: Literal['unicode', 'latex', 'ascii'] = <muutils.tensor_info._UseDefaultType object>,
    precision: int = <muutils.tensor_info._UseDefaultType object>,
    stats: bool = <muutils.tensor_info._UseDefaultType object>,
    shape: bool = <muutils.tensor_info._UseDefaultType object>,
    dtype: bool = <muutils.tensor_info._UseDefaultType object>,
    device: bool = <muutils.tensor_info._UseDefaultType object>,
    requires_grad: bool = <muutils.tensor_info._UseDefaultType object>,
    sparkline: bool = <muutils.tensor_info._UseDefaultType object>,
    sparkline_bins: int = <muutils.tensor_info._UseDefaultType object>,
    sparkline_logy: bool = <muutils.tensor_info._UseDefaultType object>,
    colored: bool = <muutils.tensor_info._UseDefaultType object>,
    eq_char: str = <muutils.tensor_info._UseDefaultType object>,
    as_list: bool = <muutils.tensor_info._UseDefaultType object>
) -> Union[str, List[str]]

View Source on GitHub

Format array information into a readable summary.

Parameters:

Returns:


Contents

utilities for working with tensors and arrays.

notably:

API Documentation

View Source on GitHub

muutils.tensor_utils

utilities for working with tensors and arrays.

notably:

View Source on GitHub

dict mapping python, numpy, and torch types to jaxtyping types

def jaxtype_factory

(
    name: str,
    array_type: type,
    default_jax_dtype=<class 'jaxtyping.Float'>,
    legacy_mode: Union[muutils.errormode.ErrorMode, str] = ErrorMode.Warn
) -> type

View Source on GitHub

usage:

ATensor = jaxtype_factory("ATensor", torch.Tensor, jaxtyping.Float)
x: ATensor["dim1 dim2", np.float32]

def numpy_to_torch_dtype

(dtype: Union[numpy.dtype, torch.dtype]) -> torch.dtype

View Source on GitHub

convert numpy dtype to torch dtype

list of all the python, numpy, and torch numerical types I could think of

mapping from string representations of types to their type

mapping from string representations of types to specifically torch types

def pad_tensor

(
    tensor: jaxtyping.Shaped[Tensor, 'dim1'],
    padded_length: int,
    pad_value: float = 0.0,
    rpad: bool = False
) -> jaxtyping.Shaped[Tensor, 'padded_length']

View Source on GitHub

pad a 1-d tensor on the left with pad_value to length padded_length

set rpad = True to pad on the right instead
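The padding semantics are easy to see on plain lists (a sketch only; the real functions operate on torch tensors / numpy arrays):

```python
def pad_list(values: list, padded_length: int, pad_value: float = 0.0, rpad: bool = False) -> list:
    # left-pad by default; right-pad when rpad=True; never truncate
    padding = [pad_value] * max(0, padded_length - len(values))
    return values + padding if rpad else padding + values

print(pad_list([1, 2, 3], 5))             # [0.0, 0.0, 1, 2, 3]
print(pad_list([1, 2, 3], 5, rpad=True))  # [1, 2, 3, 0.0, 0.0]
```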

def lpad_tensor

(
    tensor: torch.Tensor,
    padded_length: int,
    pad_value: float = 0.0
) -> torch.Tensor

View Source on GitHub

pad a 1-d tensor on the left with pad_value to length padded_length

def rpad_tensor

(
    tensor: torch.Tensor,
    pad_length: int,
    pad_value: float = 0.0
) -> torch.Tensor

View Source on GitHub

pad a 1-d tensor on the right with pad_value to length pad_length

def pad_array

(
    array: jaxtyping.Shaped[ndarray, 'dim1'],
    padded_length: int,
    pad_value: float = 0.0,
    rpad: bool = False
) -> jaxtyping.Shaped[ndarray, 'padded_length']

View Source on GitHub

pad a 1-d array on the left with pad_value to length padded_length

set rpad = True to pad on the right instead

def lpad_array

(
    array: numpy.ndarray,
    padded_length: int,
    pad_value: float = 0.0
) -> numpy.ndarray

View Source on GitHub

pad a 1-d array on the left with pad_value to length padded_length

def rpad_array

(
    array: numpy.ndarray,
    pad_length: int,
    pad_value: float = 0.0
) -> numpy.ndarray

View Source on GitHub

pad a 1-d array on the right with pad_value to length pad_length

def get_dict_shapes

(d: dict[str, torch.Tensor]) -> dict[str, tuple[int, ...]]

View Source on GitHub

given a state dict or cache dict, compute the shapes and put them in a nested dict

def string_dict_shapes

(d: dict[str, torch.Tensor]) -> str

View Source on GitHub

printable version of get_dict_shapes

class StateDictCompareError(builtins.AssertionError):

View Source on GitHub

raised when state dicts don’t match

Inherited Members

class StateDictKeysError(StateDictCompareError):

View Source on GitHub

raised when state dict keys don’t match

Inherited Members

class StateDictShapeError(StateDictCompareError):

View Source on GitHub

raised when state dict shapes don’t match

Inherited Members

class StateDictValueError(StateDictCompareError):

View Source on GitHub

raised when state dict values don’t match

Inherited Members

def compare_state_dicts

(
    d1: dict,
    d2: dict,
    rtol: float = 1e-05,
    atol: float = 1e-08,
    verbose: bool = True
) -> None

View Source on GitHub

compare two dicts of tensors

Parameters:

Raises:


Contents

timeit_fancy is just a fancier version of timeit with more options

API Documentation

View Source on GitHub

muutils.timeit_fancy

timeit_fancy is just a fancier version of timeit with more options

View Source on GitHub

class FancyTimeitResult(typing.NamedTuple):

View Source on GitHub

return type of timeit_fancy

FancyTimeitResult

(
    timings: ForwardRef('StatCounter'),
    return_value: ForwardRef('T'),
    profile: ForwardRef('Union[pstats.Stats, None]')
)

Create new instance of FancyTimeitResult(timings, return_value, profile)

Alias for field number 0

Alias for field number 1

Alias for field number 2

Inherited Members

def timeit_fancy

(
    cmd: Union[Callable[[], ~T], str],
    setup: Union[str, Callable[[], Any]] = <function <lambda>>,
    repeats: int = 5,
    namespace: Optional[dict[str, Any]] = None,
    get_return: bool = True,
    do_profiling: bool = False
) -> muutils.timeit_fancy.FancyTimeitResult

View Source on GitHub

Wrapper for timeit to get the fastest run of a callable with more customization options.

Approximates the functionality of the %timeit magic or command line interface in a Python callable.

Parameters

Returns

FancyTimeitResult, which is a NamedTuple with the following fields:


Contents

experimental utility for validating types in python, see validate_type

API Documentation

View Source on GitHub

muutils.validate_type

experimental utility for validating types in python, see validate_type

View Source on GitHub

class IncorrectTypeException(builtins.TypeError):

View Source on GitHub

Inappropriate argument type.

Inherited Members

class TypeHintNotImplementedError(builtins.NotImplementedError):

View Source on GitHub

Method or function hasn’t been implemented yet.

Inherited Members

class InvalidGenericAliasError(builtins.TypeError):

View Source on GitHub

Inappropriate argument type.

Inherited Members

def validate_type

(value: Any, expected_type: Any, do_except: bool = False) -> bool

View Source on GitHub

Validate that a value is of the expected_type

Parameters

Returns

Raises

use typeguard for a more robust solution: https://github.com/agronholm/typeguard
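The approach can be sketched with typing.get_origin / typing.get_args; this handles only plain types, Union/Optional, and list[T], whereas the real function covers many more generic aliases:

```python
import typing

def validate_type_sketch(value, expected_type) -> bool:
    # Any matches everything
    if expected_type is typing.Any:
        return True
    origin = typing.get_origin(expected_type)
    args = typing.get_args(expected_type)
    # plain (non-generic) type: fall back to isinstance
    if origin is None:
        return isinstance(value, expected_type)
    # Union / Optional: any branch may match
    if origin is typing.Union:
        return any(validate_type_sketch(value, a) for a in args)
    # list[T]: check the container, then every element
    if origin is list:
        return isinstance(value, list) and all(validate_type_sketch(v, args[0]) for v in value)
    raise NotImplementedError(f"unsupported type hint: {expected_type}")

print(validate_type_sketch([1, 2], list[int]))           # True
print(validate_type_sketch([1, "x"], list[int]))         # False
print(validate_type_sketch(None, typing.Optional[int]))  # True
```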

def get_fn_allowed_kwargs

(fn: Callable) -> Set[str]

View Source on GitHub

Get the allowed kwargs for a function, raising an exception if the signature cannot be determined.
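A sketch of this using inspect.signature, which raises ValueError when no signature can be determined (e.g. for some builtins):

```python
import inspect
from typing import Callable, Set

def get_fn_allowed_kwargs(fn: Callable) -> Set[str]:
    # collect every parameter name that can be passed by keyword;
    # *args and **kwargs themselves are excluded
    sig = inspect.signature(fn)
    return {
        name
        for name, param in sig.parameters.items()
        if param.kind in (param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY)
    }

def example(a, b=1, *args, c=2, **kwargs):
    return a

print(sorted(get_fn_allowed_kwargs(example)))  # ['a', 'b', 'c']
```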