Parse binary data using declarative field layout and native Python properties

ByteField

A Python library for parsing and manipulating binary data through easily accessible Python properties, using a declarative, Django-inspired syntax. The library is still in development. ByteField supports:

  • Variable length fields
  • Nested structures
  • Parsing only accessed fields
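To make "parsing only accessed fields" concrete, here is a minimal sketch of the general idea built on Python descriptors and the standard struct module. It is not ByteField's implementation: LazyField, LazyStruct and Point are hypothetical names, and the sketch only handles fixed-size little-endian scalars laid out in declaration order.

```python
import struct
from typing import Optional


class LazyField:
    """Hypothetical field descriptor: decodes its bytes only when the attribute is read."""

    def __init__(self, fmt: str) -> None:
        self.fmt = "<" + fmt                  # little-endian, no alignment padding
        self.size = struct.calcsize(self.fmt)
        self.offset = 0                       # assigned by LazyStruct.__init_subclass__

    def __get__(self, obj, objtype=None):
        if obj is None:                       # accessed on the class itself
            return self
        # Decoding happens here, on attribute access, not when the struct is built
        (value,) = struct.unpack_from(self.fmt, obj.data, self.offset)
        return value

    def __set__(self, obj, value) -> None:
        # Encode the new value directly into the shared buffer
        struct.pack_into(self.fmt, obj.data, self.offset, value)


class LazyStruct:
    """Hypothetical base class that assigns field offsets in declaration order."""

    byte_size = 0

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        offset = 0
        for attr in vars(cls).values():       # the class namespace keeps definition order
            if isinstance(attr, LazyField):
                attr.offset = offset
                offset += attr.size
        cls.byte_size = offset

    def __init__(self, data: Optional[bytes] = None) -> None:
        # Keep a mutable copy so setters can write into it
        self.data = bytearray(data) if data is not None else bytearray(self.byte_size)


class Point(LazyStruct):
    x = LazyField("i")   # 4-byte signed int at offset 0
    y = LazyField("f")   # 4-byte float at offset 4


p = Point(struct.pack("<if", 7, 1.5))
print(p.x, p.y)  # only now are the two fields decoded
```

Because each descriptor reads straight from the shared buffer, a field that is never accessed is never decoded, and writes land in the same bytes that a later read will see.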

Quick example

ByteField lets you define a binary data layout declaratively; the declaration then maps onto the underlying bytes:

from bytefield import *

class Header(ByteStruct):
    magic = StringField(length=5)
    length = IntegerField()
    array = ArrayField(shape=None, elem_field_type=IntegerField)
    floating = FloatField()

header = Header(magic='bytes', floating=3.14)
header.length = 3
header.array = list(range(1, header.length + 1))
print(header.data)

Output:

bytearray(b'bytes\x03\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\xc3\xf5H@')
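For comparison, the same 25 bytes can be produced and read back with Python's standard struct module. Judging by the output above, IntegerField defaults to a 4-byte little-endian integer and FloatField to a 4-byte IEEE 754 float; the format string below encodes that reading and hard-codes an array of exactly three elements, so treat it as an illustration of the layout rather than a description of ByteField's internals.

```python
import struct

# Stdlib equivalent of the Header layout when `array` holds 3 items (assumption):
# '<' little-endian, no alignment padding
# 5s magic (5 bytes), i length (int32), 3i array (3 x int32), f floating (float32)
HEADER_FORMAT = "<5si3if"

packed = struct.pack(HEADER_FORMAT, b"bytes", 3, 1, 2, 3, 3.14)
print(bytearray(packed))

# Unpacking returns the fields in declaration order
magic, length, a0, a1, a2, floating = struct.unpack(HEADER_FORMAT, packed)
```

Note that floating comes back as the nearest float32 value, which is close to but not exactly 3.14.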

Example: parse a JPEG header

Structures can be nested: declare a structure once, then embed it in another structure with StructField or use it as the element type of an ArrayField:

from bytefield import *

class RGB(ByteStruct):
    r = IntegerField(signed=False, size=1)
    g = IntegerField(signed=False, size=1)
    b = IntegerField(signed=False, size=1)

class Marker(ByteStruct):
    marker = IntegerField(size=2, signed=False)
    length = IntegerField(size=2, signed=False)
    identifier = StringField(length=5, encoding='ascii')
    version = IntegerField(size=2, signed=False)
    density = IntegerField(size=1, signed=False)
    x_density = IntegerField(size=2, signed=False)
    y_density = IntegerField(size=2, signed=False)
    x_thumbnail = IntegerField(size=1, signed=False)  # JFIF stores the thumbnail width in 1 byte
    y_thumbnail = IntegerField(size=1, signed=False)  # and the thumbnail height in 1 byte
    thumb_data = ArrayField(shape=None, elem_field_type=RGB)

class JPEGHeader(ByteStruct):
    soi = IntegerField(size=2, signed=False)
    marker = StructField(Marker)

with open('image.jpg', 'rb') as f:
    # Parse the JPEG header
    header = JPEGHeader(bytearray(f.read()))

    # Resize the thumbnail data
    header.marker.resize(
        Marker.thumb_data_field, header.marker.x_thumbnail * header.marker.y_thumbnail
    )

    # Display the thumbnail (display_thumbnail is a placeholder for your own rendering code)
    display_thumbnail(header.marker.thumb_data)
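For reference, here is a sketch of the same JFIF APP0 header decoded with the standard struct module. The '>' prefix matters: JPEG stores multi-byte integers big-endian. parse_jfif_header is a hypothetical helper, and it assumes the file starts with SOI followed directly by an APP0 segment; files that begin with an APP1 (Exif) segment instead are rejected by this sketch.

```python
import struct

# JFIF APP0 layout after SOI (big-endian):
# H soi, H marker, H length, 5s identifier, H version, B density units,
# H x_density, H y_density, B x_thumbnail, B y_thumbnail
JFIF_FORMAT = ">HHH5sHBHHBB"
JFIF_SIZE = struct.calcsize(JFIF_FORMAT)  # 20 bytes


def parse_jfif_header(data: bytes) -> dict:
    """Hypothetical helper: decode the JFIF APP0 header and its RGB thumbnail."""
    (soi, marker, length, identifier, version, units,
     x_density, y_density, x_thumb, y_thumb) = struct.unpack_from(JFIF_FORMAT, data)
    if soi != 0xFFD8 or marker != 0xFFE0:
        raise ValueError("not a JFIF file starting with an APP0 segment")
    # The thumbnail holds x_thumb * y_thumb RGB pixels, 3 bytes each
    thumb_bytes = data[JFIF_SIZE:JFIF_SIZE + 3 * x_thumb * y_thumb]
    thumbnail = list(struct.iter_unpack("BBB", thumb_bytes))
    return {
        "length": length,
        "identifier": identifier,
        "version": version,
        "units": units,
        "density": (x_density, y_density),
        "thumbnail_size": (x_thumb, y_thumb),
        "thumbnail": thumbnail,
    }
```

Usage: `parse_jfif_header(open('image.jpg', 'rb').read())["thumbnail"]` yields a list of (r, g, b) tuples, playing the role of the RGB array in the Marker struct.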

Writing custom struct logic

You can create higher-level structures whose behavior depends on the data they contain:

from bytefield import *

class DynamicFloatArray(ByteStruct):
    length = IntegerField(signed=False)
    array_data = ArrayField(None, FloatField)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # When instantiated, resize the array according to its length
        self.resize(DynamicFloatArray.array_data_field, self.length)

data = bytearray(b'\x03\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@')
print(DynamicFloatArray(data))

Output:

[DynamicFloatArray object at 0x1c88e709e50]
length (int): 3
array_data (ndarray): [1.0 2.0 3.0]
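The DynamicFloatArray layout is a classic length-prefixed array. A minimal stdlib sketch of the same idea, assuming a 4-byte unsigned little-endian count followed by that many float32 values (which matches the sample bytes above), looks like this; parse_float_array and build_float_array are hypothetical names.

```python
import struct


def parse_float_array(data: bytes) -> list:
    """Read a uint32 count, then exactly that many little-endian float32 values."""
    (length,) = struct.unpack_from("<I", data, 0)
    # The values start right after the 4-byte count
    return list(struct.unpack_from(f"<{length}f", data, 4))


def build_float_array(values: list) -> bytes:
    """Inverse of parse_float_array: write the count, then the values."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)


data = bytearray(b'\x03\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@')
print(parse_float_array(data))  # [1.0, 2.0, 3.0]
```

This mirrors what the overridden `__init__` does: the count must be decoded first, because it determines how many bytes belong to the array.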

Variable fields

ByteField supports fields whose type or size is not known until the struct is parsed:

from bytefield import *

TYPE_INTEGER = 0
TYPE_FLOAT = 1
TYPE_STRING = 2

class DynamicString(ByteStruct):
    length = IntegerField(signed=False)
    str = StringField(None)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.resize(DynamicString.str_field, self.length)

class Content(ByteStruct):
    content_type = IntegerField(signed=False, size=2)
    content_data = VariableField()  # a variable field that will be resized when parsing the struct

    def __init__(self, data: bytearray = None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        resize_bytes = not bool(data)
        if self.content_type == TYPE_INTEGER:
            self.resize(Content.content_data_field, IntegerField(), resize_bytes=resize_bytes)
        elif self.content_type == TYPE_FLOAT:
            self.resize(Content.content_data_field, FloatField(), resize_bytes=resize_bytes)
        elif self.content_type == TYPE_STRING:
            self.resize(Content.content_data_field, StructField(DynamicString), resize_bytes=resize_bytes)

write = Content()
write.content_type = TYPE_STRING
write.resize(Content.content_data_field, StructField(DynamicString), resize_bytes=True)
write.content_data.str = 'content'
write.content_data.length = len(write.content_data.str)

read = Content(write.data)
print(f'{write.data} is parsed to:\n{read}')

Output:

bytearray(b'\x02\x00\x07\x00\x00\x00content') is parsed to:
[Content object at 0x1c1846888b0]
content_type (int): 2
content_data (DynamicString):
        length (int): 7
        str (str): content
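Content is a tagged union: a type tag decides how the rest of the bytes are read. The sketch below shows the same pattern with the standard struct module. It assumes a 2-byte unsigned tag, a 4-byte signed integer, a 4-byte float, and a string prefixed by a 4-byte unsigned length and encoded as UTF-8; the integer signedness and the string encoding are assumptions, and encode_content/decode_content are hypothetical names.

```python
import struct

TYPE_INTEGER = 0
TYPE_FLOAT = 1
TYPE_STRING = 2


def encode_content(content_type: int, value) -> bytes:
    """Write a uint16 tag followed by a tag-dependent payload."""
    header = struct.pack("<H", content_type)
    if content_type == TYPE_INTEGER:
        return header + struct.pack("<i", value)
    if content_type == TYPE_FLOAT:
        return header + struct.pack("<f", value)
    if content_type == TYPE_STRING:
        raw = value.encode("utf-8")  # assumed encoding
        # Length-prefixed string, like DynamicString
        return header + struct.pack("<I", len(raw)) + raw
    raise ValueError(f"unknown content type {content_type}")


def decode_content(data: bytes):
    """Read the tag first, then decide how to interpret the remaining bytes."""
    (content_type,) = struct.unpack_from("<H", data, 0)
    if content_type == TYPE_INTEGER:
        return content_type, struct.unpack_from("<i", data, 2)[0]
    if content_type == TYPE_FLOAT:
        return content_type, struct.unpack_from("<f", data, 2)[0]
    if content_type == TYPE_STRING:
        (length,) = struct.unpack_from("<I", data, 2)
        return content_type, bytes(data[6:6 + length]).decode("utf-8")
    raise ValueError(f"unknown content type {content_type}")


print(decode_content(b'\x02\x00\x07\x00\x00\x00content'))  # (2, 'content')
```

The ByteField version expresses the same dispatch by resizing the VariableField to a concrete field type once content_type is known.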

Download files

Source Distribution

bytefield-1.0.2.tar.gz (17.1 kB)