Input / Output

The dh_segment.io module implements input / output functions and classes.

Input functions for tf.Estimator

Input function

input_fn(input_data, params[, …])

Input_fn for estimator

Data augmentation

data_augmentation_fn(input_image, label_image)

Applies data augmentation to both images and label images.

extract_patches_fn(image, patch_shape, offsets)

Will cut a given image into patches.

rotate_crop(image, rotation[, crop, …])

Rotates and crops the images.

Resizing function

resize_image(image, size[, interpolation])

Resizes the image

load_and_resize_image(filename, channels[, …])

Loads an image from its filename and resizes it to the desired output size.

Tensorflow serving functions

serving_input_filename(resized_size)

serving_input_image()


PAGE XML and JSON import / export

PAGE classes

PAGE.Point(y, x)

Point (x,y) class.

PAGE.Text([text_equiv, alternatives, score])

Text entity produced by a transcription system.

PAGE.Border([coords, id])

Region containing the page.

PAGE.TextRegion([id, coords, text_lines, …])

Region containing text lines.

PAGE.TextLine([id, coords, baseline, text, …])

Region corresponding to a text line.

PAGE.GraphicRegion([id, coords, …])

Region containing simple graphics.

PAGE.TableRegion([id, coords, rows, …])

Tabular data in any form.

PAGE.SeparatorRegion(id[, coords, …])

Lines separating columns or paragraphs.

PAGE.GroupSegment([id, coords, segment_ids, …])

Set of regions that make a bigger region (group).

PAGE.Metadata([creator, created, …])

Metadata information.

PAGE.Page(**kwargs)

Class following PAGE-XML object.

Abstract classes

PAGE.BaseElement

Base page element class.

PAGE.Region([id, coords, custom_attribute])

Region base class.

Parsing and helpers

PAGE.parse_file(filename)

Parses the files to create the corresponding Page object.

PAGE.json_serialize(dict_to_serialize[, …])

Serialize a dictionary in order to export it.


VGG Image Annotator helpers

VIA objects

via.WorkingItem

A container for annotated images.

via.VIAttribute

A container for VIA attributes.

Creating masks with VIA annotations

via.load_annotation_data(via_data_filename)

Load the content of via annotation files.

via.export_annotation_dict(annotation_dict, …)

Export the annotations to json file.

via.get_annotations_per_file(via_dict, name_file)

From VIA json content, get annotations relative to the given name_file.

via.parse_via_attributes(via_attributes)

Parses the VIA attribute dictionary and returns a list of VIAttribute instances

via.get_via_attributes(annotation_dict[, …])

Gets the attributes of the annotated data and returns a list of VIAttribute.

via.collect_working_items(via_annotations, …)

Given VIA annotation input, collect all info on WorkingItem object.

via.create_masks(masks_dir, working_items, …)

For each annotation, create a corresponding binary mask and resize it (h = 2000).

Formatting in VIA JSON format

via.create_via_region_from_coordinates(…)

Formats coordinates to a VIA region (dict).

via.create_via_annotation_single_image(…)

Returns a dictionary item {key: annotation} in VIA format to further export to .json file


dh_segment.io.input_fn(input_data, params, input_label_dir=None, data_augmentation=False, batch_size=5, make_patches=False, num_epochs=1, num_threads=4, image_summaries=False)

Input_fn for estimator

Parameters
  • input_data (Union[str, List[str]]) – input data. It can be a directory containing the images, it can be a list of image filenames, or it can be a path to a csv file.

  • params (dict) – params from utils.Params object

  • input_label_dir (Optional[str]) – directory containing the label images

  • data_augmentation (bool) – if True, applies data augmentation (scaling, rotation, …) to the images

  • batch_size (int) – size of the batch

  • make_patches (bool) – whether to cut the images into patches (smaller crops)

  • num_epochs (int) – number of epochs to cycle through the data (set it to None to repeat indefinitely)

  • num_threads (int) – number of threads to use in parallel when using tf.data.Dataset.map

  • image_summaries (bool) – whether to create tf.Summary image summaries to view in TensorBoard

Returns

fn
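
The three accepted forms of input_data can be told apart with a few checks. The sketch below is not the library's code: resolve_input_images is a hypothetical helper, and it assumes that a directory is listed as-is and that a csv file holds the image filename in its first column.

```python
import csv
import os
from glob import glob
from typing import List, Union


def resolve_input_images(input_data: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: turn the three accepted input forms into a list of filenames."""
    if isinstance(input_data, list):
        # Already a list of image filenames.
        return list(input_data)
    if os.path.isdir(input_data):
        # A directory: collect the files it contains (sorted for reproducibility).
        return sorted(glob(os.path.join(input_data, '*')))
    if input_data.endswith('.csv'):
        # A csv file: assumed here to hold the image filename in the first column.
        with open(input_data, newline='') as f:
            return [row[0] for row in csv.reader(f) if row]
    raise ValueError('Unsupported input_data: {!r}'.format(input_data))


print(resolve_input_images(['page_01.jpg', 'page_02.jpg']))
```

Normalizing the input to a plain list first keeps the rest of an input pipeline independent of how the data was specified.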

dh_segment.io.serving_input_filename(resized_size)
dh_segment.io.serving_input_image()
dh_segment.io.data_augmentation_fn(input_image, label_image, flip_lr=True, flip_ud=True, color=True)

Applies data augmentation to both images and label images. Includes left-right flip, up-down flip and color change.

Parameters
  • input_image (tensorflow.Tensor) – images to be augmented [B, H, W, C]

  • label_image (tensorflow.Tensor) – corresponding label images [B, H, W, C]

  • flip_lr (bool) – option to flip image in left-right direction

  • flip_ud (bool) – option to flip image in up-down direction

  • color (bool) – option to change color of images

Return type

(tensorflow.Tensor, tensorflow.Tensor)

Returns

the tuple (augmented images, augmented label images) [B, H, W, C]
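
The key property of augmenting images together with their labels is that each random flip must be applied identically to both, otherwise the labels no longer line up with the pixels. The following is a minimal sketch of that idea on plain nested lists rather than tensors; augment_pair, the 0.5 flip probability and the omission of the color option are assumptions made for illustration.

```python
import random


def flip_lr(image):
    """Mirror an [H, W] nested list left to right."""
    return [row[::-1] for row in image]


def flip_ud(image):
    """Mirror an [H, W] nested list top to bottom."""
    return image[::-1]


def augment_pair(image, label, rng, use_flip_lr=True, use_flip_ud=True):
    """Draw each random decision once and apply it to both inputs."""
    if use_flip_lr and rng.random() < 0.5:
        image, label = flip_lr(image), flip_lr(label)
    if use_flip_ud and rng.random() < 0.5:
        image, label = flip_ud(image), flip_ud(label)
    return image, label


image = [[1, 2], [3, 4]]
label = [[0, 1], [0, 0]]  # marks the pixel whose value is 2
aug_image, aug_label = augment_pair(image, label, random.Random(0))
```

Whatever flips are drawn, the label value 1 stays on the pixel whose value is 2.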

dh_segment.io.rotate_crop(image, rotation, crop=True, minimum_shape=[0, 0], interpolation='NEAREST')

Rotates and crops the images.

Parameters
  • image (tensorflow.Tensor) – image to be rotated and cropped [H, W, C]

  • rotation (float) – angle of rotation (in radians)

  • crop (bool) – option to crop rotated image to avoid black borders due to rotation

  • minimum_shape (Tuple[int, int]) – minimum shape of the rotated image / cropped image

  • interpolation (str) – which interpolation to use, NEAREST or BILINEAR

Return type

tensorflow.Tensor

Returns

dh_segment.io.resize_image(image, size, interpolation='BILINEAR')

Resizes the image

Parameters
  • image (tensorflow.Tensor) – image to be resized [H, W, C]

  • size (int) – size of the resized image, given as the desired total number of pixels

  • interpolation (str) – which interpolation to use, NEAREST or BILINEAR

Return type

tensorflow.Tensor

Returns

resized image
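
Because size is a pixel count rather than a (height, width) pair, the output shape has to be derived from it. A sketch of the arithmetic, assuming the aspect ratio is preserved and the result is rounded to whole pixels (target_shape is a hypothetical helper, not part of the module):

```python
import math


def target_shape(height, width, size):
    """Return (new_height, new_width) with about `size` pixels and the same aspect ratio."""
    # Scaling both sides by sqrt(size / area) scales the area to `size`.
    ratio = math.sqrt(size / (height * width))
    return int(round(height * ratio)), int(round(width * ratio))


new_h, new_w = target_shape(4000, 3000, 750000)
print(new_h, new_w)  # 1000 750
```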

dh_segment.io.load_and_resize_image(filename, channels, size=None, interpolation='BILINEAR')

Loads an image from its filename and resizes it to the desired output size.

Parameters
  • filename (str) – string tensor

  • channels (int) – number of channels for the decoded image

  • size (Optional[int]) – number of desired pixels in the resized image, tf.Tensor or int (None for no resizing)

  • interpolation (str) – which interpolation to use, NEAREST or BILINEAR

  • return_original_shape – returns the original shape of the image before resizing if this flag is True

Return type

tensorflow.Tensor

Returns

decoded and resized float32 tensor [h, w, channels],

dh_segment.io.extract_patches_fn(image, patch_shape, offsets)

Will cut a given image into patches.

Parameters
  • image (tensorflow.Tensor) – tf.Tensor

  • patch_shape (Tuple[int, int]) – shape of the extracted patches [h, w]

  • offsets (Tuple[int, int]) – offset to add to the origin of the first patch (top-right coordinate); useful during data augmentation to obtain slightly different patches each time. This value is multiplied by [h/2, w/2] (values in the range [0, 1])

Return type

tensorflow.Tensor

Returns

patches [batch_patches, h, w, c]
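
The offsets are fractions, not pixels: each one is scaled by half the patch size along its axis. A small worked sketch of that conversion, assuming the (y, x) order matches [h, w] and that the result is truncated to whole pixels (patch_origin_offset is a hypothetical name):

```python
def patch_origin_offset(patch_shape, offsets):
    """Convert fractional offsets in [0, 1] to a pixel shift of the first patch."""
    h, w = patch_shape
    off_y, off_x = offsets
    if not (0 <= off_y <= 1 and 0 <= off_x <= 1):
        raise ValueError('offsets are expected in the range [0, 1]')
    # Each offset is scaled by half the patch size along its axis.
    return int(off_y * h / 2), int(off_x * w / 2)


shift = patch_origin_offset(patch_shape=(300, 400), offsets=(0.5, 0.25))
print(shift)  # (75, 50)
```

Drawing new random offsets for each epoch shifts the patch grid by at most half a patch, so the network sees different crops of the same image.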

dh_segment.io.local_entropy(tf_binary_img, sigma=3)
Parameters
  • tf_binary_img (tensorflow.Tensor) –

  • sigma (float) –

Return type

tensorflow.Tensor

Returns

class dh_segment.io.PAGE.BaseElement

Base page element class. (Abstract)

classmethod check_tag(tag)
classmethod full_tag()
Return type

str

tag = None
class dh_segment.io.PAGE.Border(coords=None, id=None)

Region containing the page. It is the border of the actual page of the document (if the scanned image contains parts not belonging to the page).

Variables

coords – coordinates of the Border region

classmethod from_dict(dictionary)
Return type

Border

classmethod from_xml(e)
Return type

Border

tag = 'Border'
to_dict(non_serializable_keys=[])
Return type

dict

to_xml()
Return type

Element

class dh_segment.io.PAGE.GraphicRegion(id=None, coords=None, custom_attribute=None)

Region containing simple graphics. Company logos for example should be marked as graphic regions.

Variables
  • id – identifier of the GraphicRegion

  • coords – coordinates of the GraphicRegion

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

GraphicRegion

Returns

non serialized dictionary

classmethod from_xml(e)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element – an XML etree

Return type

GraphicRegion

Returns

a dictionary with keys ‘id’ and ‘coords’

tag = 'GraphicRegion'
to_xml(name_element='GraphicRegion')

Converts a Region object to an XML structure

Parameters

name_element – name of the object (optional)

Return type

Element

Returns

an etree structure

class dh_segment.io.PAGE.GroupSegment(id=None, coords=None, segment_ids=None, custom_attribute=None)

Set of regions that make a bigger region (group). GroupSegment is a region containing several TextLine and that form a bigger region. It is used mainly to make line / column regions. Only for JSON export (no PAGE XML correspondence).

Variables
  • id – identifier of the GroupSegment

  • coords – coordinates of the GroupSegment

  • segment_ids – list of the regions ids belonging to the group

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

GroupSegment

Returns

non serialized dictionary

class dh_segment.io.PAGE.Metadata(creator=None, created=None, last_change=None, comments=None)

Metadata information.

Variables
  • creator – name of the process or person that created the exported file

  • created – time of creation of the file

  • last_change – time of last modification of the file

  • comments – comments on the process

classmethod from_dict(dictionary)
Return type

Metadata

classmethod from_xml(e)
Return type

Metadata

tag = 'Metadata'
to_dict()
to_xml()
Return type

Element

class dh_segment.io.PAGE.Page(**kwargs)

Class following PAGE-XML object. This class is used to represent the information of the processed image. It is possible to export this info as PAGE-XML or JSON format.

Variables
  • image_filename – filename of the image

  • image_width – width of the original image

  • image_height – height of the original image

  • text_regions – list of TextRegion

  • graphic_regions – list of GraphicRegion

  • page_border – Border of the page

  • separator_regions – list of SeparatorRegion

  • table_regions – list of TableRegion

  • metadata – Metadata of the image and process

  • line_groups – list of GroupSegment forming lines

  • column_groups – list of GroupSegment forming columns

draw_baselines(img_canvas, color=(255, 0, 0), thickness=2, endpoint_radius=4, autoscale=True)

Given an image, draws the TextLines.baselines.

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • thickness (int) – the thickness of the line

  • endpoint_radius (int) – the radius of the endpoints of the line (first and last coordinates of the line)

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_column_groups(img_canvas, color=(0, 255, 0), fill=False, thickness=5, autoscale=True)

It will draw column groups (in case of a table). This is only valid when parsing JSON files.

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • fill (bool) – whether to fill the region (True) or only draw the external contours (False)

  • thickness (int) – in case fill=False the thickness of the line

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_graphic_regions(img_canvas, color=(255, 0, 0), fill=True, thickness=3, autoscale=True)

Given an image, draws the GraphicRegions, either fills it (fill=True) or draws the contours (fill=False)

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • fill (bool) – whether to fill the region (True) or only draw the external contours (False)

  • thickness (int) – in case fill=False the thickness of the line

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_line_groups(img_canvas, color=(0, 255, 0), fill=False, thickness=5, autoscale=True)

It will draw line groups. This is only valid when parsing JSON files.

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • fill (bool) – whether to fill the region (True) or only draw the external contours (False)

  • thickness (int) – in case fill=False the thickness of the line

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_lines(img_canvas, color=(255, 0, 0), thickness=2, fill=True, autoscale=True)

Given an image, draws the polygons containing text lines, i.e. TextLines.coords

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • thickness (int) – the thickness of the line

  • fill (bool) – if True fills the polygon

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_page_border(img_canvas, color=(255, 0, 0), fill=True, thickness=5, autoscale=True)

Given an image, draws the page border, either fills it (fill=True) or draws the contours (fill=False)

Parameters
  • img_canvas – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • fill (bool) – whether to fill the region (True) or only draw the external contours (False)

  • thickness (int) – in case fill=False the thickness of the line

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_separator_lines(img_canvas, color=(0, 255, 0), thickness=3, filter_by_id='', autoscale=True)

Given an image, draws the SeparatorRegion.

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • thickness (int) – thickness of the line

  • filter_by_id (str) – string to filter the lines by id. For example vertical/horizontal lines can be filtered if ‘vertical’ or ‘horizontal’ is mentioned in the id.

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_text(img_canvas, color=(255, 0, 0), thickness=5, font=cv2.FONT_HERSHEY_SIMPLEX, font_scale=1.0, autoscale=True)

Writes the text of the TextLine on the given image.

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • thickness (int) – the thickness of the characters

  • font – the type of font (cv2 constant)

  • font_scale (float) – the scale of font

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio

draw_text_regions(img_canvas, color=(255, 0, 0), fill=True, thickness=3, autoscale=True)

Given an image, draws the TextRegions, either fills it (fill=True) or draws the contours (fill=False)

Parameters
  • img_canvas (ndarray) – 3 channel image in which the region will be drawn. The image is modified inplace.

  • color (Tuple[int, int, int]) – (R, G, B) value color

  • fill (bool) – whether to fill the region (True) or only draw the external contours (False)

  • thickness (int) – in case fill=False the thickness of the line

  • autoscale (bool) – whether to scale the coordinates to the size of img_canvas. If True, it will use the dimensions provided in Page.image_width and Page.image_height to compute the scaling ratio
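
The autoscale option shared by the draw_* methods above amounts to rescaling page coordinates by the ratio between the canvas size and the stored page size. A minimal sketch of that computation; autoscale_ratio and scale_points are hypothetical helpers, and rounding to integer pixels is an assumption:

```python
def autoscale_ratio(canvas_shape, image_height, image_width):
    """Return (ratio_y, ratio_x) mapping page coordinates onto the canvas."""
    canvas_height, canvas_width = canvas_shape[:2]
    return canvas_height / image_height, canvas_width / image_width


def scale_points(points, ratio):
    """Scale (x, y) points; rounding to integer pixels is an assumption."""
    ratio_y, ratio_x = ratio
    return [(int(round(x * ratio_x)), int(round(y * ratio_y))) for x, y in points]


# A canvas four times smaller than the page described by image_height / image_width.
ratio = autoscale_ratio(canvas_shape=(1000, 750, 3), image_height=4000, image_width=3000)
scaled = scale_points([(300, 400), (1500, 2000)], ratio)
print(scaled)  # [(75, 100), (375, 500)]
```

With autoscale disabled, the coordinates would be drawn as stored, which only lines up when the canvas has the original image dimensions.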

classmethod from_dict(dictionary)
Return type

Page

classmethod from_xml(e)
Return type

Page

tag = 'Page'
to_json()
Return type

dict

to_xml()
Return type

Element

write_to_file(filename, creator_name='dhSegment', comments='')

Exports the Page object to JSON or PAGE-XML format. The format is inferred from the extension of the filename; if there is no extension, the file is exported as XML.

Parameters
  • filename (str) – filename of the file to be exported

  • creator_name (str) – name of the creator (process or person) creating the file

  • comments (str) – optional comment to add to the metadata of the file.

Return type

None
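
The extension-based choice of format can be sketched as follows. export_format is a hypothetical helper; lowercasing the extension and raising on unknown extensions are assumptions, since the behavior for other extensions is not documented here.

```python
import os


def export_format(filename):
    """Pick the export format from the filename extension (hypothetical helper)."""
    extension = os.path.splitext(filename)[1].lower()
    if extension == '.json':
        return 'json'
    if extension in ('.xml', ''):
        # No extension falls back to PAGE-XML, as described above.
        return 'xml'
    raise ValueError('Unsupported extension: {!r}'.format(extension))


for name in ('page_0001.json', 'page_0001.xml', 'page_0001'):
    print(name, '->', export_format(name))
```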

class dh_segment.io.PAGE.Point(y, x)

Point (x,y) class.

Variables
  • y – vertical coordinate

  • x – horizontal coordinate

classmethod array_to_list(array)

Converts an np.array to a list of coordinates

Parameters

array (ndarray) – an array of coordinates. Must be of shape (N, 2)

Return type

list

Returns

list of coordinates, shape (N,2)

classmethod array_to_point(array)

Converts an np.array to a list of Point

Parameters

array (ndarray) – an array of coordinates. Must be of shape (N, 2)

Return type

list

Returns

list of Point

classmethod cv2_to_point_list(cv2_array)

Converts an opencv-formatted set of coordinates to a list of Point

Parameters

cv2_array (ndarray) – opencv-formatted set of coordinates, shape (N,1,2)

Return type

List[Point]

Returns

list of Point

classmethod list_from_xml(etree_elem)

Converts a PAGEXML-formatted set of coordinates to a list of Point

Parameters

etree_elem (Element) – etree XML element containing a set of coordinates

Return type

List[Point]

Returns

a list of coordinates as Point

classmethod list_point_to_string(list_points)

Converts a list of Point to a string ‘x,y’

Parameters

list_points (List[Point]) – list of coordinates with Point format

Return type

str

Returns

a string with the coordinates
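
PAGE-XML stores a polygon as space-separated 'x,y' pairs, while Point is constructed as (y, x), so the order is easy to get wrong. The sketch below uses a namedtuple as a stand-in for PAGE.Point; points_to_string and string_to_points are hypothetical names, not the library's methods.

```python
from collections import namedtuple

# Stand-in for PAGE.Point, which takes (y, x) in that order.
Point = namedtuple('Point', ['y', 'x'])


def points_to_string(points):
    """Join points as 'x,y' pairs separated by spaces (PAGE-XML 'points' style)."""
    return ' '.join('{},{}'.format(p.x, p.y) for p in points)


def string_to_points(text):
    """Inverse operation: parse 'x,y x,y ...' back into Point objects."""
    points = []
    for pair in text.split():
        x, y = pair.split(',')
        points.append(Point(y=int(y), x=int(x)))
    return points


polygon = [Point(y=10, x=20), Point(y=10, x=80), Point(y=40, x=80)]
encoded = points_to_string(polygon)
print(encoded)  # 20,10 80,10 80,40
```

Using keyword arguments when building points avoids silently swapping the two coordinates.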

classmethod list_to_cv2poly(list_points)

Converts a list of Point to opencv format set of coordinates

Parameters

list_points (List[Point]) – set of coordinates

Return type

ndarray

Returns

opencv-formatted set of points, shape (N,1,2)

classmethod list_to_point(list_coords)

Converts a list of coordinates to a list of Point

Parameters

list_coords (list) – list of coordinates, shape (N, 2)

Return type

List[Point]

Returns

list of Point

classmethod point_to_list(points)

Converts a list of Point to a list of coordinates

Parameters

points (List[Point]) – list of Points

Return type

list

Returns

list of shape (N,2)

to_dict()
class dh_segment.io.PAGE.Region(id=None, coords=None, custom_attribute=None)

Region base class. (Abstract) This is the superclass for all the extracted regions

Variables
  • id – identifier of the Region

  • coords – coordinates of the Region

  • custom_attribute – Any custom attribute that may be linked with the region (usually this is added in PAGEXML files, not in JSON files)

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

dict

Returns

non serialized dictionary

classmethod from_xml(etree_element)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element (Element) – an XML etree

Return type

dict

Returns

a dictionary with keys ‘id’ and ‘coords’

tag = 'Region'
to_dict(non_serializable_keys=[])

Converts a Region object to a dictionary.

Parameters

non_serializable_keys (List[str]) – list of keys that can’t be directly serialized and that need some internal serialization

Return type

dict

Returns

a dictionary with the attributes of the object serialized

to_xml(name_element=None)

Converts a Region object to an XML structure

Parameters

name_element (Optional[str]) – name of the object (optional)

Return type

Element

Returns

an etree structure

class dh_segment.io.PAGE.SeparatorRegion(id, coords=None, custom_attribute=None)

Lines separating columns or paragraphs. Separators are lines that lie between columns and paragraphs and can be used to logically separate different articles from each other.

Variables
  • id – identifier of the SeparatorRegion

  • coords – coordinates of the SeparatorRegion

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

SeparatorRegion

Returns

non serialized dictionary

classmethod from_xml(e)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element – an XML etree

Return type

SeparatorRegion

Returns

a dictionary with keys ‘id’ and ‘coords’

tag = 'SeparatorRegion'
to_xml(name_element='SeparatorRegion')

Converts a Region object to an XML structure

Parameters

name_element – name of the object (optional)

Return type

Element

Returns

an etree structure

class dh_segment.io.PAGE.TableRegion(id=None, coords=None, rows=None, columns=None, embedded_text=None, custom_attribute=None)

Tabular data in any form. Tabular data is represented with a table region. Rows and columns may or may not have separator lines; these lines are not separator regions.

Variables
  • id – identifier of the TableRegion

  • coords – coordinates of the TableRegion

  • rows – number of rows in the table

  • columns – number of columns in the table

  • embedded_text – if text is embedded in the table

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

TableRegion

Returns

non serialized dictionary

classmethod from_xml(e)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element – an XML etree

Return type

TableRegion

Returns

a dictionary with keys ‘id’ and ‘coords’

tag = 'TableRegion'
to_xml(name_element='TableRegion')

Converts a Region object to an XML structure

Parameters

name_element – name of the object (optional)

Return type

Element

Returns

an etree structure

class dh_segment.io.PAGE.Text(text_equiv=None, alternatives=None, score=None)

Text entity produced by a transcription system.

Variables
  • text_equiv – the transcription of the text

  • alternatives – alternative transcriptions

  • score – the confidence of the transcription output by the transcription system

to_dict()
Return type

dict

class dh_segment.io.PAGE.TextLine(id=None, coords=None, baseline=None, text=None, line_group_id=None, column_group_id=None, custom_attribute=None)

Region corresponding to a text line.

Variables
  • id – identifier of the TextLine

  • coords – coordinates of the TextLine

  • baseline – coordinates of the TextLine baseline

  • text – Text class containing the transcription of the TextLine

  • line_group_id – identifier of the line group the instance belongs to

  • column_group_id – identifier of the column group the instance belongs to

  • custom_attribute – Any custom attribute that may be linked with the region (usually this is added in PAGEXML files, not in JSON files)

classmethod from_array(cv2_coords=None, baseline_coords=None, text_equiv=None, id=None)
classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

TextLine

Returns

non serialized dictionary

classmethod from_xml(etree_element)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element (Element) – an XML etree

Return type

TextLine

Returns

a dictionary with keys ‘id’ and ‘coords’

scale_baseline_points(ratio)

Scales the points of the baseline by a factor ratio.

Parameters

ratio (float) – factor to rescale the baseline coordinates

tag = 'TextLine'
to_dict(non_serializable_keys=[])

Converts a Region object to a dictionary.

Parameters

non_serializable_keys (List[str]) – list of keys that can’t be directly serialized and that need some internal serialization

Returns

a dictionary with the attributes of the object serialized

to_xml(name_element='TextLine')

Converts a Region object to an XML structure

Parameters

name_element – name of the object (optional)

Return type

Element

Returns

an etree structure

class dh_segment.io.PAGE.TextRegion(id=None, coords=None, text_lines=None, text_equiv='', region_type=None, custom_attribute=None)

Region containing text lines. It can represent a paragraph or a page for instance.

Variables
  • id – identifier of the TextRegion

  • coords – coordinates of the TextRegion

  • text_equiv – the resulting text of the Text contained in the TextLines

  • text_lines – a list of TextLine objects

  • region_type – the type of a TextRegion (can be any string). Example : header, paragraph, page-number…

  • custom_attribute – Any custom attribute that may be linked with the region (usually this is added in PAGEXML files, not in JSON files)

classmethod from_dict(dictionary)

From a serialized dictionary, creates a dictionary of the attributes (non-serialized)

Parameters

dictionary (dict) – serialized dictionary

Return type

TextRegion

Returns

non serialized dictionary

classmethod from_xml(e)

Creates a dictionary from an XML structure in order to create the inherited objects

Parameters

etree_element – an XML etree

Return type

TextRegion

Returns

a dictionary with keys ‘id’ and ‘coords’

sort_text_lines(top_to_bottom=True)

Sorts TextLine from top to bottom according to their mean y coordinate (centroid)

Parameters

top_to_bottom (bool) – order lines from top to bottom of image, default=True

Return type

None
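
Sorting by the mean y coordinate can be illustrated on simplified data. In image coordinates y grows downward, so top to bottom means ascending mean y. sort_lines_by_centroid is a hypothetical helper that works on plain dicts standing in for TextLine objects.

```python
from statistics import mean


def sort_lines_by_centroid(lines, top_to_bottom=True):
    """Sort lines in place by the mean y of their coordinates.

    `lines` is a list of dicts with a 'coords' list of (x, y) pairs,
    a simplified stand-in for TextLine objects.
    """
    lines.sort(key=lambda line: mean(y for _, y in line['coords']),
               reverse=not top_to_bottom)


lines = [
    {'id': 'footer', 'coords': [(0, 900), (100, 920)]},
    {'id': 'title', 'coords': [(0, 10), (100, 30)]},
    {'id': 'body', 'coords': [(0, 400), (100, 420)]},
]
sort_lines_by_centroid(lines)
print([line['id'] for line in lines])  # ['title', 'body', 'footer']
```

Like the documented method, the sketch sorts in place and returns None.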

tag = 'TextRegion'
to_dict(non_serializable_keys=[])

Converts a Region object to a dictionary.

Parameters

non_serializable_keys (List[str]) – list of keys that can’t be directly serialized and that need some internal serialization

Returns

a dictionary with the attributes of the object serialized

to_xml(name_element='TextRegion')

Converts a Region object to an XML structure

Parameters

name_element – name of the object (optional)

Return type

Element

Returns

an etree structure

dh_segment.io.PAGE.get_unique_tags_from_xml_text_regions(xml_filename, tag_pattern='{type:.*;}')

Get a list of all the values of labels/tags

Parameters
  • xml_filename (str) – filename of the xml file

  • tag_pattern (str) – regular expression pattern to look for in TextRegion.custom_attribute
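
To see what the default tag_pattern matches, it can be tried with Python's re module on a few strings. The custom attribute strings below are illustrative and not taken from a real file, and extract_type is a hypothetical helper; how the library itself post-processes the match is not documented here.

```python
import re

TAG_PATTERN = '{type:.*;}'  # default value of tag_pattern


def extract_type(custom_attribute, pattern=TAG_PATTERN):
    """Return the value found between '{type:' and ';}', or None if absent."""
    match = re.search(pattern, custom_attribute)
    if match is None:
        return None
    # Strip the literal '{type:' prefix and ';}' suffix from the match.
    return match.group(0)[len('{type:'):-len(';}')]


# Illustrative custom attribute strings (assumed format).
attributes = [
    'structure {type:heading;}',
    'structure {type:paragraph;}',
    'structure {type:heading;}',
    'readingOrder {index:0;}',
]
unique_tags = sorted({tag for tag in map(extract_type, attributes) if tag is not None})
print(unique_tags)  # ['heading', 'paragraph']
```

Note that '.*' is greedy, so a string with further '{...;}' groups after the type group would match more than the type value.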

Returns

dh_segment.io.PAGE.json_serialize(dict_to_serialize, non_serializable_keys=[])

Serialize a dictionary in order to export it.

Parameters
  • dict_to_serialize (dict) – dictionary to serialize

  • non_serializable_keys (List[str]) – keys that are not directly serializable, such as Python objects

Return type

dict

Returns

the serialized dictionary
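
The idea of listing non-serializable keys is that their values are objects which must first be converted to plain dictionaries before json can handle them. A minimal sketch of that pattern, assuming such objects expose a to_dict() method; serialize and StubPoint are hypothetical stand-ins, not the module's implementation.

```python
import json


class StubPoint:
    """Hypothetical object that json cannot serialize directly."""

    def __init__(self, y, x):
        self.y, self.x = y, x

    def to_dict(self):
        return {'x': self.x, 'y': self.y}


def serialize(dict_to_serialize, non_serializable_keys=()):
    """Return a copy where the listed keys are converted through to_dict()."""
    serialized = dict(dict_to_serialize)
    for key in non_serializable_keys:
        value = serialized.get(key)
        if isinstance(value, list):
            serialized[key] = [item.to_dict() for item in value]
        elif value is not None:
            serialized[key] = value.to_dict()
    return serialized


region = {'id': 'r1', 'coords': [StubPoint(1, 2), StubPoint(3, 4)]}
exported = json.dumps(serialize(region, ['coords']), sort_keys=True)
print(exported)
```

Working on a copy leaves the original dictionary, and the objects it holds, untouched.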

dh_segment.io.PAGE.parse_file(filename)

Parses the files to create the corresponding Page object. The files can be a .xml or a .json.

Parameters

filename (str) – file to parse (either JSON or PAGE XML)

Return type

Page

Returns

Page object containing all the parsed elements

dh_segment.io.PAGE.save_baselines(filename, baselines, ratio=(1, 1), predictions_shape=None)
Parameters
  • filename (str) – filename to save baselines to

  • baselines – list of baselines

  • ratio (Tuple[int, int]) – ratio of prediction shape over original shape

  • predictions_shape (Optional[Tuple[int, int]]) – shape of the masks output by the network

Return type

Page

Returns

class dh_segment.io.via.VIAttribute

A container for VIA attributes.

Parameters
  • name (str) – The name of attribute

  • type (str) – The type of the annotation (dropdown, markbox, …)

  • options (list) – The options / labels possible for this attribute.

property name

Alias for field number 0

property options

Alias for field number 2

property type

Alias for field number 1
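
The "Alias for field number N" entries are the docstrings Python generates for named tuple fields, which suggests VIAttribute can be used like a collections.namedtuple. The stand-in below only mirrors the documented field order; it is a sketch, not the class from the module.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented properties.
VIAttribute = namedtuple('VIAttribute', ['name', 'type', 'options'])

attribute = VIAttribute(name='text', type='dropdown', options=['title', 'paragraph'])

# Fields are reachable by name or by position (the "field number").
print(attribute.name, attribute[0])
name, kind, options = attribute  # positional unpacking follows the field numbers
```

The same reasoning applies to WorkingItem, whose eight fields run from collection (0) to annotations (7).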

class dh_segment.io.via.WorkingItem

A container for annotated images.

Parameters
  • collection (str) – name of the collection

  • image_name (str) – name of the image

  • original_x (int) – original image x size (width)

  • original_y (int) – original image y size (height)

  • reduced_x (int) – resized x size

  • reduced_y (int) – resized y size

  • iiif (str) – iiif url

  • annotations (dict) – VIA ‘region_attributes’

property annotations

Alias for field number 7

property collection

Alias for field number 0

property iiif

Alias for field number 6

property image_name

Alias for field number 1

property original_x

Alias for field number 2

property original_y

Alias for field number 3

property reduced_x

Alias for field number 4

property reduced_y

Alias for field number 5

dh_segment.io.via.collect_working_items(via_annotations, collection_name, images_dir=None, via_version=2)

Given VIA annotation input, collect all info on WorkingItem object. This function will take care of separating images from local files and images from IIIF urls.

Parameters
  • via_annotations (dict) – via annotations (‘regions’ field)

  • images_dir (Optional[str]) – directory where to find the images

  • collection_name (str) – name of the collection

  • via_version (int) – version of the VIA tool used to produce the annotations (1 or 2)

Return type

List[WorkingItem]

Returns

list of WorkingItem

dh_segment.io.via.convert_via_region_page_text_region(working_item, structure_label)
Parameters
  • working_item (WorkingItem) –

  • structure_label (str) –

Return type

Page

Returns

dh_segment.io.via.create_masks(masks_dir, working_items, via_attributes, collection, contours_only=False)

For each annotation, create a corresponding binary mask and resize it (h = 2000). Only valid for VIA 2.0. Several annotations of the same class on the same image produce one image with several masks.

Parameters
  • masks_dir (str) – where to output the masks

  • working_items (List[WorkingItem]) – infos to work with

  • via_attributes (List[VIAttribute]) – VIAttributes computed by get_via_attributes function.

  • collection (str) – name of the collection

  • contours_only (bool) – creates the binary masks only for the contours of the object (thickness of contours : 20 px)

Return type

dict

Returns

annotation_summary, a dictionary containing a list of labels per image
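
The resize to h = 2000 relates the original_x / original_y and reduced_x / reduced_y fields of WorkingItem. A sketch of the arithmetic, assuming the aspect ratio is kept and the width is rounded to the nearest pixel (reduced_size is a hypothetical helper):

```python
def reduced_size(original_x, original_y, target_height=2000):
    """Return (reduced_x, reduced_y) for a mask resized to `target_height`."""
    scale = target_height / original_y
    return int(round(original_x * scale)), target_height


reduced_x, reduced_y = reduced_size(original_x=3000, original_y=4000)
print(reduced_x, reduced_y)  # 1500 2000
```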

dh_segment.io.via.create_via_annotation_single_image(img_filename, via_regions, file_attributes=None)

Returns a dictionary item {key: annotation} in VIA format to further export to .json file

Parameters
  • img_filename (str) – path to the image

  • via_regions (List[dict]) – regions in VIA format (output from create_via_region_from_coordinates)

  • file_attributes (Optional[dict]) – file attributes (usually None)

Return type

Dict[str, dict]

Returns

dictionary item with key and annotations in VIA format

dh_segment.io.via.create_via_region_from_coordinates(coordinates, region_attributes, type_region)

Formats coordinates to a VIA region (dict).

Parameters
  • coordinates (ndarray) – (N, 2) coordinates (x, y)

  • region_attributes (dict) – dictionary with keys : name of labels, values : values of labels

  • type_region (str) – via region annotation type (‘rect’, ‘polygon’)

Return type

dict

Returns

a region in VIA style (dict/json)
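
A VIA region is a dictionary with a 'shape_attributes' part describing the geometry and a 'region_attributes' part holding the labels. The sketch below builds such a dictionary from (x, y) pairs; make_via_region is a hypothetical helper, and deriving a 'rect' from the bounding box of the coordinates is an assumption.

```python
def make_via_region(coordinates, region_attributes, type_region):
    """Build a VIA-style region dict from (N, 2) coordinates given as (x, y) pairs."""
    xs = [int(x) for x, _ in coordinates]
    ys = [int(y) for _, y in coordinates]
    if type_region == 'polygon':
        shape = {'name': 'polygon', 'all_points_x': xs, 'all_points_y': ys}
    elif type_region == 'rect':
        # A rectangle is stored as its top-left corner plus width and height.
        shape = {'name': 'rect', 'x': min(xs), 'y': min(ys),
                 'width': max(xs) - min(xs), 'height': max(ys) - min(ys)}
    else:
        raise ValueError('Unsupported region type: {!r}'.format(type_region))
    return {'shape_attributes': shape, 'region_attributes': region_attributes}


corners = [(10, 20), (110, 20), (110, 70), (10, 70)]
polygon_region = make_via_region(corners, {'text': 'title'}, 'polygon')
rect_region = make_via_region(corners, {'text': 'title'}, 'rect')
print(rect_region['shape_attributes'])
```

A list of such regions is what create_via_annotation_single_image expects as via_regions.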

dh_segment.io.via.export_annotation_dict(annotation_dict, filename)

Export the annotations to json file.

Parameters
  • annotation_dict (dict) – VIA annotations

  • filename (str) – filename to export the data (json file)

Return type

None

Returns

dh_segment.io.via.get_annotations_per_file(via_dict, name_file)

From VIA json content, get annotations relative to the given name_file.

Parameters
  • via_dict (dict) – VIA annotations content (originally json)

  • name_file (str) – the file to look for (it can be a iiif path or a file path)

Return type

dict

Returns

dict

dh_segment.io.via.get_via_attributes(annotation_dict, via_version=2)

Gets the attributes of the annotated data and returns a list of VIAttribute.

Parameters
  • annotation_dict (dict) – json content of the VIA exported file

  • via_version (int) – either 1 or 2 (for VIA v 1.0 or VIA v 2.0)

Return type

List[VIAttribute]

Returns

A list containing VIAttributes

dh_segment.io.via.load_annotation_data(via_data_filename, only_img_annotations=False, via_version=2)

Load the content of via annotation files.

Parameters
  • via_data_filename (str) – via annotations json file

  • only_img_annotations (bool) – load only the images annotations (‘_via_img_metadata’ field)

  • via_version (int) –

Return type

dict

Returns

the content of the JSON file containing the annotated regions

dh_segment.io.via.parse_via_attributes(via_attributes)

Parses the VIA attribute dictionary and returns a list of VIAttribute instances

Parameters

via_attributes (dict) – attributes from VIA annotation (‘_via_attributes’ field)

Return type

List[VIAttribute]

Returns

list of VIAttribute