Multi-Frame Matrix Capture Common File Format (MFMC-CFF) Technical Specification

University of Bristol, Ultrasonics and NDT Group

Martin Mienczakowski, January 2015

Version 0.5 Draft for General Comment

Contents

1 Introduction

1.1 Background

1.2 Aims of The File Format

2 File Format

2.1 Downselection of File Format

3 Technical Specification

3.1 Definitions

3.1.1 Probe Geometry

3.1.2 Waveform

3.1.3 FMC Sequence

3.1.4 General Setup

3.1.5 File Format Specific

3.2 File Structure

3.2.1 General File Structure

3.2.2 Essential Parameters

3.2.3 MFMC Data

3.2.4 User Defined Parameters

4 Examples & Recommendations

4.1 Usage of Hyper-slabs / Chunking & Storage of Data to Enable Single Frame Recall

4.2 Allowable Differences Between Fields / Frames Within The Format

5 Tables

6 Acknowledgements

7 References

1 Introduction

1.1 BACKGROUND

A project is on-going at the University of Bristol to improve the uptake of Full Matrix Capture (FMC) by researchers, equipment manufacturers and industrial end-users. FMC is the recording of coherent signals from distributed transmit / receive combinations with appropriate phase information that allows reconstruction and signal processing post acquisition.

It has been proposed that the impact of FMC would be enhanced if there were a Multi-Frame Matrix Capture Common File Format (MFMC-CFF) allowing the transfer of experimental data between organisations. MFMC is taken to mean the capture of multiple matrices of data; this could include full matrix capture but does not exclude half matrix capture or other part-matrix methods. The file format will consist of a framework / specification with a library that allows fundamental parameters to be stored alongside experimental data and user defined parameters.

In this project a number of existing file formats have been considered against the option of developing a completely new file format. The analysis conducted shows that a new file format based upon HDF5 offers the best balance between functionality, ease of adoption and development time.

The aim of this document is to define the method by which MFMC data may be stored in order to allow ease of transfer between different organisations and/or data acquisition or data processing systems. This document also contains examples and recommendations of how to work efficiently within the framework for a number of scenarios.

Following this technical specification, demonstration software will be developed to display the concepts in practice. This software will be freely available to all project participants.

1.2 AIMS OF THE FILE FORMAT

The guiding principles of the file format are to:

• Enable the sharing of experimental data in a standard way

• Provide standard variable names for commonly measured parameters

• Ensure that a basic image can be constructed from a data set by defining a set of essential parameters

• Allow more complex images to be constructed by providing space for user defined parameters

• Provide flexibility in the definition of the format to account for novel acquisition / processing methods

• Take account of conceivable future developments

Further specific requirements have been defined in a requirements document [1], which has been reviewed by the project participants.

2 File Format

2.1 DOWNSELECTION OF FILE FORMAT

In order to decide the technical direction for the development of the MFMC-CFF it was necessary to evaluate a number of existing formats and compare these with the potential development of a completely new bespoke format for the application.

The formats that were considered are as follows:

• AVI / movie formats

• DICONDE

• HDF5

• Multi-Page TIFF / BIG TIFF

• Ontario Power Group custom file format

• Rolls-Royce Marine custom file format

• University of Strathclyde custom file format

• XML type file format

• Bespoke file format

To evaluate the formats a questionnaire was developed using the requirements document to define the areas of critical importance to the project. The questionnaire was completed drawing upon the experience of project participants where relevant. Finally a comparison was made between the different file formats [Table 5.1].

Analysis of the table shows that the only file format to satisfy all of the requirements is a bespoke format based upon the HDF5 framework.

HDF5 is a member of a family of file formats (Hierarchical Data Format) designed to organise and store large amounts of complex scientific data. Initially developed by the National Center for Supercomputing Applications (NCSA) in the USA, the format is currently supported by the not-for-profit HDF Group. The current standard is the result of over 20 years of development and has been in its current form since 1998 (with minor version upgrades). The format has found usage in a variety of fields, from the financial sector through to space exploration [2]. The HDF5 data model, file format, API, library and tools are open source and distributed without charge. In contrast, some of the other formats examined have, or may have, costs for access to the specification as well as to any software libraries. The absence of these costs in the case of HDF5 would greatly ease adoption of this format compared to others. With these libraries comes a level of support and expertise from a community that has used the format for a number of years. The aims of the HDF Group include long-term availability and support for HDF technologies and, by extension, long-term accessibility of data stored using HDF technologies, as well as taking advantage of innovations in the data management field.

The basis of the format is the data model, which stores data in hierarchies much like the file systems that users of personal computers will be familiar with. The format is self-describing, allowing relationships between data to be drawn. In contrast to other self-describing formats (for example XML), the data is stored in a binary format. The format also natively allows access to one part of a file without first reading in all preceding data.

There is a high level Application Programming Interface (API) with C, C++, Java and Fortran interfaces supported by the HDF Group. Support for the file format is already included in a number of commercially available applications, including MATLAB [3], and support for other applications, for instance LabVIEW, is provided by third party libraries [4]. The high level API greatly simplifies interaction with the low level file read / write functions, insulating the user from the more complex concepts whilst retaining the benefits they provide. From an end user perspective the file contains only:

• Datasets (which are arrays of data)

• Groups (which are structures that may contain datasets and other groups).

A full technical discussion of the low level functions is beyond the scope of this document; a full description is provided by the HDF Group [5]. It is, however, appropriate to note that the concepts used at a low level make the format well suited to large and complex datasets.

Although HDF5 has not been widely used for NDT data (there has been some limited uptake, e.g. at the University of Strathclyde), the format is flexible and has been proven on large and complex datasets. The framework is neither complex nor rigid and hence could readily be applied to MFMC data. Some of the other file formats examined could be modified to accept FMC data, but this would involve modification of an existing format or standard and thus could introduce a significant time delay before such changes or additions were made.

The format satisfies the requirement for having the ability to store header information separate from the data. As a bespoke format, only the parameters required would be in this header. Other established formats have by their nature been tailored to specific user requirements and hence contain data within the header which would be classified as “Custom data” within the terms of the requirements document. Use of a bespoke format in this application removes the need to rewrite the existing formats or provide a work-around to maintain compliance with an existing framework.

This format provides for 64 bit data values, and has the ability to have higher numbers of bits should there be native support on the machine in question for such data types. The data stored within the file may be appended to and the format also has natively the ability to read part of the dataset without first reading in all preceding data. This data may also be stored using compression algorithms that are built into the libraries for writing HDF5 files.

There are freely available viewers which will read data stored in the HDF5 format – these viewers allow the exploration of raw data as well as providing the ability to visualise the data in different formats (as an image for example).

Following the analysis conducted, it has been shown that a file format based upon HDF5 offers the best balance between functionality, ease of adoption and development time.

3 Technical Specification

3.1 DEFINITIONS

3.1.1 Probe Geometry

Element Location (x, y or z axis)

The distance between the Element Centre location (defined as the centroid of the element) and the Probe Datum Point along the axis in question.

Global Element Number

Each probe element shall be assigned a unique global element number (in addition to the local probe element number). The purpose of this requirement is to ensure that there is no possibility that an element from one probe could be confused with an element from another probe in post-processing.

Local Element Number

A unique reference number for each element within a probe. This number may be reused on different probes i.e. there could be two elements within the system numbered 1 on different probes.

Probe

Ultrasonic transmitting / receiving device which may contain one or more elements, used to collect MFMC data.

Probe Datum Point

Reference point that is defined by the user at an arbitrary point on the probe. It is from this point that location measurements are taken during scanning.

Probe Coordinate System

The PCS x, y and z axes are defined as follows:

X axis: for a rigid, flat 1D or 1.5D linear array, x is in the direction of largest number of elements. For a 2D array, or a curved or flexible array, the x axis is as defined by the manufacturer or user.

Y axis: for a rigid, flat 1D or 1.5D linear array, y is in the direction of smallest number of elements and is orthogonal to the x axis. For a 2D array, or a curved or flexible array, the y axis is as defined by the manufacturer or user but must be orthogonal to the x axis.

Z axis: for any array, this is orthogonal to both the x and y axes.

All axes must share a common unit of length, be signed and obey the right-hand rule. Although in large part the definition of the probe coordinate system is the decision of the manufacturer or user of the file format, it is advisable to select a system that simplifies the storage and interpretation of data, for example one aligned with the axes and orientations of the individual probe elements.

Probe Location

The location in m of the probe in each of the cardinal axes in the global coordinate system for a single scan location (a single field or frame).

Probe Number

The unique reference number for each probe within the system. These shall be assigned in ascending numerical order starting with probe number 1.

Probe Orientation

The rotation in degrees of the probe about each of the cardinal axes in the global coordinate system for a single scan location (a single field or frame).

Figure 3.1 Probe Definitions

3.1.2 Waveform

Number of Points

The number of unique data samples within each individual Scan line.

Sample

A single data point within the Scan line, i.e. one measured value.

Sampling Rate

The frequency at which samples are obtained from the signal. This is assumed to be the same for, and constant within, all Scan lines.

Scan line

The raw data (response) for a single pitch-catch combination, transmitting from element / aperture i and receiving on element / aperture j.

Vertical Resolution

The difference in volts between adjacent quantisation levels on the y-axis of the MFMC data.

Zero Level

The quantisation level equivalent to 0V.

Figure 3.2 Waveform Definitions

3.1.3 FMC Sequence

Field (of Data)

An ensemble of Scan lines collected continuously without pause. This could be as the probe is moving or from a single point in space. Where the specification refers to frames of data, a user might equally prefer to store single fields of data and this is allowable within the format. The only condition is that all parameters defined within “Essential Parameters” must be common between fields/frames. Further discussion on this point is contained within section 4.2.

Frame

The complete set of Scan lines from all pulse-echo combinations specified by the experiment. This may be collected continuously (in which case the field is the same size as the frame) or it may be the result of combining many fields of data. Where the specification refers to frames of data, a user might equally prefer to store single fields of data and this is allowable within the format. The only condition is that all parameters defined within “Essential Parameters” must be common between fields/frames. Further discussion on this point is contained within section 4.2.

Full Matrix Capture

FMC is the recording of coherent signals from distributed transmit / receive combinations with appropriate phase information that allows reconstruction and signal processing post acquisition.

Multi-Frame Matrix Capture

Multi-Frame Matrix Capture is taken to mean the capture of multiple matrices of data; this could include full matrix capture but does not exclude half matrix capture or other part-matrix methods. The file format will consist of a framework / specification with a library that allows fundamental parameters to be stored alongside experimental data and user defined parameters.

Figure 3.3 Relationship Between Fields and Frames of Data

3.1.4 General Setup

Coupling Medium

The medium used to acoustically couple the probe to the specimen medium. In some experimental scenarios there may not be a coupling medium present.

Global Coordinate System

The GCS x, y and z axes are defined as follows:

X axis: User defined.

Y axis: The direction orthogonal to the x axis.

Z axis: this is orthogonal to both the x and y axes.

All axes must share a common unit of length, be signed and obey the right-hand rule. This is the coordinate system in which the location of the probe datum point and probe orientation is specified.

Specimen Medium

The medium being inspected within the experiment.

Figure 3.4 General Setup Definitions

3.1.5 File Format Specific

Common Variable Names

The following definitions are constant throughout the technical specification:

• f = number of frames within the scan (this may be increased after initial storage)

• g = number of global elements

• m = number of separate probes

• n = maximum number of elements in any probe used

• p = number of data acquisitions in a waveform (number of points per scanline)

• s = maximum number of scanlines in any frame

Dataset

An array of data of one data type only. This array could consist of a single element or be a multi-dimensional array of many elements.

Data Type

The format of the data contained within an array. The two data types allowed within the file format for MFMC raw data storage are signed and unsigned integers stored within whole numbers of bytes, i.e. 8-bit, 16-bit etc. For other parameters the type of data required (e.g. integer) is specified, but the precision may be defined by the user dependent upon their requirements.

Numbering

All numbering i.e. element number, probe reference number etc. shall start at one and increment by one for each additional element, probe etc. within the system.

Variable Name Language

The language used for variable names in the mandatory part of the file format is English (UK).

3.2 FILE STRUCTURE

3.2.1 General File Structure

The file will conform to the HDF5 structure and the data contained therein will be compatible with the publicly available HDF5 libraries. The file will make use of the group / dataset functionality provided by the standard to order the data in a logical fashion. There will be a group for each of the three categories: 1) Essential Parameters, 2) User-Defined Parameters and 3) MFMC Data. Within these groups there will be datasets. The Essential Parameters group will be split further into sub-groups: “Probe Geometry”, “Probe Setup”, “Ultrasonic Setup”, “Waveform Encoding”, “FMC Sequence” and “General Setup”. Each dataset will contain only one type of data and will correspond to a particular parameter. A graphical representation of this structure is shown below:

Figure 3.5 MFMC-CFF File Structure

The three groups shall reside at the top-level of the file and shall be named:

• ESSENTIAL_PARAMETERS

• USER_PARAMETERS

• MFMC_DATA
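As a minimal sketch of the top-level layout, the check below tests whether the three mandatory groups are present. It uses a plain Python dictionary as a stand-in for an opened HDF5 file, and the helper name `missing_top_level_groups` is hypothetical; neither is part of the specification.

```python
# Top-level group names taken from the specification.
REQUIRED_GROUPS = ("ESSENTIAL_PARAMETERS", "USER_PARAMETERS", "MFMC_DATA")


def missing_top_level_groups(file_layout):
    """Return the required group names absent from a dict-based layout.

    `file_layout` is a hypothetical stand-in for an opened HDF5 file:
    keys are group names, values are dicts of datasets or sub-groups.
    """
    return [name for name in REQUIRED_GROUPS if name not in file_layout]


# Example layout that lacks the user parameters group.
layout = {"ESSENTIAL_PARAMETERS": {}, "MFMC_DATA": {}}
print(missing_top_level_groups(layout))
```

With a real HDF5 library the same membership test would be made against the file's root group rather than a dictionary.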

3.2.2 Essential Parameters

Within the Essential Parameters group the following datasets must be present in the format prescribed below. The precision of the data type is to be defined by the user, taking into account the accuracy of the measurement for the specific setup in question and the requirement to use the smallest number of bits possible for each data type in order to minimise the size of the MFMC files.

Mandatory means even if the parameter is not expected to be used it has to be set to a correct value (to an accuracy decided by the operator). E.g. Y element coordinates on a 1D array.

Nominal means that a ‘starting’ value is useful for image reconstruction, but this can be modified later in order to optimise the image reconstruction. E.g. Specimen velocity.

Optional means that a flag will be provided to indicate whether the parameter is valid or not. Parameters of this type are not essential for the majority of reconstruction algorithms.

3.2.2.1 Probe Geometry

Geometry array sizes

The sizes of different dimensions in the arrays of parameters in the file

Mandatory

Data type 1D array (m,n,g,p,f,s) integer

Dataset name GEOMETRY_ARRAY_SIZES

Array of integers giving the sizes of arrays in the headers in this section.
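The sketch below shows one way a reader might label the six integers of GEOMETRY_ARRAY_SIZES with the common variable names. It assumes the array is stored in the order (m, n, g, p, f, s) given in the data type line; the names `GeometryArraySizes` and `unpack_array_sizes` and the example values are hypothetical.

```python
from collections import namedtuple

# Field order follows the (m, n, g, p, f, s) ordering in the data type line.
GeometryArraySizes = namedtuple("GeometryArraySizes", "m n g p f s")


def unpack_array_sizes(values):
    """Label a six-integer GEOMETRY_ARRAY_SIZES array with the common names."""
    if len(values) != 6:
        raise ValueError("expected six integers (m, n, g, p, f, s)")
    return GeometryArraySizes(*(int(v) for v in values))


# Hypothetical example: 2 probes of up to 64 elements (128 in total),
# 1000 points per scan line, 10 frames, at most 4096 scan lines per frame.
sizes = unpack_array_sizes([2, 64, 128, 1000, 10, 4096])
print(sizes.g, sizes.p)
```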

[Figure 3.5 content: the MFMC CFF File contains three groups: Essential Parameters (sub-groups such as Probe Geometry, e.g. Element Locations, and Ultrasonic Setup, e.g. Amplifier Gain), MFMC Data (raw FMC data) and User-defined Parameters (unspecified 'custom' information).]

Element Locations

3D Element location relative to origin of Probe Coordinate System (PCS) (All elements)

Mandatory

Data type 2D array (g x 3) float

Dataset name ELEMENT_LOCATIONS

Array of floating point values (in m) g columns by 3 rows. The columns shall be ordered in ascending global element number order – i.e. column 1 will represent global element number 1, column 2 will represent global element 2 and so on. Row 1 shall represent the x-axis distance between the Element Centre location (defined as the centroid of the element) and the Probe Datum Point for that probe, row 2 the y distance and row 3 the z distance.
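To illustrate the layout (three rows for x, y and z, one column per global element), the sketch below generates ELEMENT_LOCATIONS for a flat 1D linear array. It assumes, purely for illustration, that the Probe Datum Point is at the centre of the array and that the elements lie along the PCS x axis; the function name and the pitch value are hypothetical.

```python
def linear_array_locations(num_elements, pitch_m):
    """Return [x_row, y_row, z_row] for a flat 1D array centred on the datum.

    Assumes (for illustration only) that the Probe Datum Point sits at the
    centre of the array and that elements lie along the PCS x axis.
    """
    half_span = (num_elements - 1) / 2.0
    x_row = [(i - half_span) * pitch_m for i in range(num_elements)]
    y_row = [0.0] * num_elements  # still stored, even for a 1D array
    z_row = [0.0] * num_elements
    return [x_row, y_row, z_row]


# Hypothetical 4-element probe with 1 mm pitch.
locations = linear_array_locations(4, 1.0e-3)
print(locations[0])
```

The y and z rows are filled with zeros rather than omitted because the parameter is mandatory even where a coordinate is not expected to be used.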

Element Shape

Element shape Mandatory

Data type 1D array (g x 1) Integer

Dataset name ELEMENT_SHAPE

The element shape defined as follows: 0 – ellipse, 1 – rectangle, 2 – other. If the axes of the ellipse or the edges of the rectangle are not aligned to the probe coordinate system the shape is treated as “other”. Column 1 shall represent global element number 1, column 2 will represent global element 2 and so on.

Element Dimensions

Element dimensions Mandatory

Data type 2D array (g x 2) float

Dataset name ELEMENT_DIMENSIONS

Array of floating point values (in m) g columns by 2 rows. The columns shall be ordered in ascending global element number order – i.e. column 1 will represent global element number 1, column 2 will represent global element 2 and so on. The two values in each column depend on the shape chosen above:

Ellipse - value 1 is the x-axis width of the ellipse and value 2 is the y-axis width of the ellipse.

Rectangle – value 1 is the x-axis width, value 2 is the y-axis width.

Other - value 1 is the approximate (or maximum) x-axis width, value 2 is the approximate (or maximum) y-axis width.

Figure 3.6 Element Dimensions
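As an example of how ELEMENT_SHAPE and ELEMENT_DIMENSIONS might be used together, the sketch below estimates the nominal area of an element. It assumes that the "width" values are full axis lengths and treats the "other" shape as its bounding rectangle; the function name is hypothetical and the calculation is not part of the specification.

```python
import math

ELLIPSE, RECTANGLE, OTHER = 0, 1, 2  # ELEMENT_SHAPE codes from the specification


def nominal_element_area(shape, x_width_m, y_width_m):
    """Approximate element area in square metres from shape code and widths.

    For shape OTHER the widths are only approximate (or maximum) extents,
    so the bounding rectangle is returned as an upper estimate.
    """
    if shape == ELLIPSE:
        # Widths are assumed to be full axis lengths, so halve them.
        return math.pi * (x_width_m / 2.0) * (y_width_m / 2.0)
    if shape in (RECTANGLE, OTHER):
        return x_width_m * y_width_m
    raise ValueError("unknown ELEMENT_SHAPE code: %r" % shape)


# Hypothetical 0.5 mm by 10 mm rectangular element.
print(nominal_element_area(RECTANGLE, 0.5e-3, 10e-3))
```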

Element Orientation

Element orientation relative to PCS (three angles)

Mandatory

Data type 2D array (g x 3) float

Dataset name ELEMENT_ORIENTATIONS

Array of floating point values g columns by 3 rows. The columns shall be ordered in ascending global element number order – i.e. column 1 will represent global element number 1, column 2 will represent global element 2 and so on. Row 1 shall represent the rotation about the x axis in degrees (relative to the PCS), row 2 the rotation about the y axis and row 3 the rotation about the z axis.

Figure 3.7 Element Orientation

Element Focal Length in Water

Focal length in water due to element curvature (NaN if planar)

Mandatory

Data type 1D array (m x 1) float

Dataset name NATURAL_FOCAL_LENGTH

The natural focal length of each probe in water, measured in m. A value of NaN will be taken to mean that the element is flat (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction). If additional information is required beyond focal length then this should be stored in the “User-Defined Parameters” section of the file.
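Because NaN never compares equal to itself, a reader cannot detect the placeholder with `==`. The sketch below shows a test based on `math.isnan` and, for reference, prints the IEEE 754 single precision bit fields of NaN and infinity: NaN has an all-ones exponent with a non-zero fraction, while an all-ones exponent with a zero fraction encodes infinity. The helper names `float32_fields` and `is_unset` are hypothetical.

```python
import math
import struct


def float32_fields(value):
    """Split a value, encoded as IEEE 754 single precision, into bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def is_unset(value):
    """True if a floating point parameter holds the NaN placeholder."""
    return math.isnan(value)


_, nan_exp, nan_frac = float32_fields(float("nan"))
_, inf_exp, inf_frac = float32_fields(float("inf"))
print(nan_exp == 0xFF and nan_frac != 0)  # NaN: all-ones exponent, non-zero fraction
print(inf_exp == 0xFF and inf_frac == 0)  # infinity: all-ones exponent, zero fraction
print(is_unset(float("nan")), is_unset(0.05))
```

The same test applies to every parameter in this section that uses NaN to mean "not required".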

Multiple Probes

All above for multiple probes – different transmit and receivers

Mandatory

Data type Integer

Dataset name NUMBER_OF_PROBES

Set to the number of probes in operation.

Element Look-up Table

In the case of multiple probes, assignment of global element numbers to all elements in all probes.

Mandatory

Data type 2D array (g x 2) integer

Dataset name GLOBAL_ELEMENT_REFERENCE_TABLE

This array has g columns, the index representing the global element number. It has two rows, the first is the local element number, the second is the probe number.
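The sketch below builds a GLOBAL_ELEMENT_REFERENCE_TABLE for a multi-probe system and looks up one entry. It assumes that global element numbers are assigned probe by probe in ascending order, which the specification does not require; the function names are hypothetical.

```python
def build_global_element_table(elements_per_probe):
    """Return [local_row, probe_row]; index k describes global element k + 1.

    `elements_per_probe` lists the element count of probe 1, probe 2, ...
    Global numbers are assumed to be assigned probe by probe.
    """
    local_row, probe_row = [], []
    for probe_number, count in enumerate(elements_per_probe, start=1):
        for local_number in range(1, count + 1):
            local_row.append(local_number)
            probe_row.append(probe_number)
    return [local_row, probe_row]


def lookup(table, global_number):
    """Return (local element number, probe number) for a 1-based global number."""
    return table[0][global_number - 1], table[1][global_number - 1]


# Hypothetical system: probe 1 has 3 elements, probe 2 has 2 elements.
table = build_global_element_table([3, 2])
print(lookup(table, 4))  # (1, 2): local element 1 on probe 2
```

All numbering is one-based, as required by the Numbering definition, so the lookup subtracts one only when indexing the Python lists.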

3.2.2.2 Probe Setup

Probe Centre Frequency

Probe centre frequency Mandatory

Data type 1D array (m probes) float

Dataset name PROBE_CENTRE_FREQUENCY

The centre frequency of each probe in Hz.

Probe Bandwidth

Probe bandwidth Optional

Data type 1D array (m probes) float

Dataset name PROBE_BANDWIDTH

The -3 dB bandwidth of each probe in Hz. Set to NaN if not required (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction).

3.2.2.3 Ultrasonic Setup

Receiver Amplifier Gain

Receiver amplifier gain Optional

Data type Float

Dataset name RECEIVER_AMPLIFIER_GAIN

The total gain applied to the received signal in dB. Set to NaN if not required (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction).

Acquisition Distance Amplitude Correction

Distance Amplitude Correction at acquisition (full description of DAC curve)

Mandatory

Data type 1D array (p) – floats

Dataset name DAC_CURVE

Array of floating-point values (gain at acquisition, in dB) in p columns. Each value represents the gain that was applied at that time point. The time base and time point spacing will be the same as the data-acquisition time base. 0 dB for all points should only be stated if there was no DAC applied. It is assumed that the same correction was applied for each element. If this is not the case, or a method of DAC creation was employed that cannot be specified in this way, then this should be included in the “User-Defined Parameters” area.
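A processing code may wish to undo the correction before further analysis. The sketch below does this under two assumptions that the specification does not state: that the gain is an amplitude gain (20 dB per factor of ten) and that the samples have already been referenced to the zero level. The function names are hypothetical.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in dB to a linear amplitude ratio (20 dB per decade)."""
    return 10.0 ** (gain_db / 20.0)


def remove_dac(samples, dac_curve_db):
    """Divide each zero-referenced sample by the DAC gain at that time point."""
    if len(samples) != len(dac_curve_db):
        raise ValueError("DAC_CURVE must have one value per time point (p)")
    return [s / db_to_amplitude_ratio(g) for s, g in zip(samples, dac_curve_db)]


# Hypothetical 4-point scan line with the gain stepping up by 20 dB.
corrected = remove_dac([1.0, 2.0, 5.0, 10.0], [0.0, 0.0, 20.0, 20.0])
print(corrected)
```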

Filter Type

Filter type Mandatory

Data type Integer

Dataset name FILTER_TYPE

The filter type defined as 0 – no filter, 1 – low pass, 2 – high pass, 3 - band pass, 4 – user specified.

Filter Centre Frequency

Filter centre frequency Mandatory

Data type Float

Dataset name FILTER_CENTRE_FREQUENCY

The centre frequency of the filter in Hz. Set to NaN if not required (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction).

Filter Bandwidth

Filter bandwidth Mandatory

Data type Float

Dataset name FILTER_BANDWIDTH

The -3 dB bandwidth of the filter in Hz. Set to NaN if not required (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction).

Number of Averages

Level of time averaging (number of averages, default - 1)

Mandatory

Data type Integer

Dataset name NUMBER_OF_AVERAGES

The number of time averages performed on each Scan line. Set to 1 if none are taken.

Transmit Apertures

Transmit Aperture flag Mandatory

Data type Integer

Dataset name TRANSMIT_APERTURE_FLAG

Indicates that a transmit aperture is in use: 0 - Single element transmit, 1 – simple apertures, 2 – user defined apertures. Simple apertures are defined as groups of consecutive elements on the same probe as shown in figure 3.8. If user-defined apertures are used then they should be defined in the “User-Defined Parameters” section of the file.

Figure 3.8 Simple aperture definition

Simple Transmit Aperture Definitions

Transmit ‘virtual elements’. Optional

Data type 1D array (2 x 1) integer

Dataset name TRANSMIT_APERTURE_DEFINITIONS

If simple apertures are defined above then this array defines the apertures. The first column defines the number of elements within the aperture; the second column defines the step size in elements between adjacent apertures. If a single aperture the size of the whole array is defined, a step size of 1 must still be entered to avoid creating divide-by-zero errors in processing code.
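The sketch below enumerates the elements in each simple aperture from the two stored values. It assumes that apertures start at local element 1 and slide along the probe without running past the last element; the function name is hypothetical. The aperture count involves a division by the step size, which illustrates why a step of zero must not be stored.

```python
def simple_apertures(num_elements, aperture_size, step):
    """List the local element numbers (1-based) in each simple aperture."""
    if step < 1:
        raise ValueError("step must be at least 1, even for a single aperture")
    if not 1 <= aperture_size <= num_elements:
        raise ValueError("aperture size must be between 1 and the element count")
    # The division by `step` is why a step of 0 must never be stored.
    count = (num_elements - aperture_size) // step + 1
    return [
        list(range(1 + k * step, 1 + k * step + aperture_size))
        for k in range(count)
    ]


print(simple_apertures(8, 4, 2))  # three overlapping 4-element apertures
print(simple_apertures(8, 8, 1))  # one aperture spanning the whole probe
```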

Receive Apertures

Receive Aperture flag Mandatory

Data type Integer

Dataset name RECEIVE_APERTURE_FLAG

Indicates that a receive aperture is in use: 0 - single element receive, 1 – simple apertures, 2 – user defined apertures. Simple apertures are defined as groups of consecutive elements on the same probe as shown in figure 3.8. If user-defined apertures are used then they should be defined in the “User-Defined Parameters” section of the file.

Simple Receive Aperture Definitions

Receive ‘virtual elements’. Optional

Data type 1D array (2 x 1) integer

Dataset name RECEIVE_APERTURE_DEFINITIONS

If simple apertures are defined above then this array defines the apertures. The first column defines the number of elements within the aperture, the second column defines the step size in elements between adjacent apertures.

3.2.2.4 Waveform Encoding

Bits Resolution

Bits resolution Mandatory

Data type Integer

Dataset name NUMBER_OF_BITS_RESOLUTION

The number of real bits of resolution used for the analogue-to-digital conversion.

Data Type Class

Data Type Class Mandatory

Data type 1D Array (2 x 1) Integer

Dataset name DATA_TYPE

The first column is the type of the MFMC data: 0 - signed integer, 1 - unsigned integer. The second column is the number of bits used for storage of a single MFMC data point. The data must be stored within whole numbers of bytes, i.e. 8-bit, 16-bit etc.

Byte Ordering

Byte Ordering Mandatory

Data type Integer

Dataset name BYTE_ORDERING

The ordering of the bytes in the MFMC data: 0 - little endian, 1 – big endian. This is the byte order that was used to store the data when it was sent to the HDF5 library. When data is read via the HDF5 library, it is possible to ask for it to be converted to either byte order provided the correct order was specified for the stored data when the file was written.
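When data is read through the HDF5 library the byte order conversion can be left to the library. The sketch below is only an illustration of what DATA_TYPE and BYTE_ORDERING mean for the raw bytes of a scan line; it uses Python's standard `struct` module, and the function name and the lookup tables are hypothetical.

```python
import struct

# (type class, bits) -> struct format character; classes follow DATA_TYPE.
_FORMATS = {
    (0, 8): "b", (0, 16): "h", (0, 32): "i", (0, 64): "q",  # signed
    (1, 8): "B", (1, 16): "H", (1, 32): "I", (1, 64): "Q",  # unsigned
}
_BYTE_ORDER = {0: "<", 1: ">"}  # BYTE_ORDERING: 0 little endian, 1 big endian


def decode_scan_line(raw_bytes, data_type, byte_ordering):
    """Decode packed samples given DATA_TYPE = (class, bits) and BYTE_ORDERING."""
    type_class, bits = data_type
    code = _FORMATS[(type_class, bits)]
    count = len(raw_bytes) // (bits // 8)
    fmt = _BYTE_ORDER[byte_ordering] + str(count) + code
    return list(struct.unpack(fmt, raw_bytes))


raw = bytes([0x01, 0x00, 0xFF, 0xFF])
print(decode_scan_line(raw, (0, 16), 0))  # little endian signed 16-bit: [1, -1]
print(decode_scan_line(raw, (0, 16), 1))  # same bytes as big endian: [256, -1]
```

Reading the same four bytes with the wrong byte order changes the first sample from 1 to 256, which is why the order used at write time has to be recorded.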

Zero Level

Level representing no signal Mandatory

Data type Integer

Dataset name ZERO_LEVEL

The level in the MFMC data corresponding to no signal. This should be unitless and in the format in which it is stored. For example, if 0 V is set to the middle of an unsigned 8-bit integer range, this value should be set to 127.

Vertical Resolution

Vertical resolution (Volts per digitisation level) Mandatory

Data type Float

Dataset name VERTICAL_RESOLUTION

The difference in volts between adjacent quantisation levels on the y-axis of the MFMC data. Set to NaN if not required (in IEEE Standard 754, NaN is represented by an exponent of all 1s and a non-zero fraction).
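Taken together, ZERO_LEVEL and VERTICAL_RESOLUTION allow stored quantisation levels to be converted to volts at the digitiser. The sketch below assumes a linear digitiser, so that volts = (level - ZERO_LEVEL) x VERTICAL_RESOLUTION; it does not account for amplifier gain or DAC, and the function name and example values are hypothetical.

```python
def samples_to_volts(samples, zero_level, vertical_resolution):
    """Convert stored quantisation levels to volts.

    Assumes a linear digitiser:
    volts = (level - ZERO_LEVEL) * VERTICAL_RESOLUTION.
    """
    return [(s - zero_level) * vertical_resolution for s in samples]


# Hypothetical unsigned 8-bit data with 0 V at level 127 and 0.5 V per level.
volts = samples_to_volts([127, 129, 123], zero_level=127, vertical_resolution=0.5)
print(volts)  # [0.0, 1.0, -2.0]
```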

Data Sampling Interval

Data sampling interval Mandatory

Data type Float

Dataset name DATA_SAMPLING_INTERVAL

The time between adjacent time points in a scan line in seconds.

Scan Line Start Time

Time of first point in waveform Mandatory

Data type Float

Dataset name START_TIME

The time in seconds of the first sample in each Scan line.

Number of Time Points

Number of time points Mandatory

Data type Integer

Dataset name NUMBER_OF_TIME_POINTS

The total number of time points (p) in each transmit-receive scanline (Scan line).
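START_TIME, DATA_SAMPLING_INTERVAL and NUMBER_OF_TIME_POINTS together define the time of every sample in a scan line. A minimal sketch of this relationship, assuming uniform sampling so that sample k (counting from 0) occurs at START_TIME + k x DATA_SAMPLING_INTERVAL, is given below. The function name `time_axis` is illustrative.

```python
def time_axis(start_time, data_sampling_interval, number_of_time_points):
    """Return the time in seconds of every sample in a scan line."""
    return [start_time + k * data_sampling_interval
            for k in range(number_of_time_points)]


# Example values are illustrative only: 4 samples at a 0.25 s interval.
times = time_axis(1.0, 0.25, 4)
```

The sampling frequency, where required, is the reciprocal of DATA_SAMPLING_INTERVAL.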

3.2.2.5 FMC Sequence

Number of Scanlines

Number of scanlines per frame Mandatory

Data type Integer

Dataset name NUMBER_OF_SCAN_LINES

The maximum number of scanlines (s) present in any frame of the scan.

Number of Bytes per Scan Line

Number of bytes per scan line Mandatory

Data type Integer

Dataset name NUMBER_OF_BYTES_PER_SCAN_LINE

The total number of bytes required per scan line in the frame. Normally this is the number of points in a scanline (p) multiplied by the number of bytes per sample; however, in some cases it may differ, such as when delimiters are used.
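The normal case described above can be written as a short calculation. This is a sketch only: the function name and the `delimiter_bytes` argument are hypothetical, the latter standing in for whatever additional bytes a particular acquisition system adds to each scan line.

```python
def bytes_per_scan_line(number_of_time_points, storage_bits, delimiter_bytes=0):
    """Return NUMBER_OF_BYTES_PER_SCAN_LINE for the given storage width."""
    if storage_bits % 8 != 0:
        # The specification requires storage within whole numbers of bytes.
        raise ValueError("storage width must be a whole number of bytes")
    return number_of_time_points * (storage_bits // 8) + delimiter_bytes


# 1000 time points stored as 16-bit samples (example values only).
plain = bytes_per_scan_line(1000, 16)
with_delimiter = bytes_per_scan_line(1000, 16, delimiter_bytes=4)
```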

3.2.2.6 General Setup

Nominal Coupling Medium Velocity

Nominal coupling medium velocity (shear and compression)

Nominal

Data type 1D array (2) float

Dataset name COUPLING_VELOCITIES

Element 1 of the array is the nominal compression velocity of the coupling medium in m/s, element 2 is the nominal shear velocity in m/s. For contact arrays, set to the specimen medium velocity. If further information about the surface needs to be stored, it should be stored in the “User-Defined Parameters” section of the file.

Nominal Specimen Medium Velocity

Nominal specimen medium velocity (shear and compression)

Mandatory

Data type 1D array (2) float

Dataset name SPECIMEN_VELOCITIES

Element 1 of the array is the nominal compression velocity of the specimen medium in m/s, element 2 is the nominal shear velocity in m/s. In the case of anisotropic material this array must still be populated with a relevant value (i.e. one which would allow a simple B-scan to be produced for quality assurance); however, direction dependent velocity data may also be included in the “User-Defined Parameters” area of the file.

Number of Frames

Number of frames Mandatory

Data type Integer

Dataset name NUMBER_OF_FRAMES

The number of frames in the scan. This is the current value and may be amended if the scan is extended after initial acquisition / data storage.

Scan Extent

Nominal scan extent (X,Y,Z) (for creating arrays at analysis time)

Optional

Data type 2D array (2 columns x 3 rows) float

Dataset name SCAN_EXTENT

The first column represents the scan start location in m relative to the GCS datum. The second column represents the scan end location. The first row represents the distance from the probe datum to the global datum in the x direction, the second row represents the y direction and the third row represents the z axis.

Dead Elements

Dead Elements Mandatory

Data type 1D array (g) Integer

Dataset name DEAD_ELEMENTS

Each entry is set to TRUE if the corresponding element is dead. Entries are in ascending global element number across all probes, i.e. entry 1 is element 1, entry 2 is element 2 and so on.

MFMC CFF Version

MFMC Common File Format version Number Mandatory

Data type String

Dataset name MFMC_VERSION

The version number of the technical specification to which the file conforms. For this version of the technical specification the string shall be “0.5 Draft For General Comment”.

3.2.3 MFMC Data

Within the group “MFMC_DATA” the raw MFMC data, along with frame dependent information from the scan, will be stored. Within this section the specification refers to frames of data; a user might equally prefer to store single fields of data and this is allowable within the format. The only condition is that all parameters defined within “Essential Parameters” must be common between fields / frames. Further discussion on this point is contained within section 4.2. The datasets within this section must use chunked data in order to enable extendibility and single frame recall (see section 4.1).

3.2.3.1 Raw MFMC Data

There will be a dataset provided for the storage of MFMC data:

Actual waveform data. This may be meaningful raw data that can be used to create an image or raw data as received from data acquisition. If excitation was a coded sequence or chirp, the decoding, pulse compression, or deconvolution information must be included in the “User-Defined Parameters” section of the file.

Data type 2D array (p x (s * f)) data type as specified in “DATA_TYPE” (section 3.2.2.3)

Dataset name MFMC_DATA

Use of chunking / hyper-slabbing as demonstrated in section 4.1 is mandatory and will significantly improve data retrieval rate, enable single frame recall and extendibility of files. Scanlines are stored contiguously with all 's' scanlines for frame 1 stored first, followed by frame 2. Data must be stored within whole numbers of bytes, i.e. 8-bit, 16-bit etc.

Figure 3.8 Layout of Data in Raw MFMC Data Dataset

For storage the matrix is flattened from top-left corner, left-to-right then down, i.e. frame 1 scanline 1 then frame 1 scanline 2 and so on.
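The ordering described above means that the position of any scanline within the dataset can be calculated from its frame and scanline numbers. The sketch below assumes 1-based frame and scanline numbers, as used in the text, and returns 0-based positions along the (s * f) dimension. The function names are illustrative.

```python
def scanline_index(frame, scanline, scanlines_per_frame):
    """Return the 0-based position of a scanline among all stored scanlines."""
    if frame < 1:
        raise ValueError("frame numbers start at 1")
    if not 1 <= scanline <= scanlines_per_frame:
        raise ValueError("scanline number outside the frame")
    # All 's' scanlines of frame 1 come first, followed by those of frame 2.
    return (frame - 1) * scanlines_per_frame + (scanline - 1)


def frame_slice(frame, scanlines_per_frame):
    """Return the (start, stop) positions covering one whole frame."""
    start = scanline_index(frame, 1, scanlines_per_frame)
    return start, start + scanlines_per_frame


# With s = 4 scanlines per frame, frame 2 occupies positions 4 to 7.
second_frame = frame_slice(2, 4)
```

A single frame can then be recalled by reading only the positions between `start` and `stop`, which is the access pattern that the chunking recommendations in section 4.1 are intended to support.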

3.2.3.2 Parameters

The following parameters will be populated within the group “MFMC_DATA”

Firing information Mandatory

Data type 2D array (3 x (s * f)) Integer

Dataset name FIRING_INFORMATION

A 2 dimensional array, the rows are ordered in ascending scanline number, first for frame 1, then for frame 2 and so on. For the columns, column 1 is firing sequence number (required in multiplexing for example), column 2 is transmit element number (global element number for single element firing but becomes aperture number in the case of aperture firing) and column 3 is receive element number (global element number for single element firing but becomes aperture number in the case of aperture reception). Where a scanline is not valid / present within the frame columns 1, 2 and 3 should be set to NaN.

Figure 3.9 Layout of Data in Firing Information Dataset
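As an illustration of the three columns, the sketch below generates the firing information for one frame of a simple full matrix capture in which each element is fired in turn and all elements receive. It assumes single element firing, 1-based global element numbers and one firing sequence number per transmit element; these choices, and the function name, are illustrative assumptions rather than requirements of the format.

```python
def fmc_firing_rows(number_of_elements):
    """Return (firing sequence, transmit, receive) entries for one FMC frame."""
    rows = []
    for transmit in range(1, number_of_elements + 1):
        for receive in range(1, number_of_elements + 1):
            # Assumption: every scanline from one transmit event shares the
            # same firing sequence number, taken here as the transmit number.
            rows.append((transmit, transmit, receive))
    return rows


# A 3 element array gives 3 x 3 = 9 scanlines per frame.
rows = fmc_firing_rows(3)
```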

Probe orientation (three angles relative to x, y and z axes in GCS), relationship between two coordinate systems

Mandatory

Data type 2D array (6 x (f * m)) float

Dataset name PROBE_ORIENTATIONS

Each row represents the rotation in degrees of the probe about each of the cardinal axes in the GCS at the start and end locations of the frame. The rows shall be ordered in ascending frame number, first for probe 1, then for probe 2 and so on. Column 1 of the array shall represent the x axis rotation in degrees at the start of the frame, column 2 the y axis and column 3 the z axis. Column 4 of the array shall represent the x axis rotation in degrees at the end of the frame, column 5 the y axis and column 6 the z axis.

Figure 3.10 Layout of Data in Probe Orientation Dataset

Expected location of the specimen surface Mandatory

Data type 1D array ((f * m) x 1) float

Dataset name PROBE_DATUM_STANDOFF

Each column represents the measurement in m from the probe datum to the specimen surface. This should be the measured distance if known, otherwise the nominal measurement. The columns shall be ordered in ascending field / frame number, first for probe 1, then for probe 2 and so on.

Frame start and end location Mandatory

Data type 2D array (6 x (f * m)) float, where f = NUMBER_OF_FRAMES (section 3.2.2.3) or number of fields

Dataset name START_END_LOCATIONS

Each column represents the start and end locations of each probe for a single frame of data (measured in the global coordinate system). The rows shall be ordered in ascending frame number, first for probe 1, then for probe 2 and so on. Column 1 of the array shall represent the x distance in m at the start of the frame, column 2 the y distance and column 3 the z distance. Column 4 of the array shall represent the x distance in m at the end of the frame / field, column 5 the y distance and column 6 the z distance. It is possible that fields of data have been combined to form one frame of data; these fields may have their own start / end locations. If this is the case, this field specific data should be stored in the “User-Defined Parameters” section of the file format.

Figure 3.10 Layout of Data in Frame Start and End Locations Dataset
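The PROBE_ORIENTATIONS, PROBE_DATUM_STANDOFF and START_END_LOCATIONS datasets share the same ordering: ascending frame number, first for probe 1, then for probe 2 and so on. A minimal sketch of the resulting position calculation is given below, assuming 1-based frame and probe numbers and a 0-based result along the (f * m) dimension. The function name is illustrative.

```python
def frame_probe_index(frame, probe, number_of_frames):
    """Return the 0-based entry for a given frame and probe."""
    if not 1 <= frame <= number_of_frames:
        raise ValueError("frame number outside the scan")
    if probe < 1:
        raise ValueError("probe numbers start at 1")
    # All frames of probe 1 come first, followed by all frames of probe 2.
    return (probe - 1) * number_of_frames + (frame - 1)


# In a 5 frame scan, frame 3 of probe 2 is the eighth entry (index 7).
entry = frame_probe_index(3, 2, 5)
```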

3.2.4 User Defined Parameters

This section is free to allow users to input their own user specific meta-data. The structure of the parameters and their format is at the discretion of the user. However, if the parameters in this area are to be shared outside the generating organisation then the following is advisable but not mandatory:

• The general MFMC CFF structure of the same data types grouped into datasets which correspond to unique parameters is followed

• Dataset names should make it clear what data is stored therein

• No encoding of data i.e. it should be readable by users with an arbitrary HDF5 reader

• No raw MFMC data is stored in this area

4 Examples & Recommendations

4.1 USAGE OF HYPER-SLABS / CHUNKING & STORAGE OF DATA TO ENABLE SINGLE FRAME RECALL

In order to enable single frame recall, chunking of data within the MFMC dataset must be enabled. By default this is not enabled within the HDF5 library. Unless specified by the user, the HDF5 library will make a “best guess” at an appropriate chunk size; for optimum performance this must be specified by the user. Within the library provided by the University of Bristol this is calculated automatically, but this explanation is included for those wishing to develop their own low level functions.

Chunking is a process whereby a dataset is grouped into logical sections, the physical locations of which within the file are contained in a look up table in the file header. Chunking of datasets allows the recall of single frames of data and is a necessary requirement for extendibility of the files created (which in itself is a requirement of this project). Chunking must therefore be used on all datasets that could increase in size (i.e. all datasets within 3.2.3).

When creating a chunked data set it is necessary to define a chunk size. This is critical to the application since a size too small will cause a marked decrease in read / write efficiency, whilst a size too large will also cause an overhead with retrieving the specific data of interest from the dataset. When setting the chunk size it is also necessary to consider the size of the likely extensions to the file. If it is reasonable to assume that many frames / scan locations will be added then a large chunk size may be chosen, however, if a smaller increase is anticipated then a smaller chunk size would be more appropriate. With consideration to the factors described the following chunk sizes are recommended:

Variable Chunk Size

MFMC_DATA One frame

FIRING_INFORMATION Entire initial scan

PROBE_LOCATIONS Entire initial scan

PROBE_ORIENTATIONS Entire initial scan

PROBE_DATUM_STANDOFF Entire initial scan

START_END_LOCATIONS Entire initial scan

Table 4.1 Recommended Chunk Sizes
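The recommendations in Table 4.1 can be turned into chunk shapes using the dataset dimensions given in section 3.2.3. The sketch below is illustrative only: the function names are assumptions, the shapes are taken from the dataset definitions in this specification, and PROBE_LOCATIONS is omitted because its dimensions are not defined in this part of the document.

```python
def recommended_chunk_shapes(p, s, f, m):
    """Return chunk shapes for p time points, s scanlines, f frames, m probes."""
    return {
        "MFMC_DATA": (p, s),                  # one frame
        "FIRING_INFORMATION": (3, s * f),     # entire initial scan
        "PROBE_ORIENTATIONS": (6, f * m),     # entire initial scan
        "PROBE_DATUM_STANDOFF": (f * m, 1),   # entire initial scan
        "START_END_LOCATIONS": (6, f * m),    # entire initial scan
    }


def mfmc_chunk_bytes(p, s, storage_bits):
    """Return the size in bytes of a single one-frame MFMC_DATA chunk."""
    return p * s * (storage_bits // 8)


# Example values only: 1000 time points, 4096 scanlines, 10 frames, 1 probe.
shapes = recommended_chunk_shapes(1000, 4096, 10, 1)
frame_bytes = mfmc_chunk_bytes(1000, 4096, 16)
```

Calculating the size of a one-frame chunk in this way gives a quick check on whether the chunk size is reasonable for the intended read / write pattern before the file is created.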

4.2 ALLOWABLE DIFFERENCES BETWEEN FIELDS / FRAMES WITHIN THE FORMAT

There are occasions when fields / frames may differ within a single scan. An example of this would be when multiplexing is employed. In this case probe geometry, probe setup, ultrasonic setup, waveform encoding and general setup will be the same between fields / frames but other information e.g. firing information, number of scanlines or apertures may differ. This is allowable within the format and should be implemented as follows.

All parameters from the “Essential Parameters” section of the file must be shared between fields / frames within the file, i.e. sampling frequency, delay, duration of acquisition etc. must be constant. All frames should be the same size in memory, i.e. the same number of time points and the same number of scanlines.

The FMC sequence group from “Essential Parameters”, however, deserves further examination:

Parameter Notes

NUMBER_OF_SCAN_LINES This should be set to the maximum number of scanlines in any frame in the file. All frames in the file must contain this number of scanlines. If any frame has fewer scanlines than this, it must have blank scanlines inserted to make it the same size.

TRANSMIT_ELEMENTS, RECEIVE_ELEMENTS If the same apertures are used in all frames then this should be populated in the normal way. If different apertures are used in different frames then this should be set to “2” representing user-defined. The aperture information should then be stored in the “User-Defined Parameters” section of the file.

NUMBER_OF_BYTES_PER_SCAN_LINE If the above requirement on scanlines is followed this should be the same between frames.

Table 4.2 FMC Sequence Group

Similarly the “MFMC Data” Group also requires some analysis:

Dataset Notes

MFMC_DATA All fields/frames must be the same size and hence have the same number of scanlines and time points.

FIRING_INFORMATION It is allowable to have different firing sequences for different frames. The firing sequence should be recorded for each frame in the manner defined in the specification. If there are fewer firings in one frame than the maximum number of scanlines, resulting in blank scanlines being added to the frame, the transmit and receive numbers of this array should be set to NaN.

PROBE_ORIENTATIONS, PROBE_DATUM_STANDOFF, START_END_LOCATIONS These datasets should be populated as normal with the data relevant to each field / frame.

Table 4.3 MFMC Data Group

A worked example demonstrating this in practice may be found within the appendix.
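The padding rule in Tables 4.2 and 4.3 can also be sketched in a few lines. The sketch below pads a frame that has fewer scanlines than NUMBER_OF_SCAN_LINES and marks the added entries in the firing information as NaN. Filling the blank scanlines with the ZERO_LEVEL value is an assumption made for this illustration; the specification only requires that blank scanlines are inserted. The function name is also illustrative.

```python
import math

NAN = float("nan")


def pad_frame(scanlines, firing_rows, number_of_scan_lines,
              number_of_time_points, zero_level):
    """Pad one frame and its firing information to the full scanline count."""
    missing = number_of_scan_lines - len(scanlines)
    if missing < 0:
        raise ValueError("frame has more scanlines than NUMBER_OF_SCAN_LINES")
    # Assumption: a blank scanline holds the "no signal" level throughout.
    blank = [zero_level] * number_of_time_points
    padded_scanlines = list(scanlines) + [list(blank) for _ in range(missing)]
    # Entries for scanlines that are not present are set to NaN.
    padded_firing = list(firing_rows) + [(NAN, NAN, NAN)] * missing
    return padded_scanlines, padded_firing


# A frame with 2 of 4 scanlines present, 2 time points each (example values).
data, firing = pad_frame([[1, 2], [3, 4]], [(1, 1, 1), (1, 1, 2)], 4, 2, 0)
```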

5 Tables

5.1 File Format Comparison

The comparison criteria are numbered below. The answers recorded for each file format are then listed against those numbers.

Criteria

1. Is the format company specific?
2. Is a large part of the format already written?
3. Are there publicly available libraries that new adopters could use?
4. Is there a cost for access for new adopters?
5. How long has the format been used for?
6. Is there an international standard for this file format?
7. Is there recognition of this file format by large organisations?
8. Is there on-going technical support available for this file format?
9. Is the data viewable without specialist software?
10. Can multiple frames of FMC data be stored in this format?
11. Is there a file size limit and, if so, what is it?
12. Can the files be expanded after initial save?
13. Can multiple sequential files be stored whilst each one retains independence (i.e. all parameters required for reconstruction stored within each file)?
14. What is the maximum number of bits per sample that may be stored?
15. Can you extract a specific frame of data without reading in all previous frames in the file? E.g. file pointers to frame in the header.
16. Is there provision for data compression within the file format?
17. Does the format allow for a header of mandatory data?
18. Are all essential parameters defined in requirements included in the header?
19. Does the format have the ability to not provide a value for the essential parameters, i.e. could you make them optional?
20. Are there extra parameters specified in the header of the format that are not essential in this requirements document?
21. Does the format allow for an area where custom data can be stored (with no data size restrictions)?
22. Are there restrictions on the variable type that can be stored, i.e. integer, string etc?
23. Could the file format be adapted to meet the requirements?
24. If so, please provide an estimate of the level of effort required to achieve this.

Avi / Movie

1: No; 2: Yes; 3: Yes; 4: Potentially; 5: 20+ years; 6: Yes; 7: Yes; 8: Yes; 9: Yes; 10: Yes; 11: No; 12: Yes

13: Not natively; 14: Machine set; 15: Yes; 16: Yes; 17: Not in the header; 18: Yes; 19: Yes; 20: Yes; 21: Yes; 22: No; 23: Yes; 24: Months

Notes: Format for movie file formats is extremely rigid, making the format work for this application could be technically difficult.

DICONDE

1: No; 2: Yes UT - TFM to be added; 3: Yes - DICOM; 4: Potentially - Membership of ASTM; 5: UT > 4 years; 6: Supported by ASTM; 7: Yes; 8: Yes - 3rd Party; 9: Yes - some work to add FMC; 10: Yes; 11: No; 12: Yes

13: Yes; 14: Unlimited; 15: Yes - capability within DICOM; 16: Image compression; 17: Yes; 18: Yes; 19: Yes; 20: Yes; 21: Yes but may limit exchangeability; 22: No - all data types covered by standard; 23: Yes; 24: ????

Notes: ASTM E09.11 is very happy to receive contributions from interested parties. It should be possible for RCNDE to suggest the modifications. I attached a pdf that is called DICOM cookbook and provides a good overview of DICOM. Further I will include the chairman of the DICONDE committee to this conversation: Pat Howard [email protected]

HDF5

1: No; 2: Yes; 3: Yes; 4: No; 5: 13 years; 6: Yes; 7: Yes; 8: Yes; 9: Yes; 10: Yes; 11: No; 12: Yes

13: Yes; 14: Machine set; 15: Yes; 16: Yes; 17: Yes; 18: Yes; 19: Yes; 20: No; 21: Yes; 22: No; 23: Yes; 24: Weeks

Notes: Libraries exist in a variety of formats and are freely available. HDF group which control the format also provide consultancy & technical support which could be used by users. One of the stakeholders to this project already uses an implementation of HDF5.

TIFF

1: No; 2: Yes; 3: Yes; 4: No; 5: 22 years; 6: Yes; 7: Yes; 8: Yes; 9: Yes; 10: Yes; 11: 4Gb; 12: Yes

13: Not natively; 14: 24; 15: Yes; 16: Yes; 17: Yes; 18: Yes; 19: Yes; 20: Yes; 21: Yes but limited size; 22: No; 23: Yes; 24: Months

Notes: TIFF standard is not suitable for this application due to file size limitations.

BIG TIFF

1: No; 2: Yes; 3: Some; 4: No; 5: 3 years; 6: No; 7: Limited; 8: No; 9: Yes; 10: Yes; 11: No; 12: Yes

13: Not natively; 14: 24; 15: Yes; 16: Yes; 17: Yes; 18: Yes; 19: Yes; 20: Yes; 21: Yes but limited size; 22: No; 23: Yes; 24: Months

Notes: BIGTIFF is suitable, but although a standard was proposed in 2011 this has not as yet been internationally agreed.

OPG File Format

1: Yes; 2: Yes; 3: No; 4: N/A; 5: 5 Years; 6: No; 7: No; 8: In OPG only; 9: No; 10: Yes; 11: No; 12: Yes

13: Yes; 14: 16; 15: Yes; 16: No; 17: Yes; 18: Yes; 19: No; 20: Yes; 21: Yes; 22: No; 23: Yes; 24: Weeks

Rolls-Royce Marine

1: Yes; 2: Yes; 3: To be negotiated; 4: To be negotiated; 5: 6 years; 6: No; 7: Yes - Arraygen; 8: Yes by RR; 9: No; 10: Yes; 11: No; 12: Could be added

13: Yes; 14: 16; 15: Yes; 16: Could be added; 17: Yes; 18: No; 19: Could be added; 20: Yes; 21: Yes; 22: No; 23: Yes; 24: Weeks

Strathclyde

1: No; 2: Yes; 3: Yes; 4: No; 5: 7 years; 6: Yes; 7: Yes for HDF5; 8: Yes; 9: Yes; 10: Yes; 11: No; 12: Yes

13: Yes; 14: Machine set; 15: Yes; 16: Yes; 17: Yes; 18: Yes; 19: Yes; 20: Yes; 21: Yes; 22: No; 23: Yes; 24: Weeks

Notes: The data is saved as a Matlab structure. The structure comes with corresponding code package that has FMC simulation, display and processing capabilities; TFM and TFM-PCF is implemented for CUDA-capable computers. It is possible that for a public version, some capability will have to be disabled, with details to be negotiated with interested parties. I am sure that most of the functionality can be made open-source.

XML

1: No; 2: Yes; 3: Yes; 4: No; 5: 5 Years; 6: Yes; 7: Yes; 8: Yes - 3rd Party; 9: Yes; 10: Yes; 11: No; 12: Potentially

13: Not natively; 14: Machine set; 15: No; 16: Not natively; 17: Yes; 18: Yes; 19: Yes; 20: No; 21: Yes; 22: No; 23: Yes; 24: Months

Notes: Format is designed to be read by both machines and humans, explaining the preference for plain text storage. Binary is possible to reduce file size but no agreed standard exists for binary. Some technical challenges to be resolved, i.e. expandability, sequential files. It is used for large data sets in some organisations, with some evidence of usage in the financial sector for example.

New Format

1: No; 2: No; 3: No; 4: No; 5: 0 years; 6: No; 7: No; 8: Potentially; 9: No; 10: Yes; 11: No; 12: Yes

13: Yes; 14: Machine set; 15: Yes; 16: Yes; 17: Yes; 18: Yes; 19: Yes; 20: No; 21: Yes; 22: No; 23: Yes; 24: 6 Months

Notes: A new file format could be tailored to meet the requirements of an arbitrary technical specification but the development time required for this as well as issues around technical support and acceptance of this format make this format unattractive to this project.

6 Acknowledgements

There have been a great number of contributors to this project and I would like to acknowledge the contributions made by the following organisations:

• Diagnostic Sonar

• EDF

• Electric Power Research Institute

• GE Measurement & Control

• Imperial College

• Olympus NDT

• Ontario Power Group

• Rolls-Royce Aerospace

• Rolls-Royce Marine

• TWI

• University of Bristol

• University of Strathclyde

I would also like to thank Professor Paul Wilcox and Professor Robert Smith from Bristol University, Dr Joseph Jackson and Dr Jerzy Dziewierz from the University of Strathclyde and David Lines from Diagnostic Sonar Limited for providing specific detailed feedback on the early drafts of this specification.

7 References

[1]. Mienczakowski M, “Multi-Frame Matrix Capture Common File Format (MFMC-CFF) Requirements Capture”, Version 1, September 2014.

[2]. HDF Group, “Who Uses HDF5?” [online], March 2013, Available from: http://www.hdfgroup.org/users.html

[3]. The Mathworks Inc., “HDF5 in MATLAB” [online], 2014, Available from: http://www.mathworks.co.uk/help/matlab/hdf5-files.html

[4]. National Instruments Corporation, “HDF5 DataPlugin”, 7th May 2014, Available from: http://www.ni.com/example/31546/en/

[5]. The HDF Group, “HDF5 File Organization and Data Model” [online], May 2014 (release 1.8.13), Available from: http://www.hdfgroup.org/HDF5/doc/H5.intro.html#Intro-FileOrg