Data

The geoh5 format allows storing data (values) on different parts of an Object. The data types currently supported by geoh5py are:

  • Float

  • Integer

  • Text

  • Colormap

  • Well log

[1]:
from geoh5py.workspace import Workspace
import numpy as np

# Re-use the previous workspace
workspace = Workspace("my_project.geoh5")

# Get the curve from the previous section
curve = workspace.get_entity("Curve")[0]

Float

Numerical float data can be attached to the various elements making up an object. Data can be added to an Object entity using the add_data method.

[2]:
curve.add_data({
    "my_cell_values": {
        "association":"CELL",
        "values": np.random.randn(curve.n_cells)
    }
})
[2]:
<geoh5py.data.float_data.FloatData at 0x7f3e644be3d0>

The association can be one of:

  • OBJECT: Single element characterizing the parent object

  • VERTEX: Array of values associated with the parent object vertices

  • CELL: Array of values associated with the parent object cells

The length and order of the array of values must be consistent with the corresponding element of association. If the association argument is omitted, geoh5py will attempt to assign the data to the correct part based on the shape of the data values, either object.n_vertices or object.n_cells.
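
As an illustration of this shape-based fallback, the sketch below guesses an association from the number of values. The helper infer_association and its parameters are hypothetical and not part of geoh5py; in particular, giving VERTEX precedence when both counts match is an arbitrary choice made here.

```python
def infer_association(n_values, n_vertices, n_cells):
    """Guess an association from the number of values (illustrative only)."""
    if n_values == n_vertices:
        return "VERTEX"
    if n_values == n_cells:
        return "CELL"
    if n_values == 1:
        return "OBJECT"
    raise ValueError(
        f"{n_values} values match neither {n_vertices} vertices nor {n_cells} cells"
    )


# A curve with 12 vertices joined by 11 cells (assumed sizes)
print(infer_association(11, n_vertices=12, n_cells=11))
```

Passing the association explicitly, as in the cells above, avoids relying on any such guess.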

[3]:
# Add multiple data vectors in a single call
data = {}
for ii in range(8):
    data[f"Period:{ii}"] = {
        "association":"VERTEX",
        "values": (ii+1) * np.cos(ii*curve.vertices[:, 0]*np.pi/curve.vertices[:, 0].max()/4.)
    }

data_list = curve.add_data(data)
print([obj.name for obj in data_list])
['Period:0', 'Period:1', 'Period:2', 'Period:3', 'Period:4', 'Period:5', 'Period:6', 'Period:7']

The newly created data is written directly to the project's geoh5 file and is available for visualization.

Integer

Integer data are handled the same way as Float data, but with values provided as integers (int32).
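
Because integer values are stored as int32, it can help to cast them explicitly before calling add_data. The sketch below uses a hypothetical helper, to_int32, which is not part of geoh5py; it refuses values that would not fit rather than letting them wrap around.

```python
import numpy as np


def to_int32(values):
    """Cast integer values to int32, refusing values that would overflow."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.integer):
        raise TypeError("expected integer values")
    info = np.iinfo(np.int32)
    if values.size and (values.min() < info.min or values.max() > info.max):
        raise OverflowError("values do not fit in int32")
    return values.astype(np.int32)


codes = to_int32([1, 2, 3])
print(codes.dtype)
```

The resulting array could then be passed as the "values" entry of the add_data dictionary.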

Text

Text (string) data can only be associated with the object itself.

[4]:
curve.add_data({
    "my_comment": {
        "association":"OBJECT",
        "values": "hello_world"
    }
})
[4]:
<geoh5py.data.text_data.TextData at 0x7f3e6449b250>

Colormap

The colormap data type can be used to store or customize the color palette used by Geoscience ANALYST.

[5]:
from geoh5py.data.color_map import ColorMap

# Create some data on a grid2D entity.
grid = workspace.get_entity("Grid2D")[0]

# Add data
radius = grid.add_data({
    "radial": {"values": np.linalg.norm(grid.centroids, axis=1)}
})

[6]:
# Create a simple colormap that spans the data range
nc = 10
rgba = np.vstack([
    np.linspace(radius.values.min(), radius.values.max(), nc), # Values
    np.linspace(0, 255, nc), # Red
    np.linspace(255, 0, nc), # Green
    np.linspace(125, 15, nc), # Blue
    np.ones(nc) * 255, # Alpha
]).T

We now have an array that contains red, green, blue and alpha (RGBA) values in the 0 to 255 range over the span of the data values. Assigning this array to the data's EntityType implicitly creates a ColorMap.

[7]:
# Assign the colormap to the data type
radius.entity_type.color_map = rgba

The resulting ColorMap stores the values to geoh5 as a numpy.recarray with fields for Value, Red, Green, Blue and Alpha.

[8]:
radius.entity_type.color_map._values
[8]:
rec.array([( 50.03124024,   0, 255, 125, 255),
           ( 55.62664299,  28, 226, 112, 255),
           ( 61.22204575,  56, 198, 100, 255),
           ( 66.8174485 ,  85, 170,  88, 255),
           ( 72.41285126, 113, 141,  76, 255),
           ( 78.00825401, 141, 113,  63, 255),
           ( 83.60365676, 170,  85,  51, 255),
           ( 89.19905952, 198,  56,  39, 255),
           ( 94.79446227, 226,  28,  27, 255),
           (100.38986503, 255,   0,  15, 255)],
          dtype=[('Value', '<f8'), ('Red', 'u1'), ('Green', 'u1'), ('Blue', 'u1'), ('Alpha', 'u1')])
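
The record layout shown above can be reproduced with plain NumPy. The following is a minimal sketch, not the geoh5py implementation: the helper to_color_records and the FIELDS constant are hypothetical names. Fractional colour values are truncated toward zero when cast to unsigned 8-bit integers, which is consistent with the values printed above.

```python
import numpy as np

# Field names and types copied from the record array printed above
FIELDS = [("Value", "<f8"), ("Red", "u1"), ("Green", "u1"), ("Blue", "u1"), ("Alpha", "u1")]


def to_color_records(rgba):
    """Convert an (n, 5) array of [value, R, G, B, A] rows to a record array."""
    rgba = np.asarray(rgba, dtype=float)
    if rgba.ndim != 2 or rgba.shape[1] != len(FIELDS):
        raise ValueError("expected an array of shape (n, 5)")
    records = np.zeros(rgba.shape[0], dtype=FIELDS).view(np.recarray)
    for column, (name, _) in enumerate(FIELDS):
        # Assignment casts each float column to the field's type
        records[name] = rgba[:, column]
    return records


rgba_demo = np.column_stack([
    np.linspace(0.0, 1.0, 3),  # Values
    [0, 128, 255],             # Red
    [255, 128, 0],             # Green
    [125, 70, 15],             # Blue
    [255, 255, 255],           # Alpha
])
records = to_color_records(rgba_demo)
print(records.Red)
```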

Files

Raw files can be added to groups and objects and stored as blob (bytes) data in geoh5.

[9]:
file_data = grid.add_file("./data.ipynb")

The file can easily be re-exported to disk with the save_file method.

[10]:
file_data.save_file(path="./temp", name="new_name.ipynb")

Well Data

In the case of Drillhole objects, data are always stored as from-to interval values.

Depth Data

Depth data are used to represent measurements recorded at discrete depths along the well path. A depth attribute is required on creation. Depth markers are converted internally to from-to intervals by adding a small depth value defined by the collocation_distance. If the Drillhole object already holds depth data at the same location, geoh5py will group the datasets under the same PropertyGroup.

[12]:
well = workspace.get_entity("Drillhole")[0]
depths_A = np.arange(0, 50.) # First list of depths

# Second list, offset from the first by 0.01 m
depths_B = np.arange(0.01, 50.01)

# Add both sets of log data
well.add_data({
    "my_log_values": {
            "depth": depths_A,
            "values": np.random.randn(depths_A.shape[0]),
    },
    "log_wt_tolerance": {
            "depth": depths_B,
            "values": np.random.randn(depths_B.shape[0]),
    }
})
[12]:
[<abc.FloatDataConcatenated at 0x7f3e39564850>,
 <abc.FloatDataConcatenated at 0x7f3e644ee8d0>]
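
The depth-to-interval conversion described above can be pictured with a few lines of NumPy. This is only a sketch of the idea, assuming each depth d becomes the interval [d, d + collocation_distance]; the helper depths_to_intervals is hypothetical and the actual geoh5py internals may differ.

```python
import numpy as np


def depths_to_intervals(depths, collocation_distance=1e-2):
    """Turn discrete depths into narrow from-to intervals (illustrative only)."""
    depths = np.asarray(depths, dtype=float)
    # Column 0 holds FROM, column 1 holds TO
    return np.column_stack([depths, depths + collocation_distance])


intervals = depths_to_intervals([0.0, 1.0, 2.0])
print(intervals.shape)
```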

Interval (From-To) Data

Interval data are defined by constant values bounded by a start (FROM) and an end (TO) depth. A from-to attribute defined as a numpy.ndarray of shape (nD, 2) is expected on creation. Subsequent data are appended to the same interval PropertyGroup if the from-to values match within the collocation distance. Users can control the tolerance for matching intervals by supplying a collocation_distance argument in meters, or by setting the default on the drillhole entity (default_collocation_distance = 1e-2 meters).

[13]:
# Define a from-to array
from_to = np.vstack([
    [0.25, 25.5],
    [30.1, 55.5],
    [56.5, 80.2]
])

# Add some reference data
well.add_data({
    "interval_values": {
        "values": np.asarray([1, 2, 3]),
        "from-to": from_to,
        "value_map": {
            1: "Unit_A",
            2: "Unit_B",
            3: "Unit_C"
        },
        "type": "referenced",
    }
})

# Add float data on the same intervals
well.add_data({
    "random_values": {
        "values": np.random.randn(from_to.shape[0]),
        "from-to": from_to,
    }
})
/home/docs/checkouts/readthedocs.org/user_builds/geoh5py/conda/v0.6.1/lib/python3.7/site-packages/geoh5py/data/integer_data.py:66: UserWarning: Values provided in int64 are converted to int32 for PrimitiveTypeEnum.REFERENCED data 'interval_values.'
  f"Values provided in {values.dtype} are converted to int32 for "
[13]:
<abc.FloatDataConcatenated at 0x7f3e37506790>
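
To make the matching rule concrete, the sketch below compares two from-to arrays element by element against a tolerance. The helper intervals_match is hypothetical, and the assumption that both arrays must have the same shape to be considered a match is ours, not something stated by geoh5py.

```python
import numpy as np


def intervals_match(from_to_a, from_to_b, collocation_distance=1e-2):
    """Return True if two from-to arrays agree within the tolerance."""
    a = np.asarray(from_to_a, dtype=float)
    b = np.asarray(from_to_b, dtype=float)
    if a.shape != b.shape:
        return False
    return bool(np.all(np.abs(a - b) <= collocation_distance))


reference = np.array([[0.25, 25.5], [30.1, 55.5], [56.5, 80.2]])
print(intervals_match(reference, reference + 0.005))  # shifted by 5 mm
print(intervals_match(reference, reference + 0.5))    # shifted by 50 cm
```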

Get data

Just like any Entity, data can be retrieved from the Workspace using the get_entity method. For convenience, Objects also have get_data_list and get_data methods that focus only on their respective child Data.

[14]:
my_list = curve.get_data_list()
print(my_list, curve.get_data(my_list[0]))
['Period:0', 'Period:1', 'Period:2', 'Period:3', 'Period:4', 'Period:5', 'Period:6', 'Period:7', 'my_cell_values', 'my_comment'] [<geoh5py.data.float_data.FloatData object at 0x7f3e3955d450>]

Property Groups

Data entities sharing the same parent Object and association can be linked within a PropertyGroup and made available through profiling. This can be used to group data that would normally be stored as a 2D array.

[15]:
# Group the previously created VERTEX data into a property group
curve.add_data_to_group([obj.name for obj in data_list], "my_trig_group")
[15]:
<geoh5py.groups.property_group.PropertyGroup at 0x7f3e3955d610>
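
The 2D-array analogy can be sketched with NumPy alone: each data entity in the group plays the role of one column. The vertex coordinates below are made up for illustration, and the names x, columns and table are not part of geoh5py.

```python
import numpy as np

# Hypothetical x-coordinates standing in for curve.vertices[:, 0]
x = np.linspace(0.0, 100.0, 12)

# One 1D vector per data entity, as created in the "Period" example
columns = {
    f"Period:{ii}": (ii + 1) * np.cos(ii * x * np.pi / x.max() / 4.0)
    for ii in range(8)
}

# Stacking the vectors side by side gives the equivalent 2D array
table = np.column_stack(list(columns.values()))
print(table.shape)
```

Each row of the table corresponds to one vertex and each column to one named data entity in the group.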

[16]:
workspace.close()