2023-07-05 09:58:06 +02:00
{
"cells": [
{
"cell_type": "markdown",
"id": "c6ec59ca-b58c-443c-9a98-25b824705bb5",
"metadata": {},
"source": [
"*This notebook shows how to create a SeisBench dataset from an XLSX event catalog and a folder of MiniSEED (mseed) files.*\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a00a8204-932b-4488-85a0-6eea6f306523",
"metadata": {},
"outputs": [],
"source": [
"import seisbench\n",
"import seisbench.data as sbd\n",
"import seisbench.util as sbu\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from pathlib import Path\n",
"import obspy\n",
"import os\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "5e44f9bb-4ae8-412c-a14d-3cc0885504c6",
"metadata": {},
"source": [
"# Creating a dataset\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "70c64dc6-e4dd-4c01-939d-a28914866f5d",
"metadata": {},
"source": [
"##### The catalog has a custom format with the following columns:\n",
"###### 'Datetime', 'X', 'Y', 'Depth', 'Mw', 'Phases', 'mseed_name'\n",
"###### `Phases` is a string of detected phases separated by commas, each in the form `<Phase> <Station> <Datetime>`, e.g. \"Pg BRDW 2020-01-01 10:09:44.400, Sg BRDW 2020-01-01 10:09:45.696\""
]
},
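{
"cell_type": "markdown",
"id": "e2b7c1a4-0f3d-4b6a-8c59-1d2e3f4a5b60",
"metadata": {},
"source": [
"A minimal sketch of how one `Phases` string can be parsed, using only the example string quoted above. The names `example_phases`, `example_rows` and `example_picks` are illustrative and are not used elsewhere in this notebook. The sketch assumes that every entry has exactly the form `<Phase> <Station> <Datetime>` and that the times are UTC."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2b7c1a4-0f3d-4b6a-8c59-1d2e3f4a5b61",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Example string copied from the description above (not read from the catalog)\n",
"example_phases = \"Pg BRDW 2020-01-01 10:09:44.400, Sg BRDW 2020-01-01 10:09:45.696\"\n",
"\n",
"# Each entry is '<Phase> <Station> <Datetime>'; split on the first two spaces only,\n",
"# so that the date and the time stay together in one field\n",
"example_rows = [entry.split(\" \", 2) for entry in example_phases.split(\", \")]\n",
"example_picks = pd.DataFrame(example_rows, columns=[\"phase\", \"station\", \"time\"])\n",
"\n",
"# Assumption: catalog times are UTC\n",
"example_picks[\"time\"] = pd.to_datetime(example_picks[\"time\"], utc=True)\n",
"example_picks"
]
},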
{
"cell_type": "code",
"execution_count": 2,
"id": "143d04f7-e00a-4724-895e-f3dad72896e0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Datetime</th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Depth</th>\n",
" <th>Mw</th>\n",
" <th>Phases</th>\n",
" <th>mseed_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.582503e+06</td>\n",
" <td>5.702646e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>Pg BRDW 2020-01-01 10:09:44.400, Sg BRDW 2020-...</td>\n",
" <td>20200101100941.mseed</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Datetime X Y Depth Mw \\\n",
"0 2020-01-01 10:09:42.200 5.582503e+06 5.702646e+06 0.7 2.469231 \n",
"\n",
" Phases mseed_name \n",
"0 Pg BRDW 2020-01-01 10:09:44.400, Sg BRDW 2020-... 20200101100941.mseed "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_path = str(Path.cwd().parent) + \"/datasets/igf/\"\n",
"catalog = pd.read_excel(input_path + \"Catalog_20_21.xlsx\", index_col=0)\n",
"catalog.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "03257d45-299d-4ed1-bc64-03303d2a9873",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Pg BRDW 2020-01-01 10:09:44.400, Sg BRDW 2020-01-01 10:09:45.696, Pg GROD 2020-01-01 10:09:45.206, Sg GROD 2020-01-01 10:09:46.655, Pg GUZI 2020-01-01 10:09:45.116, Sg GUZI 2020-01-01 10:09:46.561, Pg JEDR 2020-01-01 10:09:44.920, Sg JEDR 2020-01-01 10:09:46.285, Pg MOSK2 2020-01-01 10:09:45.417, Sg MOSK2 2020-01-01 10:09:46.921, Pg NWLU 2020-01-01 10:09:45.686, Sg NWLU 2020-01-01 10:09:47.175, Pg PCHB 2020-01-01 10:09:45.213, Sg PCHB 2020-01-01 10:09:46.565, Pg PPOL 2020-01-01 10:09:44.755, Sg PPOL 2020-01-01 10:09:46.069, Pg RUDN 2020-01-01 10:09:44.502, Sg RUDN 2020-01-01 10:09:45.756, Pg RYNR 2020-01-01 10:09:43.442, Sg RYNR 2020-01-01 10:09:44.394, Pg RZEC 2020-01-01 10:09:46.075, Sg RZEC 2020-01-01 10:09:47.587, Pg SGOR 2020-01-01 10:09:45.817, Sg SGOR 2020-01-01 10:09:47.284, Pg TRBC2 2020-01-01 10:09:44.833, Sg TRBC2 2020-01-01 10:09:46.095, Pg TRN2 2020-01-01 10:09:44.488, Sg TRN2 2020-01-01 10:09:45.698, Pg TRZS 2020-01-01 10:09:46.232, Sg TRZS 2020-01-01 10:09:47.727, Pg ZMST 2020-01-01 10:09:43.592, Sg ZMST 2020-01-01 10:09:44.553, Pg LUBW 2020-01-01 10:09:43.119, Sg LUBW 2020-01-01 10:09:43.929'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog.Phases[0]"
]
},
{
"cell_type": "markdown",
"id": "fe0627b1-6fa0-4b5a-8a60-d80626b5c9be",
"metadata": {},
"source": [
"#### SeisBench dataset format \n",
"\n",
"A dataset consists of 2 components: \n",
"* a metadata file, called `metadata.csv`, with properties of the associated waveforms\n",
"* a waveforms file, called `waveforms.hdf5`, containing the raw waveforms\n",
"\n",
"\n",
"A dataset is created with the `WaveformDataWriter` provided by SeisBench.\n",
"\n",
"### Define train/dev/test split\n",
"\n",
"Strategy:  \n",
"Assign picks chronologically: the first 70% to train, the next 15% to dev, and the last 15% to test.  \n",
"(Note: the split is computed from pick counts in the `Phases` column of `Catalog_20_21.xlsx`, so the proportions in the final dataset differ slightly because not all traces are available.)"
]
},
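{
"cell_type": "markdown",
"id": "b3e1a7c0-5d2f-4c8e-9f61-0a1b2c3d4e01",
"metadata": {},
"source": [
"The chronological split described above can be illustrated on a toy table before it is applied to the real catalog. This is only a sketch: `toy`, its pick counts and its dates are made-up values, and the vectorised assignment is one possible way to express the same 70/15/15 thresholds on the cumulative pick count."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3e1a7c0-5d2f-4c8e-9f61-0a1b2c3d4e02",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up pick counts for six events, one event per day (illustrative values only)\n",
"toy = pd.DataFrame(\n",
"    {\"pick_count\": [4, 2, 6, 3, 3, 2]},\n",
"    index=pd.date_range(\"2020-01-01\", periods=6, freq=\"D\"),\n",
")\n",
"\n",
"# Sort chronologically and accumulate the number of picks\n",
"toy = toy.sort_index()\n",
"toy_cumsum = toy[\"pick_count\"].cumsum()\n",
"toy_total = toy_cumsum.iloc[-1]\n",
"\n",
"# Events below 70% of all picks go to train, the next 15% to dev, the rest to test\n",
"toy[\"split\"] = \"test\"\n",
"toy.loc[toy_cumsum < 0.7 * toy_total, \"split\"] = \"train\"\n",
"toy.loc[(toy_cumsum >= 0.7 * toy_total) & (toy_cumsum < 0.85 * toy_total), \"split\"] = \"dev\"\n",
"toy"
]
},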
{
"cell_type": "code",
"execution_count": 3,
"id": "07ab344c-f03f-49aa-8fa2-537fbb154716",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='Datetime'>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 700x300 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"catalog['pick_count'] = catalog.Phases.apply(lambda x: x.count(\"Pg\"))\n",
"catalog.index = catalog.Datetime\n",
"catalog = catalog.sort_index()\n",
"catalog['pick_count_cumsum'] = catalog.pick_count.cumsum()\n",
"catalog.pick_count_cumsum.plot(figsize=(7,3))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4fabb4c3-4056-4bad-94e6-f907977762a6",
"metadata": {},
"outputs": [],
"source": [
"# Use .iloc[-1] for the last value: the index is a DatetimeIndex, so [-1] is not a valid label\n",
"train_th = 0.7 * catalog.pick_count_cumsum.iloc[-1]\n",
"dev_th = 0.85 * catalog.pick_count_cumsum.iloc[-1]\n",
"\n",
"catalog['split'] = 'test'\n",
"for i, event in catalog.iterrows():\n",
"    if event['pick_count_cumsum'] < train_th:\n",
"        catalog.loc[i, 'split'] = 'train'\n",
"    elif event['pick_count_cumsum'] < dev_th:\n",
"        catalog.loc[i, 'split'] = 'dev'\n",
"    else:\n",
"        break"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "35254721-fe1e-447c-9195-84695868f1d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6996718237224566"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog[catalog.split == 'train'].pick_count.sum() / catalog.pick_count.sum()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "95d297ed-0da7-4985-954e-645a8a89b6a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.149929676511955"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog[catalog.split == 'dev'].pick_count.sum() / catalog.pick_count.sum()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "28451050-6b1c-4fe6-a905-799383515d5b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.15039849976558836"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog[catalog.split == 'test'].pick_count.sum() / catalog.pick_count.sum()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "be679508-478a-4195-b5e7-a9fea9bbf724",
"metadata": {},
"outputs": [],
"source": [
"def get_event_params(event):\n",
"    # NOTE: X and Y in this catalog look like projected coordinates (values around 5.6e6), not degrees.\n",
"    # They are stored unchanged under the *_deg keys; convert them first if true latitude/longitude is needed.\n",
"    event_params = {\n",
"        'source_origin_time': event.Datetime,\n",
"        'source_latitude_deg': event.Y,\n",
"        'source_longitude_deg': event.X,\n",
"        'source_depth_km': event.Depth,\n",
"        'source_magnitude': event.Mw,\n",
"        'split': event.split\n",
"    }\n",
"    return event_params\n",
"\n",
"\n",
"def get_event_picks(event):\n",
"    # Each entry in Phases is '<Phase> <Station> <Date> <Time>'\n",
"    picks = [ann.split(' ') for ann in event.Phases.split(', ')]\n",
"    picks = pd.DataFrame(picks, columns=['pick', 'station', 'date', 'hour'])\n",
"    # Index the picks by their UTC arrival time\n",
"    picks.index = pd.DatetimeIndex(picks.date + ' ' + picks.hour, tz=\"UTC\")\n",
"    return picks\n",
"\n",
"\n",
"def get_mseed(fname):\n",
"    return obspy.read(fname)\n",
"\n",
"\n",
"def get_trace_params(trace):\n",
"    trace_params = {\n",
"        \"station_network_code\": trace.stats.network,\n",
"        \"station_code\": trace.stats.station,\n",
"        \"trace_channel\": trace.stats.channel\n",
"    }\n",
"    return trace_params\n",
"\n",
"\n",
"def get_waves_timestamps(station, phases_string):\n",
"    # Placeholder: not implemented and not used in the conversion below\n",
"    p_ts = None\n",
"    s_ts = None\n",
"    return p_ts, s_ts"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "77373c87-019d-4c7d-90b3-54e36874750a",
"metadata": {},
"outputs": [],
"source": [
"output_path = input_path + \"seisbench_format/\"\n",
"metadata_path = output_path + \"metadata.csv\"\n",
"waveforms_path = output_path + \"waveforms.hdf5\"\n",
"train = 0.7\n",
"dev = 0.15\n",
"test = 0.15"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "6f098f39-85aa-43e0-90e8-b66c90a11d31",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Traces converted: 35784it [01:01, 578.58it/s]\n"
]
}
],
"source": [
"with sbd.WaveformDataWriter(metadata_path, waveforms_path) as writer:\n",
"\n",
"    # Define data format\n",
"    writer.data_format = {\n",
"        \"dimension_order\": \"CW\",\n",
"        \"component_order\": \"ZNE\",\n",
"    }\n",
"\n",
"    for event in catalog.itertuples():\n",
"        # if \"2020-03-03 05:04:43\" not in str(event.Datetime):\n",
"        #     continue\n",
"        event_params = get_event_params(event)\n",
"        event_picks = get_event_picks(event)\n",
"        if pd.isna(event.mseed_name):\n",
"            continue\n",
"        if os.path.exists(input_path + \"mseeds/mseeds_2020/\" + event.mseed_name):\n",
"            mseed_path = input_path + \"mseeds/mseeds_2020/\" + event.mseed_name\n",
"        elif os.path.exists(input_path + \"mseeds/mseeds_2021/\" + event.mseed_name):\n",
"            mseed_path = input_path + \"mseeds/mseeds_2021/\" + event.mseed_name\n",
"        else:\n",
"            continue\n",
"\n",
"        stream = get_mseed(mseed_path)\n",
"\n",
"        for pick_time, pick in event_picks.iterrows():\n",
"            waveforms = stream.select(station=pick.station)\n",
"            if len(waveforms) == 0:\n",
"                # No waveform data available\n",
"                continue\n",
"\n",
"            trace_params = get_trace_params(waveforms[0])\n",
"\n",
"            sampling_rate = waveforms[0].stats.sampling_rate\n",
"            # Check that the traces have the same sampling rate\n",
"            assert all(trace.stats.sampling_rate == sampling_rate for trace in waveforms)\n",
"\n",
"            actual_t_start, data, _ = sbu.stream_to_array(\n",
"                waveforms,\n",
"                component_order=writer.data_format[\"component_order\"],\n",
"            )\n",
"\n",
"            trace_params[\"trace_sampling_rate_hz\"] = sampling_rate\n",
"            trace_params[\"trace_start_time\"] = str(actual_t_start)\n",
"\n",
"            pick_time = obspy.core.utcdatetime.UTCDateTime(pick_time)\n",
"            pick_idx = (pick_time - actual_t_start) * sampling_rate\n",
"\n",
"            trace_params[f\"trace_{pick.pick}_arrival_sample\"] = int(pick_idx)\n",
"            # sample = (pick.time - actual_t_start) * sampling_rate\n",
"            # trace_params[f\"trace_{pick.phase_hint}_arrival_sample\"] = int(sample)\n",
"            # trace_params[f\"trace_{pick.phase_hint}_status\"] = pick.evaluation_mode\n",
"\n",
"            writer.add_trace({**event_params, **trace_params}, data)\n",
"\n",
"        # break"
]
},
{
"cell_type": "markdown",
"id": "a7a66d99-4dfa-4c3a-937b-6df437eb8833",
"metadata": {},
"source": [
"### Load converted dataset"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cdb07cfb-96c5-4444-81c8-1362fa3ceea8",
"metadata": {},
"outputs": [],
"source": [
"data = sbd.WaveformDataset(output_path, sampling_rate=100)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "33c77509-7aab-4833-a372-16030941395d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unnamed dataset - 35784 traces\n"
]
}
],
"source": [
"print(data)"
]
},
{
"cell_type": "markdown",
"id": "4d3440c7-318b-41f3-8035-48ce3cd9a764",
"metadata": {},
"source": [
"#### Plot sample"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "1753f65e-fe5d-4cfa-ab42-ae161ac4a253",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x14d6c12d0>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 1500x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure(figsize=(15, 5))\n",
"ax = fig.add_subplot(111)\n",
"ax.plot(data.get_waveforms(0).T)\n",
"ax.axvline(data.metadata[\"trace_Pg_arrival_sample\"].iloc[0], color=\"green\", lw=3)\n",
"# ax.axvline(data.metadata[\"trace_Sg_arrival_sample\"].iloc[0], color=\"black\", lw=3)"
]
},
{
"cell_type": "markdown",
"id": "1110dd5f-a6ff-4cb0-bd94-4904116e3233",
"metadata": {},
"source": [
"#### Check train/dev/test proportions"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bf7dae75-c90b-44f8-a51d-44e8abaaa3c3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training examples: 24738 69.1%\n",
"Development examples: 5508 15.4%\n",
"Test examples: 5538 15.5 %\n"
]
}
],
"source": [
"all_samples = len(data.train()) + len(data.dev()) + len(data.test())\n",
"print(f\"Training examples: {len(data.train())} {len(data.train())/all_samples * 100:.1f}%\" )\n",
"print(f\"Development examples: {len(data.dev())} {len(data.dev())/all_samples * 100:.1f}%\")\n",
"print(f\"Test examples: {len(data.test())} {len(data.test())/all_samples * 100:.1f} %\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "de82db24-d983-4592-a0eb-f96beecb2f69",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>index</th>\n",
" <th>source_origin_time</th>\n",
" <th>source_latitude_deg</th>\n",
" <th>source_longitude_deg</th>\n",
" <th>source_depth_km</th>\n",
" <th>source_magnitude</th>\n",
" <th>split</th>\n",
" <th>station_network_code</th>\n",
" <th>station_code</th>\n",
" <th>trace_channel</th>\n",
" <th>trace_sampling_rate_hz</th>\n",
" <th>trace_start_time</th>\n",
" <th>trace_Pg_arrival_sample</th>\n",
" <th>trace_name</th>\n",
" <th>trace_Sg_arrival_sample</th>\n",
" <th>trace_chunk</th>\n",
" <th>trace_component_order</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.702646e+06</td>\n",
" <td>5.582503e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>train</td>\n",
" <td>PL</td>\n",
" <td>BRDW</td>\n",
" <td>EHE</td>\n",
" <td>100.0</td>\n",
" <td>2020-01-01T10:09:36.480000Z</td>\n",
" <td>792.0</td>\n",
" <td>bucket0$0,:3,:2001</td>\n",
" <td>NaN</td>\n",
" <td></td>\n",
" <td>ZNE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.702646e+06</td>\n",
" <td>5.582503e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>train</td>\n",
" <td>PL</td>\n",
" <td>BRDW</td>\n",
" <td>EHE</td>\n",
" <td>100.0</td>\n",
" <td>2020-01-01T10:09:36.480000Z</td>\n",
" <td>NaN</td>\n",
" <td>bucket0$1,:3,:2001</td>\n",
" <td>921.0</td>\n",
" <td></td>\n",
" <td>ZNE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.702646e+06</td>\n",
" <td>5.582503e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>train</td>\n",
" <td>PL</td>\n",
" <td>GROD</td>\n",
" <td>EHE</td>\n",
" <td>100.0</td>\n",
" <td>2020-01-01T10:09:36.480000Z</td>\n",
" <td>872.0</td>\n",
" <td>bucket0$2,:3,:2001</td>\n",
" <td>NaN</td>\n",
" <td></td>\n",
" <td>ZNE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.702646e+06</td>\n",
" <td>5.582503e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>train</td>\n",
" <td>PL</td>\n",
" <td>GROD</td>\n",
" <td>EHE</td>\n",
" <td>100.0</td>\n",
" <td>2020-01-01T10:09:36.480000Z</td>\n",
" <td>NaN</td>\n",
" <td>bucket0$3,:3,:2001</td>\n",
" <td>1017.0</td>\n",
" <td></td>\n",
" <td>ZNE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>2020-01-01 10:09:42.200</td>\n",
" <td>5.702646e+06</td>\n",
" <td>5.582503e+06</td>\n",
" <td>0.7</td>\n",
" <td>2.469231</td>\n",
" <td>train</td>\n",
" <td>PL</td>\n",
" <td>GUZI</td>\n",
" <td>CNE</td>\n",
" <td>100.0</td>\n",
" <td>2020-01-01T10:09:36.476000Z</td>\n",
" <td>864.0</td>\n",
" <td>bucket0$4,:3,:2001</td>\n",
" <td>NaN</td>\n",
" <td></td>\n",
" <td>ZNE</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" index source_origin_time source_latitude_deg source_longitude_deg \\\n",
"0 0 2020-01-01 10:09:42.200 5.702646e+06 5.582503e+06 \n",
"1 1 2020-01-01 10:09:42.200 5.702646e+06 5.582503e+06 \n",
"2 2 2020-01-01 10:09:42.200 5.702646e+06 5.582503e+06 \n",
"3 3 2020-01-01 10:09:42.200 5.702646e+06 5.582503e+06 \n",
"4 4 2020-01-01 10:09:42.200 5.702646e+06 5.582503e+06 \n",
"\n",
" source_depth_km source_magnitude split station_network_code station_code \\\n",
"0 0.7 2.469231 train PL BRDW \n",
"1 0.7 2.469231 train PL BRDW \n",
"2 0.7 2.469231 train PL GROD \n",
"3 0.7 2.469231 train PL GROD \n",
"4 0.7 2.469231 train PL GUZI \n",
"\n",
" trace_channel trace_sampling_rate_hz trace_start_time \\\n",
"0 EHE 100.0 2020-01-01T10:09:36.480000Z \n",
"1 EHE 100.0 2020-01-01T10:09:36.480000Z \n",
"2 EHE 100.0 2020-01-01T10:09:36.480000Z \n",
"3 EHE 100.0 2020-01-01T10:09:36.480000Z \n",
"4 CNE 100.0 2020-01-01T10:09:36.476000Z \n",
"\n",
" trace_Pg_arrival_sample trace_name trace_Sg_arrival_sample \\\n",
"0 792.0 bucket0$0,:3,:2001 NaN \n",
"1 NaN bucket0$1,:3,:2001 921.0 \n",
"2 872.0 bucket0$2,:3,:2001 NaN \n",
"3 NaN bucket0$3,:3,:2001 1017.0 \n",
"4 864.0 bucket0$4,:3,:2001 NaN \n",
"\n",
" trace_chunk trace_component_order \n",
"0 ZNE \n",
"1 ZNE \n",
"2 ZNE \n",
"3 ZNE \n",
"4 ZNE "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.metadata.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37fe0dd1-ba9b-46ff-9abd-eb40f73649e3",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ccd6908-cff7-42b2-a6a3-51ac0557a7dc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}