Added scripts converting mseeds from Bogdanka to seisbench format, extended readme, modified logging
@@ -2,10 +2,9 @@

This repo contains notebooks and scripts demonstrating how to:
- Prepare IGF data for training a seisbench model detecting P phase (i.e. transform mseeds into [SeisBench data format](https://seisbench.readthedocs.io/en/stable/pages/data_format.html)), check the [notebook](utils/Transforming%20mseeds%20to%20SeisBench%20dataset.ipynb).

- Explore available data, check the [notebook](notebooks/Explore%20igf%20data.ipynb)
- Train various cnn models available in seisbench library and compare their performance of detecting P phase, check the [script](scripts/pipeline.py)
- Prepare data for training a seisbench model detecting P and S waves (i.e. transform mseeds into [SeisBench data format](https://seisbench.readthedocs.io/en/stable/pages/data_format.html)), check the [notebook](utils/Transforming%20mseeds%20from%20Bogdanka%20to%20Seisbench%20format.ipynb) and the [script](utils/mseeds_to_seisbench.py)
- [to update] Explore available data, check the [notebook](notebooks/Explore%20igf%20data.ipynb)
- Train various cnn models available in seisbench library and compare their performance of detecting P and S waves, check the [script](scripts/pipeline.py)

- [to update] Validate model performance, check the [notebook](notebooks/Check%20model%20performance%20depending%20on%20station-random%20window.ipynb)
- [to update] Use model for detecting P phase, check the [notebook](notebooks/Present%20model%20predictions.ipynb)

@@ -68,31 +67,68 @@ poetry shell
WANDB_HOST="https://epos-ai.grid.cyfronet.pl/"
WANDB_API_KEY="your key"
WANDB_USER="your user"
WANDB_PROJECT="training_seisbench_models_on_igf_data"
WANDB_PROJECT="training_seisbench_models"
BENCHMARK_DEFAULT_WORKER=2

2. Transform data into seisbench format. (unofficial)
* Download original data from the [drive](https://drive.google.com/drive/folders/1InVI9DLaD7gdzraM2jMzeIrtiBSu-UIK?usp=drive_link)
* Run the notebook: `utils/Transforming mseeds to SeisBench dataset.ipynb`
2. Transform data into seisbench format.

To use the SeisBench library, data must first be transformed into the [SeisBench data format](https://seisbench.readthedocs.io/en/stable/pages/data_format.html). If your data is in the MSEED format, you can use the prepared script `mseeds_to_seisbench.py` to perform the transformation. Please make sure that your data has the same structure as the data used in this project.
The script assumes that:
* the data is stored in the following directory structure:
`input_path/year/station_network_code/station_code/trace_channel.D`, e.g.
`input_path/2018/PL/ALBE/EHE.D/`
* the file names follow the pattern:
`station_network_code.station_code..trace_channel.D.year.day_of_year`,
e.g. `PL.ALBE..EHE.D.2018.282`
* the event catalog is stored in QuakeML format

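The layout assumptions above can be checked before running the conversion. The following is a minimal Python sketch, not part of the repository: the function names `expected_mseed_path` and `parse_mseed_filename` are hypothetical, and the zero-padding of `day_of_year` to three digits is an assumption that the README does not state.

```python
from pathlib import Path


def expected_mseed_path(input_path, network, station, channel, year, day_of_year):
    """Build the file path implied by the directory and file name patterns above."""
    directory = Path(input_path) / str(year) / network / station / f"{channel}.D"
    # Assumption: day_of_year is zero-padded to three digits (e.g. 007).
    filename = f"{network}.{station}..{channel}.D.{year}.{day_of_year:03d}"
    return directory / filename


def parse_mseed_filename(filename):
    """Split a file name such as PL.ALBE..EHE.D.2018.282 into its fields."""
    parts = filename.split(".")
    # The pattern has an empty field between the two consecutive dots
    # and a literal "D" before the year.
    if len(parts) != 7 or parts[2] != "" or parts[4] != "D":
        raise ValueError(f"Unexpected file name: {filename}")
    network, station, _, channel, _, year, day_of_year = parts
    return {
        "station_network_code": network,
        "station_code": station,
        "trace_channel": channel,
        "year": int(year),
        "day_of_year": int(day_of_year),
    }
```

For the example from the list, `expected_mseed_path("input_path", "PL", "ALBE", "EHE", 2018, 282)` points at `input_path/2018/PL/ALBE/EHE.D/PL.ALBE..EHE.D.2018.282`.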

Run the script `mseeds_to_seisbench.py` located in the `utils` directory:

3. Run the pipeline script:
```
cd utils
python mseeds_to_seisbench.py --input_path $input_path --catalog_path $catalog_path --output_path $output_path
```
To run the script on a cluster, use `convert_data.sh` as a template (adjust the grant name, computing name and paths) and submit the job to the queue with the `sbatch` command from a login node, e.g. on Ares:

```
cd utils
sbatch convert_data.sh
```


If your data has a different structure or format, use the following notebooks to understand the SeisBench format and what needs to be done to transform your data:
* [Seisbench example](https://colab.research.google.com/github/seisbench/seisbench/blob/main/examples/01a_dataset_basics.ipynb) or
* [Transforming mseeds from Bogdanka to Seisbench format](utils/Transforming%20mseeds%20from%20Bogdanka%20to%20Seisbench%20format.ipynb) notebook


`python pipeline.py`
3. Adjust the `config.json` and specify:
* `dataset_name` - the name of the dataset, which will be used to name the folder with evaluation targets and predictions
* `data_path` - the path to the data in the Seisbench format
* `experiment_count` - the number of experiments to run for each model type
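
Before starting a long run, it may help to verify that the three settings listed above are present in `config.json`. The sketch below is illustrative only: `validate_config` and `load_config` are hypothetical helpers, and the expected value types (strings for `dataset_name` and `data_path`, an integer for `experiment_count`) are assumptions based on the descriptions above.

```python
import json

# Assumed types for the three settings described above.
REQUIRED_KEYS = {
    "dataset_name": str,
    "data_path": str,
    "experiment_count": int,
}


def validate_config(config):
    """Return a list of problems found in the config dictionary (empty if none)."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return problems


def load_config(path="config.json"):
    """Read the JSON config file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A typical use would be `problems = validate_config(load_config())`, followed by printing the problems and stopping if the list is not empty.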


4. Run the pipeline script
`python pipeline.py`


The script performs the following steps:
* Generates evaluation targets
* Generates evaluation targets in the `datasets/<dataset_name>/targets` directory.
* Trains multiple versions of GPD, PhaseNet and ... models to find the hyperparameters that produce the lowest validation loss.

This step uses the Weights & Biases platform to perform the hyperparameter search (called sweeping), track the training process, and store the results.
The results are available at
`https://epos-ai.grid.cyfronet.pl/<your user name>/<your project name>`
* Uses the best performing model of each type to generate predictions
* Evaluates the performance of each model by comparing the predictions with the evaluation targets
* Saves the results in the `scripts/pred` directory
*
The default settings are saved in config.json file. To change the settings, edit the config.json file or pass the new settings as arguments to the script.
For example, to change the sweep configuration file for GPD model, run:
`python pipeline.py --gpd_config <new config file>`
The new config file should be placed in the `experiments` or as specified in the `configs_path` parameter in the config.json file.
The results are available at
`https://epos-ai.grid.cyfronet.pl/<WANDB_USER>/<WANDB_PROJECT>`
Weights and training logs can be downloaded from the platform.
Additionally, the most important data are saved locally in the `weights/<dataset_name>_<model_name>/` directory:
* Weights of the best checkpoint of each model are saved as `<dataset_name>_<model_name>_sweep=<sweep_id>-run=<run_id>-epoch=<epoch_number>-val_loss=<val_loss>.ckpt`
* Metrics and hyperparameters are saved in `<run_id>` folders

* Uses the best performing model of each type to generate predictions. The predictions are saved in the `scripts/pred/<dataset_name>_<model_name>/<run_id>` directory.
* Evaluates the performance of each model by comparing the predictions with the evaluation targets.
The results are saved in the `scripts/pred/results.csv` file.
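
The checkpoint naming scheme described above encodes the sweep, run, epoch and validation loss, so the best local checkpoint can be picked from file names alone. The following Python sketch shows one way to do that. It is not taken from the repository: `parse_checkpoint_name` and `best_checkpoint` are hypothetical helpers, and the regular expression assumes that sweep and run ids contain no hyphens and that `val_loss` is written as a plain decimal number.

```python
import re

# Mirrors <dataset_name>_<model_name>_sweep=<sweep_id>-run=<run_id>
#         -epoch=<epoch_number>-val_loss=<val_loss>.ckpt
CKPT_PATTERN = re.compile(
    r"^(?P<prefix>.+)_sweep=(?P<sweep_id>[^-]+)-run=(?P<run_id>[^-]+)"
    r"-epoch=(?P<epoch>\d+)-val_loss=(?P<val_loss>\d+(?:\.\d+)?)\.ckpt$"
)


def parse_checkpoint_name(filename):
    """Return the fields encoded in a checkpoint file name, or None if it does not match."""
    match = CKPT_PATTERN.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    fields["epoch"] = int(fields["epoch"])
    fields["val_loss"] = float(fields["val_loss"])
    return fields


def best_checkpoint(filenames):
    """Return the matching file name with the lowest validation loss, or None."""
    parsed = [(parse_checkpoint_name(name), name) for name in filenames]
    candidates = [(fields["val_loss"], name) for fields, name in parsed if fields]
    return min(candidates)[1] if candidates else None
```

The `prefix` field holds `<dataset_name>_<model_name>` as a single string, because the two names cannot be separated reliably when either of them contains an underscore.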

The default settings are saved in the `config.json` file. To change the settings, edit `config.json` or pass the new settings as arguments to the script.
For example, to change the sweep configuration file for the GPD model, run:
`python pipeline.py --gpd_config <new config file>`
The new config file should be placed in the `experiments` folder or as specified in the `configs_path` parameter in the `config.json` file.
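
The pattern of file defaults overridden by command-line arguments can be sketched with `argparse`. This only illustrates the general idea and does not show how `pipeline.py` is actually implemented: `parse_settings` is a hypothetical helper, and the default values in the usage note are invented for the example.

```python
import argparse


def parse_settings(defaults, argv=None):
    """Expose every default as a --<name> option; values given in argv win."""
    parser = argparse.ArgumentParser(description="Override config defaults")
    for key, value in defaults.items():
        # Works for string and integer settings; booleans would need
        # dedicated handling (e.g. action="store_true").
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return vars(parser.parse_args(argv))
```

For example, with hypothetical defaults `{"gpd_config": "sweep_gpd.yaml", "experiment_count": 20}`, calling `parse_settings(defaults, ["--gpd_config", "new_sweep.yaml"])` replaces only `gpd_config` and leaves `experiment_count` at its default.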

### Troubleshooting
