The DCLDE 2015 dataset consists of data from multiple deployments of high-frequency acoustic recording packages (HARPs; Wiggins and Hildebrand, 2007) in the Southern California Bight. Separate sets of development data are provided for mysticetes and odontocetes. The mysticete data have been decimated to 1 and 1.6 kHz of bandwidth, while the odontocete data have 100 and 160 kHz of bandwidth. Data were selected to cover all four seasons and multiple locations. To learn how to access these datasets, see Dataset Retrieval.
This full-bandwidth dataset consists of annotated data from multiple odontocete species:
The goal for this dataset is to identify acoustic encounters of a species during times when animals were echolocating. Analysts examined data for echolocation and approximated the start and end times of acoustic encounters. Any period that was separated from another one by five minutes or more was marked as a separate encounter. Whistle activity was not considered. Consequently, while the use of whistle information during echolocation activity is appropriate, reporting a species based on whistles in the absence of echolocation activity will be considered a false positive for this classification task.
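The five-minute rule above can be illustrated with a minimal Python sketch. It assumes a detector that already produces per-species `(start, end)` intervals; the function name `group_into_encounters` and the sample times are hypothetical and are not part of the dataset tools.

```python
from datetime import datetime, timedelta

# Threshold taken from the text: periods separated by five minutes or more
# are treated as separate encounters.
MIN_GAP = timedelta(minutes=5)


def group_into_encounters(detections, min_gap=MIN_GAP):
    """Merge (start, end) intervals whose separation is less than min_gap."""
    encounters = []
    for start, end in sorted(detections):
        if encounters and start - encounters[-1][1] < min_gap:
            # Close enough to the previous encounter: extend it.
            prev_start, prev_end = encounters[-1]
            encounters[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap of min_gap or more (or first interval): new encounter.
            encounters.append((start, end))
    return encounters


# Hypothetical detections for one species at one site.
t = datetime(2011, 12, 26, 4, 0, 0)
detections = [
    (t, t + timedelta(minutes=2)),
    (t + timedelta(minutes=4), t + timedelta(minutes=6)),    # 2 min gap: merged
    (t + timedelta(minutes=11), t + timedelta(minutes=12)),  # 5 min gap: separate
]
encounters = group_into_encounters(detections)
```

Here the first two intervals merge into one encounter, and the third starts a new one because it begins exactly five minutes after the previous interval ends.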
The dataset consists of annotated data for specific calls from two mysticete species:
The goal for this dataset is to identify specific blue whale D and fin whale 40 Hz calls.
Acoustic data are provided as wav files, with the filename encoding the site, deployment, and starting timestamp of each file.
High frequency example: CINMS17B_DL37_111226_042730.x.wav
Low frequency files are similar but contain additional fields in the filename related to the decimation.
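As an illustration, the high-frequency filename above could be split into its parts with a regular expression. This is a sketch under assumptions inferred from that single example: the layout is taken to be project, deployment number and site letter, then an unexplained field (`DL37`), then a `YYMMDD` date and `HHMMSS` time. The function name `parse_hf_filename` is hypothetical, and low-frequency names with their extra decimation fields are not handled.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one example in the text:
#   <project><deployment><site>_<field>_<YYMMDD>_<HHMMSS>.x.wav
# The meaning of the second field ("DL37") is not stated, so it is kept as-is.
HF_NAME = re.compile(
    r"^(?P<project>[A-Z]+)(?P<deployment>\d+)(?P<site>[A-Z])_"
    r"(?P<field>[^_]+)_(?P<date>\d{6})_(?P<time>\d{6})\.x\.wav$"
)


def parse_hf_filename(name):
    """Return the components encoded in a high-frequency filename."""
    m = HF_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name!r}")
    # Two-digit year is assumed (recordings span 2009 to 2013).
    start = datetime.strptime(m["date"] + m["time"], "%y%m%d%H%M%S")
    return {
        "project": m["project"],
        "deployment": int(m["deployment"]),
        "site": m["site"],
        "field": m["field"],
        "start": start,
    }


info = parse_hf_filename("CINMS17B_DL37_111226_042730.x.wav")
```

Under these assumptions the example resolves to project CINMS, deployment 17, site B, with a file start time of 2011-12-26 04:27:30.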
Data are provided from seven different locations offshore Southern California, recorded between 2009 and 2013, as shown in the figure below. The accompanying table lists the coordinates and depth of each site. Time periods should be inferred directly from the data, as the low- and high-frequency datasets sample different times.
Project | Site | Deployment (Preamp) | Depth (m) | Sample Rate (kHz) | Latitude | Longitude |
---|---|---|---|---|---|---|
CINMS | B | 17 (646), 18 (618) | 600 | 200 | 34-17.0 N | 120-01.7 W |
CINMS | C | 18 (645), 19 (669) | 800 | 320 | 34-19.5 N | 120-48.4 W |
DCPP | A | 1 (688) | 65 | 320 | 35-36.7 N | 121-14.5 W |
DCPP | B | 1 (686) | 100 | 320 | 35-09.6 N | 120-53.1 W |
DCPP | C | 1 (682) | 1000 | 200 | 35-24.0 N | 121-33.8 W |
SOCAL | E | 32 (452), 33 (481) | 1300 | 200 | 32-39.4 N | 119-28.4 W |
SOCAL | R | 35 (567), 38 (591) | 1200 | 200 | 33-09.6 N | 120-00.6 W |
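For mapping or distance calculations, the coordinates in the table may need to be converted to signed decimal degrees. The sketch below assumes the entries are written as degrees and decimal minutes followed by a hemisphere letter (for example `34-17.0 N`); the helper name `dm_to_decimal` is hypothetical.

```python
def dm_to_decimal(text):
    """Convert 'DD-MM.m H' (degrees, decimal minutes, hemisphere)
    to signed decimal degrees (south and west are negative)."""
    value, hemisphere = text.split()
    degrees, minutes = value.split("-")
    decimal = int(degrees) + float(minutes) / 60.0
    hemisphere = hemisphere.upper()
    if hemisphere in ("S", "W"):
        return -decimal
    if hemisphere in ("N", "E"):
        return decimal
    raise ValueError(f"unknown hemisphere: {hemisphere!r}")


# CINMS site B from the table.
lat = dm_to_decimal("34-17.0 N")
lon = dm_to_decimal("120-01.7 W")
```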
Preamplifiers for the HARPs have been calibrated. Two Matlab routines that show how to apply the appropriate transfer function will be provided along with the data. All necessary files (including the Matlab functions) are available for download.
We use comma-separated value files as input to routines that compute precision and recall, as well as coverage and fragmentation for encounters (see Roch et al., 2011 for details). The following species abbreviations should be used:
Abbreviation | Species |
---|---|
Bb | Berardius bairdii - Baird’s beaked whale |
Zc | Ziphius cavirostris - Cuvier’s beaked whale |
Pm | Physeter macrocephalus - sperm whale |
Lo | Lagenorhynchus obliquidens - Pacific white-sided dolphin |
Gg | Grampus griseus - Risso's dolphin |
UPP | Phocoenidae - unspecified porpoise |
UO | unidentified odontocete |
Bm | Balaenoptera musculus - blue whale |
Bp | Balaenoptera physalus - fin whale |
For encounter level tests, the result file should contain comma separated value (CSV) entries with each line as follows:
project, site, species-abbreviation, start-time, end-time
Timestamps use the format YYYY-MM-DDTHH:MM:SS, optionally followed by a decimal point and fractional seconds.
Example for a Risso’s dolphin detection at CINMS site B: CINMS, B, Gg, 2011-12-27T15:51:47.0, 2011-12-27T16:59:07.0
Call level results for blue and fin whales are similar, with the addition of a final call name which is either “D” or “40Hz”:
DCPP, C, Bp, 2013-02-04T15:13:15.8, 2013-02-04T15:13:16.3, 40Hz
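A minimal Python sketch of how a detector might write lines in this format is shown below. The function names `format_time` and `result_line` are hypothetical, and the choice to emit one digit of fractional seconds simply mirrors the examples; the format itself makes the fraction optional.

```python
from datetime import datetime

CALL_NAMES = ("D", "40Hz")  # call names listed in the text


def format_time(t):
    """YYYY-MM-DDTHH:MM:SS.f, truncated (not rounded) to tenths of a second."""
    tenths = t.microsecond // 100000
    return t.strftime("%Y-%m-%dT%H:%M:%S") + f".{tenths}"


def result_line(project, site, species, start, end, call=None):
    """Build one CSV result line; pass call only for call-level results."""
    if end < start:
        raise ValueError("end time precedes start time")
    fields = [project, site, species, format_time(start), format_time(end)]
    if call is not None:
        if call not in CALL_NAMES:
            raise ValueError(f"unknown call name: {call!r}")
        fields.append(call)
    return ", ".join(fields)


line = result_line(
    "DCPP", "C", "Bp",
    datetime(2013, 2, 4, 15, 13, 15, 800000),
    datetime(2013, 2, 4, 15, 13, 16, 300000),
    call="40Hz",
)
```

With these inputs, `line` reproduces the fin whale example above. Omitting `call` yields the five-field encounter-level form.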
Spaces between fields may be included or omitted. A scoring script will be provided by the conference organizers in March so that participants can evaluate their algorithms’ performance on the development data. Ground truth data based on trained analyst annotations are provided for the development data set.
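Until the scoring script is available, a rough self-check of precision and recall can be sketched as follows. This is not the organizers' scoring method: it assumes that a detection counts as correct when it overlaps any ground-truth entry with the same project, site, and species, and it does not compute coverage or fragmentation. All function names and the sample lines are hypothetical.

```python
from datetime import datetime


def parse_time(text):
    # Fractional seconds are optional in the result format.
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt)


def parse_line(line):
    # Fields are stripped because spaces between fields are optional.
    fields = [f.strip() for f in line.split(",")]
    project, site, species, start, end = fields[:5]
    return (project, site, species), parse_time(start), parse_time(end)


def overlaps(a_start, a_end, b_start, b_end):
    return a_start <= b_end and b_start <= a_end


def matched(entry, others):
    # True if entry overlaps any entry in others with the same key.
    key, start, end = entry
    return any(key == k and overlaps(start, end, s, e) for k, s, e in others)


def precision_recall(detected_lines, truth_lines):
    det = [parse_line(line) for line in detected_lines]
    ref = [parse_line(line) for line in truth_lines]
    correct = sum(matched(d, ref) for d in det)  # detections that hit truth
    found = sum(matched(r, det) for r in ref)    # truth entries that were hit
    precision = correct / len(det) if det else 0.0
    recall = found / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical ground truth and detector output.
truth = ["CINMS,B,Gg,2011-12-27T15:51:47.0,2011-12-27T16:59:07.0"]
detected = [
    "CINMS, B, Gg, 2011-12-27T16:00:00, 2011-12-27T16:30:00",
    "CINMS, B, Lo, 2011-12-27T16:00:00, 2011-12-27T16:30:00",
]
precision, recall = precision_recall(detected, truth)
```

In this toy case one of the two detections matches the ground truth, giving a precision of 0.5, and the single ground-truth encounter is found, giving a recall of 1.0.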
A separate evaluation data set will be provided in May without answers, and participants wishing to be part of the algorithm comparison will be able to submit their detector’s CSV files via the conference web site. The evaluation dataset will contain additional weeks of data from the sites that have been included in the development set and data from a site that was not present in the development set.
Roch, M. A., Brandes, T. S., Patel, B., Barkley, Y., Baumann-Pickering, S. and Soldevilla, M. S. (2011). Automated extraction of odontocete whistle contours. J Acoust Soc Am 130, 2212-23, doi:10.1121/1.3624821.
Thompson, P. O. (1965). Marine biological sound west of San Clemente Island: diurnal distributions and effects on ambient noise level during July 1963. In US Navy Electronics Laboratory Report, pp. 1-42. San Diego, CA.
Watkins, W. A. (1981). Activities and underwater sounds of fin whales (Balaenoptera physalus). Sci. Rep. Whales Research Inst. Tokyo 33, 83-118.
Wiggins, S. M. and Hildebrand, J. A. (2007). High-frequency Acoustic Recording Package (HARP) for broad-band, long-term marine mammal monitoring. In Intl. Symp. Underwater Tech., pp. 551-557. Tokyo, Japan.