Dataset Documentation for the 2015 DCLDE Workshop

The DCLDE 2015 dataset consists of data from multiple deployments of high-frequency acoustic recording packages (HARPs; Wiggins and Hildebrand, 2007) in the Southern California Bight. Separate sets of development data are provided for mysticetes and odontocetes. The mysticete data have been decimated to 1 and 1.6 kHz of bandwidth, while the odontocete data have 100 or 160 kHz of bandwidth. Data were selected to cover all four seasons and multiple locations. To learn how to access these datasets, see Dataset Retrieval.

High-Frequency Data

This full-bandwidth dataset consists of annotated data from multiple odontocete species:

  • Berardius bairdii - Baird’s beaked whale
  • Ziphius cavirostris - Cuvier’s beaked whale
  • Physeter macrocephalus - Sperm whale
  • Lagenorhynchus obliquidens - Pacific white-sided dolphin
  • Grampus griseus - Risso’s dolphin
  • Phocoenidae - unspecified porpoise
  • Odontoceti - odontocete other than those described above

The goal for this dataset is to identify acoustic encounters of a species during times when animals were echolocating. Analysts examined data for echolocation and approximated the start and end times of acoustic encounters. Any period that was separated from another one by five minutes or more was marked as a separate encounter. Whistle activity was not considered. Consequently, while the use of whistle information during echolocation activity is appropriate, reporting a species based on whistles in the absence of echolocation activity will be considered a false positive for this classification task.
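The five-minute rule described above can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the analysts' actual procedure: the function name group_encounters and the idea of starting from a list of individual detection times are assumptions made for the example.

```python
from datetime import datetime, timedelta


def group_encounters(detection_times, min_gap=timedelta(minutes=5)):
    """Group detection times into (start, end) encounters.

    A new encounter starts whenever a detection follows the previous
    one by min_gap or more (five minutes by default).
    """
    encounters = []
    for t in sorted(detection_times):
        if encounters and t - encounters[-1][1] < min_gap:
            # Gap shorter than min_gap: extend the current encounter.
            encounters[-1][1] = t
        else:
            # First detection, or gap of min_gap or more: new encounter.
            encounters.append([t, t])
    return [(start, end) for start, end in encounters]


# Hypothetical detection times, for illustration only.
day = datetime(2011, 12, 26)
example_times = [
    day + timedelta(minutes=0),
    day + timedelta(minutes=2),
    day + timedelta(minutes=6, seconds=59),
    day + timedelta(minutes=11, seconds=59),  # exactly 5 min after the previous one
    day + timedelta(minutes=12, seconds=30),
]
example_encounters = group_encounters(example_times)
```

With these made-up times, the first three detections form one encounter and the last two form a second one, because the gap between them is exactly five minutes.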

Low-Frequency Data

The dataset consists of annotated data for specific calls from two mysticete species:

  • Balaenoptera musculus - blue whale D calls (Thompson, 1965)
  • Balaenoptera physalus - fin whale 40 Hz calls (Watkins, 1981)

The goal for this dataset is to identify specific blue whale D and fin whale 40 Hz calls.

Data Format

Acoustic data are provided as wav files, with the filename encoding the site, deployment, and starting timestamp of each file.

High-frequency example: CINMS17B_DL37_111226_042730.x.wav

  • CINMS17B indicates the 17th deployment of the Channel Islands National Marine Sanctuary (CINMS) project, at site B. Other project names are SOCAL (Southern California) and DCPP (Diablo Canyon Power Plant).
  • DL37 is the data logger identifier.
  • 111226_042730 is the recording start time: 26 December 2011 at 04:27:30. All times are UTC.

Low-frequency filenames are similar but contain additional fields related to the decimation.
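As a rough sketch, the high-frequency filename convention can be parsed in Python as follows. The function name parse_hf_filename and the regular expression are assumptions based only on the single example above; low-frequency filenames carry extra decimation fields and are not handled here.

```python
import os
import re
from datetime import datetime, timezone

# Pattern inferred from the one documented example; it is an assumption
# that all high-frequency files follow it.
_HF_NAME = re.compile(
    r"^(?P<project>[A-Za-z]+)(?P<deployment>\d+)(?P<site>[A-Za-z]+)"
    r"_(?P<logger>[^_]+)_(?P<date>\d{6})_(?P<time>\d{6})"
)


def parse_hf_filename(path):
    """Split a high-frequency filename into its documented fields."""
    name = os.path.basename(path)
    match = _HF_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized filename: {name!r}")
    # Timestamps are yymmdd_HHMMSS in UTC.
    start = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "project": match["project"],
        "deployment": int(match["deployment"]),
        "site": match["site"],
        "logger": match["logger"],
        "start": start,
    }


info = parse_hf_filename("CINMS17B_DL37_111226_042730.x.wav")
```

For the example file, info holds project CINMS, deployment 17, site B, logger DL37, and a start time of 2011-12-26 04:27:30 UTC.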

Deployment Locations

Data are provided from seven locations offshore of Southern California, recorded between 2009 and 2013, as shown in the figure below. The accompanying table lists the coordinates and depth of each site. Time periods should be inferred directly from the data, as the low- and high-frequency datasets sample different times.

Deployment Locations
Project  Site  Deployment (Preamp)  Depth (m)  Sample Rate (kHz)  Latitude   Longitude
CINMS    B     17 (646), 18 (618)   600        200                34-17.0 N  120-01.7 W
CINMS    C     18 (645), 19 (669)   800        320                34-19.5 N  120-48.4 W
DCPP     A     1 (688)              65         320                35-36.7 N  121-14.5 W
DCPP     B     1 (686)              100        320                35-09.6 N  120-53.1 W
DCPP     C     1 (682)              1000       200                35-24.0 N  121-33.8 W
SOCAL    E     32 (452), 33 (481)   1300       200                32-39.4 N  119-28.4 W
SOCAL    R     35 (567), 38 (591)   1200       200                33-09.6 N  120-00.6 W


Preamplifiers for HARPs have been calibrated, and two Matlab routines are provided along with the data to show how to apply the appropriate transfer function. All necessary files (including the Matlab functions) are available for download.

  • gettransferfn(filename, BinsHz) - Assuming that the transfer function folder is in the same folder as this function, it parses the filename and loads the appropriate transfer function. The transfer function is sampled at the bin center frequencies given in BinsHz, and the corresponding offsets are returned.
  • tfadjustexample() - This simple function prompts the user for a filename, reads the first tenth of a second of data, and plots the sound pressure level after applying the transfer function.
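The idea of sampling a transfer function at bin center frequencies can be illustrated with a short Python sketch. It is not a port of the Matlab routines: the function names, the use of linear interpolation on a linear frequency axis, the clamping outside the calibrated range, and all numeric values are assumptions made purely for illustration.

```python
from bisect import bisect_left


def sample_transfer_function(tf_hz, tf_db, bins_hz):
    """Interpolate calibration offsets (dB) at each bin center frequency.

    tf_hz must be strictly increasing. Bins outside the calibrated range
    take the nearest end value (an assumption of this sketch).
    """
    offsets = []
    for f in bins_hz:
        if f <= tf_hz[0]:
            offsets.append(tf_db[0])
        elif f >= tf_hz[-1]:
            offsets.append(tf_db[-1])
        else:
            i = bisect_left(tf_hz, f)  # tf_hz[i-1] < f <= tf_hz[i]
            weight = (f - tf_hz[i - 1]) / (tf_hz[i] - tf_hz[i - 1])
            offsets.append(tf_db[i - 1] + weight * (tf_db[i] - tf_db[i - 1]))
    return offsets


def apply_offsets(uncalibrated_db, offsets_db):
    """Add the per-bin offsets (dB) to uncalibrated spectrum levels (dB)."""
    return [u + o for u, o in zip(uncalibrated_db, offsets_db)]


# Made-up calibration points and bin centers, for illustration only.
tf_hz = [10.0, 100.0, 1000.0]
tf_db = [80.0, 82.0, 90.0]
bins_hz = [5.0, 55.0, 100.0, 550.0, 2000.0]
offsets = sample_transfer_function(tf_hz, tf_db, bins_hz)
calibrated = apply_offsets([0.0, 1.0, 2.0, 3.0, 4.0], offsets)
```

The provided Matlab functions remain the reference for how the real transfer function files are located, read, and applied.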

Annotation and Determining Results

We are using comma-separated value files as input to routines that compute precision and recall, as well as coverage and fragmentation for encounters (see Roch et al., 2011 for details). The following species abbreviations should be used:

Abbreviation  Species
Bb            Berardius bairdii - Baird’s beaked whale
Zc            Ziphius cavirostris - Cuvier’s beaked whale
Pm            Physeter macrocephalus - sperm whale
Lo            Lagenorhynchus obliquidens - Pacific white-sided dolphin
Gg            Grampus griseus - Risso’s dolphin
UPP           Phocoenidae - unspecified porpoise
UO            unidentified odontocete
Bm            Balaenoptera musculus - blue whale
Bp            Balaenoptera physalus - fin whale

For encounter-level tests, each line of the result file should be a comma-separated value (CSV) entry of the form:

project, site, species-abbreviation, start-time, end-time

Timestamps have the form YYYY-MM-DDTHH:MM:SS, optionally followed by a decimal point and fractional seconds.

Example for Risso’s dolphin detection at CINMS site B: CINMS, B, Gg, 2011-12-27T15:51:47.0, 2011-12-27T16:59:07.0

Call-level results for blue and fin whales are similar, with the addition of a final call name, which is either “D” or “40Hz”:

DCPP, C, Bp, 2013-02-04T15:13:15.8, 2013-02-04T15:13:16.3, 40Hz

Spaces between fields may be included or omitted. A scoring script will be provided by the conference organizers in March so that participants can evaluate their algorithms’ performance on the development data. Ground truth data based on trained analyst annotations is provided for the development data set.
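A minimal Python sketch of writing result lines in the format described above is shown here. The function names format_timestamp and result_line are hypothetical, and the choice to emit one fractional digit and to reject end times earlier than start times is an assumption of the sketch, not a stated requirement of the scoring script.

```python
from datetime import datetime

SPECIES = {"Bb", "Zc", "Pm", "Lo", "Gg", "UPP", "UO", "Bm", "Bp"}
CALLS = {"D", "40Hz"}


def format_timestamp(t):
    """Format as YYYY-MM-DDTHH:MM:SS.f, truncated to tenths of a second."""
    tenths = t.microsecond // 100000
    return t.strftime("%Y-%m-%dT%H:%M:%S") + f".{tenths}"


def result_line(project, site, species, start, end, call=None):
    """Build one CSV result line; pass call only for call-level results."""
    if species not in SPECIES:
        raise ValueError(f"unknown species abbreviation: {species!r}")
    if call is not None and call not in CALLS:
        raise ValueError(f"unknown call name: {call!r}")
    if end < start:
        raise ValueError("end time precedes start time")
    fields = [project, site, species,
              format_timestamp(start), format_timestamp(end)]
    if call is not None:
        fields.append(call)
    return ", ".join(fields)


call_line = result_line(
    "DCPP", "C", "Bp",
    datetime(2013, 2, 4, 15, 13, 15, 800000),
    datetime(2013, 2, 4, 15, 13, 16, 300000),
    call="40Hz",
)
```

Here call_line reproduces the fin whale example line shown above; omitting the call argument yields an encounter-level line with five fields.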

A separate evaluation data set will be provided in May without answers, and participants wishing to be part of the algorithm comparison will be able to submit their detector’s CSV files via the conference web site. The evaluation dataset will contain additional weeks of data from the sites that have been included in the development set and data from a site that was not present in the development set.

Literature Cited:

Roch, M. A., Brandes, T. S., Patel, B., Barkley, Y., Baumann-Pickering, S. and Soldevilla, M. S. (2011). Automated extraction of odontocete whistle contours. J. Acoust. Soc. Am. 130, 2212-2223, doi:10.1121/1.3624821.

Thompson, P. O. (1965). Marine biological sound west of San Clemente Island: diurnal distributions and effects on ambient noise level during July 1963. In US Navy Electronics Laboratory Report, pp. 1-42. San Diego, CA.

Watkins, W. A. (1981). Activities and underwater sounds of fin whales (Balaenoptera physalus). Sci. Rep. Whales Research Inst. Tokyo 33, 83-118.

Wiggins, S. M. and Hildebrand, J. A. (2007). High-frequency Acoustic Recording Package (HARP) for broad-band, long-term marine mammal monitoring. In Intl. Symp. Underwater Tech., pp. 551-557. Tokyo, Japan.