Figure 1. Interspeech 2021 will be held between August 30 and September 3, 2021.
This blog post briefly reviews each of the three research papers ATCO2 will present on-site at INTERSPEECH. The first paper is related to the language used in ATC communication.
In the ATCO2 project, we launched a community platform for collecting ATC speech worldwide. Filtering out unseen non-English speech is one of the main components of the data processing pipeline. The proposed English Language Detection (ELD) system is based on embeddings from a Bayesian subspace multinomial model. It is trained on word confusion networks from an ASR system. It is robust, easy to train, and lightweight. In the in-domain scenario, we achieved an equal error rate (EER) of 0.0439, a 50% relative reduction compared to a state-of-the-art acoustic ELD system based on x-vectors. In the unseen-language (out-of-domain) condition, we achieved an EER of 0.1352, a 33% relative reduction compared to the acoustic ELD. We plan to publish the evaluation dataset from the ATCO2 project.
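For readers unfamiliar with the metric: the EER is the operating point at which the false-rejection rate on target (English) trials equals the false-acceptance rate on non-target trials. A minimal sketch of how it can be estimated from detection scores (the scores below are synthetic, not from the ATCO2 system):

import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    frr = np.array([(target_scores < t).mean() for t in thresholds])      # false rejections
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])  # false acceptances
    i = np.argmin(np.abs(frr - far))   # threshold where the two rates cross
    return (frr[i] + far[i]) / 2

rng = np.random.default_rng(0)   # toy scores: higher means "more English"
print(equal_error_rate(rng.normal(2, 1, 1000), rng.normal(-2, 1, 1000)))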
Further information in the following links:
Teaser: https://www.youtube.com/watch?v=qj42c4qmmAc
Paper: https://arxiv.org/abs/2104.02332
Contextual adaptation is a technique of “suggesting” small snippets of text that are likely to appear in the speech recognition output. The snippets of text are derived from the current “situation” of the speaker; in the ATCO2 project, these are the location and time of the communication. The location and time are used to query the OpenSky Network for a list of callsigns (airplanes) that match these two inputs.
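For illustration, a query of this kind can be sketched against the public OpenSky Network REST API. The bounding box below is a made-up example, and the live endpoint returns the currently visible aircraft; the project additionally matches the recording time:

import requests

# Hypothetical bounding box around an airport of interest.
params = {"lamin": 49.0, "lomin": 16.3, "lamax": 49.4, "lomax": 16.9}
resp = requests.get("https://opensky-network.org/api/states/all",
                    params=params, timeout=30)
resp.raise_for_status()

# Each state vector starts with [icao24, callsign, origin_country, ...];
# the callsign field is padded with spaces and may be empty.
states = resp.json()["states"] or []
callsigns = sorted({s[1].strip() for s in states if s[1] and s[1].strip()})
print(callsigns)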
Applying Automatic Speech Recognition (ASR) to the Air Traffic Control (ATC) domain is difficult due to factors such as noisy radio channels, foreign accents, cross-language code-switching, a very fast speech rate, and a situation-dependent vocabulary with many infrequent words. Combined, these factors lead to error rates that make it difficult to apply speech recognition in practice.
For ASR in ATC, contextual adaptation is beneficial. For instance, we can use a list of airplanes that are nearby. From an airport identity, we can derive local waypoints, local geographical names, phrases in the local language, etc. It is important that the adaptation is dynamic, i.e. the adaptation snippets of text change over time. The adaptation also has to be lightweight, so it should not require rebuilding the recognition network from scratch. We inject the snippets of text by means of Weighted Finite State Transducer (WFST) composition. An example of a biasing FST is shown in Figure 2.
Figure 2. “Toy-example” topology of a biasing WFST graph for boosting the ASR’s recognition network. The boosted callsign is ‘CSA one two three alfa bravo’.
Further information in the following link:
Paper: Boosting of contextual information in ASR for air-traffic call-sign recognition
Air traffic management, and specifically air-traffic control (ATC), relies mostly on voice communication between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that can be leveraged by Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We propose a two-step approach that adds contextual knowledge during semi-supervised training to reduce the ASR error rate on the part of the utterance that contains the callsign. First, we represent the contextual knowledge (i.e. air-surveillance data) of an ATCo-pilot communication in a WFST. Then, during Semi-Supervised Learning (SSL), the contextual knowledge is added by second-pass decoding (i.e. lattice rescoring). Results show that 'unseen domains' (e.g. data from airports not present in the supervised training data) benefit further from contextual SSL compared to standalone SSL. For this task, we introduce the Callsign Word Error Rate (CA-WER) as an evaluation metric, which assesses ASR performance only on the spoken callsign in an utterance. On a challenging ATC-based test set gathered from LiveATC, we obtained a 32.1% relative CA-WER improvement by applying SSL, with an additional 17.5% CA-WER improvement from adding contextual knowledge during SSL.
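The paper defines CA-WER precisely; as a rough illustration of the idea, the sketch below scores only the callsign part by matching the reference callsign against the best-fitting window of the hypothesis (a simplification of the alignment used in the paper):

import numpy as np

def wer(ref, hyp):
    # Word-level Levenshtein distance divided by the reference length.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            d[i, j] = min(d[i - 1, j] + 1,              # deletion
                          d[i, j - 1] + 1,              # insertion
                          d[i - 1, j - 1] + (r != h))   # substitution
    return d[-1, -1] / len(ref)

def ca_wer(ref_callsign, hyp_words):
    # Score only the callsign: take the best-matching hypothesis window.
    n = len(ref_callsign)
    windows = [hyp_words[i:i + n] for i in range(max(1, len(hyp_words) - n + 1))]
    return min(wer(ref_callsign, w) for w in windows)

ref = "c_s_a one two three alfa bravo".split()
hyp = "c_s_a one two tree alfa bravo roger".split()
print(ca_wer(ref, hyp))   # 1 substitution over 6 words, about 0.167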
Figure 3. Process of retrieving a list of callsigns (contextual data) from OpenSky Network. The contextual data is the compendium of all possible verbalized versions of each callsign.
Further information in the following links:
Paper: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
Programme:
The special session is dedicated to automatic speech recognition in air-traffic management, and the following session agenda has been released:
Thu-M-SS-2 Thursday, September 2, 11:00-13:00 Special-Hybrid: Automatic Speech Recognition in Air Traffic Management
Air-traffic management is a dedicated domain in which, in addition to the voice signal, other contextual information (e.g. air traffic surveillance data, meteorological data) plays an important role. Automatic speech recognition is the first challenge in the whole chain. Further processing usually requires transforming the recognized word sequence into a conceptual form, which is the more important application in ATM. This also means that the usual metrics for evaluating ASR systems (e.g. word error rate) are less important, and other performance criteria are employed: objective ones such as command recognition error rate, callsign detection accuracy, overall algorithmic delay, real-time factor, or reduced flight times, and subjective ones such as a decrease in user workload.
This special session aims to bring together ATM players (both academic and industrial) interested in ASR and ASR researchers looking for new challenges. This can accelerate near-future R&D plans to integrate speech technologies into the challenging, but highly safety-oriented, air-traffic management domain.
The session is organised by two people: Hartmut Helmke (DLR, coordinator of the HAAWAII project) and Pavel Kolcarek (Honeywell, topic manager of the ATCO2 project).
Applying Automatic Speech Recognition (ASR) to the Air Traffic Control (ATC) domain is difficult due to factors such as noisy radio channels, foreign accents, cross-language code-switching, a very fast speech rate, and a situation-dependent vocabulary with many infrequent words. Combined, these factors lead to error rates that make it difficult to apply speech recognition in practice.
For ASR in ATC, contextual adaptation is beneficial. For instance, we can use a list of airplanes that are nearby. From an airport identity, we can derive local waypoints, local geographical names, phrases in the local language, etc. It is important that the adaptation is dynamic, i.e. the adaptation snippets of text change over time. The adaptation also has to be lightweight, so it should not require rebuilding the recognition network from scratch. We inject the snippets of text by means of Weighted Finite State Transducer (WFST) composition.
We apply on-the-fly boosting to the HCLG graph. The HCLG graph is the recognition network that defines the paths the beam-search HMM decoder will explore. This graph contains costs that can be altered. We do this by WFST composition, applied as:
HCLG’ = HCLG o B.
The composition is denoted by the operator ‘o’ and its algorithm is described in [1]. Informally, the output symbols of the left operand are coupled (matched) with the input symbols of the right operand. The weights from both graphs are recombined in a way defined by the semiring of WFST weights. The result is a single graph with the input symbols of the left operand and the output symbols of the right operand. An example of a boosting graph B is in Figure 1.
[1] Mehryar Mohri, Fernando Pereira, Michael Riley: Weighted finite-state transducers in speech recognition. Comput. Speech Lang. 16(1): 69-88 (2002)
[2] Keith B. Hall, Eunjoon Cho, Cyril Allauzen, Françoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang: Composition-based on-the-fly rescoring for salient n-gram biasing. INTERSPEECH 2015: 1418-1422
Figure 1. “Toy-example” topology of a WFST graph B for boosting the recognition network HCLG. The boosting is done as the composition HCLG’ = HCLG o B, which introduces the score discounts into the HCLG recognition network.
As you can see, we are boosting individual words. We cannot boost whole phrases here, since such a composition would require a lot of computation time. Also, we should not boost common words that are likely to be present in the lattice anyway. So we boost only ‘rare’ words such as airline designators from callsigns (e.g. ‘air_berlin’). In the future, we plan to boost waypoints, local names and frequent phrases in the local language.
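To make the idea concrete, here is a toy sketch (not the project’s actual tooling) that writes a one-state, word-level boosting graph B in OpenFst’s AT&T text format: every vocabulary word loops back with weight 0, and the boosted rare words carry a negative cost, i.e. a score discount in the tropical semiring:

# Toy vocabulary and the rare words we want to boost (example values).
vocab = ["air_berlin", "lufthansa", "one", "two", "three", "contact"]
boosted = {"air_berlin": -4.0}

with open("B.txt", "w") as f:
    for word in vocab:
        # Arc format: src dst ilabel olabel weight; state 0 loops on every word.
        f.write(f"0 0 {word} {word} {boosted.get(word, 0.0)}\n")
    f.write("0\n")  # state 0 is both initial and final

# Then, with the OpenFst command-line tools:
#   fstcompile --isymbols=words.txt --osymbols=words.txt B.txt B.fst
#   fstcompose HCLG.fst B.fst HCLG_boosted.fst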
In lattice boosting, we have more freedom in designing the boosting graphs. A lattice is a relatively small graph compared to the HCLG graph, and lattices are acyclic. Both properties make the composition operation much faster. The boosting graph B can therefore encode many word sequences, each obtaining its score discount only if the whole word sequence is matched in the lattice during the WFST composition.
As in the previous section, the composition is done as:
L’ = L o B
where L is the input lattice, B is a boosting graph from Figure 2 and L’ is the output lattice with score discounts introduced by the composition.
Figure 2. A “toy-example” topology of a WFST graph B for boosting lattices (speech-to-text output with alternative hypotheses). The boosting is done as the composition L’ = L o B, which introduces score discounts for the word sequences we decided to boost. These word sequences represent the contextual information.
The lattice boosting is specific to each utterance; the composition is run in batch mode for a whole test set. The toy example in Figure 2 has a “lower part” with all the words of the lexicon in parallel; this makes sure no word sequence is dropped by the composition. There is also a phi symbol #0 at the “entrance” to the lower part. The “upper part” encodes the word sequences we want to boost (e.g. callsigns); the score discounts -4 or -8 are on the word links. As we use the phi symbol #0 in the composition, the lower part is accessed only if the partial word sequence in the lattice cannot be matched with the “upper part” of the B graph (the part with discounts).
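As a toy sketch of this topology (again illustrative, not the project’s actual tooling), the following writes such a B graph in AT&T text format, mirroring Figure 2: an upper part with one discounted path per boosted word sequence, and a lower part entered through the failure symbol #0 that accepts any word without a discount:

vocab = ["c_s_a", "one", "two", "three", "alfa", "bravo", "contact"]
phrases = [("c_s_a one two three alfa bravo".split(), -8.0)]  # example callsign

arcs, state = [], 1
for words, discount in phrases:
    prev, per_word = 0, discount / len(words)  # spread the discount over the word links
    for w in words:
        arcs.append((prev, state, w, w, per_word))
        prev, state = state, state + 1
    arcs.append((prev, 0, "<eps>", "<eps>", 0.0))  # full match: rejoin the start state

backoff = state
arcs.append((0, backoff, "#0", "<eps>", 0.0))      # failure arc into the lower part
for w in vocab:
    arcs.append((backoff, 0, w, w, 0.0))           # any word, no discount

with open("B_lat.txt", "w") as f:
    for src, dst, ilab, olab, wgt in arcs:
        f.write(f"{src} {dst} {ilab} {olab} {wgt}\n")
    f.write("0\n")
# Compile with fstcompile as before; the per-utterance composition L' = L o B
# must treat #0 as a phi (failure) symbol in the rescoring tool.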
The experiments with HCLG boosting and lattice boosting are summarized in our paper submitted to Interspeech 2021. Here we share the main table from the paper:
The table contains both Word Error Rate (WER) results and Call-Sign Accuracies (CSA). On liveatc_test_set2 we see a huge CSA improvement from 53.5 to 80.6. For malorca_vienna the absolute CSA improvement is smaller; nevertheless, the gain from 84.4 to 88.1 closes 60.7% of the gap between the baseline and the oracle CSA. We also see that lattice boosting on its own already brings good improvements, and the best results are obtained with the combination of HCLG boosting and lattice boosting.
Check out our previous blog posts:
We introduced the hardware setups in one of our previous blog posts. We used two different antennas and two different SDR receivers. Here we share the results of comparing various combinations of this hardware. To recall, we have the following HW:
| Component | Low performance/quality: Item | Price (EUR) | Higher performance/quality (more expensive): Item | Price (EUR) |
|---|---|---|---|---|
| Antenna | Sirio MD 118-137 incl. 5m cable | 40 | Watson WBA-20 | 60 |
| SDR receiver | RTL-SDR | 50 | SDRplay RSP1A | 130 |
Our experiment was done at LKTB (Brno Airport); we are located at a distance of about 14km from the airport (see this blog post for details). See the altitude profile in the image below.
We placed both antennas for the test at approximately the same height.
One of our interests was to find out the quality of the recorded audio signals (as we want to be as close as possible to the speech observed in the cockpit / tower) and to compare the more expensive and cheaper recording setups. The comparison is based on the estimated SNR values (see the previous blog post). It is worth mentioning that the RSP1A was also run in 8-bit mode (recording 10MHz bandwidth).
We recorded 3 days with both HW setups in parallel (the more expensive RSP1A on the Watson WBA-20 and the cheaper RTL-SDR on the Sirio MD), then switched the antennas and recorded another 3 days (RSP1A on Sirio MD and RTL-SDR on Watson WBA-20). The experiments led to the following results:
To briefly compare the lower quality (~200EUR) and the more expensive (~440EUR) HW setups, refer to the histograms below. The cheaper setup (RTL-SDR dongle with the Sirio antenna) provides an SNR of ~3.6dB on average, while the expensive setup provides ~19.2dB on average. We also indicated the amount of speech and signal in the histogram; speech fills about 70% of the recorded audio.
The next two histograms compare SNRs of ‘fixed’ receivers while we switch the antennas. We see that the Watson antenna provides a 6 to 10dB higher SNR than the Sirio.
The next two histograms compare SNRs with a ‘fixed’ antenna while we switch the receiver. Here the RSP1A is about 4dB better on the Sirio antenna and about 10dB better on the Watson antenna.
Our main conclusion is that a good antenna is important: it alone increases the average SNR from 3.6dB to 9.2dB. With a good antenna deployed, a better receiver brings even more SNR gain (from 9.2dB to 19.2dB).
Let’s summarize mean SNRs in the following table:
| mean SNR [dB] (receiver \ antenna) | Sirio MD (cheaper) | Watson (more expensive) |
|---|---|---|
| RTL-SDR dongle (cheaper) | 3.58 | 9.22 |
| SDRplay RSP1A (more expensive) | 8.78 | 19.16 |
Check out our previous blog posts:
This blog post is more technical than the previous ones. In the next paragraphs we will describe the raw signal processing pipeline. The rtl-airband software is set to produce raw data coming from the SDR hardware in cs16 format (interleaved complex signed 16-bit samples). The produced cs16 files are processed through:
cat ${signalfile}.cs16 | csdr convert_s16_f | csdr amdemod_cf | csdr fastdcblock_ff | csdr gain_ff 3 | csdr limit_ff | csdr convert_f_s16 > ${signalfile}.raw
which does the following:
- convert_s16_f: converts the signed 16-bit samples to floats
- amdemod_cf: AM-demodulates the complex baseband into a real audio signal
- fastdcblock_ff: removes the DC offset
- gain_ff 3: applies a fixed gain of 3
- limit_ff: limits the amplitude to [-1, 1]
- convert_f_s16: converts the floats back to signed 16-bit samples
Next, we drop all segments shorter than 1 second, as they do not contain any meaningful signal. You may have noticed we are not using automatic gain control (AGC). The reason is that AGC deforms the signal (rapidly changing the volume and thus the amount of noise). As we have the whole recording and can process it off-line, we implemented a segment-based gain control instead.
We detect push-to-talk clicks using a wavelet transform and identify the individual utterances in the audio. We amplify each segment so that it does not exceed 95% of the maximum level of the wav file (1.0 in our case); isolated peaks are ignored. See the figure below:
The original raw signal is on top; the amplified signal is on the bottom.
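A minimal numpy sketch of such a segment-based gain (assuming float samples in [-1, 1]; the robust-peak percentile is our illustrative choice for “ignoring the peak levels”):

import numpy as np

def amplify_segment(x, target=0.95, peak_percentile=99.5):
    # Robust peak estimate: a high percentile ignores isolated clicks/peaks.
    peak = np.percentile(np.abs(x), peak_percentile)
    if peak == 0:
        return x
    return np.clip(x * (target / peak), -1.0, 1.0)

# segments: one float array per detected push-to-talk utterance
# amplified = [amplify_segment(seg) for seg in segments]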
We detect the speech parts of the audio, which are further used to reliably estimate the Signal-to-Noise Ratio. The Voice Activity Detector (VAD) is based on a neural network with 2 hidden layers and 2 output classes. It was trained on 1366 hours of a multilingual telephone speech corpus. The neural network output is smoothed by averaging over a 5-frame window, and we can adjust the detection threshold to control the amount of detected speech. See the figure below with the detected speech indicated in the recording (red parts).
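The neural network itself is not shown here, but the smoothing and thresholding of its per-frame output can be sketched as follows:

import numpy as np

def smooth_and_threshold(speech_probs, win=5, threshold=0.5):
    # Average the per-frame speech posteriors over a 5-frame window,
    # then mark frames above the (tunable) detection threshold as speech.
    smoothed = np.convolve(speech_probs, np.ones(win) / win, mode="same")
    return smoothed > threshold

# probs = per-frame posterior of the "speech" class from the VAD network
# speech_mask = smooth_and_threshold(probs, threshold=0.6)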
The SNR estimation technique is based on waveform amplitude distribution analysis (Chanwoo Kim, Richard M. Stern, "Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis", Interspeech 2008). In principle, the amplitude distribution of noise is Gaussian, while the amplitude distribution of speech follows a Gamma distribution. We can “guess the SNR by estimating where we are between the Gaussian and Gamma distributions” for our signal.
To estimate the SNR reliably, we select only the speech segments and avoid all the non-speech parts. We apply the SNR estimation technique, which provides an SNR estimate for each voiced segment.
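The WADA estimator itself relies on a precomputed lookup table between its Gaussianity statistic and the SNR, so we do not reproduce it here; as a crude energy-based stand-in (not the method of Kim & Stern), one can compare the power of speech and non-speech samples selected by the VAD:

import numpy as np

def snr_db(samples, speech_mask):
    # speech_mask: boolean per-sample flags derived from the VAD output.
    speech_power = np.mean(samples[speech_mask] ** 2)   # speech + noise
    noise_power = np.mean(samples[~speech_mask] ** 2)   # noise only
    return 10 * np.log10(max(speech_power - noise_power, 1e-12) / noise_power)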
Check out our previous blog posts:
Welcome to the next blog post in our series “How to set up an ATC voice recorder”. This one focuses on software installation and SDR settings. We assume you have chosen Linux as the OS.
Please follow the instructions at https://atco.opensky-network.org/. You should end up with a Linux distribution with the SDR drivers installed and the RTL-airband software running (https://github.com/szpajder/RTLSDR-Airband).
You need to do several steps to set up the SDR. First, identify the VHF frequencies you want to record and decide on your center frequency and bandwidth. If some frequencies are too far apart, you may use two SDR devices (we also use this setup). Let's look at two examples.
We checked the available on-line resources and found the main frequencies used at LKTB.
We are not interested in ATIS. Notice that the distance between LKTB_TWR and LKTB_APP is 7.75MHz, which is much larger than the 2.5MHz supported by the RTL-SDR but smaller than the 10.6MHz supported by the SDRplay RSP1A (see the previous blog post for more technical information). So, to fully cover LKTB, we need either a pair of RTL-SDRs or one RSP1A. We chose the second option. See our configuration in the following figure:
The green boxes indicate the 25kHz bandwidth of one channel. We placed the center frequency in the middle. The “bandwidth” of the SDR - the sampling frequency - was chosen wider than needed to overcome possible distortions at the edges.
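The arithmetic behind this planning step is simple enough to sketch; the channel list below is a hypothetical example consistent with the 7.75MHz Tower-Approach spacing mentioned above:

def plan_sdr(freqs_mhz, usable_bw_mhz):
    # Does the channel set fit into one SDR, and where should the center go?
    span = max(freqs_mhz) - min(freqs_mhz)
    center = (max(freqs_mhz) + min(freqs_mhz)) / 2
    return span <= usable_bw_mhz, center, span

channels = [119.600, 127.350]   # hypothetical Tower and Approach frequencies
fits_rtl, _, span = plan_sdr(channels, 2.5)     # RTL-SDR bandwidth
fits_rsp, center, _ = plan_sdr(channels, 10.6)  # RSP1A bandwidth
print(f"span {span:.2f} MHz, RTL-SDR ok: {fits_rtl}, RSP1A ok: {fits_rsp}, center {center:.3f} MHz")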
A sample rtl-airband config for the SDRplay RSP1A device would look like the following:
country = "Czech Republic";
location= "49.25411,16.58154";
fft_size = 1024;
devices:
({
type = "soapysdr"; #driver
device_string="driver=sdrplay,serial=xxxxxxxxxxxxxx";
gain = "IFGR=20,RFGR=2"; #Every type of device has different gain settings
centerfreq = 123.500; #MHz
correction = 0;
mode = "multichannel";
sample_rate = 9.00; #bandwidth in MHz around centerfreq
channels:
({
freq = 119.600;
airport = "LKTB";
label = "BRNO_Tower";
outputs:
({
type = "rawfile";
directory = "/home/pi/output_airband";
filename_template = "BRNO_Tower_119_600MHz";
split_on_transmission = true;
});
});
});
The LKPR airport has more channels. One of the radar channels and the Tower are the problematic ones, as they lie far from the rest; we would need about 16MHz of bandwidth to cover them all. We analyzed the traffic on the channels and found that the radar on 127MHz is a “copy” of the radar on 120MHz, so we discarded it. Finally, our solution was to use an SDRplay RSP1A and an RTL-SDR (on two separate antennas). The RSP1A covered the group of channels around 123MHz and the RTL-SDR took care of the Tower on 134MHz. See the following figure:
We set the bandwidth of the RSP1A to 5MHz, which gave us 14-bit sampling precision (better audio quality). We limited the bandwidth of the RTL-SDR and initially set the center frequency equal to the Tower frequency (134.55MHz).
However, we found a problem with recording the Tower (note: our setup is very close to the airport, so we have a strong signal). We had strong harmonic distortion in the audio signal. See the following spectrogram:
Notice the spectral line around 1.6kHz. The RSP1A did not suffer from this problem. The problem is called ghosting (thanks to https://www.sdrplay.com/community/viewtopic.php?t=2968): a strong source near you may leak into your recording (even if it transmits on a different frequency). We tried changing the bandwidth and gain, but it did not help. The solution was to change the center frequency.
A sample rtl-airband config for the RTL-SDR device:
country = "Czech Republic";
location= "50.10678,14.26600";
fft_size = 512;
devices:
({
type = "rtlsdr"; #driver
index = 0;
gain = 15; #Every type of device has different gain settings
serial = "00000001";
centerfreq = 134.750; #MHz
correction = 0;
mode = "multichannel";
sample_rate = 900100; #bandwidth in Hz around centerfreq
channels:
({
freq = 134.550;
airport = "LKPR";
label = "PRAGUE_Tower";
outputs:
({
type = "rawfile";
directory = "/home/pi/output_airband";
filename_template = "PRAGUE_Tower_134_550MHz";
split_on_transmission = true;
});
});
});
There are two more parameters that have an impact on the audio quality: gain and fft_size.
FFT size is an internal parameter of the signal processing. The larger the value (a power of 2), the slightly better the signal, but the more CPU power is needed. A good rule of thumb is that wider bandwidths call for a larger FFT size. Tune this parameter (128 / 256 / 512 / 1024) and watch the CPU load and signal quality. If you set it too high, the signal starts to get choppy.
Setting up the gain(s) is critical. There may be several gain controllers on your device: the RTL-SDR has 1 gain control, the SDRplay RSP1A has 2. Please consult the documentation, support, or community for your device to find the block diagrams, gain controllers and proper settings. In general, you should set the gain as low as possible. Ideally, you should tweak only the analog gain closest to the antenna; the rest can be switched off. If you set the gain too low, you will receive noisy audio signals, as there is not enough energy and your signal will be coded in only a few of the ADC's lower bits. On the other hand, if you set the gain too high, clipping appears at the ADC and you get distorted, “noisy” recordings.
We tuned the gains carefully and did some more experiments which we will share with you in one of our next blog posts. To make the long story short:
| IFGN \ RFGN | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 20 | 6.75 | 10.33 | 12.29 | 11.76 | 9.70 |
| 25 | 10.39 | 11.385 | 11.55 | 11.83 | 8.97 |
| 30 | 11.20 | 11.36 | 11.39 | 10.71 | 5.47 |
Table of gain tuning of RSP1A connected to the Watson WBA-20 antenna for LKTB. Values are SNR [dB].
The RFGN (columns) is the main gain control on the SDRplay RSP1A; the higher the value (0-9), the smaller the gain. The IFGN is a “minor” gain controller which does not have much influence if the RFGN is tuned properly. You can see that there is an optimal point at RFGN = 2 with the IFGN “switched off”.
We encourage you to do a similar exercise. You do not need to calculate the SNR; just collect a sufficient amount of audio and listen to it. ATIS or Tower channels are good candidates, as their signals should be stable. Then try different gains and find the optimum.
That is all about setting up the SDR software. We hope this will help you set up the recording easily and with good results. We are aware that many things were simplified here; going deeper into the principles is beyond the scope of these blog posts. If you are interested, please study further resources.
Welcome to our ATCO2 project site. This is the first blog post in a short series on “What SDR to buy, where to place it, how to set it up and connect it to the OpenSky Network platform.” We hope these posts help you receive clean audio signals from ATC VHF communication and feed the community. As most of us were noobs in SDR, we had to learn a lot, and now we are sharing what we learned to make your life easier. If you are an expert in this area, feel free to skip this post. If you think something is missing here, share your thoughts!
You have decided to buy, set up and use an ATC receiver. Congratulations on your decision! Now let's see what you need to do. You must decide WHERE to place it, WHAT HW to buy, and HOW to set it up. The WHAT is discussed in this and the next blog post, while the WHERE and HOW are discussed in the following posts.
You should select a place with as clear a line of sight as possible to the airport tower (or to wherever the transmitter antennas are). Use an on-line map and make an elevation profile between your position and the airport; there should not be any hills in between. It is even better if your position is close to an approach route or holding pattern, as you will get a clean signal from the planes above you.
Now come the general WHAT answers. What you buy depends very much on your budget. We have tried two variants:
There are four components you need to take into account:
You want a 50-ohm antenna with the highest “gain” (well, an antenna is a passive device, so technically there is no gain; you want to minimize signal loss). There are many types of antennas, so select one you can easily mount where you want to. We have tried several types (J-pole, discone, dipole). One important parameter of an antenna is its frequency range (or tuned frequency); here, you are interested only in the Rx (receiving) range. The range should cover the airband (ATC frequencies), which lies between 108MHz and 137MHz (usually around 122MHz). It is good to have a narrow-band antenna tuned just for these frequencies: a narrow-band antenna may lower the noise coming from other strong sources around you (AM/FM radio stations, TV stations, GSM, ...).
You need a 50-ohm coaxial cable to connect your antenna to the SDR device. Every cable has a signal loss, so you want a cable as short as possible (but keep some reserve). We used the low-loss LLC category. The lower the loss, the higher the price. It is also good to check the technical specifications and find the loss (in dB per m) for the given frequency range (you are not interested in the loss at 2GHz, just around 120MHz). The last important property of a coax cable is its connectors. Every connector introduces signal losses. Find a cable with the right connectors for your antenna (e.g. an N-type female connector on the cable side) and your SDR device (usually an SMA male connector on the cable side); adding adapters increases the signal loss. Warning: a coaxial cable cannot be bent in a small radius (several centimeters / an inch); such a bend may introduce high losses. Check the smallest allowed radius in the cable's technical specifications. Note: if you want to use more receivers on one antenna, you will need to buy an active splitter. We have tried it and it works, but we will not go into details here.
You need an SDR (Software Defined Radio) receiver. There are other types of receivers, but we ignore them here for the sake of simplicity. SDR means that the receiver just digitizes the analog signal from the antenna; the voice decoding is done by software on a computer. The SDR is the most expensive item on your bill, and the more expensive it is, the better the quality (meaning it copes better with low-quality signals). An SDR usually contains some analog circuits (gain controllers, filters, etc.), an analog-to-digital converter (ADC), and communication chips to talk to the computer (handling the USB port, for example). One of the most important parameters is the dynamic range of the SDR, which is defined by the ADC. The problem is that you will face both strong and weak signals: if the dynamic range is small, the strong signals may lead to clipping (signal distortion) while the weak ones sink into the noise. The quality of the analog part is also essential to overcome noise coming from your computer, power supplies and other electronic devices at home. The minimum is an 8-bit SDR, but if you can afford a 12-, 14- or 16-bit SDR, all the better. (Some more reading about SDR sensitivity is here: SDR Receiver Performance Overview)
Here you want something small with low power consumption, but powerful enough to decode all the channels you want to listen to and share with the community. The computer should also have an internet connection (WiFi, Ethernet, etc.). You can use an old notebook, your desktop, some sort of Raspberry Pi, etc. Just take into account that the computer should be always on (if you want to be one of our data feeders). You connect the SDR to the computer (by USB in most cases) and then the computer to the Internet. We provide you with a description of how to install and configure all the software needed. Several programs run on the computer. First, there is a radio demodulator. This program takes the raw data (digitized signal) from the SDR and extracts the voice; amplitude modulation is used in VHF ATC. The program listens to the selected frequencies (yes, you can tune in and listen to voice communications in parallel), detects communication (when the pilot pushes a button and starts to talk), passes the data through the demodulator, and stores the demodulated audio internally. Another program immediately post-processes these files and sends them to our servers. You can then log in to the OpenSky Network web and listen to your recordings.
That is all the compressed basic information about what you need to set up your own data feeder and start listening to ATC communication. We will go deeper in the next post, where we will share what devices we tried and what results we got.
Check out our previous blog posts:
In this blog post, we will look closer at the hardware (HW) setups for ATC recording from the VHF channel. We previously gave a general overview of the four most important components: antenna, coaxial cable, SDR receiver, and computer (the computing resource). We built and tested two HW setups: the first costs about 200EUR and serves as an “entry solution”, while the second, at about 435EUR, is the better one. The table below describes both configurations:
| Component | Entry solution (more affordable): Item | Price (EUR) | More expensive: Item | Price (EUR) |
|---|---|---|---|---|
| Antenna | Sirio MD 118-137 incl. 5m cable | 40 | Watson WBA-20 | 60 |
| Coax cable | - | 0 | LLC200A 20m | 77 |
| SDR receiver | RTL-SDR | 50 | SDRplay RSP1A | 130 |
| Raspberry Pi | RPi 3 - 1GB | 40 | RPi 4 - 8GB | 92 |
| RPi case | Metal case + active cooling | 24 | Argon One | 28 |
| micro SD | 256GB | 38 | 256GB | 38 |
| Power source | USB 5V 2.5A | 10 | USB-C 5V 3A | 10 |
| SUM | | 202 | | 435 |
Let us discuss the items now.
The antenna is crucial, as it has a direct impact on the SNR of the radio communication. We decided to purchase two dipole antennas tuned for the aviation frequencies (118MHz-137MHz): the Sirio MD 118-137 and the Watson WBA-20. The Sirio comes with a 5m coaxial cable. We also purchased a 20-meter LLC200A coaxial cable for the Watson antenna, to allow easy mounting of the antenna on a roof with minimum signal loss. The Sirio, on the other hand, is good for mounting on a balcony, for example.
We did a set of experiments (see the details in one of our next blog posts) to estimate the impact of the different antennas on ATC voice quality. The voice quality was measured by SNR: signal-to-noise ratio, or rather speech-to-noise ratio. Our conclusion was that we were able to get a +6 to +10dB better SNR with the Watson antenna.
We also tested a wideband double-discone antenna and a narrow-band J-pole antenna. These were tested at other locations close to the airport, so we do not have a direct comparison of all four antennas. The custom-built J-pole-style antenna, tuned to 135MHz, was connected to 5 meters of RG-58-type coaxial cable. The double discone is a wideband antenna tuned to receive 25-2000MHz; it was connected to an active two-way splitter using 2 meters of CFD240-type coaxial cable (rated up to 5GHz).
Both antennas worked well but we cannot make any deeper comparison. These antennas belong to one of our data feeders who allowed us to use them.
We followed the general recommendation and purchased the suggested “standard” for airband: the RTL-SDR dongle. We also aimed to test a technically better solution, still on a reasonable budget; after a quick survey, we decided to go for the SDRplay RSP1A. Both receivers have an SMA female coaxial connector and USB. The difference lies in the internal circuits (gain controllers, filters, ADC, etc.).
The main advantage of the SDRplay RSP1A over the RTL-SDR is that it samples at up to 14 bits (versus 8 bits for the RTL) and has up to 10.6MHz of recording bandwidth (versus 2.5MHz for the RTL). Note: the 14-bit precision is available up to 6MHz bandwidth, 12-bit up to 8MHz, 10-bit up to 9.2MHz and 8-bit above 9.2MHz. Both of these parameters are critical, because we aim to collect all available frequencies used by a given airport (Tower, Approach, Radar, Ground, Departure, ...) to monitor the whole flight communication. It often happens that the frequencies are spread over a window larger than the 2.5MHz of the RTL dongle; sometimes even the 10MHz bandwidth is not enough, and two receivers are required. Furthermore, the 14-bit depth may help to get a better SNR (signal-to-noise ratio), but it depends on the bandwidth used. Please see this post for a deeper channels-vs-bandwidth analysis and suggestions.
We decided to use Raspberry Pi mini computers to run the SDR software and the processing pipeline. Both Raspberry Pi models are small and powerful enough. We bought an RPi 3B+ (with 1GB of RAM) as the ‘entry solution’ and the most powerful RPi 4 with 8GB of RAM. We used a 256GB microSD card for the system and data storage. To avoid overheating, we also used active cooling (heatsink and fan). There are dozens of combinations of RPi models and cases; you probably want a good passive heatsink to minimize the noise coming from a fan. The RPi 4 in the Argon One case (https://www.amazon.com/Argon-Raspberry-Aluminum-Heatsink-Supports/dp/B07WP8WC3V) was an excellent solution: the case is easy to mount and has sufficient passive cooling for processing 4 channels in parallel, which the RPi 4 handles at about 90% load on 1 core. We noticed that the RPi 3B+ cannot handle the 4 channels coming from the RSP1A. So if you want to receive just 1 or 2 channels, the RPi 3B+ and the RTL may be good enough; otherwise we suggest you go with the RPi 4. One of our next posts will discuss the settings of the processing pipeline on the RPi and what can be tweaked.
We hope you have gained some better insight into the HW needed for receiving, processing and feeding ATC communication. There are also other possibilities, so please do not hesitate to search for them. What we put here is our experience and what worked for us on a reasonable budget.
Check out our previous blog posts:
Let’s take a look at where to mount your antenna and how to check whether your place is good or not. The first step is to find the elevation (or altitude) profile between your place and the airport. We expect that the transmitting antennas are on the airport tower or nearby; it is good to check exactly where the airport's antennas are placed. There should not be hills or other obstacles in between; the best option is direct visibility. Any obstacle will block the signal coming from the Tower: you will hear the pilots well, but not the ground. We share two cases: the LKTB and LKPR airports.
The LKTB is a small “international” airport.
We were lucky that one of our project partners lives in a house with a good enough position, and the elevation profile does not look too bad. The distance is about 14 kilometers, and the hill in between is not very high, although there is no direct visibility. On the other hand, he lives “under” one of the approach routes (November - Bravo). We mounted the Watson antenna on the roof (and also experimented with the Sirio antenna - see this post for details). To make a long story short: it works. We got good results with the Watson antenna and the RSP1A. However, the combination of the Sirio antenna and the RTL-SDR was not so successful; the amount of good-quality speech was significantly lower compared to the Watson + RSP1A. So if you are in similar conditions, consider a more expensive (and better) receiver and antenna.
LKPR is the largest airport in the Czech Republic. Here we were luckier: our data feeder lives in an ideal position. The signal there is strong, so all combinations of the devices worked well. The only problem is that part of one runway is below the horizon, so we receive low-quality signals from airplanes in that position.
Notes: If you install the antenna on your roof, be sure to place it as high as possible above a metal roof. Also make sure it is well grounded; the last thing you want is a lightning bolt hitting your antenna. If there is a thunderstorm near you, you should unplug the coaxial cable from your SDR receiver (and ideally put the free end outside). A close lightning strike can induce a high voltage, and the free cable connector may be dangerous. Please search the internet for a proper solution; here is a nice solution for ADS-B antennas.
In the last blog post, we introduced a way to improve the word error rate (WER) on callsigns in the automatic speech recognition (ASR) output by incorporating surveillance information in the transcription process. In this blog post, we want to talk about extracting the callsigns from the ASR output. The process of callsign recognition can be broken down into two stages:
1) Tagging the callsign in the sequence
2) Mapping of the callsign word sequence into its ICAO format (ICAO stands for International Civil Aviation Organization)
Figure 1 illustrates the two-stage process. In the tagging step, the input transcript originating from our ASR system is tagged in the IOB format (short for inside, outside, beginning) to find the tokens that are part of a callsign. In the second step, the part of the ASR transcript tagged as a callsign (labeled B/I-CALL) is mapped to the standard ICAO format for callsigns, which consists of a three-character airline identifier followed by a flight ID made of several digits, optionally followed by one or two characters. (If you are interested, a list of airline identifiers can be found here: https://en.wikipedia.org/wiki/List_of_airline_codes)
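In the project, this mapping is learned by a neural network (see below); purely to illustrate what the mapping does, here is a rule-based toy with made-up lookup tables:

# Toy illustration of step 2. The real system learns this mapping with a
# neural network; these tiny lookup tables are illustrative only.
AIRLINES = {"lufthansa": "DLH", "c_s_a": "CSA", "ryanair": "RYR"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
NATO = {"alfa": "A", "bravo": "B", "charlie": "C", "delta": "D", "echo": "E"}

def to_icao(callsign_words):
    out = []
    for w in callsign_words:
        for table in (AIRLINES, DIGITS, NATO):
            if w in table:
                out.append(table[w])
                break
    return "".join(out)

print(to_icao("lufthansa three two alfa bravo".split()))   # DLH32AB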
Since we have two processes, the obvious idea is to train two different networks for the task: one that specializes in tagging and one that takes care of mapping the sequence tagged as a callsign into the ICAO format. In this case, both processes can be tuned individually. The drawback of this architecture is that information lost in the first step cannot be recovered in the second step. The other possibility is to train an end-to-end network that directly outputs the ICAO callsign given the ASR transcript as input. This architecture has the benefit that there is no information loss in between. Both architectures are visualized in Figure 2. Our experiments showed that the end-to-end approach performs better than the two-network solution in the majority of test cases.
A closer look at Figure 1 reveals that the predicted ICAO callsign contains information that is missing from the labels and the transcript, namely the last two digits of the flight ID. This information comes from the surveillance data. Callsigns from planes near the location where the ATC communication is recorded are time-matched with the recordings and fed as additional input into the network, as seen in Figure 3. If the transcript contains only partial information about a callsign, the missing information can be recovered from the surveillance input. The end-to-end network shows a callsign accuracy over 90% on clean transcripts if surveillance information is available. On our ASR output with a WER of 28.7%, an accuracy over 80% is reached. The network also shows increased resistance to higher ASR WERs. The accuracy scores for two different datasets can be found in our Interspeech paper submission: “Boosting of contextual information in ASR for air-traffic call-sign recognition”.
The ATCO2 project is proud to have in its consortium one of the best research centers in the field of automatic speech recognition:
According to the latest ranking of the AI 2000 Most Influential Scholars, the Faculty of Information Technology (FIT) of BUT is among the world leaders in this field: it ranks among the five most important institutions worldwide, next to Google, Facebook, IBM and Carnegie Mellon University. FIT researchers Lukáš Burget, Jan Černocký and Pavel Matějka are also on the list of the TOP 100 most influential researchers in the world, together with Tomáš Mikolov, a FIT graduate. Brno University of Technology is the only institution from the Czech Republic in this ranking. AMiner indexes authors, publications and data from the field of computer science. The Faculty of Information Technology of BUT and the research group BUT Speech@FIT have long been among the leaders in the field of speech data mining.
This year, Brno will host InterSpeech 2021, the world's largest conference in this field.
Fortunately, we have additional information that can help us with recognizing callsigns. The radar, which every air traffic tower has, tells us which planes are in the vicinity, and since an ATCo can only be talking to one of these planes, we know that if a callsign was said, it must be one of those on the radar. We developed two methods that use this information to improve callsign recognition.
The first method modifies the speech recognition system directly to boost the probability of recognizing the callsigns known from the radar. Thanks to advances in efficient transducer composition [1], these modifications can be made in a way that allows continuous updating of the model as new information from the radar comes in. This means the model is tuned in real time and can adapt to changes in the real world. We published this technique in a paper at ICASSP 2021 [2].
The second method post-processes the output of the speech recognition system to boost the probability of recognizing specific callsigns. This is simpler, as it just involves rescoring the system output. Rescoring means giving certain outputs more weight, thereby increasing the chance that the higher-weighted terms appear in the model predictions. The rescoring was implemented as WFST composition [3].
Both methods worked well, increasing callsign accuracy by up to 30%. We believe there is still room for improvement and plan to work further on this topic.
[1] Filters for Efficient Composition of Weighted Finite-State Transducers
[2] A comparison of methods for oov-word recognition on a new public dataset
Biometrics refers to technologies that measure and analyse a person's physical characteristics, making it possible to identify a person through their biometric features; these technologies can also be used for authentication purposes.
From a data protection perspective, biometric technologies in general are closely linked to specific physical, physiological, behavioural or even psychological characteristics of a person, and some of them might also reveal sensitive data.
As to the voice, biometrics may concern the analysis of the tone, pitch, cadence and frequency of a person’s voice, which can make it possible to determine if a certain person is who he/she declares to be, or the identity of an unknown person, if matched with data from other databases.
Biometric data may also allow for automated tracking, tracing or profiling of persons and, as such, their potential impact on the privacy and the right to data protection of individuals is high, as also observed by the EU data protection authorities.
Moreover, biometric data are irrevocable: a breach concerning biometric data threatens the further safe use of biometrics as an identifier and the right to data protection of the persons concerned, with no possibility of mitigating the effects of the breach.
One can change one's passwords if forgotten or compromised, or one's house keys if lost, but not one's voice.
Voice biometric authentication systems are based on measurements of the biological characteristics of an individual and comparison with data of individuals previously verified and recorded in a database through a mechanism called enrollment.
Every spoken word (of a predefined speech sample) is converted, by a chain of mathematical operations, into a person's voice print (also called an ‘i-vector’ in the R&D community), which is stored in the database. The database can then be interrogated to determine whether a speaker is the person he/she claims to be, by comparing the stored voice print with the speaker's, or even to determine which speaker in a group of known speakers most closely matches an unknown speaker (in which case it is more appropriate to speak of identification systems rather than authentication systems).
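A minimal sketch of the verification step (the threshold and the random vectors are illustrative; real systems calibrate the decision threshold on held-out trials):

import numpy as np

def verify(enrolled_print, test_print, threshold=0.7):
    # Accept the claimed identity if the cosine similarity between the
    # enrolled voice print and the new voice print exceeds the threshold.
    cos = np.dot(enrolled_print, test_print) / (
        np.linalg.norm(enrolled_print) * np.linalg.norm(test_print))
    return cos >= threshold

rng = np.random.default_rng(1)
enrolled = rng.normal(size=400)                      # stored at enrollment
sample = enrolled + rng.normal(scale=0.3, size=400)  # new sample, same speaker
print(verify(enrolled, sample))   # True for this toy example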
According to the General Data Protection Regulation (Article 9), biometric data may be regarded as a ‘special category’ of data (commonly called sensitive data).
However, for it to be considered processing of special categories of personal data (Article 9), the biometric data must be processed “for the purpose of uniquely identifying a natural person”.
In short, in the light of Articles 4(14) and 9, three criteria must be considered: the nature of the data (relating to the physical, physiological or behavioural characteristics of a person), the means and way of the processing (a specific technical processing), and the purpose of the processing (allowing or confirming the unique identification of a natural person).
Sensitive data may only be processed if specific conditions are met: for example, if the data subject has given explicit consent, or if the processing is necessary for reasons of substantial public interest on the basis of EU or Member State law (Article 9(2)).
Being an EU Regulation, the GDPR is directly applicable in all EU Member States, but we should remember that in some cases it leaves States free to adopt specific rules, as in the case of the special categories of data.
Member States may actually maintain or introduce further conditions, including limitations, with regard to the processing of genetic, biometric or health data.
Attention should thus be paid to State-specific rules and regulations.
(Romagna Tech, Claudia Cevenini)
References
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)
Obviously, one needs the voice recordings in order to do the conversion, and that will be the focus of this blog post: we'll take a look at how to set up a VHF (very high frequency) receiver. Luckily, it is really simple, and a person interested in eavesdropping on the pilot-controller dialogue does not have to be an expert in radio equipment.
So, what is needed to get started?
- a Raspberry Pi single-board computer (with an SD card and a power supply)
- an SDR receiver dongle
- an antenna
These things will make set-up easier:
The first step to get started is to set up the Raspberry PI, a versatile single-board computer that can be used for developing pretty much anything.
More about the Raspberry Pi project can be found here: https://www.raspberrypi.org/
As the Raspberry Pi project has produced really easy to follow instructions on their homepage, we are not going to give exhaustive instructions here. Instead we’ll guide people to here: https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up
Alright, hopefully by now you have a Raspberry Pi ready to be used. Next, software to enable radio signal reception needs to be installed. The following is based on the instructions given here: https://atco.opensky-network.org/
The software is based on RTLSDR-Airband. It’s beautifully crafted open source software that can be found here: https://github.com/szpajder/RTLSDR-Airband/
It pretty much has three main parts in it:
(I live in Tallinn, Estonia, so the following examples are based on that. And although you can have more than one receiver dongle attached to your Raspberry Pi, the following examples assume only one.)
a. Hit the “Add New” button and confirm your choices
(Note the comments explaining some of the choices you need to make. You can have more than one receiver dongle per Raspberry Pi.)
b. Specify parameters related to the receiver location
(It will be used for proposing frequencies you could listen to by searching airports that are close to your location. Pick one method to localize yourself.)
c. Choose an airport whose communication you would like to listen to
(Normally you would not want to pick one that is further away than 10km or so, but in some cases it might still be of interest; for example, when you're directly underneath the descent route and would like to eavesdrop on the pilots' talk. In that case you'll probably have bad reception of the controller's voice.)
d. Choose the frequencies you would like to record.
(You can choose to listen to more than one frequency, BUT if the difference between the frequencies is greater than the bandwidth you specified in step “a”, an error message will be given. It looks something like this: “Bandwidth of device 0 - My New VHF Receiver exhausted! Used bandwidth 7.300000000000011 - available 2.4”. If you see something like that, just reselect the frequencies you want to follow and consider using multiple SDR dongles.)
e. Place the config file to the right place
After you download the configuration file, place it in the right folder and restart the device; the receiver will then start to record the communication taking place on the chosen frequencies. By default, the audio files are created in "/home/pi/output_airband".
You can make whatever modifications you like to the configuration file. The instructions can be found here: https://github.com/szpajder/RTLSDR-Airband/wiki
f. Hear the skies with your new VHF receiver
By now, you have probably set up both the hardware and the software, so it is time to start hearing what's in the sky. First, open a terminal and type “rtl_airband -ef”. After some time, you'll see in the terminal the frequencies that your receiver is “hearing”; when you see “*”, it means that the recording system has been activated and an output file is being created.
After some time, you can check the output folder (~/output_airband/) to see the segmented files in cs16 format, each accompanied by a cs16.info file which shows some key information.
And that’s it. I hope you enjoy following what’s going on in the skies above you. And don’t forget to contact any of the project members should you have any comments.