Testing Speech AI

Written by Richard Seidl | Jul 10, 2023 10:00:00 PM

Olaf Thiele is an expert in audio AI. The field poses many challenges, especially in German-speaking countries: there are many different dialects, and the training data needed to cover them is often missing. He takes us into the world of voice AI, explains how these models can be trained and tested, and describes which tasks he actually gives ChatGPT.

“Unfortunately, we have the problem with audio that the data volumes are much larger. That’s why there aren’t yet any cool models like OpenAI ChatGPT or something like that” - Olaf Thiele

Olaf digitizes spoken German. He uses artificial intelligence to train models for the transcription and synthesis of German audio material. As these models see more and more use, their testing and quality assurance is becoming increasingly important.

Highlights of this episode:

  • Olaf has been working with artificial intelligence for many years, especially in the field of audio AIs
  • Testing audio AIs presents a particular challenge, as it involves a combination of two demanding areas
  • Olaf and his team have specialized in German, as far less data is available in this language
  • The development of AI models has changed significantly thanks to technologies such as deep learning
  • It is difficult to test AI models as they work like a black box and can deliver different results on different computer architectures
  • Tools like ChatGPT can help generate textual test data to better test the models
  • There is still a lot of room for improvement and standardization in the field of testing AI models
  • Olaf sees Hugging Face as a potential player for the further development and standardization of AI model testing

Testing audio AI - why it’s particularly difficult

This podcast episode is about audio AI testing, the challenges and opportunities, and how we can make progress with the latest technologies.

Challenges in testing audio AI

Olaf emphasized that testing audio AI is not only technically challenging, but also requires an enormous amount of creative thinking. He explained the difficulties that arise from the limited availability of German data and the many dialects: the challenge is to collect enough diverse data to train a model effectively. He also described reproducibility problems, since the same model can deliver different results on different hardware configurations, and the difficulty of ensuring that a model generalizes.
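One way to cope with results that differ slightly between hardware configurations is to test a transcription against a tolerance instead of demanding an exact match. The following is only a minimal sketch of that idea, not the approach Olaf's team uses. It assumes word error rate (WER) as the metric, which is a common choice for speech recognition but was not named in the episode; the function names and the 10% threshold are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def transcripts_agree(reference: str, hypothesis: str, max_wer: float = 0.1) -> bool:
    """Pass if the hypothesis stays within the (assumed) WER tolerance."""
    return word_error_rate(reference, hypothesis) <= max_wer


if __name__ == "__main__":
    expected = "das ist ein Test"
    # Hypothetical outputs of the same model on two different machines.
    print(word_error_rate(expected, "das ist ein Test"))
    print(word_error_rate(expected, "das ist kein Test"))
```

A test suite built this way would fail only when the transcription quality drops below the chosen threshold, not whenever two machines disagree on a single word. The right threshold depends on the model and the use case.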

The use of ChatGPT to generate test data

Olaf talked about ChatGPT as a tool for generating test data. This could considerably expand the range of possible test scenarios: with such tools, testers can have a wide variety of textual test data produced without writing it by hand. This could be particularly useful for speech models, where variation in pronunciation or dialect is difficult to simulate manually.

The future of testing in the AI era

Olaf expressed hope for progressive developments in tools and methodologies, particularly through platforms such as Hugging Face. These could introduce standardized procedures for training and testing AI models. Such a development would not only simplify testing, but also lead to more meaningful comparisons between different models.

The ethical dimension of the use of AI

In addition to the technical aspects, we also talked about the ethical considerations of using AI technologies. Olaf emphasized the need to consider carefully what models should and should not learn. We also discussed the European AI Act, which provides for potential regulations on the use of AI systems.

A world full of possibilities

The conversation ended with an optimistic outlook on the opportunities opened up by advanced AI technologies. Despite the many challenges, both Olaf and I are convinced that through innovative approaches and continuous research, the potential of audio AI can be fully realized.