A Review on Machine Learning for Audio Applications

2021 ◽  
Vol 23 (07) ◽  
pp. 62-70
Author(s):  
Nagesh B ◽  
Dr. M. Uttara Kumari

Audio processing is an important branch of the signal processing domain. It deals with the manipulation of audio signals to perform tasks such as filtering, data compression, speech processing, and noise suppression, which improve the quality of the audio signal. For applications such as natural language processing, speech generation, and automatic speech recognition, conventional algorithms are not sufficient; machine learning or deep learning algorithms are needed so that audio signal processing can be achieved with good results and accuracy. This paper reviews the various algorithms used by researchers in the past and indicates the appropriate algorithm for each application.

2021 ◽  
Vol 48 (4) ◽  
pp. 41-44
Author(s):  
Dena Markudova ◽  
Martino Trevisan ◽  
Paolo Garza ◽  
Michela Meo ◽  
Maurizio M. Munafo ◽  
...  

With the spread of broadband Internet, Real-Time Communication (RTC) platforms have become increasingly popular and have transformed the way people communicate. It is therefore fundamental that the network adopt traffic management policies that ensure appropriate Quality of Experience for users of RTC applications. A key step is the identification of the applications behind RTC traffic, which in turn allows the network to allocate adequate resources and make decisions based on each application's specific requirements. In this paper, we introduce a machine learning-based system for identifying the traffic of RTC applications. It builds on the domains contacted before starting a call and leverages techniques from Natural Language Processing (NLP) to build meaningful features. Our system works in real time and is robust to the peculiarities of different applications' RTP implementations, since it uses only control traffic. Experimental results show that our approach classifies 5 well-known meeting applications with an F1 score of 0.89.
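As an illustrative sketch (not the authors' actual pipeline), the idea of identifying an RTC application from the domains it contacts before a call can be approximated with a simple bag-of-words model over domain-name tokens. All application names and domains below are hypothetical example data, and the cosine-similarity classifier is a stand-in for the paper's NLP-based features.

```python
import math
from collections import Counter

def featurize(domains):
    """Bag-of-words over dot-separated domain-name tokens."""
    tokens = []
    for d in domains:
        tokens.extend(d.lower().split('.'))
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def classify(domains, profiles):
    """Return the application whose token profile best matches the observed domains."""
    feat = featurize(domains)
    return max(profiles, key=lambda app: cosine(feat, profiles[app]))

# Hypothetical per-application profiles, built from training traffic
profiles = {
    "AppA": featurize(["meet.appa.example", "push.appa.example"]),
    "AppB": featurize(["call.appb.example", "config.call.appb.example"]),
}
```

For example, `classify(["call.appb.example"], profiles)` picks `"AppB"` because its profile shares the most domain tokens with the observed traffic.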


Proceedings ◽  
2021 ◽  
Vol 77 (1) ◽  
pp. 17
Author(s):  
Andrea Giussani

In the last decade, advances in statistical modeling and computer science have boosted the production of machine-generated content in different fields: from language to image generation, the quality of the generated outputs is remarkably high, sometimes better than that produced by a human being. Modern technological advances such as OpenAI’s GPT-2 (and recently GPT-3) permit automated systems to dramatically alter reality with synthetic outputs, so that humans are not able to distinguish real content from its synthetic counterpart. One example is an article written entirely by GPT-2, but many others exist. In the field of computer vision, Nvidia’s Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for the production of huge numbers of fake human face portraits; additionally, recent algorithms have been developed to create both musical scores and mathematical formulas. This presentation aims to acquaint participants with the state-of-the-art results in this field: we will cover both GANs and language modeling with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBERTa (Liu et al. 2019), to the detection of human-produced versus machine-produced text in the context of fake news detection. RoBERTa is a recent algorithm based on the well-known Bidirectional Encoder Representations from Transformers algorithm, known as BERT (Devlin et al. 2018); this is a bidirectional transformer for natural language processing developed by Google and pre-trained over a huge amount of unlabeled textual data to learn embeddings. We then use these representations as the input to our classifier to detect human- versus machine-produced text. The application is demonstrated in the presentation.


Author(s):  
Adarsh V Srinivasan ◽  
Mr. N. Saritakumar

In this paper, either a pre-recorded or a newly recorded audio signal is processed and analysed using the LabVIEW software by National Instruments. Properties of the audio such as bit rate, number of channels, frequency, and sampling rate are analysed, and the signal is improved through operations such as amplification, de-amplification, inversion, and interlacing of audio signals. LabVIEW provides Sub Virtual Instruments (subVIs) for reading and writing audio in the .wav format; using these together with array subVIs, all of the processing is carried out. KEYWORDS: Virtual Instrumentation (VI), LabVIEW (LV), Audio, Processing, audio array.
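Outside LabVIEW, the array operations named above can be sketched in a few lines of plain Python; the function names and sample values here are our own illustration, not the paper's subVIs.

```python
def amplify(samples, gain):
    """Scale every sample by a gain factor (a gain below 1 de-amplifies)."""
    return [s * gain for s in samples]

def invert(samples):
    """Flip the polarity of the signal."""
    return [-s for s in samples]

def interlace(left, right):
    """Merge two channels sample-by-sample into one interleaved array."""
    out = []
    for l, r in zip(left, right):
        out.extend((l, r))
    return out
```

For instance, `interlace([1, 2], [3, 4])` yields `[1, 3, 2, 4]`, the interleaved frame order used by stereo .wav data.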


Author(s):  
Kazuhiro Kondo

This chapter proposes two data-hiding algorithms for stereo audio signals. The first algorithm embeds data into a stereo audio signal by adding data-dependent mutual delays to the host stereo audio signal. The second algorithm adds fixed-delay echoes whose polarities are data dependent and whose amplitudes are adjusted so that the interchannel correlation matches that of the original signal. The robustness and quality of the data-embedded audio are evaluated and compared for both algorithms. Both algorithms were shown to be fairly robust against common distortions, such as added noise, audio coding, and sample rate conversion. The embedded audio quality was shown to be “fair” to “good” for the first algorithm and “good” to “excellent” for the second algorithm, depending on the input source.
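A minimal sketch of the echo-polarity idea behind the second algorithm, under our own simplified assumptions: one bit per fixed-length frame of a single channel, embedded as the sign of one added echo. The frame length, delay, and echo amplitude below are illustrative values, and the chapter's amplitude adjustment that matches the interchannel correlation is omitted.

```python
def embed_echo(samples, bits, delay=8, alpha=0.3, frame=64):
    """Hide one bit per frame as the polarity of a delayed echo."""
    out = list(samples)
    for i, bit in enumerate(bits):
        start = i * frame
        sign = 1 if bit else -1
        for n in range(start, min(start + frame, len(samples))):
            if n - delay >= start:  # keep each echo inside its own frame
                out[n] += sign * alpha * samples[n - delay]
    return out

def detect_echo(watermarked, nbits, delay=8, frame=64):
    """Blind detection: sign of the autocorrelation at the echo lag."""
    bits = []
    for i in range(nbits):
        start = i * frame
        c = sum(watermarked[n] * watermarked[n - delay]
                for n in range(start + delay, min(start + frame, len(watermarked))))
        bits.append(1 if c > 0 else 0)
    return bits
```

Note that this blind detector is reliable only when the host signal is noise-like, i.e., its own autocorrelation at the echo lag is near zero.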


Author(s):  
Paulo A.A. Esquef ◽  
Luiz W.P. Biscainho

This chapter reviews audio signal processing techniques related to sound generation via additive synthesis. Particular focus will be put on sinusoidal modeling. Each processing stage involved in obtaining a sinusoidal representation for audio signals is described. Then, synthesis techniques that allow reconstructing an audio signal based on a given parametric representation are presented. Finally, some audio applications where sinusoidal modeling is employed are briefly discussed.
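As a toy illustration of the synthesis stage (our own minimal sketch, not one of the chapter's techniques), an additive synthesizer reconstructs a signal as a sum of sinusoids from a list of (frequency, amplitude, phase) parameters:

```python
import math

def additive_synth(partials, num_samples, sample_rate):
    """Sum of sinusoids; partials is a list of (freq_hz, amplitude, phase) triples."""
    return [sum(amp * math.sin(2 * math.pi * freq * n / sample_rate + phase)
                for freq, amp, phase in partials)
            for n in range(num_samples)]
```

In a full sinusoidal-modeling pipeline, the analysis stage would estimate these parameters frame by frame (with interpolation between frames); this sketch only covers the final reconstruction step.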


Author(s):  
Hector Perez-Meana ◽  
Mariko Nakano-Miyatake

With the development of VLSI technology, the performance of signal processing devices has greatly improved, making possible the implementation of more efficient systems for the storage, transmission, enhancement, and reproduction of speech and audio signals. Some of these successful applications are shown in Table 1.


2020 ◽  
Vol 8 ◽  
Author(s):  
Majed Al-Jefri ◽  
Roger Evans ◽  
Joon Lee ◽  
Pietro Ghezzi

Objective: Many online and printed media publish health news of questionable trustworthiness, and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning.
Materials and Methods: We used a database from the website HealthNewsReview.org, which aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the pre-trained natural language model BERT.
Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity of the evaluation process.
Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret text senses and draw on external knowledge in their assessment.
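The n-gram features mentioned above can be sketched in a few lines; the keyword list for the "mention of costs" criterion below is our own illustrative stand-in, not the paper's trained indicator.

```python
def ngrams(text, n):
    """All contiguous word n-grams of a text, lowercased."""
    tokens = text.lower().split()
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(text):
    """Unigram + bigram feature set, a simple stand-in for the paper's features."""
    return set(ngrams(text, 1)) | set(ngrams(text, 2))

# Hypothetical cue words; a real system would learn such indicators from data
COST_CUES = {"cost", "costs", "price", "expensive", "afford"}

def mentions_costs(text):
    """Toy rule-based indicator for the 'mention of costs' criterion."""
    return bool(COST_CUES & set(ngrams(text, 1)))
```

In the paper's setting, such feature sets feed a supervised classifier per criterion rather than a hand-written rule like `mentions_costs`.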


Author(s):  
Ninon Burgos ◽  
Simona Bottani ◽  
Johann Faouzi ◽  
Elina Thibeau-Sutre ◽  
Olivier Colliot

Abstract In order to reach precision medicine and improve patients’ quality of life, machine learning is increasingly used in medicine. Brain disorders are often complex and heterogeneous, and several modalities such as demographic, clinical, imaging, genetics and environmental data have been studied to improve their understanding. Deep learning, a subpart of machine learning, provides complex algorithms that can learn from such various data. It has become state of the art in numerous fields, including computer vision and natural language processing, and is also growingly applied in medicine. In this article, we review the use of deep learning for brain disorders. More specifically, we identify the main applications, the concerned disorders and the types of architectures and data used. Finally, we provide guidelines to bridge the gap between research studies and clinical routine.


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
S. E. Tsai ◽  
S. M. Yang

Methods based on the discrete cosine transform (DCT) have been proposed for digital watermarking of audio signals; however, the watermark is often vulnerable to data compression and signal processing. This paper presents an effective audio watermarking method based on energy averaging of DCT coefficients, such that an audio signal with a watermark is robust to data processing. The method divides an audio signal into segments according to three parameters defining the segment length, the segment sequence of the watermark location, and the frequency range of the DCT coefficients for the watermark location. An error-correcting code is also integrated to improve audio signal quality after watermarking. Experimental results show that the method is robust to data compression and many other kinds of signal processing. No original signal is required for decoding the watermark. Comparison with a recent work validates that the method achieves better audio quality and higher robustness.
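A simplified sketch of the general idea (not the paper's exact scheme): take the DCT of a segment, force the mean magnitude of a chosen coefficient band onto an even or odd multiple of a quantization step to encode one bit, and invert the DCT. The band, step, and quantization rule below are our own illustrative choices; the paper's three-parameter segmentation and error-correcting code are omitted.

```python
import math

def dct(x):
    """Naive unnormalized DCT-II."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def idct(X):
    """Matching inverse (DCT-III with 2/N scaling)."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

def embed_bit(segment, bit, band=(4, 12), step=2.0):
    """Scale the band so its mean |coefficient| is an even (bit 0) or odd (bit 1) multiple of step."""
    X = dct(segment)
    lo, hi = band
    m = sum(abs(X[k]) for k in range(lo, hi)) / (hi - lo)
    q = round(m / step)
    if q % 2 != bit:
        q += 1
    # degenerate case: a band with no energy cannot carry a bit
    scale = (q * step) / m if m > 1e-12 else 0.0
    for k in range(lo, hi):
        X[k] *= scale
    return idct(X)

def detect_bit(segment, band=(4, 12), step=2.0):
    """Blind detection: parity of the quantized mean band magnitude."""
    X = dct(segment)
    lo, hi = band
    m = sum(abs(X[k]) for k in range(lo, hi)) / (hi - lo)
    return round(m / step) % 2
```

Because uniform scaling preserves each coefficient's sign, the embedded mean lands exactly on the chosen quantization level, so blind detection recovers the bit without the original signal, mirroring the decoder property claimed above.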


Author(s):  
Richard Aló ◽  
Vladik Kreinovich

The main objective of the annual International Conference on Intelligent Technologies (InTech) is to bring together researchers and practitioners who implement intelligent and fuzzy technologies in real-world environments. The Fifth International Conference on Intelligent Technologies (InTech'04) was held in Houston, Texas, on December 2-4, 2004. Topics of InTech'04 included mathematical foundations of intelligent technologies, traditional Artificial Intelligence techniques, uncertainty processing and methods of soft computing, learning/adaptive systems/data mining, and applications of intelligent technologies. This special issue contains versions of 15 selected papers originally presented at InTech'04. These papers cover most of the topics of the conference. Several papers describe new applications of existing intelligent techniques. R. Aló et al. show how traditional statistical hypothesis-testing techniques – originally designed for processing measurement results – need to be modified when applied to simulated data, e.g., when we compare the quality of two algorithms. Y. Frayman et al. use mathematical morphology and genetic algorithms in the design of a machine vision system for detecting surface defects in aluminum die casting. Y. Murai et al. propose a new, faster entropy-based placement algorithm for VLSI circuit design and similar applications. A. P. Salvatore et al. show how expert-system-type techniques can help in scheduling botox treatment for voice disorders. H. Tsuji et al. propose a new method, based on partial differential equations, for automatically identifying and extracting objects from a video. N. Ward uses Ordered Weighted Average (OWA) techniques to design a model that predicts admission of computer science students into different graduate schools. An important aspect of intelligence is the ability to learn. In A. Mahaweerawat et al., neural-based machine learning is used to identify and predict software faults. J. Han et al. show that we can drastically improve the quality of machine learning if, in addition to discovering traditional (positive) rules, we also search for negative rules. A serious problem with many neural-based machine learning algorithms is that often the results of their learning are unintelligible rules and numbers. M. I. Khan et al. show, on the example of robotic arm applications, that if we allow neurons with different input-output dependencies – including linear neurons – then we can extract meaningful knowledge from the resulting network. Several papers analyze the Equivalent Transformation (ET) model, which allows the user to automatically generate code from specifications. A general description of this model is given by K. Akama et al. P. Chippimolchai et al. describe how, within this model, we can transform a user's query into an equivalent, more efficient one. H. Koike et al. apply this approach to natural language processing. Y. Shigeta et al. show how existing constraint techniques can be translated into equivalent transformation rules and thus combined with other specifications. I. Takarajima et al. extend the ET approach to situations like parallel computations, where the order in which different computations are performed on different processors depends on other processes and is thus non-deterministic. Finally, a paper by J. Chandra – based on his invited talk at InTech'04 – describes a general framework for robust and resilient critical infrastructure systems, with potential applications to transportation systems, power grids, communication networks, water resources, health delivery systems, and financial networks.
We want to thank all the authors for their outstanding work, the participants of InTech'04 for their helpful suggestions, the anonymous reviewers for their thorough analysis and constructive help, and – last but not least – Professor Kaoru Hirota for his kind suggestion to host this issue, as well as the entire staff of the journal for their tireless work.

