Paper: MMSP-P3.11
Session: Multimedia Database, Content Retrieval, Joint Processing and Standards
Time: Wednesday, May 17, 16:30 - 18:30
Presentation: Poster
Topic: Multimedia Signal Processing: Joint audio, image, video, graphic signal processing
Title: ARBITRARY LISTENING-POINT GENERATION USING SUB-BAND REPRESENTATION OF SOUND WAVE RAY-SPACE
Authors: Mehrdad Panahpour Tehrani, Yasushi Hirano, Toshiaki Fujii, Shoji Kajita, Kazuya Takeda, Kenji Mase, Nagoya University, Japan
Abstract:
Sound can be recorded, processed, and replayed efficiently through directed speakers using well-known sound processing methods. Several approaches have attempted to generate sound for an arbitrary listening point; however, there are few effective models, such as the Head-Related Transfer Function (HRTF) and 3D representations of sound sources, that permit efficient processing. Meanwhile, images rendered by computer graphics algorithms have become more attractive and more efficient, and dedicated image synthesis hardware has emerged. A free-viewpoint system must maintain a correct correspondence between sound and images at any arbitrary viewpoint. Therefore, a method for representing sound sources in 3D space using computer graphics and image processing techniques is necessary.

This research addresses the problem of free listening-point sound generation without sound source localization. It proposes a theory based on the ray-space representation of light rays, which is independent of object specifications, combined with sub-band signal processing of sound waves. An array of beamformed, dynamic microphone arrays (MAs) is set up, and each MA generates a multi-frequency-layer sound image (SImage) by scanning the viewing range of a camera at the same location. Each layer is captured, for a given frequency range, by those active microphones in the MA whose spacing is correct for that frequency range. Each layer of an SImage has the same size as an image and contains blocks of sound waves whose duration equals one image frame. The multi-frequency-layered SImages captured by the array of MAs form a sound wave ray-space for each layer. To obtain a dense SImage ray-space, we propose geometry compensation, per frequency layer, using the corresponding images at each MA location, the SImages themselves, or a combination of both. Given a dense sound ray-space, a virtual SImage corresponding to any arbitrary listening point can be generated. The sound of each layer of an SImage is generated by averaging the sound waves over each pixel or group of pixels. The listening-point sound is then obtained by combining the layer sounds in the frequency domain and applying an inverse transformation from the frequency domain to the time domain.

Experimental results for arbitrary listening-point generation are shown below for one layer of an SImage. The experimental setup has 3 MAs on a line; each array has 3 microphones spaced 183 mm apart, and the spacing between adjacent MAs is also 183 mm. The sound source is 2.3 m from the center of the MA line. Three SImages (6x1) are captured; the middle SImage is reconstructed and compared with the original, yielding an SNR of 22.86 dB. The sound generated from the virtual SImage achieves an SNR of 10.36 dB compared with the original sound at that location. The proposed theory can solve the problem of 3D media integration, and the proposed methods are currently being implemented in a practical system. In future research, we will develop a sampling theory for capturing SImages and pursue efficient integration of 3D audio/video.
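The abstract does not spell out the beamformer used when an MA "scans the viewing range of a camera"; a delay-and-sum beamformer is the simplest candidate and is sketched below for a linear array. The steering convention, integer-sample delays, and the 343 m/s speed of sound are all assumptions on our part, not details from the paper.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def delay_and_sum(mics: np.ndarray, mic_x: np.ndarray,
                  theta: float, fs: float) -> np.ndarray:
    """Steer a linear array toward angle theta (radians from broadside)
    by delaying each channel and summing.

    mics  : (M, T) array of microphone recordings
    mic_x : (M,) microphone positions along the array line (m)
    """
    out = np.zeros(mics.shape[1])
    for m in range(mics.shape[0]):
        # Integer-sample approximation of the steering delay; np.roll
        # wraps around, which is acceptable for a toy steady-state block.
        delay = int(round(mic_x[m] * np.sin(theta) / C * fs))
        out += np.roll(mics[m], -delay)
    return out / mics.shape[0]
```

Scanning the viewing range would repeat this with one steering angle per pixel, producing the per-pixel sound blocks of an SImage layer.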
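The "correct interval among the active microphones" for a frequency layer presumably refers to the standard spatial anti-aliasing condition for arrays, d <= c / (2 f_max); both this condition and the band edges below are assumptions for illustration. A minimal sketch:

```python
C = 343.0  # assumed speed of sound (m/s)

def max_mic_spacing(f_max_hz: float) -> float:
    """Largest active-microphone spacing (m) that avoids spatial
    aliasing up to f_max_hz, under the assumed condition d <= c/(2*f)."""
    return C / (2.0 * f_max_hz)

# Hypothetical frequency-layer upper edges (Hz)
for f_max in (500.0, 1000.0, 2000.0, 4000.0):
    print(f"layer up to {f_max:6.0f} Hz -> spacing <= "
          f"{1000 * max_mic_spacing(f_max):6.1f} mm")
```

Under this assumption, the 183 mm spacing used in the experiment would correspond to an upper band edge of roughly 940 Hz.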
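The final step, combining the layer sounds in the frequency domain and inverse-transforming to the time domain, can be illustrated with a crude FFT bin-masking filter bank applied to one image-frame of sound. The band edges and frame length are invented for the example; the paper's actual sub-band analysis is not specified in the abstract.

```python
import numpy as np

def split_into_layers(block, fs, inner_edges):
    """Split a one-frame sound block into frequency-layer spectra by
    masking FFT bins per band (illustrative, not the paper's filter bank)."""
    spec = np.fft.rfft(block)
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    edges = [0.0, *inner_edges, fs / 2 + 1.0]  # last edge past Nyquist
    return [spec * ((freqs >= lo) & (freqs < hi))
            for lo, hi in zip(edges[:-1], edges[1:])]

def combine_layers(layer_specs, n):
    """Sum the layer spectra and inverse-transform to the time domain."""
    return np.fft.irfft(sum(layer_specs), n=n)

fs = 16000.0
block = np.random.randn(1024)            # stand-in for one image-frame of sound
layers = split_into_layers(block, fs, [500.0, 1000.0, 2000.0])
recon = combine_layers(layers, len(block))
print("max reconstruction error:", np.max(np.abs(recon - block)))  # ~1e-15
```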
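Once the ray-space is dense, generating a virtual SImage amounts to sampling it at the virtual position. The toy sketch below simply interpolates one frequency layer linearly between the two nearest captured SImages, then averages over pixels as the abstract describes; the authors' geometry compensation is more sophisticated, and the array shapes and positions mirror the 3-MA, 183 mm experimental setup only for concreteness.

```python
import numpy as np

def virtual_simage(simages: np.ndarray, positions: np.ndarray,
                   x: float) -> np.ndarray:
    """Linearly interpolate a virtual SImage layer at position x from the
    two nearest captured SImages.

    simages   : (N, H, W, T) per-pixel sound blocks for one frequency layer
    positions : (N,) MA positions along the capture line (m), sorted
    """
    i = int(np.clip(np.searchsorted(positions, x), 1, len(positions) - 1))
    x0, x1 = positions[i - 1], positions[i]
    w = (x - x0) / (x1 - x0)
    return (1.0 - w) * simages[i - 1] + w * simages[i]

positions = np.array([0.0, 0.183, 0.366])   # 3 MAs, 183 mm apart
simages = np.random.randn(3, 6, 1, 1024)    # toy 6x1 SImages, 1024-sample blocks
virt = virtual_simage(simages, positions, x=0.09)

# Per-layer listening-point sound: average the sound waves over the pixels.
layer_sound = virt.mean(axis=(0, 1))        # -> (1024,) waveform for this layer
```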
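The quoted figures (22.86 dB for the reconstructed middle SImage, 10.36 dB for the generated sound) are SNRs against the originally recorded signals. The abstract does not give the exact definition, so the sketch below uses the standard signal-to-error-power ratio:

```python
import numpy as np

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Standard SNR in dB: reference power over error power."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

# Toy check: a noisy copy of a sine tone
t = np.arange(1024) / 16000.0
ref = np.sin(2 * np.pi * 440.0 * t)
est = ref + 0.05 * np.random.randn(t.size)
print(f"SNR = {snr_db(ref, est):.2f} dB")
```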
Sound can be recorded, computed and replayed by directed speakers, using the well-known sound processing methods, efficiently. Several approaches tried to generate arbitrary listening-point of sound; however there are few effective model such as Head Related Transfer Function (HRTF) and representation of the sound sources in 3D space to have an efficient processing. Meanwhile, images are rendered by computer graphics algorithms and have become more attractive and more efficient and image synthesis hardware has come to existence. The free viewpoint systems should have a correct correspondence of sound and images in an arbitrary viewpoint. Therefore, a representation method of sound sources in 3D space using computer graphics and image processing techniques is necessary. This research addresses the problem of free listening-point generation of sound without sound source localization and proposes a theory based on the ray-space representation of light rays which is independent of object’s specifications, and sub-band signal processing of sound waves. An array of beam-formed dynamic microphone-arrays (MAs), are set and each MA generates a multi frequency layer sound-image (SImage) by scanning the viewing range of a camera in the same location. Each layer is captured by the active microphones in the MA for a given frequency range which makes the correct interval among the active microphones to capture that frequency layer. Each layer of an SImage has the same size of an image and contains of blocks of sound wave with duration of one image-frame. Captured multi frequency layered SImages with the array of MAs generate a sound wave ray-space for each layer. To make a dense SImage ray-space, we propose to use the geometry compensation of corresponding images in the location of each MA or SImages or their combination, for each frequency layer. By a dense sound ray-space, any virtual SImage, which corresponds to an arbitrary listening-point, can be generated. The sound of each layer of an SIamge is generated by averaging the sound wave in each pixel or group of pixel. The listening-point is generated after combining each layer sound in frequency domain and inverse transformation from frequency to time domain. Experimental results using SImages for arbitrary listening-point generation is shown in the following for a layer of an SImage. The experimental setup has 3 MAs on a line. Each array has 3 microphones with 183mm distance. Distance between each MA is 183mm. The sound source has 2.3m distance from the center of MAs line. Three SImages (6x1) are captured and the middle SImage is compared to the original one, and SNR of 22.86dB is obtained. The generated sound by virtual SImage has SNR equal to 10.36dB in comparison with the original sound in that location. The proposed theory can solve the problem of 3D media integration. The proposed methods are currently being developed on a practical system. In our future research, we will work on sampling theory for capturing SImages, and will perform an efficient integration of 3D audio/video. |