IEC 62503 pdf download – Multimedia quality – Method of assessment of synchronization of audio and video

This International Standard provides a subjective (or perceptible) and statistical method of assessment of the overall, or end-to-end, difference of delays between real world and reproduced scenes in terms of video and accompanying audio recorded in a medium.
This International Standard does not specify limiting values for the results obtained by the application of the provisions in this standard. It excludes applications to professional broadcast systems.
2 Normative reference
The following referenced document is indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ITU-R BT.500-11:2002, Methodology for the subjective assessment of the quality of television pictures
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
lip sync
video delay against accompanying audio
subjective opinion score outside $m \pm s$, where $m$ is the sample mean of the original scores of a set of subjects for the same video delay and $s$ is the standard deviation of the scores
ordinary untrained human audience of audio and video reproduction; random sample of individual members of the general public
test video clip
short duration of video frames with accompanying audio to be used as original
test video sequence
random series of test video clips where the audio channels are shifted in time compared to the original
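The definition above of a score lying outside the interval around the sample mean can be illustrated with a minimal sketch. It assumes the interval is read as $m \pm s$, that $s$ is the sample (n − 1) standard deviation, and that scores on the boundary are kept; the function name `split_outliers` and the multiplier `k` are hypothetical and do not come from the standard.

```python
from statistics import mean, stdev


def split_outliers(scores, k=1.0):
    """Split the scores for one video delay into (kept, outliers).

    A score is treated as an outlier when it lies outside m +/- k*s,
    where m is the sample mean and s the sample standard deviation.
    k = 1.0 reflects the assumed reading "m +/- s" of the definition.
    """
    m = mean(scores)
    s = stdev(scores)  # assumption: sample (n - 1) standard deviation
    lower, upper = m - k * s, m + k * s
    kept = [x for x in scores if lower <= x <= upper]
    outliers = [x for x in scores if x < lower or x > upper]
    return kept, outliers


if __name__ == "__main__":
    # Hypothetical opinion scores from ten subjects for one video delay.
    kept, outliers = split_outliers([3, 3, 3, 3, 3, 3, 3, 3, 3, 1])
    print(kept, outliers)
```

Such a split would be applied separately to each video delay, since the mean and standard deviation are defined per delay.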
4 Overview of methods of assessment
Figure 1 depicts an overview of possible objective methods of measurement and the subjective method of assessment used to acquire the necessary parameters corresponding to lip sync.
The leftmost part of Figure 1 is the real world and the rightmost part is the reproduced world.
Lip sync at the section 0-0′, i.e. scene delay in reference to accompanying sound, is normally zero; in other words, null video delay against accompanying audio is expected: $\Delta t_0 = 0$. Where $\Delta t_0 \neq 0$ is foreseen, it shall also be taken into account.
Lip sync at the section 1-1′ is supposed to be introduced by separate acquisition of physical phenomena by microphones and video cameras, followed by further separate digital processing for audio and video data. It will cause lip sync of $\Delta t_1 \neq 0$.
NOTE In the case of MPEG-2 encoding, there is a scheme of synchronization using the Decoding Time Stamp (DTS) as well as the Presentation Time Stamp (PTS) embedded in the header of the Packetized Elementary Stream (PES). See ISO/IEC 13818-1 [11].
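As an illustration of how the time stamps mentioned in the NOTE relate to lip sync, the following minimal sketch converts the difference between a video PTS and an audio PTS into milliseconds. It relies on PTS values being 33-bit counters of a 90 kHz clock; the function name `video_delay_ms` and the premise that the two time stamps belong to a picture and a sound captured at the same instant are assumptions made for this example only.

```python
PTS_CLOCK_HZ = 90_000   # PTS/DTS are expressed in ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33   # PTS/DTS are 33-bit counters and wrap around


def video_delay_ms(video_pts, audio_pts):
    """Signed video delay against audio, in milliseconds.

    Positive means the video would be presented later than the audio.
    The difference is taken modulo 2**33 so that a counter wrap-around
    between the two time stamps does not produce a huge false offset.
    """
    diff = (video_pts - audio_pts) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS  # map to the signed range around zero
    return diff * 1000.0 / PTS_CLOCK_HZ


if __name__ == "__main__":
    # 3600 ticks at 90 kHz correspond to 40 ms.
    print(video_delay_ms(93_600, 90_000))
```

The sketch only reads off an offset that is already encoded in the stream; it does not measure delays added later by decoding or rendering.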
Lip sync at the section 2-2′ is supposed to be introduced by the reproduction process applied separately to the audio and video channels, such as decompression, rendering and reproduction. It will cause lip sync of $\Delta t_2 \neq 0$, which can be measured using a reference test multimedia material with $\Delta t_1 = 0$.
Lip sync at the section 3-3′ is in the reproduced multimedia world and is assessed by human subjects. Subjective opinion scores on lip sync are statistically analyzed to find an estimated value for $\Delta t_3 \neq 0$, corresponding to the amount of compensation for just-synchronized reproduction.
5 Subjective assessment of lip sync
5.1 Items to be assessed
Subjective grading level of mis-synchronization of video and audio.
5.2 Preparation of test video clips and test video sequence
5.2.1 Selection of content of a test video clip
Since lip sync is a kind of human perception, it may depend on the contents of the video and accompanying audio. Especially when it is related to the movement of the lips of a human speaker, a match between the spoken language and the subject's mother tongue may affect the result.
NOTE In this International Standard, in order to provide worked examples, speech in the Japanese language uttered by a well-trained professional news reader is watched and listened to by subjects with the same mother tongue.
A bust shot of a news reader shall be extracted, the duration of which should be around 10 s to 20 s. Data of the audio channel of the video clip shall be taken as the timing reference.
The possible amount of time offset caused by mis-synchronization in this original video clip, $\Delta t_1$ at the section 1-1′, is unknown. However, this International Standard provides the method to estimate the overall lip sync $\Delta t_3$, including $\Delta t_1$ and $\Delta t_2$. Namely, $\Delta t_3 = \Delta t_0 + \Delta t_1 + \Delta t_2$.
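The relation between the lip sync contributions can be written out as a short worked equation. This is a sketch that assumes the subscripts 0 to 3 refer to the sections 0-0′ to 3-3′ of Figure 1; the rearrangement for the unknown acquisition term and the numerical values are illustrative only and are not taken from the standard.

```latex
\begin{align}
\Delta t_3 &= \Delta t_0 + \Delta t_1 + \Delta t_2 \\
\Delta t_1 &= \Delta t_3 - \Delta t_0 - \Delta t_2
\end{align}
% Hypothetical example: with \Delta t_0 = 0 ms, a measured \Delta t_2 = 40 ms
% and an estimated \Delta t_3 = 60 ms, the acquisition term would be
% \Delta t_1 = 60 - 0 - 40 = 20 ms.
```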
5.2.2 Creation of a test video sequence
The test video sequence shall be a randomised series of the video clip selected in 5.2.1, in which each of the audio channels shall be replaced by time-shifted audio data with the necessary duration of padding as a leader or a trailer, depending on the direction of the time shift. Preparation of such video clips is shown in Figure 2 as the image frames with delayed audio and with led audio. The amounts of the two time shifts, for delayed audio and for led audio, are subject to adjustment.
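The preparation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: audio is a plain list of samples, padding is digital silence (zeros), each shifted clip keeps the length of the original, and a positive shift means delayed audio while a negative shift means led audio. The names `shift_audio` and `build_sequence` are hypothetical.

```python
import random


def shift_audio(samples, shift):
    """Return the audio shifted by `shift` samples, same length as the input.

    shift > 0: delayed audio (silent leader added, tail dropped).
    shift < 0: led audio (head dropped, silent trailer added).
    """
    samples = list(samples)
    n = len(samples)
    if shift >= 0:
        return ([0] * shift + samples)[:n]
    return samples[-shift:] + [0] * min(-shift, n)


def build_sequence(samples, shifts, seed=None):
    """Return a randomised list of (shift, shifted_audio) pairs.

    Every shift in `shifts` is used exactly once; only the order is random.
    """
    rng = random.Random(seed)
    order = list(shifts)
    rng.shuffle(order)
    return [(shift, shift_audio(samples, shift)) for shift in order]


if __name__ == "__main__":
    clip_audio = [1, 2, 3, 4]  # hypothetical stand-in for real audio data
    for shift, audio in build_sequence(clip_audio, [-2, -1, 0, 1, 2], seed=1):
        print(shift, audio)
```

In practice the shifts would be expressed in time and converted to samples using the audio sampling rate, and the video frames of each clip would be left untouched so that only the audio timing changes.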