How to use VideoToolbox to decompress an H.264 video stream

Question

I had a lot of trouble figuring out how to use Apple's hardware-accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out and wanted to share an extensive example, since I couldn't find one.

My goal is to give a thorough, instructive example of Video Toolbox, which was introduced in WWDC '14 session 513. My code will not compile or run as-is, since it needs to be integrated with an elementary H.264 stream (such as video read from a file or streamed from online) and needs to be adjusted for your specific case.

I should mention that I have very little experience with video encoding/decoding beyond what I learned while googling the subject. I don't know all the details about video formats, parameter structures, etc., so I've only included what I think you need to know.

I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.

55

objective-c ios8 h.264 video-toolbox

Author: Bobjt, 2015-04-08

Source

5 answers

In case you can't find the VTD error codes in the framework, I decided to include them here. (Again, all of these errors and more can be found inside VideoToolbox.framework itself in the project navigator, in the file VTErrors.h.)

You will get one of these error codes in your VTD decode frame callback, or when you create your VTD session, if you did something incorrectly.

kVTPropertyNotSupportedErr              = -12900,
kVTPropertyReadOnlyErr                  = -12901,
kVTParameterErr                         = -12902,
kVTInvalidSessionErr                    = -12903,
kVTAllocationFailedErr                  = -12904,
kVTPixelTransferNotSupportedErr         = -12905, // c.f. -8961
kVTCouldNotFindVideoDecoderErr          = -12906,
kVTCouldNotCreateInstanceErr            = -12907,
kVTCouldNotFindVideoEncoderErr          = -12908,
kVTVideoDecoderBadDataErr               = -12909, // c.f. -8969
kVTVideoDecoderUnsupportedDataFormatErr = -12910, // c.f. -8970
kVTVideoDecoderMalfunctionErr           = -12911, // c.f. -8960
kVTVideoEncoderMalfunctionErr           = -12912,
kVTVideoDecoderNotAvailableNowErr       = -12913,
kVTImageRotationNotSupportedErr         = -12914,
kVTVideoEncoderNotAvailableNowErr       = -12915,
kVTFormatDescriptionChangeNotSupportedErr   = -12916,
kVTInsufficientSourceColorDataErr       = -12917,
kVTCouldNotCreateColorCorrectionDataErr = -12918,
kVTColorSyncTransformConvertFailedErr   = -12919,
kVTVideoDecoderAuthorizationErr         = -12210,
kVTVideoEncoderAuthorizationErr         = -12211,
kVTColorCorrectionPixelTransferFailedErr    = -12212,
kVTMultiPassStorageIdentifierMismatchErr    = -12213,
kVTMultiPassStorageInvalidErr           = -12214,
kVTFrameSiloInvalidTimeStampErr         = -12215,
kVTFrameSiloInvalidTimeRangeErr         = -12216,
kVTCouldNotFindTemporalFilterErr        = -12217,
kVTPixelTransferNotPermittedErr         = -12218,

14

Author: Olivia Stork,
2018-09-11 14:16:12

A good Swift example of much of this can be found in Josh Baker's Avios library: https://github.com/tidwall/Avios

Note that Avios currently expects the user to handle chunking the data at NAL start codes, but it does handle decoding the data from that point forward.

Also worth a look is the Swift-based RTMP library HaishinKit (formerly "LF"), which has its own decoding implementation, including more robust NALU parsing: https://github.com/shogo4405/lf.swift

9

Author: leppert,
2017-03-31 15:36:31

In addition to the VTErrors above, I thought it was worth adding the CMFormatDescription, CMBlockBuffer, and CMSampleBuffer errors that you may encounter while trying Livy's example.

kCMFormatDescriptionError_InvalidParameter  = -12710,
kCMFormatDescriptionError_AllocationFailed  = -12711,
kCMFormatDescriptionError_ValueNotAvailable = -12718,

kCMBlockBufferNoErr                             = 0,
kCMBlockBufferStructureAllocationFailedErr      = -12700,
kCMBlockBufferBlockAllocationFailedErr          = -12701,
kCMBlockBufferBadCustomBlockSourceErr           = -12702,
kCMBlockBufferBadOffsetParameterErr             = -12703,
kCMBlockBufferBadLengthParameterErr             = -12704,
kCMBlockBufferBadPointerParameterErr            = -12705,
kCMBlockBufferEmptyBBufErr                      = -12706,
kCMBlockBufferUnallocatedBlockErr               = -12707,
kCMBlockBufferInsufficientSpaceErr              = -12708,

kCMSampleBufferError_AllocationFailed             = -12730,
kCMSampleBufferError_RequiredParameterMissing     = -12731,
kCMSampleBufferError_AlreadyHasDataBuffer         = -12732,
kCMSampleBufferError_BufferNotReady               = -12733,
kCMSampleBufferError_SampleIndexOutOfRange        = -12734,
kCMSampleBufferError_BufferHasNoSampleSizes       = -12735,
kCMSampleBufferError_BufferHasNoSampleTimingInfo  = -12736,
kCMSampleBufferError_ArrayTooSmall                = -12737,
kCMSampleBufferError_InvalidEntryCount            = -12738,
kCMSampleBufferError_CannotSubdivide              = -12739,
kCMSampleBufferError_SampleTimingInfoInvalid      = -12740,
kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
kCMSampleBufferError_InvalidSampleData            = -12742,
kCMSampleBufferError_InvalidMediaFormat           = -12743,
kCMSampleBufferError_Invalidated                  = -12744,
kCMSampleBufferError_DataFailed                   = -16750,
kCMSampleBufferError_DataCanceled                 = -16751,

4

Author: Jetdog,
2015-06-08 17:49:17

@Livy to eliminate memory leaks, you should add the following before the call to CMVideoFormatDescriptionCreateFromH264ParameterSets:

if (_formatDesc) {
    CFRelease(_formatDesc);
    _formatDesc = NULL;
}

1

Author: Kris Dude,
2018-01-30 03:07:23

score 150 · Accepted Answer

Concepts:

NALUs: NALUs are simply chunks of data of varying length that begin with a NALU start code header 0x00 00 00 01 YY, where the low 5 bits of YY tell you what type of NALU this is and therefore what type of data follows the header. (Since you only need those 5 bits, I use YY & 0x1F to get just the relevant bits.) I list all of these types in the array NSString * const naluTypesStrings[], but you don't need to know what they all are.
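As a minimal sketch of that masking step, the plain C helper below checks for a 4-byte start code and returns the type of the NALU that follows it. The name nalu_type_at is hypothetical (it is not part of the code in this answer), and the helper assumes the buffer begins exactly at a start code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the NALU type (0-31) of a NALU that begins with a 4-byte
 * start code (00 00 00 01), or -1 if the buffer is too short or does
 * not begin with that start code. Hypothetical helper for illustration. */
int nalu_type_at(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 5)
        return -1;
    if (buf[0] != 0x00 || buf[1] != 0x00 || buf[2] != 0x00 || buf[3] != 0x01)
        return -1;
    /* The byte after the start code packs three fields:
     * 1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type.
     * Masking with 0x1F keeps only the low 5 bits (the type). */
    return buf[4] & 0x1F;
}
```

For example, a header byte of 0x67 yields type 7 (SPS) and 0x65 yields type 5 (IDR).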

Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), and each has its own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.

H.264 stream format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an I frame (also known as an IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen), then another set of parameters (which may be the same as the initial parameters) and an I frame, more P frames, and so on. I frames are much bigger than P frames. Conceptually you can think of the I frame as an entire image of the video, and the P frames as just the changes that have been made to that I frame, until you receive the next I frame.
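One way to picture that ordering is a small gate that only lets frames through once both parameter sets and an IDR frame have been seen. This is a sketch under assumptions: stream_state and should_decode are hypothetical names, and a real stream may need more cases (for example SEI NALUs or parameter sets that change mid-stream).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for what has been received so far. */
typedef struct {
    bool have_sps;  /* a type 7 NALU has arrived */
    bool have_pps;  /* a type 8 NALU has arrived */
    bool seen_idr;  /* a decodable type 5 NALU has arrived */
} stream_state;

/* Updates the state for one NALU and returns true if that NALU is a
 * frame that can be handed to the decoder right now. */
bool should_decode(stream_state *s, int nalu_type)
{
    assert(s != NULL);
    switch (nalu_type) {
    case 7: /* SPS: remember it, nothing to decode */
        s->have_sps = true;
        return false;
    case 8: /* PPS: remember it, nothing to decode */
        s->have_pps = true;
        return false;
    case 5: /* IDR frame: needs both parameter sets */
        if (s->have_sps && s->have_pps) {
            s->seen_idr = true;
            return true;
        }
        return false;
    case 1: /* P frame: only meaningful after an IDR frame */
        return s->seen_idr;
    default: /* other NALU types are ignored in this sketch */
        return false;
    }
}
```

Dropping P frames that arrive before the first IDR frame matches the idea that a P frame only describes changes relative to an earlier picture.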

Procedure:

1. Generate individual NALUs from your H.264 stream. I cannot show code for this step, since it depends a lot on what video source you're using. I made a graphic to show what I was working with ("data" in the graphic is "frame" in my code below), but your case may, and probably will, differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame), which was one of 2 types. In the diagram, those 2 frame types are the 2 big purple boxes.
2. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that a CMVideoFormatDescriptionRef is a description of video data, such as width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, and so on. Your decoder will hold onto the parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed).
3. Repackage your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is big-endian, so on a little-endian host a UInt32 value must be byte-swapped, for example with CFSwapInt32HostToBig, before it is copied into the CMBlockBuffer. I do this in my code with the htonl function call.)
4. Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do not do this with the SPS and PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a way to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in one.)
5. Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap our CMBlockBuffers together with other information (here that would be the CMVideoFormatDescription and CMTime, if CMTime is used).
6. Create a VTDecompressionSessionRef and feed the sample buffers into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method, and then you won't need to use VTDecompSession. It's simpler to set up, but it will not report errors when something goes wrong the way VTD will.
7. In the VTDecompSession callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer here.
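The repackaging step (removing the start code and writing a 4-byte length in its place) can be sketched in plain C as follows. The helper name annexb_to_avcc is hypothetical. It assumes a single NALU that starts with a 4-byte start code, and it writes the length bytes explicitly in big-endian order, which should have the same effect as the htonl plus memcpy used in the code further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rewrites one NALU in place from start-code framing (00 00 00 01 + payload)
 * to AVCC framing (4-byte big-endian payload length + payload).
 * Returns 0 on success, -1 if the buffer does not look like such a NALU.
 * Hypothetical helper for illustration. */
int annexb_to_avcc(uint8_t *nalu, size_t len)
{
    static const uint8_t start_code[4] = {0x00, 0x00, 0x00, 0x01};

    assert(nalu != NULL);
    if (len < 5 || len - 4 > 0xFFFFFFFFu)
        return -1;
    if (memcmp(nalu, start_code, sizeof start_code) != 0)
        return -1;

    /* The length excludes the 4 header bytes themselves. */
    uint32_t payload = (uint32_t)(len - 4);

    /* Most significant byte first, regardless of host byte order. */
    nalu[0] = (uint8_t)(payload >> 24);
    nalu[1] = (uint8_t)(payload >> 16);
    nalu[2] = (uint8_t)(payload >> 8);
    nalu[3] = (uint8_t)payload;
    return 0;
}
```

Because the start code and the length header are both 4 bytes, the payload does not move and no extra allocation is needed. A stream with 3-byte start codes would need the payload shifted by one byte instead.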

Other notes:

H.264 streams can vary a lot. From what I learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works for 4 bytes; you will need to change a few things if you're working with 3.
If you want to know more about NALUs, I found this answer to be very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes as described, so I personally skipped that step, but you may need to know about it.
If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in your project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes in another answer.
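For streams that mix 3-byte and 4-byte start codes, a scanner along these lines could replace the fixed 4-byte comparisons used in my code. It is only a sketch: find_start_code is a hypothetical name, and the function does nothing about emulation prevention bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Searches buf[from .. len) for the next start code.
 * On success returns the offset of the first byte of the start code and
 * stores its length (3 or 4) in *sc_len. Returns -1 if none is found.
 * Hypothetical helper for illustration. */
long find_start_code(const uint8_t *buf, size_t len, size_t from, size_t *sc_len)
{
    assert(buf != NULL && sc_len != NULL);
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] != 0x00 || buf[i + 1] != 0x00)
            continue;
        if (buf[i + 2] == 0x01) {            /* 00 00 01 */
            *sc_len = 3;
            return (long)i;
        }
        if (i + 4 <= len && buf[i + 2] == 0x00 && buf[i + 3] == 0x01) {
            *sc_len = 4;                     /* 00 00 00 01 */
            return (long)i;
        }
    }
    return -1;
}
```

Calling it repeatedly, each time starting just past the previous start code, gives the boundaries of every NALU in the buffer, and every read stays within len.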

Code example:

So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).

#import <AVFoundation/AVFoundation.h>   // needed for AVSampleBufferDisplayLayer
#import <VideoToolbox/VideoToolbox.h>

@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;

The following array is only used so that you can print out what type of NALU frame you are receiving. If you know what all these types mean, good for you, you know more about H.264 than I do :) My code only handles types 1, 5, 7 and 8.

NSString * const naluTypesStrings[] =
{
    @"0: Unspecified (non-VCL)",
    @"1: Coded slice of a non-IDR picture (VCL)",    // P frame
    @"2: Coded slice data partition A (VCL)",
    @"3: Coded slice data partition B (VCL)",
    @"4: Coded slice data partition C (VCL)",
    @"5: Coded slice of an IDR picture (VCL)",      // I frame
    @"6: Supplemental enhancement information (SEI) (non-VCL)",
    @"7: Sequence parameter set (non-VCL)",         // SPS parameter
    @"8: Picture parameter set (non-VCL)",          // PPS parameter
    @"9: Access unit delimiter (non-VCL)",
    @"10: End of sequence (non-VCL)",
    @"11: End of stream (non-VCL)",
    @"12: Filler data (non-VCL)",
    @"13: Sequence parameter set extension (non-VCL)",
    @"14: Prefix NAL unit (non-VCL)",
    @"15: Subset sequence parameter set (non-VCL)",
    @"16: Reserved (non-VCL)",
    @"17: Reserved (non-VCL)",
    @"18: Reserved (non-VCL)",
    @"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
    @"20: Coded slice extension (non-VCL)",
    @"21: Coded slice extension for depth view components (non-VCL)",
    @"22: Reserved (non-VCL)",
    @"23: Reserved (non-VCL)",
    @"24: STAP-A Single-time aggregation packet (non-VCL)",
    @"25: STAP-B Single-time aggregation packet (non-VCL)",
    @"26: MTAP16 Multi-time aggregation packet (non-VCL)",
    @"27: MTAP24 Multi-time aggregation packet (non-VCL)",
    @"28: FU-A Fragmentation unit (non-VCL)",
    @"29: FU-B Fragmentation unit (non-VCL)",
    @"30: Unspecified (non-VCL)",
    @"31: Unspecified (non-VCL)",
};

This is where all the magic happens.

-(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
{
    // initialize so the checks below never read an indeterminate value
    OSStatus status = noErr;

    uint8_t *data = NULL;
    uint8_t *pps = NULL;
    uint8_t *sps = NULL;

    // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
    // if you don't know where it starts, you can use a for loop similar to how i find the 2nd and 3rd start codes
    int startCodeIndex = 0;
    int secondStartCodeIndex = 0;
    int thirdStartCodeIndex = 0;

    long blockLength = 0;

    CMSampleBufferRef sampleBuffer = NULL;
    CMBlockBufferRef blockBuffer = NULL;

    int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
    NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);

    // if we haven't already set up our format description with our SPS PPS parameters, we
    // can't process any frames except type 7 that has our parameters
    if (nalu_type != 7 && _formatDesc == NULL)
    {
        NSLog(@"Video error: Frame is not an I Frame and format description is null");
        return;
    }

    // NALU type 7 is the SPS parameter NALU
    if (nalu_type == 7)
    {
        // find where the second PPS start code begins, (the 0x00 00 00 01 code)
        // from which we also get the length of the first SPS code
        for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                secondStartCodeIndex = i;
                _spsSize = secondStartCodeIndex;   // includes the header in the size
                break;
            }
        }

        // find what the second NALU type is
        nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // type 8 is the PPS parameter NALU
    if(nalu_type == 8)
    {
        // find where the NALU after this one starts so we know how long the PPS parameter is
        for (int i = _spsSize + 4; i < _spsSize + 30; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                thirdStartCodeIndex = i;
                _ppsSize = thirdStartCodeIndex - _spsSize;
                break;
            }
        }

        // allocate enough data to fit the SPS and PPS parameters into our data objects.
        // VTD doesn't want you to include the start code header (4 bytes long) so we add the - 4 here
        sps = malloc(_spsSize - 4);
        pps = malloc(_ppsSize - 4);

        // copy in the actual sps and pps values, again ignoring the 4 byte header
        memcpy (sps, &frame[4], _spsSize-4);
        memcpy (pps, &frame[_spsSize+4], _ppsSize-4);

        // now we set our H264 parameters
        uint8_t*  parameterSetPointers[2] = {sps, pps};
        size_t parameterSetSizes[2] = {_spsSize-4, _ppsSize-4};

        // suggestion from @Kris Dude's answer below
        if (_formatDesc) 
        {
            CFRelease(_formatDesc);
            _formatDesc = NULL;
        }

        status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2, 
                                                (const uint8_t *const*)parameterSetPointers, 
                                                parameterSetSizes, 4, 
                                                &_formatDesc);

        NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
        if(status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);

        // See if decomp session can convert from previous format description 
        // to the new one, if not we need to remake the decomp session.
        // This snippet was not necessary for my applications but it could be for yours
        /*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
         if(needNewDecompSession)
         {
             [self createDecompSession];
         }*/

        // now lets handle the IDR frame that (should) come after the parameter sets
        // I say "should" because that's how I expect my H264 stream to work, YMMV
        nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // create our VTDecompressionSession.  This isn't necessary if you choose to use AVSampleBufferDisplayLayer
    if((status == noErr) && (_decompressionSession == NULL))
    {
        [self createDecompSession];
    }

    // type 5 is an IDR frame NALU.  The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
    if(nalu_type == 5)
    {
        // find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
        int offset = _spsSize + _ppsSize;
        blockLength = frameSize - offset;
        data = malloc(blockLength);
        data = memcpy(data, &frame[offset], blockLength);

        // replace the start code header on this NALU with its size.
        // AVCC format requires that you do this.  
        // htonl converts the unsigned int from host to network byte order
        uint32_t dataLength32 = htonl (blockLength - 4);
        memcpy (data, &dataLength32, sizeof (uint32_t));

        // create a block buffer from the IDR NALU
        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold buffered data
                                                    blockLength,  // block length of the mem block in bytes.
                                                    kCFAllocatorNull, NULL,
                                                    0, // offsetToData
                                                    blockLength,   // dataLength of relevant bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // NALU type 1 is non-IDR (or PFrame) picture
    if (nalu_type == 1)
    {
        // non-IDR frames do not have an offset due to SPS and PPS, so the approach
        // is similar to the IDR frames just without the offset
        blockLength = frameSize;
        data = malloc(blockLength);
        data = memcpy(data, &frame[0], blockLength);

        // again, replace the start header with the size of the NALU
        uint32_t dataLength32 = htonl (blockLength - 4);
        memcpy (data, &dataLength32, sizeof (uint32_t));

        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold data. If NULL, block will be alloc when needed
                                                    blockLength,  // overall length of the mem block in bytes
                                                    kCFAllocatorNull, NULL,
                                                    0,     // offsetToData
                                                    blockLength,  // dataLength of relevant data bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // now create our sample buffer from the block buffer,
    if(status == noErr)
    {
        // here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
        const size_t sampleSize = blockLength;
        status = CMSampleBufferCreate(kCFAllocatorDefault,
                                      blockBuffer, true, NULL, NULL,
                                      _formatDesc, 1, 0, NULL, 1,
                                      &sampleSize, &sampleBuffer);

        NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    }

    if(status == noErr)
    {
        // set some values of the sample buffer's attachments
        CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
        CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

        // either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
        [self render:sampleBuffer];
    }

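    // free memory to avoid a memory leak: the frame data, the sps/pps copies and the block buffer.
    // The sample buffer holds its own reference to the block buffer, so ours can be released.
    // NOTE: the block buffer was created with kCFAllocatorNull, so it does not own `data`;
    // with asynchronous decoding, make sure the decoder is done with this frame before freeing it.
    if (NULL != blockBuffer)
    {
        CFRelease(blockBuffer);
        blockBuffer = NULL;
    }
    if (NULL != data)
    {
        free (data);
        data = NULL;
    }
    free (sps);   // free(NULL) is a no-op, so these are safe when no parameters were parsed
    free (pps);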
}

The following method creates your VTD session. Recreate it whenever you receive new parameters. (I'm fairly sure you don't have to recreate it every time you receive parameters.)

If you want to set attributes for the destination CVPixelBuffer, read up on the CoreVideo PixelBufferAttributes values and put them in NSDictionary *destinationImageBufferAttributes.

-(void) createDecompSession
{
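    // make sure to destroy the old VTD session (setting the pointer to NULL alone would leak it)
    if (_decompressionSession != NULL)
    {
        VTDecompressionSessionInvalidate(_decompressionSession);
        CFRelease(_decompressionSession);
        _decompressionSession = NULL;
    }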
    VTDecompressionOutputCallbackRecord callBackRecord;
    callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;

    // this is necessary if you need to make calls to Objective C "self" from within in the callback method.
    callBackRecord.decompressionOutputRefCon = (__bridge void *)self;

    // you can set some desired attributes for the destination pixel buffer.  I didn't use this but you may
    // if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
    NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                      [NSNumber numberWithBool:YES],
                                                      (id)kCVPixelBufferOpenGLESCompatibilityKey,
                                                      nil];

    OSStatus status =  VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
                                                    NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
                                                    &callBackRecord, &_decompressionSession);
    NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    if(status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
}

This method gets called every time VTD is done decompressing a frame that you sent to it. It gets called even if there's an error or if the frame is dropped.

void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
                                             void *sourceFrameRefCon,
                                             OSStatus status,
                                             VTDecodeInfoFlags infoFlags,
                                             CVImageBufferRef imageBuffer,
                                             CMTime presentationTimeStamp,
                                             CMTime presentationDuration)
{
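    THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;

    // balance the CFBridgingRetain() done on the NSDate passed as sourceFrameRefCon in render:
    if (sourceFrameRefCon != NULL) CFRelease(sourceFrameRefCon);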

    if (status != noErr)
    {
        NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
        NSLog(@"Decompressed error: %@", error);
    }
    else
    {
        NSLog(@"Decompressed successfully");

        // do something with your resulting CVImageBufferRef that is your decompressed frame
        [streamManager displayDecodedFrame:imageBuffer];
    }
}

This is where we actually send the sampleBuffer off to the VTD to be decoded.

- (void) render:(CMSampleBufferRef)sampleBuffer
{
    VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
    VTDecodeInfoFlags flagOut;
    NSDate* currentTime = [NSDate date];
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                      (void*)CFBridgingRetain(currentTime), &flagOut);

    // if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
    // (enqueue before the sample buffer is released below)
    // [self.videoLayer enqueueSampleBuffer:sampleBuffer];

    CFRelease(sampleBuffer);
}

If you're using AVSampleBufferDisplayLayer, be sure to initialize the layer like this, in viewDidLoad or inside some other init method.

-(void) viewDidLoad
{
    [super viewDidLoad];

    // create our AVSampleBufferDisplayLayer and add it to the view
    self.videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
    self.videoLayer.frame = self.view.frame;
    self.videoLayer.bounds = self.view.bounds;
    self.videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    // set Timebase, you may need this if you need to display frames at specific times
    // I didn't need it so I haven't verified that the timebase is working
    CMTimebaseRef controlTimebase = NULL;
    CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);

    // assign the timebase to the layer before configuring it; otherwise the calls
    // below would operate on a NULL timebase
    self.videoLayer.controlTimebase = controlTimebase;
    CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
    CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);

    // the layer retains the timebase, so release our own reference
    if (controlTimebase != NULL) CFRelease(controlTimebase);

    [[self.view layer] addSublayer:self.videoLayer];
}