
Exam NCA-GENM Topic 4 Question 267 Discussion

Actual exam question for NVIDIA's NCA-GENM exam
Question #: 267
Topic #: 4
You are tasked with building a multimodal AI system that can generate video descriptions from video footage. You have experimented with several architectures, including combining CNNs for visual feature extraction and LSTMs for sequence generation. However, the model struggles to capture long-range dependencies in the video. Which of the following architectural modifications or training techniques is MOST likely to address this issue?

Suggested Answer: C

Transformers capture long-range dependencies through their self-attention mechanism, which lets every position in a sequence attend directly to every other position. Replacing the LSTMs with Transformers therefore allows the model to attend to relevant parts of the video sequence regardless of their temporal distance. CNNs can extract visual features, but they do not inherently model long-range temporal dependencies. RNNs are prone to vanishing gradients, which makes long-range dependencies difficult to learn. Reducing the frame rate or batch size does not directly address the problem of capturing long-range dependencies within the video sequence.
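To make the "regardless of temporal distance" point concrete, here is a minimal sketch of scaled dot-product self-attention over per-frame feature vectors, written with the Python standard library only. The feature values and the `self_attention` helper are hypothetical and chosen for illustration. The query, key, and value projections are left as the identity, whereas a real Transformer learns them and also adds positional encodings and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: `frames` is a list of equal-length feature vectors,
    one per video frame (for example, pooled CNN features).
    Returns (outputs, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    dim = len(frames[0])
    outputs, weights = [], []
    for query in frames:
        # Every frame is scored against every other frame in one step,
        # so temporal distance plays no role in the score itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        row = softmax(scores)
        mixed = [
            sum(row[j] * frames[j][i] for j in range(len(frames)))
            for i in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical frame features: frames 0 and 5 show the same content,
# separated by four unrelated frames.
frames = [
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [4.0, 0.0, 0.0],
]

outputs, weights = self_attention(frames)
print([round(w, 3) for w in weights[0]])
```

In this toy input, frame 0 places as much attention on the distant frame 5 as on itself, and far more than on its immediate neighbour, frame 1. A recurrent model would instead have to carry that information through every intermediate step.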

by difegui1 at Dec 30, 2025, 03:52 PM
