Exam NCA-GENM Topic 1 Question 162 Discussion

Actual exam question for NVIDIA's NCA-GENM exam
Question #: 162
Topic #: 1
You are tasked with optimizing a Generative AI model that processes both image and text data. The current model uses a simple concatenation of image features (extracted from a ResNet-50) and text embeddings (from BERT) as input to a transformer. You observe that the model struggles to generate coherent descriptions for complex images. Which of the following optimization strategies would be MOST effective in improving the model's understanding of the multimodal input?

Suggested Answer: B

Cross-attention lets the model learn which parts of the image are most relevant to each word in the text, giving it a more nuanced understanding of the relationship between the two modalities. Simple concatenation, by contrast, treats all features equally, which is less effective. Increasing the transformer's size or changing the ResNet architecture might help, but neither addresses the core issue of multimodal interaction.
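To make the mechanism concrete, here is a minimal single-head cross-attention sketch in NumPy, in which text tokens act as queries over image patches. It is an illustration only, not the model from the question: the function name `cross_attention`, the random projection matrices, and the sizes (768-dimensional text embeddings as in BERT-base, and a 7x7 grid of 2048-dimensional ResNet-50 features for a 224x224 input) are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper for illustration).

    Queries come from the text; keys and values come from the image.
    """
    q = text_tokens @ w_q                      # (T, d)
    k = image_patches @ w_k                    # (P, d)
    v = image_patches @ w_v                    # (P, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, P) scaled dot products
    weights = softmax(scores, axis=-1)         # each text token's distribution over patches
    attended = weights @ v                     # (T, d) image context per text token
    return attended, weights

# Assumed sizes: 4 text tokens, 49 image patches, shared attention width 64.
rng = np.random.default_rng(0)
T, P, d_text, d_img, d = 4, 49, 768, 2048, 64

text_tokens = rng.standard_normal((T, d_text))
image_patches = rng.standard_normal((P, d_img))

# Random stand-ins for projections that would be learned during training.
w_q = rng.standard_normal((d_text, d)) / np.sqrt(d_text)
w_k = rng.standard_normal((d_img, d)) / np.sqrt(d_img)
w_v = rng.standard_normal((d_img, d)) / np.sqrt(d_img)

attended, weights = cross_attention(text_tokens, image_patches, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` is a separate distribution over image patches for one text token, which is the per-word relevance the explanation describes. Concatenation alone provides no such dedicated text-to-image weighting.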

by Norton at Jan 02, 2026, 08:21 AM
