• DillyDaily@lemmy.world
    1 month ago

    Unfortunately, the models have been trained on biased data.

    I’ve run some of my own photos through various “lens” style description generators as an experiment, and knowing the full context of each image makes the generated descriptions all the more hilarious.

    Sometimes the model tries to extrapolate context. For example, it will randomly decide to describe an older woman as a “mother” if there is also a child in the photo, even when a human could tell from context that it’s more likely a teacher and a student. There’s a lot a human can do that a bot can’t, including having the common sense to use appropriate language when describing people.

    Image descriptions will always be flawed because the focus of the image is always filtered through the description writer. It’s impossible to remove all bias. For example, because of who I am as a person, it would never occur to me to even look at someone’s eyes in a portrait, let alone write what colour they are in the image description. But for someone else, eyes may be super important; they always notice eyes, even subconsciously, so they make sure to note them in their description.