Copyright gives the copyright holder exclusive rights to modify the work, to use the work for commercial purposes, and attribution rights. The use of a work as training data constitutes using a work for commercial purposes, since the companies building these models are distributing and licensing them for profit. I think it would be a marginal argument to say that the output of these models constitutes copyright infringement on the basis of modification, but it is worth arguing nonetheless. Copyright only protects a work up to a certain, indefinable amount of modification, but some of the outputs would certainly constitute infringement in any other situation. And these AI companies would probably find it nigh impossible to disclose specifically who the data came from.
I say “look it up”. It applies to lots of forms of search, be it Google, DDG, YouTube, Wikipedia, a dictionary, a manual, pretty much anything.