Agent-based models (ABM) are the ultimate in being able to flexibly represent systems we observe. They are also very suggestive in their output - usually more suggestive than their validity would support. For these reasons it is very easy to be taken-in by an ABM, either one's own or another's. Furthermore, good practice in the form of visualisations and co-developing the model with stakeholders increase the danger that all those that might critique the model are cognitively committed to it.
It is to avoid such (self) deception that validation is undertaken - to make sure our reliance on a model is well-founded and not illusory. Unfortunately, effective validation is resource- and data- intensive and is far from easy. The flexibility in making a model needs to be matched by the strength of the validation checks. To be blunt - it is easy to (unintentionally or otherwise) 'fit' a complex ABM to a single line of data, so it looks convincing. This kind of validation where one aspect of the simulation outcomes is compared against one line of data is called 'thin validation'.
In physics or other sciences they do do this kind of validation, but they have well-validated micro-foundations so the effective flexibility of their modelling is far less than for a social simulation, where assumptions about the behaviour of the units (typically humans) are not well known. That is why such thin validation is inadequate for social phenomena.
Of course the kind and relevance of validation depends upon your purpose for modelling. If you are not really talking about the observed world but only exploring an entirely abstract world of mechanisms it might not be relevant. If you are only illustrating an idea or possible series of events and interactions then 'thin' validation might be sufficient.
However if you are attempting to predict unknown data or establish an explanation for what is observed then you probably need multi-dimensional or multi-aspect validation - checking the simulation in many different ways at once. Thus one might check if the right kind of social network emerges, and statistics about micro-level behaviours are correct and that emergent aggregate time-series are correct and that in snapshots of the simulation there is the right kind of distribution of key attributes. This is 'fat' validation, and then starts to be adequate to pin down our complex constructions.
Papers that have thin or non-existent validation (e.g. they rely on plausibility alone as a check) might be interesting but should not be interpreted as saying anything reliable about the world we observe.