The simplest explanation I can think of: you train a computer to recognize images, then ask it to combine what it has learned into a new image, starting from either random noise (think snow on a TV) or a base image. For example, you train the computer (a machine learning algorithm) on thousands of cat pictures. Then you train it to recognize what a robot looks like. Once that is done, you ask it to create a "robot cat" and it will blend the two ideas into a new, original image.
For the more technical version, here is an excerpt from an Adafruit Learning Article:
GANs (Generative Adversarial Networks) are systems where two neural networks are pitted against one another: a generator which synthesizes images or data, and a discriminator which scores how plausible the results are. The system feeds back on itself to incrementally improve its score.
A lot of coverage has been on the unsettling and dystopian applications of GANs — deepfake videos, nonexistent but believable faces, poorly trained datasets that inadvertently encode racism — but they also have benign uses: upscaling low-resolution imagery, stylizing photographs, and repairing damaged artworks (even speculating on entire lost sections in masterpieces).
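If it helps to see the generator-versus-discriminator loop from the excerpt in code, here is a toy sketch in plain Python. It is not from the Adafruit article and is nowhere near a real image GAN: the "images" are single numbers drawn around 4.0, the generator has one learned parameter (a shift `b`), and the discriminator is a one-input logistic scorer. The function names, learning rate, batch size, and target distribution are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the range 0..1 without overflowing."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def generator(z, b):
    """Turn a noise value z into a fake sample; b is the only learned parameter."""
    return b + z


def discriminator(x, w, c):
    """Score how plausible x looks as real data (near 1 = real, near 0 = fake)."""
    return sigmoid(w * x + c)


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates; return (b, w, c)."""
    rng = random.Random(seed)
    b, w, c = 0.0, 0.0, 0.0
    for _ in range(steps):
        # "Real" data is assumed to be numbers centered on 4.0 (illustrative).
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [generator(rng.gauss(0.0, 0.5), b) for _ in range(batch)]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = [discriminator(x, w, c) for x in real]
        d_fake = [discriminator(x, w, c) for x in fake]
        grad_w = (sum((1 - d) * x for d, x in zip(d_real, real))
                  - sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_c = (sum(1 - d for d in d_real) - sum(d_fake)) / batch
        w += lr * grad_w
        c += lr * grad_c

        # Generator step: gradient ascent on log D(fake), i.e. nudge b so the
        # updated discriminator rates fresh fakes as more plausible.
        fake = [generator(rng.gauss(0.0, 0.5), b) for _ in range(batch)]
        grad_b = sum((1 - discriminator(x, w, c)) * w for x in fake) / batch
        b += lr * grad_b
    return b, w, c


if __name__ == "__main__":
    b, w, c = train()
    print("generator shift:", b, "discriminator weights:", w, c)
```

Each pass first lets the discriminator get better at telling real from fake, then lets the generator adjust to fool it, which is the feedback loop the excerpt describes. Simple alternating updates like these can oscillate instead of settling, so treat the numbers it prints as a demonstration of the mechanism rather than a trained model.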