There are adventure games, and they take the form of textual descriptions with textual input from the user. The game interpreter has to cope with whatever word combination the user comes up with, but apart from that, the game interface is simple.
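To make the interpreter's job concrete, here is a minimal sketch of a verb-noun command parser in Python. The vocabulary, the filler-word list and the `parse` function are all hypothetical and chosen only for illustration; real interpreters handled far larger vocabularies and more complex sentences.

```python
# Hypothetical vocabulary: each synonym maps to one canonical command.
VERBS = {
    "take": "TAKE", "get": "TAKE",
    "drop": "DROP",
    "eat": "EAT",
    "look": "LOOK",
    "examine": "EXAMINE", "x": "EXAMINE",
}
DIRECTIONS = {
    "north": "NORTH", "n": "NORTH",
    "south": "SOUTH", "s": "SOUTH",
    "east": "EAST", "e": "EAST",
    "west": "WEST", "w": "WEST",
}
# Words the parser simply ignores (an assumption of this sketch).
FILLER = {"the", "a", "an", "at", "to"}


def parse(command):
    """Turn free text into a (verb, object) pair, or None if not understood."""
    words = [w for w in command.lower().split() if w not in FILLER]
    if not words:
        return None
    first = words[0]
    # A bare direction such as "n" or "north" means "go that way".
    if first in DIRECTIONS:
        return ("GO", DIRECTIONS[first])
    if first == "go" and len(words) > 1 and words[1] in DIRECTIONS:
        return ("GO", DIRECTIONS[words[1]])
    if first in VERBS:
        noun = " ".join(words[1:]) or None
        return (VERBS[first], noun)
    return None
```

Anything the parser does not recognise comes back as `None`, which is where the game would answer with some variation of "I don't understand that."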
But players like pictures with their text. So the original text-only interface reserves a small rectangle of screen for projecting a picture of the current scene. This means that players don't have to use their imagination as much and descriptions can't be quite so fanciful any more, but it looks pretty and adds to the game's appeal. (Example: Arthur: The Quest for Excalibur.)
Soon, the small rectangle becomes a large rectangle, and some animation may be added to the image. (Example: Spellcasting 201: The Sorcerer's Appliance.) This allows for jokes where the image doesn't match the description, or cuts down the amount of description needed because the image speaks for itself. The image can even contain extra information needed to solve a puzzle, or the puzzle itself. (Example: Zork Zero.) One image can convey more than a thousand words, so instead of a textual description, the player might simply be shown one big screen-filling image and left to deduce the message. (Example: Myst and its successors.)
Once most of the screen is graphics, two questions arise that were moot with a purely textual interface. The first is: scene-to-scene or 360-degree view? Moving from one screen to another mirrors the text adventure's progression from one room to another by typing in a direction; allowing the user to turn around completely and survey the scene from any angle, standing at any point of the map, was not an option anyway until graphics cards had that kind of ability. The second is: first-person or third-person view? When it became customary to display the "room" the player is in as a picture, the next step was to display a player character in that room. This dissociated the actual player from the player character, but also added potential eye-candy and/or comedy as the player character swanned around or did funny things. (Example: Simon the Sorcerer and most of the Sierra Quest games; in fact, most games in the adventure genre.) Having the player character, however blockily depicted, move around the screen had another consequence: text became less important, because it was no longer needed to get around. The player still had to enter text commands to do stuff to other stuff, but leaving the screen was as simple as steering the little onscreen manikin around with the mouse or cursor keys until the correct exit location was found and a new screen loaded.
A side effect of having a visible player character was the possibility of having two or more player characters, and switching between them at will. (Example: any Sam and Max game.) With a little ingenuity, this would also be possible in text adventures, but it would be complicated and, due to the lack of visual appeal, rather pointless.
Some games already made it simpler for the player to enter text commands by clicking a button; for instance, despite having a text entry line, a game might have buttons saying NORTH, SOUTH, EAST, WEST, TAKE, DROP, EAT and other frequently used commands, which the player only had to click. Since the main game screen was already transforming from all-text to all-image, it made sense to also replace these words with images. The direction buttons could go, since the player only needed a WALK button and cursor keys. The LOOK and EXAMINE commands became an eye button; for INVENTORY, there was a purse, bag or something similar to open the inventory screen. A hand button covered all the "do stuff to stuff" commands, and a mouth button handled all the "talk to" business. That was the basic button set; games could add specific functions like Leisure Suit Larry's "zipper" button. Buttons for saving, restoring and changing game settings could also be added, or a save-restore-settings menu could be opened with a key combination. The buttons themselves could be tabbed to or activated by special keys, but since, by that time, every new computer came with a mouse, it was much simpler to click them, then click on the spot in the game screen where their effect was to be applied.
And thus, the point-and-click interface was born.
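The "click a button, then click a spot" mechanism can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Hotspot` class, the `click` function, the verb names "eye", "hand" and "mouth", and the sample scene are hypothetical and not taken from any particular game engine.

```python
from dataclasses import dataclass, field


@dataclass
class Hotspot:
    """A clickable rectangle in the scene (hypothetical structure)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    # Maps a verb button ("eye", "hand", "mouth") to the game's response.
    responses: dict = field(default_factory=dict)

    def contains(self, px, py):
        """True if the pixel (px, py) lies inside this rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click(hotspots, verb, px, py):
    """Apply the currently selected verb button to the clicked pixel."""
    for spot in hotspots:
        if spot.contains(px, py):
            return spot.responses.get(verb, "That doesn't seem to work.")
    return "Nothing happens."


# A one-object sample scene, invented for this sketch.
scene = [
    Hotspot("door", 100, 50, 40, 80, {
        "eye": "A sturdy oak door.",
        "hand": "The door creaks open.",
    }),
]

print(click(scene, "hand", 110, 60))
print(click(scene, "mouth", 110, 60))
print(click(scene, "eye", 0, 0))
```

The whole vocabulary of a text parser collapses into a handful of verbs, and the difficulty moves from finding the right words to finding the right rectangle.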
On the one hand, replacing all possible actions and conversational subjects with a hand and a mouth was a simplification that removed much inventiveness from adventure games. On the other hand, the inventiveness was sneaked back in by making it hard to guess just where in the picture the hand or mouth had to be applied. The decreased importance of language to solving the game also made games easier to translate (translating a text adventure would be a gargantuan task) and easier to play, even untranslated, for players who don't speak the game's language.
The combination of graphics, (often) humour, and a point-and-click interface became a roaring success. A list of all text adventure titles ever written would fill a few pages, but a list of all point-and-click adventure titles ever made would fill a volume. In the days when PC graphics cards were all VESA-compatible with, on average, 4 MB of video RAM, a "computer game" was usually either a simple arcade game, a primitive RPG or a point-and-click adventure.