Keeping it simple, a camera sensor can be thought of as an array of very small solar panels.
The full description is more complicated. A 12 megapixel camera has 12 million pixels, or 12 million photosites. Each photosite is a very small cavity that traps photons while a picture is being taken. The camera then counts how many photons landed in each cavity and passes this information to the processor, which assigns each photosite a lightness value based on that count.
For example, if a photosite caught 0 photons, the assigned lightness would be pure black; if it caught 100,000, it would be very close to pure white.
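A minimal sketch of that mapping in Python, assuming a linear response and a hypothetical saturation point of 100,000 photons (real sensors apply gamma curves and noise handling that are ignored here):

```python
def photons_to_lightness(photon_count, full_well=100_000, max_value=255):
    """Map a photosite's photon count to an 8-bit lightness value.

    full_well is a hypothetical saturation point: 0 photons maps to 0
    (pure black), full_well or more photons maps to 255 (pure white).
    """
    fraction = min(photon_count, full_well) / full_well
    return round(fraction * max_value)

print(photons_to_lightness(0))        # 0   -> pure black
print(photons_to_lightness(100_000))  # 255 -> pure white
print(photons_to_lightness(50_000))   # 128 -> a middle grey
```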
However, this process only tells the camera how light or dark each point is; it says nothing about color. To capture color, a red, green, or blue filter is placed over each photosite so that it collects light of only that one color. If a certain photosite collects only red, the photosites around it collect green and blue. This also means that a 12 megapixel camera effectively behaves like a 4 megapixel one, because each photosite can only pick up about a third of the light available.
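A sketch of this color filter arrangement, assuming the common 2x2 tile with two greens (which the next section also uses) and NumPy for the arrays; the function names here are made up for illustration:

```python
import numpy as np

# Hypothetical 2x2 filter tile: R G / G B (green appears twice per tile).
BAYER_TILE = np.array([["R", "G"],
                       ["G", "B"]])

def bayer_pattern(height, width):
    """Return, for each photosite, which single color its filter passes.
    Assumes height and width are even."""
    return np.tile(BAYER_TILE, (height // 2, width // 2))

def mosaic(rgb_image):
    """Simulate the sensor: keep only the filtered channel at each photosite,
    producing a single-valued 'raw' reading per photosite."""
    h, w, _ = rgb_image.shape
    pattern = bayer_pattern(h, w)
    raw = np.zeros((h, w), dtype=rgb_image.dtype)
    for channel, name in enumerate("RGB"):
        mask = pattern == name
        raw[mask] = rgb_image[..., channel][mask]
    return raw
```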
The next step is called Bayer demosaicing, and it is the process that actually turns the individual red, green, and blue values into a full range of colors. Basically, the camera takes four photosites in a 2x2 array (two green, one red, and one blue) and treats them as a single "full color" pixel. These 2x2 arrays are assumed to overlap with one another, so color information is taken from the points where they overlap, which lets the camera artificially reconstruct extra resolution. (This is why a 12 megapixel camera is allowed to market itself as 12 megapixels, even though it only has about 4 megapixels of true color resolution.)
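A toy sketch of the overlapping 2x2 idea, reusing the hypothetical `bayer_pattern` array from the previous snippet; real demosaicing algorithms use much more sophisticated interpolation than this simple per-window averaging:

```python
import numpy as np

def demosaic_2x2(raw, pattern):
    """Very simplified demosaic: slide a 2x2 window over the raw mosaic and,
    at each position, build one full-color pixel from the red value, the blue
    value, and the average of the two green values inside that window.
    Overlapping windows give (h-1) x (w-1) output pixels, which is how
    resolution close to the full photosite count is reconstructed."""
    h, w = raw.shape
    out = np.zeros((h - 1, w - 1, 3), dtype=float)
    for y in range(h - 1):
        for x in range(w - 1):
            window = raw[y:y + 2, x:x + 2].astype(float)
            names = pattern[y:y + 2, x:x + 2]
            out[y, x, 0] = window[names == "R"].mean()  # one red sample
            out[y, x, 1] = window[names == "G"].mean()  # average of two greens
            out[y, x, 2] = window[names == "B"].mean()  # one blue sample
    return out
```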
This color information is measured on a scale of 0-255 for each color (for a basic camera). Based on this scale and the Bayer demosaicing, each pixel is assigned a red, green, and blue number, and the resulting triple is compared against a database of colors. For example, a certain pixel could end up with (R, G, B) = (255, 215, 0), which would translate to the color gold.
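A small illustration of that lookup; the color table below is a made-up stand-in for the database mentioned above, not an actual camera's color list:

```python
# Tiny stand-in "database" of named colors.
COLOR_NAMES = {
    (0, 0, 0): "black",
    (255, 255, 255): "white",
    (255, 0, 0): "red",
    (255, 215, 0): "gold",
}

def nearest_color_name(rgb):
    """Return the entry closest to the given (R, G, B) triple,
    using squared Euclidean distance in RGB space."""
    return min(COLOR_NAMES,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(c, rgb)))

print(COLOR_NAMES[nearest_color_name((255, 215, 0))])  # gold
```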