The age of the modern smartphone, many would argue, began in 2007 with the launch of the original iPhone. After all, that device had the slab screen with a full touch surface instead of keys, cellular, Wi-Fi, and Bluetooth connectivity, access (although terribly slow) to the Internet — and a camera.
Ever since, those attributes have defined all smartphones, save for one common smartphone capability that didn’t arrive until the iPhone 3G in 2008: the App Store. Apps, together with the aforementioned hardware features, define the modern smartphone.
But of all the capabilities and components of these amazing and ubiquitous computing devices, it is the smartphone camera that has seen the most extraordinary evolution. Phone cameras made their first appearance during the era when smartphones started becoming practical, everyday devices. Then, these cameras advanced in an App Store-centric world where companies like Blackmagic Design could create camera apps that redefine how a smartphone camera works. Now, artificial intelligence (AI) and machine learning (ML) are changing the very nature of what a camera can do.
Identifying the first anything is always a tricky undertaking. The very first device labeled as a smartphone (Ericsson called it a “Smart Phone”) was the Ericsson GS88 from 1997. Only around 200 were made, and it was mostly a personal digital assistant. It most definitely did not have a camera.
But before the modern smartphones, phones with some modern smartphone attributes were introduced. The best known, of course, was the BlackBerry. The earliest BlackBerry devices, the 850 and 857 models, weren’t even phones. They were pagers with email. It wasn’t until 2003 that the BlackBerry 6210, with its famous keyboard and integrated phone, was released. But even then, the BlackBerry didn’t have a camera.
Also: The best phones we tested in 2023, including foldables and budget picks
Sharp — the company that makes my microwave oven — introduced what can be considered the world’s first camera phone. Released in 2000, it was called the J-SH04, and was available exclusively in the Japanese market. With its 110,000-pixel sensor, the J-SH04 was capable of taking very low-res digital images — tiny, almost postage-stamp-sized images, with a 383×287 pixel resolution.
Samsung also lays claim to the title of “first phone with a built-in camera” — the SCH-V200 — also released in 2000. This handset had a 0.35-megapixel sensor and could take up to 20 photos of 640×480 pixels before you had to hook it up to a computer and download the images. Samsung also claims to have invented the selfie camera. In 2002, the company released the SCH-X590, a flip phone with a rotating camera.
This was where the concept of megapixels started to take hold. As sensors and storage improved, photos grew to contain more and more pixels; once the counts climbed into the millions, the “megapixel” marketing term was born. Fundamentally, the more pixels, the higher the resolution of the image (and the more tinkering you can do with it).
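If you're curious, the arithmetic behind the megapixel label fits in a few lines of Python. This is just a toy illustration of the math, not anything a phone runs; the example resolutions are the 640×480 images of that era and a typical 12MP photo from a current phone:

```python
# A megapixel is just width x height, in millions of pixels.
def megapixels(width, height):
    return width * height / 1_000_000

print(megapixels(640, 480))    # 0.3072 MP: early camera phone territory
print(megapixels(4032, 3024))  # 12.192768 MP: a typical modern phone photo
```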
Another phone — and my personal device for four years — was the Palm Treo 600, released in 2003. This device did a lot, including supporting a camera capable of taking 640×480 resolution images. The Treo didn’t have Wi-Fi or Bluetooth, limiting its connectivity to a cable connected to the computer for image downloads. Its big claim to fame was that it could run any of the thousands of PalmOS apps that were available for download. Installing those apps, however, also involved connecting a cable to a computer.
2008: The birth of the modern smartphone
Released on June 29, 2007, the iPhone was explosive. Many of us remember the lines of people waiting to get their first phone. I sat those lines out, happy with my Treo. While the first iPhone had nearly all of the characteristics of a modern smartphone, including a 2.0MP rear-facing camera, the only apps it allowed were crude customized web pages. My Treo had far better native PalmOS apps.
That first-generation iPhone was not, in my opinion, the first true modern smartphone. For that, apps needed to be installed directly on the handset. When Apple’s iPhone 3G hit the market in July 2008 — along with the App Store — the world was never the same. Smartphone apps enabled billions of non-technical people to customize their phones, making it possible to install new software with a single tap.
More specifically, apps can amplify camera functionality. With filter apps, users can easily add special effects or painterly overlays to their images. Apps also paved the way for social media juggernauts like Instagram and TikTok. Social networking became the force it is today because users could capture photos and videos, then share them with friends, family, and the whole world with the tap of a screen. With apps, photos and videos can go anywhere, instantly.
For serious photographers and videographers, apps can enhance phones’ cameras. Blackmagic Design’s Camera app, for example, lets photo-savvy users customize settings like frame rate, shutter angle, white balance, and ISO — just the way professional photographers do.
The full-sized screen, high-speed Internet connectivity, fairly good to excellent cameras, and ability to completely customize phones with apps empowered people worldwide and put so much more than a little computer in their pockets. Everyone now has all this amazing power and flexibility, at all times.
Also: The best camera phones: Capture crystal-clear photos, videos, and selfies
Personally, the iPhone 3G was my first iPhone (there was no iPhone 2 or 3). I bought it and an iMac a few weeks after the phone was released, and went on to create 40 iPhone apps. That iPhone 3G didn’t improve the camera hardware by much. That wouldn’t happen until the iPhone 3GS, which jumped from a 2.0- to a 3.0MP camera and started to record 30 frames-per-second video.
Over in the Android world, the first phone was the HTC Dream, also marketed as the T-Mobile G1. Released in October 2008, it was noteworthy for a display that slid open to reveal a BlackBerry-style keyboard. This device came with a rear-facing 3.15MP camera.
With the iPhone 3G and the App Store, along with the first Android phone, it’s fair to say that 2008 was the first year of the modern smartphone era.
2010: Smartphones meet narcissism, a match made in heaven
Smartphones have evolved with a cadence we’re all familiar with. Each year, new capabilities have been added and features improved. Cameras evolved from 3.0MP units to 48-50MP monsters like those in the iPhone 15 Pro Max, Google Pixel 8, OnePlus 11, and Asus ROG Phone 8.
Of particular note is the Sony Xperia smartphone. While all of the higher-end phones have exceptional camera systems, Sony is the only company that makes pro and prosumer cameras in addition to smartphones. Sony’s A7 series (I like the Sony A7 IV) is one of the most well-respected pro-level cameras in use, and I use the Sony ZV-E10 camera constantly in the video studio and for product shots here at ZDNET.
Also: How the iPhone 15 Pro Max challenges mirrorless cameras: We compare price and performance
Another slight jump came in 2010 when Apple introduced the iPhone 4 and HTC introduced the Android-based EVO 4G. Both of these featured front-facing cameras, suitable for taking selfies. Neither had much to write home about in terms of resolution, but front-facing cameras also got better over the years.
Of course, as storage has improved both in capacity and speed, it’s possible to store even 8K video at 120 frames per second on SSDs connected to cameras over fast USB-C ports. That makes for a huge amount of information to be captured and stored, and lots of devices today handle the requirements with ease.
Then there’s the Samsung Galaxy S23 Ultra, which features a ludicrously over-the-top 200MP main camera. That’s roughly 16,384 x 12,288 pixels, for those who can even picture such a thing. Each highly-compressed JPEG takes 20-40MB, but an uncompressed RAW image takes 100MB per image or more. That first iPhone 3G shipped with just 128MB of RAM and as little as 8GB of storage, which would have held only about 80 of those uncompressed RAW files.
Also: Storage improvements have outperformed Moore’s Law by a factor of 800%
Many smartphones today capture 8K video directly into phone storage: the Samsung Galaxy S23 (8K was supported as far back as the S20), the Asus ZenFone 9, and the OnePlus 11 can all record 8K video on the device itself. (Apple’s iPhone 15 Pro Max tops out at 4K, though it can record ProRes video straight to an external SSD.)
Over time, all the increases in storage capacity, processor speed, battery life, and display resolution were accompanied by improvements to the software inside the phones, with vendors adding all sorts of smarts to their camera applications.
2017: The start of the AI/ML smartphone era
It’s difficult to nail down exactly when machine learning found its way into smartphones, but a good case can be made for 2017. That year, Google released the Pixel 2, which got a portrait mode that blurred backgrounds, and improved processing for HDR images.
Apple, too, was focusing on portrait mode photography in 2017, introducing the iPhone 8, 8 Plus, and iPhone X. Each of these devices included Apple’s A11 Bionic chip with its Neural Engine — silicon dedicated to machine learning tasks.
Overall, these initial machine learning capabilities enhanced overall photo processing, improving aspects like auto-focus, exposure, color balancing, and noise reduction. The integration of machine learning into the Pixel and iPhone’s camera systems marked a significant step forward in the quality and capabilities of smartphone photography.
AI and machine learning in today’s smartphones
If you think about the stages of photography, we had light and shadow, then we had fixed images stored using chemicals (the film stage), then we had magnetic media (still analog), and then the rise of digital cameras and smartphones. In each of those stages, the one common factor was that the camera was meant as a capture device. It didn’t do any of the art.
But all that is changing with modern smartphones. By embedding considerable machine learning technology inside these devices, the camera itself becomes a production partner in the creation of exceptional images and quality video.
Also: The best vlogging cameras you can buy
I asked Bob Caniglia, Blackmagic’s director of sales operations, about smartphone camera evolution. Blackmagic makes some key tools for managing the video production process, as well as some very slick cinema-grade cameras. Last year, Blackmagic introduced its Blackmagic Camera app, which takes the iPhone’s camera and gives it superpowers.
“Until recently,” Caniglia told me, “the discussion was mostly around how a smartphone’s camera was limited by being in such a small physical device. And there is no doubt that there are still big differences between smartphones and larger professional cameras.”
“But the conversation has moved to the different ways cameras like the iPhone can be used,” he continued. “I think AI and machine learning features — like the iPhone 15’s scene, skin and sky segmentation and detection, periscope zoom lens, and Portrait Mode — have opened up the possibilities of how smartphones can be used by everyone.”
Let’s now explore the power that machine learning brings to smartphones. Specifically, I’ll talk about the machine learning magic incorporated into flagship phones like the iPhone 15 Pro Max, the Google Pixel 8, the Samsung Galaxy S23, and the OnePlus 11.
1. Image quality
Smartphones are now capable of making substantial enhancements to the quality of images as they are captured in the camera. Here are three examples of machine learning in use in the previously listed flagship phones.
Image processing and enhancement: Convolutional neural networks use a mathematical operation called convolution which calculates pixel values based on a sliding filter, helping the algorithm identify specific features like edges, textures, and shapes.
This then helps the machine learning algorithms to analyze and adjust parameters like exposure, contrast, and color balance to enhance quality. This is particularly useful in challenging lighting conditions; it’s how smartphones can take low-light and high-glare photos that previously were almost impossible to capture.
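To make that "sliding filter" idea concrete, here's a deliberately tiny pure-Python sketch of a single convolution pass. Real camera pipelines run thousands of learned kernels on dedicated silicon; this toy slides one classic hand-written edge-detection kernel over a grid of pixel values:

```python
# A 3x3 edge-detection kernel (a discrete Laplacian): flat regions
# produce zero, while sharp changes in brightness produce a strong response.
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

def convolve(image, kernel):
    """Slide a 3x3 kernel over every interior pixel of a 2D grid,
    recomputing each pixel from its neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y - 1][x - 1] = acc
    return out

flat = [[5] * 4 for _ in range(4)]
print(convolve(flat, EDGE_KERNEL))  # all zeros: no edges to detect
```

A uniform patch yields zero everywhere, while a bright spot against a dark background yields a large value at its location; this is exactly how early CNN layers pick out edges, textures, and shapes.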
Also: Meta’s AI luminary LeCun explores deep learning’s energy frontier
Low-light photography and night mode: Speaking of tough lighting conditions, machine learning provides a powerful assist in low-light photography, where it helps in noise reduction, detail enhancement, and color accuracy. It does this using neural network technology to process multiple exposures, merging them into a single image while enhancing detail and reducing noise. Of course, decisions about what detail to enhance and what noise to reduce are where the AI comes into play.
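Here's a toy Python illustration of why merging multiple exposures helps with noise. This is nothing like a production night mode (which aligns frames and makes learned decisions about detail), but it shows the core statistical trick: random sensor noise tends to cancel when you average many captures of the same scene.

```python
import random

def denoise_by_stacking(frames):
    """Average multiple noisy captures of the same scene, pixel by pixel:
    random sensor noise cancels out while the underlying signal stays put."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

random.seed(42)
scene = [50, 120, 200, 80]  # the "true" low-light scene, four pixels
# Sixteen captures of that scene, each with simulated gaussian sensor noise
frames = [[v + random.gauss(0, 10) for v in scene] for _ in range(16)]
stacked = denoise_by_stacking(frames)
# `stacked` now sits much closer to `scene` than any single noisy frame
```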
HDR processing: High dynamic range (HDR) processing helps balance the dark and bright areas of an image for an improved dynamic range. Algorithms dynamically adjust the exposure of different regions in a photo, merging multiple exposures for a balanced high dynamic range image, keeping the visual fidelity of the image while allowing for blacker blacks, whiter whites, and other darker and lighter colors to better reflect what the photographer originally aimed to capture.
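A bare-bones Python sketch of the merging step, under a big simplifying assumption: each bracketed exposure's pixel is weighted simply by how close it is to a well-exposed mid-gray. Real HDR pipelines use far more sophisticated (and learned) weighting, but the structure is the same:

```python
def fuse_exposures(exposures, mid=128):
    """Merge bracketed exposures pixel by pixel, weighting each frame
    by how well-exposed (close to mid-gray) its pixel is."""
    fused = []
    for px in zip(*exposures):
        weights = [1.0 / (1.0 + abs(v - mid)) for v in px]
        total = sum(weights)
        fused.append(round(sum(w * v for w, v in zip(weights, px)) / total))
    return fused

# Three bracketed "frames" of the same 4-pixel scene (0-255 values):
dark   = [10, 5, 40, 0]       # underexposed: shadows crushed
normal = [120, 60, 200, 15]
bright = [250, 180, 255, 90]  # overexposed: highlights clipped
fused = fuse_exposures([dark, normal, bright])
# Each fused pixel leans toward whichever frame exposed it best
```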
2. Object knowledge
During the film era, some analog techniques were possible for image improvement. Now, AI and ML are critical when it comes to having intelligence about what’s in a scene. Here are some of the powerful capabilities built into those smartphones I discussed earlier.
Scene and object recognition: Smartphones can recognize various scenes — landscapes, portraits, or low-light settings — and objects within an image. Based on this recognition, the camera can optimize settings for the best shot. Deep learning algorithms utilizing convolutional neural networks have been trained on vast datasets to accurately recognize and categorize different scenes and objects in images. Often, heavily optimized versions of the results of that training are embedded either in the camera apps or even in the phones’ chipsets.
Portrait mode and bokeh effect: Depth estimation models, often using ML techniques like semantic segmentation, can create a depth map of the scene, differentiating the subject from the background. This is how we get portrait mode, where the subject is in focus while the phone creates a seemingly artistically blurred background.
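As a toy Python illustration of that idea: given a depth map, keep "near" pixels sharp and blur "far" ones. A real portrait mode uses learned segmentation and a far prettier lens-style blur than the crude box blur below, but the depth-driven structure is the same:

```python
def portrait_mode(image, depth, threshold=1.5):
    """Keep 'near' pixels (the subject) sharp; box-blur 'far' pixels
    (the background), driven entirely by the depth map."""
    h, w = len(image), len(image[0])

    def blur(y, x):
        # Simple 3x3 box blur, clamped at the image borders.
        vals = [image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))]
        return sum(vals) // len(vals)

    return [[image[y][x] if depth[y][x] < threshold else blur(y, x)
             for x in range(w)] for y in range(h)]

# A 3x3 "photo": bright subject pixel in the middle, darker background.
image = [[100, 100, 100],
         [100, 255, 100],
         [100, 100, 100]]
# Depth map: the center is close to the camera, everything else is far.
depth = [[9, 9, 9],
         [9, 0.5, 9],
         [9, 9, 9]]
result = portrait_mode(image, depth)
# The subject pixel survives untouched; the background gets averaged away
```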
Face detection and beautification: Computer vision algorithms can detect faces in an image and apply subtle enhancements, like skin smoothing or light adjusting, to improve portraits. This process is often done based on a library of learned aesthetic preferences.
Also: AI safety and bias: Untangling the complex chain of AI training
Some early facial recognition applications demonstrated bias. They were using very limited and, therefore, often biased training data. This is less of a problem today, as vendors are using vastly larger and more inclusive training sets. We need to continue to fight against bias in our AI.
AI-powered filters and effects: Some of the earliest smartphone photo app features were creative filters and effects. Initially, these filters were mostly algorithmic, based on a programmer’s code. Over time, ML techniques like generative adversarial networks were applied.
This technique pits a “generator” algorithm against a “discriminator” algorithm process, where the discriminator provides feedback to the generator to drive improvement. The resulting learned effects, or style transfer processes, mimic the styles of various artists and techniques. This allows users to apply complex artistic styles to their photos, and for the resulting images to appear stylistically relevant. It also has resulted in lawsuits.
Also: Generative AI: Just don’t call it an ‘artist’
3. Quality-of-life enhancements
Smartphone cameras are not only taking better and better pictures and videos, but they’re also becoming easier to use at the same time. Here are a few quality-of-life enhancements that make smartphone cameras more helpful to their users.
Video stabilization: Ever hear the phrase, “We’ll fix it in post”? That’s the process of repairing a film or video after it leaves filming, with the editor using a combination of smart tools and skills to create a good clip. But now, ML models can fix shaky video dynamically, right in the camera.
Also: This new camera embeds authenticity details in photos, but it doesn’t come cheap
ML models in smartphones do this by analyzing motion patterns frame-by-frame to predict and correct camera shake and motion blur, resulting in a smoother clip. Usually, this “consumes” some of the edges of the video frame, creating a cropped but much more stable image.
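Here's a heavily simplified Python sketch of that align-then-crop idea, using one-dimensional "frames" and a brute-force shift search in place of a learned motion model. Note how stabilizing consumes the edges: the output frames are narrower than the input.

```python
def estimate_shift(ref, frame, max_shift=3):
    """Brute-force search for the horizontal shift that best aligns
    `frame` with `ref` (mean absolute difference over the overlap)."""
    best, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], frame[i + s])
                 for i in range(len(ref)) if 0 <= i + s < len(frame)]
        err = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best

def stabilize(frames, max_shift=3):
    """Align every frame to the first one, then crop the borders that
    the shifting 'consumes', just as phone stabilization does."""
    ref = frames[0]
    aligned = []
    for f in frames:
        s = estimate_shift(ref, f, max_shift)
        aligned.append([f[i + s] for i in range(max_shift, len(f) - max_shift)])
    return aligned

# A 1D "scene" plus two jittered copies (camera shake left and right):
scene = [0, 0, 9, 9, 0, 0, 1, 1, 0, 0]
shaky = [scene, scene[1:] + [0], [0] + scene[:-1]]
print(stabilize(shaky))  # all three cropped frames show the same pixels
```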
Autofocus and tracking: Before machine learning, lenses with autofocus used distance sensors and calculations to determine the focus point of the lens. But now, ML has improved autofocus performance, making it substantially faster and more accurate.
Predictive algorithms and object detection models are often used for real-time tracking of moving subjects, maintaining a focus lock while the subjects (or the camera operators) move.
Automatically adapting to user preferences: Some smartphone cameras use reinforcement learning techniques to adapt to user preferences over time, and automatically adjust settings or suggest modes based on past usage.
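One classic way to sketch that kind of preference learning is an epsilon-greedy bandit. The toy Python below is my own illustration, not any vendor's actual approach: the suggester mostly recommends the mode with the best track record, occasionally explores others, and learns from whether the user keeps or overrides each suggestion.

```python
import random

class ModeSuggester:
    """Toy epsilon-greedy bandit: learns which camera mode a user
    prefers from whether they keep or override each suggestion."""

    def __init__(self, modes, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {m: 0 for m in modes}
        self.values = {m: 0.0 for m in modes}  # running average reward

    def suggest(self):
        if random.random() < self.epsilon:                # explore
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)      # exploit

    def feedback(self, mode, kept):
        """kept=True means the user accepted the suggested mode."""
        self.counts[mode] += 1
        reward = 1.0 if kept else 0.0
        # Incrementally update the average reward for this mode
        self.values[mode] += (reward - self.values[mode]) / self.counts[mode]

# Simulate a user who always wants portrait mode:
random.seed(0)
suggester = ModeSuggester(["portrait", "night", "macro"])
for _ in range(200):
    mode = suggester.suggest()
    suggester.feedback(mode, kept=(mode == "portrait"))
# By now the suggester leads with portrait mode almost every time
```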
Also: You can now run Microsoft’s AI-powered Copilot as a free Android app
One thing that’s important to note: Generative AI is something that occurs outside of the camera.
As Blackmagic Design’s Caniglia said, “There’s been an incredible evolution of smartphone camera capabilities in comparison to just a couple of years ago. AI and machine learning, especially with the new iPhone 15, has been a big driver. A huge part of that is because Apple has focused on developing technologies that do more with the actual information captured by the camera’s sensor rather than a focus on creations of ‘faux images’ via generative AI.”
Looking to the future
We’ve been doing a tremendous amount of coverage of generative AI this past year. And every year, phone vendors introduce even more smartphone capabilities. So what does the future hold?
Also: Generative AI filled us with wonder in 2023 – but all magic comes with a price
The simple answer is: ever-increasing capacity and higher and higher quality images. That’s the trajectory smartphone machine learning has been on for the past decade or so.
But as I’ve started to explore VR with the Meta Quest 3 headset, I’m starting to think there’s another path for smartphones.
After Apple’s WWDC keynote last year, I wrote an analysis of the announcement of its Vision Pro XR headset. While I was fairly bullish on the overall idea, I derided its 3D camera playback capability:
Apple showed a ludicrous demo where a father, wearing a Vision Pro, used it to film a 3D “movie” of his kid’s birthday. But while that example was aspirationally wacky, capturing 3D video clips could prove hugely beneficial for training courses and other demonstrations, to be embedded inside of point-of-function applications.
Also: Can a claustrophobic guy with glasses learn to stop worrying and love Meta’s Quest 3?
I think I was wrong. I’ve been using the Meta Quest 3 for about a week and have looked at the Quest’s relatively rudimentary 3D home movies. There’s something there. It’s not like just watching a film. Once you enter VR, you really get the feels.
Beyond capturing personal memories, 3D VR camera capture has some enormous potential, especially once we move beyond heavy consoles on our faces and into clear glasses. I expect to see a ton of AI and ML applied to images and videos used in that context.
What are your thoughts about the future of photography and video, and how AI and ML can help? Let me know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter on Substack, and follow me on Twitter at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.