Standardized, repeatable testing is central to all 1,500-plus reviews we publish annually. Here’s how we test every desktop PC that hits the bench at PC Labs.
Desktop PCs were the first computer category PCMag covered in its infancy, and they’re still one of the most important. (They’re in our very name, after all.) We’ve been testing them with the same tried-and-true care for more than 40 years, starting with the establishment of PC Labs in 1984: We compare each system with others in its category on price, features, and design, and we run hands-on, repeatable performance tests so we can make smart comparisons across a broad range of competing products.
To evaluate performance, we use an array of benchmark software, real-world applications, and games, carefully chosen to highlight the strengths and weaknesses of a desktop PC’s mix of components. That evaluation ranges from the processor and the memory subsystem to the machine’s storage hardware and graphics silicon. We test only sale-ready production units with the latest updates and drivers available. (Pre-production or prototype systems may appear in our hands-on or preview stories, but not reviews with benchmark results and star ratings.)
We regularly evaluate new benchmark solutions as they hit the market and overhaul our testing procedures to keep pace with the latest technologies. In late 2024, we rolled out a new suite of benchmarks. The downside of changing our benchmarks is that it resets the database of tested PCs that we can use for comparisons in reviews. (That said, at the time, we retested every recent desktop we could lay our hands on to build the database back up.) The upside is that the new tests give us more current, accurate comparisons that will improve as we add more and more data.
Our desktop benchmark testing focuses on three broad areas of performance: general productivity, content creation, and graphics rendering. We also add specific tests to measure the capabilities of gaming PCs and desktop workstations. For all-in-one PCs, we add the same display benchmarks that we perform when testing laptops. Here’s a breakdown of each.
Productivity Tests
Our suite of productivity benchmark tests simulates the broadest, most popular use cases of computers, such as writing, editing, information management, and multimedia communication. With these benchmarks, we also stress-test CPUs to cover performance with applications that run better with more (and more powerful) cores.
PCMark 10
Our first (and arguably most important) benchmark test is UL’s PCMark 10. This wide-ranging suite simulates various Windows programs to give an overall performance score for office workflows. The tasks involved include such everyday staples as word processing, web browsing, videoconferencing, and spreadsheet analysis.
We run the primary PCMark 10 test (not the Express or Extended versions), which yields a proprietary numeric score. Scores in the 4,000-to-5,000-point range or above indicate excellent productivity for everyday Microsoft Office or Google Workspace tasks.
The PCMark 10 test results let us compare systems’ relative performance for everyday tasks. (Large organizations also use PCMark 10 to gauge how well potential new hardware handles workloads compared with their existing installed hardware.) Remember that PCMark 10 results, like those from most of our benchmark tests here, are sensitive to the specific configuration of the PC running the benchmark. Changing key components will change the score.
PCMark 10 Full System Drive Storage Test
We also run PCMark 10’s Full System Drive storage subtest, which measures the program load time and the throughput of the desktop’s boot drive. Nowadays, that is almost always a solid-state drive rather than a spinning hard drive.
Like the productivity test, the PCMark 10 Storage test delivers a numeric score, with higher numbers indicating quicker response.
The benchmark aims to factor in lower-end Serial ATA bus architectures and higher-end PCI Express/NVMe ones alike, quantifying the real-world performance differences attributable to these different drive types. (An earlier version of the test tended not to differentiate much between various SSD implementations.)
Cinebench 2024
Maxon’s Cinebench is a component-specific rendering test that uses the company’s Redshift engine to render a complex scene on the CPU or GPU. We run the CPU version of the test in a multi-core benchmark that works across all of a processor’s cores and threads—the more powerful the chip, the higher the score—and in a single-core variant. Cinebench’s multi-core test scales well with more cores and threads and higher clock speeds. And because the latest version of the test is available for x86, Arm, and Apple Silicon, we can use the same test on most hardware platforms and compare the numbers directly.
Cinebench is a raw test of a PC’s number-crunching ability, measured via a computer-aided design and 3D rendering task. The score reflects how well a desktop will handle processor-intensive workloads. For the number it kicks back, higher is better.
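If you want to script comparable runs yourself, Cinebench can also be launched from the command line. The sketch below is a rough illustration only: the install path is a placeholder, and the g_Cinebench* switch names are assumptions drawn from Maxon’s published options, so verify them against your copy of Cinebench 2024.

```python
# Rough sketch: kick off Cinebench 2024 CPU runs from a script.
# The executable path is a placeholder, and the g_Cinebench* switches are
# assumptions to verify against Maxon's documentation for your version.
import subprocess

CINEBENCH = r"C:\Program Files\Maxon Cinebench 2024\Cinebench.exe"  # placeholder path

# Multi-core run across all cores and threads (assumed switch name).
subprocess.run([CINEBENCH, "g_CinebenchCpuXTest=true"], check=True)

# Single-core run (assumed switch name).
subprocess.run([CINEBENCH, "g_CinebenchCpu1Test=true"], check=True)
```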
Geekbench 6.3 Pro
Primate Labs’ Geekbench is another processor workout. It runs a series of CPU workloads designed to simulate real-world applications, such as PDF rendering, speech recognition, and machine learning. We run Geekbench 6.3 Pro, which was the latest Pro version of the test when we switched over to our new suite of benchmarks.
We record Geekbench’s Multi-Core and Single-Core scores. (Higher numbers are better.) Geekbench is especially handy because it has versions for many platforms (including Apple’s macOS and iOS, and Qualcomm’s Snapdragon X processors), enabling valuable cross-platform comparisons.
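Geekbench Pro also ships with a command-line tool, which is handy for scripted runs. Here’s a minimal sketch, assuming a geekbench6 binary is on the PATH; the flags and the JSON field names are assumptions to verify against Primate Labs’ documentation, not a transcript of our own process.

```python
# Minimal sketch: launch a Geekbench 6 CPU run and pull the Single-Core and
# Multi-Core scores from its JSON export. Assumes the geekbench6 binary is on
# the PATH; flag names and JSON fields are assumptions to verify against
# Primate Labs' documentation.
import json
import subprocess

RESULTS = "geekbench_results.json"

subprocess.run(
    ["geekbench6", "--cpu", "--export-json", RESULTS],
    check=True,
)

with open(RESULTS, "r", encoding="utf-8") as fh:
    data = json.load(fh)

# Field names here are illustrative; inspect the exported JSON to confirm.
print("Single-Core:", data.get("single_core_score"))
print("Multi-Core:", data.get("multi_core_score"))
```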
HandBrake 1.8
Video-file transcoding is one of the most demanding tasks for a PC, and we test it with HandBrake, a free, open-source video transcoder for converting multimedia files to different resolutions and formats. We record the time HandBrake takes, rounded to the nearest minute, to convert a 12-minute 4K H.264 video file (the Blender Foundation movie Tears of Steel) to a more compact 1080p version. We use the software’s Fast 1080p30 preset for this conversion.
This benchmark is primarily a CPU test. Like Cinebench, it scales well with more cores and threads, and it rewards systems with robust thermals that can handle heavy, sustained processing loads over several minutes. And with compatibility across x86, Arm, and Apple Silicon, we can use this test to compare Windows, Windows on Arm, and macOS systems directly. Because this is a time-to-completion test, lower times are better.
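Readers who want to try a similar time-to-completion measurement at home can script it around HandBrake’s command-line version. Here’s a minimal sketch, assuming HandBrakeCLI is installed and on the PATH; the source file name is a placeholder, not the exact clip we use.

```python
# Minimal sketch: time a HandBrake transcode the way a time-to-completion
# benchmark would. Assumes HandBrakeCLI is installed and on the PATH;
# "tears_of_steel_4k.mp4" is a placeholder for whatever 4K H.264 clip you use.
import subprocess
import time

SOURCE = "tears_of_steel_4k.mp4"    # placeholder 4K H.264 source file
OUTPUT = "tears_of_steel_1080p.mp4"

start = time.perf_counter()
subprocess.run(
    [
        "HandBrakeCLI",
        "-i", SOURCE,
        "-o", OUTPUT,
        "--preset", "Fast 1080p30",  # the preset named above
    ],
    check=True,
)
elapsed = time.perf_counter() - start
print(f"Transcode completed in {elapsed / 60:.1f} minutes (lower is better)")
```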
Content Creation Tests
Content creation testing overlaps with general productivity, but it’s also a distinct portion of our testing. We use the last three of these tests mainly for machines equipped with discrete GPUs, built for tasks like image editing, video work, and rendering. For these more demanding use cases, we use a handful of tests built around popular content creation tools, simulating the tasks that stress a system the most.