HiSilicon Kirin 970 - Android SoC Power & Performance Overview
by Andrei Frumusanu on January 22, 2018 9:15 AM EST
Today I would say that there are only two truly vertically integrated mobile OEMs with full control over their silicon: Apple and Huawei. Of the two, one could say Huawei is currently even more integrated thanks to its in-house modem development. Huawei’s semiconductor division, HiSilicon, has over the last several years been the one company that seems to have managed what the others haven’t: breaking into the high-end market with solutions that are competitive with the current leader in the business, Qualcomm.
I remember the Honor 6 with the newly branded Kirin 920 (earlier parts had no “halo” line-up name) as the first device with the company’s in-house SoC that we reviewed. The Kirin 920 and the following generation, the Kirin 930, suffered from immaturity, with problems such as a very power-hungry memory controller and a very disappointing camera processing pipeline (ISP/DSP). The Kirin 950 was in my opinion a turning point for HiSilicon: the product truly impressed and marked a clear step up in quality, catching many eyes in the semiconductor industry, including my own in the resulting review of the Huawei Mate 8.
Over the last several years we’ve seen a great amount of consolidation in the mobile semiconductor industry. Companies such as Texas Instruments, which were once key players, no longer offer mobile SoC products in their catalogue. We’ve seen companies such as Nvidia try and repeatedly fail at carving out meaningful market share. MediaTek has tried providing higher-end SoCs with the Helio X line-up with rather little success, to the point that the company has put development in that segment on hold to focus instead on higher-margin parts in the P-series.
Meanwhile even Samsung LSI, while having a relatively good product with its flagship Exynos series, still has not managed to win over the trust of the conglomerate’s own mobile division. Rather than using Exynos as an exclusive keystone component of the Galaxy series, Samsung has instead been dual-sourcing it along with Qualcomm’s Snapdragon SoCs. It’s therefore not hard to make the claim that producing competitive high-end SoCs and semiconductor components is a really hard business.
Last year’s Kirin 960 was a bit of a mixed bag: the SoC still delivered good improvements over the Kirin 950, but it was limited in what it could achieve against competing flagship SoCs from Samsung and Qualcomm, as both had a process node advantage. Huawei’s introduction of flagships with a new generation of SoCs in the fourth quarter is closer to Apple’s release time-frame than to the usual first-quarter launches we’ve come to expect from Qualcomm and Samsung.
As such, when pitting the Kirin against Snapdragon and Exynos SoCs, we’re looking at a product that’s more often than not late to the party in terms of introducing new technologies such as process node and IP. The Kirin 970 fits this profile: as a 10nm Cortex-A73 generation SoC, it lagged behind Qualcomm and Samsung in terms of process node, yet its release was too early in ARM’s schedule to adopt DynamIQ and the A75- and A55-based CPU cores for this cycle. That being said, the Kirin 970 enjoys a few months of technical feature parity with the Snapdragon 835 and Exynos 8895 before we see the new Snapdragon 845 and Exynos 9810 products later in the usual spring refresh cycle.
Nevertheless, today’s article focuses on the Kirin 970 and its improvements, and is also an opportunity to review the current state of SoCs powering Android devices.
|HiSilicon High-End Kirin SoC Lineup|
|SoC||Kirin 970||Kirin 960||Kirin 950/955|
|CPU||4x A73 @ 2.36GHz, 4x A53 @ 1.84GHz||4x A73 @ 2.36GHz, 4x A53 @ 1.84GHz||4x A72 @ 2.30/2.52GHz, 4x A53 @ 1.81GHz|
|Memory||4x 16-bit CH LPDDR4 @ 1833MHz||4x 16-bit CH LPDDR4 @ 1866MHz||LPDDR4 @ 1333MHz, 21.3GB/s|
|Interconnect||ARM CCI||ARM CCI-550||ARM CCI-400|
|Storage I/F||UFS 2.1||UFS 2.1||eMMC 5.0|
|ISP/Camera||Dual 14-bit ISP||Dual 14-bit ISP||Dual 14-bit ISP|
|2160p30 HEVC & H.264
Decode & Encode
Decode & Encode
|Integrated Modem||Kirin 970 Integrated LTE: DL = 1200Mbps (5x20MHz CA, 256-QAM), UL = 150Mbps (2x20MHz CA, 64-QAM)||Kirin 960 Integrated LTE: DL = 600Mbps (4x20MHz CA, 64-QAM), UL = 150Mbps (2x20MHz CA, 64-QAM)||Balong Integrated LTE: DL = 300Mbps (2x20MHz CA, 64-QAM), UL = 50Mbps (1x20MHz CA, 16-QAM)|
|Mfc. Process||TSMC 10nm||TSMC 16nm FFC||TSMC 16nm FF+|
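As a rough cross-check of the memory figures in the table, the 21.3GB/s value listed next to LPDDR4 @ 1333MHz can be reproduced with a short calculation. This is only a sketch under stated assumptions: the function name `peak_bandwidth_gbs` is made up for illustration, the quoted frequency is taken to be the memory clock with two transfers per clock (DDR), and a 64-bit total bus width is assumed for the Kirin 950/955, which the table itself does not show.

```python
def peak_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes a double-data-rate interface: two transfers per clock cycle.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# Kirin 950/955: 64-bit total bus width is an assumption, not a table value.
kirin_950 = peak_bandwidth_gbs(64, 1333)
# Kirin 960 and 970: 4 channels x 16 bits, as listed in the table.
kirin_960 = peak_bandwidth_gbs(4 * 16, 1866)
kirin_970 = peak_bandwidth_gbs(4 * 16, 1833)

for name, value in [("Kirin 950/955", kirin_950),
                    ("Kirin 960", kirin_960),
                    ("Kirin 970", kirin_970)]:
    print(f"{name}: {value:.1f} GB/s")
```

Under these assumptions the Kirin 950/955 result rounds to the 21.3GB/s quoted in the table, which suggests the same formula is a reasonable way to estimate the peak figures for the two newer SoCs.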
The Kirin 970 isn’t a major IP overhaul, as it continues to use the same CPU IP from ARM that was used in the Kirin 960. The new SoC doesn’t even improve the frequency of the CPU clusters: we still see the same 2.36GHz for the A73 cores and 1.84GHz for the A53 cores. When ARM originally launched the A73 we had seen optimistic targets of up to 2.8GHz on TSMC 10nm, but we seem to have largely missed that target, a sign of the ever-increasing difficulty of scaling frequency in mobile SoCs as the diminishing returns from process node updates become worse and worse.
The Kirin 970 does bring a major overhaul of the GPU configuration, as we see the first implementation of ARM’s Mali G72 in a 12-core configuration, a 50% increase in core count over the Kirin 960’s G71-MP8 setup. The new GPU runs at a much reduced frequency of 746MHz versus the 1033MHz of the Kirin 960. In Matt Humrick’s review of the Kirin 960 we saw some disastrous peak average power figures, with the Mali G71 outright exploding the thermal envelope of the Mate 9, so hopefully the architectural improvements of the new G72, alongside a wider and lower-clocked configuration and the new process node, will bring significant improvements over its predecessor.
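To put the wider-but-slower GPU configuration in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that theoretical throughput scales linearly with core count times clock and ignores any per-core architectural differences between the G71 and G72; the helper name `relative_throughput` is hypothetical.

```python
def relative_throughput(cores_new: int, mhz_new: float,
                        cores_old: int, mhz_old: float) -> float:
    """Ratio of (cores x clock) between two GPU configurations.

    Simplifying assumption: throughput is proportional to cores x clock.
    """
    return (cores_new * mhz_new) / (cores_old * mhz_old)


# Kirin 970: 12 cores @ 746MHz; Kirin 960: 8 cores @ 1033MHz (from the text).
ratio = relative_throughput(12, 746, 8, 1033)
clock_ratio = 746 / 1033

print(f"cores x clock ratio: {ratio:.3f}")
print(f"clock ratio:         {clock_ratio:.3f}")
```

On this simple model the new configuration offers only about 8% more raw cores-times-clock than the old one while running at roughly 72% of the frequency, which is consistent with the design aiming at efficiency rather than a large jump in peak throughput.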
The new modem in the Kirin 970 now implements 3GPP LTE Release 13 and supports downlink speeds of up to 1200Mbps thanks to up to 5x20MHz carrier aggregation with 256-QAM, making the new Kirin modem feature-equivalent to Qualcomm’s X20 modem that’ll be integrated in the Snapdragon 845.
The big story surrounding the Kirin 970 was the inclusion of a dedicated neural processing unit. The NPU, as HiSilicon decided to name it, is part of a new type and generation of specialised acceleration blocks that aim to offload “inferencing” of convolutional neural networks (CNNs). Many will have heard buzzwords such as artificial intelligence surrounding the topic, but the more accurate term is machine learning or deep learning. The hardware acceleration blocks, which go by various names at various companies, do not actually do any deep learning themselves; rather, they are there to speed up execution (inferencing) of neural network models, while the training of the models will remain something done either in the cloud or by other blocks in the SoC such as the GPU. It’s still early days, but we’ll have a proper look at the NPU in its dedicated section of the article.
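To make the distinction between inferencing and training concrete, the following is a minimal sketch of what a single inference step of a convolutional layer looks like: fixed, already-trained weights are applied to an input and nothing is learned. It is plain Python for illustration only, says nothing about how the NPU is actually programmed, and the names `conv2d_valid`, `image` and `kernel` are made up for this example.

```python
def conv2d_valid(image, kernel):
    """Apply one fixed kernel to a 2D input, followed by ReLU.

    Like most CNN frameworks, this computes a cross-correlation
    (the kernel is not flipped) over the "valid" region only.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(acc, 0))  # ReLU activation
        out.append(row)
    return out


# A 4x4 input with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Weights are fixed: in inference they come from earlier training.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = conv2d_valid(image, kernel)
print(features)
```

Inference is dominated by multiply-accumulate loops like the one above, repeated over many kernels and layers, which is the kind of regular arithmetic a dedicated block can accelerate. Training would additionally have to compute gradients and update `kernel`, which this sketch deliberately leaves out.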
SoC die shot image & labels courtesy of TechInsights Mate 10 teardown
As mentioned earlier, one of the bigger improvements of the Kirin 970 is the switch to the TSMC 10FF manufacturing node. While 10nm is supposed to be a long-lived node for Samsung’s foundry, where indeed we’ll see two full generations of SoCs produced on 10LPE and 10LPP, TSMC is taking a different approach and sees its own 10FF process as a short-lived stepping stone to the much-anticipated 7FF node, which is to be introduced later in 2018. As such, the only TSMC 10FF mobile products to date have been the low-volume MediaTek X30 and Apple A10X in summer and the high-volume Apple A11 and HiSilicon Kirin 970 in Q3-Q4, two to three quarters after Samsung had entered high-volume production of the Snapdragon 835 and Exynos 8895.
HiSilicon’s expectations of the new process node are rather conservative: an improvement of only 20% in efficiency at the same performance point for the apples-to-apples CPU clusters, below ARM’s earlier predictions of 30%. This rather meagre improvement in power is likely one of the reasons why HiSilicon decided not to increase the CPU clocks on the Kirin 970, instead focusing on bringing down power usage and lowering the TDP compared to the Kirin 960.
The SoC does enjoy a healthy die size shrink from 117.72mm² down to 96.72mm² even though the new SoC has 50% more GPU cores as well as new IP blocks such as the NPU. Our colleagues at TechInsights have published a detailed per-block size comparison between the Kirin 960 and Kirin 970 and we see a 30-38% decrease in block size for apples-to-apples IP. The Cortex-A73 quad-core cluster now comes in at only 5.66mm², a metric to keep in mind and in stark contrast to Apple which is investing twice as much silicon area in its dual-core big CPU cluster.
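The die-size figures above can be turned into percentages with a few lines of arithmetic. This is just a sketch using the numbers quoted in the text; the variable names are illustrative.

```python
kirin_960_mm2 = 117.72   # total die area, from the text
kirin_970_mm2 = 96.72    # total die area, from the text
a73_cluster_mm2 = 5.66   # Cortex-A73 quad-core cluster on the Kirin 970

# Overall die shrink from the Kirin 960 to the Kirin 970.
shrink = (kirin_960_mm2 - kirin_970_mm2) / kirin_960_mm2

# Share of the Kirin 970 die taken up by the big CPU cluster.
cluster_share = a73_cluster_mm2 / kirin_970_mm2

print(f"die shrink:        {shrink:.1%}")
print(f"A73 cluster share: {cluster_share:.1%}")
```

The overall shrink works out to roughly 18%, noticeably less than the 30-38% per-block reduction, which is what one would expect given that the newer die also has to fit four more GPU cores and the NPU. The big CPU cluster accounts for only around 6% of the Kirin 970’s area.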
lilmoe - Monday, January 22, 2018
Unfortunately, they're not "fully" vertical as of yet. They've been held back since the start by Qualcomm's platform, because of licensing and "other" issues that no one seems to be willing to explain. Like Andrei said, they use the lowest common denominator of both the Exynos and Snapdragon platforms, and that's almost always lower on the Snapdragons.
Where I disagree with Andrei, and others, is on the efficiency numbers and the type of workloads used to reach those results. Measuring efficiency at MAX CPU and GPU load is unrealistic, and frankly, misleading. Under no circumstance is there a smartphone workload that demands that kind of constant load from either the CPU or GPU. A better measure would be running an actual popular game for 30 minutes in airplane mode and measuring power consumption accordingly, or loading popular websites, using the native browser, and measuring power draw at set intervals for a set period of time (not even a benchmarking web application).
Again, these platforms are designed for actual, real world, modern smartphone workloads, usually running Android. They do NOT run workstation workloads and shouldn't be measured as such. Such notions, like Andrei has admitted, are what push OEMs to be "benchmark competitive", not "experience competitive". Apple is also guilty of this (proof is in the latest events, where their power delivery can't handle the SoC, or the SoC is designed well above sustainable TDP). I can't stress this enough. You just don't run SPEC and then measure "efficiency". It just doesn't work that way. There is no app out there that stresses a smartphone SoC this much, not even the leading game. As a matter of fact, there isn't an Android (or iPhone) game that saturates last year's flagship GPU (probably not even the year before).
We've reached a point of perfectly acceptable CPU and GPU performance for flagships running 1080p and 1440p resolution screens. Co-processors, such as the decoder, ISP, DSP and NPU, in addition to software optimization, are far, FAR more important at this time, and what Huawei has done with their NPU is very interesting and meaningful. Kudos to them. I just hope these co-processors are meant to improve the experience, not collect and process private user data in any form.
star-affinity - Monday, January 22, 2018
Just curious about your claims about Apple: so you think it's a design fault? I'm thinking that the problem arises only when the battery has been worn out, and a healthy battery won't have the problem of not supplying enough juice for the SoC.
lilmoe - Monday, January 22, 2018
Their batteries are too small, by design, so that's the first design flaw. But that still shouldn't warrant unexpected slowdowns within 12-18 months of normal usage; their SoCs are too power hungry at peak performance, and the constant bursts were taking their toll on the already smaller batteries that weren't protected with a proper power delivery system. It goes both ways.
Samus - Monday, January 22, 2018
Exactly this. Apple still uses 1500mAh batteries in 4.7" phones. When more than half the energy is depleted in a cell this small, the nominal voltage drops to 3.6-3.7V from the 3.9-4.0V peak. A sudden spike in demand for a cell hovering around 3.6V could cause it to hit the low-voltage cutoff, normally 3.4V for Li-Ion and 3.5V for Li-Polymer. To prevent damage to the chemistry, the internal power management will shut the phone down, or slow the phone down to prevent these voltage drops.
Apple designed their software to protect the hardware. It isn't necessarily a hardware problem, it's just an inherently flawed design. A larger battery that can sustain voltage drops, or even a capacitor, would help, but both take up "valuable space" according to Apple, like that headphone jack that was eliminated for no reason. A guy even successfully reinstalled a headphone jack in an iPhone 7 without losing any functionality... it was just a matter of relocating some components.
ZolaIII - Wednesday, January 24, 2018
Try the Dolphin emulator and you will see not only how stressed the GPU is but also how much more performance it needs.
Shadowfax_25 - Monday, January 22, 2018
"Rather than using Exynos as an exclusive keystone component of the Galaxy series, Samsung has instead been dual-sourcing it along with Qualcomm’s Snapdragon SoCs."
This is a bit untrue. It's well known that Qualcomm's CDMA patents are the stumbling block for Samsung. We'll probably see Exynos-based models in the US within the next two versions once Verizon phases out their CDMA network.
Andrei Frumusanu - Monday, January 22, 2018
Samsung has already introduced a CDMA-capable Exynos in the 7872 and also offers a standalone CDMA-capable modem (S359). Two years ago when I talked to SLSI's VP, they openly said that introducing CDMA is not a technical issue and that it would take them two years to bring it to market once they decided they needed to do so (hey, maybe I was the catalyst!), but they didn't clarify the reason why it wasn't done earlier. Of course the whole topic is a hot mess and we can only speculate as outsiders.
KarlKastor - Thursday, January 25, 2018
Uh, how many devices have shipped yet with the 7872?
Why do you think they came with an MDM9635 in the Galaxy S6 in all CDMA2000 regions? In all other regions they used their integrated Shannon modem.
The other option is to use a Snapdragon SoC with a QC modem. They also opt for this alternative, but in the S6 they didn't want to use the crappy Snapdragon 810.
It is possible that Qualcomm has now dropped its policy concerning CDMA2000 because it is obsolete.
jjj - Monday, January 22, 2018
Don't forget that Qualcomm is a foundry customer for Samsung and that could be why they still use it.
Also, cost is a major factor when it comes to vertical integration; at sufficient scale, integration can be much cheaper.
What Huawei isn't doing is prioritizing the user experience and using their high-end SoCs in lower-end devices too, and that's a huge mistake. They have much lower costs than others in the high end, and gaining scale by using these SoCs in lower-end devices would decrease costs further. It's an opportunity for much more meaningful differentiation that they fail to exploit. Granted, the upside is being reduced nowadays by upper mid-range SoCs with big cores, and Huawei might be forced into using their high-end SoCs more as the competition between Qualcomm and MediaTek is rather ferocious and the upper mid-range becomes better and better.
Got to wonder about A75 and the clocks it arrives at ... While at it, I hope that maybe you take a close look at the SD670 when it arrives as it seems it will slightly beat SD835 in CPU perf.
On the GPU side, the biggest problem is the lack of real world tests. In PC we have that and we buy what we need, in mobile somehow being anything but first is a disaster and that's nuts. Not everybody needs a Ferrari but mobile reviews are trying to sell one to everybody.
HStewart - Monday, January 22, 2018
This could be a good example of why Windows 10 for ARM will fail: it only works on Qualcomm CPUs, which could explain why Samsung created Intel-based Windows tablets.
I do believe that ARM, and especially Samsung, has a good market in phones and tablets. I love my Samsung Tab S3 but I also love my Samsung TabPro S; both have different purposes.