1. Introduction
The proton density fat fraction (PDFF) has become an indispensable imaging biomarker in the diagnosis of multiple diseases, including metabolic-associated fatty liver disease (MASLD) and metabolic-associated steatohepatitis (MASH) [
1]. MASLD already has a high prevalence, which is expected to increase and may affect over 25% of the population in developed countries [
2,
3]. Large-scale population screening found a 27.5% prevalence of MASLD in adults aged 20–79 in Korea [
4]. Meanwhile, another study reported a 42% prevalence of MASLD in the middle-aged, overweight population in the Netherlands [
5]. Fatty liver disease is also an independent risk factor for increased cardiovascular morbidity [
6]. The cumulative incidence of cardiovascular disease events was also significantly higher in the population diagnosed with MASLD or MASLD with increased alcohol intake (MetALD) than in those without during a long-term follow-up [
4]. Therefore, there is an increasing demand for quantitative MRI techniques enabling accurate estimation of PDFF. Most major vendors of MRI scanners have developed their strategies for measuring PDFF, and comparative studies of these have found good accuracy and reproducibility across testing sites, vendors, and field strengths [
7]. Still, there is a significant need for algorithms that have a publicly accessible source code that can be tailored to the specific needs of the end users and provide vendor-independent alternatives for high-accuracy PDFF estimation.
Chemical-shift-encoded MRI (CSE-MRI) techniques have several advantages over magnetic-resonance spectroscopy (MRS) in non-invasive fat quantification, including their efficiency and the possibility of reconstructing parametric fat fraction and transverse relaxivity (
) maps; thus, these are more popular in clinical imaging. The original two-point Dixon method relies on in-phase (IP) and out-of-phase (OP) echoes and is the predecessor of all CSE-MRI algorithms [
8]. Multi-point methods, which use acquisitions with multiple echo times (TE) for fat/water separation, are less affected by phase errors, off-resonance of the
B0 magnetic field, and image noise and provide more accurate estimates of PDFF [
9]. The correct estimation of the
B0 field map is pivotal for the success of complex-based reconstructions; meanwhile, magnitude-based reconstructions are not influenced by field inhomogeneity but are hindered by the ambiguity of the water/fat decomposition and bias from non-Gaussian noise distribution [
10,
11].
Even low levels of hepatic steatosis can lead to severe adverse health effects in the long term. Current international guidelines define significant hepatic steatosis, a linchpin of diagnosing MASLD, as lipid accumulation in >5% of hepatocytes corresponding to >5.6% PDFF detected by MRS or CSE-MRI [
12]. The more recent guidance from the American Association for the Study of Liver Diseases defines hepatic steatosis as ≥5% of hepatocytes displaying macrovesicular steatosis and recommends a ≥5% MRI-PDFF cutoff for non-invasive diagnosis of significant steatosis [
13]. However, some studies indicated that lowering the diagnostic threshold to 3% PDFF might increase sensitivity while maintaining the specificity of MRI compared to histopathology sampling [
14]. Therefore, the linearity compared to an external reference, the reproducibility, and the reliability of the different fat/water decomposition algorithms must be systematically evaluated, specifically at low levels of PDFF, where even a slight bias of the estimation may interfere with establishing the diagnosis of clinically significant disease.
In the present study, we aimed to compare four publicly available software tools using different algorithms, including complex-based and magnitude-based methods, for PDFF estimation. We evaluated their performance in phantoms and a patient cohort representing various grades of hepatic steatosis by performing acquisitions at 1.5 Tesla (T), currently the most widely available field strength used for abdominal imaging.
Theory
The fat/water decomposition with CSE-MRI exploits the differences in the resonance frequency spectrum of water and fat protons, which is 2.4 parts per million (ppm) or −217 Hertz (Hz) for the main fat peak at 1.5 T. This means that the signals from the water and fat components are IP every 4.6 milliseconds (ms) and OP every
k × 2.3 ms (where
k is an integer). The fat and water components of the signal can be separated by obtaining a sequence of images with different TEs. The signal (
S) from an individual voxel at echo time
t can be described with the Equation (1)
where
w and
f are the magnitudes of water and fat components;
rm and
fF,m are amplitudes and frequencies of the fat spectrum;
fB is the frequency offset of the
B0 magnetic field; and
is the transverse relaxivity of the gradient echo acquisition. The complex methods use both magnitude and phase information from the acquisitions to estimate unknown parameters (
w,
f,
fB, and
) of Equation (1). If
fB and
are known, the model has linear solutions for
w and
f, which can be efficiently calculated with least-squares fitting for multiple TE by transcribing the original Equation (1) on a matrix format [
15,
16]. In the presence of Gaussian noise (
ε), this can be modeled as
where
b is a complex vector of datapoints at
n time points (
t);
W =
diag(
eiψt) is an
n ×
n complex matrix defining time-dependent change in the complex field map
ψ =
B0 +
i;
A is a complex matrix expressing time-dependent shift in water and fat signal intensities
A = [
w f];
x is the transpose of matrix
A,
x =
AT; and
φ is the original phase. The standard approach to solving the complex non-linear fitting problem is to initialize the
B0 field map estimation with
w,
f, and
values close to the expected physiological range and to assume spatial smoothness. Once the
fB is found, the minimization of
can be completed, and
w and
f can be determined, which permits the calculation of the PDFF in the voxel
The challenge is that the estimation of the
B0 offset on a voxel level is ambiguous as the residual from the non-linear fitting function is non-convex and, due to its periodic oscillation, is subject to phase wrap**, which results in multiple local minima [
17]. Erroneous estimation of the field map leads to incorrect calculation of PDFF and
, which is noticeable as phase unwrap** errors on reconstructed images. Multiple methods have been devised to overcome local ambiguities and re-establish the inherent spatial smoothness of the field map. The variable-projection method (VARPRO) provides a global optimal solution for the non-linear least-squares optimization problem with moderate computational requirements [
15]. Region-growing algorithms (RG) rely on the similarity of the
B0 field in neighboring pixels but fail to give reliable solutions in low signal-to-noise (SNR) areas and at disjoint regions of the image [
18,
19]. Smoothness constraints can also be imposed on the off-resonance field map by minimizing its energy function or using low-pass spatial filtering [
15,
20]. Iterative regularization of the field map can further minimize irregularities when combined with graph cut (GC) segmentation [
21]. Direct phase estimation optimizes phase increments and relies on uniformly spaced TEs [
22]. Uniformly spaced echoes constrain the maximum phase offset to less than one whole period. In this case, finding the global optimum becomes a binary problem, which can be solved with quadratic pseudo-Boolean optimization [
17]. Another essential concept is to use multi-scale coarse-to-fine reconstruction techniques. The field maps calculated at coarser resolutions are used to initialize field map estimates on higher-resolution images, which can help resolve spatial inconsistencies [
23,
24,
25]. The confounding effect from the shorter T1 relaxation of fat compared to water can be prevented by carefully selecting the acquisition parameters. The flip angle (FA) must be kept low relative to the repetition time (TR) to minimize the T1 weighing of the images [
26].
A simpler solution for the fat/water decomposition problem uses only the signal magnitude. The advantage of the magnitude-only method is that it does not require estimation of the field map. For signal magnitude (
M), Equation (1) becomes
which has only three unknown parameters:
w,
f, and
. However, without the phase information, the solution for Equation (4) is ambiguous as the
M remains the same irrespective of whether
w or
f is the larger. Therefore, it cannot be determined if fat or water is the dominant component, limiting the diagnostic range to 0–50% PDFF [
27,
28].
A handful of algorithms have been proposed to resolve the magnitude-only decomposition’s ambiguity by utilizing the fat spectrum’s complexity [
28,
29]. In the two-point search method, the optimization of parameters in Equation (4) is completed twice, assuming different initial values. The sum of square residuals (RSS) is calculated separately for fat-dominant and water-dominant initializations, and the final solution at each voxel is chosen to be the parameter set with the lowest RSS. This approach allows for the map** of PDFF from 0% to 100% based on image magnitude.
In the presence of noise, the optimal solution is the one with the highest log-likelihood [
29]. The noise on magnitude images has a Rician distribution, skewed with a non-zero mean at low SNR areas. The bias from Rician noise can cause an overestimation of PDFF at low fat concentrations [
30]. To reduce bias from the Rician noise distribution, a new magnitude-only fat fraction and
estimation with the Rician noise modeling (MAGORINO) algorithm has been proposed [
31]. This method combines Rician-noise-based likelihood optimization with a two-point search to estimate PDFF in the 0–100% range.
3. Results
3.1. Measurements in Phantoms
All four algorithms correctly reconstructed the PDFF and
maps from the phantom datasets. Due to the expected ambiguity of conventional MAG reconstruction, we found that the estimates of fat and water percentages were inverted in the last vial of the publicly available phantom dataset containing pure lipid (100% PDFF). To be able to calculate correlation and regression for the entire 0% to 100% range using MAG, we corrected the fat fraction estimate for the last vial by subtracting the predicted PDFF from 100%, which reversed the fat/water ratio. The correlation between measured and true PDFF was nearly perfect with all methods (
Table 2). The regression analysis also showed a highly significant linear relationship between expected and measured PDFF, with slopes almost equal to one intercept close to zero (
Figure 3).
The Bland-Altman analysis showed good agreement between measured and expected PDFF using all reconstruction methods (
Figure 4). The bias was lowest with MAG-R (0.006%, CI: −2.41–2.42%), followed by QPBO (0.08%, CI: −3.95–4.11%), MAG (−0.22% CI: −3.55–3.12%), and GC (−0.59% CI: −5.03–3.85%) (
Figure 2). The RMSE was also smallest with MAG-R (1.2%) and slightly higher with MAG (1.67%), QPBO (2.0%), and GC (2.28%).
To determine the methods’ effect on the estimated PDFF, we performed multi-way ANCOVA analyses with true PDFF as the covariate variable. The test determined that the type of reconstruction method had no significant effect (F = 1.03, p = 0.39). All the pairwise comparisons between the methods were also non-significant.
Only including tubes with very low levels (<5%) or zero PDFF, ANCOVA indicated a significant difference across the estimates (F = 3.9, p = 0.04). In the pairwise comparisons, the MAG estimates (mean ± SD = 1.82% ± 1.79%) were lower than the GC (2.94% ± 1.22%, p = 0.046) or QPBO (2.74% ± 1.36%, p = 0.034) estimates, but the difference was non-significant after adjustment for multiple testing.
The correlation between the first seven tubes of the phantoms, which had similar true fat fractions (in-house 0–22.32% and public 0–20.9%), was excellent: QPBO (r = 0.997, CI: 0.981–0.999, p < 0.001), MAG-R (r = 0.994, CI: 0.956–0.999, p < 0.001), GC (r = 0.991, CI: 0.935–0.999, p < 0.001), and MAG (r = 0.987, CI: 0.914–0.998, p < 0.001).
3.2. In Vivo Measurement of PDFF
We retrospectively collected a representative patient cohort with non-significant (S0), mild (S1), moderate (S2), and severe (S3) liver steatosis, which included five patients from each steatosis grade. We reconstructed PDFF and
maps of the patients’ livers using all four methods and measured the level of fatty infiltration in three identical ROIs. We calculated the correlation matrix between the algorithms. We determined that GC and QPBO showed the strongest correlation (0.999, CI: 0.999–1) and that the correlation was weakest between MAG and MAG-R (0.934, CI: 0.838–0.974,
p < 0.001) (
Figure 5). The overall agreement of the methods was very good (ICC = 0.891, CI: 0.799–0.995,
p < 0.001).
The ANOVA analysis demonstrated a significant bias from using different methods (F = 7.45,
p = 0.008). The post hoc analysis revealed that PDFF was significantly lower with MAG (−2.30% ± 6.11%,
p = 0.005) than with MAG-R. Additional pairwise comparisons did not show significant differences across the algorithms (
Figure 5).
We also observed typical map** errors inherent to the different reconstruction algorithms. The GC method was prone to producing phase unwrap** errors, which were most likely to occur on the first and last slices of the stack and in areas close to the edge of the FOV. These were most likely caused by erroneous regularization of the
B0 field inhomogeneity (
Figure 6). The MAG-R reconstruction produced a speckled pattern of inverted fat/water ratios, which were most prominent in cases with severe steatosis. This algorithm resolves the fat/water ambiguity of the magnitude-based reconstruction by employing a two-point search method and selecting the solution with the maximum likelihood. However, at fat concentrations close to 50%, the difference in the likelihood of the two local optima is less; thus, the reconstruction gets more susceptible to image noise inverting the true fat/water ratio. The QPBO method did not result in such phase unwrap** errors. The proportion of slices containing phase unwrap** errors was significantly higher with MAG-R (14/20, 70%) compared to GC (70/180, 39%, odds ratio = 0.27, CI: 0.08–0.8,
p < 0.009) and QPBO (0/180, 0%,
p < 0.001). Finally, the dynamic range of the MAG reconstruction is limited to 0–50%, as both water and fat-dominant pixels can have the same signal magnitude.
4. Discussion
Our study collected four open-source software tools for fat/water decomposition and PDFF estimation available in a public web depository. These software tools represent common concepts of mathematical modeling utilized for the complex chemical component decomposition of MRI datasets. We provide a comparative evaluation of these methods by testing their accuracy in the phantom and a representative patient cohort with varying levels of hepatic steatosis.
Our results demonstrate that although a solid theoretical framework supports all methods and generates highly accurate estimates of PDFF in an experimental setting in vitro, there is a significant difference in their accuracy and efficiency when applied to data from clinical MRI scans. We have found that complex-based reconstruction techniques are generally superior to magnitude-based techniques and keep a good balance between accuracy and computational efficiency in both in vivo and in vitro settings.
We used two independent phantom datasets for the models’ in vitro testing. Our in-house built phantom was similar to a previously tested design and contained fat/water mixtures with low- and medium-level fat fractions. The advantage of this design is that it is more similar to the range of PDFF typically detected in patients with hepatic steatosis, thus, providing a realistic assessment of the methods’ accuracy in early-stage steatosis, a critical diagnostic paradigm [
11,
14]. Another benefit was that the resonance spectrum of the phantom’s components was well documented, allowing for the correct formulation of the calculations [
32]. The other phantom dataset had been previously published and included samples from the entire (from 0% to 100% PDFF) fat fraction range [
33]. Although both phantoms were scanned at 1.5 T, there was a difference in the scanner type and acquisition protocols. All methods showed an almost perfect correlation (0.999) and a nearly linear relationship with PDFF in both phantoms. The inter-rater agreement between measured and expected PDFF was also universally high (0.995–0.999). The estimates of the methods in the two different types of phantoms also showed an excellent correlation, indicating that the scanner type did not affect the PDFF estimates. The overall bias was smallest, with MAG-R (0.01%), followed closely by QPBO (0.08%), and it was well below 1% with all methods. Our results align with previous studies, which detected a similarly strong correlation and low bias between estimated and true PDFF in fat–water phantoms using CSE-MRI [
32,
33,
38]. Moreover, open-source algorithms demonstrated accuracy comparable to commercially available CSE-MRI methods developed by major vendors [
7].
The excellent performance of MAG-R in vitro reflects the robustness of the mathematical model. MAG-R represents a new generation of magnitude-based algorithms which exploit the spectral complexity of fat to perform a two-point search with likelihood analysis and to resolve the fat/water ambiguity intrinsic to conventional magnitude-based methods [
28,
29,
31]. The magnitude-only fitting does not require estimating the field map and is insensitive to phase errors. In addition, the Rician noise modeling lowers the bias and increases the chance of finding the optimal PDFF estimate even in low SNR areas [
31]. Interestingly, the estimated PDFF in the saline-only (mean GC: 2.05%, QPBO: 1.58%, MAG: 0.28%, and MAG-R: 0.74%) and low PDFF tubes was slightly higher with the two complex-based algorithms compared to magnitude-based methods in both phantoms. The same bias has been reported in previous phantom experiments, and it also resulted in the false detection of fat in the spleen of human subjects [
10,
39]. A potential explanation is that phase errors caused by eddy currents significantly alter the first echo and introduce bias into the complex-based quantification [
10].
We retrospectively collected a group of twenty patients with a balanced representation of the severity grades of hepatic steatosis. The cause of liver disease was MASLD in most patients (85%). Many of the patients were obese (mean BMI: 27.5 kg/m2) and hardly fit into the scanner’s gantry, which could amplify the inhomogeneity of the magnetic field. To prevent artifacts from the breathing and involuntary movement of the patients, we decided to use a 2D SPGR sequence set acquisition parameters to keep acquisition time short, typically less than 30 s. The correlation was excellent across the algorithms and nearly perfect between the complex-based methods (0.934–0.999). The overall inter-rater agreement was also very good (0.891). The ANOVA showed that PDFF measured with MAG was significantly lower than MAG-R. To a small extent, this can be attributed to the spectral fat model used by MAG. However, the predominant factor is the widespread presence of phase unwrap** errors with MAG-R.
Field inhomogeneity artifacts were common and could be observed on 70% of the MAG-R and 39% of the GC images. The phase unwrap** errors produced by MAG-R typically had a speckled pattern. They were more frequent in high-grade steatosis or focal depositions, indicating that likelihood-based optimization is prone to errors when the PDFF is close to the fat/water equilibrium. Another significant drawback of MAG-R is the excessively long computation time, which can be attributed to pixel-by-pixel curve fitting and multiple initializations of the reconstruction.
The phase unwrap** errors caused by GC mainly affected areas at the perimeter of the images or slices at the end of the stack and were more frequent in obese patients examined with a large FOV. These phase unwrap** errors result from significant inhomogeneity in the main magnetic field. A proposed solution for this problem is to apply a magnitude-based reconstruction to remove phase errors from an initial complex-based fat/water decomposition [
40,
41].
The phase unwrap** errors were absent in the QPBO reconstructions. Meanwhile, the QPBO method proved highly accurate in phantoms, and its estimates showed a high consistency with GC. The QBPO algorithm combines direct phase estimation and constraints on physical parameters with a multi-scale calculation of the parameter maps while using efficient mathematical formulation [
19]. The computation time with the QPBO was also lower than with other methods except the MAG.
Our study has several limitations: first, this was a single-center study, and patient scans were collected with one scanner; therefore, the effect of scanner type on liver PDFF estimates could not be comprehensively evaluated. Second, the components of fat/water emulsions in the two phantoms had different off-resonance spectra compared to the spectral fat model for human subjects; thus, the results in phantoms only provide a good approximation of the accuracy expected in vivo. Third, the algorithms run in MATLAB and Phyton programming environments, and multiple data processing steps must be completed to analyze MRI datasets. Fourth, we only tested the decomposition methods on scans obtained at 1.5 T field strength. The artifacts, due to field inhomogeneity and eddy currents, may be more pronounced at 3 T and higher field strengths, which may lower the accuracy of some of the methods. Fifth, the off-resonance spectrum of lipids used for the phantom experiments slightly differed from the spectrum of lipids in the human liver. We used modified spectra based on prior spectroscopy results to compensate for the bias in calculating PDFF estimates in fat–water phantoms. Sixth, the methods tested in this study are not certified for clinical use and alone cannot be used for diagnostic purposes. Seventh, liver biopsy was unavailable from all human subjects. Thus, it was not used as a reference.