Documentation update - kfr - Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)

commit 2234a30a30f101f81a26ad4f78006555a5dda484
parent f6c2b1c73cac4fb9cdd40b9ee1f353c2394167f6
Author: d.levin256@gmail.com <d.levin256@gmail.com>
Date:   Wed, 14 Feb 2024 08:48:14 +0000

Documentation update

Diffstat:
M CHANGELOG.md  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M README.md  | 4 +++-
M docs/docs/dft.md  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++--------------
M docs/docs/dft2.md  | 30 +++++++++++++++---------------
M docs/docs/whatsnew6.md  | 2 ++

5 files changed, 134 insertions(+), 30 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,69 @@
 # Changelog
 
+## 6.0.2
+
+#### Added
+
+- Windows arm64 support
+- Emscripten (wasm/wasm64) support
+
+#### Changed
+
+- `complex_size` now takes dft_pack_format as parameter
+
+## 6.0.0
+
+- DFT performance has been improved up to 40% (backported to KFR 5.2.0 branch)
+- C API for non x86 architectures
+- DSP refactoring with easier initialization
+- Multiarchitecture for resampling, FIR and IIR filters
+- `matrix_transpose`: optimized matrix transpose (square/non-square, inplace/out-of-place, real/complex, scalar/vectors)
+- CMake config file generation (`find_package(KFR CONFIG)` support, see [installation](installation.md))
+- `.npy` format support (reading/writing, v1/v2, c/fortran order, real/complex, bigendian/littleendian)
+- Multidimensional DFT: real/complex
+- `inline_vector`
+
+#### Other changes
+
+- CMake minimum version is 3.12
+- Multidimensional reference DFT
+- Easier cross compilation to ARM64 on x86_64 macOS
+- Automated tests using GitHub Actions (previously Azure Pipelines)
+- GCC 7 and 8: emulate missing AVX-512 intrinsics
+- `read_group` and `write_group`
+- [❗breaking change] `reshape_may_copy` and `flatten_may_copy` in `tensor<>` allow copying by default
+- `shape<>::transpose` function
+- `tensor<>::transpose` function
+- `convert_endianess`
+- DFT, DSP and IO sources have been moved to `src/` directory
+- Multiarchitecture is enabled by default
+- `KFR_DFT_NO_NPo2` has been removed (assumed always enabled)
+- Tests refactoring
+- Some tests moved to `tests/internal/`
+- [❗breaking change] Scalars are now passed by value in expressions (this fixes dangling references in some cases)
+- Expression functions should return `expression_make_function` instead of `expression_function`
+- `KFR_WITH_CLANG`
+- `KFR_VERSION` CMake variable
+- Functions to get module versions (`library_version_dft`, `library_version_dsp` etc)
+- Exceptions are no longer enforced in MSVC
+- `kfr::complex` removed (use `std::complex` instead). `KFR_STD_COMPLEX` cmake variable removed too
+- `strides_for_shape` for fortran order
+- AARCH and ARM emulation refactoring (dynamic libraries are now supported)
+- `call_with_temp`
+- `maximum_dims` is now 16 (was 8)
+- `to_fmt`/`from_fmt` supports inplace
+- `shape` refactoring: `rotate_left`, `rotate_right`, `remove_back`, `remove_front`
+- temp argument can be `nullptr` for DFT (temporary buffer will be allocated on stack or heap)
+- `dft_plan` and similar classes now have default and move constructors
+- `-DCMAKE_POSITION_INDEPENDENT_CODE=ON` is required for building C API
+- `ci/run.sh` can now build in a directory outside source tree
+- [❗breaking change] `graphics/color.hpp` and `graphics/geometry.hpp` have been removed
+- Simpler `CMT_CVAL` macro
+- `/Zc:lambda` is now required for building KFR in MSVC
+- `println` for `string_view`
+- MSVC internal compiler error fixed
+- Complex vector operators fixed
+
 ## 5.2.0
 
 2023-11-27
diff --git a/README.md b/README.md
@@ -79,6 +79,8 @@ _Note_: Building the DFT module currently requires Clang due to internal compile
 * Random number generation
 * Template expressions (See examples)
 * Ring (Circular) buffer
+* :star2: Windows arm64 support
+* :star2: Emscripten (wasm/wasm64) support
 
 ### Math
 
@@ -102,7 +104,7 @@ _Note_: Building the DFT module currently requires Clang due to internal compile
 
 ### Multiarchitecture
 
-Multiarchitecture mode enables building algorithms for multiple architectures with runtime dispatch to detect the CPU of the target machine and select the best code path.
+The multiarchitecture mode enables building algorithms for multiple architectures with runtime dispatch to detect the CPU of the target machine and select the best code path.
 
 * :star2: Multiarchitecture for DFT, resampling, FIR and IIR filters.
 
diff --git a/docs/docs/dft.md b/docs/docs/dft.md
@@ -1,10 +1,8 @@
 # How to apply Fast Fourier Transform
 
-This article shows how to use Fast Fourier Transform and how to apply forward and inverse FFT on complex and real data using the KFR framework.
+This article demonstrates how to use the Fast Fourier Transform and apply both forward and inverse FFT on complex and real data using the KFR framework.
 
-KFR DFT supports all sizes, KFR automatically chooses the best algorithm to perform DFT for the given size.
-
-For power of 2 sizes, it uses Fast Fourier Transform.
+KFR DFT supports all sizes, and KFR automatically chooses the best algorithm to perform DFT for the given size.
 
 ## Quick example
 
@@ -30,13 +28,13 @@ Scaling is not performed by KFR. To get output in the same scale as input, divid
 
 ### Real input, complex output
 
-Frequency data are stored in [CCS or Perm format](dft_format.md).
+Frequency data is stored in [CCS or Perm format](dft_format.md).
 
 The size of the output data is equal to `size/2+1` for CCS and `size/2` for Perm format.
 
 For the inverse FFT, you have to prepare frequency data in the CCS or Perm format as well.
 
-For CCS format, you must ensure that freq[0] and freq[N/2] are real numbers to get correct real result.
+For CCS format, you must ensure that freq[0] and freq[N/2] are real numbers to get the correct real result.
 
 ```c++
 data = irealdft(freq);
@@ -45,26 +43,26 @@ data = irealdft(freq) / data.size();
 ```
 
 !!! note
-    Real to complex and complex to real transforms are only available for even sizes.
-    This is caused by the way real DFT is calculated. Pair of real values are interpreted as complex for high performance, so there is limitation for real DFT size, it must be even.
+    Real-to-complex and complex-to-real transforms are only available for even sizes.
+    This is caused by the way real DFT is calculated. A pair of real values is interpreted as one complex value for high performance, so the real DFT size is limited: it must be even.
     Use complex transform and data conversion instead.
     ```c++
     const dft_plan<double> dft(N); // N is odd
     univector<complex<double>> output(N);
     univector<double> input;
-    univector<u8> temp(dft.temp_size);
+    univector<u8> temp(dft.temp_size); // temporary buffer
     dft.execute(output, univector<complex<double>>(input), temp);
     ```
 
 ## Creating FFT plan
 
-Implementation of FFT requires twiddle coefficients to be prepared before actual processing occurs. If FFT will be performed more than once, then it makes sense to store the coefficients and reuse it every time.
+The implementation of FFT requires twiddle coefficients to be prepared before actual processing occurs. If FFT will be performed more than once, then it makes sense to keep the DFT plan and reuse it every time.
 
 ## FFT Plan caching
 
 If you are using `dft`, `idft`, `realdft` or `irealdft` functions, all plans will be kept in memory, so the next call to these functions will reuse the saved data.
 
-You can manually get the plan from the cache (or create a new if it doesn’t exist in the cache):
+You can manually get the plan from the cache (or create a new one if it doesn’t exist in the cache):
 
 ```c++
 dft_plan_ptr<T> dft = dft_cache::instance().get(ctype<T>, size);
@@ -123,7 +121,7 @@ dft_plan<double> plan(1024);
 univector<complex<double>, 1024> in;
 univector<complex<double>, 1024> out;
 // here fill `in` array with our data (samples)
-univector<u8> temp(plan.temp_size);
+univector<u8> temp(plan.temp_size); // temporary buffer
 plan.execute(out, in, temp, false); // direct FFT
 // `out` now contains frequencies which have to be processed
 plan.execute(in, out, temp, true);  // inverse FFT
@@ -133,17 +131,55 @@ plan.execute(in, out, temp, true);  // inverse FFT
 plan.execute(out, in, temp, false); // direct FFT
 ```
 
-### Real to complex and complex to real transform
+### Real-to-complex and complex-to-real transform
 
 ```c++
 dft_plan_real<double> plan(1024); // dft_plan_real for real transform
 univector<double, 1024> in;
 univector<complex<double>, 1024> out;
 // here fill `in` array with our data (samples)
-univector<u8> temp(plan.temp_size);
+univector<u8> temp(plan.temp_size); // temporary buffer
 plan.execute(out, in, temp); // direct FFT
 // `out` now contains frequencies which have to be processed
 plan.execute(in, out, temp); // inverse FFT
 // `in` now contains processed data (samples)
 ```
 
+## Multidimensional DFT
+
+The multidimensional DFT can be performed the same way as the 1D transform but using the `dft_plan_md` class.
+
+```c++
+dft_plan_md<double> plan(shape{ 64, 256 });
+const std::complex<double>* in = ...; // the number of samples for in and out must be 
+                                      // the product of sizes (here is 16384)
+std::complex<double>* out = ...;
+// here fill `in` array with our data (samples)
+univector<u8> temp(plan.temp_size); // temporary buffer
+plan.execute(out, in, temp, false); // direct FFT
+// `out` now contains frequencies which have to be processed
+plan.execute(in, out, temp, true);  // inverse FFT
+// `in` now contains processed data (samples)
+...
+// process new data
+plan.execute(out, in, temp, false); // direct FFT
+```
+
+`dft_plan_md` class also supports passing [tensors](basics.md#tensor-multidimensional-array) as `in` and `out`.
+
+## Multidimensional Real DFT
+
+For multidimensional real-to-complex transforms, the DFT performs a real-to-complex transform for the last axis, followed by a number of complex-to-complex transforms for the other axes. The size of the last axis must be even.
+
+Multidimensional DFT is always performed in [CCS format](dft_format.md).
+
+Thus, given the size of the input as $(S_0, S_1, ..., S_{n-1})$, the total number of input samples equals $S_0 \cdot S_1 \cdot ... \cdot S_{n-1}$. The output size will be $(S_0, S_1, ..., \dfrac{S_{n-1}}{2}+1)$.
+
+Use `plan.complex_size(real_size)` to get the exact size of the complex output for a given real input size.
+
+The KFR FFT implementation supports in-place processing for Multidimensional Real DFT.
+
+!!! note
+    For performance reasons, it is advantageous to use a slightly larger output buffer for the complex-to-real transform.
+    `plan.real_out_size()` returns the size of the buffer (in real numbers) required for fast processing.
+    Pass `true` as a second argument to the `dft_plan_md_real` constructor to indicate that the output buffer has enough space for fast processing.
diff --git a/docs/docs/dft2.md b/docs/docs/dft2.md
@@ -1,6 +1,6 @@
 # More about FFT/DFT
 
-Fast Fourier Transform (FFT) can be used to perform:
+The Fast Fourier Transform (FFT) can be used to perform:
 
 * [Convolution (including convolution reverberation)](convolution.md)
 * Cross-correlation and auto-correlation
@@ -11,23 +11,23 @@ Fast Fourier Transform (FFT) can be used to perform:
 * Wavelet transform
 * and many other algorithms
 
-Often FFT is the most efficient way to perform each of these algorithms.
-
+Often, FFT is the most efficient way to perform each of these algorithms.
 
 ## About KFR DFT implementation
 
-KFR implementation of the FFT:
+The KFR implementation of the FFT:
 
-* is fully optimized for X86, X86-64, ARM and AARCH64 processors
-* uses vector intrinsics (if available for cpu)
-* supports both single- and double precision
+* is fully optimized for X86, X86-64, ARM and AARCH64 (ARM64) processors
+* uses vector intrinsics (if available for the CPU)
+* supports both single- and double-precision
+* supports in-place processing
+* supports multidimensional FFT
 * can cache internal data between calls to speed up plan creation
-* can do forward and inverse FFT without a need to create two plans
-* can be used for complex-to-complex, real-to-complex and complex-to-real 1D transforms
-* doesn’t require measure FFT performance at runtime and to find an optimal configuration
-* has special implementations for FFT sizes up to 256
+* can do forward and inverse FFT without the need to create two plans
+* can be used for complex-to-complex, real-to-complex and complex-to-real transforms
+* doesn’t require measuring FFT performance at runtime to find an optimal configuration
+* has special implementations for FFT sizes up to 1024
 * has no external dependencies
-* is thread-safe, no global data
-* is written in modern C++14
-* is open source (GPL v2+ license)
-
+* is thread-safe, with no global data
+* is written in modern C++17
+* is open source (GPL v2+ license; a commercial license is available for closed-source projects, see https://kfr.dev)
diff --git a/docs/docs/whatsnew6.md b/docs/docs/whatsnew6.md
@@ -9,6 +9,8 @@
 * `.npy` format support (reading/writing, v1/v2, c/fortran order, real/complex, bigendian/littleendian)
 * Multidimensional DFT: real/complex
 * `inline_vector`
+* Windows arm64 support
+* Emscripten (wasm/wasm64) support
 
 ### Other changes
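
Note on the size rules documented in `docs/docs/dft.md` above: the updated text states that a real DFT of even length `size` yields `size/2+1` complex values in CCS format and `size/2` in Perm format, and that a multidimensional real DFT shrinks only the last axis to $\dfrac{S_{n-1}}{2}+1$. The following is a minimal sketch of that arithmetic only. The helper names (`ccs_size`, `perm_size`, `md_complex_shape`, `element_count`) are hypothetical and are not part of the KFR API; in real code, `plan.complex_size(real_size)` from the diff above is the call to use.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; NOT part of the KFR API.

// Number of complex values produced by a 1D real DFT of even length `size` (CCS format).
std::size_t ccs_size(std::size_t size) { return size / 2 + 1; }

// Same, for the Perm format.
std::size_t perm_size(std::size_t size) { return size / 2; }

// Complex output shape of a multidimensional real DFT:
// every axis keeps its size except the last one, which becomes S/2 + 1.
std::vector<std::size_t> md_complex_shape(std::vector<std::size_t> real_shape)
{
    // The documentation requires a non-empty shape whose last axis is even.
    assert(!real_shape.empty() && real_shape.back() % 2 == 0);
    real_shape.back() = real_shape.back() / 2 + 1;
    return real_shape;
}

// Total number of elements for a given shape (product of all axis sizes).
std::size_t element_count(const std::vector<std::size_t>& shape)
{
    std::size_t n = 1;
    for (std::size_t s : shape)
        n *= s;
    return n;
}
```

For the `shape{ 64, 256 }` used in the multidimensional example, this gives 16384 real input samples and a complex output shape of `(64, 129)`, assuming the CCS layout described in the diff.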
