Implement precision timing recovery functions:
- _refine_symbol_boundary(): Scans at 1/32-symbol resolution to find the
  exact chirp boundary by maximizing the dechirped SNR
- _find_sfd_boundary(): Uses FFT-based correlation with a downchirp
  template to find the exact data start position
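A minimal pure-Python sketch of both techniques, not the actual implementation of _refine_symbol_boundary() or _find_sfd_boundary(). It assumes critical sampling (one sample per chip, N = 2**SF samples per symbol), no CFO, and an SFD of 2.25 downchirps directly ahead of the data. The helpers upchirp(), downchirp(), fft(), ifft() and dechirped_snr() are hypothetical. The boundary scan is demonstrated across data symbols because, at critical sampling, a timing offset inside a run of identical preamble upchirps only shifts the dechirped bin and leaves the SNR unchanged. The SFD search correlates against two consecutive downchirps so that the first SFD symbol gives a unique peak.

```python
import cmath
import math
import random

# Assumptions: critical sampling, N = 2**SF samples/symbol, no CFO, 2.25-downchirp SFD.

def upchirp(N, symbol=0):
    """Critically sampled upchirp, cyclically shifted by `symbol` bins."""
    return [cmath.exp(1j * math.pi * (n * n / N - n) + 2j * math.pi * symbol * n / N)
            for n in range(N)]

def downchirp(N):
    return [v.conjugate() for v in upchirp(N)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [e + t for e, t in zip(even, tw)] + [e - t for e, t in zip(even, tw)]

def ifft(X):
    return [v.conjugate() / len(X) for v in fft([v.conjugate() for v in X])]

def dechirped_snr(window, down):
    """Peak-bin power over mean power of the remaining bins after dechirping."""
    spec = [abs(v) ** 2 for v in fft([a * b for a, b in zip(window, down)])]
    peak = max(spec)
    return peak / ((sum(spec) - peak) / (len(spec) - 1) + 1e-12)

def refine_symbol_boundary(samples, coarse_start, N, steps_per_symbol=32):
    """Scan +/- half a symbol in 1/32-symbol steps; keep the start with max SNR."""
    down, step = downchirp(N), max(1, N // steps_per_symbol)
    best_start, best_snr = coarse_start, -1.0
    for k in range(-steps_per_symbol // 2, steps_per_symbol // 2 + 1):
        start = coarse_start + k * step
        if start >= 0 and start + N <= len(samples):
            snr = dechirped_snr(samples[start:start + N], down)
            if snr > best_snr:
                best_start, best_snr = start, snr
    return best_start

def find_sfd_boundary(samples, N, sfd_symbols=2.25):
    """FFT cross-correlation with a two-downchirp template; returns the data start."""
    template = downchirp(N) * 2
    L = 1
    while L < len(samples) + len(template) - 1:  # zero-pad so no circular wrap
        L *= 2
    X = fft(list(samples) + [0j] * (L - len(samples)))
    T = fft(template + [0j] * (L - len(template)))
    corr = ifft([a * b.conjugate() for a, b in zip(X, T)])  # corr[m] = sum x[n+m]*conj(t[n])
    lag = max(range(len(samples) - len(template) + 1), key=lambda m: abs(corr[m]))
    return lag + round(sfd_symbols * N)

# Synthetic frame: lead-in, 8 preamble upchirps, 2 sync symbols, 2.25 downchirps, data.
N = 2 ** 6  # SF = 6
rng = random.Random(1)
frame = [0j] * 50 + upchirp(N) * 8 + upchirp(N, 8) + upchirp(N, 16)
frame += downchirp(N) * 2 + downchirp(N)[: N // 4]
data_start = len(frame)
for s in [5, 40, 17, 60, 23]:
    frame += upchirp(N, s)
samples = [v + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for v in frame]

print(find_sfd_boundary(samples, N) == data_start)
print(refine_symbol_boundary(samples, data_start + 3 * (N // 32), N) == data_start)
```

Limiting the scan to half a symbol on each side keeps exactly one true boundary inside the search window, so the SNR maximum is unambiguous.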

Bug fixes:
- Fix _is_downchirp() false positives by comparing both correlations
  (against the upchirp and downchirp references) rather than one alone
- Fix _estimate_cfo() to return values in the [0, N) range
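A sketch of both fixes under two assumptions: that the "both correlations" in _is_downchirp() are the dechirped peaks against an upchirp and a downchirp reference, and that _estimate_cfo() works in signed bins internally (e.g. so that averaging per-symbol estimates of 63 and 0 does not land near N/2) before wrapping the result. is_downchirp(), estimate_cfo(), dft_peak() and upchirp() here are hypothetical stand-ins, not the patched functions.

```python
import cmath
import math

def upchirp(N, symbol=0):
    """Critically sampled upchirp, cyclically shifted by `symbol` bins."""
    return [cmath.exp(1j * math.pi * (n * n / N - n) + 2j * math.pi * symbol * n / N)
            for n in range(N)]

def dft_peak(x):
    """Naive DFT; returns (bin, magnitude) of the strongest bin."""
    N = len(x)
    mags = [abs(sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x)))
            for k in range(N)]
    k = max(range(N), key=mags.__getitem__)
    return k, mags[k]

def is_downchirp(window):
    """Assumed fix: compare both dechirped correlations instead of thresholding one."""
    up = upchirp(len(window))
    _, vs_up = dft_peak([a * b for a, b in zip(window, up)])                # downchirp -> tone
    _, vs_down = dft_peak([a * b.conjugate() for a, b in zip(window, up)])  # upchirp -> tone
    return vs_up > vs_down

def estimate_cfo(preamble, N):
    """Hypothetical estimator: average signed per-symbol peaks, then wrap into [0, N)."""
    down = [v.conjugate() for v in upchirp(N)]
    signed = []
    for i in range(len(preamble) // N):
        k, _ = dft_peak([a * b for a, b in zip(preamble[i * N:(i + 1) * N], down)])
        signed.append(k - N if k >= N // 2 else k)  # signed bins in [-N/2, N/2)
    return round(sum(signed) / len(signed)) % N     # the fix: report in [0, N)

N = 64
downchirp = [v.conjugate() for v in upchirp(N)]
print(is_downchirp(downchirp), is_downchirp(upchirp(N, 10)))
# Preamble of 4 upchirps with a CFO of -3 bins.
preamble = [v * cmath.exp(-2j * math.pi * 3 * m / N) for m, v in enumerate(upchirp(N) * 4)]
print(estimate_cfo(preamble, N))
```

Python's % with a positive integer modulus always yields a result in [0, N), so a signed estimate of -3 bins is reported as bin 61 for N = 64.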

The improved sync_from_samples() now produces bins identical to those
of the reference lora_decode_gpu decoder.