Skip to content

Commit

Permalink
Update documentation and make method names more expressive.
Browse files Browse the repository at this point in the history
This ensures any ambiguity from the method names would not cause
potential issues among maintainers.
  • Loading branch information
Alexhuszagh committed Sep 15, 2024
1 parent 222009e commit 89f6b67
Show file tree
Hide file tree
Showing 11 changed files with 50 additions and 185 deletions.
6 changes: 2 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,7 @@ Lexical is highly customizable, and contains numerous other optional features:
- **f16**:   Add support for numeric conversions to-and-from 16-bit floats.
<blockquote>Adds <code>f16</code>, a half-precision IEEE-754 floating-point type, and <code>bf16</code>, the Brain Float 16 type, and numeric conversions to-and-from these floats. Note that since these are storage formats, and therefore do not have native arithmetic operations, all conversions are done using an intermediate <code>f32</code>.</blockquote>

To ensure the safety when bounds checking is disabled, we extensively fuzz the all numeric conversion routines. See the [Safety](#safety) section below for more information.
To ensure memory safety, we extensively fuzz the all numeric conversion routines. See the [Safety](#safety) section below for more information.

Lexical also places a heavy focus on code bloat: with algorithms both optimized for performance and size. By default, this focuses on performance, however, using the `compact` feature, you can also opt-in to reduced code size at the cost of performance. The compact algorithms minimize the use of pre-computed tables and other optimizations at a major cost to performance.

Expand Down Expand Up @@ -304,9 +304,7 @@ A benchmark on writing floats generated via a random-number generator and parsed

## Safety

Due to the use of memory unsafe code in the integer and float writers, we extensively fuzz our float writers and parsers. The fuzz harnesses may be found under [fuzz](https://github.com/Alexhuszagh/rust-lexical/tree/main/fuzz), and are run continuously. So far, we've parsed and written over 72 billion floats.

Due to the simple logic of the integer writers, and the lack of memory safety in the integer parsers, we minimally fuzz both, and test it with edge-cases, which has shown no memory safety issues to date.
Due to the use of memory unsafe code in the library, we extensively fuzz our float writers and parsers. The fuzz harnesses may be found under [fuzz](https://github.com/Alexhuszagh/rust-lexical/tree/main/fuzz), and are run continuously. So far, we've parsed and written over 72 billion floats.

## Platform Support

Expand Down
18 changes: 8 additions & 10 deletions lexical-parse-float/src/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -485,7 +485,7 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let decimal_point = options.decimal_point();
let exponent_character = options.exponent();
debug_assert!(format.is_valid());
debug_assert!(!byte.is_done());
debug_assert!(!byte.is_buffer_empty());
let bits_per_digit = shared::log2(format.mantissa_radix()) as i64;
let bits_per_base = shared::log2(format.exponent_base()) as i64;

Expand All @@ -503,7 +503,7 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
// We must have a format like `0x`, `0d`, `0o`. Note:
is_prefix = true;
if iter.read_if_value(base_prefix, format.case_sensitive_base_prefix()).is_some()
&& iter.is_done()
&& iter.is_buffer_empty()
{
return Err(Error::Empty(iter.cursor()));
}
Expand Down Expand Up @@ -553,7 +553,6 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let mut implicit_exponent: i64;
let int_end = n_digits as i64;
let mut fraction_digits = None;
// TODO: Change this to something different from read_if_value but same idea
if byte.first_is_cased(decimal_point) {
// SAFETY: byte cannot be empty due to first_is
unsafe { byte.step_unchecked() };
Expand Down Expand Up @@ -596,23 +595,23 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let is_exponent = byte
.first_is(exponent_character, format.case_sensitive_exponent() && cfg!(feature = "format"));
if is_exponent {
// SAFETY: byte cannot be empty due to `first_is` from `is_exponent`.`
unsafe { byte.step_unchecked() };

// Check float format syntax checks.
#[cfg(feature = "format")]
{
// NOTE: We've overstepped for the safety invariant before.
if format.no_exponent_notation() {
return Err(Error::InvalidExponent(byte.cursor()));
return Err(Error::InvalidExponent(byte.cursor() - 1));
}
// Check if we have no fraction but we required exponent notation.
if format.no_exponent_without_fraction() && fraction_digits.is_none() {
return Err(Error::ExponentWithoutFraction(byte.cursor()));
return Err(Error::ExponentWithoutFraction(byte.cursor() - 1));
}
}

// SAFETY: byte cannot be empty due to `first_is` from `is_exponent`.`
// TODO: Fix: we need a read_if for bytes themselves?
unsafe { byte.step_unchecked() };
let is_negative = parse_exponent_sign(&mut byte)?;

let before = byte.current_count();
parse_digits::<_, _, FORMAT>(byte.exponent_iter(), |digit| {
if explicit_exponent < 0x10000000 {
Expand Down Expand Up @@ -679,7 +678,6 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
n_digits = n_digits.saturating_sub(1);
}
if zeros.first_is_cased(decimal_point) {
// TODO: Fix with some read_if like logic
// SAFETY: safe since zeros cannot be empty due to first_is
unsafe { zeros.step_unchecked() };
}
Expand Down
2 changes: 1 addition & 1 deletion lexical-parse-float/src/slow.rs
Original file line number Diff line number Diff line change
Expand Up @@ -685,7 +685,7 @@ pub fn compare_bytes<const FORMAT: u128>(
let mut integer = number.integer.bytes::<{ FORMAT }>();
let mut integer_iter = integer.integer_iter();
integer_iter.skip_zeros();
if integer_iter.is_done() {
if integer_iter.is_buffer_empty() {
// Cannot be empty, since we must have at least **some** significant digits.
let mut fraction = number.fraction.unwrap().bytes::<{ FORMAT }>();
let mut fraction_iter = fraction.fraction_iter();
Expand Down
14 changes: 6 additions & 8 deletions lexical-parse-integer/src/algorithm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -116,19 +116,17 @@ macro_rules! fmt_invalid_digit {
// NOTE: If we're using the `take_n` optimization where it can't
// be the end, then the iterator cannot be done. So, in that case,
// we need to end.
if is_suffix && $is_end && $iter.is_done() {
if is_suffix && $is_end && $iter.is_buffer_empty() {
// Break out of the loop, we've finished parsing.
break;
} else if is_suffix {
} else if !$iter.is_buffer_empty() {
// Haven't finished parsing, so we're going to call
// invalid_digit!. Need to ensure we include the
// base suffix in that.

// TODO: This isn't localized and needs to be fixed

// SAFETY: safe since the iterator is not empty, as checked
// in `$iter.is_done()` and `$is_end`. If `$is_end` is false,
// then we have more elements in our
// in `$iter.is_buffer_empty()`. Adding in the check hopefully
// will be elided since it's a known constant.
unsafe { $iter.step_unchecked() };
}
}
Expand Down Expand Up @@ -580,7 +578,7 @@ macro_rules! algorithm {

let is_negative = parse_sign::<T, FORMAT>(&mut byte)?;
let mut iter = byte.integer_iter();
if iter.is_done() {
if iter.is_buffer_empty() {
return into_error!(Empty, iter.cursor());
}

Expand All @@ -603,7 +601,7 @@ macro_rules! algorithm {
// We must have a format like `0x`, `0d`, `0o`. Note:
if iter.read_if_value(base_prefix, format.case_sensitive_base_prefix()).is_some() {
is_prefix = true;
if iter.is_done() {
if iter.is_buffer_empty() {
return into_error!(Empty, iter.cursor());
} else {
start_index += 1;
Expand Down
73 changes: 18 additions & 55 deletions lexical-util/src/iterator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,8 @@ pub use crate::skip::{AsBytes, Bytes};
/// methods for reading data and accessing underlying data. The readers
/// can either be contiguous or non-contiguous, although performance and
/// some API methods may not be available for both.
///
/// # Safety
///
/// // TODO: FIX CORRECTNESS DOCUMENTATION
/// This trait is effectively safe but the implementor must guarantee that
/// `is_empty` is implemented correctly. For most implementations, this can
/// be `self.as_slice().is_empty()`, where `as_slice` is implemented as
/// `&self.bytes[self.index..]`.
#[cfg(feature = "parse")]
pub unsafe trait Iter<'a> {
pub trait Iter<'a> {
/// Determine if the buffer is contiguous in memory.
const IS_CONTIGUOUS: bool;

Expand Down Expand Up @@ -59,6 +51,17 @@ pub unsafe trait Iter<'a> {
self.get_buffer().len()
}

/// Get if no bytes are available in the buffer.
///
/// This operators on the underlying buffer: that is,
/// it returns if [as_slice] would return an empty slice.
///
/// [as_slice]: Iter::as_slice
#[inline(always)]
fn is_buffer_empty(&self) -> bool {
self.cursor() >= self.get_buffer().len()
}

/// Get the current index of the iterator in the slice.
fn cursor(&self) -> usize;

Expand Down Expand Up @@ -86,42 +89,22 @@ pub unsafe trait Iter<'a> {
/// non-contiguous iterators this can be smaller.
fn current_count(&self) -> usize;

// TODO: DOCUMENT
// PROPERTIES

// TODO: Fix this naming convention
/// Get if no bytes are available in the buffer.
///
/// This operators on the underlying buffer: that is,
/// it returns if [as_slice] would return an empty slice.
///
/// [as_slice]: Iter::as_slice
#[inline(always)]
fn is_empty(&self) -> bool {
self.as_slice().is_empty()
}

/// Determine if the buffer is contiguous.
#[inline(always)]
fn is_contiguous(&self) -> bool {
Self::IS_CONTIGUOUS
}

// TODO: Ensure we get this **RIGHT**

/// Get the next value available without consuming it.
///
/// This does **NOT** skip digits, and directly fetches the item
/// from the underlying buffer.
///
/// # Safety
///
/// An implementor must implement `is_empty` correctly in
/// order to guarantee the traitt is safe: `is_empty` **MUST**
/// ensure that one value remains, if the iterator is non-
/// contiguous this means advancing the iterator to the next
/// position.
fn first(&self) -> Option<&'a u8>;
#[inline(always)]
fn first(&self) -> Option<&'a u8> {
self.get_buffer().get(self.cursor())
}

/// Check if the next element is a given value.
#[inline(always)]
Expand Down Expand Up @@ -249,14 +232,6 @@ pub unsafe trait Iter<'a> {
/// A default implementation is provided for slice iterators.
/// This trait **should never** return `null` from `as_ptr`, or be
/// implemented for non-contiguous data.
///
/// # Safety
///
/// TODO: Fix the safety documentation here...
/// The safe methods are sound as long as the caller ensures that
/// the methods for `read_32`, `read_64`, etc. check the bounds
/// of the underlying contiguous buffer and is only called on
/// contiguous buffers.
pub trait DigitsIter<'a>: Iterator<Item = &'a u8> + Iter<'a> {
// TODO: Fix the documentation
/// Get if the iterator cannot return any more elements.
Expand All @@ -278,28 +253,16 @@ pub trait DigitsIter<'a>: Iterator<Item = &'a u8> + Iter<'a> {
/// ensure [peek] is always safe.
///
/// If you would like to see if the cursor is at the end of the buffer,
/// see [is_done] or [is_empty] instead.
/// see [is_buffer_empty] instead.
///
/// [is_done]: DigitsIter::is_done
/// [is_empty]: Iter::is_empty
/// [is_buffer_empty]: Iter::is_buffer_empty
/// [peek]: DigitsIter::peek
#[inline(always)]
#[allow(clippy::wrong_self_convention)]
fn is_consumed(&mut self) -> bool {
self.peek().is_none()
}

/// Get if the buffer underlying the iterator is empty.
///
/// This might not be the same thing as [is_consumed]: [is_consumed]
/// checks if any more elements may be returned, which may require
/// peeking the next value. Consumed merely checks if the
/// iterator has an empty slice. It is effectively a cheaper,
/// but weaker variant of [is_consumed].
///
/// [is_consumed]: DigitsIter::is_consumed
fn is_done(&self) -> bool;

/// Peek the next value of the iterator, without consuming it.
///
/// Note that this can modify the internal state, by skipping digits
Expand Down
6 changes: 3 additions & 3 deletions lexical-util/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
//! correctly.
//!
//! To see if the cursor is at the end of the buffer, use
//! [DigitsIter::is_done][is_done].
//! [is_buffer_empty][crate::iterator::Iter::is_buffer_empty].
//!
//! Any iterators must be peekable: you must be able to read and return the next
//! value without advancing the iterator past that point. For iterators that
Expand All @@ -66,7 +66,7 @@
//! like:
//!
//! ```rust,ignore
//! unsafe impl<_> DigitsIter<_> for MyIter {
//! impl<_> DigitsIter<_> for MyIter {
//! fn peek(&mut self) -> Option<u8> {
//! loop {
//! let value = self.bytes.get(self.index)?;
Expand Down Expand Up @@ -95,7 +95,7 @@
//! }
//! ```
//!
//! [is_done]: iterator::DigitsIter::is_done
//! [is_buffer_empty]: iterator::Iter::is_buffer_empty
//! [is_consumed]: iterator::DigitsIter::is_consumed
//! [peek]: iterator::DigitsIter::peek

Expand Down
36 changes: 4 additions & 32 deletions lexical-util/src/noskip.rs
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,10 @@ impl<'a, const __: u128> Bytes<'a, __> {
}
}

// TODO: Move to `Iter` as a trait along with `new` as well`
/// Initialize the slice from raw parts.
///
/// # Safety
///
/// This is safe if and only if the index is <= slc.len().
/// For this reason, since it's easy to get wrong, we only
/// expose it to `DigitsIterator` and nothing else.
Expand All @@ -67,13 +67,6 @@ impl<'a, const __: u128> Bytes<'a, __> {
}
}

/// Get if the buffer underlying the iterator is empty.
/// Same as `is_consumed`.
#[inline(always)]
pub fn is_done(&self) -> bool {
self.index >= self.slc.len()
}

/// Get iterator over integer digits.
#[inline(always)]
pub fn integer_iter<'b>(&'b mut self) -> DigitsIterator<'a, 'b, __> {
Expand Down Expand Up @@ -107,7 +100,7 @@ impl<'a, const __: u128> Bytes<'a, __> {
}
}

unsafe impl<'a, const __: u128> Iter<'a> for Bytes<'a, __> {
impl<'a, const __: u128> Iter<'a> for Bytes<'a, __> {
const IS_CONTIGUOUS: bool = true;

#[inline(always)]
Expand Down Expand Up @@ -138,12 +131,6 @@ unsafe impl<'a, const __: u128> Iter<'a> for Bytes<'a, __> {
self.index
}

// TODO: Rename
#[inline(always)]
fn is_empty(&self) -> bool {
self.as_slice().is_empty()
}

#[inline(always)]
fn first(&self) -> Option<&'a u8> {
self.slc.get(self.index)
Expand Down Expand Up @@ -212,7 +199,7 @@ impl<'a: 'b, 'b, const __: u128> DigitsIterator<'a, 'b, __> {
}
}

unsafe impl<'a: 'b, 'b, const __: u128> Iter<'a> for DigitsIterator<'a, 'b, __> {
impl<'a: 'b, 'b, const __: u128> Iter<'a> for DigitsIterator<'a, 'b, __> {
const IS_CONTIGUOUS: bool = Bytes::<'a, __>::IS_CONTIGUOUS;

#[inline(always)]
Expand All @@ -237,16 +224,6 @@ unsafe impl<'a: 'b, 'b, const __: u128> Iter<'a> for DigitsIterator<'a, 'b, __>
self.byte.current_count()
}

#[inline(always)]
fn is_empty(&self) -> bool {
self.byte.is_done()
}

#[inline(always)]
fn first(&self) -> Option<&'a u8> {
self.byte.first()
}

#[inline(always)]
unsafe fn step_by_unchecked(&mut self, count: usize) {
debug_assert!(self.as_slice().len() >= count);
Expand All @@ -265,12 +242,7 @@ unsafe impl<'a: 'b, 'b, const __: u128> Iter<'a> for DigitsIterator<'a, 'b, __>
impl<'a: 'b, 'b, const FORMAT: u128> DigitsIter<'a> for DigitsIterator<'a, 'b, FORMAT> {
#[inline(always)]
fn is_consumed(&mut self) -> bool {
Self::is_done(self)
}

#[inline(always)]
fn is_done(&self) -> bool {
self.byte.is_done()
self.is_buffer_empty()
}

#[inline(always)]
Expand Down
Loading

0 comments on commit 89f6b67

Please sign in to comment.