my-new-rust-binary-search

59 points, posted 6 days ago by ChadNauseam

70 Comments

eknkc

6 days ago

While the implementation is straightforward enough, this post clearly shows me why I don’t like Rust.

So, the function accepts two values that implement the Mid trait. And there is a blanket implementation of Mid that applies to anything that can be added, subtracted, divided, etc. and can be derived from a u8. So now this makes every numeric type a Mid. (Which, as the author states, is only an example, I guess.)

I can read and understand how this works here. But when I'm in a larger codebase, I cannot tell which trait implementation made what possible.

With all the trait bounds, implicit conversions, and macros implementing traits on stuff, it becomes a shit stew in a hurry.

And everyone always reaches for the most clever way to do things. I mean, does Mid need to be a trait? I guess it works beautifully, yeah. I'd prefer to pass a midpoint calculator function instead.

I guess this turned into a rant. I actually really like Rust the language but I think I don’t like Rust codebases basically.

It is the least readable language I’ve seen not because of the syntax but because of all the cleverness going on around the traits and macros.

ChadNauseam

6 days ago

Having the function take a midpoint calculator is fine and there are some advantages to it (e.g. it makes it easy to turn into a linear search if you want to do that for some reason). However since writing a midpoint function isn’t always totally straightforward (an incorrectly implemented midpoint will cause correctness issues or even infinite loops), I think it’s nice to have it as a trait. That way there’s little risk of someone using a bad `|x,y| (x + y) / 2` implementation out of laziness.
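
For example, a minimal sketch of what I mean by a careful unsigned-integer midpoint (not necessarily the article's exact implementation), avoiding the overflow in `(x + y) / 2`:

    // Assumes l <= r; returns None once there is nothing strictly between them.
    fn mid_u64(l: u64, r: u64) -> Option<u64> {
        if r - l <= 1 {
            None
        } else {
            // l + (r - l) / 2 cannot overflow, unlike (l + r) / 2.
            Some(l + (r - l) / 2)
        }
    }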

However I 100% agree with you that traits make it really hard to figure out what actual function you’re calling. I wish there was a way to see something like “in your code, this generic function is called with types T1, T2, etc.” and see the trait implementations for those types as well. I’ve actually wanted to contribute this to rust-analyzer for a while.

aaronblohowiak

6 days ago

I have good news for you, it gets worse! Look at very large and old ruby codebases :)

Ygg2

5 days ago

Outside of Turing completeness, Ruby and Rust have very little to do with each other.

steveklabnik

5 days ago

I think your parent’s point is something along the lines of “method lookup being complex doesn’t inherently limit the popularity of a programming language.”

inferiorhuman

6 days ago

   I can not find what trait implementation made what possible.
That's what your IDE is for? If I pull up the code docs for e.g. a call to an iterator function I see which object or trait that function came from.

https://imgur.com/a/TvKlmMH

ithkuil

5 days ago

Not only IDEs: generated documentation also clearly states which traits a type implements, and why.

forrestthewoods

6 days ago

I like Rust a lot, but I strongly agree. “Advanced” Rust becomes as inscrutable as C++ template crap.

I really wish Rust had a way to define “template style” generic functions rather than “trait style”. Some things simply cannot be done gracefully with trait spaghetti.

Ygg2

6 days ago

> I really wish Rust had a way to define “template style”

It has! It's called proc macro [1]. I'm only half joking.

[1] It can generate anything, errors generally suck ass in it as much as in C++ templates, and it feels similarly brittle.

> “Advanced” Rust becomes as inscrutable as C++ template crap.

I think people are abusing traits a bit too much. They are very nifty and often compile to very optimized code.

That said, this trait is not an example of such abuse. Having a trait to help with overloading is the most benign form of trait use.

The solution to the grandparent's problem is rust-analyzer. It will show you which methods are usable on your variable.

forrestthewoods

6 days ago

Rust macro hell is even worse than Rust trait spaghetti. I regularly refuse to use crates because they're macro based and there's a no-macro alternative.

But you’re not wrong :)

tcfhgj

6 days ago

There's always a no-macro alternative: expanding the macro by hand.

What's macro spaghetti?

inferiorhuman

6 days ago

Certainly it's possible to generate meaningful error messages within proc macros. Most folks don't bother though.

pjmlp

4 days ago

I would assert it is actually worse than C++, because at least templates and constexpr execution still look like plain C++, while macros in Rust, with two additional levels of syntax, add even more complexity.

LegionMammal978

6 days ago

Yeah, trait-heavy code is one of Rust's weaker points in practice. I find tower and some of the database crates to be some of the worst offenders: even knowing the language pretty well, I've often gotten lost in a maze of projections and blanket impls when an error occurs. There's no easy way to trace the implementation back to the intended behavior, to compare the expected and actual outcomes.

jms55

6 days ago

This has actually gotten better recently. You can now implement custom error diagnostics.

E.g. Bevy implements a trait (QueryData) for tuples of up to 32 items (A,) + (A, B) + (A, B, C)...

If you went over those 32 items, you used to get a confusing error about your type not implementing QueryData.

Now you get a nice error message explaining the common reasons why your type does not satisfy the trait.
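
A rough sketch of the mechanism, which I believe is the `#[diagnostic::on_unimplemented]` attribute stabilized in Rust 1.78 (the trait below is a placeholder, not Bevy's actual QueryData):

    // Custom diagnostics shown when the trait bound is not satisfied.
    #[diagnostic::on_unimplemented(
        message = "`{Self}` is not valid query data",
        note = "query data tuples are only implemented for up to 32 elements"
    )]
    trait QueryDataLike {}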

unshavedyak

6 days ago

One of my annoyances is with something like `impl<T: Deref<Target = U>, U: Foo> Foo for T`, i.e. a Foo impl for anything that can deref to another type which impls Foo - an Arc, Box, etc. Once you add that impl to support, say, Box, then if something fails to impl Foo you'll instead get an error about it not being a Box (iirc, it's been a while). My memory is old, so forgive me, but the error says something to the effect of your type not being `Box<T>`, instead of simply saying that your T doesn't impl Foo.

I ran into this confusing error so many times that I just stopped doing blanket deref impls and instead just impl it for `Box<T>` directly. Doing that avoids the weird error entirely.
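
A minimal sketch of that direct approach, with placeholder names since the real trait isn't given here:

    trait Foo {
        fn foo(&self);
    }

    struct Bar;

    impl Foo for Bar {
        fn foo(&self) {
            println!("Bar::foo");
        }
    }

    // Forward Foo through Box explicitly instead of via a blanket Deref impl.
    impl<T: Foo + ?Sized> Foo for Box<T> {
        fn foo(&self) {
            (**self).foo()
        }
    }

    fn main() {
        let b: Box<dyn Foo> = Box::new(Bar);
        b.foo(); // dispatches through the Box<T> forwarding impl
    }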

Still, I love Rust. However, like any language, it's not perfect. Nor should it be.

pimeys

6 days ago

Also, tracing and rust-opentelemetry give you a pretty wild ride if you need to do something more special with them.

worik

6 days ago

> I actually really like Rust the language but I think I don’t like Rust codebases basically.

And

> everyone always reaches to the most clever way to do things.

Yes oh yes

Rust was a brilliant idea spoilt by too much cleverness

I'm hoping for Rust II. A smaller language, with a bigger standard library and no dependence on run times.

By "no dependence on run times." I mean no async/await. It is an abomination in Rust

Conscat

6 days ago

There already exist many affine-typed languages with fewer features than Rust. When you phrase it like "Rust but with weaker type safety and higher-overhead abstractions" one might see why this premise fails to gain much traction, though.

aw1621107

6 days ago

> By "no dependence on run times." I mean no async/await. It is an abomination in Rust

Out of curiosity, what alternative would you have preferred to see, if any?

worik

6 days ago

Non-blocking standard library functions

I find it astounding that so many people think asynchronous programming means async/await

Async/await is a way of letting programmers who do not want to learn how it works do asynchronous programming anyway.

It has poisoned the well

dwattttt

6 days ago

I very much do not want to go back to the days of "read up to 4 bytes off the wire to see how big the next chunk is. Oh I only read 3 bytes? Guess I'd better save those 3 + the fact I need to read 1 more, then drop back out to the event loop"
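
Roughly the kind of bookkeeping being described, sketched against a non-blocking `std::net::TcpStream` (the struct and names are illustrative only):

    use std::io::{ErrorKind, Read};
    use std::net::TcpStream;

    // Accumulates the 4-byte length prefix across partial reads.
    struct LenReader {
        buf: [u8; 4],
        filled: usize,
    }

    impl LenReader {
        // Ok(Some(len)) once all 4 bytes have arrived; Ok(None) means we have
        // to remember our progress and drop back out to the event loop.
        fn poll(&mut self, stream: &mut TcpStream) -> std::io::Result<Option<u32>> {
            while self.filled < 4 {
                match stream.read(&mut self.buf[self.filled..]) {
                    Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
                    Ok(n) => self.filled += n,
                    Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(None),
                    Err(e) => return Err(e),
                }
            }
            Ok(Some(u32::from_be_bytes(self.buf)))
        }
    }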

worik

5 days ago

> I very much do not want to go back to the days of ....

Not what I recommend. The C library does much better than that.

In asynchronous programming you can only deal with the data you have, so you buffer it. I have not counted bytes like that - for the purposes of reading from a file, in decades.

dwattttt

5 days ago

But that's where non-blocking leads; it can't block until it reads all the bytes you asked for, so that has to be handled somehow.

A fancy C library could buffer the partial read for you rather than you needing to do it, and it could even maybe deal with turning your function into the state machine required to resume the function at the partial read.

But then you look around and realise you've created another async/await.

xedrac

6 days ago

I've been doing asynchronous programming for decades, and async/await is a notable improvement for many use cases. Event loop + callback hell, or condition variable/mutex/semaphore approaches are great in some cases, but they suck in others.

worik

5 days ago

> I've been doing asynchronous programming for decades, and async/await is a notable improvement for many use cases.

I think that goes to taste.

> Event loop + callback hell

That was always due to undisciplined programming. Async/await offers up other hells.

I think it makes some sense in memory-managed systems. But in a system with static, compile-time management like Rust, async/await forces all sorts of compromises. Mostly these fall on library authors, but it is its own hellscape. `Pin`?

> great in some cases, but they suck in others

That is a universal refrain for software systems!

aw1621107

6 days ago

> Non blocking standard library functions

What approach are you thinking of for the API of those functions, since async/await is not something you'd like to see?

mfenniak

6 days ago

I believe this is a reference to Go's approach, where code can be written in a linear, straightforward fashion with normal function calls, but if they are syscalls they become async automatically.

I don't think it's appropriate for the level of coding that Rust is targeting... But... It would be nice.

aw1621107

5 days ago

I suppose that makes sense given OP's definition of "no dependence on run times", but I think I'd have to agree that that approach probably wouldn't work well for the niche Rust is trying to target.

worik

5 days ago

> I believe this is reference to Go's approach,

I am unfamiliar with Go.

I am thinking of the way the C library does it.

> I don't think it's appropriate for the level of coding that Rust is targeting

That would be low-level systems programming. In part

aw1621107

5 days ago

> I am thinking of the way the C library does it.

What way is that? I didn't think the C standard library had any understanding of blocking/non-blocking.

worik

4 days ago

O_NONBLOCK

aw1621107

4 days ago

That's technically POSIX, not standard C, isn't it?

But in any case, I'm pretty sure Rust has supported the nonblocking functionality you want for quite a while now (maybe since 1.1.0)? You'll need to piece it together yourself and use the libc crate since it's a platform-specific thing, but it doesn't seem too horrendous. Ultimately it comes down to the fact that the fundamental method of the std::io::Read trait is

    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize>;
which is perfectly compatible with non-blocking reads. For example, std::fs::File implements Read and std::os::fd::FromRawFd, so if you can get a non-blocking file descriptor you should have non-blocking reads for files.
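
A rough sketch of that, Unix-only and using the libc crate (the path points at a hypothetical FIFO, where O_NONBLOCK actually matters):

    use std::fs::File;
    use std::io::{ErrorKind, Read};
    use std::os::fd::FromRawFd;

    fn main() -> std::io::Result<()> {
        // Open with O_NONBLOCK via libc.
        let fd = unsafe {
            libc::open(c"/tmp/some-fifo".as_ptr(), libc::O_RDONLY | libc::O_NONBLOCK)
        };
        if fd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        // SAFETY: fd is a freshly opened descriptor that we now own.
        let mut file = unsafe { File::from_raw_fd(fd) };

        let mut buf = [0u8; 4096];
        match file.read(&mut buf) {
            Ok(n) => println!("read {n} bytes"),
            // WouldBlock means no data is available yet on the non-blocking fd.
            Err(e) if e.kind() == ErrorKind::WouldBlock => println!("no data available yet"),
            Err(e) => return Err(e),
        }
        Ok(())
    }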

NoboruWataya

6 days ago

> It is the least readable language I’ve seen not because of the syntax but because of all the cleverness going on around the traits and macros.

Definitely agree about macros. A lot of crates use them extensively, but they are inherently less intuitive to reason about, tend to be less well documented and produce less helpful error messages when you use them incorrectly.

I think a lot of trait misuse probably stems from the fact that structs kind of look like classes and traits kind of look like interfaces, so people may try to do Java-style OOP with them.

(As an aside, the other thing that makes it difficult for me to grok Rust code is when everything is nested in 3+ levels of smart pointers.)

jules

6 days ago

The modifications that this article makes to the original version [1,2] are not a good idea. The modified version in this article cannot properly be used to search an array, for example (due to the extra "sanity checks", which would cause out-of-bounds errors). The API is also made more complex, returning a pair of options of indices rather than a pair of indices. You really just want the pair of indices!

If you do sanity checks in Rust then you should panic instead of returning options. However, in this case you cannot do the sanity checks, because the contract of the function is that it will only call the predicate strictly between l and r (i.e., only on indices returned by mid). Further, given the initial sanity checks, the sanity checks inside the loop body are redundant, as the loop invariant ensures that those checks never fail.

[1] https://julesjacobs.com/notes/binarysearch/binarysearch.pdf [2] https://byorgey.wordpress.com/2023/01/01/competitive-program...

Here is a Rust implementation of the original:

  pub fn search<T: Mid, G>(predicate: G, mut l: T, mut r: T) -> (T, T)
    where G: Fn(&T) -> bool
  {
    loop {
      match l.mid(&r) {
        None => return (l, r),
        Some(m) => {
          if predicate(&m) { r = m } 
          else { l = m }
        }
      }
    }
  }
Or, traitless:

  pub fn search<T, H, G>(mid: H, predicate: G, mut l: T, mut r: T) -> (T, T)
    where H: Fn(&T,&T) -> Option<T>, G: Fn(&T) -> bool
  {
    loop {
      match mid(&l, &r) {
        None => return (l, r),
        Some(m) => {
          if predicate(&m) { r = m } 
          else { l = m }
        }
      }
    }
  }

re

6 days ago

> You want one that doesn't require a consolation with a manual to remember if it returns the first value that's greater or the last value that's lower. ... No more needing to remember "does this return the highest index whose value is lower than the value I'm searching for, or the lowest index whose value is greater than the value I'm searching for?". This one returns both, so you can use whichever you need for your use case.

This (italicized part) feels like a weird requirement. Function/API documentation (which this author's code lacks) is important, and I don't think I ever write a call to an unfamiliar function without checking the docs for it. Assuming that you know what a function does based on the name alone is a recipe for buggy code.

This is also a kind of unusual formulation of binary search which searches across a monotonic function space rather than a sequence. I get that this is more general and can be used to implement the latter, but a less general interface that only works on sequences is IMO more intuitive: in that case, I expect it to return either the index of the target number, or the index where it should be inserted if not found. It's somewhat telling that the author doesn't offer any code examples demonstrating how to use this wonderful general function to solve the common use case of finding an element in a sorted vector, or inserting a new one into the correct location.

edflsafoiewq

6 days ago

The interface is nicer for keyframe interpolation, e.g. the problem where you have a list of (X, Y) pairs and you want to find the two pairs that straddle a given x and interpolate between them. In that case, search(|&i| v[i].0 >= x, 0, v.len() - 1) handles the three cases "before the beginning", "after the end", and "between two pairs" fairly naturally.
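
For instance, a quick sketch of that use case on top of the traitless `search` from jules' comment above (the clamping at the two ends is my reading of "naturally", not something spelled out here):

    // Assumes `v` is non-empty and sorted by the X coordinate.
    fn interpolate(v: &[(f64, f64)], x: f64) -> f64 {
        // Index midpoint: stop once l and r are adjacent.
        let mid = |l: &usize, r: &usize| if r - l > 1 { Some(l + (r - l) / 2) } else { None };
        let (lo, hi) = search(mid, |&i| v[i].0 >= x, 0, v.len() - 1);
        if v[lo].0 >= x {
            v[lo].1 // before the beginning: clamp to the first keyframe
        } else if v[hi].0 < x {
            v[hi].1 // after the end: clamp to the last keyframe
        } else {
            // between two pairs: linear interpolation
            let t = (x - v[lo].0) / (v[hi].0 - v[lo].0);
            v[lo].1 + t * (v[hi].1 - v[lo].1)
        }
    }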

gnuvince

6 days ago

> Just look at the code. How could it be simpler?

Cannot tell if this is a joke.

Aeolos

6 days ago

Which part do you find complicated?

The code itself looks very straightforward: two preconditions and a loop. I don't see any significant difference in terms of complexity compared to any other low-level systems programming language. It's obviously a bit more verbose than Python or Haskell, but those are not targeting the same audience.

IshKebab

6 days ago

It looks extremely simple to me. I definitely have never seen a simpler binary search implementation (though in fairness it punts the midpoint calculation). Why do you think it's a joke?

johnisgood

5 days ago

Yeah, Perl is simple, too, then.

cranky908canuck

6 days ago

I really like Rust, and what it enables, but this seems like a wonderful counterexample.

For a practical programmer, bsearch is on a sorted array. Implement that, with the necessary type abstractions. So: build the objects into an array of references, sort that, then bsearch it already. If you need to keep that array around, do that. Way easier to look at the code and be reasonably confident that it's ok.

Anything else is fobbing off the abstractions onto something not nearly obvious (e.g., a 'Mid' trait?). I disagree that this is a defect in Rust.

But I suppose that enabling this kind of misabstraction may indeed be a defect in Rust.

Sounds an awful lot like Fermat: "the truly marvellous proof which the margin is too narrow to contain".

IncreasePosts

6 days ago

> This one also makes that assumption, but in some cases if that assumption is violated it returns an error rather than silently returning a meaningless

Sometimes returning an error and sometimes returning an incorrect value doesn't sound like a binary search that is "absolutely reliable".

Also, s/consolation/consultation/

fallingsquirrel

6 days ago

It seems to me that it's impossible for a binary search to detect all errors. The precondition is that the list must be sorted, and to fully verify that assumption would be O(n). That would make your binary search no better than a linear search. So I can forgive the best-effort error checking.

ratmice

5 days ago

In theory, at least, that precondition could be guaranteed by a newtype like SortedList without requiring any linear scan; that would work at least for most cases where lists are sorted by an algorithm. It doesn't necessarily cover custom data which is constructed sorted or maintains its sorting outside the algorithm, but you could perhaps have an unsafe constructor for your SortedList type. That should avoid a linear scan...
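
Something like this minimal sketch (type and constructor names are made up; the unchecked constructor is what skips the linear scan):

    pub struct SortedVec<T: Ord>(Vec<T>);

    impl<T: Ord> SortedVec<T> {
        // Establishes the invariant by sorting.
        pub fn from_vec(mut v: Vec<T>) -> Self {
            v.sort();
            SortedVec(v)
        }

        // Caller promises `v` is already sorted; no scan is performed.
        // (Not `unsafe` in the memory-safety sense, just unchecked.)
        pub fn from_sorted_unchecked(v: Vec<T>) -> Self {
            SortedVec(v)
        }

        pub fn as_slice(&self) -> &[T] {
            &self.0
        }
    }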

_proofs

5 days ago

I am not sure I see a use case where that precondition is anything but mandatory (and at least a requirement in app space for the related data), if you're intending to make practical use of the algo.

If there's a risk that your input's ordering is unstable, then reaching for bsearch seems like a "premature optimization", and you'd be better off with a straightforward linear scan (or your language of choice's bst/heap/rbt equivalent) until requirements change, or until the data is preprocessed upstream in a way that makes sense for the contexts wanting to use it.

Almondsetat

6 days ago

Binary search only works if you suppose the data is well structured. If you perform the search and in the meantime you spot a passing inconsistency, you can interrupt everything and throw an error. But other than that, checking the entire structure negates the entire point of the algorithm

edflsafoiewq

6 days ago

It doesn't seem possible to determine if the predicate is monotonic without doing a full scan, thus not being a binary search.

orlp

6 days ago

This binary search is pretty bad in my opinion. Some issues:

1. It doesn't support the empty array, because l and r are both inclusive.

2. It performs way more predicate tests than necessary.

3. An incorrectly implemented Mid trait causes a silent infinite loop.

4. If the sanity check detects a fundamental programming error, it just silently returns Nones instead of panicking.

dvt

6 days ago

Agree with basically everything here, plus I would add that this method also has weird ergonomics. For example, there is technically a midpoint between these two: `0b10u64.mid(&-20)` -- but you get a weird compiler error ("unsigned values cannot be negated") which will be confusing to debug when you're just trying to find a midpoint between two numbers.

What I don't quite understand is why the opposite (`-20.mid(&0b10u64)`) still errors out with the exact same "unsigned values cannot be negated" error. Why is the unsigned value and not the -20 taking precedence here? Doing `-20i64.mid(&0b10u64)` I get the expected error: "expected `&i64`, found `&u64`."

Ygg2

6 days ago

Because you can't find the midpoint between two different types. One is unsigned, the other is signed. How would that even work?

ramon156

6 days ago

3 is not a good reason. It's the developer's job to not screw up. What's the alternative?

gpm

6 days ago

This is a case where less abstraction would make the bug a lot more obvious. Binary search is simple enough that I don't think a generic implementation of it is worth introducing the possibilities of errors at a distance like that one.

orlp

6 days ago

I would normally agree, but the author went out of their way to check for error conditions in their binary search. I think a silent infinite loop is antithetical to that.

dmitrygr

6 days ago

Oh god. A trait to average two numbers? As a separate function that nobody will ever find in a larger codebase since it can live anywhere? I am yet again convinced that C is better. Thanks.

Ygg2

6 days ago

Not two numbers. Two somethings. It can average colors, sounds, etc.
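
For instance, a sketch of implementing it for a toy 8-bit grayscale type, assuming the trait's shape is roughly `fn mid(&self, other: &Self) -> Option<Self>` (which may not match the article exactly):

    trait Mid: Sized {
        // Assumed shape: None once nothing lies strictly between the two values.
        fn mid(&self, other: &Self) -> Option<Self>;
    }

    #[derive(Clone, Copy, Debug)]
    struct Gray(u8);

    impl Mid for Gray {
        fn mid(&self, other: &Self) -> Option<Self> {
            let (lo, hi) = (self.0.min(other.0), self.0.max(other.0));
            if hi - lo <= 1 {
                None
            } else {
                Some(Gray(lo + (hi - lo) / 2))
            }
        }
    }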

dmitrygr

4 days ago

Rust ... for when you need to binary search colors

C ... for ... sane tings?

leni536

6 days ago

> The only case where it returns (None, None) are those when it detects non-monotonicity in your predicate (returning false when a lower input returned true). So if you know your predicate is monotonic, feel free to hit this case with a panic!("should never happen").

Empty range?

edflsafoiewq

6 days ago

No such thing, the range is inclusive of the l and r endpoints.

mmastrac

6 days ago

Style nits: this would be much better with an enum return:

enum Found { Empty, Before(T), After(T), Between(T, T), }

... and a range parameter instead of L and R.

And, TBH, this should probably just panic! on an invalid predicate rather than returning an error. Programming errors should panic.
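
A rough sketch of that shape (the variant-to-outcome mapping and the use of a plain `mid` closure instead of the article's Mid trait are my own choices; the panic-on-bad-predicate part is left out):

    use std::ops::RangeInclusive;

    pub enum Found<T> {
        Empty,
        Before(T),
        After(T),
        Between(T, T),
    }

    pub fn search<T: PartialOrd>(
        range: RangeInclusive<T>,
        mid: impl Fn(&T, &T) -> Option<T>,
        predicate: impl Fn(&T) -> bool,
    ) -> Found<T> {
        let (mut l, mut r) = range.into_inner();
        if l > r {
            return Found::Empty;
        }
        if predicate(&l) {
            return Found::Before(l); // already true at the left endpoint
        }
        if !predicate(&r) {
            return Found::After(r); // still false at the right endpoint
        }
        // Invariant: predicate(l) == false, predicate(r) == true.
        while let Some(m) = mid(&l, &r) {
            if predicate(&m) {
                r = m
            } else {
                l = m
            }
        }
        Found::Between(l, r)
    }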

ChadNauseam

6 days ago

All good points, thank you. A range parameter would be especially good because I think you could use RangeInclusive to emphasize that it’s an inclusive range which is not obvious from the type signature.

rurban

6 days ago

You can omit the runtime sorted checks via proper types. Only accept sorted arrays or lists, and enforce that with such a type. This would make it convenient, faster, and safer. It would also tie the sortedness guarantee to the type itself, rather than passing it as an extra argument.

im3w1l

6 days ago

In theory I like this, but it raises the question of how it would work in practice; in practice people create non-sorted lists and then sort them in-place. Maybe you could have the sort method return a sorted-token that certifies that a given array is sorted. And the token would read-only borrow the array, to ensure no one can modify the array as long as the token is valid.

Is it even possible for a rust function to take a mutable borrow as a parameter, decay it to an immutable borrow and store that somewhere, such that other people can borrow the parameter after the function has returned?

aw1621107

6 days ago

> Is it even possible for a rust function to take a mutable borrow as a parameter, decay it to an immutable borrow and store that somewhere, such that other people can borrow the parameter after the function has returned?

I think it's possible under some scenarios with the appropriate lifetimes? For example, https://rust.godbolt.org/z/6fe8c4sf4:

    struct Magic<'a> {
        reference: &'a i32
    }
    
    impl<'a> Magic<'a> {
        fn new<'b: 'a>(source: &'b mut i32) -> Magic<'a> {
            Magic { reference: source }
        }
    }
    
    fn double(i: &i32) -> i32 {
        i * 2
    }
    
    fn main() {
        let mut i = 0;
        let m = Magic::new(&mut i);
        double(m.reference);
    }
I think this will stop working once the compiler can no longer see that the lifetimes work out, but I think that makes sense - there's a mutable object underlying the mutable borrow, and the compiler needs to verify that no one else can get another mutable borrow while the derived immutable borrows are live.

im3w1l

6 days ago

So, for completeness, let's circle back to the use case of a certified sorted array. It would be something like this:

    struct Sorted<'a> {
        reference: &'a [i32]
    }

    impl<'a> Sorted<'a> {
        fn in_place<'b: 'a>(source: &'b mut [i32]) -> Sorted<'a> {
            source.sort();
            Sorted { reference: source }
        }
    }

    fn use_sorted_array(sorted_arr: &Sorted) {
        let arr = sorted_arr.reference;
        println!("{} {} {}", arr[0], arr[1], arr[2]);
    }

    fn main() {
        let mut arr = [2, 3, 1];
        let sorted_arr = Sorted::in_place(&mut arr);
        use_sorted_array(&sorted_arr);
    }
You might also want to add methods on Sorted for creating an Option<Sorted> by verifying the sortedness of an immutable reference, and a method that copies an array and sorts it (it would need to return a struct of a Box for managing the lifetime and a Sorted pointing into the box, or something, I guess; probably need to use Pin?)

---

On a completely separate note, this will actually not work for the binary search in the article in question, as it uses binary search on a predicate. Having a sorted array doesn't guarantee that the predicate will be monotonic.

amjoshuamichael

6 days ago

Wow, this binary search implementation is sure to reduce code duplication. It's so elegant! Can I try?! Here's my implementation of a rust program that can do anything!

```
pub trait Program {
    fn run(self);
}

pub fn run_program<T: Program>(program: T) {
    program.run();
}
```

Boom. Just look at the code. How could it be simpler? This works for any struct that implements the `Program` trait.

/s

While the code in the original post is elegant, and cool from a "hey look what i can do!" perspective, I really feel like at some point we lost the plot on templates with Rust. When you try to finagle an implementation of a general algorithm, you just end up with a solution that works poorly in all cases, instead of one that works well in most cases.

I've written a lot of Rust, and I definitely know what it's like to get high off of your own template fumes. I once wrote a function with five generics and two trait constraints and I thought it was the coolest shit I'd ever done. But really, I could have just written, like, three discrete functions, and it would have been clearer and smaller.

If someone used this binary search with anything other than the Mid implementation shown in the post, that would categorically be bad code from someone trying to be too clever. If someone has a unique binary search to do, they should write it themselves. That way, the whole algorithm will be on display, and it will be easier to see ways to simplify or improve it in the context of that unique search operation. Also, it will be much clearer.

https://en.wiktionary.org/wiki/thneed