Hacker News

Sorry, isn't there quite a bit of a difference between O(1) and O(log N) space?

My first inclination would be to have a counter at zero, read in a line, write out the counter then write out the line, then increment the counter.

The space complexity here is O(1), is it not?
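That loop can be sketched in a few lines of Python (the names and I/O handling here are just illustrative):

```python
def number_lines(inp, out):
    # The only extra state is one integer counter: read a line,
    # write the counter, write the line, increment.
    counter = 0
    for line in inp:
        out.write(f"{counter} {line}")
        counter += 1

# usage: number_lines(sys.stdin, sys.stdout)
```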



> isn't there quite a bit of a difference between O(1) and O(log N) space?

On a 64-bit CPU, y=x+1 takes O(1) time when x < 2^64, but becomes O(log N) time for larger numbers. With space it’s even more complex: sometimes, to save space, it makes sense to keep fewer than 64 bits for these integers, so the cost may become O(log N) at much smaller values.

For this reason, I’m not a big fan of big O. It’s useful at job interviews. It’s probably useful for people working on extremely generic stuff like the C++ STL, which needs to scale both ways by many orders of magnitude: it’s not uncommon to have a vector with 1 element or with 100M elements, and the standard library must be good at both ends. But I don’t often solve such generic problems; I usually have some idea how large my N-s are, and can act accordingly.


Let's say that our file consists of nothing but newline characters (1 byte each). At 2^64 lines, that's 18.45 EB.

Sure, for files larger than 18.45 EB at the minimum, computing the new line count will take O(log N) time.

I'm all for nitpicking the issues with Big O notation, but this instance doesn't strike me as one.


OP said “to be really precise, O(log n)”, to avoid nitpickers that might have added corrections. And it is absolutely true: to store the number N, you need O(log N) bits. That is how Big O notation works.

The lesson to take away here is that O(log N) is REALLY REALLY small, for all sensible values of N. If an O(log N) algorithm is slow, it’s not because of the O(log N) part, it’s because of the hidden factors in O notation.
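A quick way to see how slowly that grows, using Python's arbitrary-precision ints (int.bit_length() gives the number of bits needed to store a value):

```python
# Bits needed to store a line counter, for increasingly absurd line counts.
# The "O(log N) space" is the bit count in the second column.
for lines in (10**3, 10**9, 2**64 - 1, 10**100):
    print(f"{lines}: {lines.bit_length()} bits")
```

Even a counter for 10^100 lines (vastly more than there are atoms in the observable universe) fits in 333 bits, i.e. 42 bytes.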


> That is how Big O notation works.

No it is not. "Big O" is a way of expressing asymptotic growth of some quantity, most often running time. But you need a machine model to even express running time. And sure, you can define your machine model to require O(log n) time for adding two numbers, but that's not particularly useful. In particular, the word RAM model—which is what is normally used—assumes a word size of θ(log n), and that basic operations on those words take a constant amount of time.

We use models because otherwise even the simplest things would be impossible to analyse. Or do you want to include a three-level cache hierarchy, micro-op decoding, the paging system, branch mispredictions and memory timings in your complexity analysis? Why is it always the "bUt AdDinG tWo nuMbERs tAkeS O(log n) tImE" hill that HN commenters are willing to die on?


To be fair he did say O(1), and qualified the O(log N) with both parentheses and “to be really precise”. If you can’t be precise about the definition of O on HN, then where? (Which, “to be really precise”, is defined over a polynomial-time tape-based Turing machine, not a modern RAM machine with log N words. It’s pointless, but fun to remember :) )


No no no! Big O has nothing to do with Turing machines at all! You can, of course, express the number of steps a single-tape deterministic Turing machine needs to solve some problem using Big O notation. But you can just as well express the number of qubits that a quantum computer needs to solve some other problem in Big O notation. It has nothing to do with the quantity being measured. Are you perhaps mixing up the definition of complexity classes such as P and NP with Big O notation?

First you need to settle on a model. Then you can express time, space, or some other quantity in that model using Big O notation.

Arguably, I've found that HN is one of the worst places to be precise about Big O notation. Everyone here seems to think they know what it is, only to lay bare their misconceptions about it a second later.


You're simply wrong here. The algorithm is O(log n), whether you agree to it or not.

First of all, there's absolutely nothing that says that we have to assume that the number of lines is less than 2^64. Big O notation does not care what happens below that number, it only cares what happens when you go to infinity, which is a lot larger than 2^64. At that point you need arbitrarily sized integers, which are not fixed-space. Hence O(log n). Word size or RAM model does not matter one whit.

This is how Big O notation works; it has nothing to do with practicality. As another example of a similar thing, consider sorting networks. The AKS sorting network has the best asymptotic depth of any known sorting network (O(log n) depth, I think), but the hidden constants are so massive that there are no situations where the AKS networks are actually practical; for any reasonable input they are always beaten by other (asymptotically worse) networks. That doesn't mean that they "aren't O(log n)" or whatever.

Second: if we were to actually write this algorithm in the way that is most conservative with space, we wouldn't use a 64-bit integer in the first place. If we only care about memory, we'd start with a single-byte datatype (e.g. unsigned char) to keep the count. The algorithm now uses 1 byte of space. When that overflows (i.e. passes 255 lines), we would change to a bigger datatype, e.g. unsigned short. The algorithm now takes 2 bytes of data.

When THAT overflows, we again promote the datatype to an unsigned int. Now we use 4 bytes of data. When that overflows, we finally promote it to a uint64_t. Now we're on 8 bytes of data. When that overflows... (etc.)

See how the memory usage is growing? And how it's growing exactly with the logarithm of the number of lines?
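That promotion scheme can be modeled in a few lines (a toy Python sketch, not tied to any particular C types):

```python
def counter_bytes(lines):
    # Smallest power-of-two-sized unsigned type that can hold `lines`,
    # mirroring the char -> short -> int -> uint64_t promotions above.
    size = 1
    while lines >= 256 ** size:
        size *= 2
    return size

for lines in (255, 256, 2**16, 2**32, 2**64):
    print(lines, counter_bytes(lines))
```

The byte count doubles each time the line count is squared, i.e. it grows with the logarithm of the number of lines.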

You're just wrong about this. Yes, you can absolutely say "assuming storage of all numbers is O(1), then the algorithm is O(1)" (essentially what OP said) but that is not a default assumption and it's not accurate to reality (since actually storing numbers requires O(log n) of space). And yes, branch prediction misses and cache misses are far more important factors here, but that doesn't change the fact that it uses O(log n) space. That's just simply true.

Also: we're not dying on a hill. OP made an entirely accurate and true statement, which was met with an avalanche of people criticizing him using incorrect arguments and misunderstandings of Big O notation. We're just saying that he was correct all along.


By your logic, memory access takes time Ω(n^(1/3)) because memory cells need to occupy physical space and we live in a three-dimensional reality, so the time to access a memory cell must necessarily increase with the physical distance to that memory cell. But using a model that made such assumptions would be needlessly complex, so nobody does it. (Incidentally, there was a post on HN that argued this a few years ago, and the comments were full of people who felt the need to ignore the reasoning behind the assumptions backing commonly-used machine models, much like it is happening in this thread.)

Arbitrarily large numbers, represented in binary, certainly take time that is linear in the length of their representation to add. Nobody's arguing with that. But using a model that assumes bitwise operations would be needlessly complex, so outside of some really specialized contexts, nobody does it.

> You're just wrong about this. Yes, you can absolutely say "assuming storage of all numbers is O(1), then the algorithm is O(1)" (essentially what OP said) but that is not a default assumption

I don't know the basis on which you argue that the word RAM model isn't the default assumption, but I feel confident in claiming that outside of some niche communities, it most certainly is.

> and it's not accurate to reality (since actually storing numbers requires O(log n) of space).

That's why it's called a model. We use models because reality is too messy to work with, and we usually don't care about the exact details. Just ask a physicist, they make simplifying assumptions all the time because without them, they'd be getting absolutely nowhere. It's pretty much the same in algorithmics.


> By your logic, memory access takes time Ω(n^(1/3)) because memory cells need to occupy physical space and we live in a three-dimensional reality, so the time to access a memory cell must necessarily increase with the physical distance to that memory cell.

The issue was about space complexity, not time complexity. Very large numbers take more space than smaller numbers.


No, the argument was that because a binary representation of the number n needs ⌈log n⌉ bits, incrementing it would require O(log n) time. And my argument was that accessing these bits in memory takes an amount of time that grows with the cube root of your memory size because of physical constraints. But of course I'm not arguing that we should incorporate that into our models. Doing so would not solve any problems, quite the opposite. But assuming that basic operations on machine words can be performed in constant time solves actual problems such as "why is algorithm analysis in this model so bothersome", "why are there logarithms everywhere", and "why do the theory people say this should get slower for larger numbers? It always takes 1 cycle on my machine".


Do you have any resource mentioning that the algorithm runs in O(log n) time or space? This is really new to me, sorry. I don't mean to argue, it's just the first time I hear such a thing.


It takes O(log n) bits to represent the value n, here the number of lines counted so far.

Does it make sense when expressed compactly like that?


For God's sake, for a second I thought I had forgotten everything about the O notation. Thanks for confirming what I thought is the correct definition of big O.


> On a 64-bit CPU, y=x+1 is O(1) time when Y < 2^64, but becomes O(log N) time for larger numbers.

No. Incrementing a 128-bit counter is also O(1) time on amd64/ARMv8/... Only when your numbers are arbitrarily large does that become true.

But that's why we use models that make clear what our assumptions are. In particular, the word RAM model (which is normally used to talk about sequential algorithms) assumes a word size of θ(log n), and that some basic operations on words (such as arithmetic) are possible in constant time.

Models obviously have flaws (otherwise they wouldn't be models), but practitioners often make the mistake of overthinking them. How big, exactly, is your file that a constant number of 64-bit integers doesn't suffice? You can easily increment a 512-bit counter in constant time on an 8-bit Arduino. It doesn't matter whether I have to check one hypothetical 512-bit counter, 8 64-bit counters or 64 8-bit counters, it's a constant number and therefore O(1).
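As a sketch of why a fixed word count makes this O(1), here is a 512-bit increment done as at most eight 64-bit word operations (illustrative Python, modeling 64-bit words with masking):

```python
WORD = 2**64

def inc512(words):
    # Increment a 512-bit counter stored as eight 64-bit words,
    # least-significant word first. At most 8 word ops, so O(1):
    # the work is bounded by a constant independent of the input file.
    for i in range(8):
        words[i] = (words[i] + 1) % WORD
        if words[i] != 0:  # no carry out of this word, done
            break
    return words
```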

Whenever this "Big O is useful for job interviews but not much else" sentiment comes up on HN, it's usually driven by a misunderstanding of what Big O and asymptotic analysis actually mean.


> Only when your numbers are arbitrarily large would that become true.

Technically you’re right, I should have put “arbitrarily large” there, but I assumed it was obvious from the context.

> but practitioners often make the mistake of overthinking them

As an experienced practitioner, I happen to know how extremely inaccurate these models are.

When I was inexperienced, the models helped more than they do now, after I have learned about NUMA, cache hierarchy, micro-op fusion, the MMU, prefetchers, DMA IO, interrupts, and more of these hairy implementation details. Before I knew all these things, heuristics based on asymptotic analysis were generally helpful. Now they’re less so; I sometimes deliberately pick an asymptotically slower algorithm or data structure because it’s faster in reality.


Well yeah, for many problems, there are solutions that have wonderful running time in the RAM model, like van Emde Boas trees or Fibonacci heaps, but are approximately 100% useless in practice because of the constants involved.

I guess this is where I pitch what we call "Algorithm Engineering", which considers both theoretical guarantees and practical performance, including properties of real-world hardware such as the ones you mentioned :) https://en.wikipedia.org/wiki/Algorithm_engineering


> including properties of real-world hardware such as the ones you mentioned :) https://en.wikipedia.org/wiki/Algorithm_engineering

Are you sure about that? The most recent source of that article is from 2000. Half of the stuff I have mentioned didn’t exist back then.

As far as I’m aware, there are very few people in academia who study modern hardware while focusing on software performance. I only know about Agner Fog, from the Technical University of Denmark.


It's a meta-article, it doesn't contain all the results of the field. But the field is alive and well. Just look at the proceedings of SEA and ALENEX (there was no SEA this year for stupid reasons that have nothing to do with the field).

Agner Fog does some cool stuff (and I've used his RNG libraries many times) but as you said he studies modern hardware with a focus on software performance. Algorithm engineering is about finding algorithms that have nice theoretical guarantees (e.g., proving correctness or that there aren't any inputs which would result in substantially worse running time) while keeping in mind that the result would be quite useless if it couldn't be implemented efficiently on the machines we have, so the goal is to come up with algorithms that are also really fast in practice.


> On a 64-bit CPU, y=x+1 is O(1) time when Y < 2^64, but becomes O(log N) time for larger numbers.

I thought that big-O does not actually consider the underlying hardware. How can you specify the CPU model / instruction conditions and still use the semantics of big-O analysis? The mathematical definition of O(_) does not involve hardware; it only says that some coefficients and a constant exist. I get the point you are making, but is big-O an appropriate way of measuring in this context?


As far as I know, big O notation definitely requires that we define the model of computation we're using and the operations that this model can perform in constant time. Usually that definition is not made explicit or rigorous.

When we say that comparison sort is O(n * lg n), we're assuming a computational model where comparing two integers is a constant-time operation. That's true in most physical computers when we use the CPU's comparison instructions. It's also true in any theoretical model where we define a (finite) largest possible integer. So it works well enough.


You are using O(n) to measure the actual performance of an algorithm in practice. This is not the intention of O(n) -- rather it is used to classify functions into groups depending on "how they grow" with respect to the input as it goes to infinity. This is good because different instructions, or sequences of instructions, will perform differently across CPU architectures, for example two CPUs might have different cycles-per-instruction. Big-O makes claims independent of the hardware that remain true forever. Just because algorithm A is O(n) and B is O(n^2) does not mean that A will always outperform B.


I’m not conflating big O notation with benchmarking. My point applies to theoretical models of computation.

I’m pointing out that, for example, comparison sort is only O(n * lg n) in theory if the theoretical model of computation we’re dealing with can compare any two items in constant time. Comparing arbitrarily large integers can not be done in constant time in theory, so if that’s the problem we’re discussing then the complexity will not be O(n * lg n). But again, we generally are implicitly concerned with models of computation where comparison is a constant time operation.


No. Stop. That's not how this works.

Yes, Big O notation measures asymptotic growth of functions. But what do those functions express? For them to express running times, you need to define a machine model. It's a model because it's not a real machine – we don't stroll into a computer shop and pick up some computer and use that as our reference. Instead, we define an abstract machine model. Often, that's the word RAM model, which assumes that for an input size n, the machine can process words of size θ(log n) in constant time. This is reasonably close to how our actual computers work: 64-bit words, and we can do stuff with them in constant time. But it's not perfect, because the RAM model doesn't have any caches (let alone multiple levels of cache hierarchy), doesn't have disks, doesn't model the paging system, etc. If any of these things are of significant concern to you, pick a different model (e.g. an external-memory model to consider the memory hierarchy, or look into cache-oblivious algorithms, ...).

But talking about asymptotic running time without specifying which model you're using is absolutely pointless.


What you are missing is that " algorithm A is O(n)" is a meaningless statement unless you define under which operations.


typically the n refers to the length of the input. bigger numbers to compare get cancelled out by the largeness of n.


I don’t think that’s accurate. The length of a collection is not necessarily correlated to the largeness of its largest element. The n in comparison sort is indeed the length of the collection, and that’s the only variable we generally discuss because we generally assume some fixed maximum allowable element.


I am referring to the bit length of the input.


> is big-O an appropriate way of measuring in this context?

I don't have a formal CS education. I'm less interested in math and more oriented towards practical aspects. Infinity doesn't exist in practice (except for those IEEE float magic values), but we can reason about the shape of functions, e.g. "count of executed machine instructions as a function of N".

Not the same thing as CS-defined big O, but close.

Can be useful to reason about complexity of algorithms. More precisely, about complexity of their implementations.

Usually useless to reason about time: a machine instruction takes variable time to run on a modern CPU, the same instruction can take less than 1 cycle or much more than 1M cycles.


The amount of memory needed to hold an integer is log(N). That's just because the log is about how many digits it has. That fact doesn't depend on the hardware, it's just a math fact.


There's generally an assumption that the elementary values we're using fit into native hardware values, and that some elementary operations on those values can be done in constant time (e.g. a CPU instruction).

If you want to talk about something like supporting arbitrarily large integers or defining the memory of the physical computer we're using, we need to be explicit about that, because it changes a lot of things. After all, a finite-memory computer is not Turing complete, and everything it can compute can be computed in constant time. :)


it's a little more complicated than that.


how so?


well I can hold somewhat arbitrarily large integers in memory with the Ackermann function, for instance.


If I've read between the lines correctly, your point is that N bits allows you to represent 2^N values, but they don't have to be the values from 0 to 2^N.
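A contrived encoding, purely to illustrate that point: let one stored byte k stand for the value 2^(2^k). Eight bits then index values far beyond 2^255, but only within that special family of values.

```python
def decode(k):
    # k fits in one byte, yet the decoded value grows doubly
    # exponentially; we trade coverage (only this family of values
    # is representable) for compactness.
    return 2 ** (2 ** k)
```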


yes, sorry for being terse.


everything that terminates is O(1) when you upper-bound the input size


That's still pretty naive. You can get away with one char and one int.

- Read byte from input

- Write byte to output

- When byte is "\n": write the counter, write a space, increase the counter by one

Most likely, whatever logic you use to turn the counter into a string will already use more memory than your actual loop.

Technically it's still not O(1), as you will need one more bit for your counter variable every time you double the number of lines; realistically, you will most likely be using an integer anyway, so you can consider it "constant".


Storing the number n takes log n bits, if we're being super pedantic :)


In practice, I'd say it takes exactly 64 bits for any file.


Asymptotic notation isn't about what's "in practice."

We have multiplication algorithms with very good asymptotic complexity, but we don't use the ones that scale closest to linear, because the constant factors are enormous.


There are different models of computation. You can have a model in which integers can be arbitrarily large yet still count as constant memory (for analyzing practical-size algorithms on 64-bit machines, for example); I'm not sure how common those are. I know, for example, that arithmetic is often assumed to be constant time (even though it actually varies by input size; even x=y+1 is O(log y)).

Also relevant here is the RAM machine model that is often employed, where you can have essentially unlimited memory that can be randomly accessed in fixed time. This again is roughly in line with real practical machines running algorithms that use, and fit within, RAM. In reality, if you wanted a more physically consistent model, eventually your access times must differentiate and increase for increasing memory: data occupies physical space and must be fetched at increasingly long distances, limited by the speed of light. In principle this means m bits of memory can at best be accessed in O(m^(1/3)) time [1]. Of course, this realism isn't always practically relevant (maybe for datacenter-scale problems, perhaps even larger) and complicates the analysis of algorithms.

[1]: Just for fun, if you want to get really physically accurate, this isn't quite right either I suspect -- that's because with enough data (physical bits) occupying a volume at constant density it will eventually collapse into a black hole, which kinda destroys your computer :). So for planetary-scale computers you eventually need to spread your data across a disk, like a small galaxy (or Discworld, if you prefer :)), so it won't collapse, giving it O(sqrt(m)) access time. Surprise, very large worlds must be flat.

This is related to the BH entropy formula and the so-called holographic principle, I guess: the entropy, and hence the information content, of a volume is surprisingly bounded by its area, since a black hole's area is proportional to its mass. The weird thing, of course, is that the universe itself should perhaps collapse into a black hole, since it has a lot of stuff in any sufficiently large volume; being young and dynamic, it hasn't happened yet. Its fractal scale also plays a role, but I guess this digression has grown large enough already :)

(do tell if you want to learn more)


If my integers can be arbitrarily large and count as constant memory, why can't I just reduce everything to a 2-counter machine and solve every problem in constant space (and a lot of time)?


Because the Big O notation as used by engineers is actually about practical problems and not asymptotical behavior.

So integers count as constant memory AND can be considered to be arbitrarily large for the purpose of the current problem. Because having more than 2^64 lines in a file will never happen in practice, and even if it does then 2 integers would do the job for the rest of the observable universe.

Be pragmatic, you are on Hacker News.


Uh, that's what the theory people do, too. We make assumptions about the model, for example, that if our input is of size n, a machine word should be able to store that number (word size θ(log n) bits). Otherwise we'd never get anywhere with asymptotic analysis.

In my experience, practitioners just have a lot of misconceptions about "Big O".


You can, it just makes for a useless result.


With the O() notation the units are some arbitrary computation step or arbitrary unit of memory, which could even mean matrix multiplication of two 1024x1024 matrices of doubles. The whole point of big-O notation is that you don’t care about how fast/expensive it is, but how it scales.


Yes and the memory to store a number scales with the log of the number



