NumPy Illustrated: The Visual Guide to NumPy

fouronnes3 · on Dec 29, 2020

I like to stop and appreciate the ndarray abstraction. It's a really beautiful interface that hits all the right spots to me. Anytime I'm working with numerical data in a language without an ndarray I miss it quite strongly. Indexing, slices, broadcasting... it's all so brilliant.

ssl232 · on Dec 29, 2020

Agreed. I occasionally imagine a feature that it would be cool to have in Numpy, then find that it's already there. Most recently I discovered (somewhat lately) that you can create arbitrarily structured arrays and store things like strings alongside array elements [1].

[1] https://numpy.org/doc/stable/user/basics.rec.html

enriquto · on Dec 29, 2020

As a counterpoint, if you are used to real, language-supported numerical arrays, then numpy's ndarray seems like a joke.

Jenz · on Dec 29, 2020

This must have piqued a number of readers interedts, mine included. Are you reffering to something like Matlab? Julia?

enriquto · on Dec 29, 2020

There's many programming languages where multidimensional arrays of floats are a natural language construct. From low-level (e.g. Fortran) to high level languages (e.g. Octave/Matlab, Julia). Even in C you can work with multidimensional arrays of floats, but it is a bit cumbersome. This is not the case for Python, where the language cannot express in itself this data structure, and it needs external library support like numpy.

EDIT: I'm always very surprised by the wide acceptance of python/numpy for numerical algorithms, as it seems a notion quite foreign and awkward for this language. Python has excellent native support for string processing and advanced data structures like dictionaries, but those are mostly useless for numerical computation. If you want to do math in Python, you need to write libraries in other languages and provide a Phython interface. This is what scipy does, and it is a very good job, but it does not feel "natural" from the point of view of the language. If Python was a good scientific programming language, the most natural way to implement scipy/numpy and all that would be using Python itself, but this is not at all the case.

aldanor · on Dec 29, 2020

”Can not express this data structure"? It can of course, there's not much to express. Go and write a (slow) tensor library yourself in pure Python.

To be fair, there's even a standard library `array` module that is a direct wrapper around C arrays, and NumPy shares some functionality with it (namely in regards to buffers, memoryviews, data type encoding etc).

Edit: there's also a built-in buffer type that supports multi-dimensional strides, suboffsets and all that (which, again both `array` and `numpy` make use of in establishing a standardized buffer api).

enriquto · on Dec 29, 2020

> Go and write a (slow) tensor library yourself in pure Python.

Paraphrasing Alexander Stepanov, "complexity is part of of the specification" [0]. If your data is not contiguous (as the third figure in the article shows), it is not the same data structure. Thus you cannot implement this data structure in pure Python, period. The "array" module is just a wrapper to C data just like numpy is, it is not a pure Python construction.

[0] Here's the full quote, from http://www.stlport.org/resources/StepanovUSA.html

In 1976, still back in the USSR, I got a very serious case of food poisoning from eating raw fish. While in the hospital, in the state of delirium, I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative. (So, putting it simply, STL is the result of a bacterial infection.) In other words, I realized that a parallel reduction algorithm is associated with a semigroup structure type. That is the fundamental point: algorithms are defined on algebraic structures. It took me another couple of years to realize that you have to extend the notion of structure by adding complexity requirements to regular axioms.

aldanor · on Dec 29, 2020

If you're talking about non-contiguity due to multi-dimensionality (or strided slicing), PyBuffer already internally supports that (and thus memoryviews, stdlib array, etc) - [0], is that "pure Python" enough?

[0] https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.stri...

enriquto · on Dec 29, 2020

> is that "pure Python" enough?

Not at all! The PyBuffer library is all about interfacing to C data. As far as I know, creating a (mutable) contiguous array of floats is impossible in the Python language. This seems to be a deliberate language design decision.

axi1 · on Dec 30, 2020

https://docs.python.org/3/library/array.html

enriquto · on Dec 30, 2020

My point exactly. The "array" class cannot be implemented in Python.

aldanor · on Jan 1, 2021

Uhh, so: Python has a language library feature that allows creating contiguous arrays, and hence it cannot be done in Python. Ok.

// Pretty much the entirety of CPython is written in C, including lists, dicts and everything else, not much different than the array.

lgessler · on Dec 29, 2020

What if NumPy and SciPy were part of the Python stdlib--would this change your view at all? (Note that for the majority of Python's scientific users, this might as well be the case, since they're using Anaconda to get their Python distribution set up.)

Like I mean sure, maybe it'd be nice to have some syntactic sugar at the language level for tensors (though IMO the slice syntax for numpy.ndarray is already plenty), and maybe it's a microannoyance to have to write `import numpy as np` over and over, but what else is lacking?

Chris_Newton · on Dec 29, 2020

If Python was a good scientific programming language, the most natural way to implement scipy/numpy and all that would be using Python itself, but this is not at all the case.

To be fair, with modern hardware capabilities, I still wouldn’t be writing out linear algebra algorithms in the language itself even if I were programming in a relatively low-level language like C or C++. I’d be using an implementation of BLAS/LAPACK or some other suitable library, which in turn might well have been written in carefully tuned assembly language on the target platform in order to use any parallel or otherwise specialised operations provided by a CPU or GPU, partition data for optimal cache use, etc.

The situation with numpy and other numerical libraries in Python seems analogous. We are still importing highly optimised code to do the numerical heavy lifting and then using a higher-level language to write the less performance-sensitive logic and glue everything together conveniently.

dunefox · on Dec 30, 2020

But as soon as you actually need to implement an efficient library you must use C/C++/Fortran. Julia, for example, allows you to stay in Julia and get comparable performance.

rbanffy · on Dec 30, 2020

If you need to implement an efficient vector math library on modern hardware you'll end up using intrinsics, padding structures for cache alignments, identifying which CPU, GPU, NPU, and vector extensions you are running on to better use specific features, and so on, and it won't look like idiomatic sequential C/C++/Fortran for almost every value of "idiomatic".

pletnes · on Dec 29, 2020

I think numpy is mostly great. Although I do agree - numpy should be in the stdlib, period. So many users come from the scientific/data/simulation communities. It’s just that there are no (or too few) core devs from that tradition. Python lacking numpy in the base language is as awful as Fortran lacking dict-like data structures in its runtime.

enriquto · on Dec 29, 2020

> numpy should be in the stdlib, period.

I disagree strongly with this. Numerical facilities should not be in the stdlib, they should be in the language itself, without need to "import" anything. I can create strings and dictionaries without any imports. I should be able to create a multidimensional array of floats, and perform natural operations with it (like matrix multiplication) without importing anything.

As per the dictionaries in Fortran, there are libraries for that. But I think that not having complex data structures in Fortran is a feature, not a bug. If you actually need these data structures, many Fortran programmers will tell you to just use a different language. You'll never hear such a plainly honest answer from Python programmers; they'll tell you instead to use bizarre libraries with unnatural interfaces, and you'll be forced to multiply matrices using a function "numpy.dot" or an ugly operator "@".

rbanffy · on Dec 29, 2020

> I disagree strongly with this. Numerical facilities should not be in the stdlib, they should be in the language itself,

Overloading the language with features that have limited use for most of their users would tie their release cycles and end up a disservice to both users of the language in general and those who use it for heavy number crunching.

As for @, my guess is that Python is running out of ASCII symbols for operators (and unwilling to go full APL with operators). I would imagine that, if NumPy arrays don't support @ (don't they?!), it'd be a desire to be compatible with pre 3.5, which is when the @ infix operator was introduced. It shouldn't be much trouble to implement __matmul__ and __rmatmul__ for them, assuming NumPy's own interfaces are sane.

zhd · on Jan 8, 2021

Numpy arrays do indeed support the @-operator for matrix multiplication.

In fact, one of the common design questions when adding the @-operator to Python was "why do this, when it's not used anywhere in the core language or standard library?" - and the answer was "because it's important for Numpy and the numerical user community".

(and finally, adding __matmul__ and __rmatmul__ is entirely backwards compatible; you can use objects with those methods on Python 3.4 - it's only `A @ B` which is a syntax error)

pletnes · on Dec 29, 2020

Very good argument for having that in the python language, that is true. It’s an even bigger ask, though.

In Fortran you often need such data structures to keep track of results or to do IO. It’s a pain and I’ve so far not seen a usable library for it.

fishmaster · on Dec 29, 2020

This is exactly why I'm going to Julia. It either has these constructs in the language itself or is powerful enough to express them adequately. Numpy is a necessary crutch for Python.

pc86 · on Dec 29, 2020

You're going to Julia so that you can avoid typing "import numpy as np" at the beginning of a file? Or is there more to it?

dklend122 · on Dec 30, 2020

Mathematical programming in Julia is just more natural and integrated with the language and type system.

From dot broadcasting, to multiple dispatch, to parametric types (if you want ), multidimensional array comprehensions. No worrying about 4 different container types (list, np array, pytorch tensor, symbolic etc etc)

Everything works with one array abstraction, including gpu and multi threaded

fishmaster · on Dec 31, 2020

This is what I mean with it being more natural and elegant in Julia. Also, "@code_warntype" is incredibly useful for efficient code.

axi1 · on Dec 30, 2020

As a sidenote, I don't write "import numpy as np" in jupyter notebooks (at least as long as I don't share them with others) for I have this line in my ipython startup file.

fishmaster · on Dec 29, 2020

Also so that I don't have to use things like "np.dot(A, B)" and similar annoyances.

jofer · on Dec 29, 2020

I've used fortran, C/C++, matlab, and IDL for many years but primarily work in python these days. I really do think numpy gets the interface correct. Modern fortran is quite nice as well, but I'm a _big_ fan of the way numpy does broadcasting. I think this is one thing it gets _much_ more correct than, say, matlab does. Also, the fact that numpy arrays are semi-contiguous chunks of memory just like a C arrays makes them _very_ easy to reason about. It's easy to predict memory usage and understand what operations will make temporary copies (assuming you understand how numpy works, anyway).

The way numpy uses python's built in operators makes it feel very native. I really don't find it awkward at all.

I used numeric and numarray in the pre-numpy days, and those did feel more "bolted on". While numpy is really similar to numeric, a lot of little things were fixed during the transition to make numpy very much a native part of python.

Everyone keeps insisting that numpy/scipy/etc are all really "just written in other languages with a python interface", but that's _far_ from being true. Core parts are (e.g. ndarray itself + built-in ufuncs), and a lot more is wrappers around widely-used libraries (think BLAS), but you might be surprised at how much of numpy really is python. Ditto for scipy (though for purely historical reasons, scipy is a dumping ground for X-implemented-in-C stuff). Ditto for things like skimage/sklearn/etc. The bulk of it really is implemented in python. Sure, operations like + aren't, but it might be worth looking through the numpy codebase before stating that it's not written in python.

Next, yes, if you naievely loop over a python array it will be slow, and this is a real limitation of the language + numpy model. Yes, you need to use a different mental model when implementing things. Some problems (e.g. finite-difference-like problems) are best implemented in a way that has poor performance with python+numpy. You do need to drop down to fortran/C/etc (cython is _great_ for those cases) when you have problems that can't be modelled in a particular way. However, that's not the limitation most folks seem to think it is. It's a well-known limitation that there's a ton of support for switching approaches when you hit it (e.g. cython, numba, f2py, etc). Yes, fortran and julia are both nicer in those cases (though my experience with julia is mostly "toy"/learning projects). However, the problems where python+numpy breaks down performance-wise are not anywhere near as common as folks seem to think they are. In my experience, the other types of non-scientific problems are _much_ more common, so it's nice to have a very general purpose language to draw on.

As for matrix multiplication, is it really that bad to do things like x.T.dot(y) instead of x' * y? I really don't understand the argument that the `dot` method of an ndarray is ugly. Also, I actually hate that matlab/et al treat any 2d array as though it's a matrix and fundamentally don't have the concept of a 1d array (row vectors and column vectors are 2d, not 1d). I really do want a 1d array and not a row vector or a column vector most of the time. That's simply impossible in matlab. Also, if you want that behavior in python, it's absolutely possible! Use `np.matrix` instead of `np.array`. Arrays don't behave that way because element-wise operations are more common in practice.

I get the impression you're looking at things from a Julia perspective. Julia is great, but similar to matlab and fortran it falls apart when you start to move outside its core domain. I wouldn't want to write a web service in Julia (though I'm sure it's possible). I do very frequently need to implement web services that do relatively heavy number crunching or deal with things that are fundamentally math-on-big-arrays-of-numbers. Python is _great_ for that type of application. It's also actually pretty good for many "hardcore" number crunching tasks. I've done things where python falls short, sure. However, python + numpy/et al hits a sweet spot that other languages currently don't in what it combines. Domain specific languages are better in some ways, but surprisingly little of a scientific codebase winds up being pure numerical operations. There's an awful lot of other stuff in there, and that's where the scientific python stack is incredibly effective.

amkkma · on Dec 29, 2020

Julia is perfectly fine for non numerical programming, better in ways than python inheritance spaghetti.

See: https://github.com/GenieFramework/Genie.jl

Multiple dispatch + parametric types is a very general and powerful programming model.

It doesn't all apart at all, and anything python is a pycall away.

fouronnes3 · on Dec 29, 2020

In my opinion it's math not python which got the matrix multiply operator wrong. It's not commutative so it shouldn't be written with the same operator as the regular product. It's abuse of notation again from math notation.

enriquto · on Dec 29, 2020

> In my opinion it's math not python which got the matrix multiply operator wrong. It's not commutative so it shouldn't be written with the same operator as the regular product.

The "regular product" in math is never intended to be commutative (e.g., the product in a group or in a ring, and that includes the matrix product). If you want to indicate that an operator is commutative, in math, you use always the "+" operator. Thus, it is python that gets it wrong by using a blatantly commutative operator for non-commutative operations such as string concatenation. That python was designed by a mathematician adds sadness to this notational tragedy.

michaelrpeskin · on Dec 30, 2020

+1 for mentioning IDL. I’ve always thought that was the best native array based language.

beagle3 · on Dec 30, 2020

I don’t know what your metric for being best is, but APL and K do arrays more natively and completely than anything else.

nxpnsv · on Dec 29, 2020

I really don’t care if np is not part of python dist. Install is trivial and It does what it does really well and is very usable.

emmelaich · on Dec 29, 2020

elaborate please

axi1 · on Dec 29, 2020

Friends link: https://medium.com/better-programming/numpy-illustrated-the-...

knolan · on Dec 29, 2020

I guess it isn’t popular in these parts, but there is a simplicity to Matlab that’s hard to beat for lowly mechanical engineers. The documentation is also generally excellent.

Unfortunately most undergraduates get locked in early with free student versions and continue to use it in industry. Currently I’m trying to expose my students to both Matlab and Python/numpy in a matter of fact manner.

sfifs · on Dec 29, 2020

I have found that in any situation that goes beyond one off research or experimentation into something that needs to be orchestrated oe deployed in production, you need to move to a language with a lot of bells and whistles libraries which are usually better accessed proceedurally rather than functionally. This is because at that stage, your users are no longer people who understand your code base or are even know programming necessarily.

You can do algorithm development in MATLAB or R or Julia or whatever and figure out ways to either translate the logic to Python/C++ later or setup some kind of IPC for production software engineers to call (platform/orchestration stuff is usually Python or Java)...

...OR you can just develop your algorithms in Python and keep code comprehension and platform integration much simpler for everyone

analog31 · on Dec 29, 2020

I'm a senior member of a small industry R&D team that's adjacent to a product development department. New engineering recruits have typically been exposed to Matlab. But they cover a pretty wide range in terms of their actual aptitude and interest in it. Only a fraction of people can really learn to program, and I don't think we understand why.

What I've noticed is that the ones who have gained some proficiency in programming can switch to Python in a jiffy. So I don't think it's a life or death choice.

pc86 · on Dec 29, 2020

In my experience, anyone can learn to program but only a fraction of people can learn enough or get good enough to move into senior and beyond roles. You can learn the language syntax, and have enough best practices (that you may or may not understand) jammed into your head, that you are still a net benefit to the team. But the folks who move into senior/staff/principal roles either have a natural aptitude for it, or spend a lot more than 40 hours a week on it, or both.

analog31 · on Dec 29, 2020

Indeed, what I'm guessing is that we're not talking about people who are destined to become software developers at all, since those folks tend to learn something other than Matlab. In my own case, I'm at a "principal" level for reasons other than programming, at least I hope. ;-)

The use of Python in my work group is largely as a problem solving and prototyping tool. The product development teams that we serve have their own software departments, with their own choices of languages and so forth.

sgillen · on Dec 29, 2020

Yeah, if your problem is mostly banging around matrices, Matlab feels really great (it is, after all, Matrix Laboratory). I prefer python for the most part, but it does feel a lot more verbose in comparison whenever you are in Matlabs wheelhouse.

Not to mention the IDE is actually very good, and you almost never have to deal with package management etc.

enriquto · on Dec 29, 2020

> Unfortunately most undergraduates get locked in early with free student versions and continue to use it in industry.

Why no octave? Or julia?

pc86 · on Dec 29, 2020

I think the question is "why Julia?" instead of "why not Julia?" Python is the de facto standard, for better or worse. If someone wants to unseat it, they need a compelling reason. You shouldn't put the onus on folks defaulting to the industry standard to justify their choice over a much less well-known competitor.

dklend122 · on Dec 30, 2020

There are many compelling reasons. Doesn't seem that way if Julia is just construed as a faster python with some Matlab.

Goes far beyond that

pjmlp · on Dec 30, 2020

Depends on how much people love to write "Python" that is actually C with Python bindings.

divtiwari · on Dec 29, 2020

I'm afraid Julia will always be ignored in favour of Python. Python/C++ have a firm grip over its niche for years to come.

stjohnswarts · on Dec 30, 2020

I hope you're just alternating between the two rather than making them double up on homework. As a student I would find that quite annoying.

knolan · on Dec 30, 2020

Definitely not doubling up. Some parts of my modules are in Jupyter Notebooks, others inherited PowerPoint that I’m slowly phasing out. I’ve added various examples where I show how a particular equation might be plotted with Matlab and better understood.

Schiphol · on Dec 29, 2020

This is fantastic, I will share it with students. A very minor comment:

>So, there’s a total of three types of vectors in NumPy: 1D arrays, 2D row vectors, and 2D column vectors.

I think you can have 'vectors' of arbitrarily high dimension, by adding empty axes, or doing, say, .reshape(-1, 1, 1, 1).

I would also tie this point with the rule for broadcasting that dimensions which do not match need to have a value of one. I'm endlessly confused about broadcasting, and find the explicit rule useful.

zhd · on Jan 8, 2021

Alas, Numpy arrays are limited to thirty-two dimensions.

This is fine, because with high numbers of dimensions you really can't afford to store a dense-matrix representation in RAM and 32 is plenty for any low-dimensional problem.

axi1 · on Dec 31, 2020

I've finally fixed the article, thanks. Yes, broadcasting is a cool thing, but the docs mostly explains 'how' but not 'why' or 'what for'.

Schiphol · on Dec 31, 2020

I'm glad I could be of help!

> Yes, broadcasting is a cool thing, but the docs mostly explains 'how' but not 'why' or 'what for'.

I'd say that, even for the 'how', thinking "OK, I have a [1, 1, 2, 2, 1] and a [3, 2, 1, 2, 5] arrays, which means that element-wise multiplication would result in a [3, 2, 2, 2, 5] array" is very useful in reasoning about broadcasting.

Just relying on simple examples and trying to build an intuition for the general case (which is the route favored in the numpy docs, I'd say) doesn't quite work, at least for me.

axi1 · on Dec 29, 2020

Thanks, good point! I'll think how to fit it best into the article.

Ultcyber · on Dec 29, 2020

(in np.sum)

> The value of the axis argument is, as a matter of fact, the number of the index in question: The first index is axis=0, the second one is axis=1, and so on

If I'm not wrong - this statement is wrong - the axis does not represent index (or otherwise it would be axis 0 = row, axis 1 = column). The actual idea behind is a little unintuitive, but explained well here: https://aerinykim.medium.com/numpy-sum-axis-intuition-6eb949...

svara · on Dec 29, 2020

It's explained in more detail further down in that paragraph, I think the wording is fine.

The axis argument gives the index of the axis along which your summate, and which therefore disappears after summation.

There might be a confusion about whether "axis 0" is rows or columns? Its length is the number of rows, but it "points" along columns.

I prefer just calling them axis 0, 1, 2, ... and avoid thinking about rows and columns in numpy, I've found that sometimes avoids confusion.

throwawayiionqz · on Dec 29, 2020

I could never grasp the conventions for "axis=". Everyone has some memory trick, some of them being confusing/wrong.

On the other hand einsum is Crystal clear and not prone to confusion.

> np.einsum("ij -> j", B) # sum along rows to create one column-like array

> np.einsum("ij -> i", B) # sum along columns to create one row-like array

Edit: More Einstein sum fun at https://stackoverflow.com/questions/26089893/understanding-n...

mbeex · on Dec 29, 2020

Never could I. What helps me often is to consider the C language heritage of Python. There, the beginner also has the slightly confusing some_var[y][x]=some_value, caused by the computers memory model behind. Consequently, I'm always looking for the hierarchy of fastest changing indexes. This explains/justifies a lot of the design decisions made also for numpy.

bonoboTP · on Dec 29, 2020

It's simple. It's the axis that will disappear (or become length 1 with keepdims) after performing the operation.

throwawayiionqz · on Dec 29, 2020

It's not simple because there are two axis of interest: the axis along you sum, and the axis that are preserved (ie, that gives the dimension of the returned array).

Both of them are of interest and after you were confused once about which one to supply to axis=..., there is no way back to clear the confusion. With einsum there is no confusion.

evanb · on Dec 29, 2020

If you have a high-dimensional array then in your latter convention if you had to specify the “remaining” axes you’d need to provide a lengthy list. That’d be user-hostile.

Schiphol · on Dec 29, 2020

The usual matrix notation is a_ij for the element in the ith row and the jth column. If you order axes in this way you get the numpy convention.

Ultcyber · on Dec 29, 2020

that's what I mean - in this sense, the axis 0 should be row, but instead it's column in numpy

Schiphol · on Dec 29, 2020

The way I think about it is: as you change the 0th index you move up and down the column. Same with the 1st index and the row. That's how I remember it anyway!

bonoboTP · on Dec 29, 2020

Think about multidimensional arrays, not just matrices. How would your convention work there?

geoalchimista · on Dec 30, 2020

Off-topic: I couldn't help noticing that there is such a recurrent pattern that under every NumPy-related topic, there are fervent disciples of Julia who prophecy that the end (of Python) is nigh and you should all be converting to Julia.

But why not use whatever suits your problem the best? I use both Python and Julia in my research, and sometimes R when I need some niche statistical packages. "My language is superior to your language" is such a self-limiting mindset.

dklend122 · on Dec 30, 2020

It wasn't too long ago that the reverse happened.

People are excited about Julia, because frankly it's amazing and a breath of fresh air coming from python. Also, it's really now starting to pick up steam.

Curious though, how are limited by using Julia? Any python or r package is a pycall/rcall away.

Otherwise it's faster, more ergonomic and makes coding fun again

amznbyebyebye · on Dec 29, 2020

Wow this is great. Perfect example of how something simple like this can be super helpful to new users. I learn super well when there’s lots of visuals. Thanks!!!

metalrain · on Dec 29, 2020

Great article full of examples, learned quite a bit new things.

Personally I don't quite get why something like

a[a > 5] = 0

is possible (a is NumPy array).

I think it is confusing to use same name for element wise comparison and the array itself. Are comparison operators overloaded to generate this kind of predicate function or view in other contexts as well or is this some sort of special case that is handled by indexing (__getitem__ call)?

niklasbuschmann · on Dec 29, 2020

It’s the array accessing that is overloaded. a > 5 is a boolean array with the same dimension as a, and now the overloaded method for array access gets called with this boolean array.

I also found it incredibly confusing the first time I saw this, but like all the numpy magic based on overloading, it’s confusing the first time you see it, but incredibly handy afterwards.

metalrain · on Dec 29, 2020

That kind of makes sense. You use logical expressions to build mask that you use to generate view in the outer expression.

Schiphol · on Dec 29, 2020

a > 5 results in a boolean array, via 'broadcasting'. These are rules that allow numpy to interpret the 5 as an array with the same dimensions as a, but full of 5s.

fantod · on Dec 29, 2020

Both are overloaded (__gt__ and __getitem__). My question to you is: How would expect this to be written (in Python)?

metalrain · on Dec 29, 2020

I guess lambda is too verbose

a[lambda i: i > 5] = 0

Probably would expect something like this

np.set(a, lambda i: i > 5, 0)

But that might be too cumbersome when combining values from multiple arrays.

fantod · on Dec 29, 2020

You made me realize, I think of the brackets as saying "where". vec[2] is vec where the index is 2. vec[vec > 5] is vec where the entries of vec are greater than 5, etc.

beagle3 · on Dec 30, 2020

K makes this explicit with the ‘where’ function which converts a Boolean array into the list of indices of true values. Thus,

    x[4 5 6] is index access (“at”); also x@4 5 6
    
    x[&x>3] is “x at where x is greater than 3” ; can also be written x@&x>3

febstar · on Dec 29, 2020

Is the random matrix generation section in here up to date? The official numpy docs [0] seem to indicate that `randn` is legacy, and that `randint` is deprecated.

[0] https://numpy.org/doc/stable/reference/random/index.html

axi1 · on Dec 29, 2020

Yes, I'm aware that are preparing a mini-revolution in the random number generation. But I was waiting for the interface to settle down a bit prior to including it. I didn't know they were officially deprecated though, thanks for the info! Actually, I don't believe they will ever remove those legacy functions, though, as matlab is still way more popular than numpy overall.

Niccizero · on Dec 29, 2020

Archive.is link: https://archive.is/WAZKr

proverbialbunny · on Dec 29, 2020

>Careful though; Python “ternary” comparisons like 3<=a<=5 don’t work here.

Anyone have any idea why? I imagine if they could support it they would support it. I wonder if Python doesn't allow this kind of operator overloading?

adamjb · on Dec 29, 2020

Here's the rejected PEP for operator overloading.

https://www.python.org/dev/peps/pep-0335/

and deferred PEP for rich comparison chaining specifically

https://www.python.org/dev/peps/pep-0535/

The central issue is that `3<=a<=5` expands to `3<=a and a<=5`, and `and` coerces the variables to booleans in order to work.

proverbialbunny · on Dec 29, 2020

>and deferred PEP for rich comparison chaining specifically

>https://www.python.org/dev/peps/pep-0535/

Oh nice! Thank you for sharing. I'm sure the functionality will eventually come.

roudaki · on Dec 29, 2020

I have been doing math in python for a year and this explains so much. I feel so stupid now. Thank you for this from self thought dev.

axi1 · on Dec 29, 2020

Apparently there're problems with overloading this. In pandas it is between(left,right), so they also didn't find a way to get it done properly.

pansa2 · on Dec 29, 2020

`3 <= a <= 5` expands to `3 <= a and a <= 5`. However, based on the examples NumPy seems to require `&` instead of `and` - I’m not sure why.

Pinus · on Dec 29, 2020

A type can't implement its own 'and' and 'or' operators. (Probably because it is non-obvious how to handle their short-circuiting behaviour.)

tobmlt · on Dec 29, 2020

I have a friend who complains that that it would be nice, on some occasions, to specify the shape of the result from a numpy computation. -that it is all to easy to duck type your way into a linear algebra bug that isn’t easily caught.

My response is that, in those cases, go ahead and declare the array you want to catch the result (and it’s shape), then use [:] etc. appropriately to ensure that if the size doesn’t match between computed and expected at the point of assignment, it will fail.

tgb · on Dec 29, 2020

One thing to add: the search section is missing the argsort function, which is invaluable in my experience. It's how you work around the lack of key sorting, for example.

Edit: though now I see it is brought up later on in the matrix section.

sirpunch · on Dec 29, 2020

Probably I'm not reading it right. In 2D, .sum(axis=0) operation is column-wise but for .insert() and .delete() it seems row-wise?

sirpunch · on Dec 29, 2020

I mean the insert and delete operations with axis value set to 0.

axi1 · on Dec 29, 2020

It probably makes more sense if you think of matrices as n-dimentional arrays: sum eliminates the n-th dimention, insert and delete increase or decrease the n-th dimension size.

giu · on Dec 29, 2020

Very good point. Additionally, with insert and delete we might need to apply the rules of broadcasting [0].

For example, if I have a 2-dimensional array and I insert a scalar value by using axis=0, the scalar value needs to be broadcasted to match the column dimension of the array:

  >>> a = np.array([[1,2],[3,4],[5,6]])
  >>> np.insert(a, 1, 11, axis=0)
  array([[ 1,  2],
         [11, 11],
         [ 3,  4],
         [ 5,  6]])

[0] https://jakevdp.github.io/PythonDataScienceHandbook/02.05-co...

baby · on Dec 29, 2020

Flagged due to soft paywall

axi1 · on Dec 29, 2020

There're two ways to walk it around in the comments here: friends link and archive.is

blackbear_ · on Dec 29, 2020

Disabled javascript also works

jasode · on Dec 29, 2020

>Disabled javascript also works

This isn't true. I just tested it with Javascript disabled. After about 5 articles, the text is greyed out and you can't read the rest of of the article:

https://imgur.com/a/KCF9yPE

blackbear_ · on Dec 29, 2020

Try to disable cookies as well ;)

platelets2020 · on Dec 29, 2020

As far as I can tell, this "visual guide" is nothing more than:

1. Taking screenshots of code.

2. Taking screenshots of code and putting the array values inside of blue boxes.

lhomdee · on Dec 29, 2020

Wow those diagrams are extremely well thought out and clear to understand. Saying they’re just blue boxes is a huge understatement. Visualising highly conceptual topics like numerical programming is not an easy skill, and the more the better.

systemvoltage · on Dec 29, 2020

Your comment is just a bunch of letters typed out as words in a sentence without any sort of insight and just providing a reductionist view.

It’s so much more than that. This is not what HN community is about.

axi1 · on Dec 29, 2020

Actually, it also has

3. A thought-provoking text in between, but apparently didn't hit the target in your case.

Ultcyber · on Dec 29, 2020

Yes. Does it mean it's bad? For people that just start to learn the library, it's sometimes useful to have a visual representation. Also, helpful to have a list of main features.

turndown · on Dec 29, 2020

The second half of #2 qualifies as fine art to my meager artistic abilities.

andai · on Dec 29, 2020

Need it be more?

randomfool · on Dec 29, 2020

Yes! Text which can be copy pasted!

axi1 · on Dec 29, 2020

3245 words should be more than enough for that :) My greatest concern about the text that it would be too much of it for something called "Visual Guide", so I tried hard to keep it to bare minimum.