
My understanding is that the Julia community is quite interested in having SIMD via e.g. AVX “just work”. I recall reading this post on it a while back: https://juliacomputing.com/blog/2017/09/27/auto-vectorizatio...


Sure, but you can only do that if either there's a way to express what you want directly or, as in the example you gave, there's a common idiomatic style that the compiler can recognize and handle.

What is the idiomatic way to write the popcount of the intersection of two 256-byte byte strings? My C code is:

  #include <stdint.h>

  static int
  byte_intersect_256(const unsigned char *fp1, const unsigned char *fp2) {
      int num_words = 2048 / 64;
      int intersect_popcount = 0;

      /* Interpret as 64-bit integers and assume possible mis-alignment is okay. */
      const uint64_t *fp1_64 = (const uint64_t *) fp1, *fp2_64 = (const uint64_t *) fp2;

      for (int i=0; i<num_words; i++) {
          intersect_popcount += __builtin_popcountll(fp1_64[i] & fp2_64[i]);
      }
      return intersect_popcount;
  }

I haven't figured out how to write this in Julia so that it uses the POPCNT instruction, the AVX2 popcount technique, or the AVX-512 VPOPCNTDQ instruction, whichever is available, and otherwise falls back to the SSSE3 or Lauradoux implementations. The last of these is the fastest generic C implementation I found. (See https://jcheminf.biomedcentral.com/articles/10.1186/s13321-0... ).
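
For comparison, here is a minimal C sketch of the portable end of that fallback chain (it is not a Julia answer). It assumes the same 2048-bit inputs as the code above; the names byte_intersect_256_safe, popcount64, and popcount64_swar are hypothetical ones I made up for illustration. It reads each word with memcpy so misalignment stops being an assumption, uses __builtin_popcountll under GCC/Clang, and otherwise falls back to a plain textbook SWAR bit count, which is not the Lauradoux implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable SWAR popcount of one 64-bit word (generic fallback path). */
static int popcount64_swar(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (int)((x * 0x0101010101010101ULL) >> 56);                      /* add the bytes */
}

/* Use the compiler builtin when it exists, else the SWAR fallback. */
static int popcount64(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_swar(x);
#endif
}

/* Popcount of the intersection of two 256-byte (2048-bit) strings. */
static int byte_intersect_256_safe(const unsigned char *fp1, const unsigned char *fp2) {
    enum { NUM_WORDS = 2048 / 64 };
    int intersect_popcount = 0;

    assert(fp1 != NULL && fp2 != NULL);
    for (int i = 0; i < NUM_WORDS; i++) {
        uint64_t a, b;
        /* memcpy avoids the unaligned, type-punned load of a pointer cast. */
        memcpy(&a, fp1 + 8 * i, sizeof a);
        memcpy(&b, fp2 + 8 * i, sizeof b);
        intersect_popcount += popcount64(a & b);
    }
    return intersect_popcount;
}
```

Whether __builtin_popcountll turns into a POPCNT instruction depends on the target flags (for example -mpopcnt or -march=native with GCC/Clang); without them the compiler typically emits a slower generic routine.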



