Module core::arch::x86
Experimental API (feature gate: stdsimd).
Platform-specific intrinsics for the x86 platform.
See the module documentation for more details.
Structs
CpuidResult 
[ Experimental ] [x86 ] Result of the 
__m64 
[ Experimental ] [x86 ] 64-bit wide integer vector type, x86-specific
__m128
[ Experimental ] [x86 ] 128-bit wide set of four
__m256
[ Experimental ] [x86 ] 256-bit wide set of eight
__m128d
[ Experimental ] [x86 ] 128-bit wide set of two
__m128i
[ Experimental ] [x86 ] 128-bit wide integer vector type, x86-specific
__m256d
[ Experimental ] [x86 ] 256-bit wide set of four
__m256i
[ Experimental ] [x86 ] 256-bit wide integer vector type, x86-specific
Constants
_CMP_EQ_OQ
[ Experimental ] [x86 ] Equal (ordered, non-signaling)
_CMP_EQ_OS
[ Experimental ] [x86 ] Equal (ordered, signaling)
_CMP_EQ_UQ
[ Experimental ] [x86 ] Equal (unordered, non-signaling)
_CMP_EQ_US
[ Experimental ] [x86 ] Equal (unordered, signaling)
_CMP_FALSE_OQ
[ Experimental ] [x86 ] False (ordered, non-signaling)
_CMP_FALSE_OS
[ Experimental ] [x86 ] False (ordered, signaling)
_CMP_GE_OQ
[ Experimental ] [x86 ] Greater-than-or-equal (ordered, non-signaling)
_CMP_GE_OS
[ Experimental ] [x86 ] Greater-than-or-equal (ordered, signaling)
_CMP_GT_OQ
[ Experimental ] [x86 ] Greater-than (ordered, non-signaling)
_CMP_GT_OS
[ Experimental ] [x86 ] Greater-than (ordered, signaling)
_CMP_LE_OQ
[ Experimental ] [x86 ] Less-than-or-equal (ordered, non-signaling)
_CMP_LE_OS
[ Experimental ] [x86 ] Less-than-or-equal (ordered, signaling)
_CMP_LT_OQ
[ Experimental ] [x86 ] Less-than (ordered, non-signaling)
_CMP_LT_OS
[ Experimental ] [x86 ] Less-than (ordered, signaling)
_CMP_NEQ_OQ
[ Experimental ] [x86 ] Not-equal (ordered, non-signaling)
_CMP_NEQ_OS
[ Experimental ] [x86 ] Not-equal (ordered, signaling)
_CMP_NEQ_UQ
[ Experimental ] [x86 ] Not-equal (unordered, non-signaling)
_CMP_NEQ_US
[ Experimental ] [x86 ] Not-equal (unordered, signaling)
_CMP_NGE_UQ
[ Experimental ] [x86 ] Not-greater-than-or-equal (unordered, non-signaling)
_CMP_NGE_US
[ Experimental ] [x86 ] Not-greater-than-or-equal (unordered, signaling)
_CMP_NGT_UQ
[ Experimental ] [x86 ] Not-greater-than (unordered, non-signaling)
_CMP_NGT_US
[ Experimental ] [x86 ] Not-greater-than (unordered, signaling)
_CMP_NLE_UQ
[ Experimental ] [x86 ] Not-less-than-or-equal (unordered, non-signaling)
_CMP_NLE_US
[ Experimental ] [x86 ] Not-less-than-or-equal (unordered, signaling)
_CMP_NLT_UQ
[ Experimental ] [x86 ] Not-less-than (unordered, non-signaling)
_CMP_NLT_US
[ Experimental ] [x86 ] Not-less-than (unordered, signaling)
_CMP_ORD_Q
[ Experimental ] [x86 ] Ordered (non-signaling)
_CMP_ORD_S
[ Experimental ] [x86 ] Ordered (signaling)
_CMP_TRUE_UQ
[ Experimental ] [x86 ] True (unordered, non-signaling)
_CMP_TRUE_US
[ Experimental ] [x86 ] True (unordered, signaling)
_CMP_UNORD_Q
[ Experimental ] [x86 ] Unordered (non-signaling)
_CMP_UNORD_S
[ Experimental ] [x86 ] Unordered (signaling)
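The ordered/unordered distinction in the `_CMP_*` predicates is easiest to see with a NaN operand. The following is a minimal sketch, not part of the original listing: `compare_masks` and `compare_masks_avx` are hypothetical helper names, and it assumes a toolchain on which these items compile and on which `_mm256_cmp_pd` takes the predicate as a const generic argument. It falls back to scalar code with the same semantics when AVX is not detected.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Returns two 4-bit masks (bit i describes lane i):
/// `a < b` under `_CMP_LT_OQ`, and "not `a < b`" under `_CMP_NLT_UQ`.
pub fn compare_masks(a: [f64; 4], b: [f64; 4]) -> (u8, u8) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at run time just above.
            return unsafe { compare_masks_avx(&a, &b) };
        }
    }
    // Portable fallback with the same semantics, used when AVX is unavailable.
    let mut lt = 0u8;
    let mut nlt = 0u8;
    for i in 0..4 {
        if a[i] < b[i] {
            lt |= 1u8 << i; // ordered predicate: a NaN operand never lands here
        } else {
            nlt |= 1u8 << i; // unordered predicate: a NaN operand lands here
        }
    }
    (lt, nlt)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn compare_masks_avx(a: &[f64; 4], b: &[f64; 4]) -> (u8, u8) {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    // The predicate is a const generic argument (toolchain assumption).
    let lt = _mm256_cmp_pd::<_CMP_LT_OQ>(va, vb);
    let nlt = _mm256_cmp_pd::<_CMP_NLT_UQ>(va, vb);
    // movemask collects the sign bit of each lane into the low four bits.
    (_mm256_movemask_pd(lt) as u8, _mm256_movemask_pd(nlt) as u8)
}
```

With `a = [1.0, NaN, 3.0, 4.0]` and `b = [2.0, 2.0, 3.0, 5.0]`, the NaN lane is excluded from the ordered mask and included in the unordered one.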
_MM_EXCEPT_DENORM 
[ Experimental ] [x86 ] See 
_MM_EXCEPT_DIV_ZERO 
[ Experimental ] [x86 ] See 
_MM_EXCEPT_INEXACT 
[ Experimental ] [x86 ] See 
_MM_EXCEPT_INVALID 
[ Experimental ] [x86 ] See 
_MM_EXCEPT_MASK 
[ Experimental ] [x86 ]

_MM_EXCEPT_OVERFLOW 
[ Experimental ] [x86 ] See 
_MM_EXCEPT_UNDERFLOW 
[ Experimental ] [x86 ] See 
_MM_FLUSH_ZERO_MASK 
[ Experimental ] [x86 ]

_MM_FLUSH_ZERO_OFF 
[ Experimental ] [x86 ] See 
_MM_FLUSH_ZERO_ON 
[ Experimental ] [x86 ] See 
_MM_FROUND_CEIL 
[ Experimental ] [x86 ] round up and do not suppress exceptions 
_MM_FROUND_CUR_DIRECTION 
[ Experimental ] [x86 ] use MXCSR.RC; see 
_MM_FROUND_FLOOR 
[ Experimental ] [x86 ] round down and do not suppress exceptions 
_MM_FROUND_NEARBYINT 
[ Experimental ] [x86 ] use MXCSR.RC and suppress exceptions; see 
_MM_FROUND_NINT 
[ Experimental ] [x86 ] round to nearest and do not suppress exceptions 
_MM_FROUND_NO_EXC 
[ Experimental ] [x86 ] suppress exceptions 
_MM_FROUND_RAISE_EXC 
[ Experimental ] [x86 ] do not suppress exceptions 
_MM_FROUND_RINT 
[ Experimental ] [x86 ] use MXCSR.RC and do not suppress exceptions; see

_MM_FROUND_TO_NEAREST_INT 
[ Experimental ] [x86 ] round to nearest 
_MM_FROUND_TO_NEG_INF 
[ Experimental ] [x86 ] round down 
_MM_FROUND_TO_POS_INF 
[ Experimental ] [x86 ] round up 
_MM_FROUND_TO_ZERO 
[ Experimental ] [x86 ] truncate 
_MM_FROUND_TRUNC 
[ Experimental ] [x86 ] truncate and do not suppress exceptions 
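The descriptions above suggest that the shorthand rounding constants are combinations of a rounding direction and an exception flag. The sketch below checks that reading against the constants themselves; `fround_shorthands_decompose` is a hypothetical helper name, and the function returns `None` on non-x86 targets where the constants do not exist.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Checks that each shorthand equals the OR of a direction and an exception flag.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn fround_shorthands_decompose() -> Option<bool> {
    Some(
        _MM_FROUND_NINT == (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
            && _MM_FROUND_FLOOR == (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
            && _MM_FROUND_CEIL == (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
            && _MM_FROUND_TRUNC == (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
            && _MM_FROUND_RINT == (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
            && _MM_FROUND_NEARBYINT == (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC),
    )
}

/// The constants are x86-only, so there is nothing to check elsewhere.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn fround_shorthands_decompose() -> Option<bool> {
    None
}
```

In practice this means a caller can either pass a shorthand such as `_MM_FROUND_FLOOR` or build the same value from its two parts.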
_MM_HINT_NTA 
[ Experimental ] [x86 ] See 
_MM_HINT_T0 
[ Experimental ] [x86 ] See 
_MM_HINT_T1 
[ Experimental ] [x86 ] See 
_MM_HINT_T2 
[ Experimental ] [x86 ] See 
_MM_MASK_DENORM 
[ Experimental ] [x86 ] See 
_MM_MASK_DIV_ZERO 
[ Experimental ] [x86 ] See 
_MM_MASK_INEXACT 
[ Experimental ] [x86 ] See 
_MM_MASK_INVALID 
[ Experimental ] [x86 ] See 
_MM_MASK_MASK 
[ Experimental ] [x86 ]

_MM_MASK_OVERFLOW 
[ Experimental ] [x86 ] See 
_MM_MASK_UNDERFLOW 
[ Experimental ] [x86 ] See 
_MM_ROUND_DOWN 
[ Experimental ] [x86 ] See 
_MM_ROUND_MASK 
[ Experimental ] [x86 ]

_MM_ROUND_NEAREST 
[ Experimental ] [x86 ] See 
_MM_ROUND_TOWARD_ZERO 
[ Experimental ] [x86 ] See 
_MM_ROUND_UP 
[ Experimental ] [x86 ] See 
_SIDD_BIT_MASK 
[ Experimental ] [x86 ] Mask only: return the bit mask 
_SIDD_CMP_EQUAL_ANY 
[ Experimental ] [x86 ] For each character in 
_SIDD_CMP_EQUAL_EACH 
[ Experimental ] [x86 ] The strings defined by 
_SIDD_CMP_EQUAL_ORDERED 
[ Experimental ] [x86 ] Search for the defined substring in the target 
_SIDD_CMP_RANGES 
[ Experimental ] [x86 ] For each character in 
_SIDD_LEAST_SIGNIFICANT 
[ Experimental ] [x86 ] Index only: return the least significant bit (Default) 
_SIDD_MASKED_NEGATIVE_POLARITY 
[ Experimental ] [x86 ] Negate results only before the end of the string 
_SIDD_MASKED_POSITIVE_POLARITY 
[ Experimental ] [x86 ] Do not negate results before the end of the string 
_SIDD_MOST_SIGNIFICANT 
[ Experimental ] [x86 ] Index only: return the most significant bit 
_SIDD_NEGATIVE_POLARITY 
[ Experimental ] [x86 ] Negate results 
_SIDD_POSITIVE_POLARITY 
[ Experimental ] [x86 ] Do not negate results (Default) 
_SIDD_SBYTE_OPS 
[ Experimental ] [x86 ] String contains signed 8-bit characters
_SIDD_SWORD_OPS
[ Experimental ] [x86 ] String contains signed 16-bit characters
_SIDD_UBYTE_OPS
[ Experimental ] [x86 ] String contains unsigned 8-bit characters (Default)
_SIDD_UNIT_MASK
[ Experimental ] [x86 ] Mask only: return the byte mask
_SIDD_UWORD_OPS
[ Experimental ] [x86 ] String contains unsigned 16-bit characters
_XCR_XFEATURE_ENABLED_MASK 
[ Experimental ] [x86 ]

Functions
_MM_GET_EXCEPTION_MASK^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_GET_EXCEPTION_STATE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_GET_FLUSH_ZERO_MODE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_GET_ROUNDING_MODE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_SET_EXCEPTION_MASK^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_SET_EXCEPTION_STATE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_SET_FLUSH_ZERO_MODE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_SET_ROUNDING_MODE^{⚠} 
[ Experimental ] [x86 and target feature ] sse See 
_MM_TRANSPOSE4_PS^{⚠} 
[ Experimental ] [x86 and target feature ] sse Transpose the 4x4 matrix formed by 4 rows of __m128 in place. 
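As an illustration of the in-place transpose, here is a minimal sketch that loads four rows, calls `_MM_TRANSPOSE4_PS`, and stores the result. The wrapper name `transpose4` is hypothetical, and the sketch assumes a toolchain on which these SSE items compile; when SSE is not enabled at compile time it uses a plain scalar loop instead.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Transposes a row-major 4x4 matrix with `_MM_TRANSPOSE4_PS`.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse"))]
pub fn transpose4(m: [[f32; 4]; 4]) -> [[f32; 4]; 4] {
    let mut out = [[0.0f32; 4]; 4];
    // SAFETY: SSE is enabled at compile time (see the cfg above), every pointer
    // refers to exactly four f32 values, and unaligned loads/stores are used.
    unsafe {
        let mut r0 = _mm_loadu_ps(m[0].as_ptr());
        let mut r1 = _mm_loadu_ps(m[1].as_ptr());
        let mut r2 = _mm_loadu_ps(m[2].as_ptr());
        let mut r3 = _mm_loadu_ps(m[3].as_ptr());
        // The four rows are rewritten in place to hold the four columns.
        _MM_TRANSPOSE4_PS(&mut r0, &mut r1, &mut r2, &mut r3);
        _mm_storeu_ps(out[0].as_mut_ptr(), r0);
        _mm_storeu_ps(out[1].as_mut_ptr(), r1);
        _mm_storeu_ps(out[2].as_mut_ptr(), r2);
        _mm_storeu_ps(out[3].as_mut_ptr(), r3);
    }
    out
}

/// Scalar fallback for targets without compile-time SSE.
#[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse")))]
pub fn transpose4(m: [[f32; 4]; 4]) -> [[f32; 4]; 4] {
    let mut out = [[0.0f32; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            out[j][i] = m[i][j];
        }
    }
    out
}
```

Row 0 of the result holds column 0 of the input, and so on for the other rows.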
__cpuid^{⚠} 
[ Experimental ] [x86 ] See 
__cpuid_count^{⚠} 
[ Experimental ] [x86 ] Returns the result of the 
__get_cpuid_max^{⚠} 
[ Experimental ] [x86 ] Returns the highest-supported
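To show how a `CpuidResult` might be consumed, the sketch below queries leaf 0 with `__cpuid` and decodes the 12-byte vendor string from the `ebx`, `edx`, and `ecx` registers, in that order. `cpu_vendor` is a hypothetical helper name, the sketch assumes the processor supports the `cpuid` instruction, and it returns `None` on non-x86 targets.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::{__cpuid, CpuidResult};
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__cpuid, CpuidResult};

/// Returns (highest basic leaf, vendor string) as reported by CPUID leaf 0.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[allow(unused_unsafe)] // some toolchains expose `__cpuid` as a safe function
pub fn cpu_vendor() -> Option<(u32, String)> {
    // SAFETY: assumes the CPU implements the `cpuid` instruction.
    let CpuidResult { eax, ebx, ecx, edx } = unsafe { __cpuid(0) };
    // The vendor string is spread over ebx, edx, ecx (note the order).
    let mut bytes = Vec::with_capacity(12);
    for reg in [ebx, edx, ecx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    Some((eax, String::from_utf8_lossy(&bytes).into_owned()))
}

/// CPUID is x86-only, so other targets report nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn cpu_vendor() -> Option<(u32, String)> {
    None
}
```

The first tuple element is the value leaf 0 returns in `eax`, which is the highest basic leaf the processor accepts.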
__rdtscp^{⚠} 
[ Experimental ] [x86 ] Reads the current value of the processor’s timestamp counter and
the 
__readeflags^{⚠} 
[ Experimental ] [x86 ] Reads EFLAGS. 
__writeeflags^{⚠} 
[ Experimental ] [x86 ] Writes EFLAGS.
_andn_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Bitwise logical 
_bextr2_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Extracts bits of 
_bextr_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Extracts bits in range [ 
_blcfill_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears all bits below the least significant zero bit of 
_blcfill_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears all bits below the least significant zero bit of 
_blci_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits of 
_blci_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits of 
_blcic_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blcic_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blcmsk_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blcmsk_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blcs_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blcs_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets the least significant zero bit of 
_blsfill_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits of 
_blsfill_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits of 
_blsi_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Extract lowest set isolated bit. 
_blsic_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears least significant bit and sets all other bits. 
_blsic_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears least significant bit and sets all other bits. 
_blsmsk_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Get mask up to lowest set bit. 
_blsr_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Resets the lowest set bit of 
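Several of the `bmi1` entries above have short scalar equivalents, which can help when reading the descriptions. The following sketch pairs four of the intrinsics with plain integer expressions; `low_bit_ops`, `bmi1_ops`, and `scalar_ops` are hypothetical names, and the intrinsic path is only taken when `bmi1` is detected at run time.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Scalar equivalents of [andn, blsi, blsmsk, blsr].
pub fn scalar_ops(x: u32, y: u32) -> [u32; 4] {
    [
        !x & y,                // andn: bits of y that are clear in x
        x & x.wrapping_neg(),  // blsi: isolate the lowest set bit
        x ^ x.wrapping_sub(1), // blsmsk: mask up to and including the lowest set bit
        x & x.wrapping_sub(1), // blsr: clear the lowest set bit
    ]
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "bmi1")]
unsafe fn bmi1_ops(x: u32, y: u32) -> [u32; 4] {
    [_andn_u32(x, y), _blsi_u32(x), _blsmsk_u32(x), _blsr_u32(x)]
}

/// Uses the BMI1 intrinsics when available, otherwise the scalar forms.
pub fn low_bit_ops(x: u32, y: u32) -> [u32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("bmi1") {
            // SAFETY: BMI1 support was confirmed at run time just above.
            return unsafe { bmi1_ops(x, y) };
        }
    }
    scalar_ops(x, y)
}
```

For `x = 0b1011_0000` the lowest set bit is bit 4, so the isolated bit is `0b0001_0000`, the mask is `0b0001_1111`, and the reset value is `0b1010_0000`.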
_bswap^{⚠} 
[ Experimental ] [x86 ] Return an integer with the reversed byte order of x 
_bzhi_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi2 Zero higher bits of 
_lzcnt_u32^{⚠} 
[ Experimental ] [x86 and target feature ] lzcnt Counts the leading most significant zero bits. 
_m_maskmovq^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Conditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand.
_m_paddb^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 8-bit integers in
_m_paddd^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 32-bit integers in
_m_paddsb^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 8-bit integers in
_m_paddsw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 16-bit integers in
_m_paddusb^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed unsigned 8-bit integers in
_m_paddusw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed unsigned 16-bit integers in
_m_paddw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 16-bit integers in
_m_pavgb^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Computes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination.
_m_pavgw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Computes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination.
_m_pextrw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Extracts a 16-bit element from a 64-bit vector of [4 x i16] and returns it, as specified by the immediate integer operand.
_m_pinsrw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Copies data from the 64-bit vector of [4 x i16] to the destination,
and inserts the lower 16 bits of an integer operand at the 16-bit offset
specified by the immediate operand
_m_pmaxsw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 16-bit signed integers of
_m_pmaxub^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 8-bit unsigned integers of
_m_pminsw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 16-bit signed integers of
_m_pminub^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 8-bit unsigned integers of
_m_pmovmskb^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Takes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination.
_m_pmulhuw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Multiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination.
_m_psadbw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Subtracts the corresponding 8-bit unsigned integer values of the two 64-bit vector operands and computes the absolute value of each difference. The sum of the 8 absolute differences is then written to bits [15:0] of the destination; the remaining bits [63:16] are cleared.
_m_pshufw^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Shuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand.
_m_psubb^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 8-bit integers in
_m_psubd^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed 32-bit integers in
_m_psubsb^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed 8-bit integers in
_m_psubsw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed 16-bit integers in
_m_psubusb^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed unsigned 8-bit integers in
_m_psubusw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed unsigned 16-bit integers in
_m_psubw^{⚠}
[ Experimental ] [x86 and target feature ] mmx Subtract packed 16-bit integers in
_mm256_abs_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Computes the absolute values of packed 8-bit integers in
_mm256_abs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Computes the absolute values of packed 16-bit integers in
_mm256_abs_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Computes the absolute values of packed 32-bit integers in
_mm256_add_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 8-bit integers in
_mm256_add_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 16-bit integers in
_mm256_add_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 32-bit integers in
_mm256_add_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 64-bit integers in
_mm256_add_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Add packed double-precision (64-bit) floating-point elements
in
_mm256_add_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Add packed single-precision (32-bit) floating-point elements in
_mm256_adds_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 8-bit integers in
_mm256_adds_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed 16-bit integers in
_mm256_adds_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed unsigned 8-bit integers in
_mm256_adds_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Add packed unsigned 16-bit integers in
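The `add` and `adds` families share the same truncated summary here, so a short comparison may help: `_mm256_add_epi8` wraps on overflow, while `_mm256_adds_epi8` saturates. The sketch below is illustrative only; `add_wrapping_and_saturating` and `add_avx2` are hypothetical names, and a scalar path with the same results is used when AVX2 is not detected.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Returns (wrapping sums, saturating sums) of 32 signed bytes.
pub fn add_wrapping_and_saturating(a: [i8; 32], b: [i8; 32]) -> ([i8; 32], [i8; 32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at run time just above.
            return unsafe { add_avx2(&a, &b) };
        }
    }
    // Portable fallback with the same lane-wise semantics.
    (
        core::array::from_fn(|i| a[i].wrapping_add(b[i])),
        core::array::from_fn(|i| a[i].saturating_add(b[i])),
    )
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i8; 32], b: &[i8; 32]) -> ([i8; 32], [i8; 32]) {
    // Unaligned loads: the arrays carry no 32-byte alignment guarantee.
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    let mut wrap = [0i8; 32];
    let mut sat = [0i8; 32];
    _mm256_storeu_si256(wrap.as_mut_ptr() as *mut __m256i, _mm256_add_epi8(va, vb));
    _mm256_storeu_si256(sat.as_mut_ptr() as *mut __m256i, _mm256_adds_epi8(va, vb));
    (wrap, sat)
}
```

Adding 100 and 100 as signed bytes wraps to -56 but saturates to 127; adding -100 and -100 wraps to 56 but saturates to -128.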
_mm256_addsub_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Alternately add and subtract packed double-precision (64-bit)
floating-point elements in
_mm256_addsub_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Alternately add and subtract packed single-precision (32-bit)
floating-point elements in
_mm256_alignr_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Concatenate pairs of 16-byte blocks in
_mm256_and_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of packed double-precision (64-bit)
floating-point elements
in
_mm256_and_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of packed single-precision (32-bit) floating-point
elements in
_mm256_and_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compute the bitwise AND of 256 bits (representing integer data)
in
_mm256_andnot_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise NOT of packed double-precision (64-bit) floating-point
elements in
_mm256_andnot_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise NOT of packed single-precision (32-bit) floating-point
elements in
_mm256_andnot_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compute the bitwise NOT of 256 bits (representing integer data)
in
_mm256_avg_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Average packed unsigned 8-bit integers in
_mm256_avg_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Average packed unsigned 16-bit integers in
_mm256_blend_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Blend packed 16-bit integers from
_mm256_blend_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Blend packed 32-bit integers from
_mm256_blend_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Blend packed double-precision (64-bit) floating-point elements from

_mm256_blend_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Blend packed single-precision (32-bit) floating-point elements from

_mm256_blendv_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Blend packed 8-bit integers from
_mm256_blendv_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Blend packed double-precision (64-bit) floating-point elements from

_mm256_blendv_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Blend packed single-precision (32-bit) floating-point elements from

_mm256_broadcast_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_sd^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_ss^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcastb_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 8-bit integer from
_mm256_broadcastd_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 32-bit integer from
_mm256_broadcastq_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 64-bit integer from
_mm256_broadcastsd_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low double-precision (64-bit) floating-point element
from
_mm256_broadcastsi128_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
_mm256_broadcastss_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low single-precision (32-bit) floating-point element
from
_mm256_broadcastw_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 16-bit integer from a to all elements of the 256-bit returned value
_mm256_bslli_epi128^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift 128-bit lanes in
_mm256_bsrli_epi128^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift 128-bit lanes in
_mm256_castpd128_pd256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. 
_mm256_castpd256_pd128^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256d to type __m128d. 
_mm256_castpd_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Cast vector of type __m256d to type __m256. 
_mm256_castpd_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256d to type __m256i. 
_mm256_castps128_ps256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. 
_mm256_castps256_ps128^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256 to type __m128. 
_mm256_castps_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Cast vector of type __m256 to type __m256d. 
_mm256_castps_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256 to type __m256i. 
_mm256_castsi128_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. 
_mm256_castsi256_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256i to type __m256d. 
_mm256_castsi256_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256i to type __m256. 
_mm256_castsi256_si128^{⚠} 
[ Experimental ] [x86 and target feature ] avx Casts vector of type __m256i to type __m128i. 
_mm256_ceil_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Round packed double-precision (64-bit) floating-point elements in
_mm256_ceil_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Round packed single-precision (32-bit) floating-point elements in
_mm256_cmp_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed double-precision (64-bit) floating-point
elements in
_mm256_cmp_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed single-precision (32-bit) floating-point
elements in
_mm256_cmpeq_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Compare packed 8-bit integers in
_mm256_cmpeq_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 16-bit integers in
_mm256_cmpeq_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 32-bit integers in
_mm256_cmpeq_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 64-bit integers in
_mm256_cmpgt_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 8-bit integers in
_mm256_cmpgt_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 16-bit integers in
_mm256_cmpgt_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 32-bit integers in
_mm256_cmpgt_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 64-bit integers in
_mm256_cvtepi16_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 16-bit integers to 32-bit integers.
_mm256_cvtepi16_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 16-bit integers to 64-bit integers.
_mm256_cvtepi32_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 32-bit integers to 64-bit integers.
_mm256_cvtepi32_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed 32-bit integers in
_mm256_cvtepi32_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed 32-bit integers in
_mm256_cvtepi8_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 8-bit integers to 16-bit integers.
_mm256_cvtepi8_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 8-bit integers to 32-bit integers.
_mm256_cvtepi8_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Sign-extend 8-bit integers to 64-bit integers.
_mm256_cvtepu16_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend packed unsigned 16-bit integers in
_mm256_cvtepu16_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend the lower four unsigned 16-bit integers in
_mm256_cvtepu32_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend unsigned 32-bit integers in
_mm256_cvtepu8_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend unsigned 8-bit integers in
_mm256_cvtepu8_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend the lower eight unsigned 8-bit integers in
_mm256_cvtepu8_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Zero-extend the lower four unsigned 8-bit integers in
_mm256_cvtpd_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvtpd_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvtps_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_cvtps_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_cvtsd_f64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Returns the first element of the input vector of [4 x double]. 
_mm256_cvtsi256_si32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Returns the first element of the input vector of [8 x i32]. 
_mm256_cvtss_f32^{⚠} 
[ Experimental ] [x86 and target feature ] avx Returns the first element of the input vector of [8 x float]. 
_mm256_cvttpd_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx Convert packed double-precision (64-bit) floating-point elements in
_mm256_cvttps_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Convert packed single-precision (32-bit) floating-point elements in
_mm256_div_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the division of each of the 4 packed 64-bit floating-point elements
in
_mm256_div_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the division of each of the 8 packed 32-bit floating-point elements
in
_mm256_dp_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Conditionally multiply the packed single-precision (32-bit) floating-point
elements in
_mm256_extract_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Extract an 8-bit integer from
_mm256_extract_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Extract a 16-bit integer from
_mm256_extract_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Extract a 32-bit integer from
_mm256_extractf128_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Extract 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from
_mm256_extractf128_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Extract 128 bits (composed of 4 packed single-precision (32-bit)
floating-point elements) from
_mm256_extractf128_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Extract 128 bits (composed of integer data) from
_mm256_extracti128_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Extract 128 bits (of integer data) from
_mm256_floor_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Round packed double-precision (64-bit) floating-point elements in
_mm256_floor_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Round packed single-precision (32-bit) floating-point elements in
_mm256_hadd_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally add adjacent pairs of 16-bit integers in
_mm256_hadd_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally add adjacent pairs of 32-bit integers in
_mm256_hadd_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Horizontal addition of adjacent pairs in the two packed vectors
of 4 64-bit floating points
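The lane order produced by a 256-bit horizontal add is not obvious from the one-line summary: results from the two inputs are interleaved per 128-bit half. The sketch below shows this for `_mm256_hadd_pd`; `hadd4` and `hadd4_avx` are hypothetical names, and the scalar fallback spells out the same lane order for machines without AVX.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Horizontal add of adjacent pairs: [a0+a1, b0+b1, a2+a3, b2+b3].
pub fn hadd4(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at run time just above.
            return unsafe { hadd4_avx(&a, &b) };
        }
    }
    // Portable fallback that mirrors the interleaved lane order.
    [a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]]
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn hadd4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), _mm256_hadd_pd(va, vb));
    out
}
```

Note that the result is not `[a0+a1, a2+a3, b0+b1, b2+b3]`; a full reduction usually needs an extra permute or a 128-bit extract afterwards.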
_mm256_hadd_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Horizontal addition of adjacent pairs in the two packed vectors
of 8 32-bit floating points
_mm256_hadds_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally add adjacent pairs of 16-bit integers in
_mm256_hsub_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally subtract adjacent pairs of 16-bit integers in
_mm256_hsub_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally subtract adjacent pairs of 32-bit integers in
_mm256_hsub_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Horizontal subtraction of adjacent pairs in the two packed vectors
of 4 64-bit floating points
_mm256_hsub_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Horizontal subtraction of adjacent pairs in the two packed vectors
of 8 32-bit floating points
_mm256_hsubs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Horizontally subtract adjacent pairs of 16-bit integers in
_mm256_i32gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i32gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i32gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i32gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i64gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i64gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i64gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_i64gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
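The gather summaries above are truncated, so a small example of the intended use may help. The sketch below reads eight `i32` values from a table at arbitrary indices with `_mm256_i32gather_epi32`. It assumes a toolchain on which the scale is a const generic argument; `gather8` and `gather8_avx2` are hypothetical names, and indices are bounds-checked first because the intrinsic itself performs no checking.

```rust
#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Returns `table[idx[k]]` for each k, or `None` if any index is out of range.
pub fn gather8(table: &[i32], idx: [i32; 8]) -> Option<[i32; 8]> {
    // Validate up front: the gather instruction reads memory unchecked.
    if !idx.iter().all(|&i| i >= 0 && (i as usize) < table.len()) {
        return None;
    }
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and every index was validated above.
            return Some(unsafe { gather8_avx2(table, &idx) });
        }
    }
    // Portable fallback with the same result.
    Some(core::array::from_fn(|k| table[idx[k] as usize]))
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn gather8_avx2(table: &[i32], idx: &[i32; 8]) -> [i32; 8] {
    let offsets = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    // SCALE = 4: each offset is multiplied by 4 bytes, i.e. one i32 element.
    let v = _mm256_i32gather_epi32::<4>(table.as_ptr(), offsets);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    out
}
```

Repeated indices are allowed, and the scale of 4 is what turns element indices into byte offsets for `i32` data.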
_mm256_insert_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_insert_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_insert_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_insertf128_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_insertf128_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_insertf128_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Copy 
_mm256_inserti128_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Copy 
_mm256_lddqu_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Load 256 bits of integer data from unaligned memory into result.
This intrinsic may perform better than
_mm256_load_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from memory into result.

_mm256_load_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into result.

_mm256_load_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits of integer data from memory into result.

_mm256_loadu2_m128^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse Load two 128-bit values (composed of 4 packed single-precision (32-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.

_mm256_loadu2_m128d^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Load two 128-bit values (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.

_mm256_loadu2_m128i^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Load two 128-bit values (composed of integer data) from memory, and combine
them into a 256-bit value.

_mm256_loadu_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from memory into result.

_mm256_loadu_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into result.

_mm256_loadu_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Load 256 bits of integer data from memory into result.

_mm256_madd_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply packed signed 16-bit integers in
_mm256_maddubs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Vertically multiply each unsigned 8-bit integer from
_mm256_mask_i32gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i32gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i32gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i32gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i64gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i64gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i64gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_mask_i64gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm256_maskload_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Load packed 32-bit integers from memory pointed by
_mm256_maskload_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Load packed 64-bit integers from memory pointed by
_mm256_maskload_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Load packed double-precision (64-bit) floating-point elements from memory
into result using
_mm256_maskload_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Load packed single-precision (32-bit) floating-point elements from memory
into result using
_mm256_maskstore_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Store packed 32-bit integers from
_mm256_maskstore_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Store packed 64-bit integers from
_mm256_maskstore_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Store packed double-precision (64-bit) floating-point elements from
_mm256_maskstore_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Store packed single-precision (32-bit) floating-point elements from
_mm256_max_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 8-bit integers in
_mm256_max_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 16-bit integers in
_mm256_max_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 32-bit integers in
_mm256_max_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 8-bit integers in
_mm256_max_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 16-bit integers in
_mm256_max_epu32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 32-bit integers in
_mm256_max_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed double-precision (64-bit) floating-point elements
in
_mm256_max_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed single-precision (32-bit) floating-point elements in
_mm256_min_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 8-bit integers in
_mm256_min_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 16-bit integers in
_mm256_min_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed 32-bit integers in
_mm256_min_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 8-bit integers in
_mm256_min_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 16-bit integers in
_mm256_min_epu32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compare packed unsigned 32-bit integers in
_mm256_min_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed double-precision (64-bit) floating-point elements
in
_mm256_min_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compare packed single-precision (32-bit) floating-point elements in
_mm256_movedup_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and return the results.
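As an illustration of "duplicate even-indexed elements", the sketch below runs `_mm256_movedup_pd` on four doubles and compares it with a scalar equivalent. It is not taken from the original listing. It assumes the intrinsics are available through `std::arch` on a stable toolchain, and `movedup` and `movedup_avx` are hypothetical helper names.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX path. The caller must have verified AVX support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn movedup_avx(a: [f64; 4]) -> [f64; 4] {
    let v = _mm256_loadu_pd(a.as_ptr());
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), _mm256_movedup_pd(v));
    out
}

/// Hypothetical wrapper: uses AVX when detected at run time, otherwise a
/// scalar version with the same result.
pub fn movedup(a: [f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound: AVX was detected just above.
            return unsafe { movedup_avx(a) };
        }
    }
    // Elements 0 and 2 (the even indices) overwrite elements 1 and 3.
    [a[0], a[0], a[2], a[2]]
}
```

With this helper, `movedup([1.0, 2.0, 3.0, 4.0])` yields `[1.0, 1.0, 3.0, 3.0]`.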
_mm256_movehdup_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Duplicate odd-indexed single-precision (32-bit) floating-point elements
from
_mm256_moveldup_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Duplicate even-indexed single-precision (32-bit) floating-point elements
from
_mm256_movemask_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Create mask from the most significant bit of each 8-bit element in
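A short sketch of what "create mask from the most significant bit" means in practice: bit `i` of the result reflects the top bit of byte `i` of the input. This example is not from the original listing. It assumes `std::arch` on a stable toolchain, and `msb_mask` and `msb_mask_avx2` are hypothetical names.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 path. The caller must have verified AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn msb_mask_avx2(bytes: &[u8; 32]) -> u32 {
    let v = _mm256_loadu_si256(bytes.as_ptr() as *const __m256i);
    // The intrinsic returns an i32; reinterpret it as 32 independent bits.
    _mm256_movemask_epi8(v) as u32
}

/// Hypothetical wrapper: bit `i` of the result is the top bit of `bytes[i]`.
pub fn msb_mask(bytes: &[u8; 32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound: AVX2 was detected just above.
            return unsafe { msb_mask_avx2(bytes) };
        }
    }
    // Scalar fallback that computes the same mask one byte at a time.
    let mut mask = 0u32;
    for (i, b) in bytes.iter().enumerate() {
        if b & 0x80 != 0 {
            mask |= 1u32 << i;
        }
    }
    mask
}
```

A common use of such a mask is to locate matching bytes after a vector comparison, for example with `trailing_zeros()` on the result.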
_mm256_movemask_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set each bit of the returned mask based on the most significant bit of the
corresponding packed double-precision (64-bit) floating-point element in

_mm256_movemask_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Set each bit of the returned mask based on the most significant bit of the
corresponding packed single-precision (32-bit) floating-point element in

_mm256_mpsadbw_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compute the sum of absolute differences (SADs) of quadruplets of unsigned
8-bit integers in
_mm256_mul_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply the low 32-bit integers from each packed 64-bit element in

_mm256_mul_epu32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply the low unsigned 32-bit integers from each packed 64-bit
element in
_mm256_mul_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Multiply packed double-precision (64-bit) floating-point elements
in
_mm256_mul_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Multiply packed single-precision (32-bit) floating-point elements in
_mm256_mulhi_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Multiply the packed 16-bit integers in
_mm256_mulhi_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply the packed unsigned 16-bit integers in
_mm256_mulhrs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply packed 16-bit integers in
_mm256_mullo_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply the packed 16-bit integers in
_mm256_mullo_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Multiply the packed 32-bit integers in
_mm256_or_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise OR of packed double-precision (64-bit) floating-point
elements in
_mm256_or_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise OR of packed single-precision (32-bit) floating-point
elements in
_mm256_or_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Compute the bitwise OR of 256 bits (representing integer data) in 
_mm256_packs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Convert packed 16-bit integers from
_mm256_packs_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Convert packed 32-bit integers from
_mm256_packus_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Convert packed 16-bit integers from
_mm256_packus_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Convert packed 32-bit integers from
_mm256_permute2f128_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) selected by
_mm256_permute2f128_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) selected by
_mm256_permute2f128_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Shuffle 256 bits (composed of integer data) selected by
_mm256_permute2x128_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shuffle 128 bits of integer data selected by
_mm256_permute4x64_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Permutes 64-bit integers from
_mm256_permute4x64_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shuffle 64-bit floating-point elements in
_mm256_permute_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle double-precision (64-bit) floating-point elements in
_mm256_permute_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_permutevar8x32_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Permutes packed 32-bit integers from
_mm256_permutevar8x32_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shuffle eight 32-bit floating-point elements in
_mm256_permutevar_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Shuffle double-precision (64-bit) floating-point elements in
_mm256_permutevar_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_rcp_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in
_mm256_round_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Round packed double-precision (64-bit) floating-point elements in
_mm256_round_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Round packed single-precision (32-bit) floating-point elements in
_mm256_rsqrt_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in
_mm256_sad_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Compute the absolute differences of packed unsigned 8-bit integers in
_mm256_set1_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 8-bit integer
_mm256_set1_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 16-bit integer
_mm256_set1_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 32-bit integer
_mm256_set1_epi64x^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast 64-bit integer
_mm256_set1_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast double-precision (64-bit) floating-point value
_mm256_set1_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast single-precision (32-bit) floating-point value
_mm256_set_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed 8-bit integers in returned vector with the supplied values.
_mm256_set_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed 16-bit integers in returned vector with the supplied values.
_mm256_set_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 32-bit integers in returned vector with the supplied values.
_mm256_set_epi64x^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 64-bit integers in returned vector with the supplied values.
_mm256_set_m128^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256 returned vector with the supplied values. 
_mm256_set_m128d^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256d returned vector with the supplied values. 
_mm256_set_m128i^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256i returned vector with the supplied values. 
_mm256_set_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
_mm256_set_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
_mm256_setr_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 8-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 16-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 32-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi64x^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed 64-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_m128^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256 returned vector with the supplied values. 
_mm256_setr_m128d^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256d returned vector with the supplied values. 
_mm256_setr_m128i^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed __m256i returned vector with the supplied values. 
_mm256_setr_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setr_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
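The difference between the `set` and `setr` families is easiest to see by storing both results to memory. In the sketch below, which is not part of the original listing, the same argument list is passed to `_mm256_set_epi32` and `_mm256_setr_epi32`. It assumes `std::arch` on a stable toolchain, and `set_and_setr` is a hypothetical helper that returns `None` when AVX is not available.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX path. The caller must have verified AVX support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn set_and_setr_avx() -> ([i32; 8], [i32; 8]) {
    // `set` takes the highest element first, so the last argument ends up
    // at the lowest address when the vector is stored.
    let set = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    // `setr` takes the lowest element first, i.e. memory order.
    let setr = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    let mut a = [0i32; 8];
    let mut b = [0i32; 8];
    _mm256_storeu_si256(a.as_mut_ptr() as *mut __m256i, set);
    _mm256_storeu_si256(b.as_mut_ptr() as *mut __m256i, setr);
    (a, b)
}

/// Hypothetical wrapper: returns the stored `set` and `setr` vectors, or
/// `None` if AVX is not available on this CPU or target.
pub fn set_and_setr() -> Option<([i32; 8], [i32; 8])> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound: AVX was detected just above.
            return Some(unsafe { set_and_setr_avx() });
        }
    }
    None
}
```

On an AVX-capable machine the first array reads `[0, 1, 2, 3, 4, 5, 6, 7]` and the second `[7, 6, 5, 4, 3, 2, 1, 0]`.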
_mm256_setzero_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type __m256d with all elements set to zero. 
_mm256_setzero_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type __m256 with all elements set to zero. 
_mm256_setzero_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type __m256i with all elements set to zero. 
_mm256_shuffle_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shuffle bytes from 
_mm256_shuffle_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shuffle 32-bit integers in 128-bit lanes of
_mm256_shuffle_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle double-precision (64-bit) floating-point elements within 128-bit
lanes using the control in
_mm256_shuffle_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Shuffle single-precision (32-bit) floating-point elements in
_mm256_shufflehi_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of
_mm256_shufflelo_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of
_mm256_sign_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Negate packed 8-bit integers in
_mm256_sign_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Negate packed 16-bit integers in
_mm256_sign_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Negate packed 32-bit integers in
_mm256_sll_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_sll_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_sll_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_slli_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_slli_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_slli_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_slli_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift 128-bit lanes in
_mm256_sllv_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_sllv_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_sqrt_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Return the square root of packed double-precision (64-bit) floating-point
elements in
_mm256_sqrt_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Return the square root of packed single-precision (32-bit) floating-point
elements in
_mm256_sra_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_sra_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srai_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_srai_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srav_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srl_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_srl_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srl_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_srli_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 16-bit integers in
_mm256_srli_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srli_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_srli_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift 128-bit lanes in
_mm256_srlv_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32-bit integers in
_mm256_srlv_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64-bit integers in
_mm256_store_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from
_mm256_store_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from
_mm256_store_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits of integer data from
_mm256_storeu2_m128^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse Store the high and low 128-bit halves (each composed of 4 packed
single-precision (32-bit) floating-point elements) from
_mm256_storeu2_m128d^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Store the high and low 128-bit halves (each composed of 2 packed
double-precision (64-bit) floating-point elements) from
_mm256_storeu2_m128i^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Store the high and low 128-bit halves (each composed of integer data) from

_mm256_storeu_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from
_mm256_storeu_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from
_mm256_storeu_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Store 256 bits of integer data from
_mm256_stream_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Moves single-precision floating-point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
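The 32-byte alignment requirement of the `stream` intrinsics can be met with a `#[repr(align(32))]` wrapper type. The sketch below is one way to do that and is not taken from the original listing. It assumes `std::arch` on a stable toolchain; `Aligned32`, `stream_avx` and `stream_copy` are hypothetical names. The `_mm_sfence` call is included because non-temporal stores are weakly ordered.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical buffer type whose storage is always 32-byte aligned.
#[repr(align(32))]
pub struct Aligned32(pub [f32; 8]);

/// AVX path. The caller must have verified AVX support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn stream_avx(dst: &mut Aligned32, src: &[f32; 8]) {
    let v = _mm256_loadu_ps(src.as_ptr());
    // `dst.0` sits at offset 0 of a 32-byte aligned struct, so the
    // destination address satisfies the alignment requirement.
    _mm256_stream_ps(dst.0.as_mut_ptr(), v);
    // Order the non-temporal store before any later reads of `dst`.
    _mm_sfence();
}

/// Hypothetical wrapper: copies `src` into a fresh aligned buffer, using a
/// non-temporal store when AVX is detected and a plain copy otherwise.
pub fn stream_copy(src: &[f32; 8]) -> Aligned32 {
    let mut dst = Aligned32([0.0; 8]);
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound: AVX was detected and `dst` is 32-byte aligned.
            unsafe { stream_avx(&mut dst, src) };
            return dst;
        }
    }
    dst.0 = *src;
    dst
}
```

Non-temporal stores tend to pay off only for large buffers that will not be read again soon; a single 32-byte copy like this one is purely illustrative.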
_mm256_sub_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 8-bit integers in
_mm256_sub_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 16-bit integers in
_mm256_sub_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 32-bit integers in
_mm256_sub_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 64-bit integers in
_mm256_sub_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Subtract packed double-precision (64-bit) floating-point elements in
_mm256_sub_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Subtract packed single-precision (32-bit) floating-point elements in
_mm256_subs_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 8-bit integers in
_mm256_subs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed 16-bit integers in
_mm256_subs_epu8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed unsigned 8-bit integers in
_mm256_subs_epu16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Subtract packed unsigned 16-bit integers in
_mm256_testc_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in
_mm256_testc_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in
_mm256_testc_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing integer data) in
_mm256_testnzc_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in
_mm256_testnzc_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in
_mm256_testnzc_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing integer data) in
_mm256_testz_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in
_mm256_testz_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in
_mm256_testz_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 256 bits (representing integer data) in 
_mm256_undefined_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type 
_mm256_undefined_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type 
_mm256_undefined_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx Return vector of type __m256i with undefined elements. 
_mm256_unpackhi_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 8-bit integers from the high half of each
128-bit lane in
_mm256_unpackhi_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 16-bit integers from the high half of each
128-bit lane of
_mm256_unpackhi_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 32-bit integers from the high half of each
128-bit lane of
_mm256_unpackhi_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 64-bit integers from the high half of each
128-bit lane of
_mm256_unpackhi_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Unpack and interleave double-precision (64-bit) floating-point elements
from the high half of each 128-bit lane in
_mm256_unpackhi_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Unpack and interleave single-precision (32-bit) floating-point elements
from the high half of each 128-bit lane in
_mm256_unpacklo_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 8-bit integers from the low half of each
128-bit lane of
_mm256_unpacklo_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 16-bit integers from the low half of each
128-bit lane of
_mm256_unpacklo_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 32-bit integers from the low half of each
128-bit lane of
_mm256_unpacklo_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Unpack and interleave 64-bit integers from the low half of each
128-bit lane of
_mm256_unpacklo_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Unpack and interleave double-precision (64-bit) floating-point elements
from the low half of each 128-bit lane in
_mm256_unpacklo_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Unpack and interleave single-precision (32-bit) floating-point elements
from the low half of each 128-bit lane in
_mm256_xor_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise XOR of packed double-precision (64-bit) floating-point
elements in
_mm256_xor_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx Compute the bitwise XOR of packed single-precision (32-bit) floating-point
elements in
_mm256_xor_si256^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Compute the bitwise XOR of 256 bits (representing integer data)
in 
_mm256_zeroall^{⚠} 
[ Experimental ] [x86 and target feature ] avx Zero the contents of all XMM or YMM registers. 
_mm256_zeroupper^{⚠} 
[ Experimental ] [x86 and target feature ] avx Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
_mm256_zextpd128_pd256^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextps128_ps256^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextsi128_si256^{⚠}
[ Experimental ] [x86 and target feature ] avx,sse2 Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
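The zero-extension described for the `zext` intrinsics can be checked by storing the widened vector back to memory. The following sketch, which is not part of the original listing, does this for `_mm256_zextps128_ps256`. It assumes `std::arch` on a stable toolchain, and `zext4` and `zext4_avx` are hypothetical helper names.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX path. The caller must have verified AVX support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn zext4_avx(lo: [f32; 4]) -> [f32; 8] {
    let v128 = _mm_loadu_ps(lo.as_ptr());
    // The lower 128 bits keep `v128`; the upper 128 bits become zero.
    let v256 = _mm256_zextps128_ps256(v128);
    // Start from a non-zero sentinel so the zeroed upper half is visible.
    let mut out = [-1.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), v256);
    out
}

/// Hypothetical wrapper: widens four floats to eight, zero-filling the top.
pub fn zext4(lo: [f32; 4]) -> [f32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound: AVX was detected just above.
            return unsafe { zext4_avx(lo) };
        }
    }
    // Scalar equivalent for CPUs or targets without AVX.
    [lo[0], lo[1], lo[2], lo[3], 0.0, 0.0, 0.0, 0.0]
}
```

For example, `zext4([1.0, 2.0, 3.0, 4.0])` gives `[1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]`.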
_mm_abs_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Compute the absolute value of packed 8-bit signed integers in
_mm_abs_epi16^{⚠}
[ Experimental ] [x86 and target feature ] ssse3 Compute the absolute value of each of the packed 16-bit signed integers in

_mm_abs_epi32^{⚠}
[ Experimental ] [x86 and target feature ] ssse3 Compute the absolute value of each of the packed 32-bit signed integers in

_mm_abs_pi8^{⚠}
[ Experimental ] [x86 and target feature ] ssse3,mmx Compute the absolute value of packed 8-bit integers in
_mm_abs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Compute the absolute value of packed 16-bit integers in
_mm_abs_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Compute the absolute value of packed 32-bit integers in
_mm_add_epi8^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed 8-bit integers in
_mm_add_epi16^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed 16-bit integers in
_mm_add_epi32^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed 32-bit integers in
_mm_add_epi64^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed 64-bit integers in
_mm_add_pd^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed double-precision (64-bit) floating-point elements in
_mm_add_pi8^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 8-bit integers in
_mm_add_pi16^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 16-bit integers in
_mm_add_pi32^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 32-bit integers in
_mm_add_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Adds __m128 vectors. 
_mm_add_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_add_si64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Adds two signed or unsigned 64-bit integer values, returning the lower 64 bits of the sum.
_mm_add_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Adds the first component of 
_mm_adds_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Add packed 8-bit integers in
_mm_adds_epi16^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed 16-bit integers in
_mm_adds_epu8^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed unsigned 8-bit integers in
_mm_adds_epu16^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Add packed unsigned 16-bit integers in
_mm_adds_pi8^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 8-bit integers in
_mm_adds_pi16^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed 16-bit integers in
_mm_adds_pu8^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed unsigned 8-bit integers in
_mm_adds_pu16^{⚠}
[ Experimental ] [x86 and target feature ] mmx Add packed unsigned 16-bit integers in
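The truncated summaries for the `add` and `adds` families look identical, but they differ on overflow: `_mm_add_epi8` wraps around, while `_mm_adds_epi8` saturates at the limits of the element type. The sketch below compares the two SSE2 forms on a single lane. It is not part of the original listing, it assumes `std::arch` on a stable toolchain, and `add_i8` and `add_i8_sse2` are hypothetical helper names.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// SSE2 path. The caller must have verified SSE2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn add_i8_sse2(x: i8, y: i8) -> (i8, i8) {
    let a = _mm_set1_epi8(x);
    let b = _mm_set1_epi8(y);
    let mut wrapped = [0i8; 16];
    let mut saturated = [0i8; 16];
    _mm_storeu_si128(wrapped.as_mut_ptr() as *mut __m128i, _mm_add_epi8(a, b));
    _mm_storeu_si128(saturated.as_mut_ptr() as *mut __m128i, _mm_adds_epi8(a, b));
    // All 16 lanes hold the same value, so lane 0 is representative.
    (wrapped[0], saturated[0])
}

/// Hypothetical wrapper: returns (wrapping sum, saturating sum) of two i8s.
pub fn add_i8(x: i8, y: i8) -> (i8, i8) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            // Sound: SSE2 was detected just above.
            return unsafe { add_i8_sse2(x, y) };
        }
    }
    // Scalar equivalents from the standard library.
    (x.wrapping_add(y), x.saturating_add(y))
}
```

For instance, `add_i8(100, 100)` returns `(-56, 127)`: the wrapping sum overflows past `i8::MAX`, while the saturating sum is clamped to it.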
_mm_addsub_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Alternatively add and subtract packed double-precision (64-bit)
floating-point elements in
_mm_addsub_ps^{⚠}
[ Experimental ] [x86 and target feature ] sse3 Alternatively add and subtract packed single-precision (32-bit)
floating-point elements in
_mm_aesdec_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Perform one round of an AES decryption flow on data (state) in 
_mm_aesdeclast_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Perform the last round of an AES decryption flow on data (state) in 
_mm_aesenc_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Perform one round of an AES encryption flow on data (state) in 
_mm_aesenclast_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Perform the last round of an AES encryption flow on data (state) in 
_mm_aesimc_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Perform the 
_mm_aeskeygenassist_si128^{⚠} 
[ Experimental ] [x86 and target feature ] aes Assist in expanding the AES cipher key. 
_mm_alignr_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Concatenate 16-byte blocks in
_mm_alignr_pi8^{⚠}
[ Experimental ] [x86 and target feature ] ssse3,mmx Concatenates the two 64-bit integer vector operands, and right-shifts the result by the number of bytes specified in the immediate operand.
_mm_and_pd^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise AND of packed double-precision (64-bit) floating-point
elements in
_mm_and_ps^{⚠}
[ Experimental ] [x86 and target feature ] sse Bitwise AND of packed single-precision (32-bit) floating-point elements.
_mm_and_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise AND of 128 bits (representing integer data) in 
_mm_andnot_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise NOT of 
_mm_andnot_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Bitwise ANDNOT of packed single-precision (32-bit) floating-point elements.
_mm_andnot_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise NOT of 128 bits (representing integer data) in 
_mm_avg_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Average packed unsigned 8-bit integers in
_mm_avg_epu16^{⚠}
[ Experimental ] [x86 and target feature ] sse2 Average packed unsigned 16-bit integers in
_mm_avg_pu8^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Computes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination.
_mm_avg_pu16^{⚠}
[ Experimental ] [x86 and target feature ] sse,mmx Computes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination.
_mm_blend_epi16^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed 16-bit integers from
_mm_blend_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Blend packed 32-bit integers from
_mm_blend_pd^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed double-precision (64-bit) floating-point elements from
_mm_blend_ps^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed single-precision (32-bit) floating-point elements from
_mm_blendv_epi8^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed 8-bit integers from
_mm_blendv_pd^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed double-precision (64-bit) floating-point elements from
_mm_blendv_ps^{⚠}
[ Experimental ] [x86 and target feature ] sse4.1 Blend packed single-precision (32-bit) floating-point elements from
_mm_broadcast_ss^{⚠}
[ Experimental ] [x86 and target feature ] avx Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm_broadcastb_epi8^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 8-bit integer from
_mm_broadcastd_epi32^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 32-bit integer from
_mm_broadcastq_epi64^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 64-bit integer from
_mm_broadcastsd_pd^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low double-precision (64-bit) floating-point element
from
_mm_broadcastss_ps^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low single-precision (32-bit) floating-point element
from
_mm_broadcastw_epi16^{⚠}
[ Experimental ] [x86 and target feature ] avx2 Broadcast the low packed 16-bit integer from a to all elements of the 128-bit returned value.
_mm_bslli_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift 
_mm_bsrli_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift 
_mm_castpd_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit floating-point vector of [2 x double] into a 128-bit floating-point vector of [4 x float]. 
_mm_castpd_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit floating-point vector of [2 x double] into a 128-bit integer vector. 
_mm_castps_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit floating-point vector of [4 x float] into a 128-bit floating-point vector of [2 x double]. 
_mm_castps_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit floating-point vector of [4 x float] into a 128-bit integer vector. 
_mm_castsi128_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit integer vector into a 128-bit floating-point vector of [2 x double]. 
_mm_castsi128_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Casts a 128-bit integer vector into a 128-bit floating-point vector of [4 x float]. 
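The `_mm_cast*` intrinsics reinterpret the 128 bits of a vector as another vector type without converting any values. As a rough illustration, the sketch below reads the IEEE 754 bit pattern of an `f32` through `_mm_castps_si128`. It assumes the stabilized `std::arch` path, and the helper names `f32_bits` and `f32_bits_sse2` are hypothetical; on non-x86 targets it falls back to `f32::to_bits`.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns the raw bit pattern of `x` (hypothetical helper).
pub fn f32_bits(x: f32) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            return unsafe { f32_bits_sse2(x) };
        }
    }
    // Portable fallback with the same result.
    x.to_bits()
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn f32_bits_sse2(x: f32) -> u32 {
    // Broadcast x into all four float lanes.
    let v = _mm_set1_ps(x);
    // Reinterpret the same 128 bits as an integer vector; no value conversion happens.
    let i = _mm_castps_si128(v);
    // Read the lowest 32-bit lane.
    _mm_cvtsi128_si32(i) as u32
}
```

For example, `f32_bits(1.0)` returns `0x3F80_0000`, the bit pattern of `1.0f32`, not the integer `1`.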
_mm_ceil_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed double-precision (64-bit) floating-point elements in 
_mm_ceil_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed single-precision (32-bit) floating-point elements in 
_mm_ceil_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower double-precision (64-bit) floating-point element in 
_mm_ceil_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower single-precision (32-bit) floating-point element in 
_mm_clflush^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Invalidate and flush the cache line that contains 
_mm_clmulepi64_si128^{⚠} 
[ Experimental ] [x86 and target feature ] pclmulqdq Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k). 
_mm_cmp_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse2 Compare packed double-precision (64-bit) floating-point elements in 
_mm_cmp_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse Compare packed single-precision (32-bit) floating-point elements in 
_mm_cmp_sd^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse2 Compare the lower double-precision (64-bit) floating-point element in 
_mm_cmp_ss^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse Compare the lower single-precision (32-bit) floating-point element in 
_mm_cmpeq_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 8-bit integers in 
_mm_cmpeq_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 16-bit integers in 
_mm_cmpeq_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 32-bit integers in 
_mm_cmpeq_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed 64-bit integers in 
_mm_cmpeq_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpeq_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpeq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpeq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpestra^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpestrc^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpestri^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings 
_mm_cmpestrm^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpestro^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpestrs^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpestrz^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings in 
_mm_cmpge_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpge_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpge_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpge_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpgt_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 8-bit integers in 
_mm_cmpgt_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 16-bit integers in 
_mm_cmpgt_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 32-bit integers in 
_mm_cmpgt_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed 64-bit integers in 
_mm_cmpgt_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpgt_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Compares whether each element of 
_mm_cmpgt_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Compares whether each element of 
_mm_cmpgt_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Compares whether each element of 
_mm_cmpgt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpgt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpgt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpistra^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrc^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistri^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrm^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistro^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrs^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmpistrz^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Compare packed strings with implicit lengths in 
_mm_cmple_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmple_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmple_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmple_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmplt_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 8-bit integers in 
_mm_cmplt_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 16-bit integers in 
_mm_cmplt_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 32-bit integers in 
_mm_cmplt_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmplt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmplt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmplt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpneq_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpneq_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpneq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpneq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpnge_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpnge_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpnge_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpnge_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpngt_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpngt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpngt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpngt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpnle_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpnle_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpnle_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpnle_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpnlt_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpnlt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpnlt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpnlt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the lowest 
_mm_cmpord_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpord_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpord_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpord_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Check if the lowest 
_mm_cmpunord_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare corresponding elements in 
_mm_cmpunord_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare each of the four floats in 
_mm_cmpunord_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_cmpunord_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Check if the lowest 
_mm_comieq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comieq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_comige_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comige_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_comigt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comigt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_comile_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comile_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_comilt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comilt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_comineq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_comineq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32-bit floats from the low-order bits of 
_mm_crc32_u8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Starting with the initial value in 
_mm_crc32_u16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Starting with the initial value in 
_mm_crc32_u32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.2 Starting with the initial value in 
_mm_cvt_pi2ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts two elements of a 64-bit vector of [2 x i32] into two floating-point values and writes them to the lower 64 bits of the destination. The remaining higher-order elements of the destination are copied from the corresponding elements in the first operand. 
_mm_cvt_ps2pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert the two lower packed single-precision (32-bit) floating-point elements in 
_mm_cvt_si2ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_cvt_ss2si^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_cvtepi16_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 16-bit integers in 
_mm_cvtepi16_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 16-bit integers in 
_mm_cvtepi32_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 32-bit integers in 
_mm_cvtepi32_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert the lower two packed 32-bit integers in 
_mm_cvtepi32_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed 32-bit integers in 
_mm_cvtepi8_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 8-bit integers in 
_mm_cvtepi8_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 8-bit integers in 
_mm_cvtepi8_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Sign extend packed 8-bit integers in the low 8 bytes of 
_mm_cvtepu16_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 16-bit integers in 
_mm_cvtepu16_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 16-bit integers in 
_mm_cvtepu32_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 32-bit integers in 
_mm_cvtepu8_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 8-bit integers in 
_mm_cvtepu8_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 8-bit integers in 
_mm_cvtepu8_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Zero extend packed unsigned 8-bit integers in 
_mm_cvtpd_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed double-precision (64-bit) floating-point elements in 
_mm_cvtpd_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Converts the two double-precision floating-point elements of a 128-bit vector of [2 x double] into two signed 32-bit integer values, returned in a 64-bit vector of [2 x i32]. 
_mm_cvtpd_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements 
_mm_cvtpi16_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts a 64-bit vector of 
_mm_cvtpi32_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Converts the two signed 32-bit integer elements of a 64-bit vector of [2 x i32] into two double-precision floating-point values, returned in a 128-bit vector of [2 x double]. 
_mm_cvtpi32_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts two elements of a 64-bit vector of [2 x i32] into two floating-point values and writes them to the lower 64 bits of the destination. The remaining higher-order elements of the destination are copied from the corresponding elements in the first operand. 
_mm_cvtpi32x2_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts the two 32-bit signed integer values from each 64-bit vector operand of [2 x i32] into a 128-bit vector of [4 x float]. 
_mm_cvtpi8_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts the lower 4 8-bit values of 
_mm_cvtps_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed single-precision (32-bit) floating-point elements in 
_mm_cvtps_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed single-precision (32-bit) floating-point elements in 
_mm_cvtps_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert packed single-precision (32-bit) floating-point elements in 
_mm_cvtps_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert packed single-precision (32-bit) floating-point elements in 
_mm_cvtps_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert the two lower packed single-precision (32-bit) floating-point elements in 
_mm_cvtpu16_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts a 64-bit vector of 
_mm_cvtpu8_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Converts the lower 4 8-bit values of 
_mm_cvtsd_f64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return the lower double-precision (64-bit) floating-point element of "a". 
_mm_cvtsd_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer. 
_mm_cvtsd_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert the lower double-precision (64-bit) floating-point element in 
_mm_cvtsi128_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return the lowest element of 
_mm_cvtsi32_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return 
_mm_cvtsi32_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a vector whose lowest element is 
_mm_cvtsi32_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Convert a 32-bit integer to a 32-bit float. The result vector is the input vector 
_mm_cvtss_f32^{⚠} 
[ Experimental ] [x86 and target feature ] sse Extract the lowest 32-bit float from the input vector. 
_mm_cvtss_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert the lower single-precision (32-bit) floating-point element in 
_mm_cvtss_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse Convert the lowest 32-bit float in the input vector to a 32-bit integer. 
_mm_cvtt_ps2pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert the two lower packed single-precision (32-bit) floating-point elements in 
_mm_cvtt_ss2si^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_cvttpd_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed double-precision (64-bit) floating-point elements in 
_mm_cvttpd_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Converts the two double-precision floating-point elements of a 128-bit vector of [2 x double] into two signed 32-bit integer values, returned in a 64-bit vector of [2 x i32]. If the result of either conversion is inexact, the result is truncated (rounded towards zero) regardless of the current MXCSR setting. 
_mm_cvttps_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed single-precision (32-bit) floating-point elements in 
_mm_cvttps_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Convert the two lower packed single-precision (32-bit) floating-point elements in 
_mm_cvttsd_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert the lower double-precision (64-bit) floating-point element in 
_mm_cvttss_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse Convert the lowest 32-bit float in the input vector to a 32-bit integer with truncation. 
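The `cvt` and `cvtt` families differ only in rounding: `cvt` follows the rounding mode in MXCSR, while `cvtt` always truncates toward zero. The sketch below contrasts `_mm_cvtss_si32` and `_mm_cvttss_si32`. It assumes MXCSR is in its default round-to-nearest-even mode and uses the stabilized `std::arch` path; the helper names `convert_f32`, `convert_f32_sse`, and `round_ties_even_scalar` are hypothetical, and the scalar fallback is only meant for small, finite inputs.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns (rounded, truncated) conversions of `x` (hypothetical helper).
pub fn convert_f32(x: f32) -> (i32, i32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse") {
            return unsafe { convert_f32_sse(x) };
        }
    }
    // Scalar fallback; `as i32` truncates toward zero.
    (round_ties_even_scalar(x), x as i32)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse")]
unsafe fn convert_f32_sse(x: f32) -> (i32, i32) {
    // Place x in the lowest lane; the upper lanes are zeroed.
    let v = _mm_set_ss(x);
    // Rounds according to MXCSR (assumed here: round-to-nearest-even).
    let rounded = _mm_cvtss_si32(v);
    // Always truncates toward zero, regardless of MXCSR.
    let truncated = _mm_cvttss_si32(v);
    (rounded, truncated)
}

// Round-to-nearest, ties-to-even, for small finite inputs only.
#[allow(dead_code)]
fn round_ties_even_scalar(x: f32) -> i32 {
    let f = x.floor();
    let diff = x - f;
    let fi = f as i32;
    if diff > 0.5 {
        fi + 1
    } else if diff < 0.5 {
        fi
    } else if fi % 2 == 0 {
        fi
    } else {
        fi + 1
    }
}
```

Under the stated assumption, `convert_f32(2.7)` gives `(3, 2)` and `convert_f32(-2.7)` gives `(-3, -2)`.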
_mm_div_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Divide packed double-precision (64-bit) floating-point elements in 
_mm_div_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Divides __m128 vectors. 
_mm_div_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_div_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Divides the first component of 
_mm_dp_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Returns the dot product of two __m128d vectors. 
_mm_dp_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Returns the dot product of two __m128 vectors. 
_mm_extract_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Extract an 8-bit integer from 
_mm_extract_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return the 
_mm_extract_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Extract a 32-bit integer from 
_mm_extract_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Extracts a 16-bit element from a 64-bit vector of [4 x i16] and returns it, as specified by the immediate integer operand. 
_mm_extract_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Extract a single-precision (32-bit) floating-point element from 
_mm_extract_si64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4a Extracts the bit range specified by 
_mm_floor_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed double-precision (64-bit) floating-point elements in 
_mm_floor_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed single-precision (32-bit) floating-point elements in 
_mm_floor_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower double-precision (64-bit) floating-point element in 
_mm_floor_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower single-precision (32-bit) floating-point element in 
_mm_getcsr^{⚠} 
[ Experimental ] [x86 and target feature ] sse Get the unsigned 32-bit value of the MXCSR control and status register. 
_mm_hadd_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. 
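A horizontal add sums neighbouring lanes within each input, then packs the sums from the first operand into the low half of the result and those from the second operand into the high half. Here is a minimal sketch of `_mm_hadd_epi16`, assuming the stabilized `std::arch` path; the helper names `hadd_i16` and `hadd_i16_ssse3` are hypothetical, and a scalar fallback with wrapping addition covers CPUs without SSSE3.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Horizontal pairwise add of two [8 x i16] vectors (hypothetical helper).
pub fn hadd_i16(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("ssse3") {
            return unsafe { hadd_i16_ssse3(a, b) };
        }
    }
    // Scalar fallback: sums from `a` fill lanes 0..4, sums from `b` fill lanes 4..8.
    let mut out = [0i16; 8];
    for i in 0..4 {
        out[i] = a[2 * i].wrapping_add(a[2 * i + 1]);
        out[4 + i] = b[2 * i].wrapping_add(b[2 * i + 1]);
    }
    out
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "ssse3")]
unsafe fn hadd_i16_ssse3(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    // Unaligned loads and store, since the arrays are only 2-byte aligned.
    let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
    let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
    let r = _mm_hadd_epi16(va, vb);
    let mut out = [0i16; 8];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    out
}
```

With `a = [1, 2, 3, 4, 5, 6, 7, 8]` and `b = [10, 20, 30, 40, 50, 60, 70, 80]`, the result is `[3, 7, 11, 15, 30, 70, 110, 150]`.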
_mm_hadd_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32]. 
_mm_hadd_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in 
_mm_hadd_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed 64-bit vectors of [4 x i16]. 
_mm_hadd_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed 64-bit vectors of [2 x i32]. 
_mm_hadd_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in 
_mm_hadds_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h. 
_mm_hadds_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally add the adjacent pairs of values contained in 2 packed 64-bit vectors of [4 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h. 
_mm_hsub_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. 
_mm_hsub_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32]. 
_mm_hsub_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in 
_mm_hsub_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of [4 x i16]. 
_mm_hsub_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of [2 x i32]. 
_mm_hsub_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in 
_mm_hsubs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h. 
_mm_hsubs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Horizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of [4 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h. 
_mm_i32gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i32gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i32gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i32gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i64gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i64gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i64gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_i64gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_insert_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Return a copy of 
_mm_insert_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector where the 
_mm_insert_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Return a copy of 
_mm_insert_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Copies data from the 64-bit vector of [4 x i16] to the destination, and inserts the lower 16 bits of an integer operand at the 16-bit offset specified by the immediate operand 
_mm_insert_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Select a single value in 
_mm_insert_si64^{⚠} 
[ Experimental ] [x86 and target feature ] sse4a Inserts the 
_mm_lddqu_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Load 128 bits of integer data from unaligned memory.
This intrinsic may perform better than 
_mm_lfence^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. 
_mm_load1_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. 
_mm_load1_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_load_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector.
_mm_load_pd1^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. 
_mm_load_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Load four 
_mm_load_ps1^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_load_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Loads a 64-bit double-precision value to the low element of a 128-bit integer vector and clears the upper element. 
_mm_load_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 128 bits of integer data from memory into a new vector. 
_mm_load_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_loaddup_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Load a double-precision (64-bit) floating-point element from memory into both elements of return vector. 
_mm_loadh_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Loads a double-precision value into the high-order bits of a 128-bit vector of [2 x double]. The low-order bits are copied from the low-order bits of the first operand. 
_mm_loadh_pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse Set the upper two single-precision floating-point values with 64 bits of data loaded from the address 
_mm_loadl_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 64-bit integer from memory into first element of returned vector. 
_mm_loadl_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Loads a double-precision value into the low-order bits of a 128-bit vector of [2 x double]. The high-order bits are copied from the high-order bits of the first operand. 
_mm_loadl_pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse Load two floats from 
_mm_loadr_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 2 double-precision (64-bit) floating-point elements from memory into the returned vector in reverse order. 
_mm_loadr_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Load four 
_mm_loadu_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector.
_mm_loadu_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Load four 
_mm_loadu_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Load 128 bits of integer data from memory into a new vector. 
_mm_madd_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply and then horizontally add signed 16-bit integers in 
_mm_maddubs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Multiply corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and write the 16-bit sums to the corresponding bits in the destination. 
_mm_maddubs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Multiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination. 
_mm_mask_i32gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i32gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i32gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i32gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i64gather_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i64gather_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i64gather_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_mask_i64gather_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Return values from 
_mm_maskload_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Load packed 32-bit integers from memory pointed by 
_mm_maskload_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Load packed 64-bit integers from memory pointed by 
_mm_maskload_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Load packed double-precision (64-bit) floating-point elements from memory into result using 
_mm_maskload_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Load packed single-precision (32-bit) floating-point elements from memory into result using 
_mm_maskmove_si64^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Conditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand. 
_mm_maskmoveu_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Conditionally store 8-bit integer elements from 
_mm_maskstore_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Store packed 32-bit integers from 
_mm_maskstore_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Store packed 64-bit integers from 
_mm_maskstore_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Store packed double-precision (64-bit) floating-point elements from 
_mm_maskstore_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Store packed single-precision (32-bit) floating-point elements from 
_mm_max_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed 8bit integers in 
_mm_max_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 16bit integers in 
_mm_max_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed 32bit integers in 
_mm_max_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed unsigned 8bit integers in 
_mm_max_epu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed unsigned 16bit integers in 
_mm_max_epu32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed unsigned 32bit integers in 
_mm_max_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the maximum values from corresponding elements in

_mm_max_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 16bit signed integers of 
_mm_max_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare packed singleprecision (32bit) floatingpoint elements in 
_mm_max_pu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 8-bit unsigned integers of 
_mm_max_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_max_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the first singleprecision (32bit) floatingpoint element of 
_mm_mfence^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. 
_mm_min_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed 8bit integers in 
_mm_min_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed 16bit integers in 
_mm_min_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed 32bit integers in 
_mm_min_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare packed unsigned 8bit integers in 
_mm_min_epu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed unsigned 16bit integers in 
_mm_min_epu32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Compare packed unsigned 32bit integers in 
_mm_min_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the minimum values from corresponding elements in

_mm_min_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 16bit signed integers of 
_mm_min_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare packed singleprecision (32bit) floatingpoint elements in 
_mm_min_pu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Compares the packed 8-bit unsigned integers of 
_mm_min_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_min_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare the first singleprecision (32bit) floatingpoint element of 
_mm_minpos_epu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero. 
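The result layout of `_mm_minpos_epu16` is easier to see in a scalar model. The sketch below uses a hypothetical helper `minpos_u16` over a plain `[u16; 8]`; it assumes that, when several elements share the minimum, the lowest index is reported.

```rust
/// Scalar model of a horizontal minimum with position (hypothetical helper).
/// Element 0 of the result holds the minimum value, element 1 holds its
/// index, and elements 2..8 are zero.
fn minpos_u16(v: [u16; 8]) -> [u16; 8] {
    let mut min = v[0];
    let mut idx = 0u16;
    for i in 1..8 {
        // Strict comparison keeps the first (lowest) index on ties.
        if v[i] < min {
            min = v[i];
            idx = i as u16;
        }
    }
    let mut out = [0u16; 8];
    out[0] = min;
    out[1] = idx;
    out
}
```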
_mm_move_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a vector where the low element is extracted from 
_mm_move_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Constructs a 128-bit floating-point vector of [2 x double]. The lower 64 bits are set to the lower 64 bits of the second parameter. The upper 64 bits are set to the upper 64 bits of the first parameter. 
_mm_move_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return a 
_mm_movedup_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Duplicate the low doubleprecision (64bit) floatingpoint element
from 
_mm_movehdup_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Duplicate oddindexed singleprecision (32bit) floatingpoint elements
from 
_mm_movehl_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Combine higher half of 
_mm_moveldup_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse3 Duplicate evenindexed singleprecision (32bit) floatingpoint elements
from 
_mm_movelh_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Combine lower half of 
_mm_movemask_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a mask of the most significant bit of each element in 
_mm_movemask_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a mask of the most significant bit of each element in 
_mm_movemask_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Takes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination. 
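As a rough illustration of the `_mm_movemask_pi8` description, the following plain-Rust model collects one mask bit per 8-bit element. The helper name `movemask_bytes` is hypothetical, and the array stands in for the 64-bit vector with element 0 as the lowest byte.

```rust
/// Scalar model of a byte movemask (hypothetical helper).
/// Bit `i` of the result is the most significant bit of `v[i]`;
/// the remaining upper bits of the 32-bit result are zero.
fn movemask_bytes(v: [u8; 8]) -> i32 {
    let mut mask = 0i32;
    for (i, byte) in v.iter().enumerate() {
        mask |= ((*byte >> 7) as i32) << i;
    }
    mask
}
```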
_mm_movemask_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return a mask of the most significant bit of each element in 
_mm_movepi64_pi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Returns the lower 64 bits of a 128-bit integer vector as a 64-bit integer. 
_mm_movpi64_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Moves the 64-bit operand to a 128-bit integer vector, zeroing the upper bits. 
_mm_mpsadbw_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Subtracts 8-bit unsigned integer values and computes the absolute values of the differences to the corresponding bits in the destination. Then sums of the absolute differences are returned according to the bit fields in the immediate operand. 
_mm_mul_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Multiply the low 32bit integers from each packed 64bit
element in 
_mm_mul_epu32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply the low unsigned 32bit integers from each packed 64bit element
in 
_mm_mul_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply packed doubleprecision (64bit) floatingpoint elements in 
_mm_mul_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Multiplies __m128 vectors. 
_mm_mul_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_mul_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Multiplies the first component of 
_mm_mul_su32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Multiplies 32-bit unsigned integer values contained in the lower bits of the two 64-bit integer vectors and returns the 64-bit unsigned product. 
_mm_mulhi_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply the packed 16bit integers in 
_mm_mulhi_epu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply the packed unsigned 16bit integers in 
_mm_mulhi_pu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Multiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. 
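The "high-order 16 bits of each 32-bit product" wording for `_mm_mulhi_pu16` corresponds to the scalar computation sketched below. The helper `mulhi_u16` is hypothetical and works on a plain `[u16; 4]` rather than on the vector type.

```rust
/// Scalar model of an unsigned high-half multiply (hypothetical helper).
/// Each pair of 16-bit lanes is widened to 32 bits, multiplied, and the
/// upper 16 bits of the product are kept.
fn mulhi_u16(a: [u16; 4], b: [u16; 4]) -> [u16; 4] {
    let mut out = [0u16; 4];
    for i in 0..4 {
        let product = a[i] as u32 * b[i] as u32;
        out[i] = (product >> 16) as u16;
    }
    out
}
```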
_mm_mulhrs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Multiply packed 16-bit signed integer values, truncate the 32-bit product to the 18 most significant bits by right-shifting, round the truncated value by adding 1, and write bits [16:1] to the destination. 
_mm_mulhrs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Multiplies packed 16-bit signed integer values, truncates the 32-bit products to the 18 most significant bits by right-shifting, rounds the truncated value by adding 1, and writes bits [16:1] to the destination. 
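The rounding steps described for `_mm_mulhrs_epi16` and `_mm_mulhrs_pi16` can be written out for a single lane. This is a minimal sketch in plain Rust; `mulhrs_i16` is a hypothetical name, and the vector intrinsics apply the same computation to every 16-bit lane.

```rust
/// Scalar model of a rounded high-half multiply for one lane
/// (hypothetical helper).
fn mulhrs_i16(a: i16, b: i16) -> i16 {
    // Full 32-bit signed product.
    let product = a as i32 * b as i32;
    // Keep the 18 most significant bits, add 1 to round, then drop the
    // lowest bit so that bits [16:1] remain.
    (((product >> 14) + 1) >> 1) as i16
}
```

Read as Q15 fixed-point values, `0x4000` is 0.5, so multiplying it by itself gives `0x2000` (0.25).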
_mm_mullo_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Multiply the packed 16bit integers in 
_mm_mullo_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Multiply the packed 32bit integers in 
_mm_or_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise OR of 
_mm_or_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Bitwise OR of packed singleprecision (32bit) floatingpoint elements. 
_mm_or_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise OR of 128 bits (representing integer data) in 
_mm_packs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed 16bit integers from 
_mm_packs_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed 32bit integers from 
_mm_packs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Convert packed 16bit integers from 
_mm_packs_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Convert packed 32bit integers from 
_mm_packus_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Convert packed 16bit integers from 
_mm_packus_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Convert packed 32bit integers from 
_mm_pause^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Provide a hint to the processor that the code sequence is a spin-wait loop. 
_mm_permute_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse2 Shuffle doubleprecision (64bit) floatingpoint elements in 
_mm_permute_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx,sse Shuffle singleprecision (32bit) floatingpoint elements in 
_mm_permutevar_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Shuffle doubleprecision (64bit) floatingpoint elements in 
_mm_permutevar_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Shuffle singleprecision (32bit) floatingpoint elements in 
_mm_prefetch^{⚠} 
[ Experimental ] [x86 and target feature ] sse Fetch the cache line that contains address 
_mm_rcp_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the approximate reciprocal of packed singleprecision (32bit)
floatingpoint elements in 
_mm_rcp_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the approximate reciprocal of the first singleprecision
(32bit) floatingpoint element in 
_mm_round_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed doubleprecision (64bit) floatingpoint elements in 
_mm_round_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the packed singleprecision (32bit) floatingpoint elements in 
_mm_round_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower doubleprecision (64bit) floatingpoint element in 
_mm_round_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Round the lower singleprecision (32bit) floatingpoint element in 
_mm_rsqrt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the approximate reciprocal square root of packed singleprecision
(32bit) floatingpoint elements in 
_mm_rsqrt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in 
_mm_sad_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Sum the absolute differences of packed unsigned 8bit integers. 
_mm_sad_pu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Subtracts the corresponding 8-bit unsigned integer values of the two 64-bit vector operands and computes the absolute value of each difference. The sum of the 8 absolute differences is then written to bits [15:0] of the destination; the remaining bits [63:16] are cleared. 
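A scalar model makes the sum-of-absolute-differences computation of `_mm_sad_pu8` concrete. The helper `sad_u8` below is hypothetical and returns the whole 64-bit destination as a `u64`.

```rust
/// Scalar model of a sum of absolute differences over eight bytes
/// (hypothetical helper). The largest possible sum is 8 * 255 = 2040,
/// which fits in the low 16 bits, so the upper bits are always zero.
fn sad_u8(a: [u8; 8], b: [u8; 8]) -> u64 {
    let mut sum = 0u64;
    for i in 0..8 {
        // Absolute difference without leaving the unsigned domain.
        let diff = a[i].max(b[i]) - a[i].min(b[i]);
        sum += diff as u64;
    }
    sum
}
```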
_mm_set1_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast 8bit integer 
_mm_set1_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast 16bit integer 
_mm_set1_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast 32bit integer 
_mm_set1_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Initializes both values in a 128-bit vector of [2 x i64] with the specified 64-bit value. 
_mm_set1_epi64x^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast 64bit integer 
_mm_set1_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast double-precision (64-bit) floating-point value a to all elements of the return value. 
_mm_set1_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Broadcast 8-bit integer a to all elements of dst. 
_mm_set1_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Broadcast 16-bit integer a to all elements of dst. 
_mm_set1_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Broadcast 32-bit integer a to all elements of dst. 
_mm_set1_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_set_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 8bit integers with the supplied values. 
_mm_set_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 16bit integers with the supplied values. 
_mm_set_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 32bit integers with the supplied values. 
_mm_set_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Initializes both 64bit values in a 128bit vector of [2 x i64] with the specified 64bit integer values. 
_mm_set_epi64x^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 64bit integers with the supplied values, from highest to lowest. 
_mm_set_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed doubleprecision (64bit) floatingpoint elements in the return value with the supplied values. 
_mm_set_pd1^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Broadcast doubleprecision (64bit) floatingpoint value a to all elements of the return value. 
_mm_set_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 8bit integers in dst with the supplied values. 
_mm_set_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 16bit integers in dst with the supplied values. 
_mm_set_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 32bit integers in dst with the supplied values. 
_mm_set_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_set_ps1^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_set_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Copy doubleprecision (64bit) floatingpoint element 
_mm_set_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_setcsr^{⚠} 
[ Experimental ] [x86 and target feature ] sse Set the MXCSR register with the 32bit unsigned integer value. 
_mm_setr_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 8bit integers with the supplied values in reverse order. 
_mm_setr_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 16bit integers with the supplied values in reverse order. 
_mm_setr_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed 32bit integers with the supplied values in reverse order. 
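The `set` and `setr` variants differ only in the order in which their arguments are placed. The sketch below models this for four 32-bit lanes, with arrays listing element 0 (the lowest lane) first. The helper names `set_i32` and `setr_i32` are hypothetical, and the model assumes the `set` form takes its arguments from the highest element down to the lowest.

```rust
/// Scalar model of `set`-style construction (hypothetical helper):
/// the first argument ends up in the highest lane.
fn set_i32(e3: i32, e2: i32, e1: i32, e0: i32) -> [i32; 4] {
    [e0, e1, e2, e3]
}

/// Scalar model of `setr`-style construction (hypothetical helper):
/// the first argument ends up in the lowest lane.
fn setr_i32(e0: i32, e1: i32, e2: i32, e3: i32) -> [i32; 4] {
    [e0, e1, e2, e3]
}
```

Under this model, passing the same argument list to both helpers yields lane arrays that are mirror images of each other.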
_mm_setr_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Constructs a 128bit integer vector, initialized in reverse order with the specified 64bit integral values. 
_mm_setr_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Set packed doubleprecision (64bit) floatingpoint elements in the return value with the supplied values in reverse order. 
_mm_setr_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 8bit integers in dst with the supplied values in reverse order. 
_mm_setr_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 16bit integers in dst with the supplied values in reverse order. 
_mm_setr_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Set packed 32bit integers in dst with the supplied values in reverse order. 
_mm_setr_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_setzero_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Returns packed doubleprecision (64bit) floatingpoint elements with all zeros. 
_mm_setzero_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Construct a 
_mm_setzero_si64^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Constructs a 64bit integer vector initialized to zero. 
_mm_setzero_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Returns a vector with all elements set to zero. 
_mm_sfence^{⚠} 
[ Experimental ] [x86 and target feature ] sse Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. 
_mm_shuffle_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Shuffle bytes from 
_mm_shuffle_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shuffle 32bit integers in 
_mm_shuffle_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Constructs a 128bit floatingpoint vector of [2 x double] from two 128bit vector parameters of [2 x double], using the immediatevalue parameter as a specifier. 
_mm_shuffle_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Shuffle packed 8bit integers in 
_mm_shuffle_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Shuffles the 4 16bit integers from a 64bit integer vector to the destination, as specified by the immediate value operand. 
_mm_shuffle_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Shuffle packed singleprecision (32bit) floatingpoint elements in 
_mm_shufflehi_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shuffle 16bit integers in the high 64 bits of 
_mm_shufflelo_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shuffle 16bit integers in the low 64 bits of 
_mm_sign_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Negate packed 8bit integers in 
_mm_sign_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Negate packed 16bit integers in 
_mm_sign_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3 Negate packed 32bit integers in 
_mm_sign_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Negate packed 8bit integers in 
_mm_sign_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Negate packed 16bit integers in 
_mm_sign_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] ssse3,mmx Negate packed 32bit integers in 
_mm_sll_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_sll_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_sll_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 64bit integers in 
_mm_slli_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_slli_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_slli_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 64bit integers in 
_mm_slli_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift 
_mm_sllv_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32bit integers in 
_mm_sllv_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64bit integers in 
_mm_sqrt_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the square root of each of the values in 
_mm_sqrt_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the square root of packed singleprecision (32bit) floatingpoint
elements in 
_mm_sqrt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_sqrt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return the square root of the first singleprecision (32bit)
floatingpoint element in 
_mm_sra_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_sra_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_srai_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_srai_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_srav_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32bit integers in 
_mm_srl_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_srl_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_srl_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 64bit integers in 
_mm_srli_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 16bit integers in 
_mm_srli_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 32bit integers in 
_mm_srli_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift packed 64bit integers in 
_mm_srli_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Shift 
_mm_srlv_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 32bit integers in 
_mm_srlv_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] avx2 Shift packed 64bit integers in 
_mm_store1_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store the lower doubleprecision (64bit) floatingpoint element from 
_mm_store1_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store the lowest 32 bit float of 
_mm_store_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm_store_pd1^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store the lower doubleprecision (64bit) floatingpoint element from 
_mm_store_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store four 32bit floats into aligned memory. 
_mm_store_ps1^{⚠} 
[ Experimental ] [x86 and target feature ] sse Alias for 
_mm_store_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores the lower 64 bits of a 128bit vector of [2 x double] to a memory location. 
_mm_store_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store 128bits of integer data from 
_mm_store_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store the lowest 32 bit float of 
_mm_storeh_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores the upper 64 bits of a 128bit vector of [2 x double] to a memory location. 
_mm_storeh_pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store the upper half of 
_mm_storel_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store the lower 64bit integer 
_mm_storel_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores the lower 64 bits of a 128bit vector of [2 x double] to a memory location. 
_mm_storel_pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store the lower half of 
_mm_storer_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store 2 doubleprecision (64bit) floatingpoint elements from 
_mm_storer_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store four 32bit floats into aligned memory in reverse order. 
_mm_storeu_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm_storeu_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Store four 32bit floats into memory. There are no restrictions on memory
alignment. For aligned memory 
_mm_storeu_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Store 128bits of integer data from 
_mm_stream_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores a 128-bit floating-point vector of [2 x double] to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_pi^{⚠} 
[ Experimental ] [x86 and target feature ] sse,mmx Store 64 bits of integer data from a into memory using a non-temporal memory hint. 
_mm_stream_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Stores 
_mm_stream_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse4a Nontemporal store of 
_mm_stream_si32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse4a Nontemporal store of 
_mm_sub_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 8bit integers in 
_mm_sub_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 16bit integers in 
_mm_sub_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 32bit integers in 
_mm_sub_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 64bit integers in 
_mm_sub_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed doubleprecision (64bit) floatingpoint elements in 
_mm_sub_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 8bit integers in 
_mm_sub_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 16bit integers in 
_mm_sub_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 32bit integers in 
_mm_sub_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Subtracts __m128 vectors. 
_mm_sub_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return a new vector with the low element of 
_mm_sub_si64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2,mmx Subtracts signed or unsigned 64bit integer values and writes the difference to the corresponding bits in the destination. 
_mm_sub_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Subtracts the first component of 
_mm_subs_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 8bit integers in 
_mm_subs_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed 16bit integers in 
_mm_subs_epu8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed unsigned 8bit integers in 
_mm_subs_epu16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Subtract packed unsigned 16bit integers in 
_mm_subs_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 8bit integers in 
_mm_subs_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed 16bit integers in 
_mm_subs_pu8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed unsigned 8bit integers in 
_mm_subs_pu16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Subtract packed unsigned 16bit integers in 
_mm_test_all_ones^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in 
_mm_test_all_zeros^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in a 128-bit integer vector are all zeros. 
_mm_test_mix_ones_zeros^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. 
_mm_testc_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testc_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testc_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in a 128bit integer vector are all ones. 
_mm_testnzc_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testnzc_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testnzc_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in a 128bit integer vector are neither all zeros nor all ones. 
_mm_testz_pd^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm_testz_ps^{⚠} 
[ Experimental ] [x86 and target feature ] avx Compute the bitwise AND of 128 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm_testz_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse4.1 Tests whether the specified bits in a 128-bit integer vector are all zeros. 
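The three `si128` bit tests (`_mm_testz_si128`, `_mm_testc_si128`, `_mm_testnzc_si128`) can be modelled on a plain `u128`. This is a sketch of the semantics as described above, with hypothetical helper names; `mask` selects the "specified bits" and each function returns 1 or 0 like the intrinsics.

```rust
/// 1 if every selected bit of `a` is zero (hypothetical helper).
fn testz(a: u128, mask: u128) -> i32 {
    ((a & mask) == 0) as i32
}

/// 1 if every selected bit of `a` is one (hypothetical helper).
fn testc(a: u128, mask: u128) -> i32 {
    ((!a & mask) == 0) as i32
}

/// 1 if the selected bits of `a` are neither all zeros nor all ones
/// (hypothetical helper).
fn testnzc(a: u128, mask: u128) -> i32 {
    ((a & mask) != 0 && (!a & mask) != 0) as i32
}
```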
_mm_tzcnt_32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Counts the number of trailing least significant zero bits. 
_mm_ucomieq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomieq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_ucomige_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomige_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_ucomigt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomigt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_ucomile_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomile_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_ucomilt_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomilt_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_ucomineq_sd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compare the lower element of 
_mm_ucomineq_ss^{⚠} 
[ Experimental ] [x86 and target feature ] sse Compare two 32bit floats from the loworder bits of 
_mm_undefined_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return vector of type __m128d with undefined elements. 
_mm_undefined_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Return vector of type __m128 with undefined elements. 
_mm_undefined_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Return vector of type __m128i with undefined elements. 
_mm_unpackhi_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 8bit integers from the high half of 
_mm_unpackhi_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 16bit integers from the high half of 
_mm_unpackhi_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 32bit integers from the high half of 
_mm_unpackhi_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 64bit integers from the high half of 
_mm_unpackhi_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 The resulting 
_mm_unpackhi_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the upper four elements from two 
_mm_unpackhi_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the upper two elements from two 
_mm_unpackhi_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the upper element from two 
_mm_unpackhi_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Unpack and interleave singleprecision (32bit) floatingpoint elements
from the higher half of 
_mm_unpacklo_epi8^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 8bit integers from the low half of 
_mm_unpacklo_epi16^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 16bit integers from the low half of 
_mm_unpacklo_epi32^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 32bit integers from the low half of 
_mm_unpacklo_epi64^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Unpack and interleave 64bit integers from the low half of 
_mm_unpacklo_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 The resulting 
_mm_unpacklo_pi8^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the lower four elements from two 
_mm_unpacklo_pi16^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the lower two elements from two 
_mm_unpacklo_pi32^{⚠} 
[ Experimental ] [x86 and target feature ] mmx Unpacks the lower element from two 
_mm_unpacklo_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Unpack and interleave single-precision (32-bit) floating-point elements
from the lower half of 
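The lane order produced by the unpack intrinsics is easy to get wrong, so a small sketch may help. It assumes a toolchain where these intrinsics are reachable through `std::arch`, and it uses the 32-bit integer variants only. The names `interleave` and `interleave_sse2` are hypothetical helpers introduced for illustration, and the scalar fallback is an assumption added so the sketch also builds on non-x86 targets.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// SSE2 path (hypothetical helper): interleaves the 32-bit lanes of `a` and `b`.
/// Returns (interleaved low halves, interleaved high halves).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn interleave_sse2(a: [i32; 4], b: [i32; 4]) -> ([i32; 4], [i32; 4]) {
    unsafe {
        // Unaligned loads: array element 0 becomes lane 0 of the vector.
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let lo = _mm_unpacklo_epi32(va, vb); // [a0, b0, a1, b1]
        let hi = _mm_unpackhi_epi32(va, vb); // [a2, b2, a3, b3]
        let mut out_lo = [0i32; 4];
        let mut out_hi = [0i32; 4];
        _mm_storeu_si128(out_lo.as_mut_ptr() as *mut __m128i, lo);
        _mm_storeu_si128(out_hi.as_mut_ptr() as *mut __m128i, hi);
        (out_lo, out_hi)
    }
}

/// Safe wrapper (hypothetical): uses SSE2 when it is detected at run time,
/// otherwise falls back to an equivalent scalar interleave.
pub fn interleave(a: [i32; 4], b: [i32; 4]) -> ([i32; 4], [i32; 4]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: the sse2 feature was just detected on this CPU.
            return unsafe { interleave_sse2(a, b) };
        }
    }
    ([a[0], b[0], a[1], b[1]], [a[2], b[2], a[3], b[3]])
}
```

Calling `interleave([0, 1, 2, 3], [10, 11, 12, 13])` pairs up lanes from the two inputs: the first array of the result comes from the low halves and the second from the high halves.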
_mm_xor_pd^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise XOR of 
_mm_xor_ps^{⚠} 
[ Experimental ] [x86 and target feature ] sse Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements. 
_mm_xor_si128^{⚠} 
[ Experimental ] [x86 and target feature ] sse2 Compute the bitwise XOR of 128 bits (representing integer data) in 
_mulx_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi2 Unsigned multiply without affecting flags. 
_pdep_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi2 Scatter contiguous low-order bits of 
_pext_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi2 Gathers the bits of 
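Bit deposit and bit extract are inverse-like operations driven by a mask, which a short sketch can make concrete. It assumes the intrinsics are reachable through `std::arch`. The names `pdep`, `pext`, `pdep_soft`, and `pext_soft` are hypothetical helpers, and the software loops are an illustrative fallback for CPUs without BMI2, not the library's implementation.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::{_pdep_u32, _pext_u32};
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_pdep_u32, _pext_u32};

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "bmi2")]
unsafe fn pdep_bmi2(value: u32, mask: u32) -> u32 {
    unsafe { _pdep_u32(value, mask) }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "bmi2")]
unsafe fn pext_bmi2(value: u32, mask: u32) -> u32 {
    unsafe { _pext_u32(value, mask) }
}

/// Software deposit: bit k of `value` goes to the k-th set bit of `mask`.
pub fn pdep_soft(value: u32, mut mask: u32) -> u32 {
    let mut result = 0u32;
    let mut k = 0u32;
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate lowest set mask bit
        if (value >> k) & 1 == 1 {
            result |= lowest;
        }
        mask &= mask - 1; // clear that mask bit
        k += 1;
    }
    result
}

/// Software extract: the bit of `value` at the k-th set bit of `mask` goes to bit k.
pub fn pext_soft(value: u32, mut mask: u32) -> u32 {
    let mut result = 0u32;
    let mut k = 0u32;
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg();
        if value & lowest != 0 {
            result |= 1 << k;
        }
        mask &= mask - 1;
        k += 1;
    }
    result
}

/// Safe wrappers (hypothetical): prefer BMI2 when detected at run time.
pub fn pdep(value: u32, mask: u32) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("bmi2") {
            // SAFETY: the bmi2 feature was just detected on this CPU.
            return unsafe { pdep_bmi2(value, mask) };
        }
    }
    pdep_soft(value, mask)
}

pub fn pext(value: u32, mask: u32) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("bmi2") {
            // SAFETY: the bmi2 feature was just detected on this CPU.
            return unsafe { pext_bmi2(value, mask) };
        }
    }
    pext_soft(value, mask)
}
```

With the mask `0b11010`, depositing `0b101` yields `0b10010`, and extracting from `0b10010` with the same mask recovers `0b101`.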
_popcnt32^{⚠} 
[ Experimental ] [x86 and target feature ] popcnt Counts the bits that are set. 
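Because the intrinsic is gated on the `popcnt` target feature, callers typically check for the feature before using it. The following is a minimal sketch of that pattern, assuming access through `std::arch`. The names `popcount` and `popcount_hw` are hypothetical, and the `count_ones` fallback is an assumption added for CPUs or targets without the feature.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::_popcnt32;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::_popcnt32;

/// Hardware path (hypothetical helper); only valid when `popcnt` is available.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "popcnt")]
unsafe fn popcount_hw(x: u32) -> u32 {
    // The intrinsic takes and returns i32; the casts only reinterpret bits.
    unsafe { _popcnt32(x as i32) as u32 }
}

/// Safe wrapper (hypothetical): uses the intrinsic when the CPU supports it.
pub fn popcount(x: u32) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("popcnt") {
            // SAFETY: the popcnt feature was just detected on this CPU.
            return unsafe { popcount_hw(x) };
        }
    }
    x.count_ones()
}
```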
_rdrand16_step^{⚠} 
[ Experimental ] [x86 and target feature ] rdrand Read a hardware-generated 16-bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdrand32_step^{⚠} 
[ Experimental ] [x86 and target feature ] rdrand Read a hardware-generated 32-bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdseed16_step^{⚠} 
[ Experimental ] [x86 and target feature ] rdseed Read a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. 
_rdseed32_step^{⚠} 
[ Experimental ] [x86 and target feature ] rdseed Read a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. 
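Since the step functions report success through their return value, a caller normally retries a bounded number of times. The sketch below shows one way to do that with `_rdrand32_step`, assuming access through `std::arch`. The names `hardware_random_u32` and `rdrand32_retry` and the retry-count parameter are hypothetical, and returning `None` when the feature is missing is a design choice made for this example.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::_rdrand32_step;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::_rdrand32_step;

/// Hardware path (hypothetical helper): retries until the step function
/// returns 1 or the attempt budget is used up.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "rdrand")]
unsafe fn rdrand32_retry(max_attempts: u32) -> Option<u32> {
    let mut val = 0u32;
    for _ in 0..max_attempts {
        // A return value of 1 means `val` now holds a random value.
        if unsafe { _rdrand32_step(&mut val) } == 1 {
            return Some(val);
        }
    }
    None
}

/// Safe wrapper (hypothetical): `None` means the feature is unavailable
/// or no value was produced within `max_attempts` tries.
pub fn hardware_random_u32(max_attempts: u32) -> Option<u32> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("rdrand") {
            // SAFETY: the rdrand feature was just detected on this CPU.
            return unsafe { rdrand32_retry(max_attempts) };
        }
    }
    let _ = max_attempts; // unused on targets without the intrinsic
    None
}
```

A caller might write `hardware_random_u32(10)` and fall back to another entropy source when the result is `None`.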
_rdtsc^{⚠} 
[ Experimental ] [x86 ] Reads the current value of the processor’s timestamp counter. 
_t1mskc_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears all bits below the least significant zero of 
_t1mskc_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Clears all bits below the least significant zero of 
_tzcnt_u32^{⚠} 
[ Experimental ] [x86 and target feature ] bmi1 Counts the number of trailing least significant zero bits. 
_tzmsk_u32^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits below the least significant one of 
_tzmsk_u64^{⚠} 
[ Experimental ] [x86 and target feature ] tbm Sets all bits below the least significant one of 
_xgetbv^{⚠} 
[ Experimental ] [x86 and target feature ] xsave Reads the contents of the extended control register 
_xrstor^{⚠} 
[ Experimental ] [x86 and target feature ] xsave Perform a full or partial restore of the enabled processor states using
the state information stored in memory at 
_xrstors^{⚠} 
[ Experimental ] [x86 and target feature ] xsave,xsaves Perform a full or partial restore of the enabled processor states using the
state information stored in memory at 
_xsave^{⚠} 
[ Experimental ] [x86 and target feature ] xsave Perform a full or partial save of the enabled processor states to memory at

_xsavec^{⚠} 
[ Experimental ] [x86 and target feature ] xsave,xsavec Perform a full or partial save of the enabled processor states to memory
at 
_xsaveopt^{⚠} 
[ Experimental ] [x86 and target feature ] xsave,xsaveopt Perform a full or partial save of the enabled processor states to memory at

_xsaves^{⚠} 
[ Experimental ] [x86 and target feature ] xsave,xsaves Perform a full or partial save of the enabled processor states to memory at

_xsetbv^{⚠} 
[ Experimental ] [x86 and target feature ] xsave Copy 64 bits from 
has_cpuid 
[ Experimental ] [x86 ] Does the host support the 