Thursday, April 9, 2015

Precision qualifiers in SPIR-V

SPIR-V is a bit inconsistent in how it handles types for graphical shaders and compute kernels. Kernels use sized types, with explicit conversions between sizes. Shaders use 32-bit types for everything, but precision decorations indicate which size is really used, and conversions between sizes are done implicitly. I guess much of this is due to historical reasons in how ESSL defines its types, but I think it would be good to be more consistent in the IR.

ESSL 1 played fast and loose with types. For example, it has an integer type int, but the platform is allowed to implement it as floating point, so it is not necessarily true that "a+1 != a" for a sufficiently large a. ESSL 3 strengthened the type system, so for example high precision integers are now represented as 32-bit values in two's complement form. The rest of this post will use the ESSL 3 semantics.
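
The float-backed int hazard is easy to reproduce outside a shader. The sketch below is Python rather than ESSL, and it assumes a hypothetical platform that stores int as an IEEE 754 binary32 value; the helper names to_float32 and float_backed_add are made up for this illustration.

```python
import struct

def to_float32(value):
    """Round a Python number to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def float_backed_add(a, b):
    """Add two 'ints' on a hypothetical platform that stores them as float32."""
    return to_float32(to_float32(a) + to_float32(b))

# 2**24 + 1 is the smallest positive integer that float32 cannot represent,
# so adding 1 to 2**24 rounds back to the value we started from.
a = 2 ** 24
print(float_backed_add(a, 1) == to_float32(a))
```

Smaller values behave like ordinary integers in this model; the surprise only shows up once a needs more than 24 bits of mantissa.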

ESSL does not care much about the size of variables; it has only one integer type "int" and one floating point type "float". But you need to specify which precision to use in calculations by adding precision qualifiers when you declare your variables, such as

highp float x;

Using highp means that the calculations must be done in 32-bit precision, mediump means at least 16-bit precision, and lowp means at least 9 bits (yes, "nine"; you cannot fit a lowp value in a byte). The compiler may use any size for the variables, as long as the precision is preserved.
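
To make the "nine bits" remark concrete, here is a small Python sketch that maps each qualifier to the narrowest storage width that still preserves its precision. The minimum bit counts come from the paragraph above; the list of storage widths and the function name smallest_storage_bits are assumptions for the illustration, not anything ESSL defines.

```python
# Minimum precision in bits for each qualifier, as stated in the text above.
MIN_BITS = {"lowp": 9, "mediump": 16, "highp": 32}

# Common machine storage widths; this list is an assumption for the illustration.
STORAGE_WIDTHS = (8, 16, 32)

def smallest_storage_bits(qualifier):
    """Return the narrowest storage width that preserves the qualifier's precision."""
    needed = MIN_BITS[qualifier]
    return min(width for width in STORAGE_WIDTHS if width >= needed)

for qualifier in MIN_BITS:
    print(qualifier, smallest_storage_bits(qualifier))
```

With these widths, lowp lands on 16 bits, the same as mediump, because 9 bits do not fit in a byte.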

So "mediump int" is similar to the int_least16_t type in C, but ESSL permits the compiler to use a different precision for different instructions. It can for example use 16-bit precision for one mediump addition and 32-bit for another, so it is not necessarily true that "a+b == a+b" for mediump integers a and b if the addition overflows 16 bits. The reason for these semantics is to be able to use the hardware efficiently. Consider for example a processor with two parallel arithmetic units, one 16-bit and one 32-bit. If all instructions in a shader are mediump and every one of them had to execute as 16-bit, we could only reach 50% utilization. Because the backend is allowed to promote half of them to 32-bit, it can keep both arithmetic units busy and double the performance.
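
The "a+b == a+b" oddity can be sketched in Python by evaluating the same addition at two widths. This assumes that an overflowing addition wraps in two's complement at whichever width the backend picked; wrap_signed and mediump_add are hypothetical helpers, not part of ESSL or SPIR-V.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # Reinterpret the top bit as the sign bit.
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mediump_add(a, b, bits):
    """Add at whichever width the backend picked for this instruction (assumed to wrap)."""
    return wrap_signed(a + b, bits)

a, b = 30000, 10000  # both fit in 16 bits, but their sum does not
narrow = mediump_add(a, b, 16)
wide = mediump_add(a, b, 32)
print(narrow, wide, narrow == wide)
```

As long as the sum stays within 16 bits the two widths agree, which is why the difference only matters on overflow.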

SPIR-V represents this by always using a 32-bit type and decorating the variables and instructions with PrecisionLow, PrecisionMedium, or PrecisionHigh. The IR does not have any type conversions for the precision, as the actual type is the same and it is only the precision of the instruction that differs. But ESSL has requirements on conversions when changing precision in operations that are similar to how size changes are handled in other languages:

When converting from a higher precision to a lower precision, if the value is representable by the implementation of the target precision, the conversion must also be exact. If the value is not representable, the behavior is dependent on the type:
  • For signed and unsigned integers, the value is truncated; bits in positions not present in the target precision are set to zero. (Positions start at zero and the least significant bit is considered to be position zero for this purpose.)
  • For floating point values, the value should either clamp to +INF or -INF, or to the maximum or minimum value that the implementation supports. While this behavior is implementation dependent, it should be consistent for a given implementation
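
The two out-of-range rules quoted above can be sketched as follows. This is a rough Python model under stated assumptions, not a definitive reading of the specification: integers are treated as raw bit patterns, the target maximum is passed in by the caller (65504.0, the largest finite IEEE 754 binary16 value, is used only as an example), and rounding of in-range floats is not modelled. The names truncate_int and clamp_float are hypothetical.

```python
import math

def truncate_int(value, target_bits):
    """Keep the low target_bits bits; higher bit positions become zero."""
    return value & ((1 << target_bits) - 1)

def clamp_float(value, target_max, use_infinity):
    """Clamp an out-of-range float to +/-INF or to the target's extreme value."""
    if math.isnan(value) or abs(value) <= target_max:
        return value  # in range: returned unchanged, rounding is not modelled here
    limit = math.inf if use_infinity else target_max
    return math.copysign(limit, value)

FP16_MAX = 65504.0  # example target maximum (assumption: target behaves like binary16)

print(hex(truncate_int(0x12345, 16)))
print(clamp_float(1e6, FP16_MAX, use_infinity=False))
print(clamp_float(-1e6, FP16_MAX, use_infinity=True))
```

The use_infinity flag stands in for the implementation-dependent choice; the quoted text only asks that a given implementation makes that choice consistently.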
It is of course fine to have the conversions implicit in the IR, but the similar fp32 to fp16 conversion is explicit in kernels, so the two are inconsistent. I would in general want the shader and kernel IR to be as similar as possible in order to avoid confusion when writing SPIR-V tools that work on both types of IR, and I think it is possible to improve this with minor changes:
  • The highp precision qualifier means that the compiler must use 32-bit precision, i.e. a highp-qualified type is the same as the normal non-qualified 32-bit type. So PrecisionHigh does not tell the compiler anything; it just adds noise to the IR, and can be removed from SPIR-V.
  • Are GPUs really taking advantage of lowp for calculations? I can understand how lowp may be helpful for e.g. saving power in varying interpolation, and those cases are handled by having the PrecisionLow decoration on variables. But it seems unlikely to me that any GPU has added the extra hardware to do arithmetic in lowp precision, and I would assume all GPUs use 16-bit or higher for lowp arithmetic. If so, then PrecisionLow should not be a valid decoration for instructions.
  • The precision decorations are placed on instructions, but it seems better to me to have them on the type instead. If PrecisionLow and PrecisionHigh are removed, then PrecisionMedium is the only decoration left. But this can be treated as a normal 16-bit type from the optimizer's point of view, so we could instead permit both 32- and 16-bit types for graphical shaders, and specify in the execution model that it is allowed to promote 16-bit to 32-bit. Optimizations and type conversions can then be done in exactly the same way as for kernels, and the backend can promote the types as appropriate for the hardware.
