System Trap / Function - Send and/or receive a message from the target port.
mach_msg_return_t mach_msg (mach_msg_header_t *msg, mach_msg_option_t option, mach_msg_size_t send_size, mach_msg_size_t receive_limit, mach_port_t receive_name, mach_msg_timeout_t timeout, mach_port_t notify);
mach_msg_return_t mach_msg_overwrite (mach_msg_header_t *send_msg, mach_msg_option_t option, mach_msg_size_t send_size, mach_msg_size_t receive_limit, mach_port_t receive_name, mach_msg_timeout_t timeout, mach_port_t notify, mach_msg_header_t *receive_msg, mach_msg_size_t receive_msg_size);
The mach_msg system call sends and receives Mach messages. Mach messages contain data, which can include port rights and addresses of large regions of memory. mach_msg uses the same buffer for sending and receiving a message; mach_msg_overwrite permits separate send and receive buffers (although they may be specified to be the same). If the option argument contains MACH_SEND_MSG, the call sends a message. The send_size argument specifies the size of the message buffer (header and body) to send. The msgh_remote_port field of the message header specifies the destination of the message. If the option argument contains MACH_RCV_MSG, the call receives a message. The receive_limit argument specifies the size of a buffer that will receive the message; messages that are larger are not received. The receive_name argument specifies the port or port set from which to receive.
If the option argument contains both MACH_SEND_MSG and MACH_RCV_MSG, then mach_msg does both send and receive operations (in that order). If the send operation encounters an error (any return code other than MACH_MSG_SUCCESS), the call returns immediately without attempting the receive operation. Semantically the combined call is equivalent to separate send and receive calls, but it saves a system call and enables other internal optimizations. If the option argument specifies neither MACH_SEND_MSG nor MACH_RCV_MSG, mach_msg does nothing. Some options, like MACH_SEND_TIMEOUT and MACH_RCV_TIMEOUT, share a supporting argument. If these options are used together, they make independent use of the supporting argument's value.
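The ordering rule for a combined call can be modeled in a few lines of portable C. This is only a sketch of the semantics described above, not kernel code: OPT_SEND, OPT_RCV, struct phases and model_call are hypothetical names, and the send and receive results are passed in as simulated values (0 standing for MACH_MSG_SUCCESS).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MACH_SEND_MSG and MACH_RCV_MSG; real code
 * should use the definitions from <mach/message.h>. */
enum { OPT_SEND = 0x1, OPT_RCV = 0x2 };

/* Which phases of one simulated call were attempted, and its result. */
struct phases {
    bool send_attempted;
    bool rcv_attempted;
    int  ret;               /* 0 models MACH_MSG_SUCCESS */
};

/* Model of the rule: send first, then receive, and skip the receive
 * entirely when the send reports any error. */
struct phases model_call(int option, int send_ret, int rcv_ret)
{
    struct phases p = { false, false, 0 };

    if (option & OPT_SEND) {
        p.send_attempted = true;
        if (send_ret != 0) {
            p.ret = send_ret;   /* return at once, no receive */
            return p;
        }
    }
    if (option & OPT_RCV) {
        p.rcv_attempted = true;
        p.ret = rcv_ret;
    }
    return p;                   /* neither option: nothing happens */
}
```

With both options set and a failing send, model_call reports that the receive was never attempted; with neither option set it reports success without doing anything.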
The Mach kernel provides message-oriented, capability-based inter-process communication. The inter-process communication (IPC) primitives efficiently support many different styles of interaction, including remote procedure calls, object-oriented distributed programming, streaming of data, and sending very large amounts of data.
The IPC primitives operate on three abstractions: messages, ports, and port sets. User tasks access all other kernel services and abstractions via the IPC primitives.
The message primitives let tasks send and receive messages. Tasks send messages to ports. Messages sent to a port are delivered reliably (messages may not be lost) and are received in the order in which they were sent via send rights by a given sending task (or a given kernel). (Messages sent to send-once rights are unordered.)
Messages contain a fixed-size header and a variable-sized message body containing kernel and user data, and a variable-size trailer of kernel appended message attributes. The header describes the destination and the size of the message (header plus body). The message body contains descriptions of additional port rights to be transmitted, descriptions of "out-of-line" memory regions to be sent and a variable amount of user data, which typically includes type conversion information. The out-of-line memory regions (including out-of-line port arrays) are (typically) disjoint from the message body. The IPC implementation makes use of the VM system to efficiently transfer large amounts of data. The message can contain the addresses of regions of the sender's address space which should be transferred as part of the message.
When a task receives a message containing such out-of-line regions of data, the data can appear in unused portions or overwrite an existing portion of the receiver's address space (depending on the requested receive options). Under favorable circumstances, the transmission of out-of-line data is optimized so that sender and receiver share the physical pages of data copy-on-write, and no actual data copy occurs unless the pages are written. Regions of memory up to 4 gigabytes may be sent in this manner.
Ports hold a queue of messages. Tasks operate on a port to send and receive messages by exercising capabilities (rights) for the port. Multiple tasks can hold send rights for a port. Tasks can also hold send-once rights, which grant the ability to send a single message. Only one task can hold the receive capability (receive right) for a port.
Port rights can be transferred between tasks via messages. The sender of a message can specify in the message that the message contains a port right. If a message contains a receive right for a port, the receive right is removed from the sender of the message and transferred to the receiver of the message. While the receive right is in transit, tasks holding send rights can still send messages to the port, and they are queued until a task acquires the receive right and uses it to receive the messages.
Tasks can receive messages from ports and port sets. The port set abstraction allows a single thread to wait for a message from any of several ports. Tasks manipulate port sets with a port set name, which is taken from the same name space as are the port rights. The port-set name may not be transferred in a message. A port set holds receive rights, and a receive operation on a port set blocks waiting for a message sent to any of the constituent ports. A port may not belong to more than one port set, and if a port is a member of a port set, the holder of the receive right can't receive directly from the port.
Port rights are a secure, location-independent way of naming ports. The port queue is a protected data structure, only accessible via the kernel's exported message primitives. Rights are also protected by the kernel; there is no way for a malicious user task to guess a port's internal name and send a message to a port to which it shouldn't have access. Port rights do not carry any location information. When a receive right for a port moves from task to task, and even between tasks on different machines, the send rights for the port remain unchanged and continue to function.
Each task has its own space of port rights. Port rights are named with positive (unsigned) integers. For all architectures, sizeof (mach_port_t) = sizeof (mach_port_name_t) = sizeof (void*) and so user space addresses may be used as port names, except for the reserved values MACH_PORT_NULL (0) and MACH_PORT_DEAD (all 1 bits). When the kernel chooses a name for a new right, however, it is free to pick any unused name (one which denotes no right) in the space.
There are three basic kinds of rights: receive rights, send rights and send-once rights. A port name can name any of these types of rights, or name a port-set, be a dead name, or name nothing. Dead names are not capabilities. They act as place-holders to prevent a name from being otherwise used.
A port is destroyed, or dies, when its receive right is de-allocated. When a port dies, send and send-once rights for the port turn into dead names. Any messages queued at the port are destroyed, which de-allocates the port rights and out-of-line memory in the messages.
Each send-once right held by a task has a different name. In contrast, when a task holds send rights or a receive right for a port, the rights share a single name.
Tasks may hold multiple user-references for send rights. When a task receives a send right which it already holds, the kernel increments the right's user-reference count. When a task de-allocates a send right, the kernel decrements its user-reference count, and the task only loses the send right when the count goes to zero.
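The user-reference rule for send rights can be illustrated with a small model. The structure and function names below are hypothetical; the real bookkeeping lives inside the kernel and is reached through calls such as mach_port_deallocate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-name entry in a task's port name space. */
struct send_right_entry {
    uint32_t name;    /* task-local name of the right */
    uint32_t urefs;   /* user-reference count; 0 means "not held" */
};

/* Receiving a send right the task already holds (or acquiring it for
 * the first time) adds one user reference under the same name. */
void receive_send_right(struct send_right_entry *e)
{
    e->urefs++;
}

/* De-allocating drops one user reference. Returns true while the task
 * still holds the right, false once the count has reached zero. */
bool deallocate_send_right(struct send_right_entry *e)
{
    if (e->urefs > 0)
        e->urefs--;
    return e->urefs > 0;
}
```

A task that receives the same send right twice must therefore de-allocate it twice before the name stops denoting the right.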
Send-once rights always have a user reference count of one. Tasks may hold multiple user references for dead names. Each send-once right generated guarantees the receipt of a single message, either a message sent to that send-once right or, if the send-once right is in any way destroyed, a send-once notification.
A message can carry port rights; the msgh_remote_port or msgh_local_port fields in the message header or the disposition field in a message body descriptor specify the type of port right and how the port right is to be extracted from the caller. The values MACH_PORT_NULL and MACH_PORT_DEAD are valid in place of a port right in a message body.
In a sent message, the following mach_msg_type_name_t values denote port rights:
The following mach_msg_type_name_t values in a received message indicate that it carries port rights:
It is also possible to send a (nearly unbounded) array of port rights "out-of-line". All of the rights named by the array must be of the same type. The array is physically copied with the message body proper. The array of port right (names) can be received by the receiver using the same options available for out-of-line data reception described below.
A message can contain one or more regions of the sender's address space which are to be transferred as part of the message. The message carries a logical copy of the memory. For this "out-of-line" memory, the kernel can copy the data or use virtual memory techniques to defer any actual page copies. Unless the sender or the receiver modifies the data, the physical pages remain shared.
The sender of the message must explicitly request an out-of-line transfer. Such a region is described as an arbitrary region of the sender's address space. The sender always sees this memory as being copied to the receiver.
For each region, the sender has a de-allocate option. If the option is set and the out-of-line memory region is not null, then the region is implicitly de-allocated from the sender, as if by vm_deallocate. In particular, the start address is truncated down and the end address rounded up so that every page overlapped by the memory region is de-allocated (thereby possibly de-allocating more memory than is effectively transmitted). The use of this option effectively changes the memory copy to a memory movement. Aside from possibly optimizing the sender's use of memory, the de-allocation option allows the kernel to more efficiently handle the transfer of memory.
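The page rounding applied by the de-allocate option can be worked through numerically. The sketch below assumes a 4096-byte page and a non-null region (size greater than zero); MODEL_PAGE_SIZE, struct range and dealloc_range are illustrative names, and real code should obtain the page size from the system rather than hard-code it.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption; not the page size of every system */

struct range {
    uintptr_t start;   /* first byte de-allocated */
    uintptr_t end;     /* one past the last byte de-allocated */
};

/* Compute the page-aligned range that is implicitly de-allocated for an
 * out-of-line region [addr, addr + size): the start is truncated down
 * and the end is rounded up to a page boundary. */
struct range dealloc_range(uintptr_t addr, uintptr_t size)
{
    const uintptr_t mask = (uintptr_t)MODEL_PAGE_SIZE - 1;
    struct range r;

    r.start = addr & ~mask;
    r.end   = (addr + size + mask) & ~mask;
    return r;
}
```

For a region of 0x1000 bytes starting at 0x1800, the range is 0x1000 to 0x3000: two full pages are de-allocated although only one page's worth of data is transmitted.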
For each region, the sender has the choice of permitting the kernel to choose a transmission strategy or the choice of requiring physical copy:
In a received message, the virtual-copy flag (MACH_MSG_VIRTUAL_COPY) indicates that the kernel transmitted a virtual copy. Access to the received memory may involve interactions with the memory manager managing the sender's original data. Integrity-conscious receivers should exercise caution when dealing with out-of-line memory from un-trustworthy sources. Receivers concerned about deterministic access time should also exercise caution. The dynamic allocation option guarantees that the virtual copy will not be directly referenced during the act of receiving the message.
In a received message, the physical-copy flag (MACH_MSG_PHYSICAL_COPY) indicates that the kernel did transmit a physical copy.
The receiver has two options for the reception of out-of-line memory (or "out-of-line" port arrays): allocation and overwrite. In the absence of the MACH_RCV_OVERWRITE option, all out-of-line regions are dynamically allocated. Allocated out-of-line memory arrives somewhere in the receiver's address space as new memory. It has the same inheritance and protection attributes as newly vm_allocate'ed memory. The receiver has the responsibility of de-allocating (with vm_deallocate) the memory when it is no longer needed. If the message contains more than one region, each will be allocated its own region, not necessarily contiguously. If the sender's data was transmitted as a virtual copy, the allocated region will have the same data alignment within the page; otherwise, the received data will appear starting at the beginning of a page.
If the MACH_RCV_OVERWRITE option is set, the receiver can specify how each received region is to be processed (dynamically allocated as described above, or written over existing memory). With this option, the contents of the receive buffer (receive_msg) are examined by the kernel. The kernel scans the descriptors in the receive buffer "message" to determine how to handle each out-of-line region. (Note: whereas receive_limit is the maximum size of the receive buffer, receive_msg_size is the amount filled in with this "message".) The kernel uses each out-of-line data descriptor (in order) to specify the processing for each received data region in turn; each out-of-line port array descriptor is used correspondingly. (Intermingled port descriptors are ignored when matching descriptors between the incoming message and the receive buffer list.)
The copy option in the matching descriptor specifies the processing:
If not enough descriptors appear in the receive buffer to describe all received regions, additional regions are dynamically allocated. If the receiver specifies more descriptors than there are regions in the received message, the additional descriptors are ignored (and do not appear in the final received message).
Note that the receive buffer descriptors will be overwritten. The size fields in descriptors will be updated: when scanned, they specified the maximum sizes of regions; when received, they specify the actual sizes of received regions. The copy fields in descriptors will be updated: when scanned, they specified allocate versus overwrite; when received, they indicate whether the region was physically or virtually copied. The descriptors may appear in different positions (given intermingled port descriptors). Descriptors that were not used (because there were not that many received regions) will be discarded.
Null out-of-line memory is legal. If the out-of-line region size is zero, then the region's specified address is ignored. A receive allocated null out-of-line memory region always has a zero address. Unaligned addresses and region sizes that are not page multiples are legal. A received message can also contain regions with unaligned addresses and sizes which are not multiples of the page size.
The send operation queues a message to a port. The message carries a copy of the caller's data. After the send, the caller can freely modify the message buffer or the out-of-line memory regions and the message contents will remain unchanged.
The message carries with it the security ID of the sender, which the receiver can request in the message trailer.
Message delivery is reliable and sequenced. Reception of a message guarantees that all messages previously sent to the port by a single task (or a single kernel) via send rights have been received and that they are received in the order in which they were sent. Messages sent to send-once rights are unordered.
If the destination port's queue is full, several things can happen. If the message is sent to a send-once right (msgh_remote_port carries a send-once right), then the kernel ignores the queue limit and delivers the message. Otherwise the caller blocks until there is room in the queue, unless the MACH_SEND_TIMEOUT option is used. If a port has several blocked senders, then any of them may queue the next message when space in the queue becomes available, with the proviso that a blocked sender will not be indefinitely starved. These options modify MACH_SEND_MSG. If MACH_SEND_MSG is not also specified, they are ignored.
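The full-queue rules can be condensed into a small decision function. This is a model of the behavior described above, with hypothetical names; the boolean arguments stand for the state of the destination queue, the type of right named by msgh_remote_port, and the presence of the MACH_SEND_TIMEOUT option.

```c
#include <stdbool.h>

/* What a send does when it reaches the destination port (model only). */
enum send_action {
    SEND_ENQUEUE,      /* message is queued immediately */
    SEND_BLOCK,        /* caller waits until the queue has room */
    SEND_TIMED_WAIT    /* caller waits at most for the given timeout */
};

enum send_action full_queue_action(bool queue_full,
                                   bool dest_is_send_once,
                                   bool send_timeout_option)
{
    /* Send-once rights ignore the queue limit altogether. */
    if (!queue_full || dest_is_send_once)
        return SEND_ENQUEUE;

    return send_timeout_option ? SEND_TIMED_WAIT : SEND_BLOCK;
}
```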
The queueing of a message carrying receive rights may create a circular loop of receive rights and messages, which can never be received. For example, a message carrying a receive right can be sent to that receive right. This situation is not an error, but the kernel will garbage-collect such loops, destroying the messages. Some return codes, like MACH_SEND_TIMED_OUT, imply that the message was almost sent, but could not be queued. In these situations, the kernel tries to return the message contents to the caller with a pseudo-receive operation. This prevents the loss of port rights or memory which only exist in the message, for example, a receive right which was moved into the message, or out-of-line memory sent with the de-allocate option.
The intent of the pseudo-receive operation is to restore, as best as possible, the state prior to attempting the send. This involves restoring the port rights and out-of-line memory regions contained in the message. The port right names and out-of-line addresses in the message send buffer are updated to reflect the new values resulting from their effective reception. The pseudo-receive handles the destination and reply rights as any other rights; they are not reversed as they would be in a normally received message. Also, no trailer is appended to the message. After the pseudo-receive, the message is ready to be resent. If the message is not resent, note that out-of-line memory regions may have moved and some port rights may have changed names.
Although unlikely, the pseudo-receive operation may encounter resource shortages. This is similar to a MACH_RCV_BODY_ERROR return code from a receive operation. When this happens, the normal send return codes are augmented with the MACH_MSG_IPC_SPACE, MACH_MSG_VM_SPACE, MACH_MSG_IPC_KERNEL and MACH_MSG_VM_KERNEL bits to indicate the nature of the resource shortage.
The receive operation de-queues a message from a port. The receiving task acquires the port rights and out-of-line memory regions carried in the message. The receive_name argument specifies a port or port set from which to receive. If a port is specified, the caller must possess the receive right for the port and the port must not be a member of a port set. If no message is present, the call blocks, subject to the MACH_RCV_TIMEOUT option.
If a port set is specified, the call will receive a message sent to any of the member ports. It is permissible for the port set to have no member ports, and ports may be added and removed while a receive from the port set is in progress. The received message can come from any of the member ports which have messages, with the proviso that a member port with messages will not be indefinitely starved. The msgh_local_port field in the received message header specifies from which port in the port set the message came.
The receive_limit argument specifies the size of the caller's message buffer (which must be big enough for the message header, body and trailer); the msgh_size field of the received message indicates the actual size of the received message header and body. The mach_msg call will not receive a message larger than receive_limit. Messages that are too large are destroyed, unless the MACH_RCV_LARGE option is used. Following the received data, at the next natural boundary, is a message trailer. The msgh_size field of the received message does not include the length of this trailer; the trailer's length is given by the msgh_trailer_size field within the trailer. The receiver of a message is given a choice as to what trailer format is desired, and, within that format, which of the leading trailer attributes are desired (that is, to get trailer element three, the receiver must also accept elements one and two). For any given trailer format (of which there is currently only one), the trailer is compatibly extended by adding additional elements to the end.
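Locating the trailer is a matter of rounding msgh_size up to the next natural boundary. The sketch below assumes that boundary is 4 bytes; MODEL_NATURAL_SIZE, trailer_offset and min_receive_limit are illustrative names rather than part of the interface.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NATURAL_SIZE 4u   /* assumed natural boundary, in bytes */

/* Byte offset of the trailer from the start of the receive buffer:
 * msgh_size (header plus body) rounded up to the natural boundary. */
size_t trailer_offset(uint32_t msgh_size)
{
    const size_t mask = (size_t)MODEL_NATURAL_SIZE - 1;
    return ((size_t)msgh_size + mask) & ~mask;
}

/* Smallest receive_limit that holds the header, the body and a trailer
 * of trailer_size bytes. */
size_t min_receive_limit(uint32_t msgh_size, uint32_t trailer_size)
{
    return trailer_offset(msgh_size) + trailer_size;
}
```

For a 30-byte message the trailer starts at offset 32, so a receiver expecting an 8-byte trailer needs a buffer of at least 40 bytes even though msgh_size reports 30.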
Received messages are stamped (in the trailer) with a sequence number, taken from the port from which the message was received. (Messages received from a port set are stamped with a sequence number from the appropriate member port.) Newly created ports start with a zero sequence number, and the sequence number is reset to zero whenever the port's receive right moves between tasks. When a message is de-queued from the port, it is stamped with the port's sequence number and the port's sequence number is then incremented. (Note that this occurs whether or not the receiver requests the sequence number in the trailer.) The de-queue and increment operations are atomic, so that multiple threads receiving messages from a port can use the msgh_seqno field to reconstruct the original order of the messages.
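Reconstructing the original order from sequence numbers amounts to sorting the collected messages on that field. The following sketch uses a hypothetical struct received holding a copy of the trailer's sequence number, and ignores wrap-around of the counter for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record kept by a receiver thread for each message. */
struct received {
    uint32_t seqno;     /* copy of msgh_seqno from the trailer */
    int      payload;   /* stand-in for the message contents */
};

/* qsort comparator: ascending sequence number. */
static int by_seqno(const void *a, const void *b)
{
    uint32_t x = ((const struct received *)a)->seqno;
    uint32_t y = ((const struct received *)b)->seqno;
    return (x > y) - (x < y);
}

/* Put messages gathered by several threads back into de-queue order. */
void restore_order(struct received *msgs, size_t n)
{
    qsort(msgs, n, sizeof msgs[0], by_seqno);
}
```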
The destination and reply ports are reversed in a received message header. The msgh_local_port field carries the name of the destination port, from which the message was received, and the msgh_remote_port field carries the reply port right. The bits in msgh_bits are also reversed. The MACH_MSGH_BITS_LOCAL bits have a value of MACH_MSG_TYPE_PORT_SEND_ONCE or MACH_MSG_TYPE_PORT_SEND depending on the type of right to which the message was sent. The MACH_MSGH_BITS_REMOTE bits describe the reply port right.
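The reversal of the header can be sketched with a simplified header model. The numeric type values are believed to mirror those in <mach/message.h> but should be treated as assumptions, as should the bit layout (remote type in the low byte, local type in the next byte); struct model_header and the MODEL_ macros are hypothetical.

```c
#include <stdint.h>

/* Assumed type values; use the header's own names in real code. */
enum {
    T_PORT_SEND      = 17,   /* same value as a MOVE_SEND disposition */
    T_PORT_SEND_ONCE = 18,   /* same value as MOVE_SEND_ONCE */
    T_COPY_SEND      = 19,
    T_MAKE_SEND      = 20,
    T_MAKE_SEND_ONCE = 21
};

/* Assumed layout of msgh_bits for this model. */
#define MODEL_BITS(remote, local) ((uint32_t)(remote) | ((uint32_t)(local) << 8))
#define MODEL_REMOTE(bits) ((uint32_t)(bits) & 0xffu)
#define MODEL_LOCAL(bits)  (((uint32_t)(bits) >> 8) & 0xffu)

struct model_header {
    uint32_t bits;
    uint32_t remote_port;
    uint32_t local_port;
};

/* Type of right the receiver sees for a disposition the sender used. */
static uint32_t received_type(uint32_t sent)
{
    switch (sent) {
    case T_PORT_SEND: case T_COPY_SEND: case T_MAKE_SEND:
        return T_PORT_SEND;
    case T_PORT_SEND_ONCE: case T_MAKE_SEND_ONCE:
        return T_PORT_SEND_ONCE;
    default:
        return 0;
    }
}

/* Build the header as the receiver sees it. dest_name and reply_name
 * are the receiver's own names for the two ports. */
struct model_header as_received(struct model_header sent,
                                uint32_t dest_name, uint32_t reply_name)
{
    struct model_header r;
    r.bits = MODEL_BITS(received_type(MODEL_LOCAL(sent.bits)),    /* reply */
                        received_type(MODEL_REMOTE(sent.bits)));  /* dest  */
    r.remote_port = reply_name;
    r.local_port  = dest_name;
    return r;
}
```

A message sent with a copied send right as destination and a newly made send-once right as reply arrives with local bits describing a send right and remote bits describing a send-once right.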
A received message can contain port rights and out-of-line memory. The msgh_local_port field does not carry a port right; the act of receiving the message consumes the send or send-once right for the destination port. The msgh_remote_port field does carry a port right, and the message can carry additional port rights and memory if the MACH_MSGH_BITS_COMPLEX bit is set. Received port rights and memory should be consumed or de-allocated in some fashion. In almost all cases, msgh_local_port will specify the name of a receive right, either receive_name, or, if receive_name is a port set, a member of receive_name.
If other threads are concurrently manipulating the receive right, the situation is more complicated. If the receive right is renamed during the call, then msgh_local_port specifies the right's new name. If the caller loses the receive right after the message was de-queued from it, then mach_msg will proceed instead of returning MACH_RCV_PORT_DIED. If the receive right was destroyed, then msgh_local_port specifies MACH_PORT_DEAD. If the receive right still exists, but isn't held by the caller, then msgh_local_port specifies MACH_PORT_NULL.
The following options modify MACH_RCV_MSG. If MACH_RCV_MSG is not also specified, they are ignored.
The following trailer elements are supported:
If a resource shortage prevents the reception of a port right, the port right is destroyed and the caller sees the name MACH_PORT_NULL. If a resource shortage prevents the reception of an out-of-line memory region, the region is destroyed and the caller sees a zero address. In addition, the corresponding element in the size array is set to zero. A task never receives port rights or memory for which it is not told.
The MACH_RCV_HEADER_ERROR return code indicates a resource shortage in the reception of the message header. The reply port and all port rights and memory in the message are destroyed. The caller receives the message header with all fields correct except for the reply port.
The MACH_RCV_BODY_ERROR return code indicates a resource shortage in the reception of the message body. The message header, including the reply port, is correct. The kernel attempts to transfer all port rights and memory regions in the body, and only destroys those that can't be transferred.
The mach_msg call handles port rights in the message header atomically. Out-of-line memory and port rights in the message body do not enjoy this atomicity guarantee. These elements may be processed front-to-back, back-to-front, in some random order, or even atomically.
For example, consider sending a message with the destination port specified as MACH_MSG_TYPE_MOVE_SEND and the reply port specified as MACH_MSG_TYPE_COPY_SEND. The same send right, with one user-reference, is supplied for both the msgh_remote_port and msgh_local_port fields. Because mach_msg processes the port rights atomically, this succeeds. If msgh_remote_port were processed before msgh_local_port, then mach_msg would return MACH_SEND_INVALID_REPLY in this situation.
On the other hand, suppose the destination and reply port are both specified as MACH_MSG_TYPE_MOVE_SEND, and again the same send right with one user-reference is supplied for both. Now the send operation fails, but because it processes the rights atomically, mach_msg can return either MACH_SEND_INVALID_DEST or MACH_SEND_INVALID_REPLY.
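The difference between atomic and field-by-field processing in these two examples can be modeled by counting user references. The sketch assumes both header fields name the same send right; the enum and function names are hypothetical, and the sequential variant exists only to show what atomic processing avoids.

```c
#include <stdbool.h>

enum disp { DISP_MOVE_SEND, DISP_COPY_SEND };

/* User references a disposition removes from the sender. */
static unsigned consumed(enum disp d)
{
    return d == DISP_MOVE_SEND ? 1u : 0u;
}

/* Atomic processing: consider both uses of the right before changing
 * anything. The right must be held, and the references consumed by the
 * two fields together must not exceed the references available. */
bool atomic_ok(unsigned urefs, enum disp dest, enum disp reply)
{
    return urefs >= 1 && consumed(dest) + consumed(reply) <= urefs;
}

/* Hypothetical sequential processing: handle the destination first,
 * then look up the reply right with whatever references remain. */
bool sequential_ok(unsigned urefs, enum disp dest, enum disp reply)
{
    (void)reply;                 /* either disposition needs the right held */
    if (urefs < 1)
        return false;
    urefs -= consumed(dest);
    return urefs >= 1;           /* false models an invalid reply port */
}
```

With one user reference, MOVE_SEND plus COPY_SEND passes the atomic check but would fail sequentially, while MOVE_SEND plus MOVE_SEND fails either way.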
For example, consider receiving a message at the same time another thread is deallocating the destination receive right. Suppose the reply port field carries a send right for the destination port. If the de-allocation happens before the dequeuing, the receiver gets MACH_RCV_PORT_DIED. If the de-allocation happens after the receive, the msgh_local_port and the msgh_remote_port fields both specify the same right, which becomes a dead name when the receive right is de-allocated. If the de-allocation happens between the de-queue and the receive, the msgh_local_port and msgh_remote_port fields both specify MACH_PORT_DEAD. Because the rights are processed atomically, it is not possible for just one of the two fields to hold MACH_PORT_DEAD.
The MACH_RCV_NOTIFY option provides a more likely example. Suppose a message carrying a send-once right reply port is received with MACH_RCV_NOTIFY at the same time the reply port is destroyed. If the reply port is destroyed first, then msgh_remote_port specifies MACH_PORT_DEAD and the kernel does not generate a dead-name notification. If the reply port is destroyed after it is received, then msgh_remote_port specifies a dead name for which the kernel generates a dead-name notification. Either the reply port is dead on arrival or notification is requested.
mach_msg and mach_msg_overwrite are wrappers for a system call. They have the responsibility for repeating the interrupted system call.
If MACH_RCV_TIMEOUT is used without MACH_RCV_INTERRUPT, then the timeout duration might not be accurate. When the call is interrupted and automatically retried, the original timeout is used. If interrupts occur frequently enough, the timeout interval might never expire. MACH_SEND_TIMEOUT without MACH_SEND_INTERRUPT suffers from the same problem.
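One way around the inaccurate timeout is for the caller to request the interrupt option, handle the interrupted return itself, and retry with only the time that is left. The arithmetic for that is sketched below; remaining_timeout_ms is a hypothetical helper, and the timestamps are assumed to come from a monotonic millisecond clock chosen by the caller.

```c
#include <stdint.h>

/* Time budget left for a retried call, in milliseconds.
 * start_ms:   clock reading when the first attempt began
 * now_ms:     clock reading after the interruption
 * timeout_ms: timeout originally requested
 * Returns 0 when the deadline has already passed. */
uint32_t remaining_timeout_ms(uint64_t start_ms, uint64_t now_ms,
                              uint32_t timeout_ms)
{
    uint64_t elapsed = now_ms - start_ms;

    if (elapsed >= timeout_ms)
        return 0u;
    return (uint32_t)(timeout_ms - elapsed);
}
```

A caller interrupted 300 ms into a 500 ms wait would retry with 200 ms, so frequent interrupts can no longer postpone the expiry indefinitely.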
The send operation can generate the following return codes. These return codes imply that the call did nothing:
These return codes imply that some or all of the message was destroyed:
These return codes imply that the message was returned to the caller with a pseudo-receive operation:
This return code implies that the message was queued:
The receive operation can generate the following return codes. These return codes imply that the call did not de-queue a message:
These return codes imply that a message was de-queued and destroyed:
Resource shortages can occur after a message is de-queued, while transferring port rights and out-of-line memory regions to the receiving task. The mach_msg call returns MACH_RCV_HEADER_ERROR or MACH_RCV_BODY_ERROR in this situation. These return codes always carry extra bits (bitwise-or'ed) that indicate the nature of the resource shortage:
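Separating the base return code from the resource-shortage bits is a pair of mask operations. The numeric values below are believed to match <mach/message.h> but are given here as assumptions under MODEL_ names; real code should use MACH_MSG_MASK and the MACH_MSG_ bit definitions from the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of the shortage bits and their mask. */
enum {
    MODEL_MSG_VM_KERNEL  = 0x0400,  /* kernel shortage handling memory */
    MODEL_MSG_IPC_KERNEL = 0x0800,  /* kernel shortage handling a right */
    MODEL_MSG_VM_SPACE   = 0x1000,  /* no room in the address space */
    MODEL_MSG_IPC_SPACE  = 0x2000,  /* no room in the port name space */
    MODEL_MSG_MASK       = 0x3e00
};

/* Return code with the shortage bits stripped, suitable for comparing
 * against MACH_RCV_HEADER_ERROR or MACH_RCV_BODY_ERROR. */
uint32_t base_code(uint32_t ret)
{
    return ret & ~(uint32_t)MODEL_MSG_MASK;
}

/* True if the given shortage bit is set in the return code. */
bool has_shortage(uint32_t ret, uint32_t bit)
{
    return (ret & bit) != 0;
}
```

Comparing the raw return value directly against MACH_RCV_BODY_ERROR would miss these cases, because the extra bits make the two values unequal.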
Functions: vm_allocate, vm_deallocate, vm_write, mach_port_request_notification,
Data Structures: mach_msg_header.