Pointers (Continued).

Based pointer.

A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.

Multiple Indirection.

In some languages, a pointer can reference another pointer, requiring multiple de reference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:

struct element
{
    struct element * next;
    int              value;
};
 
struct element * head = NULL;

This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list, head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list:

// Given a sorted list at *head, insert the element item at the first
// location where all earlier elements have lesser or equal value.
void insert(struct element **head, struct element *item)
{
    struct element ** p;  // p points to a pointer to an element
 
    for (p = head; *p != NULL; p = &(*p)->next)
    {
        if (item->value <= (*p)->value)
            break;
    }
    item->next = *p;
    *p = item;
}
 
// Caller does this:
insert(&head, item);

In this case, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.

A basic example is in the argv argument to the main function in C (and C++), which is given in the prototype as char **argv – this is because the variable argv itself is a pointer to an array of strings (an array of arrays), so *argv is a pointer to the 0th string (by convention the name of the program), and **argv is the 0th character of the 0th string.

Function Pointer.

In some languages, a pointer can reference executable code, i.e., it can point to a function, method, or procedure. A function pointer will store the address of a function to be invoked. While this facility can be used to call functions dynamically, it is often a favorite technique of virus and other malicious software writers.

    int  a, b, x, y;
    int  sum(int n1, int n2);   // Function with two integer parameters returning an integer value
    int  (*fp)(int, int);       // Function pointer which can point to a function like sum
 
    fp = &sum;                  // fp now points to function sum
    x = (*fp)(a, b);            // Calls function sum with arguments a and b
    y = sum(a, b);              // Calls function sum with arguments a and b

Wild Pointers.

A wild pointer is a pointer that has not been initialized (that is, a wild pointer has not had any address assigned to it) and may make a program crash or behave oddly. In the Pascalor C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.

The following example code shows a wild pointer:

int func(void)
{
    char *p1 = malloc(sizeof(char)); /* (undefined) value of some place on the heap */
    char *p2;       /* wild (uninitialized) pointer */
    *p1 = 'a';      /* This is OK, assuming malloc() has not returned NULL. */
    *p2 = 'b';      /* This invokes undefined behavior */
}

Here, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b' can corrupt an unknown area of memory or trigger a segmentation fault.

Wild Branch.

Where a pointer is used as the address of the entry point to a program or start of a subroutine and is also either uninitialized or corrupted, if a call or jump is nevertheless made to this address, a "wild branch" is said to have occurred. The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (op-code) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.

Simulation Using An Array Index.

It is possible to simulate pointer behavior using an index to an (normally one-dimensional) array.

Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory). Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31 bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic. It is even theoretically possible, using the above technique, together with a suitable instruction set simulator to simulate any machine code or the intermediate (byte code) of any processor / language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and action entirely within the memory contained of the same array. If necessary, to completely avoid buffer overflow problems,bounds checking can usually be action for the compiler (or if not, hand coded in the simulator).

Support In Various Programming Languages.

Ada.

Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.

BASIC.

Several old versions of BASIC for the Windows platform had support for STRPTR() to return the address of a string, and for VARPTR() to return the address of a variable. Visual Basic 5 also had support for OBJPTR() to return the address of an object interface, and for an ADDRESS OF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.

Newer dialects of BASIC, such as Free BASIC or BlitzMax, have exhaustive pointer implementations, however. In Free BASIC, arithmetic on ANY pointers (equivalent to C's void*) are treated as though the ANY pointer was a byte width. ANY pointers cannot be de referenced, as in C. Also, casting between ANY and any other type's pointers will not generate any warnings.

dim as integer f = 257
dim as any ptr g = @f
dim as integer ptr i = g
assert(*i = 257)
assert( (g + 4) = (@f + 1) )

C and C++.

In C and C++ pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types (but not between a function pointer and non-function pointer type). A special pointer type called the “void pointer” allows pointing to any (non-function) variable type, but is limited by the fact that it cannot be de referenced directly. The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies theuintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it. C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. SinceC++11, the C++ standard library also provides smart pointers (unique_ptr, shared_ptr and weak_ptr) which can be used in some situations as a safe alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type. Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of the datatype it points to. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers—which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address can not be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension. For working "directly" with bytes they usually cast pointers to BYTE*, or unsigned char* if BYTE is not defined in the standard library used. Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of the data type it points to. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers—which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address can not be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension. For working "directly" with bytes they usually cast pointers to BYTE*, or unsigned char* if BYTE is not defined in the standard library used.

The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store an address to any non-function data type, and, in C, is implicitly converted to any other pointer type on assignment, but it must be explicitly cast if de referenced inline. K&R C used char* for the “type-agnostic pointer” purpose (before ANSI C).

int x = 4;
void* q = &x;
int* p = q;  /* void* implicitly converted to int*: valid C, but not C++ */
int i = *p;
int j = *(int*)q; /* when de referencing inline, there is no implicit conversion */

C++ does not allow the implicit conversion of void* to other pointer types, even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other ill casts.

int x = 4;
void* q = &x;
// int* p = q; This fails in C++: there is no implicit conversion from void*
int* a = (int*)q; // C-style cast
int* b = static_cast<int*>(q); // C++ cast

In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.

C#.

In the C# programming language, pointers are supported only under certain conditions: any block of code including pointers must be marked with the unsafe keyword. Such blocks usually require higher security permissions than pointer less code to be allowed to run. The syntax is essentially the same as in C++, and the address pointed can be either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.

An exception to this is from using the IntPtr structure, which is a safe managed equivalent to int*, and does not require unsafe code. This type is often returned when using methods from the System.Runtime.InteropServices, for example:

// Get 16 bytes of memory from the process's unmanaged memory
IntPtr pointer = System.Runtime.InteropServices.Marshal.AllocHGlobal(16);
 
// Do something with the allocated memory
 
// Free the allocated memory
System.Runtime.InteropServices.Marshal.FreeHGlobal(pointer);

The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void *) to allow communication with unmanaged code.

COBOL.

The COBOL programming language supports pointers to variables. Primitive or group (record) data objects declared within the LINKAGE SECTION of a program are inherently pointer-based, where the only memory allocated within the program is space for the address of the data item (typically a single memory word). In program source code, these data items are used just like any other WORKING-STORAGE variable, but their contents are implicitly accessed indirectly through their LINKAGE pointers.

Memory space for each pointed-to data object is typically allocated dynamically using external CALL statements or via embedded extended language constructs such as EXECCICS or EXEC SQL statements.

Extended versions of COBOL also provide pointer variables declared with USAGE IS POINTER clauses. The values of such pointer variables are established and modified usingSET and SET ADDRESS statements.

Some extended versions of COBOL also provide PROCEDURE-POINTER variables, which are capable of storing the addresses of executable code.