Pointer and Structs under the hood

Magic of C Pointers and structs

Learning C can be a bit tough, not because it’s super complex, but because it reveals some behind-the-scenes secrets. C is still popular for building low-level stuff, and that’s because it lets you peek under the hood. For folks like me, who started with languages like Python and JavaScript, diving into C can feel a bit tricky.

I wasn’t a big fan of C at first. However, when I needed to tackle LeetCode problems, friends suggested trying C++. So, I went back to my college C lessons, and this time, things got a bit interesting. Even though pointers still seemed a bit confusing, I got the hang of them.

Now, after spending some time exploring the ins and outs of C’s internal workings, I’ve stumbled upon some cool stuff. In this blog, I want to share these neat discoveries in a simpler way.

What’s the Deal with Pointers?

If you’ve done a bit of C coding, you’ve likely come across this thing called a pointer. In simple terms, pointers act like guides for your variables — they keep track of the address of another variable. Now, here’s the cool part: pointers are the unsung heroes behind array indexing, making structs work, and a bunch of other cool stuff in C. So, understanding pointers is a key skill for leveling up your C coding game. Let’s dive into the basics and demystify the magic of pointers

A pointer in C is a variable that stores the memory address of another variable.

#include <stdio.h>
int main(void) {
  int a = 5;
  int *b = &a;

  printf("%p\n", (void *)&a); // 0x7ffd5151c4dc
  printf("%p\n", (void *)b);  // 0x7ffd5151c4dc
  return 0;
}

In this example, we declare an integer variable a, and an integer pointer b is used to store the address of variable a, which turns out to be 0x7f…dc. Now, when you add the * symbol in front of the pointer variable (*b), it acts as a unary pointer operator. This operation dereferences the pointer, meaning it retrieves the value stored at the memory address that b points to. In simpler terms, *b gives us the actual value stored in variable a. It’s like saying ‘take me to the value at the address pointed by b.

#include <stdio.h>
int main(void) {
  int a = 5;
  int *b = &a;

  printf("%p\n", (void *)&a); // 0x7ffd5151c4dc
  printf("%p\n", (void *)b);  // 0x7ffd5151c4dc

  printf("%d is the value of A\n", *&a); // 5
  printf("%d is the value of A", *b);  // 5 
  return 0;
}

Here we you can see that Both will print 5 what happen here is * in-front of &a will simple dereferences the pointer which simply points to the value at &a which is 5 and same happen with *b where it will points to value at b which is just the address of a itself hence it will also print value at address of a.

Double Pointers

Double pointer is a type of pointer which will point to another pointer which will point to the address of some value here is a example of creating a double pointer

#include <stdio.h>

int main(void) {
  int a = 5;
  int *b = &a;
  int **c = &b;

  printf("%p\n", (void *)&a); // 0x7ffd5151c4dc
  printf("%p\n", (void *)b);  // 0x7ffd5151c4dc
  printf("%p\n", (void *)*c); // 0x7ffd5151c4dc

  printf("%d is the value of A\n", *&a); // 5
  printf("%d is the value of A\n", *b);  // 5
  printf("%d is the value of A\n", **c); // 5

  printf("%d\n", (&a == b) && (b == *c)); // 1
  return 0;
}

In this example, we introduced another pointer c, which points to the address of b. When we dereference **c, it essentially points to the value at *c, which is the value at the address of b. Now, what resides at the address of b is, once again, the address of variable a. Therefore, **c will simply print the value of a, which is 5. It might take a few minutes to understand this run this code you its much easier to understand this.

Arrays and 2D Arrays

Now, let’s dive into how pointers work with arrays and indexing. In C, when you make a static array, the array’s name works like a pointer. It holds the address of the first element in the array. So, if you print the address of the array, it’s the same as printing the address of the first element in that array.

Similarly if you want to access the subsequent elements of the array you can use the index operator which is same as using *(arr+(index)).

#include <stdio.h>

int main(void) {
  int arr[5] = {1, 2, 3, 4, 5};
  int *ptr = arr;

  printf("%p \n", (void *)&arr);    // 0x7fffa77c7890
  printf("%p \n", (void *)&arr[0]); // 0x7fffa77c7890
  printf("%p \n", (void *)ptr);     // 0x7fffa77c7890

  printf("%d\n", arr[2]);     // 3
  printf("%d\n", *(ptr + 2)); // 3
  return 0;
}

In this example, we crafted an array named arr. Now, arr acts as a pointer, pointing to the first element of the array. Following that, we created another pointer, ptr, which also references the first element of arr.

To access elements in the array, we use the * pointer, dereferencing it to print the value at the address. For instance, *(ptr) simply prints the value at the address of ptr, which, in essence, is the address of the first element of the array. Therefore, it prints the value 1.

Two dimensional arrays are a bit different but under the hood they are also stored as a one dimensional array.


#include <stdio.h>

int main(void) {

  int arr[3][3] = {
      {1, 2, 3},
      {4, 5, 6},
      {7, 8, 9},
  };

  int *ptr = arr;

  printf("%p\n", (void *)&arr);       // 0x7fff37359240
  printf("%p\n", (void *)&arr[0][0]); // 0x7fff37359240
                                      //
  // printf("%d\n", *(*(arr + {rowindx}) + {colindx})); //
  printf("%d\n", *(*(arr + 1) + 1));   // [1][1] => 5
  printf("%d\n", *(*(arr) + 4));      // [0]+[4] = 5 because of the flattened array
  printf("%d\n", *(ptr + 4));         // [1,2,3,4,5] index of 5 is 4 so you can access
                                      // that just by [0][4]
  return 0;
}

In the case of a 2D array like arr[3][3], it’s internally stored as a flattened one-dimensional array, effectively arr[0][8]. This means you can access elements in two different ways.

Using *(arr + 1) + 1: *(arr + 1) gives you the address of the second row (&arr[1][0]). Adding 1 to this address moves to the second element in the second row (&arr[1][1]). Dereferencing this pointer gives you the value at arr[1][1], which is 2. Using ((arr) + 4):

((arr) + 4) can be broken down to *(&arr[0][0] + 4). &arr[0][0] + 4 moves to the fifth element in the flattened array. Dereferencing this pointer gives you the value at arr[0][4], which is also 5.

#include <stdio.h>

int main(void) {

  int arr[3][3] = {
      {1, 2, 3},
      {4, 5, 6},
      {7, 8, 9},
  }; // [1,2,3,4,5,6,7,8,9] internally

  int *ptr = arr;

  printf("%p\n", (void *)&arr);       // 0x7fff37359240
  printf("%p\n", (void *)&arr[0][0]); // 0x7fff37359240
  //
  printf("%p\n", (void *)&arr[1][0]);    // 0x7ffe7027c2bc
  printf("%p\n", (void *)&(*(arr + 1))); // 0x7ffe7027c2bc

  printf("%p\n", (void *)&arr[1][1]);           // 0x7ffd641bd370
  printf("%p\n", (void *)&(*(*(arr + 1) + 1))); // 0x7ffd641bd370
                         
  return 0;
}

Magic of Structs

In the context of structs, we’ve already explored how pointers function with arrays. Unlike arrays, structs don’t have indices, but the logic of pointers still applies. When we create a struct and an object referencing that struct, it points to the first member of the struct. This implies that the address of the struct is the same as the address of its first member.

However, the size of the struct is determined by the sum of the sizes of all variables declared within it. The order of declaring variables also influences the size of the struct. The compiler aims to arrange elements in memory to minimize CPU cycles for access. To achieve this, compilers often introduce padding to structs, which might result in some wasted space but optimizes CPU cycles, a valuable trade-off.

The size of the struct Store is 8 bytes. The first byte is allocated for char a, the second byte for char b, and the remaining 4 bytes are allocated for int one. However, to meet alignment requirements, 2 bytes of padding are inserted after char b, resulting in a total size of 8 bytes. This is because of the order of variables if you declare the int x before char b the size gets affected.


#include <stdio.h>

struct Store {
  char a;
  char b;
  int one;
};
int main(void) {
  struct Store s;

  printf("%zu\n", sizeof(s));             // 8 Bytes
  printf("%lu\n", (unsigned long)&s.a);   // 140735585721296
  printf("%lu\n", (unsigned long)&s.b);   // 140735585721297
  printf("%lu\n", (unsigned long)&s.one); // 140735585721300
  return 0;
}

#include <stdio.h>

struct Store {
  char a;
  int one;
  char b;
};

int main(void) {
  struct Store s;

  printf("%zu\n", sizeof(s));             // 12
  printf("%lu\n", (unsigned long)&s.a);   // 140725906795260 (1)
  printf("%lu\n", (unsigned long)&s.b);   // 140725906795268 (3)
  printf("%lu\n", (unsigned long)&s.one); // 140735585721364 (2)
  return 0;
}

Here changing the order of variable will impact the size of the struct because of the padding added by the compiler. There is a padding of 3 Bytes from 61…63 and another 3 Bytes from 69..72

Now comeback to access struct members with pointers

#include <stdio.h>
struct Store {
  char a;
  int one;
  char b;
};
int main(void) {
  struct Store s = {'A', 1, 'B'};
  struct Store *p = &s;
  printf("%p\n", (void *)&s);   // 0x7ffd1da1d45c
  printf("%p\n", (void *)&s.a); // 0x7ffd1da1d45c
  printf("%p\n", (void *)p);    // 0x7ffd1da1d45c

  // Accessing Elements
  printf("%c\n", p->a);   // A
  printf("%c\n", (*p).a); // A
  printf("%c\n", s.a);    // A
  return 0;
}

Here the (*p).a is equal to p->a

Wrap things with Unions

Unions are similar to structs, but the difference lies in calculating the size of unions. While the size of a struct is the sum of the sizes of all variables within that struct, including any required padding, the size of a union is determined by the size of its largest member. This means that the size of the following union will be only 4 bytes, even though it contains variables of varying data types


#include <math.h>
#include <stdio.h>

union Variant {
  int a;
  char b;
  double c;
};

int main(void) {
  union Variant v;
  printf("%zu \n", sizeof(v)); // 8 only the size of double

  v.a = 12;
  printf("%d\n", v.a); // 12

  v.c = 149.3;
  printf("%f\n", v.c); // 149.3.

  v.b = 'B';
  printf("%c\n", v.b); // B

  return 0;
}

Well Everything works as expected right, yes but if you print a again now you the the magic of unions because when we set v.c = 149 it will operate on the same space occupied by v.a which the free memory of v.c is still available for v.a


#include <stdio.h>

union Variant {
  int a;
  char b;
  double c;
};

int main(void) {
  union Variant v;
  printf("%zu \n", sizeof(v)); // 8 only the size of double

  v.a = 12;
  printf("%d\n", v.a); // 12

  v.c = 149.3;
  printf("%f\n", v.c); // 149.3.

  v.b = 'B';
  printf("%c\n", v.b); // B

  printf("%d\n", v.a); //-1717987006 over writed here
  printf("%f\n", v.c); // 149.300000
  v.a = 1234;
  printf("%d\n", v.a); // 1234
  printf("%f\n", v.c); // 149.299927
  printf("%c\n", v.a); // ""
  return 0;
}

Here you can see the values are messed up because the union values will be over write by upcoming operations. I added the meson.build file here as well.

project('c-programming', 'c',
  version : '0.1',
  default_options : ['warning_level=3'])

exe = executable('c_programming', 'c_programming.c',
  install : true)

exe1 = executable('array', 'array.c',
  install : true)

exe2 = executable('struct', './structs.c',
  install : true)

exe3 = executable('union', './union.c',
  install : true)
test('basic', exe)

That’s all folks,