
File Explorer

1/2025 - 3/2025

Individual project for the CSE 333 Systems Programming course: a web server linked to an inverted index of a file directory

Written by: Violet Monserate

Screenshot: homepage for 333gle, the web interface for the file explorer. The current query is 'hello world', and the results list a few links to files matching that query.


Key Tools

To verify that my code was working, I used two debugging tools:

  • GDB: This allowed me to step through the program one instruction at a time and inspect its state, which was especially useful when tracking down segmentation faults (“segfaults”).
  • Valgrind: This allowed me to see how my program was using memory and to confirm that it was allocating and freeing memory correctly.

Homework 1: C Data Structures Implementation

Overview

Implemented two fundamental C data structures from scratch:

  • Doubly-linked list with iterator support
  • Chained hash table with dynamic resizing

Key Features

  • Generic payload support for storing arbitrary data types
  • Memory management with proper malloc/free handling
  • Iterator abstractions for safe data structure traversal
  • Robust error handling using Verify333 assertions
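
A minimal sketch of how a doubly-linked list with a generic payload and explicit malloc/free handling could look. It is written C-style so it mirrors the homework, but every name here (List, ListPush, ListPop, ListFree, PayloadFreeFn) is illustrative and not the course's actual API, and plain assert stands in for Verify333.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical names; the course's real LinkedList API may differ.
typedef void* Payload;                   // generic payload: the list stores only the pointer
typedef void (*PayloadFreeFn)(Payload);  // caller-supplied function that frees one payload

struct Node {
  Payload payload;
  Node* prev;
  Node* next;
};

struct List {
  Node* head;
  Node* tail;
  int size;
};

List* ListCreate() {
  List* list = static_cast<List*>(std::malloc(sizeof(List)));
  if (list == nullptr) return nullptr;
  list->head = nullptr;
  list->tail = nullptr;
  list->size = 0;
  return list;
}

// Splice a new node in front of the current head.
bool ListPush(List* list, Payload payload) {
  assert(list != nullptr);
  Node* node = static_cast<Node*>(std::malloc(sizeof(Node)));
  if (node == nullptr) return false;
  node->payload = payload;
  node->prev = nullptr;
  node->next = list->head;
  if (list->head != nullptr) {
    list->head->prev = node;
  } else {
    list->tail = node;  // first element is both head and tail
  }
  list->head = node;
  list->size++;
  return true;
}

// Unlink the head node and hand its payload back to the caller.
bool ListPop(List* list, Payload* out) {
  assert(list != nullptr && out != nullptr);
  Node* node = list->head;
  if (node == nullptr) return false;
  *out = node->payload;
  list->head = node->next;
  if (list->head != nullptr) {
    list->head->prev = nullptr;
  } else {
    list->tail = nullptr;  // list is now empty
  }
  std::free(node);
  list->size--;
  return true;
}

// Free every node, calling free_fn on each payload so nothing leaks.
void ListFree(List* list, PayloadFreeFn free_fn) {
  assert(list != nullptr && free_fn != nullptr);
  Node* cur = list->head;
  while (cur != nullptr) {
    Node* next = cur->next;
    free_fn(cur->payload);
    std::free(cur);
    cur = next;
  }
  std::free(list);
}
```

Passing the free function in keeps the list unaware of what the payload is, which is what lets one implementation store arbitrary data types.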

Technical Implementation

  • LinkedList: Managed head/tail pointers with node splicing logic
  • HashTable: Used FNV hashing with separate chaining collision resolution
  • Memory safety: Comprehensive Valgrind testing for leaks and errors
  • Code quality: Followed Google C++ style guide with cpplint validation
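
The hashing and chaining bullets above can be sketched as follows. This assumes the 64-bit FNV-1a variant (the page only says "FNV") and a resize threshold of three elements per bucket; both are assumptions, and the function names are illustrative.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
uint64_t FNVHash64(const unsigned char* data, size_t len) {
  const uint64_t kOffsetBasis = 0xcbf29ce484222325ULL;
  const uint64_t kPrime = 0x100000001b3ULL;
  uint64_t hash = kOffsetBasis;
  for (size_t i = 0; i < len; i++) {
    hash ^= data[i];
    hash *= kPrime;
  }
  return hash;
}

// Separate chaining: a key lives in the chain at bucket (hash mod bucket count).
size_t BucketIndex(uint64_t hash, size_t num_buckets) {
  assert(num_buckets > 0);
  return static_cast<size_t>(hash % num_buckets);
}

// Resize trigger. The threshold of 3 elements per bucket is an assumption.
bool NeedsResize(size_t num_elements, size_t num_buckets) {
  return num_elements > 3 * num_buckets;
}
```

When a resize fires, every element has to be rehashed into the larger bucket array, because BucketIndex depends on the bucket count.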

Homework 2: In-Memory Search Engine

Overview

Built a file system crawler, indexer, and query processor on top of the HW1 data structures.

Components Implemented

Part A: File Parser

  • Text file ingestion with memory-efficient string handling
  • Word parsing using alphabetic character separation
  • Position tracking with byte offset recording
  • Case normalization converting all words to lowercase
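
A minimal sketch of the parsing rules listed above: any non-alphabetic byte ends a word, words are lowercased, and each word records the byte offset of its first character. The names WordPosition and ParseWords are hypothetical, and std::string and std::vector stand in for the C strings and HW1 containers the homework would use.

```cpp
#include <cctype>
#include <stddef.h>
#include <string>
#include <vector>

struct WordPosition {
  std::string word;  // lowercased word
  size_t offset;     // byte offset of the word's first character
};

// Split text on non-alphabetic bytes, lowercasing as we go.
std::vector<WordPosition> ParseWords(const std::string& text) {
  std::vector<WordPosition> words;
  std::string current;
  size_t start = 0;
  for (size_t i = 0; i < text.size(); i++) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (std::isalpha(c)) {
      if (current.empty()) start = i;  // remember where this word begins
      current.push_back(static_cast<char>(std::tolower(c)));
    } else if (!current.empty()) {
      words.push_back({current, start});
      current.clear();
    }
  }
  if (!current.empty()) words.push_back({current, start});  // word at end of file
  return words;
}
```

For the input "Hello, World! hello" this yields "hello" at offset 0, "world" at offset 7, and "hello" at offset 14.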

Part B: Crawler and Indexer

  • Recursive directory traversal with document ID assignment
  • Inverted index construction mapping words → documents → positions
  • Document table management for filename ↔ docID bidirectional lookup
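
The two structures above can be sketched as follows. This is an illustration under assumptions, not the homework's code: it uses the C++ standard containers in place of the HW1 hash table and linked list, and the names DocTable, AddPosting, and the choice to start document IDs at 1 are hypothetical.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>
#include <unordered_map>
#include <vector>

typedef uint64_t DocID;

// Two maps kept in sync give filename <-> docID lookup in both directions.
class DocTable {
 public:
  // Returns the existing ID for a file, or assigns the next free one.
  DocID Add(const std::string& filename) {
    auto it = name_to_id_.find(filename);
    if (it != name_to_id_.end()) return it->second;
    DocID id = next_id_++;
    name_to_id_[filename] = id;
    id_to_name_[id] = filename;
    return id;
  }

  // Returns nullptr when the ID is unknown.
  const std::string* GetName(DocID id) const {
    auto it = id_to_name_.find(id);
    return it == id_to_name_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, DocID> name_to_id_;
  std::unordered_map<DocID, std::string> id_to_name_;
  DocID next_id_ = 1;  // starting value is an assumption
};

// word -> docID -> byte offsets of that word in that document.
typedef std::unordered_map<DocID, std::vector<size_t>> Postings;
typedef std::unordered_map<std::string, Postings> InvertedIndex;

// Offsets appended in file order stay sorted without an explicit sort.
void AddPosting(InvertedIndex* index, const std::string& word, DocID doc,
                size_t offset) {
  (*index)[word][doc].push_back(offset);
}
```

The crawler would call Add once per file it visits and AddPosting once per parsed word.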

Part C: Query Processor

  • Multi-word query processing with result intersection
  • Ranking algorithm based on term frequency summation
  • Interactive shell with console-based user interface
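
A sketch of how the intersection and ranking described above might fit together: a document survives only if it contains every query word, and its rank is the sum of how often each word occurs in it. The name RunQuery, the SearchResult struct, and the tie-break by document ID are assumptions made for illustration.

```cpp
#include <algorithm>
#include <stddef.h>
#include <stdint.h>
#include <string>
#include <unordered_map>
#include <vector>

typedef uint64_t DocID;
typedef std::unordered_map<DocID, std::vector<size_t>> Postings;
typedef std::unordered_map<std::string, Postings> InvertedIndex;

struct SearchResult {
  DocID doc;
  size_t rank;  // sum of term frequencies over all query words
};

std::vector<SearchResult> RunQuery(const InvertedIndex& index,
                                   const std::vector<std::string>& query) {
  std::unordered_map<DocID, size_t> ranks;
  for (size_t i = 0; i < query.size(); i++) {
    auto it = index.find(query[i]);
    if (it == index.end()) return {};  // a missing word empties the intersection
    std::unordered_map<DocID, size_t> next;
    for (const auto& entry : it->second) {
      size_t count = entry.second.size();  // occurrences of this word in the doc
      if (i == 0) {
        next[entry.first] = count;
      } else {
        auto prev = ranks.find(entry.first);
        // Keep only documents that also matched every earlier word.
        if (prev != ranks.end()) next[entry.first] = prev->second + count;
      }
    }
    ranks.swap(next);
  }

  std::vector<SearchResult> results;
  for (const auto& entry : ranks) results.push_back({entry.first, entry.second});
  // Highest rank first; ties broken by docID so the order is deterministic.
  std::sort(results.begin(), results.end(),
            [](const SearchResult& a, const SearchResult& b) {
              if (a.rank != b.rank) return a.rank > b.rank;
              return a.doc < b.doc;
            });
  return results;
}
```

An interactive shell would lowercase and split each input line into words, pass them to a function like this, and print the matching filenames with their ranks.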

Data Structures Used

  • Document table: Dual hash tables for bidirectional lookup
  • Inverted index: Nested hash tables (word → docID → positions)
  • Position tracking: Linked lists maintaining sorted offsets

Homework 3: Disk-Based Search Engine

Overview

Extended the HW2 search engine to persistent storage using an architecture-neutral file format.

Components Implemented

Part A: Index Marshaller

  • Big-endian serialization for cross-platform compatibility
  • Complex file format with header, doctable, and index regions
  • Checksum validation for data integrity verification
  • Hierarchical data storage maintaining in-memory structure relationships
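
A minimal sketch of big-endian serialization for a 32-bit field. Writing the bytes with shifts, rather than copying the integer's memory, gives the same file on little-endian and big-endian hosts. The helper names are hypothetical.

```cpp
#include <stdint.h>
#include <vector>

// Append value to out, most significant byte first.
void AppendUint32BE(std::vector<uint8_t>* out, uint32_t value) {
  out->push_back(static_cast<uint8_t>(value >> 24));
  out->push_back(static_cast<uint8_t>(value >> 16));
  out->push_back(static_cast<uint8_t>(value >> 8));
  out->push_back(static_cast<uint8_t>(value));
}

// Rebuild the value from four bytes stored most significant byte first.
uint32_t ReadUint32BE(const uint8_t* in) {
  return (static_cast<uint32_t>(in[0]) << 24) |
         (static_cast<uint32_t>(in[1]) << 16) |
         (static_cast<uint32_t>(in[2]) << 8) |
         static_cast<uint32_t>(in[3]);
}
```

An equivalent approach is to convert with htonl and ntohl before writing and after reading; the shift form is shown here because it needs no platform headers.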

Part B: Index Reader

  • C++ class hierarchy for file-based data structure access
  • Efficient lookup algorithms for query processing
  • Memory-mapped style access without full file loading

Part C: Multi-Index Search Shell

  • Multiple index file support for distributed searching
  • Rank aggregation across multiple corpora
  • Interactive query interface with result merging

File Format Features

  • Magic number identification (0xCAFEF00D)
  • Embedded hash tables with bucket chaining
  • Variable-length string storage
  • Position list compression and sorting
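
A sketch of how a reader might validate the start of an index file before trusting it. The magic number comes from the list above; the header layout (four big-endian 32-bit fields: magic, checksum, doctable size, index size) and all names are assumptions, and the checksum is only parsed here, not recomputed.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <vector>

const uint32_t kMagicNumber = 0xCAFEF00D;
const size_t kHeaderSize = 16;  // four 32-bit fields (assumed layout)

struct IndexFileHeader {
  uint32_t magic;
  uint32_t checksum;        // parsed only; verification is out of scope here
  uint32_t doctable_bytes;  // size of the doctable region
  uint32_t index_bytes;     // size of the index region
};

static uint32_t ReadUint32BE(const uint8_t* in) {
  return (static_cast<uint32_t>(in[0]) << 24) |
         (static_cast<uint32_t>(in[1]) << 16) |
         (static_cast<uint32_t>(in[2]) << 8) |
         static_cast<uint32_t>(in[3]);
}

// Returns true and fills *header only if the bytes look like a valid index file.
bool ParseHeader(const std::vector<uint8_t>& file, IndexFileHeader* header) {
  if (file.size() < kHeaderSize) return false;
  IndexFileHeader h;
  h.magic = ReadUint32BE(file.data());
  h.checksum = ReadUint32BE(file.data() + 4);
  h.doctable_bytes = ReadUint32BE(file.data() + 8);
  h.index_bytes = ReadUint32BE(file.data() + 12);
  if (h.magic != kMagicNumber) return false;
  // The two regions must exactly fill the rest of the file.
  uint64_t expected =
      static_cast<uint64_t>(kHeaderSize) + h.doctable_bytes + h.index_bytes;
  if (expected != file.size()) return false;
  *header = h;
  return true;
}
```

Checking the magic number and region sizes first means a truncated or unrelated file is rejected before any offsets inside it are followed.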

Final Project: Web Server Security & Session Management

Security Features Implemented

Session Management

  • Secure cookie generation with session tracking
  • HMAC-SHA256 protection against cookie tampering
  • Session validation with cryptographic verification
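
A sketch of the sign-then-verify flow behind a tamper-evident cookie. To stay free of library dependencies, the MAC is passed in as a function; in a real server it would be HMAC-SHA256 from a crypto library. The cookie layout (payload, a dot, then the hex tag) and every name here are assumptions made for illustration.

```cpp
#include <stddef.h>
#include <functional>
#include <string>

// (key, message) -> raw tag bytes. A stand-in for HMAC-SHA256.
typedef std::function<std::string(const std::string&, const std::string&)> MacFn;

std::string ToHex(const std::string& bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string hex;
  for (unsigned char b : bytes) {
    hex.push_back(kDigits[b >> 4]);
    hex.push_back(kDigits[b & 0x0f]);
  }
  return hex;
}

// Compare without stopping at the first mismatch, so timing leaks less.
bool ConstantTimeEquals(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  unsigned char diff = 0;
  for (size_t i = 0; i < a.size(); i++) {
    diff |= static_cast<unsigned char>(a[i] ^ b[i]);
  }
  return diff == 0;
}

// Cookie value is "<payload>.<hex tag>"; hex never contains '.'.
std::string MintCookie(const std::string& payload, const std::string& key,
                       const MacFn& mac) {
  return payload + "." + ToHex(mac(key, payload));
}

// Recompute the tag over the payload and accept only an exact match.
bool VerifyCookie(const std::string& cookie, const std::string& key,
                  const MacFn& mac, std::string* payload_out) {
  size_t dot = cookie.rfind('.');
  if (dot == std::string::npos) return false;
  std::string payload = cookie.substr(0, dot);
  std::string tag = cookie.substr(dot + 1);
  if (!ConstantTimeEquals(tag, ToHex(mac(key, payload)))) return false;
  *payload_out = payload;
  return true;
}
```

Because the key never leaves the server, a client that edits the payload cannot produce a matching tag, which is what makes the cookie tamper-evident.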

Authentication System

  • Login page with credential processing
  • Admin cookie minting for privileged access
  • Plaintext authentication (noted as a potential security concern)

Access Control

  • Admin-only routes (/quitquitquit endpoint protection)
  • Protected file access for $(BASE_DIR)/admin contents
  • Role-based authorization using session cookies
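
A sketch of the access check implied above. It assumes the files under $(BASE_DIR)/admin are served from URL paths starting with /admin/, which the page does not state, and that the caller's role has already been recovered from a verified session cookie. The Role enum and function names are hypothetical.

```cpp
#include <string>

enum class Role { kGuest, kAdmin };

// True for routes that only an admin session may reach.
bool RequiresAdmin(const std::string& path) {
  if (path == "/quitquitquit") return true;
  const std::string prefix = "/admin/";  // assumed URL mapping for the admin directory
  return path == "/admin" || path.compare(0, prefix.size(), prefix) == 0;
}

// Public routes are open to everyone; protected routes need the admin role.
bool IsAllowed(const std::string& path, Role role) {
  return !RequiresAdmin(path) || role == Role::kAdmin;
}
```

A check like this only works on a normalized path: a request such as /static/../admin/log.txt has to have its ".." segments resolved before the prefix test runs.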

Administrative Features

  • Server logging of client requests and activities
  • Admin dashboard with system overview
  • Navigation system with role-appropriate links

Technical Implementation Details

  • Cookie security: HMAC verification prevents unauthorized modifications
  • Access enforcement: Session validation on protected endpoints
  • User experience: Seamless navigation between public and admin areas
  • Monitoring: Comprehensive request logging for administrative oversight