Introduction

To assess whether our database replication system can handle real-world data volumes, we decided to put it to the test. According to statistics provided by URBS for 2020, there were 2,035,711 active transport cards and an average of 710,589 passengers transported per day. In 2019, the year before the pandemic, the average was 1,331,610 passengers per day.

The test

Our setup uses a PostgreSQL database server whose data is cached locally on the Raspberry Pi using SQLite. SQLite offers several options that increase performance at the expense of data reliability; we chose not to use any of them in this test, so all ACID properties are preserved. The only special configuration was increasing the amount of memory SQLite can use for its page cache.

Furthermore, the database runs in WAL (Write-Ahead Logging) mode, so readers are never blocked by the replication writes and the turnstile can continue operating normally while the database is being updated.
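
For reference, both settings could be applied with Python's sqlite3 module as in the short sketch below (the database path is taken from the test script; WAL mode is persistent, so it only needs to be enabled once per database file):

import sqlite3

conn = sqlite3.connect('benchmark_db/benchmark_db.db')

# enable Write-Ahead Logging so readers are not blocked while the cache is being updated
conn.execute('PRAGMA journal_mode = WAL;')

# enlarge the page cache (the value is a number of database pages)
conn.execute('PRAGMA cache_size = 1000000;')

conn.close()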

Our local users table has four columns: the user id, the NFC card id, a flag indicating whether the user is blocked, and the user's balance. There is also a table that stores the history of updates, and a table that stores payments made while there is no internet connection to send them to the database server.
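
The test script below does not include the table definitions, so here is a minimal sketch of what they could look like, based on the columns the script uses (the offline payments table name and its columns are assumptions):

import sqlite3

conn = sqlite3.connect('benchmark_db/benchmark_db.db')
conn.executescript("""
    -- local cache of the server's users table
    create table if not exists users (
        rfid    text primary key,  -- NFC card id
        user_id integer,           -- user id on the server
        balance real,
        blocked integer            -- 0/1 flag
    );

    -- one row per replication run
    create table if not exists update_history (
        id            integer primary key autoincrement,
        rows_affected integer,
        time_taken    real,
        updated_at    text
    );

    -- payments made while offline, to be sent to the server later (name assumed)
    create table if not exists offline_payments (
        id      integer primary key autoincrement,
        rfid    text,
        amount  real,
        paid_at text
    );
""")
conn.close()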

The code used in the test can be seen below:

import psycopg2
import sqlite3
from os import path
import datetime
import time

# database credentials
REMOTE_DB = {
    'database': '',
    'user': '',
    'password': '',
    'host': '',
    'port': ''
}
LOCAL_DB = path.abspath('benchmark_db/benchmark_db.db')

# local connection
conn_local = sqlite3.connect(LOCAL_DB)
cur_local = conn_local.cursor()
cur_local.execute('PRAGMA cache_size = 1000000;')  # enlarge the page cache

# server connection
conn_server = psycopg2.connect(**REMOTE_DB)
cur_server = conn_server.cursor()
local_tz = datetime.datetime.now(datetime.timezone.utc).astimezone().tzname()
cur_server.execute('SET TIME ZONE %s;', (local_tz,))

# start the timer
start = time.time()
print('Starting...')
               
# get local database last update time
cur_local.execute('select updated_at from update_history order by id desc limit 1;')
last_update_time = cur_local.fetchone()
print(f'Last update time: {last_update_time}')

# get server last update time
cur_server.execute('select max(updated_at) from users;')
server_last_update_time = cur_server.fetchone()[0]

# if the local database has been updated before
if last_update_time:
    # get from server only the rows that have been updated
    cur_server.execute('select rfid, id, balance::float, blocked from users where updated_at > %s and rfid is not null order by rfid;', last_update_time)
else:
    # get all the rows
    cur_server.execute('select rfid, id, balance::float, blocked from users where rfid is not null order by rfid;')

rows_affected = cur_server.rowcount
print(f'Rows affected: {rows_affected}')

# store locally
while True:
    # fetch batches of 100,000 rows
    res = cur_server.fetchmany(100000)
    
    if not res:
        break
    
    cur_local.executemany("""
        insert into users(rfid, user_id, balance, blocked) values(?,?,?,?)
        on conflict(rfid) do update
        set balance = excluded.balance, blocked = excluded.blocked;
    """, res)

conn_local.commit()

# stop the timer
end = time.time()
time_taken = round(end - start, 2)
print(f'End. Took {time_taken}s')

# insert the results into the log table
cur_local.execute('insert into update_history (rows_affected, time_taken, updated_at) values(?,?,?)', (rows_affected, time_taken, server_last_update_time))
conn_local.commit()

# close the connections
conn_local.close()
conn_server.close()

If the local database has never been updated, we download the entire table from the server. If it already has data, we download only the rows that have been updated since the last synchronization.

Extrapolating from the ~2 million active URBS cards, we populated our server with 10 million users and then performed several UPDATEs on the table, measuring the time spent on each of these operations.
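
The population step itself is not shown here; a rough sketch of how the server could be seeded with synthetic users and then partially updated is given below (the column names come from the queries in the test script, but the data generation and the 10% update are purely illustrative):

import psycopg2
import random
import uuid

# same credentials dict as in the test script
REMOTE_DB = {'database': '', 'user': '', 'password': '', 'host': '', 'port': ''}

conn = psycopg2.connect(**REMOTE_DB)
cur = conn.cursor()

# seed 10 million synthetic users in batches of 100,000
batch = []
for i in range(10_000_000):
    batch.append((i + 1, uuid.uuid4().hex, round(random.uniform(0, 100), 2), False))
    if len(batch) == 100_000:
        cur.executemany(
            'insert into users (id, rfid, balance, blocked) values (%s, %s, %s, %s);',
            batch
        )
        conn.commit()
        batch = []

# simulate one update round: add credit to roughly 10% (~1M) of the users,
# stamping updated_at so the incremental sync can pick these rows up
cur.execute('update users set balance = balance + 10, updated_at = now() '
            'where random() < 0.1;')
conn.commit()

conn.close()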

Results

The initial full load of 10 million rows from the PostgreSQL server into the local SQLite database on the Raspberry Pi took about 392 seconds (~6.5 minutes), roughly 25,000 rows per second.

The update operation times are shown below:

[Figures: update operation times]

Conclusion

The time to update 1M rows in the table with 10M users is ~3.3 minutes.

Considering that our local cache can be updated several times a day (every minute, for example), synchronizing only the rows that have been modified, and that roughly 1.3M rows would be affected per day (based on the 2019 URBS statistics), our system remains viable even in a real-world scenario with a high volume of data.
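
To make this concrete, a back-of-the-envelope calculation with the numbers above (assuming one incremental sync per minute, as in the hypothetical schedule mentioned):

# ~1.3M rows modified per day, spread over one sync per minute
rows_per_day = 1_331_610
syncs_per_day = 24 * 60
rows_per_sync = rows_per_day / syncs_per_day
print(round(rows_per_sync))                        # ~925 rows per sync

# measured: 1,000,000 rows replicated in ~3.3 minutes
rows_per_second = 1_000_000 / (3.3 * 60)
print(round(rows_per_second))                      # roughly 5,000 rows per second

# each minute-by-minute sync would therefore take well under a second,
# ignoring the fixed cost of connecting and querying the server
print(round(rows_per_sync / rows_per_second, 2))   # ~0.18 s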

It is also important to remember that the database is not locked during the update, allowing the turnstile to function normally while the update takes place.

We did not formally measure the time of the card lookup on the users table, but since the card id column is the primary key, and therefore indexed, performance is excellent: the lookup is effectively instantaneous and does not interfere with the turnstile's flow.
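
For illustration, the lookup performed at the turnstile could be as simple as the sketch below (the function name and the card id value are illustrative; rfid is the primary key used in the ON CONFLICT clause of the test script):

import sqlite3

def check_card(conn, rfid):
    """Return (user_id, balance, blocked) for a card, or None if it is unknown."""
    cur = conn.execute(
        'select user_id, balance, blocked from users where rfid = ?;',
        (rfid,)
    )
    return cur.fetchone()

# example usage at the turnstile
conn = sqlite3.connect('benchmark_db/benchmark_db.db')
row = check_card(conn, '04a224b2c35e80')   # illustrative NFC card id
if row is None or row[2]:                  # unknown or blocked card
    print('Access denied')
elif row[1] <= 0:                          # no balance
    print('Insufficient balance')
else:
    print('Access granted')
conn.close()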