automatically downloading story stats

designatedvictim · May 18, 2025

Part 2

PHP:

/******************************************************************************

    Support functions in the class named 'site'

******************************************************************************/

/******************************************************************************

    Logs into the site and stores the authenticated and sessionid cookie values
    in the local cookiejar file

    RETURNS: VOID

******************************************************************************/
    function doLogin() {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );
        $login_url = $this->config->item('leLogin_url');
        $referer = $this->config->item('leReferer');
        $username = $this->config->item('leUsername');
        $password = $this->config->item('lePassword');
        $credentials = $this->config->item('leCcredentials');
        $cookiejarFile = $this->config->item('cookiejarFile');

        $ch = curl_init( $login_url );
        curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'POST' );
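        // json_encode() escapes any quotes or backslashes in the credentials, which plain string concatenation would not
        $payload = json_encode( array( 'login' => $username, 'password' => $password ) );

        // Really ought to use the encoded $credentials with CURLOPT_USERPWD, but this was faster to get working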
        curl_setopt( $ch, CURLOPT_POSTFIELDS, $payload );
        curl_setopt( $ch, CURLOPT_HTTPHEADER, array('Content-Type:application/json') );

        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
        // curl_setopt( $ch, CURLOPT_USERPWD, $credentials );

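        // NOTE: these two lines disable TLS certificate checks; pointing CURLOPT_CAINFO at a CA bundle (commented out below) and removing them is safer
        curl_setopt( $ch, CURLOPT_SSL_VERIFYHOST, false );
        curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );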
        curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_REFERER, $referer );

        // curl_setopt( $ch, CURLOPT_CAINFO, getcwd() . '\cacert.pem' );
        // curl_setopt($ch, CURLOPT_VERBOSE, 1);

        // Execute the login request once and keep the response body
        $thisData = curl_exec( $ch );

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " content [$thisData]" );

        $thisError = curl_errno( $ch );
        $thisErrorMsg = curl_error( $ch ) ;
        $thisHeader = curl_getinfo( $ch );

        if ( curl_errno( $ch ) ) {
            // echo 'Error:' . curl_error( $ch );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error: $thisError" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Msg: $thisErrorMsg" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Header:\n" . print_r( $thisHeader, TRUE ) );
        } else {
            $thisSize = 0;
            if ( !empty( $thisData ) ) { $thisSize = strlen( $thisData ); }
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Data Size: $thisSize" );
        }
        curl_close( $ch );

    }


/******************************************************************************

    Connect to the site, passing the authenticated and sessionid cookie values
    through the local cookiejar file, adds the auth_token cookie value

    RETURNS: VOID
    
******************************************************************************/
    function getToken() {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );

        $referer = $this->config->item('leReferer');
        $tokenURL = $this->config->item('leTokenURL');
        $cookiejarFile = $this->config->item('cookiejarFile');

        $tokenURL .= time(); // append the current unix time to the URL (presumably as a cache-buster)
        $ch = curl_init( $tokenURL );
        curl_setopt( $ch, CURLOPT_HTTPHEADER, array('accept: application/json') );
        curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'GET' );

        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );

        curl_setopt( $ch, CURLOPT_SSL_VERIFYHOST, false );
        curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );
        curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_REFERER, $referer );

        // curl_setopt( $ch, CURLOPT_CAINFO, getcwd() . '\cacert.pem' );
        // curl_setopt($ch, CURLOPT_VERBOSE, 1);

        $thisData = curl_exec( $ch );
        $thisError = curl_errno( $ch );
        $thisErrorMsg = curl_error( $ch ) ;
        $thisHeader  = curl_getinfo( $ch );

        if ( curl_errno( $ch ) ) {
            // echo 'Error:' . curl_error( $ch );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error: $thisError" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Msg: $thisErrorMsg" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Header:\n" . print_r( $thisHeader, TRUE ) );
        } else {
            $thisSize = 0;
            if ( !empty( $thisData ) ) { $thisSize = strlen( $thisData ); }
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Data Size: $thisSize" );
        }
        curl_close( $ch );

    }


/******************************************************************************

    Connect to the site, passing the authenticated, sessionid, and auth_token
    cookie values through the local cookiejar file

    RETURNS: STRING (CSV data)
    
******************************************************************************/
    function getStoryStats() {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );

        $referer = $this->config->item('leReferer');
        $endpoint = $this->config->item('leEndpoint');
        $cookiejarFile = $this->config->item('cookiejarFile');

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " referer: $referer" );
        log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " endpoint: $endpoint" );

        $ch = curl_init( $endpoint );
        curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'GET' );
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );

        curl_setopt( $ch, CURLOPT_SSL_VERIFYHOST, false );
        curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );
        curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiejarFile );
        curl_setopt( $ch, CURLOPT_REFERER, $referer );
        // curl_setopt( $ch, CURLOPT_CAINFO, getcwd() . '\cacert.pem' );
        // curl_setopt($ch, CURLOPT_VERBOSE, 1);

        $thisData = curl_exec( $ch );
        $thisError = curl_errno( $ch );
        $thisErrorMsg = curl_error( $ch ) ;
        $thisHeader = curl_getinfo( $ch );

        if ( curl_errno( $ch ) ) {
            // echo 'Error:' . curl_error( $ch );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error: $thisError" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Msg: $thisErrorMsg" );
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Error Header:\n" . print_r( $thisHeader, TRUE ) );
        } else {
            $thisSize = 0;
            if ( !empty( $thisData ) ) { $thisSize = strlen( $thisData ); }
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " Data Size: $thisSize" );
        }
        curl_close( $ch );

        return ( $thisData );

    }
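getStoryStats() returns whatever body the endpoint sends back, so if the cookies have gone stale the "CSV" could be an error page or JSON instead. A minimal sanity check before saving, assuming the header row carries the column names importStoryStats() (Part 3) reads; looksLikeStatsCsv() is a hypothetical helper, not part of the original code:

```php
<?php
// Hypothetical helper (not part of the original code): checks that a download looks like
// the stats CSV before it is saved or imported.
// ASSUMPTION: the header row carries the column names importStoryStats() reads.
function looksLikeStatsCsv( $data ): bool {
    // curl_exec() returns FALSE on failure; an empty body is no use either
    if ( !is_string( $data ) || trim( $data ) === '' ) {
        return false;
    }
    // Only the first line (the header row) is inspected
    $firstLine = strtok( $data, "\r\n" );
    $headers = array_map( 'trim', str_getcsv( $firstLine, ',', '"', '' ) );
    $required = array( 'Name', 'Rate', 'View count', 'Votes count' );
    // Every required column must be present in the header row
    return count( array_diff( $required, $headers ) ) === 0;
}

// Possible use, after $csv = $this->getStoryStats():
// if ( !looksLikeStatsCsv( $csv ) ) { /* re-authenticate and retry rather than saving the file */ }
```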

designatedvictim · May 18, 2025

Part 3

PHP:

/******************************************************************************

    Read configured active fileStatsDir for file names which start with the
    string $statsFilePrefix

    If $includeArchive is FALSE, only read the active fileStatsDir, so older
    source files can be moved to inactive storage and ignored

    If $includeArchive is TRUE, extend the list to include the contents of the
    fileStatsArchiveDir folder as well, attempting to re-import EVERYTHING

    RETURNS: ARRAY string list of file names, no paths
    
******************************************************************************/
    function getStatsFilesList( $includeArchive = FALSE ) {

        $fileStatsDir = $this->config->item('fileStatsDir');
        $fileStatsArchiveDir = $this->config->item('fileStatsArchiveDir');
        $statsFilePrefix = $this->config->item('statsFilePrefix');

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ . "" );
        $files = scandir( $fileStatsDir );
        $fileList = [];
        foreach( $files as $thisFile ) {
            // log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " primary dir [$fileStatsDir]" );
            $pieces = explode( '_', $thisFile );
            if ( !empty( $pieces[1] ) ) {
                if ( str_starts_with( $pieces[0], $statsFilePrefix ) ) {
                    $fileList[] = $thisFile;
                }
            }
        }

        if ( $includeArchive ) {
            $files = scandir( $fileStatsArchiveDir );
            foreach( $files as $thisFile ) {
                // log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " secondary dir [$fileStatsArchiveDir]" );
                $pieces = explode( '_', $thisFile );
                if ( !empty( $pieces[1] ) ) {
                    if ( str_starts_with( $pieces[0], $statsFilePrefix ) ) {
                        $fileList[] = $thisFile;
                    }
                }
            }
        }
        asort( $fileList );
        return $fileList;

    }


/******************************************************************************

    Traverse file list, searching the fileStatsDir folder with failover to the
    fileStatsArchiveDir folder

    Save data to database

    RETURNS: VOID
    
******************************************************************************/
    function importStoryStats( $fileList, $recordsCount, $deltasUpdated ) {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );

        $fileStatsDir = $this->config->item('fileStatsDir');
        $fileStatsArchiveDir = $this->config->item('fileStatsArchiveDir');
        $statsFilePrefix = $this->config->item('statsFilePrefix');
        $dbName = $this->config->item('dbName');
        $tableName = $this->config->item('tableName');

        $loopPadding = '0000000';

        $insertCount = 0;
        $loop = 0;
        foreach( $fileList as $thisFile ) {
            $filePath = $fileStatsDir . $thisFile;
            if ( !file_exists( $filePath ) ) {
                $filePath = $fileStatsArchiveDir . $thisFile; // it can only be in one of two places
            }
            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " filePath [$filePath]" );
            $pathParts = pathinfo( $filePath );
            $pieces = explode( '_', $pathParts['filename'] );
            $fileTime = $pieces[1];
            $fileTimePieces = explode( ' ', $fileTime );
            $thisFileDateStamp = trim( $fileTimePieces[0] );
            $thisFileTimeStamp = trim( $fileTimePieces[1] );
            $fileTimeParts = explode( '-', $thisFileTimeStamp );
            $fileHours = $fileTimeParts[0];
            $fileMinutes = $fileTimeParts[1];
            $fileSeconds = $fileTimeParts[2];

            // Read files into memory
            if ( $pathParts['extension'] == 'csv' ) {
                $csvData = []; // reset per file, so a failed fopen() can't re-process the previous file's rows
                if ( ( $handle = fopen( $filePath, "r" ) ) !== FALSE ) {
                    $csvHeaders = [];
                    $row = 1;
                    while ( ( $csvLine = fgetcsv( $handle, 1000, "," ) ) !== FALSE ) {
                        if ( $row == 1 ) {
                            $idx = 0;
                            foreach( $csvLine as $csvValue ) {
                                $csvHeaders[ $idx ] = $csvValue;
                                $idx++;
                            }
                        } else {
                            $idx = 0;
                            $thisLine = [];
                            foreach( $csvLine as $csvValue ) {
                                $thisLine[ $csvHeaders[ $idx ] ] = $csvValue;
                                $idx++;
                            }
                            $thisLine['sourceFile'] = $thisFile;
                            $thisLine['fileDate'] = $thisFileDateStamp;
                            $thisLine['fileTime'] = $thisFileTimeStamp;
                            $thisLine['fileHour'] = $fileHours;
                            $thisLine['fileMinute'] = $fileMinutes;
                            $thisLine['fileSecond'] = $fileSeconds;
                            $csvData[] = $thisLine;
                        }
                        $row++;
                    }
                    fclose( $handle );
                    $loop++;
                }

                // Write data to DB
                foreach( $csvData as $thisStat ) {
                    // print_r( $thisStat );
                    // exit;
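                    // Match on both columns with query bindings (?), so CodeIgniter escapes the values and
                    // story names containing apostrophes don't break the SQL
                    $sql = "SELECT COUNT(*) AS cnt FROM $tableName WHERE sourceFile = ? AND storyName = ?";
                    $query = $this->db->query( $sql, array( $thisStat['sourceFile'], $thisStat['Name'] ) );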
                    $row = $query->row();

                    if ( empty( $row ) ) {
                        die( print_r( sqlsrv_errors(), true) );
                    } else {
                        // $row = sqlsrv_fetch_object( $results );
                        if ( ( $row->cnt ?? 0 ) == 0 ) {
                            $msg = "[" . substr( $loopPadding . $loop, -7 ) . "] Importing source [{$thisStat['sourceFile']}] story [{$thisStat['Name']}]";
                            log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " $msg" );

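                            // Explicit (float) cast, so a blank Rate counts as 0 instead of relying on implicit string-to-number conversion
                            $calculatedRateExtended = (float) trim( $thisStat['Rate'] ) * (int) trim( $thisStat['Votes count'] );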
                            $calculatedRateExtendedRounded = round( $calculatedRateExtended, 0, PHP_ROUND_HALF_UP );
                            $tmpcalculatedRate = 0;
                            if ( !empty( $thisStat['Votes count'] ) ) {
                                $tmpcalculatedRate = ( $calculatedRateExtendedRounded / (int) trim( $thisStat['Votes count'] ) );
                            }
                            $calculatedRate = number_format( $tmpcalculatedRate, 2 );

                            $favorites = $thisStat['Favorites'] ?? 0;
                            $sqlins = "INSERT INTO $tableName
                                ( sourceFile, fileDate, fileTime, fileHour, fileMinute, fileSecond, storyName, storyCategory, datePublished, rate, viewCount, votesCount, comments, favorites, readingLists, calculatedRateExtended, calculatedRateExtendedRounded, calculatedRate )
                                VALUES
                                ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )";
                            // One bound value per placeholder; CodeIgniter escapes each one, so apostrophes in story names are safe
                            $binds = array(
                                $thisStat['sourceFile'],

                                $thisStat['fileDate'],
                                $thisStat['fileTime'],
                                $thisStat['fileHour'],
                                $thisStat['fileMinute'],
                                $thisStat['fileSecond'],

                                $thisStat['Name'],
                                $thisStat['Category'],
                                $thisStat['Date Published'],
                                $thisStat['Rate'],
                                $thisStat['View count'],
                                $thisStat['Votes count'],
                                $thisStat['Comments'],
                                $favorites,
                                $thisStat['Reading lists'],

                                $calculatedRateExtended,
                                $calculatedRateExtendedRounded,
                                $calculatedRate
                            );
                            // log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " $sqlins" );
                            $query = $this->db->query( $sqlins, $binds );
                        } else {
                            $msg = "[" . substr( $loopPadding . $loop, -7 ) . "] Skipping [{$thisStat['sourceFile']}] story [{$thisStat['Name']}]";
                        }
                        // log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " $msg" );
                    }
                }
            }
        }

    }
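The three calculated* columns work backwards from the displayed average: Rate * Votes count approximates the total score, which is rounded to a whole number (presumably because a total built from whole-star votes must itself be whole) and then divided back out to give a re-derived rate. Pulled out of importStoryStats() into a standalone function (the name calculateRates() is hypothetical), the same arithmetic can be checked without a database:

```php
<?php
// Hypothetical standalone version of the calculated-rate arithmetic in importStoryStats().
// Returns the values written to calculatedRateExtended, calculatedRateExtendedRounded and calculatedRate.
function calculateRates( string $rate, string $votesCount ): array {
    $votes = (int) trim( $votesCount );
    // Displayed average multiplied back out to an (approximate) total score
    $extended = (float) trim( $rate ) * $votes;
    // Round to the nearest whole number, halves rounded up
    $extendedRounded = round( $extended, 0, PHP_ROUND_HALF_UP );
    // Re-derive the average, guarding against stories with no votes yet
    $recalculated = ( $votes > 0 ) ? $extendedRounded / $votes : 0;
    return array(
        'calculatedRateExtended'        => $extended,
        'calculatedRateExtendedRounded' => $extendedRounded,
        'calculatedRate'                => number_format( $recalculated, 2 ),
    );
}

$r = calculateRates( '4.5', '10' );
echo $r['calculatedRate'], "\n"; // prints 4.50
```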

designatedvictim · May 18, 2025

Part 4

PHP:

/******************************************************************************

    getDBData

    Assumes that all source files were recorded with filenames representing
    the time and date of the data download and sortable by time

    Limited to the rolling $windowDays window (usually 7 days)

    RETURNS: ARRAY of records keyed [storyName][sourceFile][fieldName]

******************************************************************************/
    function getDBData( $windowDays ) {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );
        $dbName = $this->config->item('dbName');
        $tableName = $this->config->item('tableName');

        $fileDateData = [];

        if ( true ) {
            // https://stackoverflow.com/questions/27599557/how-to-get-last-7-days-data-from-current-datetime-to-last-7-days-in-sql-server - comment on answer

            $whereClause = 'WHERE 1=1 AND ';
            // $whereClause = "WHERE deltasUpdated=0 AND storyName='A Week of Sunrises' AND ";
            // $sort = "storyName DESC, sourceFile DESC";
            $sort = "storyName ASC, sourceFile ASC";

            $sql = "SELECT * FROM $tableName
            $whereClause
            CAST( fileDate AS DATE ) BETWEEN
            (SELECT CAST(DATEADD(day,-$windowDays, GETDATE()) AS DATE))
            AND
            (SELECT CAST(GETDATE() AS DATE))
            ORDER BY $sort";

            $query = $this->db->query( $sql );
            $results = $query->result();

            if( empty( $results ) ) {
                log_message( 'error', __METHOD__ . ' ' . __LINE__ . ' Query error, or no records in the window, retrieving existing stats records.' );
                die( print_r( sqlsrv_errors(), true) );
            } else {
                foreach( $results as $thisRow ) {
                    $thisStoryName = '';
                    $thisFileDate = '';
                    $thisSourceFile = '';
                    foreach( $thisRow as $key=>$value ) {
                        if ( $key == 'storyName' ) {
                            $thisStoryName = $value;
                        }
                        if ( $key == 'fileDate' ) {
                            $thisFileDate = $value;
                        }
                        if ( $key == 'sourceFile' ) {
                            $thisSourceFile = $value;
                        }
                    }
                    foreach( $thisRow as $fieldName=>$fieldValue ) {
                        if ( $fieldName != 'datestamp' ) {
                            $fileDateData[$thisStoryName][$thisSourceFile][ $fieldName ] = $fieldValue;
                        }
                    }
                }
            }
        }
        return $fileDateData;

    }


/******************************************************************************

    Iterate through retrieved records, assumes at least one previously-set
    record per story

    Only updates records where deltasUpdated=0

    RETURNS: VOID

******************************************************************************/
    function processDeltas( $dataset ) {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );
        $dbName = $this->config->item('dbName');
        $tableName = $this->config->item('tableName');

        $defaultSet = array( 'sourceFile'=>'', 'viewDelta'=>0, 'votesDelta'=>0 , 'commentsDelta'=>0 , 'favoritesDelta'=>0 , 'readingListsDelta'=>0 );

        foreach( $dataset AS $story ) {
            $lastSnapshot = array(); // reset between stories
            foreach( $story AS $snapshot ) {
                // echo print_r( $snapshot, TRUE ) . "\n";
                // exit();
                $deltaCount = $snapshot['deltasUpdated'] + $snapshot['viewDelta'] + $snapshot['votesDelta'] + $snapshot['commentsDelta'] + $snapshot['favoritesDelta'] + $snapshot['readingListsDelta'];
                // log_message( 'trace', __METHOD__ . ' ' . __LINE__ . " deltasUpdated [{$snapshot['deltasUpdated']}]" );

                if ( empty( $snapshot['deltasUpdated'] ) ) { // only update for records with no deltas - new/not updated/no changes
                    if ( empty( $lastSnapshot ) ) {
                        $viewDelta = $snapshot['viewCount'];
                        $votesDelta = $snapshot['votesCount'];
                        $commentsDelta = $snapshot['comments'];
                        $favoritesDelta = $snapshot['favorites'];
                        $readingListsDelta = $snapshot['readingLists'];
                    } else {
                        $viewDelta = $snapshot['viewCount'] - $lastSnapshot['viewCount'];
                        $votesDelta =  $snapshot['votesCount'] - $lastSnapshot['votesCount'];
                        $commentsDelta = $snapshot['comments'] - $lastSnapshot['comments'];
                        $favoritesDelta = $snapshot['favorites'] - $lastSnapshot['favorites'];
                        $readingListsDelta = $snapshot['readingLists'] - $lastSnapshot['readingLists'];
                    }
                    $msg = __METHOD__ . ' ' . __LINE__ . " Updating stats-deltas for [{$snapshot['sourceFile']}] [{$snapshot['storyName']}]";
                    log_message( 'trace', "$msg" );
                    // Query bindings (?) let CodeIgniter escape the values, so story names containing apostrophes don't break the UPDATE
                    $sql = "UPDATE $tableName
                    SET viewDelta=?, votesDelta=?, commentsDelta=?, favoritesDelta=?, readingListsDelta=?, deltasUpdated=1
                    WHERE sourceFile=? AND storyName=? AND deltasUpdated=0";

                    $this->db->query( $sql, array( $viewDelta, $votesDelta, $commentsDelta, $favoritesDelta, $readingListsDelta, $snapshot['sourceFile'], $snapshot['storyName'] ) );
                } else {
                    $msg = __METHOD__ . ' ' . __LINE__ . " Skipping stats-deltas for [{$snapshot['sourceFile']}] [{$snapshot['storyName']}]";
                }
                // log_message( 'trace', "$msg" );

                $lastSnapshot = $snapshot;
            }
        }

    }


/******************************************************************************

    How many records in table

    Used to determine when DB has been wiped to trigger global re-import

    RETURNS: INT

******************************************************************************/
    function dbRecordsCount() {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );
        $dbName = $this->config->item('dbName');
        $tableName = $this->config->item('tableName');

        $sql = "SELECT COUNT(*) AS cnt FROM $tableName";
        $query = $this->db->query( $sql );
        $row = $query->row();

        return $row->cnt;

    }


/******************************************************************************

    How many records in table where deltasUpdated=1

    Used to determine when DB has been wiped to trigger global re-import
    
    RETURNS: INT

******************************************************************************/
    function dbDeltasUpdatedCount() {

        log_message( 'trace', __METHOD__ . ' ' . __LINE__ );
        $dbName = $this->config->item('dbName');
        $tableName = $this->config->item('tableName');

        $sql = "SELECT COUNT(*) AS cnt FROM $tableName WHERE deltasUpdated=1";
        $query = $this->db->query( $sql );
        $row = $query->row();

        return $row->cnt;

    }


/******************************************************************************

    MS SQL design for the database table I use

    The *Delta fields store the change from the previous snapshot of the same
    story. They're calculated *once*, after import, by processDeltas() (which
    then sets deltasUpdated=1), so I can avoid re-deriving them under most
    simple uses later on

    Otherwise adjust schema as needed

******************************************************************************/

USE [LitStats]
GO
/****** Object:  Table [dbo].[myStories]    Script Date: 5/18/2025 11:48:29 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[myStories](
    [id] [int] IDENTITY(100000,1) NOT NULL,
    [datestamp] [datetime] NOT NULL,
    [sourceFile] [varchar](63) NULL,
    [fileDate] [varchar](15) NULL,
    [fileTime] [varchar](15) NULL,
    [fileHour] [varchar](7) NULL,
    [fileMinute] [varchar](7) NULL,
    [fileSecond] [varchar](7) NULL,
    [storyName] [varchar](127) NULL,
    [storyCategory] [varchar](63) NULL,
    [datePublished] [varchar](15) NULL,
    [rate] [decimal](5, 2) NULL,
    [viewCount] [int] NULL,
    [votesCount] [int] NULL,
    [comments] [int] NULL,
    [favorites] [int] NULL,
    [readingLists] [int] NULL,
    [deltasUpdated] [int] NULL,
    [viewDelta] [int] NULL,
    [votesDelta] [int] NULL,
    [commentsDelta] [int] NULL,
    [favoritesDelta] [int] NULL,
    [readingListsDelta] [int] NULL,
    [calculatedRateExtended] [float] NULL,
    [calculatedRateExtendedRounded] [int] NULL,
    [calculatedRate] [decimal](5, 2) NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[myStories] ADD  CONSTRAINT [DF_myStories_datestamp]  DEFAULT (getdate()) FOR [datestamp]
GO
ALTER TABLE [dbo].[myStories] ADD  CONSTRAINT [DF_myStories_deltasUpdated]  DEFAULT ((0)) FOR [deltasUpdated]
GO
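processDeltas() compares each snapshot of a story with the one before it; getDBData() sorts by sourceFile, which gives chronological order only because the file names sort by time. The first snapshot with no predecessor gets its raw totals as its "delta". The same comparison as a database-free sketch over one story's ordered snapshots (computeDeltas() is a hypothetical name; the column names are the ones in the schema above):

```php
<?php
// Hypothetical pure-function version of the delta step in processDeltas().
// $snapshots: one story's rows, ordered oldest to newest and keyed by sourceFile,
// the same shape getDBData() returns for each story.
function computeDeltas( array $snapshots ): array {
    // Count column => delta column, matching the myStories schema
    $fields = array(
        'viewCount'    => 'viewDelta',
        'votesCount'   => 'votesDelta',
        'comments'     => 'commentsDelta',
        'favorites'    => 'favoritesDelta',
        'readingLists' => 'readingListsDelta',
    );
    $deltas = array();
    $previous = null;
    foreach ( $snapshots as $key => $snapshot ) {
        $row = array();
        foreach ( $fields as $countField => $deltaField ) {
            $current = (int) ( $snapshot[ $countField ] ?? 0 );
            // With no earlier snapshot, the whole current count is treated as the change
            $row[ $deltaField ] = ( $previous === null )
                ? $current
                : $current - (int) ( $previous[ $countField ] ?? 0 );
        }
        $deltas[ $key ] = $row;
        $previous = $snapshot;
    }
    return $deltas;
}
```

processDeltas() only writes the rows still flagged deltasUpdated=0, while still stepping its previous-snapshot pointer through the already-updated ones; a caller of this sketch would filter the same way.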

designatedvictim · May 18, 2025

And... once you get this in the database form you prefer, you can do crap like this:

designatedvictim · May 21, 2025

NotWise said:
I was wondering why we had this difference, aside from the fact that auth tokens from different sources have different expiry. I forgot that my code logs in and out each time it downloads the file. That might be the difference.

Just thought I'd mention it, because it surprised me after re-reading these comments: the unix-timestamp expiry on the authenticated and sessionid cookies from my most recent import is a year out.

The auth_token looks like a session cookie (its expiry is set to 0).

NotWise · May 21, 2025

@designatedvictim, I don't know enough about authentication to see the implications in that. Can you explain what it means?

designatedvictim · May 22, 2025

NotWise said:
@designatedvictim, I don't know enough about authentication to see the implications in that. Can you explain what it means?

I'm no expert, and it seems that you're reasonably knowledgeable about programming and geekiness in general, so you probably know most of this already.

This is mainly for any audience that's still awake.

Per your original BASH script (thanks, again, for that), there are three primary components to the process: authentication, token, and file retrieval.

When my doLogin() function runs, I build the cURL call which passes the login credentials and retrieves the login cookie values from the connection with the server.

This data is dropped by cURL into the defined cookiejar file.

The cookiejar file is a text file made up of lines representing individual cookies. This document, https://curl.se/docs/http-cookies.html, describes the cookiejar contents.

Each line has seven tab-delimited components: the relevant domain/subdomain to which that line's cookie applies, a boolean flag to apply it to any subdomains, the path in the server tree to apply it to, an HTTPS-only flag, the expiration date in unix-time format, the cookie name, the cookie value.
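To inspect those seven fields yourself, the jar can be read back with a few lines of PHP. A minimal sketch, assuming the tab-separated Netscape format from the curl document above; curl writes HttpOnly cookies with a `#HttpOnly_` prefix on the domain, so that prefix is stripped instead of the line being skipped as a comment. readCookieJar() is a hypothetical helper:

```php
<?php
// Hypothetical helper: reads a cURL (Netscape-format) cookiejar into an array keyed by cookie name.
function readCookieJar( string $path ): array {
    $lines = file( $path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
    if ( $lines === false ) {
        return array(); // unreadable or missing jar file
    }
    $cookies = array();
    foreach ( $lines as $line ) {
        $httpOnly = false;
        if ( str_starts_with( $line, '#HttpOnly_' ) ) {
            // curl marks HttpOnly cookies by prefixing the domain field with "#HttpOnly_"
            $line = substr( $line, strlen( '#HttpOnly_' ) );
            $httpOnly = true;
        } elseif ( str_starts_with( $line, '#' ) ) {
            continue; // ordinary comment line
        }
        // Seven tab-separated fields: domain, subdomains flag, path, HTTPS-only flag, expiry, name, value
        $fields = explode( "\t", $line );
        if ( count( $fields ) < 7 ) {
            continue; // not a cookie line
        }
        [ $domain, $subdomains, $cookiePath, $secure, $expires, $name, $value ] = $fields;
        $cookies[ $name ] = array(
            'domain'     => $domain,
            'subdomains' => ( $subdomains === 'TRUE' ),
            'path'       => $cookiePath,
            'httpsOnly'  => ( $secure === 'TRUE' ),
            'expires'    => (int) $expires, // unix time; 0 means a session cookie
            'value'      => $value,
            'httpOnly'   => $httpOnly,
        );
    }
    return $cookies;
}
```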

So upon calling doLogin(), we get the authenticated cookie set to 0/1 and the sessionid, if authenticated, which looks to be a 32-digit hex value (16 bytes) representing the unique id of your authenticated session on the server.

Upon success, the previously empty cookiejar now has two cookies in it.

In my case, I'd checked the cookies set on my last connection made at 8:01PM 5/21/25 EDT. The expiration time on the two cookie values according to https://www.unixtimestamp.com/ is set to Thu May 21 2026 20:01:03 GMT-0400 (Eastern Daylight Time).

So that means the authenticated flag and sessionid are theoretically valid for a year. More on that later.
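The conversion unixtimestamp.com does can also be done in PHP with DateTime, which makes the session-cookie case (an expiry of 0) explicit too. A small sketch; describeCookieExpiry() and its America/New_York default are illustrative choices, not anything the site or cURL defines:

```php
<?php
// Hypothetical helper: renders a cookiejar expiry (unix time) as a local date, or flags a session cookie.
function describeCookieExpiry( int $expires, string $timezone = 'America/New_York' ): string {
    if ( $expires === 0 ) {
        return 'session cookie'; // no expiry: discarded when the client session ends
    }
    $when = new DateTime( '@' . $expires );              // '@' timestamps are parsed as UTC
    $when->setTimezone( new DateTimeZone( $timezone ) ); // then shown in local time
    return $when->format( 'D M d Y H:i:s T' );
}
```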

On the next call, to getToken(), I don't pass the credentials again. I simply let the combination of the authenticated cookie value of TRUE and the full sessionid establish that I'm the one who made the earlier successful login attempt. If the server believes me, it sends back the auth_token, which cURL then adds to the cookiejar file.

The cookiejar now has three cookies in it.

But the expiration for the auth_token is set to zero, which makes it a session cookie. This cookie is non-persistent, meaning it will only be used for the lifetime of the running app. Usually that's a browser. If you have an extension to view the cookies in your browser, you would see it in there. But if you then close all instances (not just the current tab or window, but all running instances) of the browser running on the computer, then relaunch it and check the cookies again, you would see the long-term authenticated and sessionid still available, but the auth_token should be gone - it was deleted at the end of the final browser instance (more likely at the start of the first new instance of the browser when no others are still running), because then all browser sessions are over.

The next call, to getStoryStats(), which retrieves the CSV story stats, also forgoes any authentication, because the cookiejar now holds all three components the API requires.

Here's where things get murky.

The sessionid claims to be valid for a year, but on high-traffic sites, sessions that haven't been used recently may age off, and internal maintenance may flush older, less-recently-used sessions to make room for new ones. So the cookie may say your session is valid for a year while the server has already purged the session it points to, in which case you'd need to log in again. The timeframe for that is impossible to guess, really.

The shorter-term timeframes discussed in-thread more likely apply to the auth_token, which is deliberately intended to be a transient token. I just re-read your message on the token, where you say your auth_tokens refresh after four hours. I'd missed that, but since auth_tokens are meant to be short-lived, under four hours is completely realistic.

I ran a quick experiment this afternoon: I commented out the doLogin() and getToken() calls, and a half-hour after the previous run, the three existing cookies were still enough to download the CSV, with no new authentication or token required.

This weekend, I might try running the script every fifteen minutes or so to see how long it keeps retrieving the file before the auth_token expires at the server end. Anything from 10 minutes to a day is reasonable, but I now know the auth_token is good for at least 40 minutes.

In my case, I figure it's simpler to just re-authenticate each run, rather than checking to see if the existing cookies still let me connect.

After all this, I have to ask: is that what you were looking for?

This might have been TMI.

Edited to add:

As stated below, I ran a test, and it looks like the auth_token expires at the Lit server end after 60 minutes. It's quite possible that the auth_token's time-to-live is reset when a browser uses it on a page load. This script doesn't do that, and repeated use of the token by my script doesn't extend its life at the server end, so the site may simply re-issue one in the background when it expires.

If the auth_token used in the third step, getStoryStats(), has expired, you should only need to redo step two, getToken(), before trying again, thanks to the longevity of the original authenticated and sessionid cookies. Only if that fails should you need to go back to step one, doLogin().
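That escalation order (retry with the cookies already in the jar, then refresh only the token, then log in from scratch) can be sketched with callables so the logic stays independent of cURL. This assumes a failed step shows up as FALSE or as a body that fails some validity check; how the site actually responds to an expired auth_token isn't established in this thread. fetchStatsWithFallback() is a hypothetical name:

```php
<?php
// Hypothetical sketch of the fallback order: download, then token refresh, then full login.
// $fetch:        returns the CSV body, or FALSE on failure (e.g. wrapping getStoryStats())
// $refreshToken: re-runs the token step (e.g. getToken()); its return value is ignored
// $login:        re-runs the login step (e.g. doLogin()); its return value is ignored
// $isValid:      decides whether a body is usable (ASSUMPTION: some check on the CSV)
function fetchStatsWithFallback( callable $fetch, callable $refreshToken, callable $login, callable $isValid ) {
    // 1. Try with whatever cookies are already in the jar
    $data = $fetch();
    if ( $isValid( $data ) ) {
        return $data;
    }
    // 2. The long-lived authenticated/sessionid cookies may still be good: refresh only the auth_token
    $refreshToken();
    $data = $fetch();
    if ( $isValid( $data ) ) {
        return $data;
    }
    // 3. Last resort: full login, a new token, then one more attempt
    $login();
    $refreshToken();
    $data = $fetch();
    return $isValid( $data ) ? $data : false;
}
```

Inside the class, the first three arguments could be closures such as `fn() => $this->getStoryStats()`, `fn() => $this->getToken()` and `fn() => $this->doLogin()`.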

In an earlier post I said that the 'magic happens in the cookiejar file.' By that I mean that because every call defines the same fixed cookiejar file, cURL saves the cookies the site sends back without your code needing to handle them itself. Likewise, on the two later calls, cURL forwards the domain-relevant cookies to the site without you explicitly asking it to.

NotWise · May 22, 2025

That was pretty much what I was looking for. Thanks for the explanation.

iwatchus · May 22, 2025

I thought it was a nice explanation and I occasionally have to teach this shit.

designatedvictim · May 23, 2025

FYI:

I ran the repeat-usage-of-the-old-auth_token test tonight and it looks like the token expires after 60 minutes.

Worked at 59 minutes, not at 74 minutes.
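With a measured lifetime of roughly an hour, a script that wanted to skip re-authentication could record when it last ran getToken() and refresh only as the token nears that limit. A sketch under that assumption: the 60-minute TTL is the value measured here (worked at 59 minutes, failed at 74), not a documented limit, and the five-minute safety margin is arbitrary. tokenProbablyValid() is hypothetical:

```php
<?php
// Hypothetical check: is an auth_token fetched at $fetchedAt probably still accepted?
// ASSUMPTION: ~60-minute server-side lifetime, as measured in this thread; not documented by the site.
function tokenProbablyValid( int $fetchedAt, ?int $now = null, int $ttlSeconds = 3600, int $marginSeconds = 300 ): bool {
    $now = $now ?? time();
    // Treat the token as expired a little early so a download doesn't race the cutoff
    return ( $now - $fetchedAt ) < ( $ttlSeconds - $marginSeconds );
}
```

Because repeated use didn't extend the token's life in the test above, $fetchedAt should be the time getToken() ran, not the time of the last download.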
