Razorback


An Improvised Threadripper, XingMPEG Woes Resolved

January 18, 2023 at 12:39 AM
Category: Hardware

How many people have looked at a 64-core Threadripper and said "gee, I wish I had that"? The potential for handling heavy loads is practically limitless with something like that, but with its price being far out of reach for the majority of consumers, how is one supposed to attain that level of computing power? Or... this should be asked first: what would it even be used for?

I've continued to favor AMD mainly because they seem to be the ones emphasizing more threads, which I just so happen to have a need for in various situations. As things stand, the Ryzen 9 5950X is still the best CPU to get if you want to assemble a powerful workstation without absolutely obliterating your wallet. I happened to acquire a second one late last year for my secondary workstation, primarily to improve speeds in first-stage lossless encodes for the Razorback incarnations of Hardcore Windows. Some of the more intensive parts of Sunfish definitely gave it a workout.

Clustering with GNU Parallel

But being such a powerful CPU, it would be a shame to leave it separate from the one in my primary workstation. As a full-time Linux user, I already had some powerful tools on hand, including custom Bash scripts to ease the process of publishing videos, as well as GNU Parallel, a program that basically takes many commands or arguments from some input source and executes them in parallel, as the name implies. It's a surefire way to make much better use of your CPU, especially when the commands you'd be running, say, for image conversion, may only take up one thread at a time.

I often try to provide video clips in an old codec like Cinepak wherever possible for the sake of greater accessibility on older hardware, but the Cinepak encoder found in FFmpeg is single-threaded, and it is also SLOOOOOW! So, imagine encoding, say, 10 or 20 of these things in a list... that's an absolute nightmare of a wait if I've ever seen one, and tons of wasted potential.

But by running these encodes in parallel, I use as much of my CPU as is available, and I save myself entire minutes getting these clips out the door. This became immediately apparent as soon as I started using GNU Parallel to replace some for loops in one of my scripts.
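
A minimal sketch of that kind of change, assuming a handful of .avi sources. The file names and encoder options are illustrative placeholders, not the ones from the actual script:

```shell
#!/usr/bin/env bash
# Sketch: turn a serial for loop of Cinepak encodes into a command list.
# Input names and FFmpeg options are illustrative assumptions.

# Print one FFmpeg command per input file given as an argument.
cinepak_commands() {
	local src
	for src in "$@"; do
		# ${src%.*} strips the extension, e.g. clip01.avi -> clip01
		printf 'ffmpeg -n -i %q -c:v cinepak %q\n' \
			"$src" "${src%.*}_cinepak.avi"
	done
}

# Serial version (one encode at a time, one busy thread):
#   for f in *.avi; do
#       ffmpeg -n -i "$f" -c:v cinepak "${f%.*}_cinepak.avi"
#   done
# Parallel version: pipe the same commands into GNU Parallel:
#   cinepak_commands *.avi | parallel

cinepak_commands clip01.avi clip02.avi
```

Run as is, this only prints the two commands. Piping the output into parallel, as in the comment, is what actually runs them side by side.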

So, how can two thread-heavy CPUs be used together? Dual socket motherboards have been increasingly falling out of favor outside of servers, so there's no way I'd ever see one show up. But you don't necessarily need a bunch of cores to be closely packed together, for it turns out GNU Parallel has a sort of clustering capability, as in the ability to combine two or more computers together to run even more jobs at once.

Primary workstation with a liquid cooler

After I had finished creating all the lossless UtVideo AVIs from Premiere Pro on the secondary workstation (at least prior to making last-minute revisions as the series was being rolled out here), I had 29 baseline MP4 files I needed to create, plus plenty more for reduced resolutions. I went by this guide to try to get something set up which didn't require the use of the --transfer option, applying a few small tweaks of my own to get it working for me.

While it is true that FFmpeg is already multithreaded in most typical cases, it may not entirely use up your CPU depending on what it's processing, so it may be worth bumping up the parallel job count per machine if you see you're not saturating yours; 2-4 might be the sweet spot for 16-core CPUs.

Secondary workstation using an air cooler instead, free of RGB annoyances

By booting a Linux environment over PXE to create the second node, I had in effect made myself something close to a 32-core Threadripper Pro 5975WX for a lower upfront cost! It saved me a ton of time getting the videos encoded, but I found myself having trouble with sshfs-ing the local machine onto itself. I ended up not returning to clustering after getting the bulk of the work finished, at least until I came around to writing this article.

I've been really wanting to show how this works for a while now. Rather than trying to recreate the MP4s temporarily, I figured I'd try something else - creating MPEG-1 clips out of the videos for old computers to be able to play back. I changed up the methodology this time around; rather than having a "cluster mode" shell activated, I'd create a quick and dirty script of my own that takes a list of commands, prepares all the remote machines with sshfs mounts, executes Parallel with the command list using the cluster profile, then immediately unmounts on all the applicable machines.

#!/usr/bin/env bash

if [ -z "$1" ]; then
	# Report the usage error on stderr and exit with a failure status
	echo "A parallel command file must be specified" \
		"(optionally followed by job count)" >&2
	exit 1
fi

master="$(hostname)"
crypt="arcfour"

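# -n replaces a stale link instead of descending into it
ln -sfn / "$HOME/parallel_cluster"

# Mount the master's root on every remote host (":" is the local machine)
for host in $(grep -v ":" "${HOME}/.parallel/sshloginfile"); do
	ssh -t "$host" "nohup sshfs -o cipher=$crypt \
	-o follow_symlinks \
	$master:/ parallel_cluster" > /dev/null
done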

# Job count defaults to 0, which tells Parallel to run as many as possible
jobs="${2:-0}"

curd=$(pwd)
cd "$HOME/parallel_cluster$curd"
parallel -Jcluster -j"$jobs" \
--workdir "$HOME/parallel_cluster$curd" :::: "$1"

# Unmount on the remote hosts, then drop the temporary link
for host in $(grep -v ":" "$HOME/.parallel/sshloginfile"); do
	ssh -t "$host" "fusermount -u parallel_cluster"
done

rm "$HOME/parallel_cluster"

The key difference here is that rather than using sshfs on the master machine to itself, I create a temporary symbolic link, and I also replace its hostname in sshloginfile with : to represent the local machine. The script is executed with argument 1 being the command list file, and 2 being the number of jobs to run at once (default is as many as possible).
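
For reference, a minimal sketch of the two GNU Parallel files this setup relies on. The host name "secondary" and the profile contents are assumptions, and the sketch writes to a scratch directory rather than the real ~/.parallel so nothing gets overwritten:

```shell
#!/usr/bin/env bash
# Sketch of the Parallel config; "secondary" is a hypothetical host name.
conf_dir=$(mktemp -d)

# sshloginfile: one login per line. ":" means "run on the local machine
# without ssh"; every other line is a remote host reached over ssh.
cat > "$conf_dir/sshloginfile" <<'EOF'
:
secondary
EOF

# Profile named "cluster", selected with -Jcluster. The special value ".."
# tells Parallel to read ~/.parallel/sshloginfile.
cat > "$conf_dir/cluster" <<'EOF'
--sshloginfile ..
EOF

# The script's host loop skips the ":" entry, leaving only remote hosts:
grep -v ":" "$conf_dir/sshloginfile"
```

Copied into ~/.parallel, these would make the local machine and one remote host both eligible for jobs, while the mount and unmount loops only touch the remote one.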

Of course, then I would actually need to generate the command list... I basically ran some cheap method to divide each video's duration by a certain number that grows as the video does. It's not great, but it works enough to keep individual video clips small enough. Some don't fit under the 32MB threshold that causes IE3 in Windows 3.1x to freeze in its download, but I didn't want the segment count to become astronomically large for some of them. Oh well.
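
A rough sketch of how such a list could be generated. The segment-length formula, the bitrates, and the file names are made-up illustrative values, not the ones actually used:

```shell
#!/usr/bin/env bash
# Sketch: split a video into segments whose length grows with its duration.
# The formula and the bitrates below are illustrative assumptions.

# Segment length in seconds: a 120 s base plus a tenth of the duration.
segment_length() {
	echo $(( 120 + $1 / 10 ))
}

# Number of segments needed to cover the duration (rounded up).
segment_count() {
	local dur=$1 len
	len=$(segment_length "$dur")
	echo $(( (dur + len - 1) / len ))
}

# Print one FFmpeg MPEG-1 command per segment of the given file.
# Usage: segment_commands FILE DURATION_IN_SECONDS
segment_commands() {
	local src=$1 dur=$2 len count i part
	len=$(segment_length "$dur")
	count=$(segment_count "$dur")
	for (( i = 0; i < count; i++ )); do
		part=$(printf '%02d' $(( i + 1 )))
		printf 'ffmpeg -n -ss %d -t %d -i %q -c:v mpeg1video -b:v 1150k -c:a mp2 -b:a 128k -ar 44100 %q\n' \
			$(( i * len )) "$len" "$src" "${src%.*}_part${part}.mpg"
	done
}

# A 600-second video gets 180-second segments, so four commands:
segment_commands episode.avi 600
```

Because the segment length grows along with the duration, a video six times as long only gets twice as many segments here (8 for 3600 seconds), which keeps the list from ballooning.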

Terminals showing progress of Parallel, CPU usage, and files created

Sure enough, both machines are receiving work to do. The load between the two seems a bit uneven, but the entire list of 227 generated FFmpeg commands completes a little under twice as fast with two 5950X CPUs: 27 minutes with both, versus 44 with just one. Both trials ran 6 jobs at once.
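
Working the numbers out from those two timings (44 and 27 minutes across two machines), here is a quick sketch that uses awk for the floating-point division. The function name is just a placeholder:

```shell
#!/usr/bin/env bash
# Speedup and scaling efficiency from the timings quoted above.
# Usage: report_speedup SINGLE_MINUTES CLUSTER_MINUTES MACHINE_COUNT
report_speedup() {
	awk -v t1="$1" -v t2="$2" -v n="$3" 'BEGIN {
		speedup = t1 / t2
		printf "speedup: %.2fx\n", speedup
		printf "efficiency: %.0f%%\n", speedup / n * 100
	}'
}

report_speedup 44 27 2
# speedup: 1.63x
# efficiency: 81%
```

So the second machine contributes roughly 0.63 of a full 5950X in this run, which fits the uneven load seen between the two.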

I strongly advise you write to an SSD when running batch processes like these in parallel, as you are absolutely going to add more random seeks the more jobs you run at once. If you can't do that for any reason, like space constraints, or you're experimenting with your settings and don't want to wear it down, you could also try using a RAM disk. If you can use it, ECC memory will help ensure data integrity as you run through the procedure; unfortunately, it is still not widely supported on serious consumer platforms.
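
One low-effort way to try the RAM disk route on Linux is /dev/shm, which most distributions mount as tmpfs. This is only a sketch under that assumption; the directory prefix "encode" is arbitrary:

```shell
#!/usr/bin/env bash
# Sketch: prefer a RAM-backed scratch directory, fall back to the temp dir.
pick_scratch_dir() {
	if [ -d /dev/shm ] && [ -w /dev/shm ]; then
		# /dev/shm is normally tmpfs, so files here live in RAM
		mktemp -d /dev/shm/encode.XXXXXX
	else
		mktemp -d "${TMPDIR:-/tmp}/encode.XXXXXX"
	fi
}

scratch=$(pick_scratch_dir)
echo "writing intermediate files to $scratch"
```

Anything left in a tmpfs directory counts against memory, so results should be moved to permanent storage and the scratch directory removed once the batch finishes.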

I Got My MPEGs Working with XingMPEG!

XingMPEG playing Hardcore Windows 95 in Windows for Workgroups 3.11

From the beginning, I had hopes of supplying videos compatible with older platforms alongside the MP4 files in the Hardcore Windows series; that's the initial reason why the segments were converted to 4:3 in the first place. While making AVI files in Cinepak or Microsoft Video 1 is pretty straightforward with FFmpeg, I have had more trouble with MPEG-1 because of some things I hadn't really caught onto until now, quirks which cause some ancient MPEG players to flip out or lag severely.

While a modern video player would've taken the MPEG files I had up there previously just fine, the ones I wanted to target did not. As I started looking over more options and experimenting with what works, this is what I found:

  • XingMPEG Player expects you to explicitly define bitrates upfront when encoding your videos; setting only a quality level does not play nicely with the program. Not specifying a bitrate can lead to a bizarre effect where, if you open the info box, the data rate for the video will be obscenely high, far more than something like TEST.MPG would be.
  • ActiveMovie from Internet Explorer 3 insists on your MPEG clip having 44.1 kHz audio; otherwise it will display "WARNING" and freak out.
  • Windows Media Player 6 may be more lenient, but it, too, will have trouble navigating a video without an explicitly defined bitrate. If you try to seek somewhere, it'll take up a ton of memory and take forever to reach the landing point.
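
Put together, the findings above translate into an FFmpeg command along these lines. This is a sketch: the bitrate values and file names are illustrative assumptions, and the "44KHz" requirement is taken to mean a 44100 Hz sample rate:

```shell
#!/usr/bin/env bash
# Sketch: build an FFmpeg command for MPEG-1 that old players accept.
# Bitrate defaults are illustrative assumptions, not the site's settings.
# Usage: mpeg1_command SRC DST [VIDEO_BITRATE] [AUDIO_BITRATE]
mpeg1_command() {
	local src=$1 dst=$2
	local vrate=${3:-1150k}   # explicit video bitrate instead of a quality level
	local arate=${4:-128k}    # explicit audio bitrate
	# -ar 44100 resamples the audio to 44.1 kHz for ActiveMovie
	printf 'ffmpeg -n -i %q -c:v mpeg1video -b:v %s -c:a mp2 -b:a %s -ar 44100 %q\n' \
		"$src" "$vrate" "$arate" "$dst"
}

mpeg1_command input.avi output.mpg
```

The important part is what is left out: no quality-based option such as -q:v, so both streams carry a fixed, declared bitrate.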

At least now that I have that worked out, MPEGs should finally be watchable from within Windows 3.1x. I've added both the Windows 3.1x and Windows 95 versions of XingMPEG to my mirrors page. I've also updated my guide on encoding videos for old computers to incorporate the MPEG-1 format.

ActiveMovie in Windows 95 playing Hardcore Windows 95

Half of Hardcore Windows is now available in this MPEG format, and the older MPEG videos have been reencoded so that they should cooperate with old video players a lot better.


Comments

UnlistedSlacker - February 01, 2023 at 03:10 PM

Will soon try out 3955wx on wrx80 under Win 7 wing.
Octal memory! If everything goes ok, later on after price drop will upgrade to 5995wx.

Xnmcv - January 21, 2023 at 12:16 AM

@Zdrmonster: It depends on the number of threads focused on Mpeg-1 in contrast to/comparison with those threads on Cinepak. If there's more for the former, then Ffmpeg processes that faster than it does to the latter. The inverse is also true when more are dedicated to Cinepak than Mpeg-1.

@re9177: For a 7995, probably around the time of its release will people do that, but benchmark results for a 5000 series threadripper are still already publicly released.

Zdrmonster - January 19, 2023 at 05:36 PM

Is FFMPEG's MPEG encoder faster than the cinepak one?

Xnmcv - January 18, 2023 at 09:37 PM

I assume the 96-core one will cost around just as much as (if not more than) the 5995WX.

re9177 - January 18, 2023 at 05:40 PM

greetz, when are we going to be able to see some povray tests + chromium compile benchmark results? :))) cheers

flatrute - January 18, 2023 at 07:41 AM

Xnmcv: ...but how much does it cost?

Xnmcv - January 18, 2023 at 05:07 AM

Just when I read this article (which says of a 64-core Threadripper), I found out that AMD's preparing to make a 96-core Threadripper in around September 2023! Now even more possibilities emerge!

Kugee (real) - January 18, 2023 at 01:09 AM

Huh, already got half of last year's article count done here in a single month. Writing this one from Windows for Workgroups 3.11 on my Abit BP6 machine, went ahead and made a few additional refinements to make this site look a tad bit more consistent in IE3 with later browsers. Its CSS implementation is very rocky.

8 comments on this page
