Compare commits

No commits in common. "a559e92d3e96bd882003b193b1511ee9f2da46c4" and "532fc4ec230e2e9f5866dc46275e0e5f4e42f1a3" have entirely different histories.

11 changed files with 287 additions and 405 deletions

@ -10,7 +10,6 @@
"gnudb",
"newfn",
"RDONLY",
"cdparanoia",
"TTITLE"
]
}

@ -7,17 +7,24 @@ RUN true \
&& sed -i 's/main$/main contrib non-free/' /etc/apt/sources.list \
&& apt-get -y update \
&& DEBIAN_FRONTEND=noninteractive apt-get --no-install-recommends -y install \
ffmpeg \
handbrake-cli libavcodec-extra \
abcde eyed3 \
glyrc setcd eject \
dvdbackup \
libdvd-pkg libdvdcss2 \
handbrake-cli libavcodec-extra \
cd-discid cdparanoia lame \
python3 \
python3-slugify \
cowsay \
&& true
RUN dpkg-reconfigure libdvd-pkg
RUN true \
&& DEBIAN_FRONTEND=noninteractive apt-get --no-install-recommends -y install \
lame \
busybox \
jq \
procps \
moreutils \
cowsay
COPY src/* /app/

@ -11,23 +11,9 @@ and then re-encode the content to a compressed format.
At the time I'm writing this README, it will:
* Rip audio CDs, look them up in cddb, encode them to VBR MP3, then tag them.
* It also writes a shell script you can modify to quickly change the tags, since this is a pretty common thing to want to do.
* ~~Rip audio CDs, look them up in cddb, encode them to VBR MP3, then tag them.~~ A rewrite broke this; I plan to fix it soon.
* Rip video DVDs, transcode them to mkv
## Requirements
The requirements are fairly light: a few CD tools, cdparanoia, HandBrakeCLI, and some
DVD libraries.
Most notably, you do *not* need a relational database (SQLite, Postgres, MySQL).
You just need a file system.
For a complete list of requirements,
look at the [Dockerfile](Dockerfile)
to see what Debian packages it installs.
## How To Run This
You need a place to store your stuff.
@ -41,8 +27,9 @@ Mine is `/srv/ext/incoming`.
-v /srv/ext/incoming:/incoming \
registry.gitlab.com/dartcatcher/media-sucker/media-sucker
I can't get it to work with docker swarm,
which doesn't support `--device`.
I can't get it to work with docker swarm.
Presumably some magic is happening with `--device`.
It probably has something to do with selinux.
Stick a video DVD or audio CD in,
and the drive should spin up for a while,
@ -52,14 +39,9 @@ or a new directory of `.mp3` files (for audio).
You can watch what it's doing at http://localhost:8080/
## A note on filenames and tags
This program does the absolute minimum to try and tag your media properly.
Partly because I'm a lazy programmer,
but mostly because the computer can only guess at things that you,
the operator,
can just read off the box.
For DVDs, that means reading the "title" stored on the DVD,
which I've seen vary from very helpful (eg. "Barbie A Fashion Fairytale")
@ -73,10 +55,13 @@ so CDDB takes the length of every track in seconds and tries to match that
against something a user has uploaded in the past.
This is wrong a whole lot of the time.
But the end result in almost every case is that you're going to have to
rename the movie file, or re-tag the audio files.
This is why you get a `tag.sh` file with every audio CD rip.
If CDDB can't find a match for an audio CD,
this program will append the datestamp of the rip to the album name,
in the hopes that you can remember about what time you put each CD in the drive.
So for stuff like multi-CD audiobooks, that's pretty helpful.
But the end result in almost every case is that you're going to have to
manually edit the metadata.
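
To make the CDDB matching described above more concrete, here is a minimal sketch of the classic CDDB disc ID calculation. It is not code from this repository: the function names and the sample table of contents are made up for illustration, and it assumes the usual convention of track offsets counted in 1/75-second frames.

```python
FRAMES_PER_SECOND = 75  # CD audio addresses are counted in 1/75-second frames


def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    return sum(int(c) for c in str(n))


def cddb_disc_id(offsets, leadout):
    """Compute the classic 32-bit CDDB disc ID.

    offsets: start of each track, in frames (including the 150-frame lead-in)
    leadout: start of the lead-out, in frames
    """
    starts = [o // FRAMES_PER_SECOND for o in offsets]
    checksum = sum(digit_sum(s) for s in starts) % 255
    playtime = leadout // FRAMES_PER_SECOND - starts[0]
    return (checksum << 24) | (playtime << 8) | len(offsets)


# Made-up table of contents for illustration, not a real disc.
print("%08x" % cddb_disc_id([150, 15000, 30000], 45000))  # prints 08025603
```

Because the ID is derived only from track timings, two unrelated discs can collide, which is one reason the lookup is wrong so often.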
## Answers
@ -84,23 +69,35 @@ I'm skipping the part where I make up questions I think people might have.
### Why I Wrote This
The automatic-ripping-machine looks really badass.
The `automatic-ripping-machine` looks really badass.
But after multiple attempts across multiple months
to get it running,
I decided it would probably be faster just to write my own.
media-sucker isn't as cool as the automatic-ripping-machine.
This isn't as cool as the automatic-ripping-machine.
But, at least for me,
it's more useful,
in that I can get it to actually do something.
it's a lot more functional,
in that it actually does something.
### Why You Should Run This
The only reason I can think of that anybody would want to use this is if they,
like me,
are too dumb to get the automatic-ripping-machine to work.
are too dumb to get the `automatic-ripping-machine` to work.
### What Kind Of Hardware I Use
I run it on a Raspberry Pi 4,
with a Samsung DVD drive from the stone age.
## Parting note
As of 2022-08-22, large sections of this code were written under COVID brain-fog.
This means it's going to look a lot like a 13-year-old wrote it.
I hope one day to clean it up a bit,
but it's working fairly well,
despite the mess.
Please don't judge me for the organization of things.
Judge bizarro universe Neale instead.

@ -1,54 +0,0 @@
# Web Server
There is one web server,
which provides static content,
and a single entrypoint for dynamic state information.
The static content is some HTML and JavaScript,
which the browser runs to pull the dynamic state,
and update the page with current status of everything.
# Workers
There are at least two Workers:
a Reader and an Encoder.
Each Worker runs in its own thread,
and can do its job without interfering with another Worker.
## Readers
Readers monitor a device for media.
Right now, those devices are always CD-ROM drives.
As soon as media is inserted,
a MediaHandler is created to scan and then copy it.
## Encoders
Encoders wait for jobs to show up,
and then they re-invoke a MediaHandler to encode everything in that job.
# MediaHandlers
MediaHandlers have a work directory,
where they store all their stuff.
They have the following stages of execution:
1. *scan* the media to figure out its title, list of tracks, and other metadata
2. *copy* the media to the work directory
3. *encode* the work directory into the desired format (eg. MP3, MKV)
4. *clean* the work directory
Before each step,
state is read out of the work directory.
During each step,
a MediaHandler continually updates its Worker with a completion percentage.
This is passed up to the Web Server's dynamic state.
After each step,
a MediaHandler updates its state,
which is stored on disk.
The only way to communicate state between execution stages is by writing to disk.
This provides some tolerance of job interruption, power loss, etc.
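
The stage handling described above could be sketched roughly as follows. This is an illustration, not the project's implementation: `run_stage` and the generator-style `stage` callable are hypothetical names, and `os.replace` stands in for the `os.rename` call seen in this diff because it also overwrites an existing file on Windows. The `sucker.json` file name is taken from the diff.

```python
import json
import os


STATE_FILENAME = "sucker.json"  # file name used elsewhere in this diff


def read_state(workdir):
    """Load the state a previous stage left in the work directory."""
    with open(os.path.join(workdir, STATE_FILENAME)) as f:
        return json.load(f)


def write_state(workdir, state):
    """Write to a temporary name, then rename, so readers never see a partial file."""
    statefn = os.path.join(workdir, STATE_FILENAME)
    newstatefn = statefn + ".new"
    with open(newstatefn, "w") as f:
        json.dump(state, f)
    os.replace(newstatefn, statefn)


def run_stage(workdir, stage, status):
    """Run one stage: read state, report progress, persist state.

    `stage` is a hypothetical generator function taking (state, workdir)
    and yielding completion fractions between 0 and 1.
    """
    state = read_state(workdir)
    for fraction in stage(state, workdir):
        status["complete"] = fraction  # picked up by the web server's status view
    write_state(workdir, state)
    return state
```

If a stage is interrupted before `write_state` runs, the state file on disk still describes the last completed stage, so the job can be retried from there.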

128
src/cd.py
@ -13,7 +13,7 @@ SECOND = 1
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE
def scan(state, device):
def read(device, status):
# Get disc ID
p = subprocess.run(
[
@ -23,9 +23,8 @@ def scan(state, device):
encoding="utf-8",
capture_output=True,
)
discid = p.stdout.strip()
state["discid"] = discid
cddb_id = discid.split()[0]
discid = p.stdout
status["discid"] = discid
# Look it up in cddb
email = os.environ.get("EMAIL") # You should really set this variable, tho
@ -43,23 +42,22 @@ def scan(state, device):
# We're expected to be automatic here,
# so just use the first one.
for k in ("title", "artist", "genre", "year", "tracks"):
state[k] = disc[k]
status[k] = disc[k]
else:
now = time.strftime("%Y-%m-%dT%H%M%S")
num_tracks = int(discid.split()[1])
state["title"] = "Unknown CD - %s" % cddb_id
state["tracks"] = ["Track %02d" % (i+1) for i in range(num_tracks)]
status["title"] = "Unknown CD - %s" % now
status["tracks"] = [""] * num_tracks
def copy(state, device, directory):
def rip(device, status, directory):
# cdparanoia reports completion in samples
# use discid duration to figure out total number of samples
duration = int(state["discid"].split()[-1]) * SECOND # disc duration in seconds
duration = int(status["discid"].split()[-1]) * SECOND # disc duration in seconds
total_samples = duration * (75 / SECOND) * 1176 # 75 sectors per second, 1176 samples per sector
state["total_samples"] = total_samples
track_num = 1
for track_name in state["tracks"]:
logging.debug("Ripping track %d of %d", track_num, len(state["tracks"]))
for track_name in status["tracks"]:
logging.debug("Ripping track %d of %d", track_num, len(status["tracks"]))
p = subprocess.Popen(
[
"cdparanoia",
@ -77,110 +75,56 @@ def copy(state, device, directory):
line = line.strip()
if line.startswith("##: -2"):
samples = int(line.split()[-1])
yield samples / total_samples
status["complete"] = samples / total_samples
track_num += 1
def encode(state, directory):
def encode(status, directory):
# Encode the tracks
track_num = 1
total_tracks = len(state["tracks"])
durations = [int(d) for d in state["discid"].split()[2:-1]]
total_duration = sum(durations)
encoded_duration = 0
tag_script = io.StringIO()
tag_script.write("#! /bin/sh\n")
tag_script.write("\n")
tag_script.write("ALBUM=%s\n" % state["title"])
tag_script.write("ARTIST=%s\n" % state.get("artist", ""))
tag_script.write("GENRE=%s\n" % state.get("genre", ""))
tag_script.write("YEAR=%s\n" % state.get("year", ""))
tag_script.write("\n")
for track_name in state["tracks"]:
logging.debug("Encoding track %d (%s)" % (track_num, track_name))
duration = durations[track_num-1]
for track_name in status["tracks"]:
argv = [
"lame",
"--brief",
"--nohist",
"--disptime", "1",
"--preset", "standard",
"--tl", state["title"],
"--tn", "%d/%d" % (track_num, total_tracks),
"-tl", status["title"],
"--tn", "%d/%d" % (track_num, len(status["tracks"])),
]
tag_script.write("id3v2")
tag_script.write(" --album \"$ALBUM\"")
tag_script.write(" --artist \"$ARTIST\"")
tag_script.write(" --genre \"$GENRE\"")
tag_script.write(" --year \"$YEAR\"")
if state.get("artist"):
argv.extend(["--ta", state["artist"]])
if state.get("genre"):
argv.extend(["--tg", state["genre"]])
if state.get("year"):
argv.extend(["--ty", state["year"]])
if status["artist"]:
argv.extend(["-ta", status["artist"]])
if status["genre"]:
argv.extend(["-tg", status["genre"]])
if status["year"]:
argv.extend(["-ty", status["year"]])
if track_name:
argv.extend(["--tt", track_name])
tag_script.write(" --song \"%s\"" % track_name)
outfn = "%02d - %s.mp3" % (track_num, track_name)
argv.extend(["-tt", track_name])
outfn = "%d - %s.mp3" % (track_num, track_name)
else:
outfn = "%02d.mp3" % track_num
outfn = "%d.mp3" % track_num
argv.append("track%02d.cdda.wav" % track_num)
argv.append(outfn)
tag_script.write("\\\n ")
tag_script.write(" --track %d/%d" % (track_num, total_tracks))
tag_script.write(" \"%s\"\n" % outfn)
p = subprocess.Popen(
argv,
cwd = directory,
stderr = subprocess.PIPE,
stdin = subprocess.PIPE,
encoding = "utf-8",
)
for line in p.stderr:
line = line.strip()
if "%)" in line:
p = line.split("(")[1]
p = p.split("%")[0]
pct = int(p) / 100
yield (encoded_duration + (duration * pct)) / total_duration
encoded_duration += duration
p.communicate(input=track_name)
track_num += 1
with open(os.path.join(directory, "tag.sh"), "w") as f:
f.write(tag_script.getvalue())
def clean(state, directory):
for fn in os.listdir(directory):
if fn.endswith(".wav"):
os.remove(os.path.join(directory, fn))
if __name__ == "__main__":
import pprint
import sys
import json
logging.basicConfig(level=logging.DEBUG)
status = {}
read("/dev/sr0", status)
pprint.pprint(status)
state = {}
scan(state, "/dev/sr0")
pprint.pprint(state)
directory = os.path.join(".", state["title"])
directory = os.path.join(".", status["title"])
os.makedirs(directory, exist_ok=True)
with open(os.path.join(directory, "state.json"), "w") as f:
json.dump(state, f)
rip("/dev/sr0", status, directory)
pprint.pprint(status)
for pct in copy(state, "/dev/sr0", directory):
sys.stdout.write("Copying: %3d%%\r" % (pct*100))
pprint.pprint(state)
for pct in encode(state, directory):
sys.stdout.write("Encoding: %3d%%\r" % (pct*100))
pprint.pprint(state)
encode(status, directory)
pprint.pprint(status)
# vi: sw=4 ts=4 et ai

@ -10,29 +10,37 @@ SECOND = 1
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE
def collect(collection, track):
class Copier:
def __init__(self, device, status):
self.device = device
self.status = status
self.scan()
def collect(self, track):
newCollection = []
for t in collection:
for t in self.collection:
if t["length"] == track["length"]:
# If the length is exactly the same,
# assume it's the same track,
# and pick the one with the most stuff.
if len(track["audio"]) < len(t["audio"]):
return collection
return
elif len(track["subp"]) < len(t["subp"]):
return collection
return
newCollection.append(t)
newCollection.append(track)
return newCollection
self.collection = newCollection
def scan(self):
self.status["state"] = "scanning"
def scan(state, device):
self.collection = []
p = subprocess.run(
[
"lsdvd",
"-Oy",
"-x",
device,
self.device,
],
encoding="utf-8",
capture_output=True,
@ -42,8 +50,9 @@ def scan(state, device):
if title in ('No', 'unknown'):
title = lsdvd["provider_id"]
if title == "$PACKAGE_STRING":
now = time.strftime(r"%Y-%m-%dT%H:%M:%S")
title = "DVD %s" % (title, now)
title = "DVD"
now = time.strftime("%Y-%m-%dT%H%M%S")
title = "%s %s" % (title, now)
# Go through all the tracks, looking for the largest referenced sector.
max_sector = 0
@ -62,23 +71,25 @@ def scan(state, device):
# * A feature, which has one track much longer than any other
# * A collection of shows, which has several long tracks, more or less the same lengths
# * Something else
collection = []
for track in tracks:
if track["length"] / max_length > 0.80:
collection = collect(collection, track)
if (max_length < 20 * MINUTE) and (len(collection) < len(tracks) * 0.6):
collection = tracks
self.collect(track)
if (max_length < 20 * MINUTE) and (len(self.collection) < len(tracks) * 0.6):
self.collection = tracks
state["title"] = title
state["size"] = max_sector * 2048 # DVD sector size = 2048
state["tracks"] = [(t["ix"], t["length"]) for t in collection]
self.status["title"] = title
self.status["size"] = max_sector * 2048 # DVD sector size = 2048
self.status["tracks"] = [(t["ix"], t["length"]) for t in self.collection]
def copy(self, directory):
self.status["state"] = "copying"
def copy(state, device, directory):
p = subprocess.Popen(
[
"dvdbackup",
"--input=" + device,
"--name=" + state["title"],
"--input=" + self.device,
"--name=" + self.status["title"],
"--mirror",
"--progress",
],
@ -99,25 +110,30 @@ def copy(state, device, directory):
if titleSize < lastTitleSize:
totalBytes += lastTitleSize
lastTitleSize = titleSize
yield (totalBytes + titleSize) / state["size"]
self.status["complete"] = (totalBytes + titleSize) / self.status["size"]
def encode(state, directory):
title = state["title"]
logging.info("encoding: %s (%s)" % (title, directory))
class Encoder:
def __init__(self, basedir, status):
self.basedir = basedir
self.status = status
total_length = sum(t[1] for t in state["tracks"])
def encode(self, obj):
title = obj["title"]
logging.info("encoding: %s (%s)" % (title, self.basedir))
total_length = sum(t[1] for t in obj["tracks"])
finished_length = 0
for track, length in state["tracks"]:
for track, length in obj["tracks"]:
outfn = "%s-%d.mkv" % (title, track)
tmppath = os.path.join(directory, outfn)
outpath = os.path.join(directory, "..", outfn)
tmppath = os.path.join(self.basedir, outfn)
outpath = os.path.join(self.basedir, "..", outfn)
p = subprocess.Popen(
[
"nice",
"HandBrakeCLI",
"--json",
"--input", "%s/%s/VIDEO_TS" % (directory, state["title"]),
"--input", "%s/VIDEO_TS" % self.basedir,
"--output", tmppath,
"--title", str(track),
"--native-language", "eng",
@ -143,7 +159,8 @@ def encode(state, directory):
m = progressRe.search(line)
if m:
progress = float(m[1])
yield (finished_length + progress*length) / total_length
complete = (finished_length + progress*length) / total_length
self.status["complete"] = complete
finished_length += length
os.rename(
@ -153,9 +170,6 @@ def encode(state, directory):
logging.info("Finished track %d; length %d" % (track, length))
def clean(state, directory):
os.removedirs(directory)
if __name__ == "__main__":
import pprint
vts = Video(".")

@ -1,6 +1,7 @@
#! /usr/bin/python3
import os
import threading
import subprocess
import glob
import os
@ -12,47 +13,40 @@ import re
import logging
import dvd
import cd
import traceback
import worker
class Encoder(worker.Worker):
def __init__(self, directory=None):
class Encoder(threading.Thread):
def __init__(self, directory=None, **kwargs):
self.status = {}
self.directory = directory
return super().__init__(directory)
return super().__init__(**kwargs)
def run(self):
while True:
wait = True
self.status = {"type": "encoder", "state": "idle"}
for fn in glob.glob(self.workdir("*", "sucker.json")):
directory = os.path.dirname(fn)
state = self.read_state(directory)
try:
self.encode(directory, state)
except Exception as e:
logging.error("Error encoding %s: %s" % (directory, e))
logging.error(traceback.format_exc())
for fn in glob.glob(os.path.join(self.directory, "*", "sucker.json")):
fdir = os.path.dirname(fn)
with open(fn) as f:
obj = json.load(f)
self.encode(fdir, obj)
wait = False
if wait:
time.sleep(12)
def encode(self, directory, state):
def encode(self, fdir, obj):
self.status["state"] = "encoding"
self.status["title"] = state["title"]
if state["video"]:
media = dvd
self.status["title"] = obj["title"]
if obj["type"] == "audio":
self.encode_audio(fdir, obj)
else:
media = cd
self.encode_video(fdir, obj)
shutil.rmtree(fdir)
logging.info("Encoding %s (%s)" % (directory, state["title"]))
for pct in media.encode(state, directory):
self.status["complete"] = pct
def encode_audio(self, fdir, obj):
cd.encode(obj, fdir)
media.clean(state, directory)
self.clear_state(directory)
logging.info("Finished encoding")
def encode_video(self, fdir, obj):
enc = dvd.Encoder(fdir, self.status)
enc.encode(obj)
# vi: sw=4 ts=4 et ai

@ -1,6 +1,7 @@
#! /usr/bin/python3
import os
import threading
import subprocess
import time
import re
@ -8,10 +9,8 @@ import fcntl
import traceback
import json
import logging
import slugify
import dvd
import cd
import worker
CDROM_DRIVE_STATUS = 0x5326
CDS_NO_INFO = 0
@ -29,21 +28,25 @@ CDS_DATA_2 = 102
CDROM_LOCKDOOR = 0x5329
CDROM_EJECT = 0x5309
class Reader(worker.Worker):
def __init__(self, device, directory):
super().__init__(directory)
class Reader(threading.Thread):
def __init__(self, device, directory=None, **kwargs):
self.device = device
self.status["type"] = "reader"
self.status["device"] = device
self.directory = directory
self.status = {
"type": "reader",
"state": "idle",
"device": self.device,
}
self.complete = 0
self.staleness = 0
self.drive = None
logging.info("Starting reader on %s" % self.device)
return super().__init__(**kwargs)
def reopen(self):
if (self.staleness > 15) or not self.drive:
if self.drive:
os.close(self.drive)
self.drive.close()
self.drive = None
try:
self.drive = os.open(self.device, os.O_RDONLY | os.O_NONBLOCK)
@ -66,22 +69,24 @@ class Reader(worker.Worker):
rv = fcntl.ioctl(self.drive, CDROM_DISC_STATUS)
try:
if rv == CDS_AUDIO:
self.handle(False)
self.handle_audio()
elif rv in [CDS_DATA_1, CDS_DATA_2]:
self.handle(True)
self.handle_data()
else:
logging.info("Can't handle disc type %d" % rv)
except Exception as e:
logging.error("Error in disc handler: %s" % e)
logging.error(traceback.format_exc())
self.eject()
elif rv in (CDS_TRAY_OPEN, CDS_NO_DISC, CDS_DRIVE_NOT_READ):
elif rv in (CDS_TRAY_OPEN, CDS_NO_DISC):
time.sleep(3)
else:
logging.info("CDROM_DRIVE_STATUS: %d (%s)" % (rv, CDS_STR[rv]))
time.sleep(3)
def eject(self):
self.status["state"] = "ejecting"
for i in range(20):
try:
fcntl.ioctl(self.drive, CDROM_LOCKDOOR, 0)
@ -91,28 +96,32 @@ class Reader(worker.Worker):
logging.error("Ejecting: %s" % e)
time.sleep(i * 5)
def handle(self, video):
self.status["video"] = video
# XXX: rename this to something like "write_status"
def finished(self, **kwargs):
self.status["state"] = "finished read"
fn = os.path.join(self.directory, self.status["title"], "sucker.json")
newfn = fn + ".new"
with open(newfn, "w") as fout:
json.dump(obj=self.status, fp=fout)
os.rename(src=newfn, dst=fn)
def handle_audio(self):
self.status["video"] = False
self.status["state"] = "reading"
cd.read(self.device, self.status)
state = {}
state["video"] = video
if video:
media = dvd
else:
media = cd
media.scan(state, self.device)
self.status["title"] = state["title"]
subdir = slugify.slugify(state["title"])
workdir = self.workdir(subdir)
os.makedirs(workdir, exist_ok=True)
directory = os.path.join(self.directory, self.status["title"])
os.makedirs(directory, exist_ok=True)
self.status["state"] = "copying"
for pct in media.copy(state, self.device, workdir):
self.status["complete"] = pct
cd.copy(self.device, self.status, self.directory)
self.finished() # XXX: rename this to something like "write_status"
self.write_state(subdir, state)
def handle_data(self):
self.status["video"] = True
src = dvd.Copier(self.device, self.status)
src.copy(self.directory)
self.finished()
# vi: sw=4 ts=4 et ai

@ -7,17 +7,17 @@ import time
import os
class Statuser(threading.Thread):
def __init__(self, workers, directory):
def __init__(self, workers, directory=None, **kwargs):
self.workers = workers
self.directory = directory
self.status = {}
super().__init__(daemon=True)
super().__init__(**kwargs)
def run(self):
while True:
self.status["finished"] = {
"video": glob.glob(os.path.join(self.directory, "*.mkv")),
"audio": glob.glob(os.path.join(self.directory, "*/*.mp3")),
"audio": glob.glob(os.path.join(self.directory, "*/*/*.mp3")),
}
self.status["workers"] = [w.status for w in self.workers]
time.sleep(12)

@ -33,9 +33,13 @@ def main():
logging.basicConfig(level=logging.INFO)
readers = [reader.Reader(d, args.incoming) for d in args.drive]
encoders = [encoder.Encoder(args.incoming) for i in range(1)]
st = statuser.Statuser(readers + encoders, args.incoming)
readers = []
for d in args.drive:
readers.append(reader.Reader(d, directory=args.incoming, daemon=True))
encoders = []
for i in range(1):
encoders.append(encoder.Encoder(directory=args.incoming, daemon=True))
st = statuser.Statuser(readers + encoders, directory=args.incoming, daemon=True)
[w.start() for w in readers + encoders]
st.start()

@ -1,32 +0,0 @@
import threading
import os
import json
import logging
class Worker(threading.Thread):
def __init__(self, directory, **kwargs):
self.directory = directory
self.status = {
"state": "idle",
}
kwargs["daemon"] = True
return super().__init__(**kwargs)
def workdir(self, *path):
return os.path.join(self.directory, *path)
def write_state(self, subdir, state):
logging.debug("Writing state: %s" % repr(state))
statefn = self.workdir(subdir, "sucker.json")
newstatefn = statefn + ".new"
with open(newstatefn, "w") as f:
json.dump(state, f)
os.rename(newstatefn, statefn)
def read_state(self, subdir):
with open(self.workdir(subdir, "sucker.json")) as f:
return json.load(f)
def clear_state(self, subdir):
os.unlink(self.workdir(subdir, "sucker.json"))