Collector¶
The purpose of this tool is to recursively collect information about media files from a path. Media files are stored and organized internally per directory. Note that the collector does not open files to read their metadata; it only collects their filesystem information (dates, paths, sizes, etc.).
Although the tool can output a summary of the collection, its main feature is dumping the collection to a JSON file that can be used programmatically.
This JSON dump is supported by django-deovi, a Django project that can be used to browse the collection once imported.
Media kind¶
At this point, the only recognized media file kinds are the common video formats.
3gp: 3GPP;
asf: Advanced Systems Format;
avi: AVI;
flv: Flash Video;
f4v: Flash Video;
mov: QuickTime;
mp4: MPEG-4;
mkv: Matroska;
mpg: MPEG;
mpeg: MPEG;
mpv: MPEG;
mts: MPEG Transport Stream;
qt: QuickTime;
rm: RealMedia;
ts: MPEG Transport Stream;
vob: Vob;
webm: WebM;
wmv: Windows Media Video;
By default, the collector collects every file whose extension matches one of these video formats.
You can, however, restrict collection to a subset of them with the command argument --extension.
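The extension matching described above can be sketched as follows. This is only an illustration, not the collector's actual code: the names `VIDEO_CONTAINERS`, `is_media_file` and `allowed_extensions` are hypothetical, the mapping is a subset of the list above, and case-insensitive matching is an assumption.

```python
from pathlib import Path

# Illustrative subset of the extension -> container list above.
VIDEO_CONTAINERS = {
    "avi": "AVI",
    "mkv": "Matroska",
    "mp4": "MPEG-4",
    "webm": "WebM",
}


def is_media_file(filename, allowed_extensions=None):
    """
    Return True if the file extension is a recognized video extension.

    ``allowed_extensions`` is a hypothetical stand-in for the values given
    to ``--extension``; when it is None every known extension is accepted.
    """
    # Assumption: matching ignores the letter case of the extension.
    extension = Path(filename).suffix.lower().lstrip(".")
    if allowed_extensions is None:
        allowed_extensions = VIDEO_CONTAINERS.keys()
    return extension in VIDEO_CONTAINERS and extension in allowed_extensions
```

With this sketch, `is_media_file("video.mp4", allowed_extensions=["mkv"])` is false because the selection only keeps Matroska files.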
Empty directories¶
The collector scans the given path recursively and only retains directories that contain at least one media file. Directories without any supported media file are left out of the collection.
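As a rough sketch of this rule, assuming a plain `os.walk` traversal (the real collector may proceed differently), the hypothetical function `collect_directories` below keeps only the directories that hold at least one media file. `MEDIA_EXTENSIONS` is an illustrative subset of the supported extensions.

```python
import os
from pathlib import Path

# Illustrative subset of the supported extensions.
MEDIA_EXTENSIONS = {"avi", "mkv", "mp4"}


def collect_directories(source):
    """
    Walk ``source`` recursively and return a dict mapping each relative
    directory path to its sorted media file names. Directories without
    any media file are left out.
    """
    source = Path(source)
    collection = {}
    for dirpath, _dirnames, filenames in os.walk(source):
        media = sorted(
            name for name in filenames
            if Path(name).suffix.lstrip(".") in MEDIA_EXTENSIONS
        )
        if media:
            # The source root itself is keyed as ".".
            relative = Path(dirpath).relative_to(source).as_posix()
            collection[relative] = media
    return collection
```

A directory such as `foo/bar/` that only contains a text file never appears in the result, while its parent `foo/` does if it holds a video.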
Directory cover¶
Each directory may have a cover image file to collect. A cover belongs only to the directory that directly contains it: child directories do not inherit the cover of their parent.
The collector recognizes a file as a cover if it is named cover and has the extension png, jpg, jpeg or gif. If there are multiple eligible cover files in the same directory, the collector chooses one by extension priority, following the order of the extension list above.
Note
It is recommended to optimize your cover image file sizes.
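The cover selection rule can be sketched like this. The names `COVER_EXTENSIONS` and `pick_cover` are hypothetical, and exact lowercase file name matching is an assumption.

```python
# Priority order taken from the extension list in the text.
COVER_EXTENSIONS = ("png", "jpg", "jpeg", "gif")


def pick_cover(filenames):
    """
    Return the name of the cover file among ``filenames``, or None if the
    directory has no eligible cover.
    """
    candidates = set(filenames)
    # The first extension found in priority order wins.
    for extension in COVER_EXTENSIONS:
        name = f"cover.{extension}"
        if name in candidates:
            return name
    return None
```

For a directory holding both `cover.gif` and `cover.jpg`, this sketch returns `cover.jpg` since jpg comes before gif in the list.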
Directory manifest¶
Each directory may contain a YAML file manifest.yml holding directory metadata to include in the dump. The manifest content is almost free-form, except that it cannot define item names that are computed during collection, to avoid overwriting them.
Forbidden item names are:
absolute_dir;
children_files;
cover;
mtime;
name;
path;
relative_dir;
size;
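A manifest could be checked against this list before use, as in the sketch below. It works on an already parsed manifest (a plain dict) to stay independent of any YAML library. The function name `forbidden_manifest_keys` is hypothetical, and how the collector actually reacts to a forbidden name is not described here.

```python
# Item names computed by the collector, as listed above.
FORBIDDEN_NAMES = {
    "absolute_dir",
    "children_files",
    "cover",
    "mtime",
    "name",
    "path",
    "relative_dir",
    "size",
}


def forbidden_manifest_keys(manifest):
    """Return the sorted manifest keys that clash with computed item names."""
    return sorted(FORBIDDEN_NAMES.intersection(manifest))
```

A manifest that only defines `title` yields an empty list, while one that also defines `size` yields `["size"]`.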
Directory checksum¶
If this option is enabled, a checksum is computed from all the information gathered for a directory. This covers basic directory information (paths, size, etc.) as well as additional data from a possible manifest and cover file.
Both the directory checksum and the cover file checksum are included in the directory payload of the dump. The cover checksum is used to compute the directory checksum, and is also exposed in the directory payload as a helper to check only for a cover file change.
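The relation between the two checksums can be illustrated with the sketch below. The hashing algorithm and the serialization used by the collector are not stated here, so SHA-256 over a key-sorted JSON serialization is purely an assumption, and the function names are hypothetical.

```python
import hashlib
import json


def cover_checksum(content):
    """Checksum of the raw cover file bytes (assumption: SHA-256)."""
    return hashlib.sha256(content).hexdigest()


def directory_checksum(payload, cover_content=None):
    """
    Checksum of a directory payload, given as a JSON-serializable dict.

    When there is a cover, its checksum is mixed into the payload so that
    a cover change also changes the directory checksum.
    """
    data = dict(payload)
    if cover_content is not None:
        data["cover_checksum"] = cover_checksum(cover_content)
    # Sorted keys make the serialization, and so the hash, deterministic.
    serialized = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()
```

Comparing only the cover checksums of two dumps tells whether the cover changed, without recomputing anything for the rest of the directory.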
Usage¶
The command requires two positional arguments, in this order:
source: A path to a directory to scan recursively for collection;
destination: A path to the file that will be created with the JSON dump. Note that if you have directory covers, a new directory will be created alongside the JSON dump file to store all the cover files;
And possible keyword arguments:
--checksum: If given, this enables the directory checksum. By default, the checksum is disabled;
So with the following command:
deovi collect --checksum my_device plop.json
Applied to the following my_device/ directory content:
my_device/
├── cover.png
├── manifest.yaml
├── SampleVideo_1280x720_1mb.mkv
├── foo/
│   ├── bar/
│   │   └── nope.txt
│   ├── cover.png
│   ├── manifest.yaml
│   └── SampleVideo_720x480_1mb.mp4
└── ping/
    └── pong/
        ├── cover.gif
        ├── SampleVideo_720x480_1mb.mkv
        └── SampleVideo_720x480_2mb.mkv
It would create a plop.json file containing a JSON collection dump like this:
{
    "foo": {
        "path": "/home/donald/my_device/foo",
        "name": "foo",
        "absolute_dir": "/home/donald/my_device",
        "relative_dir": "foo",
        "size": 4096,
        "mtime": "2023-03-03T15:28:31+00:00",
        "children_files": [
            {
                "path": "/home/donald/my_device/foo/SampleVideo_720x480_1mb.mp4",
                "name": "SampleVideo_720x480_1mb.mp4",
                "absolute_dir": "/home/donald/my_device/foo",
                "relative_dir": "foo",
                "directory": "foo",
                "extension": "mp4",
                "container": "MPEG-4",
                "size": 1057149,
                "mtime": "2023-03-03T15:28:31+00:00"
            }
        ],
        "title": "Foo bar",
        "cover": "my_device_7a4067f264f889051f91/c6a67d9c-1590-4c67-9c93-37a4da5a01f9.png",
        "cover_checksum": "...",
        "checksum": "..."
    },
    "ping/pong": {
        "path": "/home/donald/my_device/ping/pong",
        "name": "pong",
        "absolute_dir": "/home/donald/my_device/ping",
        "relative_dir": "ping/pong",
        "size": 4096,
        "mtime": "2023-03-03T15:28:31+00:00",
        "children_files": [
            {
                "path": "/home/donald/my_device/ping/pong/SampleVideo_720x480_2mb.mkv",
                "name": "SampleVideo_720x480_2mb.mkv",
                "absolute_dir": "/home/donald/my_device/ping/pong",
                "relative_dir": "ping/pong",
                "directory": "pong",
                "extension": "mkv",
                "container": "Matroska",
                "size": 2106944,
                "mtime": "2023-03-03T15:28:31+00:00"
            },
            {
                "path": "/home/donald/my_device/ping/pong/SampleVideo_720x480_1mb.mkv",
                "name": "SampleVideo_720x480_1mb.mkv",
                "absolute_dir": "/home/donald/my_device/ping/pong",
                "relative_dir": "ping/pong",
                "directory": "pong",
                "extension": "mkv",
                "container": "Matroska",
                "size": 1050238,
                "mtime": "2023-03-03T15:28:31+00:00"
            }
        ],
        "cover": "my_device_7a4067f264f889051f91/c92308e0-c385-441b-ba7c-a79babf94c6e.gif",
        "cover_checksum": "...",
        "checksum": "..."
    },
    ".": {
        "path": "my_device",
        "name": "my_device",
        "absolute_dir": ".",
        "relative_dir": ".",
        "size": 4096,
        "mtime": "2023-03-03T15:28:31+00:00",
        "children_files": [
            {
                "path": "/home/donald/my_device/SampleVideo_1280x720_1mb.mkv",
                "name": "SampleVideo_1280x720_1mb.mkv",
                "absolute_dir": "/home/donald/my_device",
                "relative_dir": ".",
                "directory": "",
                "extension": "mkv",
                "container": "Matroska",
                "size": 1052413,
                "mtime": "2023-03-03T15:28:31+00:00"
            }
        ],
        "title": "Media sample root",
        "cover": "my_device_7a4067f264f889051f91/54d4d2a3-5c13-4c8e-9b8f-d4877edf24d6.png",
        "cover_checksum": "...",
        "checksum": "..."
    }
}
Note
As you can see from this dump sample, there is a directory entry named ., which holds the files collected from the root of the source argument my_device.
We recommend organizing your directory structure to avoid having files at the root of the source, because . is not a very meaningful name.
And a directory plop_ad79e25c5391ea259df8/ which includes the cover files:
my_device_7a4067f264f889051f91/
├── 54d4d2a3-5c13-4c8e-9b8f-d4877edf24d6.png
├── c6a67d9c-1590-4c67-9c93-37a4da5a01f9.png
└── c92308e0-c385-441b-ba7c-a79babf94c6e.gif
The cover directory name is built from the dump file name plus a hash, so it is guaranteed to be unique every time you run the collect command.
If you need to import this dump into another tool like django-deovi, transfer the cover directory along with the dump so the tool can load the cover files as described in the dump. Note that the directory cover paths are relative to the dump file, so you should not move the cover directory elsewhere, or you will have to edit the dump yourself.
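Since cover paths are relative to the dump file, a consumer has to resolve them against the dump location. The sketch below shows one possible way to do it with the standard library only. The function name `load_covers` is hypothetical and this is not how django-deovi is known to implement its import.

```python
import json
from pathlib import Path


def load_covers(dump_path):
    """
    Return a dict mapping each collected directory key to the resolved
    path of its cover file, or None when the directory has no cover.
    """
    dump_path = Path(dump_path)
    collection = json.loads(dump_path.read_text(encoding="utf-8"))
    covers = {}
    for key, directory in collection.items():
        cover = directory.get("cover")
        # Cover paths are relative to the directory holding the dump file.
        covers[key] = dump_path.parent / cover if cover else None
    return covers
```

Calling `load_covers("plop.json")` on the sample above would give one entry per collected directory (`foo`, `ping/pong` and `.`), each pointing to a file inside the cover directory stored next to the dump.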