
mdimporter for XHTML files on macosx

From: Karl Dubost <karl@la-grange.net>
Date: Thu, 20 Jun 2013 11:37:43 -0400
Message-Id: <F6036838-51E6-417A-9044-B849825B6364@la-grange.net>
To: www-archive Archive <www-archive@w3.org>
XHTML Files in Mac OS X are not indexed the same way that HTML files are.
But there is a solution.

# Before Spotlight is XHTML aware.

Here is an example of an XHTML file and what Spotlight knows about it. The information is very basic: nothing relates to the content of the file.


→ mdls bnf.xhtml 

kMDItemContentCreationDate     = 2011-10-01 11:47:27 +0000
kMDItemContentModificationDate = 2013-01-07 00:56:56 +0000
kMDItemContentType             = "public.xhtml"
kMDItemContentTypeTree         = (
    "public.xhtml",
    "public.xml",
    "public.text",
    "public.data",
    "public.item",
    "public.content"
)
kMDItemDateAdded               = 2011-10-01 11:47:27 +0000
kMDItemDisplayName             = "bnf.xhtml"
kMDItemFSContentChangeDate     = 2013-01-07 00:56:56 +0000
kMDItemFSCreationDate          = 2011-10-01 11:47:27 +0000
kMDItemFSCreatorCode           = ""
kMDItemFSFinderFlags           = 0
kMDItemFSHasCustomIcon         = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 0
kMDItemFSIsStationery          = 0
kMDItemFSLabel                 = 0
kMDItemFSName                  = "bnf.xhtml"
kMDItemFSNodeCount             = 6447
kMDItemFSOwnerGroupID          = 502
kMDItemFSOwnerUserID           = 502
kMDItemFSSize                  = 6447
kMDItemFSTypeCode              = ""
kMDItemKind                    = "HTML"
kMDItemLogicalSize             = 6447
kMDItemPhysicalSize            = 8192


# MODIFYING mdimporter.

* Go to /System/Library/Spotlight
* Find RichText.mdimporter
* Right-click on it and choose "Show Package Contents".
* Inside the Contents folder, open the Info.plist file with your text editor (TextMate, Sublime Text, etc.),
  or do it from the command line with something like
  sudo subl /System/Library/Spotlight/RichText.mdimporter/Contents/Info.plist
* You will see something like:

 			<array>
				<string>public.rtf</string>
				<string>public.html</string>
				<string>public.xml</string>
				<string>public.plain-text</string>
				<string>com.apple.traditional-mac-plain-text</string>
				<string>com.apple.rtfd</string>
				<string>com.apple.webarchive</string>
				<string>org.oasis-open.opendocument.text</string>
				<string>org.openxmlformats.wordprocessingml.document</string>
			</array>

* Edit it to add <string>public.xhtml</string>

 			<array>
				<string>public.rtf</string>
				<string>public.html</string>
				<string>public.xhtml</string>
				<string>public.xml</string>
				<string>public.plain-text</string>
				<string>com.apple.traditional-mac-plain-text</string>
				<string>com.apple.rtfd</string>
				<string>com.apple.webarchive</string>
				<string>org.oasis-open.opendocument.text</string>
				<string>org.openxmlformats.wordprocessingml.document</string>
			</array>

* Save it
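
The manual edit above can also be scripted. Here is a minimal Python sketch, assuming you work on a copy of Info.plist and copy it back yourself with sudo. The function names add_uti_after and patch_plist are made up for this example. Since the key that holds the array is not shown above, the sketch does not assume one: it looks for any array that already contains public.html.

```python
import plistlib


def add_uti_after(node, anchor="public.html", new="public.xhtml"):
    """Insert `new` right after `anchor` in every list that contains
    `anchor` but not yet `new`. Returns the number of lists changed."""
    changed = 0
    if isinstance(node, dict):
        for value in node.values():
            changed += add_uti_after(value, anchor, new)
    elif isinstance(node, list):
        if anchor in node and new not in node:
            node.insert(node.index(anchor) + 1, new)
            changed += 1
        # Keep walking: the list may hold nested dicts or lists.
        for value in node:
            changed += add_uti_after(value, anchor, new)
    return changed


def patch_plist(path, anchor="public.html", new="public.xhtml"):
    """Load the plist at `path`, add `new`, and rewrite the file only
    if something changed. The file is written back as XML."""
    with open(path, "rb") as fh:
        data = plistlib.load(fh)
    changed = add_uti_after(data, anchor, new)
    if changed:
        with open(path, "wb") as fh:
            plistlib.dump(data, fh)
    return changed
```

Calling patch_plist("Info.plist") on the copy returns 1 the first time and 0 if public.xhtml is already there, so it is safe to run twice.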

# REINDEXING

To reindex a file, you can just use mdimport:
→ mdimport bnf.xhtml 


# LET'S look again at the data.

→ mdls bnf.xhtml 

kMDItemContentCreationDate     = 2011-10-01 11:47:27 +0000
kMDItemContentModificationDate = 2013-01-07 00:56:56 +0000
kMDItemContentType             = "public.xhtml"
kMDItemContentTypeTree         = (
    "public.xhtml",
    "public.xml",
    "public.text",
    "public.data",
    "public.item",
    "public.content"
)
kMDItemDateAdded               = 2011-10-01 11:47:27 +0000
kMDItemDisplayName             = "bnf.xhtml"
kMDItemFSContentChangeDate     = 2013-01-07 00:56:56 +0000
kMDItemFSCreationDate          = 2011-10-01 11:47:27 +0000
kMDItemFSCreatorCode           = ""
kMDItemFSFinderFlags           = 0
kMDItemFSHasCustomIcon         = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 0
kMDItemFSIsStationery          = 0
kMDItemFSLabel                 = 0
kMDItemFSName                  = "bnf.xhtml"
kMDItemFSNodeCount             = 6447
kMDItemFSOwnerGroupID          = 502
kMDItemFSOwnerUserID           = 502
kMDItemFSSize                  = 6447
kMDItemFSTypeCode              = ""
kMDItemKeywords                = (
    livre,
    "bibliothe\U0300que",
    lutte,
    carnet
)
kMDItemKind                    = "HTML"
kMDItemLogicalSize             = 6447
kMDItemPhysicalSize            = 8192
kMDItemTitle                   = "Numérisation des livres de la BNF - Carnets de La Grange"

We can now see that the metadata include the title and the keywords, so the file becomes searchable by them.
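
To check which attributes the importer added, the two mdls listings can be compared by attribute name. Here is a rough Python sketch, assuming the output of each run was captured as text. The helper name mdls_keys is made up for this example, and the two samples are shortened excerpts of the listings above.

```python
def mdls_keys(text):
    """Return the set of attribute names found in mdls output.
    Attribute lines start with "kMDItem" and contain "="; the
    continuation lines of multi-value attributes are skipped."""
    keys = set()
    for line in text.splitlines():
        if line.startswith("kMDItem") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


BEFORE = '''kMDItemContentType             = "public.xhtml"
kMDItemKind                    = "HTML"
'''

AFTER = '''kMDItemContentType             = "public.xhtml"
kMDItemKeywords                = (
    livre,
    carnet
)
kMDItemKind                    = "HTML"
kMDItemTitle                   = "Numérisation des livres de la BNF - Carnets de La Grange"
'''

# Attributes present after the reindexing but not before.
print(sorted(mdls_keys(AFTER) - mdls_keys(BEFORE)))
# → ['kMDItemKeywords', 'kMDItemTitle']
```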

# SEARCHING

It is now accessible from the Spotlight box at the top right, and also from the command line. For example:


→ mdfind "kMDItemTitle=='*livres de la BNF*'"

/long/path/to/file/bnf.xhtml

It worked!
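
The same search can be driven from a script. Here is a small Python sketch, assuming mdfind is on the PATH (it returns an empty list otherwise). The names title_query and find_by_title are made up for this example. The query has the same shape as the one above; fragments containing quotes, backslashes or asterisks are rejected rather than escaped, because the escaping rules are not covered here.

```python
import shutil
import subprocess


def title_query(fragment):
    """Build a 'title contains fragment' query like the one above."""
    if any(ch in fragment for ch in "'\"\\*"):
        raise ValueError("quotes, backslashes and * are not handled")
    return f"kMDItemTitle=='*{fragment}*'"


def find_by_title(fragment):
    """Run mdfind with the query and return the matching paths."""
    if shutil.which("mdfind") is None:
        return []  # not on a Mac, or Spotlight tools are missing
    result = subprocess.run(
        ["mdfind", title_query(fragment)],
        capture_output=True,
        text=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line]
```

For example, find_by_title("livres de la BNF") runs the query shown above and returns the paths that mdfind prints, one per list item.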


PS: There is an interesting detail about kMDItemKeywords and encoding in the output above (see "bibliothe\U0300que").
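
In the mdls output, that keyword is printed with an escaped U+0300, the combining grave accent, after a plain "e". That is the decomposed form of "è". Here is a short Python sketch of the difference, using only the standard unicodedata module. Whether the decomposed form comes from the file itself or from the importer is not established here.

```python
import unicodedata

# The keyword as mdls shows it: "e" followed by combining grave accent.
decomposed = "bibliothe\u0300que"

# NFC normalization recombines the pair into the single character "è".
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # → 13 12
print(composed == "biblioth\u00e8que")  # → True
print(decomposed == composed)           # → False
```

The two strings look the same on screen but do not compare equal, so normalizing both sides to the same form before comparing keywords in a script is a reasonable precaution.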

-- 
Karl Dubost
http://www.la-grange.net/karl/
Received on Thursday, 20 June 2013 20:58:07 UTC
